Monday, September 25, 2017

Selenium WebDriver Support For .NET Core 2.0

Starting with release 3.6.0 of the .NET bindings, Selenium now has initial support for .NET Core 2.0. The .NET bindings in that release contain versions of the assemblies that are built against the .NET Standard 2.0 platform, which means they're intended to be used with .NET Core 2.0 projects. I know this has been a feature many people have wanted for a long time, and I'm glad the project can now deliver it. However, it does come with some associated costs, and with a few known issues.

The first known issue is that calls to localhost in .NET Core are slower than those in the full .NET Framework. This is due to internal differences in the .NET libraries themselves, and is not the fault of the bindings directly. See this issue in the .NET Core repository for more details.

Secondly, attempting to save a screenshot to any graphics file format other than Portable Network Graphics (PNG) will throw an exception. .NET Core does not provide the image manipulation classes that the full .NET Framework does, and there are not yet any production-ready third-party libraries that provide that functionality while relying only on managed code. It's entirely possible to save a screenshot when using .NET Core, but within the Selenium libraries you can only save it in the PNG file format. This concern is over and above the difficulties with adding dependencies to the language bindings.
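The WebDriver protocol transmits screenshots as Base64-encoded PNG data, so saving a PNG is just a matter of decoding and writing bytes, with no image library involved; any other format means decoding and re-encoding the image, which is presumably why PNG is the one format that still works. The Python sketch below illustrates that idea outside of any Selenium API. The helper name `save_png_screenshot` is hypothetical, and the only validation it does is checking the standard eight-byte PNG signature.

```python
import base64

# The eight-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png_screenshot(base64_data: str, path: str) -> int:
    """Decode a Base64 screenshot and write it unchanged to a .png file.

    Hypothetical helper: no image library is needed because the bytes are
    already PNG. Saving as JPEG, GIF, etc. would require re-encoding.
    Returns the number of bytes written.
    """
    raw = base64.b64decode(base64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot data is not PNG")
    with open(path, "wb") as f:
        return f.write(raw)
```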

The difficulty of adding dependencies to the Selenium project leads me to the next known issue. When using the bindings against .NET Core, there is no PageFactory class available. This is not an oversight, nor is it a bug. I have long said that the .NET PageFactory implementation is not required for effective implementation of the Page Object Pattern, and the .NET PageFactory does not provide any tangible benefits to the user. Even the argument that the code is easier to read is specious with properly constructed page objects. Moreover, the existing .NET PageFactory implementation requires use of classes that are not available in .NET Core. It is a non-trivial matter to add additional dependencies to the .NET bindings, so simply replacing those classes with a third-party library that is compatible with .NET Core is not a "perfectly obvious" option.
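To illustrate that a page object needs no PageFactory, here is a minimal Python sketch in which each element is simply looked up when it's used; the same structure carries over directly to C# properties. To keep it independent of the selenium package, the driver is described by a small `Protocol` that mirrors the bindings' `find_element(by, value)` method (in the Python bindings, `By.ID` is the string `"id"`). The `LoginPage` class and its element IDs are hypothetical.

```python
from typing import Protocol


class Element(Protocol):
    def send_keys(self, text: str) -> None: ...
    def click(self) -> None: ...


class Driver(Protocol):
    # Same shape as find_element(by, value) in the Python bindings,
    # where By.ID is the string "id".
    def find_element(self, by: str, value: str) -> Element: ...


class LoginPage:
    """A page object built without PageFactory.

    Elements are located on demand, in plain code, each time they are used.
    The element IDs below are hypothetical.
    """

    def __init__(self, driver: Driver) -> None:
        self._driver = driver

    @property
    def user_name_field(self) -> Element:
        return self._driver.find_element("id", "username")

    @property
    def password_field(self) -> Element:
        return self._driver.find_element("id", "password")

    @property
    def login_button(self) -> Element:
        return self._driver.find_element("id", "login")

    def log_in(self, user_name: str, password: str) -> None:
        # The page object exposes a user-level action, not raw elements.
        self.user_name_field.send_keys(user_name)
        self.password_field.send_keys(password)
        self.login_button.click()
```

Looking elements up on each access also sidesteps holding on to element references that may have gone stale after the page changes.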

Finally, references to the .NET Standard 2.0 library versions provided in this and future releases are only valid when using NuGet package references. Simply copying the assembly and adding an assembly reference to the .NET Core 2.0 project will not work. This is by design in the .NET Core ecosystem, which now depends entirely on NuGet to properly resolve dependencies.

One last note about the 3.6.0 release of the .NET bindings. Previously, the .zip archives provided at the official Selenium release site contained only the assemblies (.dlls) for the various frameworks that we supported. Starting with this release, the downloadable .zip archives contain NuGet package (.nupkg) files instead. Since a .nupkg file is itself a .zip archive, you can use any .zip reader to extract the actual .dlls from the packages. Yes, this means that we're putting a .zip inside a .zip, which is less than efficient, and we may revisit this mechanism of distributing the binary releases in the future.
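As one example of "any .zip reader," Python's standard `zipfile` module can open a .nupkg directly. The sketch below extracts every .dll it finds; the function name `extract_dlls` is hypothetical, and the comment about the `lib/<target-framework>/` folder reflects the usual NuGet package layout rather than anything specific to the Selenium packages.

```python
from __future__ import annotations

import zipfile


def extract_dlls(nupkg_path: str, dest_dir: str) -> list[str]:
    """Extract every .dll from a .nupkg, which is an ordinary .zip archive.

    Returns the paths of the extracted files.
    """
    extracted = []
    with zipfile.ZipFile(nupkg_path) as package:
        for name in package.namelist():
            # Assemblies typically live under lib/<target-framework>/.
            if name.lower().endswith(".dll"):
                extracted.append(package.extract(name, dest_dir))
    return extracted
```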

Wednesday, March 22, 2017

Announcing Beta Release of Selenium IE Driver

One of the most common questions I get asked is, "How can I help contribute to Selenium?" Usually my answer involves pull requests and the like, but today, I can offer a much easier way for people to contribute. A significant part of my attention over the last four years has been thinking about and working on the W3C specification for WebDriver. While the specification codifies many of the things that the open source Selenium project has done for years, there are some significant changes to the wire protocol that the language bindings use to communicate with the drivers themselves. The specification already has an implementation in wide use, in geckodriver, Mozilla's driver implementation for Firefox. In order to move forward, however, the IE driver needs to be updated to follow the specification. Here's where you come in.

I've modified the IE driver to use the W3C dialect of the wire protocol. This modification, while significant internally, shouldn't show any differences in behavior from the existing, shipping IE driver. It currently passes all of the tests in the Selenium project for IE. While these tests are pretty extensive, the permutations available in the DOM and in Selenium WebDriver code used to automate it are nearly infinite. To that end, I'm announcing the availability of a beta version of the IE driver. What am I asking you to do? Simply download the new driver executable, and use it in place of the existing driver you're using in your Internet Explorer automation.
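The post doesn't spell out how the two dialects differ, so as one concrete, simplified illustration, here is a Python sketch of two well-known differences: the shape of the New Session request body and the way errors are reported in responses. The function names are hypothetical, and this is nowhere near a complete account of either protocol.

```python
import json


def new_session_payload(capabilities: dict, dialect: str) -> str:
    """Build a New Session request body in either wire protocol dialect."""
    if dialect == "oss":
        # Open source JSON wire protocol: a flat desiredCapabilities object.
        body = {"desiredCapabilities": capabilities}
    elif dialect == "w3c":
        # W3C specification: capabilities nested under "alwaysMatch".
        body = {"capabilities": {"alwaysMatch": capabilities}}
    else:
        raise ValueError(f"unknown dialect: {dialect}")
    return json.dumps(body, sort_keys=True)


def is_error_response(response: dict) -> bool:
    """Detect an error in a decoded response body from either dialect."""
    if "status" in response:
        # OSS dialect: status 0 means success; anything else is an error code.
        return response["status"] != 0
    # W3C dialect: errors carry an "error" string inside "value".
    value = response.get("value")
    return isinstance(value, dict) and "error" in value
```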

Notes

  • The beta driver should be a drop-in replacement for the existing 3.3.0 IEDriverServer.exe release. It should require no changes in your code, save maybe pointing to the new executable.
  • Having said that, there are some differences that are expected due to spec compliance. Full-page screenshots, for example, are explicitly disallowed by the specification, so are no longer generated by the driver.
  • The beta driver's version number (visible by executing IEDriverServer.exe --version) will be 3.3.99.x. Bug fix releases will increment the "build" (fourth) field of the version number.
  • This executable will only be available via the download site; it will not be available via package managers (Maven, NuGet, npm, etc.). If the beta appears in any of the (unofficial) packages that may be used for IEDriverServer.exe in a package manager, a request will be sent to the package owner to remove it, so please don't rely on those.
  • There have been some extensive internal rewrites due to the nature of the protocol changes. More on what to look for below.
  • Only the 32-bit version of the driver is being provided for the beta.

Areas of Concern

We want to know if there are any differences between the shipping 3.3.0 version of IEDriverServer.exe and the beta version. You should see the same behavior, including bugs; do not expect the beta driver to magically fix issues you may have experienced with IE in the past. Updating to support the specification wire protocol has required extensive rewrites, but these should all be transparent to the language bindings. The biggest changes have happened in the areas of element interactions, so you should pay special attention to things like WebElement.click() or WebElement.sendKeys(). There is one known issue: if you call WebElement.submit() and the onsubmit event handler throws up a JavaScript alert(), the driver will hang. This issue won't be fixed until after the merge back to master. Also note that the beta has to date only been tested against IE 11, and per the driver's current support policy, only officially supports IE 9, 10, and 11.

Reporting Issues

Issues with the beta can be reported to the Selenium project's issue tracker. However, we have to set some ground rules for the issues that you submit. Here they are:
You'll need to provide the following information with any issue report:
  • Language bindings (Java, .NET, Ruby, Python, JavaScript) and version number you're using
  • The specific version of the beta you're using
  • The WebDriver code that behaves differently
  • An HTML page (or link to one) that the WebDriver code can be run against
Lack of any of this information will cause the issue to be closed immediately, without action or investigation! There are simply too many other potential issues with the existing IE driver, and the timeline for getting this merged into the main code line is simply too short to be able to go back and forth with issue reporters trying to set up a reproducible case. Moreover, here are some further guidelines about submitting issues.
  • Prefixing your issue title with "IE Driver Beta" will get it processed more quickly than if you don't.
  • The beta has only been tested with 3.3.x versions of any language bindings. It should still work with any language bindings of the 3.x vintage, but if you haven't tried your code with at least 3.3.x, you will be asked to do so before further investigation can continue on your issue.
  • You should be able to concretely demonstrate a difference in behavior between IEDriverServer.exe 3.3.0 and the beta you're using. If you cannot, you will be asked to do so before investigation can continue.
  • If you are using a test framework, and your sample code cannot be extracted to simple, straightforward WebDriver-only code, your issue will be closed. Developer bandwidth is just too narrow to wade through tons of framework code to get to the few lines of WebDriver code that are exhibiting different behavior.
  • If you omit an HTML page that can be tested against, your issue will be closed. Again, this may seem overly restrictive, but without this caveat, it will be nearly impossible to debug the issue with the beta driver.
This is pretty time-sensitive, so if you'd like to give this a try, the Selenium project developers would really appreciate it.

Monday, February 13, 2017

Announcing End of Life of .NET Selenium RC Language Bindings

This post will serve as the official announcement that version 3.1 of the Selenium .NET language bindings will be the last to provide a Selenium RC library. Users still relying on the RC API will be able to continue to do so using WebDriverBackedSelenium, or can convert their code to use WebDriver proper. Selenium RC has been deprecated for over six years, and the .NET Selenium RC language bindings have not been updated with a code change other than a version bump in nearly that long. This change isn't likely to affect many users at this point, and the 3.1 versions of the language bindings will continue to be available more-or-less indefinitely, but there will be no further changes to the .NET RC library or releases of it.

Let me restate this so that it's blatantly obvious. This does not affect the .NET language bindings for WebDriver, and WebDriverBackedSelenium will remain a viable path forward for some time. It only affects Selenium RC in the .NET language bindings.

Tuesday, August 23, 2016

Polyamory, Pride Flags, and Patterns of Feedback

Warning: For those of you who come here looking for technical advice and inside information about the Selenium project, WebDriver, or browser automation, this post isn't about any of those. You might just want to skip this one altogether.

One thing about me that I'm not sure many people are aware of is that I'm polyamorous. That means that I am comfortable being in simultaneous romantic relationships with multiple partners, and that my participation in those relationships is openly known by all people involved. I've been polyamorous, or "poly" for short, for nearly all of my adult life. A little over 20 years ago, I lived in the Pacific Northwest, and for the first time in my life, I experienced first-hand the struggles and celebrations of what is now known as the LGBT community. One thing that struck me was the imagery and symbolism those communities used to rally around, identify other members, and publicly announce their membership in the community. The pride flag was one image that made a huge impression on me. At that time, the poly community didn't really have similar symbols to use, so I took it upon myself to create one. Here's what I made up, and released into the public domain in the late summer or early fall of 1995.



Here's the text I wrote up describing it to the first mailing list I shared it with. It's become the canonical description of this particular flag.
The poly pride flag consists of three equal horizontal colored stripes with a symbol in the center of the flag. The colors of the stripes, from top to bottom, are as follows: blue, representing the openness and honesty among all partners with which we conduct our multiple relationships; red, representing love and passion; and black, representing solidarity with those who, though they are open and honest with all participants of their relationships, must hide those relationships from the outside world due to societal pressures. The symbol in the center of the flag is a gold Greek lowercase letter 'pi', as the first letter of 'polyamory'. The letter's gold color represents the value that we place on the emotional attachment to others, be the relationship friendly or romantic in nature, as opposed to merely primarily physical relationships.
Now, here are some things to understand. Clearly, I'm not a visual artist. My tools for creation at the time were literally limited to Microsoft Paint, running on Windows 3.1. Nevertheless, the flag design managed to limp along, with little fanfare. My friends and I used it, and thought of it as quirky and something that could be used in the way other pride flags were used, as a symbol to rally around and for identification.

Fast forward 20 years. Apparently, this thing called the World Wide Web happened, and let all sorts of people communicate and discover things they'd never known about before. New polyamorous people began to discover the flag existed. One would think that people might think it was an interesting idea, given its intent. One would be wrong. The flag has been called vile, no good, hideous, disappointing, ugly, and many other negative things.

One of the issues frequently brought up is that the color scheme is garish or unpleasing. That's subjective, and I can't argue with their perception. I still think there's value in the color symbology, if not the actual RGB values I used when creating it.

Many people seem to take issue with the pi symbol as obscure. There were specific reasons for choosing it at the time. First, I specifically avoided imagery that included a heart. The leather pride flag, which predates the design of mine, includes a heart, and I was trying to avoid confusion, given that community was there first. The "infinity heart" was not yet as widely accepted a symbol for polyamory, and would have been challenging for me to incorporate given my limited abilities in the visual arts. The letter pi was readily available on computer typographic platforms even in those days, so I chose that.

Also, at the time, I was more concerned with "in the closet" polyfolk, and was far more in the closet myself than I am these days. I wanted a symbol that could be used relatively anonymously, that could let people who were in on the symbology connect, without it being too specific.

Additionally, there was already a rich history of existing pride symbols using Greek letters, the use of lambda as an LGBT symbol being a concrete example. I was hoping to evoke similarity and solidarity without being too explicit or derivative. Finally, the fact that the "poly" in polyamory is a Greek root seemed to indicate that would be a natural choice. In retrospect, perhaps a lemniscate ("infinity symbol") would've been a better choice, but nobody spoke up then.

Poly people coming to read this full story for the first time, welcome. Glad to meet you. If you don't care for the flag, I'm sorry to have offended your sensibilities. Today, there are a number of alternative symbols you can rally around. Use mine, don't use it, I'm just glad some people found a banner to rally around in the late '90s. Feel free to leave comments, but dismissive and abusive comments will be removed.

Thursday, July 23, 2015

Using the Microsoft Driver for Microsoft Edge




As the release of Windows 10 has approached, I've been seeing more and more questions about a WebDriver implementation for Microsoft Edge (née "Spartan"). So far, I've only been able to say that there isn't a driver implementation for that particular browser, and that there is no work being done in the open-source arena on such an implementation. Furthermore, I could say that Microsoft had acknowledged the need for such a driver and had committed to providing one, but had provided no timetable for a release of such an implementation.

As of today, I can say much more than that.

Today, Microsoft has announced the availability of a WebDriver implementation for Microsoft Edge. This implementation is released as an installable application for Windows 10, and you can download the installation package here. While the requirement to run an installer might be off-putting to some users, it appears that the installation package merely installs a standalone executable to the restricted "Program Files" location. As near as I can tell by examining the installer package, the executable can be freely copied to other locations on the machine after installation. Also, the use of an installable package means the executable can be serviced and updated by the standard, automatic Microsoft Update mechanism.

The last point is incredibly important. This first release of a driver from Microsoft is most emphatically not a finished release, and as such it lacks functionality, some of it rather basic. To wit, finding an element within the context of another element (i.e., WebElement.findElement) is not yet implemented in the current release. Finding an element via XPath is not implemented in this first release either. Switching to frames or iframes (i.e., driver.switchTo().frame()) is likewise not implemented in this initial release, and neither is the advanced user interactions API (i.e., the Actions class). In fairness, Microsoft has been completely forthcoming in what functionality the driver has implemented and what is missing. I fully expect that the driver will be receiving regular updates, and that the missing functionality will be added in the coming weeks and months.

So, how does one use the Microsoft Edge driver? Well, thanks to our good friends in Redmond, the release of the driver implementation was accompanied by a pull request to the Selenium project that enables the existing language bindings to use it seamlessly. The Edge driver as currently released uses the open-source project's dialect of the WebDriver JSON wire protocol, allowing us to use the driver right now. My guess would be that a future release of the driver will use the W3C specification version of the protocol. The pull request has been merged, so all you'll need is an updated language binding release, and you'll be able to use it directly. If you're itching to use it in the meantime, you can manually launch MicrosoftWebDriver.exe, and use the RemoteWebDriver class in your current language bindings to connect to it on your local Windows 10 machine.

Congratulations to the Microsoft Edge development team for releasing a standard tool for automating Microsoft Edge. I, for one, appreciate the hard work and efforts you've put forth to make this happen. I look forward to future enhancements and features, and I stand ready to help any way I possibly can.

Monday, December 22, 2014

Windows Update KB3025390 for IE 11 Breaks IE Driver

Update (10 February 2015): Microsoft has released a fix as part of the February 2015 Cumulative Update to Internet Explorer. Installing this update appears to resolve the issue with the IE driver.

On 16 December 2014, Microsoft released update KB3025390 via Windows Update as part of its normal "patch Tuesday" update cycle. For most users, this update is downloaded and installed without user interaction. This update breaks the IE driver when using it with IE11.

As part of this update, attempting to use the COM method IHTMLWindow2::execScript returns an "access denied" result. This renders the driver unable to execute JavaScript in the page being browsed. Since large portions of driver functionality are implemented using JavaScript, this effectively renders the driver all but unusable with IE11.

There is no known workaround for this issue. At this time, Microsoft's WebDriver implementation for IE is still incomplete, lacking basic functionality required to make it usable, so it cannot be recommended. Uninstalling the update is reported to restore IE driver functionality, but this is hardly ideal.

While the execScript method is marked as deprecated for IE11, the driver had heretofore been able to use that method successfully, and it was hoped that it would remain useful throughout the IE11 life cycle. We now know this not to be the case. Additionally, attempts to use the Microsoft-suggested replacement, eval, have been fruitless thus far.

At the moment, a bug has been raised via Microsoft Connect on this issue, and is being investigated by the Internet Explorer development team.
The issue is also currently being tracked in the Selenium issue tracker. There is no need to post additional comments in that issue report verifying that you are experiencing the issue. Likewise, there is no need to post comments to the issue asking for status updates. Rest assured that the issue in the Selenium tracker will be updated when new information is discovered.

What can you do to help? In the coming days and weeks, I'm sure we will see a large number of people exclaiming that their WebDriver code has mysteriously stopped working against IE11, with no action on their part, without having searched for answers. I'm posting this message several places as a public service announcement, and would like all of you to redirect such inquiries to this post.

Tuesday, September 16, 2014

Screenshots, SendKeys, and Sixty-Four Bits

There are a couple of issues with the Internet Explorer driver that have been around since IE10 was released. They're pretty annoying when people encounter them, and the report for the first issue goes something like this:
I'm using Internet Explorer 10 (or 11), and when I call the sendKeys method, the keystrokes happen very slowly. Like one keystroke every 5 seconds. I'm on a 64-bit version of Windows, and I'm using the 64-bit IEDriverServer.exe. If I use the 32-bit version of the driver, the problem doesn't occur, but I really need to test using the 64-bit version because I need to test 64-bit IE. What's the deal?
The report for the second issue usually reads as follows:
I'm using Internet Explorer 10 (or 11), and even though I'm on 64-bit Windows, I'm using the 32-bit IEDriverServer.exe, because I was having problems with sendKeys being slow. Now, though, when I take a screen shot, it only shows the visible portion of the page. How can I take full page screen shots like I could when I use the 64-bit IEDriverServer.exe?
Both of these issues are fully documented in the Selenium issue tracker (#5116 for the sendKeys issue, and #5876 for the screenshot issue). A comment in each issue mentions that any fix would require "a massive rearchitecture of the IE driver's binary components, [so] no timeline is (or will be) available" for the delivery of a fix. What causes these issues? How are they related? Why would a fix be so darned difficult? The answers to those questions can all be summed up with a simple answer: "Windows Hooks." 

What is a Windows Hook?


All Windows applications have a routine in them called a "message loop." The message loop repeatedly calls the GetMessage API function, and processes messages sent to the application as they arrive in its queue. Hooks are a feature of the Windows message handling system that allow a developer to intercept, examine, and modify the message being sent to the application. By installing a hook, a developer could, for example, validate that a certain message was processed by the window being hooked. Or they could modify a message sent to the window to represent that the operating system could do things it actually can't. It's a clever mechanism, but it does have a few requirements.
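To make the interception idea concrete, here is a toy Python simulation of a message loop in which hooks get a chance to examine or modify each message before the window procedure handles it. It is only an analogy: real hooks are installed with the Win32 SetWindowsHookEx function and run as native code, and the function and type names here are hypothetical. The message identifiers are the real Win32 values.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

WM_KEYDOWN = 0x0100  # real Win32 message identifiers
WM_KEYUP = 0x0101
WM_CHAR = 0x0102

Message = Tuple[int, int]  # (message id, wParam), heavily simplified
Hook = Callable[[Message], Message]


def run_message_loop(queue: Deque[Message], hooks: List[Hook],
                     window_proc: Callable[[Message], None]) -> None:
    """Drain the queue, letting each hook examine or modify a message
    before the window procedure handles it (a simulation, not Win32)."""
    while queue:
        message = queue.popleft()  # stands in for GetMessage
        for hook in hooks:
            message = hook(message)  # a hook may observe or rewrite it
        window_proc(message)  # stands in for dispatching to the window
```

For example, one hook could merely record every message (validating that it arrived), while another rewrites WM_CHAR messages before the window ever sees them.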

First of all, the code being run when the hook is called (the "hook procedure") must exist in a dynamic-link library (DLL). That is, it cannot be simply a function exported from a compiled executable. The reason for this is that the code is actually going to be loaded into two applications, the application installing the hook, and the application being hooked. Using a DLL is the only way to avoid certain conflicts that would arise with loading one executable into the process space of another.

Secondly, the DLL must be of the same "bitness" as the process being hooked. In Windows, a 32-bit executable cannot load a 64-bit DLL. The converse is also true: a 64-bit executable cannot load a 32-bit DLL. Incidentally, this is the root reason that there are two versions of IEDriverServer.exe, but that's another story for another time.

Windows Hooks and the IE Driver


The IEDriverServer.exe uses hooks for its implementation of a couple of features. The first use of a hook is in the processing of keystrokes. By default, the driver uses the Windows PostMessage API function to simulate keystrokes. It does this by sending a WM_KEYDOWN message, followed by WM_CHAR and WM_KEYUP messages for each key. However, PostMessage is asynchronous, so the driver has to wait to make sure that the WM_KEYDOWN message is processed before sending the other messages, otherwise keystrokes could be sent out of order, making key sequences garbled. The driver does this by installing a hook into IE's window procedure, and listening for the WM_KEYDOWN to be processed before proceeding. It also puts in a timeout of about five seconds waiting for the message to be processed to make sure that the driver doesn't wait forever. Note that the code path is slightly different if you're using the requireWindowFocus capability, using the SendInput API function instead, but the driver still uses a hook to make sure messages are processed before moving on.
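The ordering logic described above can be sketched as a Python simulation. This is not the driver's C++ code: `post_message` stands in for the asynchronous PostMessage call, and `keydown_processed` is a hypothetical event that the hook would signal once IE has processed WM_KEYDOWN. The point is the shape of the wait: if the hook fires, typing proceeds immediately; if it never fires, every key pays the full timeout.

```python
import threading
from typing import Callable

WM_KEYDOWN = 0x0100  # Win32 message identifiers
WM_KEYUP = 0x0101
WM_CHAR = 0x0102

KEYDOWN_TIMEOUT = 5.0  # "about five seconds", per the description above


def type_keys(text: str,
              post_message: Callable[[int, str], None],
              keydown_processed: threading.Event,
              timeout: float = KEYDOWN_TIMEOUT) -> int:
    """Simulate keystrokes in the order the post describes and return how
    many keys hit the timeout because no hook reported WM_KEYDOWN processed."""
    timeouts = 0
    for key in text:
        keydown_processed.clear()
        # Like PostMessage, post_message is assumed to return immediately.
        post_message(WM_KEYDOWN, key)
        # Block until the hook signals, or give up after the timeout.
        if not keydown_processed.wait(timeout):
            timeouts += 1
        post_message(WM_CHAR, key)
        post_message(WM_KEYUP, key)
    return timeouts
```

With the default timeout, a three-character string typed without a working hook would take roughly fifteen seconds, which matches the slow sendKeys reports quoted earlier.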

The second place the driver uses a hook is when taking screenshots. The IE driver takes screenshots using the PrintWindow API function. PrintWindow can only take a screenshot of the visible portion of any given window, which means that in order to get a full-page screenshot (as required by the WebDriver API), the window must be sized large enough to display the entire page without scroll bars. However, Windows does not allow the window to be resized larger than the visible screen resolution. When we ask IE to resize itself, a WM_GETMINMAXINFO message is sent on a resize event so that IE can figure out how large a window can be. By intercepting that message with a hook, and modifying the max values, we can trick IE into thinking that a window can be sized greater than the screen resolution would otherwise allow.
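The message's payload is the Win32 MINMAXINFO structure, and the sketch below mirrors its layout with Python's `ctypes`. The post only says the hook modifies "the max values"; this sketch assumes it is the `ptMaxTrackSize` member (the maximum size a window may be resized to) that gets raised. The function `enlarge_max_track_size` is hypothetical, and nothing here installs an actual hook.

```python
import ctypes


class POINT(ctypes.Structure):
    # Win32 POINT: two LONG fields, which are 32-bit on Windows.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class MINMAXINFO(ctypes.Structure):
    # Field order matches the Win32 MINMAXINFO structure.
    _fields_ = [
        ("ptReserved", POINT),
        ("ptMaxSize", POINT),
        ("ptMaxPosition", POINT),
        ("ptMinTrackSize", POINT),
        ("ptMaxTrackSize", POINT),
    ]


def enlarge_max_track_size(info: MINMAXINFO, width: int, height: int) -> None:
    """What a WM_GETMINMAXINFO hook might do (assumption): raise the maximum
    tracking size so the window can grow beyond the screen resolution."""
    info.ptMaxTrackSize.x = max(info.ptMaxTrackSize.x, width)
    info.ptMaxTrackSize.y = max(info.ptMaxTrackSize.y, height)
```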

Since the IE driver makes use of hook procedures, the bulk of the IE driver is actually implemented in a DLL. So as to avoid having to manage multiple files when using the IE driver, this DLL is embedded as a resource inside the IEDriverServer executable, and extracted to the temp directory at runtime. Once extracted, it's loaded into memory by IEDriverServer, and the main entry point of the DLL is called. This gives the driver a way to inject itself into the IE process using hooks to accomplish what it needs to. This worked great up to and including IE 9.
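The packaging trick of carrying a DLL inside the executable and writing it out at runtime can be sketched in a few lines of Python. Here the "resource" is just a byte string, the `IED` file-name prefix and the function name are hypothetical, and the final step of loading the file (LoadLibrary, in the real driver) is deliberately left out.

```python
import os
import tempfile


def extract_embedded_dll(resource: bytes, prefix: str = "IED") -> str:
    """Write an embedded DLL image to a uniquely named file in the temp
    directory and return its path; the caller would then load it."""
    # mkstemp creates the file atomically with a unique name.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".dll")
    with os.fdopen(fd, "wb") as f:
        f.write(resource)
    return path
```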

What Happened in IE10?


When IE 10 was first released, we started to see reports of the two aforementioned issues coming in. Since version 7 of Internet Explorer, there has been the notion of multiple processes for a single "instance" of IE. There was the notion of a "manager" or "broker" process, which managed the outer, top-level window of Internet Explorer. All HTML rendering and ActiveX controls are managed by a "content" process. Through version 9 of Internet Explorer, these processes were the same bitness. That is, running 64-bit IE meant you got a 64-bit manager process, and 64-bit content processes. Running 32-bit IE meant you were using a 32-bit manager process, and 32-bit content processes. This all changed with IE 10.

One major change in IE 10 is that the manager process on 64-bit versions of Windows will always be a 64-bit process. By default, though, content processes remained 32-bit. This allowed the main process to be a 64-bit process, but still allowed the browser to remain compatible with all of the existing browser plug-ins for IE, which are overwhelmingly 32-bit. There are other reasons for this change, and I'm oversimplifying the architecture a bit. If you're truly interested in the deep details of the architecture, I encourage you to read Eric Lawrence's blog post that lays out the implications in great detail. Anyway, there are ways to force 64-bit content processes for IE 10 and above, but these will break the IE driver due to Protected Mode issues.

So for IE 10 and above, the situation is that we have a 64-bit process, which handles the main outer window, and a 32-bit process which owns the inner window where HTML content is rendered. The driver's window hook procedure for taking screenshots must be attached to the main outer window; the driver's hook procedure for verification of message processing while simulating keystrokes must be attached to the content window being automated. Remember that since the driver executable can only be 32-bit or 64-bit, but not both, the DLL in which the window hook procedures reside can only be the same bitness as the executable. Let's explore the implications of this.

For the 32-bit IEDriverServer, the hook procedure can be successfully attached to the (32-bit) content window for use with sendKeys. Installing the window hook for screenshots, however, means installing a hook into the manager process, which is 64-bit and can't load the 32-bit DLL into its process space. The hook installation fails, and screenshots are truncated.

Conversely, for the 64-bit IEDriverServer, the hook can be successfully installed in the top-level window for use in taking screenshots, because the process owning that window is a 64-bit process. However, when the driver attempts to install the hook into the (32-bit) content window to detect message processing during sendKeys, the DLL is 64-bit, and can't be loaded by the 32-bit process that owns the content window. This means that the timeout is invoked for every keystroke, with sendKeys waiting about five seconds for each key.
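The two cases above reduce to one rule applied twice, and a short Python sketch makes the trade-off explicit. The function names are hypothetical; the defaults encode the IE 10+ situation on 64-bit Windows described above (64-bit manager process, 32-bit content process), with the hook DLL always sharing the bitness of IEDriverServer.exe.

```python
from __future__ import annotations


def can_load(process_bits: int, dll_bits: int) -> bool:
    # A Windows process can only load a DLL of its own bitness.
    return process_bits == dll_bits


def hook_results(driver_bits: int, manager_bits: int = 64,
                 content_bits: int = 32) -> dict[str, bool]:
    """Which hooks install successfully, given that the hook DLL has the
    same bitness as IEDriverServer.exe."""
    return {
        # The screenshot hook goes into the manager (top-level window) process.
        "full_page_screenshots": can_load(manager_bits, driver_bits),
        # The keystroke hook goes into the content process.
        "fast_send_keys": can_load(content_bits, driver_bits),
    }
```

Calling `hook_results(32)` gives fast sendKeys but truncated screenshots, and `hook_results(64)` gives the reverse; no single driver bitness satisfies both hooks, whereas with IE 9's matching manager and content bitness, both hooks succeed.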

Why Would This Be So Hard To Fix?


By now, I hope we have a better understanding of what the root cause of both issues is. What would it take to fix them? A naive implementation would just attempt to bundle both 32- and 64-bit DLLs in with a version of the server and be done with it. However, this won't work because the DLL where the hook procedure lives must be loaded by both executables, IEDriverServer.exe, and the IE process owning whichever window is being hooked.

The only way to completely and correctly resolve the issue would be to create a pair of executables, with a related pair of DLLs, and have the two executables establish some way of working together via an interprocess communication channel. With this approach, now we'd be asking a user to download and manage two executables instead of one, or to use an installer of some kind. Since the project aims for an "xcopy deploy" without requiring the use of an installer, that's a larger burden than one can expect users of the IE driver to undertake.

While it's tempting to simply suggest creating a second executable and embedding it as a resource for extraction at runtime just like we do with the DLL, that approach is flawed as well. Many antivirus and malware monitors will happily allow any application to place a DLL in the temp directory and let an executable call LoadLibrary on it, but having an executable file magically appear and attempt to run in the temp directory throws up all kinds of flags. Making it a requirement to disable your antimalware software before using the IE driver is not something I'd be comfortable with.

Creating a second executable, and its attendant DLL for using the hook procedure, and figuring out some way for the two executables to communicate with each other amounts to a massive rearchitecture of the IE driver. With Microsoft's announcement of the sunset of support for all legacy versions of IE in January 2016, and with the creation of their own WebDriver implementation, it's not clear that the benefit of making these intrusive changes will outweigh the cost of doing so.