Thursday, June 28, 2012

Whither NuGet and the WebDriver .NET bindings

It happens every so often. Someone is trying to use the WebDriver .NET language bindings, and their project also happens to use the excellent Json.NET library created by James Newton-King. They decide they want to use NuGet to manage their dependencies. Everything moves along swimmingly until there's an update to the Json.NET library. NuGet dutifully updates the dependency, but when they go to run their WebDriver tests, everything blows up. After a cursory investigation, they see that the NuGet package for the Selenium WebDriver .NET bindings has a dependency on the Json.NET package, and that the dependency is declared on one specific version of Json.NET. When people see this, they inevitably point out that there must be an error in the authoring of the Selenium.WebDriver package. Hopefully, I'll be able to explain exactly why this isn't an error in the package authoring, and why the naive proposed solutions won't solve the problem.

First, a little background. The build process for the .NET bindings is integrated with the build process for the other portions of the Selenium project. This means the released binaries aren't built through Visual Studio, nor via MSBuild. The Selenium build script uses Rake as its build engine, via the Albacore project. We have some unique requirements that don't allow us to just build the .csproj file and be done with it. As part of the build process, we need to embed the outputs of other targets (which are not built via Visual Studio or MSBuild) as resources in the .NET assembly. To accomplish this, we call csc.exe directly with the appropriate command-line switches. This build process is nicely stable, follows the patterns of the build for the other portions of the Selenium project, and doesn't require hand-writing and maintaining a custom MSBuild project file.

As part of the build process, we also sign the assembly to give it a strong name. This was a feature requested by users who wanted to reference the WebDriver .NET bindings from a project that also had a strong name. You see, when you create a strong-named assembly, any assemblies it references must also be strong-named. Part of the strong name of such an assembly is its version number. When your strong-named assembly tries to resolve its reference to another strong-named assembly, but the only copy available is a different version, the .NET Framework cheerfully tells you, "I don't have a copy of the assembly you're looking for." And, strictly speaking, it doesn't, because the name of the assembly it does have (which includes the version information, since it's a strong name) doesn't match the name it was asked for.
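
To make the exact-match rule concrete, here is a minimal Python model of how a strong-named reference resolves. It is an illustration only: the AssemblyIdentity type and resolves() function are hypothetical, the version numbers and public key token are made up, and the model deliberately ignores culture, configuration-based version policy, probing, and everything else the real .NET loader does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyIdentity:
    """Hypothetical, simplified strong name: simple name + version + public key token."""
    name: str
    version: tuple[int, int, int, int]
    public_key_token: str


def resolves(reference: AssemblyIdentity, available: AssemblyIdentity) -> bool:
    """A strong-named reference binds only if every part of the identity matches.

    Simplification: culture, version policy, and probing are ignored entirely.
    """
    return reference == available


# Hypothetical values for illustration only.
TOKEN = "0123456789abcdef"
compiled_against = AssemblyIdentity("Newtonsoft.Json", (4, 0, 0, 0), TOKEN)
after_update = AssemblyIdentity("Newtonsoft.Json", (4, 5, 0, 0), TOKEN)

print(resolves(compiled_against, compiled_against))  # True
print(resolves(compiled_against, after_update))      # False: same file name, different strong name
```

The point of the model is that the simple name ("Newtonsoft.Json") is identical in both cases; the version is what makes the strong names differ.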

Complicating matters, if your assembly is unsigned, it is free to reference assemblies both with and without strong names. Furthermore, the unsigned assembly will load referenced assemblies matched by file name only, happily ignoring the version information in the referenced assembly. Since many people don't bother giving their assemblies strong names, they never have to think about the versions of the assemblies they reference.

I'm sure those of you who've encountered this before are already far ahead of me. Both the WebDriver .NET bindings assembly and the Json.NET assembly have strong names. That means the WebDriver assembly can use only the version of the Json.NET assembly it was compiled against. So when building the NuGet package for the .NET bindings, we must restrict the packages we reference to the exact versions we compile against. Otherwise, if NuGet downloads the latest version of Json.NET and it isn't the version we compiled against, the .NET bindings won't be able to load the referenced assembly. Of course, that's a problem if your project is unsigned and also requires a reference to Json.NET, and your reference isn't similarly pinned to a specific version, because now you have a version conflict. This is why simply changing the version specification for the Json.NET dependency in the WebDriver .nuspec file from "exactly version x" to "version x or greater" won't solve the problem.
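
The argument can be checked with a small Python model. It supports only the two NuGet version-spec forms this paragraph contrasts: a bare version such as "4.0.8", which NuGet reads as a minimum (that version or later), and a bracketed version such as "[4.0.8]", which means exactly that version. The function names and version numbers are hypothetical, and the load check is a simplified version of the exact-match rule for strong-named references described earlier in this post.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse "4.0.8" into (4, 0, 8, 0), padding to four parts so "4.0" == "4.0.0"."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def allows(spec: str, candidate: str) -> bool:
    """Interpret a NuGet dependency version spec (only two forms handled here).

    "[x]" means exactly x; a bare "x" means x or any later version.
    """
    if spec.startswith("[") and spec.endswith("]"):
        return parse_version(candidate) == parse_version(spec[1:-1])
    return parse_version(candidate) >= parse_version(spec)


def webdriver_loads(compiled_against: str, installed: str) -> bool:
    # Strong-named reference: only the exact version compiled against will bind.
    return parse_version(installed) == parse_version(compiled_against)


compiled_against = "4.0.8"  # hypothetical version WebDriver was built against
user_wants = "4.5.1"        # hypothetical newer version the user's project updates to

for spec in (compiled_against, f"[{compiled_against}]"):
    print(f"{spec!r}: NuGet permits {user_wants}: {allows(spec, user_wants)}; "
          f"WebDriver loads it: {webdriver_loads(compiled_against, user_wants)}")
```

Loosening the spec only changes where the failure shows up: with a minimum range, NuGet installs a Json.NET that WebDriver cannot load; with an exact range, NuGet refuses the update the user asked for, which surfaces as a version conflict.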

Some people suggest that migrating the project to use NuGet to manage its dependencies is a viable solution to this problem. Given everything we already require to build the project successfully, I remain unconvinced that adding one more technology prospective contributors have to know about is the friendliest approach. This is particularly so if the prospective contributor in question has no need or desire to use NuGet in his or her own projects. Getting people to engage and contribute is hard enough without putting other roadblocks in their way (but that's another rant for another day).

It should be noted that this is a well-known issue with creating NuGet packages that contain strong-named assemblies. As far as I'm aware, no one has come up with a good solution. The most promising solution would be to change the release of the WebDriver assemblies to be unsigned. However, this would amount to a change in functionality for anyone who is referencing them from a signed assembly, and would likely frustrate those users just as much as the current state of the NuGet package frustrates others.

Update 6 July 2012: James Newton-King, the author of the Json.NET library, has apparently been experimenting with a solution to this problem. I'll be modifying the .NET bindings NuGet package to take advantage of it soon. There's still a fundamental design flaw in the mismatch between packaging and strong-named assemblies, but for a widely used, frequently updated library like Json.NET, this is likely the best solution.

Friday, June 22, 2012

What's Wrong With the Internet Explorer Driver?

The Internet Explorer driver in the Selenium WebDriver project has consumed far too much of my life over the last two years, ever since I first set out to rewrite the driver to repair some of its shortcomings. During that time, I've learned far more about COM programming in C and C++ than I'd ever known before, probably far more than I ever cared to know. The driver has been in widespread use as part of the regular Selenium releases for nearly 18 months now, and I think it's time to take stock of the currently known issues with the so-called "native events" used to interact with elements in the IE driver.

My purpose in bringing this up is twofold. First, it's a good place to acknowledge how far we still have to go to make the IE driver as good as possible for the users of Selenium. Second, it gives me the chance to reiterate my standing invitation to anyone who'd like to review the code of the IE driver, make improvements, and submit patches. Before launching into these challenges, I need to cover a few basic assumptions about why the driver is architected the way it is.

There are a few main principles of the WebDriver project that directly affect how the IE driver is built. They are, in rough priority order:
  1. The driver should be installable with an xcopy deployment mechanism, and be capable of execution without elevating to admin permissions. Sadly, a significant percentage of the users of Selenium do not have admin access to their Windows machines, which precludes the use of a plugin (Browser Helper Object or BHO) for IE, as these require admin access to the registry at the very least.
  2. The driver should use "native events" to interact with elements. This means using OS-level mechanisms to simulate user keyboard and mouse input. By contrast, "synthetic events" are JavaScript simulations of those inputs, which have problems with accuracy of simulation and fidelity of operation.
  3. The driver should not require the browser instance being automated to be the focused window in the OS. It is expressly a goal of WebDriver that developers running WebDriver code will be free to use their machines for other purposes even while that code is running in the background.

Given these three requirements, there are two major problems with using native events in the IE driver. The first is that mouse clicks sent to the IE window get swallowed when the browser window does not have the system focus. In this case, the element in question gets a focus rectangle around it, but no click appears to have happened.

[Screenshot: an element click into a background IE window]

Let me say a brief word about how native events are done in the IE driver. The driver currently works by sending Windows messages to the IE window being driven. We know this is not necessarily the best approach, and that the "correct" way to simulate input for the IE window would be to use the SendInput() API. However, that would require the browser window to have focus, which opens up all manner of issues, particularly if two InternetExplorerDriver instances are attempting to manipulate pages at the same time.
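
For readers who haven't worked with window messages, the Python sketch below shows roughly what "sending Windows messages" for a left click amounts to. It only builds the (message, wParam, lParam) triples; on Windows, each triple could be passed to the user32 PostMessageW function together with the target window handle. The message and flag values are the standard ones from the Windows SDK headers, but build_click_messages() is a hypothetical helper, not code taken from the IE driver.

```python
# Standard values from the Windows SDK headers (WinUser.h).
WM_MOUSEMOVE = 0x0200
WM_LBUTTONDOWN = 0x0201
WM_LBUTTONUP = 0x0202
MK_LBUTTON = 0x0001


def make_lparam(low: int, high: int) -> int:
    """Equivalent of the MAKELPARAM macro: pack two 16-bit values into one integer."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def build_click_messages(x: int, y: int) -> list[tuple[int, int, int]]:
    """Hypothetical helper: messages for a left click at client coordinates (x, y).

    Nothing here moves the physical cursor or changes window focus; the
    messages would simply be placed in the target window's queue.
    """
    lparam = make_lparam(x, y)  # x in the low word, y in the high word
    return [
        (WM_MOUSEMOVE, 0, lparam),
        (WM_LBUTTONDOWN, MK_LBUTTON, lparam),  # wParam reports the left button as down
        (WM_LBUTTONUP, 0, lparam),             # button released, so no flags
    ]


for msg, wparam, lparam in build_click_messages(120, 45):
    print(f"msg=0x{msg:04X} wParam=0x{wparam:X} lParam=0x{lparam:08X}")
```

The contrast with SendInput() is that SendInput() injects events into the system-wide input stream, so mouse input goes to whatever window is under the cursor and keyboard input to whatever window has focus, whereas a posted message is addressed to one specific window.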

The "flashing hover" problem
The second issue is with mouse hovers using native events. The symptom is that when you execute a mouse hover using the Actions class with the IE driver, and your physical mouse pointer is within the bounds of the browser window, the hover menu flashes and immediately disappears. As near as I've been able to determine, when IE receives a WM_MOUSEMOVE message it does some sort of hit-testing (probably calling the WindowFromPoint() API), and if it detects the physical cursor inside the window boundaries, it redraws the canvas.
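
The hypothesis above can be restated as a small model of when the flashing should occur. The Python sketch below is purely illustrative and not based on IE's actual code: the Rect type, the hover_outcome() function, and the coordinates are hypothetical. The point-in-rectangle test follows the Win32 PtInRect convention, in which the right and bottom edges lie outside the rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        # Same convention as Win32 PtInRect: right and bottom edges are outside.
        return self.left <= x < self.right and self.top <= y < self.bottom


def hover_outcome(window: Rect, physical_cursor: tuple[int, int]) -> list[str]:
    """Hypothetical model of the flashing-hover symptom, per the hypothesis above."""
    states = ["menu shown"]  # the synthetic WM_MOUSEMOVE opens the hover menu
    if window.contains(*physical_cursor):
        # Presumed hit test finds the real cursor inside the window; IE redraws.
        states.append("menu hidden")
    return states


browser = Rect(0, 0, 1024, 768)  # hypothetical screen coordinates of the IE window
print(hover_outcome(browser, (500, 300)))   # physical cursor inside: the menu flashes
print(hover_outcome(browser, (1500, 300)))  # physical cursor elsewhere: the menu stays
```

The model captures only the observed dependency on the physical cursor's position; it says nothing about what IE actually does internally.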

Solving the above two problems would go a long way toward the long-term stability of the IE driver. Alternatively, if the IE team at Microsoft took over the maintenance of the driver, as the teams responsible for Chrome, Opera, and soon Firefox have done, then the true experts in the architecture of the Internet Explorer browser could bring all of their experience and expertise to bear, and help make Internet Explorer a first-class citizen in the world of automated web testing.

In fact, that would be my challenge to the IE team at Microsoft: step up, and contribute code to the IE driver. Take the existing code, and do something great and awe-inspiring with it. Become a leader in this space, instead of the lagging follower you currently are.