Monday, November 19, 2012

Are you kidding me, IE Driver? Another freaking thing to download?

Every now and again, I come across someone who remembers that the IE driver used to be completely self-contained for all of the supported language bindings, and realizes they need to download IEDriverServer.exe for the driver to work in more recent versions. Often, the question comes up as to why there's now a separate executable. The question is sometimes phrased like this:
"You mean I have to download yet another component to use the IE driver now? I hated it when the Chrome driver forced me to do the same thing, and now you want me to go through even more hoops? Man, you Selenium developer guys really suck!"
Admittedly, the need to download another component is a slight inconvenience. However, there were a few good reasons why this path was chosen.

The original implementation of the native (C++) code of the IE driver was in a .dll. In most of the client language bindings, this .dll was extracted at runtime from whatever packaging solution was appropriate for the language. For Java, this meant extracting it from the .jar; for .NET, this meant extracting it from a resource packed into the WebDriver.dll assembly. Ruby and Python packaging mostly relied on files laid out on disk, so no extraction was needed, just a reference to the path of the .dll. The language binding would then use its native API (JNI for Java, P/Invoke for .NET, ctypes for Python, FFI for Ruby) to load the .dll and call the exposed API for starting the "server" portion of the IE driver.

This worked all right for a time, especially in simple scenarios, but the development team eventually started noticing subtle differences in behavior between language bindings. For example, because of the way the .NET bindings loaded and managed the native .dll, they were able to support multiple simultaneous instances of IE, while the Java bindings were not. The Ruby bindings should have supported multiple instances, because they modeled their native code management after the .NET mechanism, but the native code interface the bindings used, FFI, didn't allow a loaded native library to be unloaded (i.e., it didn't support a call to the FreeLibrary Win32 API). All of this came down to the fact that each language's native code interaction method had slightly different semantics from the others.

Something had to be done to unify the user experience across languages. As it happens, using a separate process is a much more consistent story, because each language binding can use its own process management API. Why does that result in more consistency? Because process management is defined by the operating system (in the case of the IE driver, the various versions of Windows) rather than by each language's native code interface.
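As a rough sketch of what "use its process management API" looks like from one binding's side, here is how a Python client might start and stop the server process. The function names and the default port are made up for illustration, and the `--port` switch is an assumption about the executable's command line; check the driver's own usage output before relying on it.

```python
import subprocess


def build_command(executable_path, port):
    """Build the argument list for launching the driver server.

    The "--port" switch is an assumption for this sketch.
    """
    return [executable_path, f"--port={port}"]


def start_server(executable_path, port=5555):
    """Launch the server as a child process and return its Popen handle."""
    return subprocess.Popen(build_command(executable_path, port))


def stop_server(process, timeout=5.0):
    """Ask the child process to exit, and kill it if it does not comply."""
    process.terminate()
    try:
        process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()
```

Every language in the list above has an equivalent of `Popen`, `terminate`, and `wait`, and they all bottom out in the same operating system calls, which is where the consistency comes from.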

A happy side effect of moving to a separate executable is in the realm of 32-bit vs. 64-bit versions of the browser. It's a limitation of Windows that a 32-bit process cannot load a 64-bit .dll, and vice versa. With the .dll approach, this meant that if your language runtime was 64-bit, you could only run your WebDriver code against the 64-bit version of IE, and if it was important to you to run your WebDriver code against the 32-bit version of IE, you were out of luck. If this sounds like a far-fetched scenario, I'll point out that this was the default situation for .NET running on a 64-bit version of Windows. However, there is no restriction on a 64-bit process launching a 32-bit process. With the introduction of the standalone executable, it is possible to run WebDriver code against a 32-bit or a 64-bit version of IE, no matter what the "bitness" of your language runtime.
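To make the bitness point concrete, here is a small Python sketch. It reports the bitness of the running interpreter, then picks a server executable based on the browser you want to drive rather than on the runtime. The directory layout and the `pick_driver` helper are hypothetical.

```python
import struct

# Hypothetical install locations for the two builds of the server.
DRIVERS = {
    32: r"C:\drivers\Win32\IEDriverServer.exe",
    64: r"C:\drivers\x64\IEDriverServer.exe",
}


def runtime_bitness():
    """Return 32 or 64, based on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def pick_driver(browser_bitness, drivers=DRIVERS):
    """Choose the server executable by the browser's bitness.

    The runtime's bitness is deliberately not consulted: a separate
    process does not have to match the process that launched it.
    """
    if browser_bitness not in drivers:
        raise ValueError(f"no driver registered for {browser_bitness}-bit IE")
    return drivers[browser_bitness]
```

A 64-bit interpreter can call `pick_driver(32)` and launch the result, which is exactly what the in-process .dll could not offer.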

Also, by moving the executable outside the normal delivery mechanism of the language bindings, delivery of the IE driver core is decoupled from full Selenium releases. That means it's possible to ship fixes in the IE driver without having to wait for a full release of Selenium. Since the introduction of the standalone IEDriverServer.exe executable, we've used this ability to deliver on bug fixes and functionality updates between releases of the language bindings.

Finally, the standalone executable is vastly easier to debug than a .dll loaded by the language bindings. Attaching a debugger to the standalone process is dead simple, and eliminates the guesswork of, "which java.exe (or ruby.exe or python.exe) process do I need to attach to?"

Of course, there is the matter of an extra thing that needs to be downloaded before you can run WebDriver code against IE. One might be tempted to try bundling the executable inside the language binding package like the .dll used to be, but extracting an executable at runtime and attempting to start it is what antivirus scanners just live to scream about. Protip: When you're designing your framework around WebDriver, use a command-line download utility like wget or curl to be able to download the IEDriverServer.exe from the web as you're setting up your environments.
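If you would rather do the download from a setup script than from wget or curl, a minimal Python sketch might look like the following. The `fetch_driver` name is invented for this example, and the URL you pass in is up to you: it is assumed to point directly at the executable, so if what you download is an archive you will need an extraction step as well.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_driver(url, dest_dir):
    """Download the driver executable into dest_dir unless it is already there.

    Returns the path to the local copy.
    """
    dest = Path(dest_dir) / "IEDriverServer.exe"
    if dest.exists():
        # Already provisioned on this machine; skip the network round trip.
        return dest

    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_suffix(".part")

    # Write to a temporary name first so an interrupted download
    # never leaves a truncated executable at the final path.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)

    partial.replace(dest)
    return dest
```

Call it once during environment setup and hand the returned path to whatever starts the server.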


  1. Or manage your Se nodes using Puppet or Chef or similar tooling and don't worry about it.

  2. Thanks for the write up Jim. For me it was a huge step up to be able to debug with the IE driver.