Monday, August 20, 2012

You're Doing It Wrong: IE Protected Mode and WebDriver

There's a common problem most people run into with the Internet Explorer driver when they first start using it with IE 7 and above. Most people start by writing code that looks something like this, expecting it to work on a clean installation of Windows, or at least one with the default settings for Internet Explorer:
WebDriver driver = new InternetExplorerDriver();
Imagine their surprise when they get an exception that looks something like this:
org.openqa.selenium.WebDriverException: Unexpected error launching Internet Explorer. Protected Mode must be set to the same value (enabled or disabled) for all zones. (WARNING: The server did not provide any stacktrace information)
A careful reading of the exception's message tells one exactly what the problem is. People who don't bother to read the exception message then turn to their favorite search engine, and after a quick search, they often blindly modify their code to do something like the following:

DesiredCapabilities caps = DesiredCapabilities.internetExplorer();
caps.setCapability(InternetExplorerDriver.INTRODUCE_FLAKINESS_BY_IGNORING_SECURITY_DOMAINS, true);
WebDriver driver = new InternetExplorerDriver(caps);

While this will certainly get them past the initial exception, and will allow the test to run in most cases without incident, it's patently the Wrong Thing to do. The operative questions then are, "Why is it wrong, and what is the right way?" If you don't care about why it's wrong, and just want to know how to fix it correctly, you can skip the historical background by clicking on this handy tl;dr link.

Why does the IE driver require Protected Mode settings changes anyway?

Way back through the mists of time, before 2006, life was easy for automating Internet Explorer. A browser session was represented by a single instance of the iexplore.exe executable. A framework for driving IE could instantiate the browser as a COM object using CoCreateInstance(), or could easily get the COM interfaces to a running instance by using the presence of ActiveAccessibility and sending a WM_HTML_GETOBJECT message to the appropriate IE window handle. Once the framework had a pointer to the COM interfaces, you could be sure that they'd be valid for the lifetime of the browser. It also meant you could easily attach to the events fired by the browser through the DWebBrowserEvents2 COM interface.

Then along came the combination of IE 7 and Windows Vista. In an effort to reduce the attack surface presented by malicious web sites, IE 7 introduced something called Protected Mode, which leveraged Mandatory Integrity Control in Windows Vista to prevent actions initiated by IE, usually by JavaScript, from being able to access the operating system the way they could in prior releases. While this was generally a welcome development for most users of IE, it created all manner of problems for automating IE.

When you cross into or out of Protected Mode by, say, navigating from an internal intranet website to one on the internet, IE has to create a new process, because it can't change the Mandatory Integrity Control level of the existing process. Moreover, in IE versions after 7, it's not always obvious that a Protected Mode boundary has been crossed, since IE tries to present a better user experience by seamlessly merging the browser window of the new process with the already opened browser window. This under-the-covers process switching also means that any references pointing to IE's COM objects before the Protected Mode boundary crossing are left pointing to objects that are no longer used by IE after the boundary crossing.

How do I fix it?

One way to solve the problem is to either turn User Account Control (UAC) off, or elevate your privileges to an administrator when running WebDriver code, because all processes started by elevated users have a High Mandatory Integrity Control level. Elevating to administrative privileges was not a great option, because the process of elevating couldn't and shouldn't be automated. Similarly, turning off UAC is unacceptable as it leaves the machine in a vulnerable state. Nevertheless, in pre-2.0 versions of the IE driver, that's exactly what one had to do to get code working, and it was one of the primary motivations for rewriting the IE driver in 2010.

Since the tricky bit to solve is when Protected Mode boundaries are crossed, a design decision was made to eliminate the boundary crossings. The simplest way to do that is to set Protected Mode to the same value for all zones; it doesn't matter whether that value is enabled or disabled. That way, no matter what navigation occurs, it won't cross a Protected Mode boundary, and won't trigger orphaning of the COM objects the IE driver relies on. Moreover, the Protected Mode settings for the security zones are per-user settings in Windows, and don't generally require elevated privileges to change.

So what's with the Capabilities hack?

When the rewritten IE driver was first introduced, it was decided that it would enforce its required Protected Mode settings, and throw an exception if they were not properly set. Protected Mode settings, like almost all other settings of IE, are stored in the Windows registry, and are checked when the browser is instantiated. However, some misguided IT departments make it impossible for developers and testers to set even the most basic settings on their machines.
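To make the idea of that check concrete, here is a minimal sketch in Java of the comparison it boils down to: given one Protected Mode value per zone, are they all the same? This is not the driver's actual implementation. The class and method names are hypothetical, the values are supplied by hand rather than read from the registry, and the 0 = enabled / 3 = disabled encoding and the sample zone values are assumptions based on commonly reported behavior.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Selenium or the IE driver.
final class ProtectedModeCheck {
    // Assumed encoding of the per-zone Protected Mode value.
    static final int ENABLED = 0;
    static final int DISABLED = 3;

    // Returns true when every zone carries the same Protected Mode value.
    // An empty map is treated as consistent, since there is nothing to conflict.
    static boolean isConsistent(Map<String, Integer> zoneValues) {
        Set<Integer> distinct = new HashSet<>(zoneValues.values());
        return distinct.size() <= 1;
    }

    public static void main(String[] args) {
        // Assumed sample: Protected Mode on for two zones, off for the other two.
        Map<String, Integer> zones = new LinkedHashMap<>();
        zones.put("Local intranet", DISABLED);
        zones.put("Trusted sites", DISABLED);
        zones.put("Internet", ENABLED);
        zones.put("Restricted sites", ENABLED);

        if (!isConsistent(zones)) {
            System.out.println("Protected Mode differs between zones; the driver would refuse to start.");
        }
    }
}
```

Note that the comparison doesn't care which value the zones share, only that they agree, which matches the "enabled or disabled" wording in the exception message.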

The driver needed a workaround for people who couldn't set those IE settings because their machine was overly locked down. That's what the capability setting is intended to be used for. It simply bypasses the registry check. Using the capability doesn't solve the underlying problem though. If a Protected Mode boundary is crossed, very unexpected behavior including hangs, element location not working, and clicks not being propagated, could result. To help warn people of this potential problem, the capability was given big scary-sounding names like INTRODUCE_FLAKINESS_BY_IGNORING_SECURITY_DOMAINS in Java and IntroduceInstabilityByIgnoringProtectedModeSettings in .NET. We really thought that telling the user that using this setting would introduce potential badness in their code would discourage its use, but it turned out not to be so.

Let me state this now in very clear and unambiguous terms. If you are able to set the Protected Mode settings of IE, and you are still using the capability, you are risking the stability of your code. Don't do it. Set the settings. It's not that hard.

How to set Protected Mode settings

In IE, from the Tools menu (or the gear icon in the toolbar in later versions), select "Internet options." Go to the Security tab. At the bottom of the dialog for each zone, you should see a check box labeled "Enable Protected Mode." Set the value of the check box to the same value, either checked or unchecked, for each zone. Here's the dialog for reference:

Note that you don't have to change the slider for security level, and you don't have to disable Protected Mode. I routinely run with Protected Mode turned on for all zones, as I think it provides a more secure browsing experience.