Thursday, July 23, 2015

Using the Microsoft Driver for Microsoft Edge




As the release of Windows 10 has approached, I've been seeing more and more questions about a WebDriver implementation for Microsoft Edge (née "Spartan"). So far, I've only been able to say that there isn't a driver implementation for that particular browser, and that no work is being done in the open-source arena on such an implementation. Beyond that, I could say only that Microsoft has acknowledged the need for such a driver and has committed to providing one, but has provided no timetable for its release.

As of today, I can say much more than that.

Today, Microsoft has announced the availability of a WebDriver implementation for Microsoft Edge. This implementation is released as an installable application for Windows 10, and you can download the installation package here. While the requirement to run an installer might be off-putting to some users, it appears that the installation package merely installs a standalone executable to the restricted "Program Files" location. As near as I can tell by examining the installer package, the executable can be freely copied to other locations on the machine after installation. Also, the use of an installable package means the executable can be serviced and updated by the standard, automatic Microsoft Update mechanism.

The last point is incredibly important. This first release of a driver from Microsoft is most emphatically not a finished release, and as such it lacks functionality, some of it rather basic. To wit, finding an element within the context of another element (i.e., WebElement.findElement) is not yet implemented in the current release. Finding an element via XPath is not implemented in this first release either. Switching to frames or iframes (i.e., driver.switchTo().frame()) is likewise not implemented in this initial release, and neither is the advanced user interactions API (i.e., the Actions class). In fairness, Microsoft has been completely forthcoming about what functionality the driver has implemented and what is missing. I fully expect that the driver will be receiving regular updates, and that the missing functionality will be added in the coming weeks and months.

So, how does one use the Microsoft Edge driver? Well, thanks to our good friends in Redmond, the release of the driver implementation was accompanied by a pull request to the Selenium project that enables the existing language bindings to use it seamlessly. The Edge driver as currently released uses the open-source project's dialect of the WebDriver JSON wire protocol, allowing us to use the driver right now. My guess would be that a future release of the driver will use the W3C specification version of the protocol. The pull request has been merged, so all you'll need is an updated language binding release, and you'll be able to use it directly. If you're itching to use it in the meantime, you can manually launch MicrosoftWebDriver.exe, and use the RemoteWebDriver class in your current language bindings to connect to it on your local Windows 10 machine.
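
For the impatient, here is a minimal sketch of that manual approach in C#. It assumes MicrosoftWebDriver.exe has already been started by hand. The address http://localhost:17556 and the "MicrosoftEdge" browser name are assumptions on my part rather than anything stated above, so check them against what the executable reports on startup. The example page and the class name are placeholders.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;

public static class EdgeRemoteExample
{
    public static void Main()
    {
        // Assumption: MicrosoftWebDriver.exe is already running and
        // listening on this address. Adjust to match your machine.
        Uri serverUri = new Uri("http://localhost:17556");

        // Assumption: the driver identifies itself by the browser
        // name "MicrosoftEdge".
        DesiredCapabilities capabilities = new DesiredCapabilities(
            "MicrosoftEdge", string.Empty, new Platform(PlatformType.Windows));

        IWebDriver driver = new RemoteWebDriver(serverUri, capabilities);
        try
        {
            // Stick to basic commands; nested element lookups, XPath,
            // frames, and Actions are missing from this first release.
            driver.Url = "http://example.com/";
            Console.WriteLine(driver.Title);
        }
        finally
        {
            driver.Quit();
        }
    }
}
```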

Congratulations to the Microsoft Edge development team for releasing a standard tool for automating Microsoft Edge. I, for one, appreciate the hard work and efforts you've put forth to make this happen. I look forward to future enhancements and features, and I stand ready to help any way I possibly can.

Monday, December 22, 2014

Windows Update KB3025390 for IE 11 Breaks IE Driver

Update (10 February 2015): Microsoft has released a fix as part of the February 2015 Cumulative Update to Internet Explorer. Installing this update appears to resolve the issue with the IE driver.

On 16 December 2014, Microsoft released update KB3025390 via Windows Update as part of its normal "patch Tuesday" update cycle. For most users, this update is downloaded and installed without user interaction. This update breaks the IE driver when using it with IE11.

As part of this update, attempting to use the COM method IHTMLWindow2::execScript returns an "access denied" result. This renders the driver unable to execute JavaScript in the page being browsed. Given that large portions of driver functionality are implemented using JavaScript, this effectively renders the driver all but unusable with IE11.

There is no known workaround for this issue. At this time, Microsoft's WebDriver implementation for IE is still incomplete, lacking basic functionality required to make it usable, so it cannot be recommended. Uninstalling the update is reported to restore IE driver functionality, but this is hardly ideal.

While the execScript method is marked as deprecated for IE11, the driver had heretofore been able to use that method successfully, and it was hoped that it would remain useful throughout the IE11 life cycle. We now know this not to be the case. Additionally, attempts to use the Microsoft-suggested replacement, eval, have been fruitless thus far.

At the moment, a bug has been raised via Microsoft Connect on this issue, and is being investigated by the Internet Explorer development team.
The issue is also currently being tracked in the Selenium issue tracker. There is no need to post additional comments in that issue report verifying that you are experiencing the issue. Likewise, there is no need to post comments to the issue asking for status updates. Rest assured that the issue in the Selenium tracker will be updated when new information is discovered.

What can you do to help? In the coming days and weeks, I'm sure we will see a large number of people exclaiming that their WebDriver code has mysteriously stopped working against IE11 with no action on their part, without having searched for answers first. I'm posting this message in several places as a public service announcement, and would like all of you to redirect such inquiries to this post.

Tuesday, September 16, 2014

Screenshots, SendKeys, and Sixty-Four Bits

There are a couple of issues with the Internet Explorer driver that have been around since IE10 was released. They're pretty annoying when people encounter them, and the report for the first issue goes something like this:
I'm using Internet Explorer 10 (or 11), and when I call the sendKeys method, the keystrokes happen very slowly. Like one keystroke every 5 seconds. I'm on a 64-bit version of Windows, and I'm using the 64-bit IEDriverServer.exe. If I use the 32-bit version of the driver, the problem doesn't occur, but I really need to test using the 64-bit version because I need to test 64-bit IE. What's the deal?
The report for the second issue usually reads as follows:
I'm using Internet Explorer 10 (or 11), and even though I'm on 64-bit Windows, I'm using the 32-bit IEDriverServer.exe, because I was having problems with sendKeys being slow. Now, though, when I take a screen shot, it only shows the visible portion of the page. How can I take full page screen shots like I could when I use the 64-bit IEDriverServer.exe?
Both of these issues are fully documented in the Selenium issue tracker (#5116 for the sendKeys issue, and #5876 for the screenshot issue). A comment in each issue mentions that any fix would require "a massive rearchitecture of the IE driver's binary components, [so] no timeline is (or will be) available" for the delivery of a fix. What causes these issues? How are they related? Why would a fix be so darned difficult? The answers to those questions can all be summed up with a simple answer: "Windows Hooks." 

What is a Windows Hook?


All Windows applications have a routine in them called a "message loop." The message loop repeatedly calls the GetMessage API function, and processes messages sent to the application as they arrive in its queue. Hooks are a feature of the Windows message handling system that allow a developer to intercept, examine, and modify the message being sent to the application. By installing a hook, a developer could, for example, validate that a certain message was processed by the window being hooked. Or they could modify a message sent to the window to represent that the operating system could do things it actually can't. It's a clever mechanism, but it does have a few requirements.

First of all, the code being run when the hook is called (the "hook procedure") must exist in a dynamic-link library (DLL). That is, it cannot be simply a function exported from a compiled executable. The reason for this is that the code is actually going to be loaded into two applications, the application installing the hook, and the application being hooked. Using a DLL is the only way to avoid certain conflicts that would arise with loading one executable into the process space of another.

Secondly, the DLL must be of the same "bitness" as the process being hooked. In Windows, a 32-bit executable cannot load a 64-bit DLL. The converse is also true: a 64-bit executable cannot load a 32-bit DLL. Incidentally, this is the root reason that there are two versions of IEDriverServer.exe, but that's another story for another time.

Windows Hooks and the IE Driver


The IEDriverServer.exe uses hooks for its implementation of a couple of features. The first use of a hook is in the processing of keystrokes. By default, the driver uses the Windows PostMessage API function to simulate keystrokes. It does this by sending a WM_KEYDOWN message, followed by WM_CHAR and WM_KEYUP messages for each key. However, PostMessage is asynchronous, so the driver has to wait to make sure that the WM_KEYDOWN message is processed before sending the other messages, otherwise keystrokes could be sent out of order, making key sequences garbled. The driver does this by installing a hook into IE's window procedure, and listening for the WM_KEYDOWN to be processed before proceeding. It also puts in a timeout of about five seconds waiting for the message to be processed to make sure that the driver doesn't wait forever. Note that the code path is slightly different if you're using the requireWindowFocus capability, using the SendInput API function instead, but the driver still uses a hook to make sure messages are processed before moving on.

The second place the driver uses a hook is when taking screenshots. The IE driver takes screenshots using the PrintWindow API function. PrintWindow can only take a screenshot of the visible portion of any given window, which means that in order to get a full-page screenshot (as required by the WebDriver API), the window must be sized large enough to display the entire page without scroll bars. However, Windows does not allow the window to be resized larger than the visible screen resolution. When we ask IE to resize itself, a WM_GETMINMAXINFO message is sent on a resize event so that IE can figure out how large a window can be. By intercepting that message with a hook, and modifying the max values, we can trick IE into thinking that a window can be sized greater than the screen resolution would otherwise allow.

Since the IE driver makes use of hook procedures, the bulk of the IE driver is actually implemented in a DLL. So as to avoid having to manage multiple files when using the IE driver, this DLL is embedded as a resource inside the IEDriverServer executable, and extracted to the temp directory at runtime. Once extracted, it's loaded into memory by IEDriverServer, and the main entry point of the DLL is called. This gives the driver a way to inject itself into the IE process using hooks to accomplish what it needs to. This worked great up to and including IE 9.

What Happened in IE10?


When IE 10 was first released, we started to see reports of the two aforementioned issues coming in. Since version 7 of Internet Explorer, there has been the notion of multiple processes for a single "instance" of IE. There was the notion of a "manager" or "broker" process, which managed the outer, top-level window of Internet Explorer. All HTML rendering and ActiveX controls are managed by a "content" process. Through version 9 of Internet Explorer, these processes were the same bitness. That is, running 64-bit IE meant you got a 64-bit manager process, and 64-bit content processes. Running 32-bit IE meant you were using a 32-bit manager process, and 32-bit content processes. This all changed with IE 10.

One major change in IE 10 is that the manager process on 64-bit versions of Windows will always be a 64-bit process. By default, though, content processes remained 32-bit. This allowed the main process to be a 64-bit process, but still allowed the browser to remain compatible with all of the existing browser plug-ins for IE, which are overwhelmingly 32-bit. There are other reasons for this change, and I'm oversimplifying the architecture a bit. If you're truly interested in the deep details of the architecture, I encourage you to read Eric Lawrence's blog post that lays out the implications in great detail. Anyway, there are ways to force 64-bit content processes for IE 10 and above, but these will break the IE driver due to Protected Mode issues.

So for IE 10 and above, the situation is that we have a 64-bit process, which handles the main outer window, and a 32-bit process which owns the inner window where HTML content is rendered. The driver's window hook procedure for taking screenshots must be attached to the main outer window; the driver's hook procedure for verification of message processing while simulating keystrokes must be attached to the content window being automated. Remember that since the driver executable can only be 32-bit or 64-bit, but not both, the DLL in which the window hook procedures reside can only be the same bitness as the executable. Let's explore the implications of this.

For the 32-bit IEDriverServer, the hook procedure can be successfully attached to the (32-bit) content window for use with sendKeys. The window hook for screenshots, however, has to be installed into the manager process, which is 64-bit and therefore can't load the 32-bit DLL into its process space. The hook installation fails, and screenshots are truncated.

Conversely, for the 64-bit IEDriverServer, the hook can be successfully installed in the top-level window for use in taking screenshots, because the process owning that window is a 64-bit process. However, when the driver attempts to install the hook into the (32-bit) content window to detect message processing during sendKeys, the DLL is 64-bit, and can't be loaded by the 32-bit executable which owns the content window. This means that the timeout is invoked for every keystroke, with sendKeys waiting about five seconds for each key.
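
The two cases can be condensed into a small model. This is only an illustration of the reasoning above, not code from the driver: the HookModel class, the Bitness enum, and both method names are made up for the example, and it assumes IE 10 or later on 64-bit Windows in its default configuration (64-bit manager process, 32-bit content process).

```csharp
using System;

public enum Bitness { X86 = 32, X64 = 64 }

public static class HookModel
{
    // A hook DLL can only be loaded into a process of the same bitness.
    public static bool CanHook(Bitness hookDll, Bitness targetProcess)
    {
        return hookDll == targetProcess;
    }

    // Summarizes what a driver of the given bitness can and cannot do.
    public static string Describe(Bitness driver)
    {
        // Assumed default layout for IE 10+ on 64-bit Windows.
        Bitness managerProcess = Bitness.X64;
        Bitness contentProcess = Bitness.X86;

        // Screenshots need a hook in the manager (outer window) process.
        string screenshots =
            CanHook(driver, managerProcess) ? "full page" : "truncated";

        // sendKeys needs a hook in the content (inner window) process.
        string sendKeys =
            CanHook(driver, contentProcess) ? "normal speed" : "timeout on each key";

        return string.Format(
            "{0}-bit driver: screenshots {1}, sendKeys {2}",
            (int)driver, screenshots, sendKeys);
    }
}
```

Calling HookModel.Describe with Bitness.X86 and then Bitness.X64 reproduces the two symptom reports quoted at the top of this post: each driver gets exactly one of the two features right.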

Why Would This Be So Hard To Fix?


By now, I hope we have a better understanding of what the root cause of both issues is. What would it take to fix the issue? A naive implementation would just attempt to bundle both 32- and 64-bit DLLs in with a version of the server and be done with it. However, this won't work because the DLL where the hook procedure lives must be loaded by both executables, IEDriverServer.exe, and the IE process owning whichever window is being hooked.

The only way to completely and correctly resolve the issue would be to create a pair of executables, with a related pair of DLLs, and have the two executables establish some way of working together via an interprocess communication channel. With this approach, now we'd be asking a user to download and manage two executables instead of one, or to use an installer of some kind. Since the project aims for an "xcopy deploy" without requiring the use of an installer, that's a larger burden than one can expect users of the IE driver to undertake.

While it's tempting to simply suggest creating a second executable and embedding it as a resource for extraction at runtime just like we do with the DLL, that approach is flawed as well. Many antivirus and malware monitors will happily allow any application to place a DLL in the temp directory and let an executable call LoadLibrary on it, but having an executable file magically appear and attempt to run in the temp directory throws up all kinds of flags. Making it a requirement to disable your antimalware software before using the IE driver is not something I'd be comfortable with.

Creating a second executable, and its attendant DLL for using the hook procedure, and figuring out some way for the two executables to communicate with each other amounts to a massive rearchitecture of the IE driver. With Microsoft's announcement of the sunset of support for all legacy versions of IE in January 2016, and with the creation of their own WebDriver implementation, it's not clear that the benefit of making these intrusive changes will outweigh the cost of doing so.

Wednesday, September 10, 2014

Using the Internet Explorer WebDriver Implementation from Microsoft

Microsoft recently delivered an implementation of an Internet Explorer driver. This is great news, and should be a real help to users of the Selenium project. Concurrent with that release, the IEDriverServer.exe has been updated to take advantage of this new implementation.

The integration with IEDriverServer has been implemented as a new command-line switch on IEDriverServer.exe. By launching IEDriverServer.exe using the --implementation=<value> switch, you can force the executable to use a specific driver implementation. Valid values for the switch are:
  • LEGACY - Uses the existing open-source driver implementation
  • VENDOR - Forces the driver to use the Microsoft implementation, regardless of whether prerequisites are met, or whether the installed version of Internet Explorer is the proper version (will throw an exception when creating a new session if the prerequisites are not installed and configured properly)
  • AUTODETECT - Uses the Microsoft implementation for IE 11, if the prerequisites are installed, falling back to the open-source implementation if the components are not present (still under development in the IEDriverServer.exe code)
If no value is specified, or if the value passed in is not one of those listed above, IEDriverServer.exe will use the existing open-source implementation. This is only a temporary default; the intent is for the default to shift to be the Microsoft implementation as the specification on which it is based matures.

Prerequisites

In order to use the Microsoft implementation, you'll need a few prerequisites.

Caveats and Provisos

This integration with IEDriverServer.exe should be considered experimental at the time of this writing. First, please realize that the Microsoft implementation only supports IE11. There are no announced plans for Microsoft to support other versions of Internet Explorer with this WebDriver implementation.

Also, the Microsoft implementation strictly follows the W3C WebDriver Specification. Since the spec is currently an editors' draft, it does not completely describe the WebDriver API. In other words, there are some features that are implemented in the open-source implementation that are not documented yet in the spec, which in turn means that they are not implemented in the Microsoft implementation.

Additionally, there are some small differences in the objects sent back and forth across the JSON Wire Protocol between the spec and the open-source implementation. The implications are that there may need to be changes made to the individual language bindings to pass the proper JSON payload across to the Microsoft implementation. There is considerable pressure not to update the language bindings, since, again, the spec is currently an editors' draft, and there have been bugs filed against it to have its protocol more closely match the existing open-source language bindings' implementations.

In the interest of allowing users to be able to experiment with the Microsoft implementation, the .NET bindings have had all of the necessary protocol changes grafted in, with explicit comments in the source code to have the changes removed when the spec is finalized and all implementations are consistent. The Java bindings have a partial implementation, and can launch IEDriverServer.exe with the proper command-line parameters to enable use of the Microsoft implementation, but the protocol changes have not yet been implemented. Unfortunately, there is no timetable for this work for other language bindings.

Example

So here's an example of what the code looks like to enable and use the Microsoft implementation of the IE driver, using C# code. It's pretty straightforward.

public static void DriveIEUsingMicrosoftImplementation()
{
    InternetExplorerDriverService service =
        InternetExplorerDriverService.CreateDefaultService();
    service.Implementation = InternetExplorerDriverEngine.Vendor;
    IWebDriver driver = new InternetExplorerDriver(service);
} 

Alternatively, one could launch IEDriverServer.exe with the appropriate command-line switch and use RemoteWebDriver to talk to that running instance.
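
A rough sketch of that second route follows. It assumes IEDriverServer.exe can be found in the working directory or on the PATH, that port 5555 is free, and that a fixed two-second pause is long enough for the server to start listening; all three are assumptions for illustration, as are the class name and the example URL. Whether every command works over this route depends on whether your bindings version carries the protocol changes described above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;

public static class VendorImplementationViaRemote
{
    public static void Main()
    {
        // Launch the server with the switch described in this post.
        ProcessStartInfo startInfo = new ProcessStartInfo(
            "IEDriverServer.exe", "--implementation=VENDOR --port=5555");
        startInfo.UseShellExecute = false;

        using (Process server = Process.Start(startInfo))
        {
            // Crude wait for the server to begin listening; a real
            // harness would poll the server instead of sleeping.
            Thread.Sleep(TimeSpan.FromSeconds(2));

            IWebDriver driver = null;
            try
            {
                driver = new RemoteWebDriver(
                    new Uri("http://localhost:5555"),
                    DesiredCapabilities.InternetExplorer());
                driver.Url = "http://example.com/";
                Console.WriteLine(driver.Title);
            }
            finally
            {
                if (driver != null)
                {
                    driver.Quit();
                }

                // Shut down the server we started, if it is still alive.
                if (!server.HasExited)
                {
                    server.Kill();
                }
            }
        }
    }
}
```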

Monday, September 16, 2013

Capturing JavaScript Errors in WebDriver - Even on Page Load!



A common question I often hear bandied about with WebDriver is, "How can I capture JavaScript errors on the page?" There is an open issue for this feature in the Selenium issue tracker, but there has been little-to-no development effort expended on solving the problem. One of the major issues is that not all browsers allow WebDriver to hook into the JavaScript execution process in a way that we could retrieve the errors effectively. Internet Explorer is especially bad about this, insisting on not providing COM methods to retrieve the JavaScript errors.

The most common suggestion for making JavaScript errors available to WebDriver code involves installing an event handler to the onerror event, capturing any uncaught errors to a global variable, and using a script execution to retrieve them. If you have access to the source code of the page you're automating, this is easy, as Alister Scott has pointed out in the past. In fact, that's the method I'd strongly prefer if I ever need to capture JavaScript errors on a page. However, many people ask about how to do it if they don't have access to modify the source code, and there, the challenge is that it's very hard to inject such an event handler early enough in the page load process to catch errors that may happen in the onload event.

As we learned in my previous series on retrieving HTTP response codes, using a proxy is an incredibly powerful way to extend the reach of your WebDriver code, working around things the browser won't, by nature, let you have. With that in mind, I've put together a brief example of how to retrieve the JavaScript errors on a page, even those occurring during the onload event. Once again, I'll be using Eric Lawrence's (now Telerik's) excellent Fiddler proxy. For the reasons why, you can check out the posts I referred to previously. Also, much of the browser launch code and setup and teardown of the proxy is identical to the previous posts, so I'll omit that for the sake of brevity.

The typical approach for finding JavaScript errors is a two-phase affair. First, we must inject a script into the page to catch all uncaught JavaScript errors. Such a script usually looks something like this:
window.__webdriver_javascript_errors = [];
window.onerror = function(errorMsg, url, lineNumber) {
  window.__webdriver_javascript_errors.push(
    errorMsg + ' (found at ' + url + ', line ' + lineNumber + ')');
};
Then, those errors can be retrieved with WebDriver by using something like this:
string errorRetrievalScript =
    "return window.__webdriver_javascript_errors;";
IJavaScriptExecutor executor = driver as IJavaScriptExecutor;
ReadOnlyCollection<object> returnedList =
    executor.ExecuteScript(errorRetrievalScript)
    as ReadOnlyCollection<object>;
But let's assume that I have a test page with the following HTML:
<!DOCTYPE html>
<html>
  <head>
    <title>Page with JavaScript errors on load</title>
    <script>
      function loadError() {
        var xx = document.propertyThatDoesNotExist.xyz;
      }
    </script>
  </head>
  <body onload="loadError()">
    This page has a JavaScript error in the onload event.
    Usually a problem to trap.
  </body>
</html>
One would expect that, since the aptly-named propertyThatDoesNotExist actually doesn't exist on the document object, a JavaScript error would be produced attempting to access the xyz property. Furthermore, since the function is called during the onload event of this page, the error will occur in that event, and indeed that's what happens. Parenthetically, you can see a page with exactly this structure, as part of Dave Haeffner's super-cool "The Internet" project, which exists to provide sample pages of "stuff you'll probably run into someday when using WebDriver."

So how do we make sure our error-capture script gets injected into the page in time to catch the onload event? Luckily, with a proxy, we can do exactly that. Let's take a look at how we might perform both steps of the process with WebDriver code plus the Fiddler proxy.

Let's start with the navigation portion. Here's the method I created for that:
public static void NavigateTo(this IWebDriver driver,
                              string targetUrl)
{
    string errorScript = 
        @"window.__webdriver_javascript_errors = [];
        window.onerror = function(errorMsg, url, line) {
        window.__webdriver_javascript_errors.push(
            errorMsg + ' (found at ' + url + ', line ' + line + ')');
        };";
    SessionStateHandler beforeRequestHandler = 
        delegate(Session targetSession)
        {
            // Tell Fiddler to buffer the response so that we can modify
            // it before it gets back to the browser.
            targetSession.bBufferResponse = true;
        };

    SessionStateHandler beforeResponseHandler =
        delegate(Session targetSession)
        {
            if (targetSession.fullUrl == targetUrl &&
                targetSession.oResponse
                             .headers
                             .ExistsAndContains("Content-Type", "html"))
            {
                targetSession.utilDecodeResponse();
                string responseBody =
                    targetSession.GetResponseBodyAsString();
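                // Match the opening <head> tag, with or without
                // attributes, but not other tags such as <header>.
                string headTag =
                    Regex.Match(responseBody,
                                @"<head(\s[^>]*)?>",
                                RegexOptions.IgnoreCase).ToString();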
                string addition =
                    headTag + "<script>" + errorScript + "</script>";
                targetSession.utilReplaceOnceInResponse(headTag,
                                                        addition,
                                                        false);
            }
        };

    FiddlerApplication.BeforeRequest += beforeRequestHandler;
    FiddlerApplication.BeforeResponse += beforeResponseHandler;
    driver.Url = targetUrl;
    FiddlerApplication.BeforeResponse -= beforeResponseHandler;
    FiddlerApplication.BeforeRequest -= beforeRequestHandler;
}
Looking closely at the code in this method, what we are doing here is attaching event handlers to manipulate the traffic sent over the wire. In the BeforeRequest event handler, we simply tell Fiddler that we want to examine and modify the response before it is sent along to the browser by setting the bBufferResponse property to true. The BeforeResponse event occurs after the response content has been received by the proxy, but before it has been forwarded to the browser. Here, we look for the opening <head> tag in the response body, and add a <script> tag with our error-handling script immediately following it. Provided nothing in the page runs script before the <head> tag, this ensures that our error-handling script is the first script executed by the browser. Note that this is an extremely crude and naive method of determining where to inject the script tag; in your implementation, you may require something a bit more sophisticated.
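
As one small step toward "a bit more sophisticated," the injection logic can be pulled out into a function that works on plain strings, which makes it easy to exercise without a proxy or a browser. The sketch below is not part of the example project; the ScriptInjector class and the InjectAfterHead method are hypothetical names. It still uses a regular expression rather than a real HTML parser, so a <head> tag that appears inside a comment would fool it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptInjector
{
    // Matches "<head>" or "<head attr=...>", but not "<header>".
    private static readonly Regex HeadTag = new Regex(
        @"<head(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Returns the HTML with a <script> element inserted immediately
    // after the first opening <head> tag. If no <head> tag is found,
    // the HTML is returned unchanged.
    public static string InjectAfterHead(string html, string script)
    {
        Match match = HeadTag.Match(html);
        if (!match.Success)
        {
            return html;
        }

        string scriptTag = "<script>" + script + "</script>";
        return html.Insert(match.Index + match.Length, scriptTag);
    }
}
```

Leaving the page untouched when there is no <head> tag is a deliberate choice here: prepending the script ahead of the doctype could change how the browser renders the page, which is worse than missing an error.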

Okay, now that we have the code in place to capture the errors, we need a method to retrieve them. The earlier fragment gives you an idea of how this will look, but here's the more complete version:
public static IList<string> GetJavaScriptErrors(
    this IWebDriver driver, TimeSpan timeout)
{
    string errorRetrievalScript = 
        @"var errorList = window.__webdriver_javascript_errors;
        window.__webdriver_javascript_errors = [];
        return errorList;";
    DateTime endTime = DateTime.Now.Add(timeout);
    List<string> errorList = new List<string>();
    IJavaScriptExecutor executor = driver as IJavaScriptExecutor;
    ReadOnlyCollection<object> returnedList = 
        executor.ExecuteScript(errorRetrievalScript)
        as ReadOnlyCollection<object>;
    while (returnedList == null && DateTime.Now < endTime)
    {
        System.Threading.Thread.Sleep(250);
        returnedList =
            executor.ExecuteScript(errorRetrievalScript)
            as ReadOnlyCollection<object>;
    }

    if (returnedList == null)
    {
        return null;
    }
    else
    {
        foreach (object returnedError in returnedList)
        {
            errorList.Add(returnedError.ToString());
        }
    }

    return errorList;
}
A few features to note here. First, the retrieval script clears the cached JavaScript errors as it retrieves them. That allows you to use the same technique to check for errors after any particular action that might yield JavaScript errors. Secondly, I've added a timeout to this method, just in case no JavaScript can load on the page for some reason. In that case, the method returns null, which allows us to distinguish between that error condition and legitimately having no JavaScript errors on the page.
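
The retry loop in that method is a pattern worth having on its own. Here is a minimal generic version, under the assumption that "not ready yet" is signalled by null, as it is above. The Poller class and PollUntilNotNull method are names invented for this sketch. It measures elapsed time with a Stopwatch instead of comparing DateTime.Now values, so a change to the system clock in the middle of a test can't stretch or shorten the wait.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poller
{
    // Calls probe repeatedly until it returns a non-null value or the
    // timeout elapses. Returns the last value seen, which is null if
    // the probe never succeeded.
    public static T PollUntilNotNull<T>(
        Func<T> probe, TimeSpan timeout, TimeSpan interval)
        where T : class
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        T result = probe();
        while (result == null && stopwatch.Elapsed < timeout)
        {
            Thread.Sleep(interval);
            result = probe();
        }

        return result;
    }
}
```

With this helper, the loop in GetJavaScriptErrors could be reduced to a single call that passes a lambda wrapping the ExecuteScript call, the timeout, and a 250 millisecond interval.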

One further thing you'll notice is that, as before in other examples, I'm using the "this" keyword in the declaration of the driver parameter. That allows these methods to be seen as .NET extension methods, making the syntax when using them a little cleaner. All that remains is to put these in action, like this:
private static void TestJavaScriptErrors(IWebDriver driver)
{
    string url = "http://path/to/your/jserror.html";
    Console.WriteLine("Navigating to {0}", url);
    driver.NavigateTo(url);
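    // GetJavaScriptErrors requires a timeout; five seconds is an
    // arbitrary choice for this example.
    IList<string> javaScriptErrors =
        driver.GetJavaScriptErrors(TimeSpan.FromSeconds(5));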
    if (javaScriptErrors == null)
    {
        Console.WriteLine("Could not access JavaScript errors.");
    }
    else
    {
        if (javaScriptErrors.Count > 0)
        {
            Console.WriteLine("Found the following JavaScript errors:");
            foreach (string javaScriptError in javaScriptErrors)
            {
                Console.WriteLine(javaScriptError);
            }
        }
        else
        {
            Console.WriteLine("No JavaScript errors found.");
        }
    }
}
When run against the test page above, you will receive output similar to the following (specific error text varies from browser to browser; output from Internet Explorer is shown):

Navigating to http://path/to/your/jserror.html
Found the following JavaScript errors:
Unable to get property 'xyz' of undefined or null reference
    (found at http://path/to/your/jserror.html, line 7)

As with previous examples featuring proxies, you can see the full example in the GitHub repository for them. This particular example can be seen in the JavaScriptErrorsExample project within that solution.

Monday, August 26, 2013

Implementing HTTP Status Codes in WebDriver, Part 3: Fit and Finish


This is the final part in my blog series about implementing retrieval of HTTP status codes in WebDriver. In Part 1, I demonstrated the basic premise of enabling use of a proxy to monitor HTTP traffic between the browser and the server providing the pages. In Part 2, I expanded that solution to actually inspect the traffic for the HTTP status codes. In this part, we'll be finishing off the solution by demonstrating how it works cross-browser, and using a few more tweaks to make the solution a little more elegant.

First, let's tackle the cross-browser cases. We'll start by creating a factory and an enum to smooth the creation of browsers of different types. First the enum:

enum BrowserKind
{
    InternetExplorer,
    IE = InternetExplorer,
    Firefox,
    Chrome,
    PhantomJS
}

Now, let's create the factory method which instantiates the browsers. I'm not showing the class declaration to save space, but I'm creating the factory methods in a static class called WebDriverFactory.

public static IWebDriver CreateWebDriverWithProxy(BrowserKind kind,
                                                  Proxy proxy)
{
    IWebDriver driver = null;
    switch (kind)
    {
        case BrowserKind.InternetExplorer:
            driver = CreateInternetExplorerDriverWithProxy(proxy);
            break;

        case BrowserKind.Firefox:
            driver = CreateFirefoxDriverWithProxy(proxy);
            break;

        case BrowserKind.Chrome:
            driver = CreateChromeDriverWithProxy(proxy);
            break;

        default:
            driver = CreatePhantomJSDriverWithProxy(proxy);
            break;
    }

    return driver;
}
Now, I'll list out each of the driver creation methods. These are pretty self-explanatory, but quirks of each driver are noted in the comments in the source code.
private static IWebDriver CreateInternetExplorerDriverWithProxy(Proxy proxy)
{
    InternetExplorerOptions ieOptions = new InternetExplorerOptions();
    ieOptions.Proxy = proxy;

    // Make IE not use the system proxy, and clear its cache before
    // launch. This makes the behavior of IE consistent with other
    // browsers' behavior.
    ieOptions.UsePerProcessProxy = true;
    ieOptions.EnsureCleanSession = true;

    IWebDriver driver = new InternetExplorerDriver(ieOptions);
    return driver;
}

private static IWebDriver CreateFirefoxDriverWithProxy(Proxy proxy)
{
    // A future version of the .NET Firefox driver will likely move
    // to an "Options" model to be more consistent with other browsers'
    // API.
    FirefoxProfile profile = new FirefoxProfile();
    profile.SetProxyPreferences(proxy);

    IWebDriver driver = new FirefoxDriver(profile);
    return driver;
}

private static IWebDriver CreateChromeDriverWithProxy(Proxy proxy)
{
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.Proxy = proxy;

    IWebDriver driver = new ChromeDriver(chromeOptions);
    return driver;
}

private static IWebDriver CreatePhantomJSDriverWithProxy(Proxy proxy)
{
    // This is an egregiously inconsistent API. Expect this to change
    // so that an actual Proxy object can be passed in.
    PhantomJSDriverService service =
        PhantomJSDriverService.CreateDefaultService();
    service.ProxyType = "http";
    service.Proxy = proxy.HttpProxy;

    IWebDriver driver = new PhantomJSDriver(service);
    return driver;
}
Now that we have the WebDriverFactory class created, we can update our main method to its final form, which is the following:

static void Main(string[] args)
{
    // Note that we're using a desired port of 0, which tells
    // Fiddler to select a random available port to listen on.
    int proxyPort = StartFiddlerProxy(0);

    // We are only proxying HTTP traffic, but could just as easily
    // proxy HTTPS or FTP traffic.
    OpenQA.Selenium.Proxy proxy = new OpenQA.Selenium.Proxy();
    proxy.HttpProxy = string.Format("127.0.0.1:{0}", proxyPort);

    // You can uncomment any of the lines below to verify that the
    // retrieval of HTTP status codes works properly for each browser.
    IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.IE, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.Firefox, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.Chrome, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.PhantomJS, proxy);

    TestStatusCodes(driver);

    driver.Quit();

    StopFiddlerProxy();
    Console.WriteLine("Complete! Press <Enter> to exit.");
    Console.ReadLine();
}

We're pretty much done with our final solution, except for one final tweak. Let's revisit our NavigateTo and ClickNavigate methods from Part 2 which actually retrieve the HTTP status code. Take a look at the signatures of each of those methods:


public static int NavigateTo(IWebDriver driver, string targetUrl)
public static int ClickNavigate(IWebElement element)

One of the super-groovy things about C# since version 3.0 (which shipped with .NET Framework 3.5) is the introduction of extension methods. These allow you to extend a type with methods of your own design, so you can write code as if that type had those methods to begin with. Our two methods are tailor-made to be used as extension methods. Simply changing the signatures to the following will make that work. I'd also recommend moving those methods to a new static class named something like ExtensionMethods for clarity, but that's up to you.

public static int NavigateTo(this IWebDriver driver, string targetUrl)
public static int ClickNavigate(this IWebElement element)
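
For readers who haven't used extension methods before, here is a minimal sketch of the mechanism on its own. The IFakeDriver interface and FakeDriver class are hypothetical stand-ins so that the sample compiles without WebDriver or Fiddler, and the hard-coded 200 is a placeholder rather than a real status code.

```csharp
// Hypothetical stand-in for IWebDriver, so the sketch compiles without
// references to WebDriver or Fiddler.
public interface IFakeDriver
{
    string Url { get; set; }
}

public class FakeDriver : IFakeDriver
{
    public string Url { get; set; }
}

// Extension methods must live in a non-nested static class.
public static class ExtensionMethods
{
    // The "this" modifier on the first parameter is what makes this
    // static method callable as if it were an instance method.
    public static int NavigateTo(this IFakeDriver driver, string targetUrl)
    {
        driver.Url = targetUrl;

        // Placeholder value. The real method returns the status code
        // observed by the proxy.
        return 200;
    }
}
```

With that class in scope, driver.NavigateTo(url) and ExtensionMethods.NavigateTo(driver, url) are two spellings of the same call; the compiler rewrites the first into the second.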

That means that the final version of our TestStatusCodes method looks like this:

private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";

    // Note that the standard IWebDriver interface doesn't have
    // a NavigateTo() method that takes a URL and returns a status
    // code. However, thanks to the magic of extension methods, 
    // we can make it look like it does, and call it directly off
    // the driver object.
    int responseCode = driver.NavigateTo(url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);

    string elementId = "firefox-promo-link";

    // We're using the same extension method magic here to add in
    // a ClickNavigate() method which looks like it's directly
    // implemented by IWebElement, even though it really isn't.
    IWebElement element = driver.FindElement(By.Id(elementId));
    responseCode = element.ClickNavigate();
    Console.WriteLine("Element click returned response code {0}",
                      responseCode);

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    responseCode = driver.NavigateTo(url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);
}

We'd also want to revisit the timeout code in those methods, probably by providing additional overloads that make the timeout configurable. I've done that in my local version, and it seems to work pretty well. If you want to see all of this code in a single place, you can take a look at the GitHub repository for this and other example projects on using a proxy.
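
One possible shape for such overloads, sketched here without the Fiddler and WebDriver dependencies, is to pull the polling loop into a helper that takes the timeout as a parameter. The WaitHelper class and the WaitForResponseCode name are hypothetical and are not taken from the repository; the 10-second default and the 100 millisecond polling interval mirror the hard-coded values from Part 2.

```csharp
using System;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class WaitHelper
{
    // Overload that keeps the 10 second default used in Part 2.
    public static int WaitForResponseCode(Func<int> readResponseCode)
    {
        return WaitForResponseCode(readResponseCode, TimeSpan.FromSeconds(10));
    }

    // Polls readResponseCode until it returns a non-zero value or the
    // timeout elapses. Returns 0 if no response code was seen in time.
    public static int WaitForResponseCode(Func<int> readResponseCode,
                                          TimeSpan timeout)
    {
        // UtcNow is used so a daylight-saving change cannot shift the deadline.
        DateTime endTime = DateTime.UtcNow.Add(timeout);
        int responseCode = readResponseCode();
        while (responseCode == 0 && DateTime.UtcNow < endTime)
        {
            Thread.Sleep(100);
            responseCode = readResponseCode();
        }

        return responseCode;
    }
}
```

A NavigateTo overload that accepts a TimeSpan could then hand its captured variable to the helper with something like WaitHelper.WaitForResponseCode(() => responseCode, timeout).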

The argument of the WebDriver project committers regarding HTTP status codes is that a method to retrieve them is out of scope for the API. Furthermore, the explanation has been that the proper approach, one that will work for all browsers without introducing a suboptimal feature to the WebDriver API, is to use a proxy to capture the HTTP traffic and analyze it yourself. The response to that argument has often been that it's too hard to do, and that it's stupid to use a screwdriver to put in a screw when one has a hammer that will work just as well. Hopefully, with this series of blog posts, I've shown that it's pretty easy to work out the use of a proxy to get the information you want. My example uses the .NET bindings, but Java, Ruby, and Python examples would look similar when using a software-based proxy written in those languages.

Tuesday, August 13, 2013

Implementing HTTP Status Codes in WebDriver, Part 2: Achievement Unlocked


UPDATE (21 August 2013): In response to a comment by Eric Lawrence (author of Fiddler and all around awesome chap), I've updated the code sample for the redirect case. Thanks Eric for taking the time to comment and point out where I could make improvements.

In Part 1 of this series, we looked at the beginnings of implementing HTTP status codes in WebDriver the correct way. That is to say, by using a proxy server to monitor traffic for the information we want. To recap, we're using Fiddler as our proxy, the .NET bindings to execute our WebDriver code, and we're running against Mozilla's website as our test destination. At the end of the last blog post, we successfully had a proxy hooked up, which will log resources to the console as they are requested by the browser. Now it's time to actually extract the HTTP status codes from the information that the proxy is able to collect. As a reminder, here's what our WebDriver execution looks like:
private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";
    driver.Navigate().GoToUrl(url);

    string elementId = "firefox-promo-link";
    IWebElement element = driver.FindElement(By.Id(elementId));
    element.Click();

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    driver.Navigate().GoToUrl(url);
}
The first thing we do in our WebDriver code is navigate to http://www.mozilla.org/, so let's create a method that performs the navigation and returns the status code. As we saw last time, Fiddler lets us hook up an event delegate to respond every time a resource is retrieved by the browser, and analyze that response. The nice thing about event delegates in .NET is that we don't need to leave them hooked up any longer than necessary. Here's our first stab at a method that will hook and unhook the delegate for the navigation:
public static int NavigateTo(IWebDriver driver, string targetUrl)
{
    int responseCode = 0;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        responseCode = targetSession.responseCode;
    };

    FiddlerApplication.AfterSessionComplete += responseHandler;
    driver.Url = targetUrl;
    while (responseCode == 0)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.AfterSessionComplete -= responseHandler;
    return responseCode;
}
Astute readers will see that this has a couple of issues with it. First, how do we know what behavior we want for redirects? Our base URL to which we're navigating has just such a redirect. Do we expect to return a 300-level response, or follow the navigations through until we receive a 200-level or 400-level response? This is a perfect example of why there's no one-size-fits-all approach to HTTP status codes that will work for every WebDriver user, and a reason why, in turn, this feature is out of scope in the WebDriver API. In our case, if the URL redirects for navigation, we're going to return the redirect response code. In your implementation, if you decide on another approach, you'll want to modify the event handler delegate to meet your own needs.
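
To make the other policy concrete, here is a rough sketch of "follow the redirects until something other than a redirect comes back", written as plain logic over a list of responses rather than as a Fiddler event handler. The ObservedResponse class and the RedirectPolicy.FinalStatusCode method are hypothetical names used only for illustration, and the sketch assumes the responses are listed in the order in which they were received.

```csharp
using System.Collections.Generic;

// Hypothetical record of one response seen by the proxy.
public class ObservedResponse
{
    public string Url { get; set; }
    public int StatusCode { get; set; }

    // Absolute URL the response redirects to; null when not a redirect.
    public string RedirectTarget { get; set; }
}

public static class RedirectPolicy
{
    // Returns the first non-3xx status code reached by following
    // redirects from startUrl, or 0 if no such response was observed.
    public static int FinalStatusCode(string startUrl,
                                      IEnumerable<ObservedResponse> responses)
    {
        string targetUrl = startUrl;
        foreach (ObservedResponse response in responses)
        {
            // Ignore images, scripts, and anything else that is not
            // the URL we are currently waiting for.
            if (response.Url != targetUrl)
            {
                continue;
            }

            // Simplification: every 3xx status is treated as a redirect.
            if (response.StatusCode >= 300 && response.StatusCode < 400)
            {
                targetUrl = response.RedirectTarget;
            }
            else
            {
                return response.StatusCode;
            }
        }

        return 0;
    }
}
```

Fed a 301 for the requested URL followed by a 200 for the redirect target, this returns 200, whereas the policy chosen in this post reports the 301.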

The second issue is that we aren't guaranteed that we're returning the response code for the proper resource. So we want a modification that will validate that. Also, we'll probably want to create a timeout so that we don't inadvertently loop infinitely in the while loop. Making these modifications, you'll get a method that looks something like this:
public static int NavigateTo(IWebDriver driver, string targetUrl)
{
    int responseCode = 0;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        if (targetSession.fullUrl == targetUrl)
        {
            responseCode = targetSession.responseCode;
        }
    };

    FiddlerApplication.AfterSessionComplete += responseHandler;

    // Yes, we're hard-coding a 10 second timeout here. Don't worry, we'll
    // make that configurable before we're done.
    DateTime endTime = DateTime.Now.Add(TimeSpan.FromSeconds(10));
    driver.Navigate().GoToUrl(targetUrl);
    while (responseCode == 0 && DateTime.Now < endTime)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.AfterSessionComplete -= responseHandler;
    return responseCode;
}
Okay, so now we have a method that will return us the status code on explicit navigation to a URL. What about on a click that navigates to a new location? Clicks are a little trickier, because a click might trigger a navigation, or it might not. In my opinion, you should know what type of click you'll be performing, so I'll create a method that we will explicitly call when we want to perform a click that will navigate, and return the HTTP status code of that navigation. I'll also take this opportunity to demonstrate a way to handle redirects, since the link we're clicking on in our test code also causes a redirect. Again, we'll hook up a delegate for the duration of the time we need it, and unhook it after we're done.
public static int ClickNavigate(IWebElement element)
{
    int responseCode = 0;
    string targetUrl = string.Empty;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        // For the first session of the click, the URL should be the initial 
        // URL requested by the element click.
        if (string.IsNullOrEmpty(targetUrl))
        {
            targetUrl = targetSession.fullUrl;
        }

        // This algorithm could be much more sophisticated based on your
        // needs. In our case, we'll only look for responses where the
        // content type is HTML, and that the URL of the session matches
        // our current target URL. Note that we also only set the response
        // code if it's not already been set.
        if (targetSession.oResponse["Content-Type"].Contains("text/html") && 
            targetSession.fullUrl == targetUrl &&
            responseCode == 0)
        {
            // If the response code is a redirect, get the URL of the
            // redirect, so that we can look for the next response from
            // the session for that URL.
            if (targetSession.responseCode >= 300 &&
                targetSession.responseCode < 400)
            {
                // Use GetRedirectTargetURL rather than examining the
                // "Location" header, as some sites (illegally) might
                // use a relative URL for the header (per Eric Lawrence).
                targetUrl = targetSession.GetRedirectTargetURL();
            }
            else
            {
                responseCode = targetSession.responseCode;
            }
        }
    };

    // Note that we're using the ResponseHeadersAvailable event so
    // as to avoid a race condition with the browser (per Eric
    // Lawrence).
    FiddlerApplication.ResponseHeadersAvailable += responseHandler;

    // Yes, we're hard-coding a 10 second timeout here. Don't worry, we'll
    // make that configurable before we're done.
    DateTime endTime = DateTime.Now.Add(TimeSpan.FromSeconds(10));
    element.Click();
    while (responseCode == 0 && DateTime.Now < endTime)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.ResponseHeadersAvailable -= responseHandler;
    return responseCode;
}
All that remains is to modify our WebDriver code to call our new methods instead of the standard WebDriver ones, and add some console logging to prove that we get actual status codes returned from our methods. That modifies our TestStatusCodes method to look like this:
private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";
    int responseCode = NavigateTo(driver, url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);

    string elementId = "firefox-promo-link";
    IWebElement element = driver.FindElement(By.Id(elementId));
    responseCode = ClickNavigate(element);
    Console.WriteLine("Element click returned response code {0}",
                      responseCode);

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    responseCode = NavigateTo(driver, url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);
}
Running our console application from last time, we now will receive output that looks like the following:
Starting Fiddler proxy
Fiddler proxy listening on port 62594
Navigating to http://www.mozilla.org/
Navigation to http://www.mozilla.org/ returned response code 301
Clicking on element with ID firefox-promo-link
Element click returned response code 200
Navigating to http://www.mozilla.org/en-US/doesnotexist.html
Navigation to http://www.mozilla.org/en-US/doesnotexist.html returned response code 404
Shutting down Fiddler proxy
Complete! Press <Enter> to exit.
Now we have a fully functioning example for Firefox. Next time, we'll add the code to make it cross-browser aware, and add a few more tricks to make it more elegant for use with WebDriver.