Monday, August 26, 2013

Implementing HTTP Status Codes in WebDriver, Part 3: Fit and Finish


This is the final part in my blog series about implementing retrieval of HTTP status codes in WebDriver. In Part 1, I demonstrated the basic premise of enabling use of a proxy to monitor HTTP traffic between the browser and the server providing the pages. In Part 2, I expanded that solution to actually inspect the traffic for the HTTP status codes. In this part, we'll be finishing off the solution by demonstrating how it works cross-browser, and using a few more tweaks to make the solution a little more elegant.

First, let's tackle the cross-browser cases. We'll start by creating a factory and an enum to smooth the creation of browsers of different types. First the enum:

enum BrowserKind
{
    InternetExplorer,
    IE = InternetExplorer,
    Firefox,
    Chrome,
    PhantomJS
}

Now, let's create the factory method which instantiates the browsers. I'm not showing the class declaration to save space, but I'm creating the factory methods in a static class called WebDriverFactory.

public static IWebDriver CreateWebDriverWithProxy(BrowserKind kind,
                                                  Proxy proxy)
{
    IWebDriver driver = null;
    switch (kind)
    {
        case BrowserKind.InternetExplorer:
            driver = CreateInternetExplorerDriverWithProxy(proxy);
            break;

        case BrowserKind.Firefox:
            driver = CreateFirefoxDriverWithProxy(proxy);
            break;

        case BrowserKind.Chrome:
            driver = CreateChromeDriverWithProxy(proxy);
            break;

        default:
            driver = CreatePhantomJSDriverWithProxy(proxy);
            break;
    }

    return driver;
}

Now, I'll list out each of the driver creation methods. These are pretty self-explanatory, but quirks of each driver are noted in the comments in the source code.

private static IWebDriver CreateInternetExplorerDriverWithProxy(Proxy proxy)
{
    InternetExplorerOptions ieOptions = new InternetExplorerOptions();
    ieOptions.Proxy = proxy;

    // Make IE not use the system proxy, and clear its cache before
    // launch. This makes the behavior of IE consistent with other
    // browsers' behavior.
    ieOptions.UsePerProcessProxy = true;
    ieOptions.EnsureCleanSession = true;

    IWebDriver driver = new InternetExplorerDriver(ieOptions);
    return driver;
}

private static IWebDriver CreateFirefoxDriverWithProxy(Proxy proxy)
{
    // A future version of the .NET Firefox driver will likely move
    // to an "Options" model to be more consistent with other browsers'
    // API.
    FirefoxProfile profile = new FirefoxProfile();
    profile.SetProxyPreferences(proxy);

    IWebDriver driver = new FirefoxDriver(profile);
    return driver;
}

private static IWebDriver CreateChromeDriverWithProxy(Proxy proxy)
{
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.Proxy = proxy;

    IWebDriver driver = new ChromeDriver(chromeOptions);
    return driver;
}

private static IWebDriver CreatePhantomJSDriverWithProxy(Proxy proxy)
{
    // This is an egregiously inconsistent API. Expect this to change
    // so that an actual Proxy object can be passed in.
    PhantomJSDriverService service =
        PhantomJSDriverService.CreateDefaultService();
    service.ProxyType = "http";
    service.Proxy = proxy.HttpProxy;

    IWebDriver driver = new PhantomJSDriver(service);
    return driver;
}

Now that we have the WebDriverFactory class created, we can update our main method to its final form, which is the following:

static void Main(string[] args)
{
    // Note that we're using a desired port of 0, which tells
    // Fiddler to select a random available port to listen on.
    int proxyPort = StartFiddlerProxy(0);

    // We are only proxying HTTP traffic, but could just as easily
    // proxy HTTPS or FTP traffic.
    OpenQA.Selenium.Proxy proxy = new OpenQA.Selenium.Proxy();
    proxy.HttpProxy = string.Format("127.0.0.1:{0}", proxyPort);

    // You can uncomment any of the lines below to verify that the
    // retrieval of HTTP status codes works properly for each browser.
    IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.IE, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.Firefox, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.Chrome, proxy);
    //IWebDriver driver = WebDriverFactory.CreateWebDriverWithProxy(BrowserKind.PhantomJS, proxy);

    TestStatusCodes(driver);

    driver.Quit();

    StopFiddlerProxy();
    Console.WriteLine("Complete! Press <Enter> to exit.");
    Console.ReadLine();
}

We're pretty much done with our final solution, except for one final tweak. Let's revisit our NavigateTo and ClickNavigate methods from Part 2 which actually retrieve the HTTP status code. Take a look at the signatures of each of those methods:


public static int NavigateTo(IWebDriver driver, string targetUrl)
public static int ClickNavigate(IWebElement element)

One of the super-groovy things about the .NET Framework since version 3.5 (which shipped alongside C# 3.0) is the introduction of extension methods. These allow you to extend a type with methods of your own design, allowing you to write code as if that type had that method to begin with. Our two methods are tailor-made to be used as extension methods. Simply changing the signatures to the following, by adding the this modifier to the first parameter, will make that work. I'd also recommend moving those methods to a new static class named something like ExtensionMethods for clarity, but that's up to you.

public static int NavigateTo(this IWebDriver driver, string targetUrl)
public static int ClickNavigate(this IWebElement element)

That means that the final version of our TestStatusCodes method looks like this:

private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";

    // Note that the standard IWebDriver interface doesn't have
    // a NavigateTo() method that takes a URL and returns a status
    // code. However, thanks to the magic of extension methods, 
    // we can make it look like it does, and call it directly off
    // the driver object.
    int responseCode = driver.NavigateTo(url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);

    string elementId = "firefox-promo-link";

    // We're using the same extension method magic here to add in
    // a ClickNavigate() method which looks like it's directly
    // implemented by IWebElement, even though it really isn't.
    IWebElement element = driver.FindElement(By.Id(elementId));
    responseCode = element.ClickNavigate();
    Console.WriteLine("Element click returned response code {0}",
                      responseCode);

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    responseCode = driver.NavigateTo(url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);
}
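
As a side illustration of the mechanism, here is a minimal, self-contained sketch of an extension method on an interface we pretend we cannot modify. The IPageLoader interface, the FakePageLoader class, and the LoadAndReport method are all hypothetical stand-ins invented for this sketch (they are not part of WebDriver or Fiddler), and the status code it returns is faked rather than read from a proxy.

```csharp
using System;

// Hypothetical stand-in for a type we do not own (playing the role of IWebDriver).
public interface IPageLoader
{
    void Load(string url);
}

// Trivial fake implementation so the sketch needs no browser.
public class FakePageLoader : IPageLoader
{
    public string LastUrl { get; private set; }

    public void Load(string url)
    {
        LastUrl = url;
    }
}

// Extension methods must live in a non-nested, non-generic static class.
public static class ExtensionMethods
{
    // The "this" modifier on the first parameter is what turns an ordinary
    // static method into an extension method.
    public static int LoadAndReport(this IPageLoader loader, string url)
    {
        loader.Load(url);

        // Faked result for illustration only: pretend that URLs containing
        // "doesnotexist" produce a 404 and everything else a 200.
        return url.Contains("doesnotexist") ? 404 : 200;
    }
}
```

With this in place, loader.LoadAndReport(url) and ExtensionMethods.LoadAndReport(loader, url) compile to the same call, which is why the Part 2 call sites keep working after the signature change.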

We'd also want to revisit the timeout code in those methods, probably by providing additional overloads that make the timeout configurable. I've done that in my local version, and it seems to work pretty well. If you want to see all of this code in a single place, you can take a look at the GitHub repository for this and other example projects on using a proxy.
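
One way the configurable timeout could be factored out is sketched below. This is not the code from my local version or from the repository; it is a minimal sketch that assumes the polling loop is pulled into a helper. The StatusCodeWait class, its WaitForStatusCode method, and the DefaultTimeout field are hypothetical names, and the helper deliberately knows nothing about Fiddler or WebDriver: it only polls a delegate until it yields a non-zero value or the timeout elapses.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; the names are invented for this sketch.
public static class StatusCodeWait
{
    // Same 10 second value that is hard-coded in Part 2.
    public static readonly TimeSpan DefaultTimeout = TimeSpan.FromSeconds(10);

    // Polls readCode every 100 ms until it returns a non-zero value or the
    // timeout elapses. Returns 0 if no status code was observed in time.
    public static int WaitForStatusCode(Func<int> readCode, TimeSpan timeout)
    {
        // Stopwatch is used instead of DateTime.Now so that changes to the
        // system clock do not affect the measured timeout.
        Stopwatch stopwatch = Stopwatch.StartNew();
        int code = readCode();
        while (code == 0 && stopwatch.Elapsed < timeout)
        {
            Thread.Sleep(100);
            code = readCode();
        }

        return code;
    }

    // Convenience overload that falls back to the default timeout.
    public static int WaitForStatusCode(Func<int> readCode)
    {
        return WaitForStatusCode(readCode, DefaultTimeout);
    }
}
```

Under that assumption, NavigateTo and ClickNavigate could each gain an overload taking a TimeSpan and replace their while loops with a call such as StatusCodeWait.WaitForStatusCode(() => responseCode, timeout).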

The argument of the WebDriver project committers regarding HTTP status codes is that a method to retrieve them is out of scope for the API. Furthermore, the explanation has been that the proper approach, one that will work for all browsers, without introducing a suboptimal feature to the WebDriver API, is to use a proxy to capture the HTTP traffic and analyze it yourself. The response to that argument has often been that it's too hard to do, and it's stupid to use a screwdriver to put in a screw, when one has a hammer that will work just as well. Hopefully, with this series of blog posts, I've shown that it's pretty easy to work out the use of a proxy to get the information you want. My example is in the .NET bindings, but Java, Ruby, and Python examples would look similar when using a software-based proxy written in those languages.

Tuesday, August 13, 2013

Implementing HTTP Status Codes in WebDriver, Part 2: Achievement Unlocked


UPDATE (21 August 2013): In response to a comment by Eric Lawrence (author of Fiddler and all-around awesome chap), I've updated the code sample for the redirect case. Thanks, Eric, for taking the time to comment and point out where I could make improvements.

In Part 1 of this series, we looked at the beginnings of implementing HTTP status codes in WebDriver the correct way. That is to say, by using a proxy server to monitor traffic for the information we want. To recap, we're using Fiddler as our proxy, the .NET bindings to execute our WebDriver code, and we're running against Mozilla's website as our test destination. At the end of the last blog post, we successfully had a proxy hooked up, which will log resources to the console as they are requested by the browser. Now it's time to actually extract the HTTP status codes from the information that the proxy is able to collect. As a reminder, here's what our WebDriver execution looks like:

private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";
    driver.Navigate().GoToUrl(url);

    string elementId = "firefox-promo-link";
    IWebElement element = driver.FindElement(By.Id(elementId));
    element.Click();

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    driver.Navigate().GoToUrl(url);
}

The first thing we are doing in our WebDriver code is navigating to http://www.mozilla.org/, so let's create a method that will perform the navigation and return the status code to us. As we saw last time, Fiddler lets us hook up an event delegate to respond every time a resource is retrieved by the browser, and analyze that response. The nice thing about event delegates in .NET is that we don't need to leave them hooked up any longer than necessary. Here's our first stab at a method that will hook and unhook the delegate for the navigation:

public static int NavigateTo(IWebDriver driver, string targetUrl)
{
    int responseCode = 0;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        responseCode = targetSession.responseCode;
    };

    FiddlerApplication.AfterSessionComplete += responseHandler;
    driver.Url = targetUrl;
    while (responseCode == 0)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.AfterSessionComplete -= responseHandler;
    return responseCode;
}

Astute readers will see that this has a couple of issues with it. First, how do we know what behavior we want for redirects? Our base URL to which we're navigating has just such a redirect. Do we expect to return a 300-level response, or follow the navigations through until we receive a 200-level or 400-level response? This is a perfect example of why there's no one-size-fits-all approach to HTTP status codes that will work for every WebDriver user, and a reason why, in turn, this feature is out of scope in the WebDriver API. In our case, if the URL redirects for navigation, we're going to return the redirect response code. In your implementation, if you decide on another approach, you'll want to modify the event handler delegate to meet your own needs.

The second issue is that we aren't guaranteed that we're returning the response code for the proper resource. So we want a modification that will validate that. Also, we'll probably want to create a timeout so that we don't inadvertently loop infinitely in the while loop. Making these modifications, you'll get a method that looks something like this:

public static int NavigateTo(IWebDriver driver, string targetUrl)
{
    int responseCode = 0;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        if (targetSession.fullUrl == targetUrl)
        {
            responseCode = targetSession.responseCode;
        }
    };

    FiddlerApplication.AfterSessionComplete += responseHandler;

    // Yes, we're hard-coding a 10 second timeout here. Don't worry, we'll
    // make that configurable before we're done.
    DateTime endTime = DateTime.Now.Add(TimeSpan.FromSeconds(10));
    driver.Navigate().GoToUrl(targetUrl);
    while (responseCode == 0 && DateTime.Now < endTime)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.AfterSessionComplete -= responseHandler;
    return responseCode;
}

Okay, so now we have a method that will return the status code to us on explicit navigation to a URL. What about on a click that navigates to a new location? Clicks are a little trickier, because a click might trigger a navigation, or it might not. In my opinion, you should know what type of click you'll be performing, so I'll create a method that we will explicitly call when we want to perform a click that will navigate, and return the HTTP status code of that navigation. I'll also take this opportunity to demonstrate a way to handle redirects, since the link we're clicking on in our test code also causes a redirect. Again, we'll hook up a delegate for the duration of the time we need it, and unhook it after we're done.

public static int ClickNavigate(IWebElement element)
{
    int responseCode = 0;
    string targetUrl = string.Empty;
    SessionStateHandler responseHandler = delegate(Session targetSession)
    {
        // For the first session of the click, the URL should be the initial 
        // URL requested by the element click.
        if (string.IsNullOrEmpty(targetUrl))
        {
            targetUrl = targetSession.fullUrl;
        }

        // This algorithm could be much more sophisticated based on your
        // needs. In our case, we'll only look for responses where the
        // content type is HTML, and that the URL of the session matches
        // our current target URL. Note that we also only set the response
        // code if it's not already been set.
        if (targetSession.oResponse["Content-Type"].Contains("text/html") && 
            targetSession.fullUrl == targetUrl &&
            responseCode == 0)
        {
            // If the response code is a redirect, get the URL of the
            // redirect, so that we can look for the next response from
            // the session for that URL.
            if (targetSession.responseCode >= 300 &&
                targetSession.responseCode < 400)
            {
                // Use GetRedirectTargetURL rather than examining the
                // "Location" header, as some sites (illegally) might
                // use a relative URL for the header (per Eric Lawrence).
                targetUrl = targetSession.GetRedirectTargetURL();
            }
            else
            {
                responseCode = targetSession.responseCode;
            }
        }
    };

    // Note that we're using the ResponseHeadersAvailable event so
    // as to avoid a race condition with the browser (per Eric
    // Lawrence).
    FiddlerApplication.ResponseHeadersAvailable += responseHandler;

    // Yes, we're hard-coding a 10 second timeout here. Don't worry, we'll
    // make that configurable before we're done.
    DateTime endTime = DateTime.Now.Add(TimeSpan.FromSeconds(10));
    element.Click();
    while (responseCode == 0 && DateTime.Now < endTime)
    {
        System.Threading.Thread.Sleep(100);
    }

    FiddlerApplication.ResponseHeadersAvailable -= responseHandler;
    return responseCode;
}
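
The redirect-following logic inside that handler can also be looked at in isolation, away from Fiddler's events. The sketch below is a simplification under stated assumptions: ObservedResponse and RedirectFollower are hypothetical types invented here (they are not FiddlerCore types), the responses are supplied as an in-memory sequence in the order they were observed, and the content-type check from the handler is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one proxied response; not a FiddlerCore type.
public class ObservedResponse
{
    public string Url;
    public int StatusCode;
    public string RedirectTarget; // null unless the response redirects somewhere
}

public static class RedirectFollower
{
    // Walks the observed responses in order, following redirects that start
    // at startUrl, and returns the first non-redirect status code found for
    // the URL currently being tracked. Returns 0 if none is found.
    public static int FinalStatusCode(string startUrl,
                                      IEnumerable<ObservedResponse> responses)
    {
        string targetUrl = startUrl;
        foreach (ObservedResponse response in responses)
        {
            // Ignore unrelated resources (stylesheets, images, and so on).
            if (response.Url != targetUrl)
            {
                continue;
            }

            bool isRedirect = response.StatusCode >= 300 &&
                              response.StatusCode < 400 &&
                              response.RedirectTarget != null;
            if (isRedirect)
            {
                // Start looking for the response to the redirect target.
                targetUrl = response.RedirectTarget;
            }
            else
            {
                return response.StatusCode;
            }
        }

        return 0;
    }
}
```

Note that a 3xx response without a redirect target (a 304, for example) is returned as-is in this sketch rather than followed; whether that is the right policy is, once again, up to you.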

All that remains is to modify our WebDriver code to call our new methods instead of the standard WebDriver ones, and add some console logging to prove that we get actual status codes returned from our methods. That modifies our TestStatusCodes method to look like this:

private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";
    int responseCode = NavigateTo(driver, url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);

    string elementId = "firefox-promo-link";
    IWebElement element = driver.FindElement(By.Id(elementId));
    responseCode = ClickNavigate(element);
    Console.WriteLine("Element click returned response code {0}",
                      responseCode);

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    responseCode = NavigateTo(driver, url);
    Console.WriteLine("Navigation to {0} returned response code {1}",
                      url, responseCode);
}

Running our console application from last time, we now will receive output that looks like the following:

Starting Fiddler proxy
Fiddler proxy listening on port 62594
Navigating to http://www.mozilla.org/
Navigation to http://www.mozilla.org/ returned response code 301
Clicking on element with ID firefox-promo-link
Element click returned response code 200
Navigating to http://www.mozilla.org/en-US/doesnotexist.html
Navigation to http://www.mozilla.org/en-US/doesnotexist.html returned response code 404
Shutting down Fiddler proxy
Complete! Press <Enter> to exit.

Now we have a fully functioning example for Firefox. Next time, we'll add the code to make it cross-browser aware, and add a few more tricks to make it more elegant for use with WebDriver.

Friday, August 9, 2013

Implementing HTTP Status Codes in WebDriver, Part 1: Challenge Accepted

A while back, I wrote a post that discussed at length why HTTP status codes are not present in the WebDriver API. Furthermore, the post went on to explain why I believe they're not needed in the API, and that there are other tools better suited to retrieving this particular piece of esoterica. Since I wrote that article, other Selenium contributors have written about the same topic. The general premise of those blog posts and mine is that using a proxy is the proper way to capture the status code if you actually require it.

Nevertheless, the issue in the Selenium issue tracker that was the inspiration for those blog posts continues to receive comments, most of them vehemently opposed to the decision of the Selenium development team. The decision has been called "complete nonsense," "silly," "condescending," and "simply defective," among other choice phrases. My colleagues on the Selenium project have posted code samples that show how to effectively use a proxy with Selenium to solve this problem, and the response to those samples has been that they aren't detailed enough.

Alright, fine. Time to put my proverbial money where my mouth is. I recently looked into what it would take to actually implement a proxy solution, with correct return of HTTP status codes, including writing all of the code necessary to extract it. It doesn't take that much, as it turns out. Once I'd settled on a technology to use, I had a full working example in about 4 hours. Let's take a look at how this would work.

In my example, I decided to use the Mozilla website, http://www.mozilla.org/, as my test. I settled on this because the site isn't likely to disappear anytime soon, and as currently written, it nicely demonstrates some of the issues inherent in getting HTTP status codes. Please note that I don't own the website, so it's possible that these examples could break at any time after this writing; at some point, I'll look at creating a standalone website that illustrates the same concepts. I'm also going to be using the WebDriver .NET bindings, and specifically, version 2.35.0 of the .NET bindings. For the proxy component, I decided to use Eric Lawrence's (now Telerik's) excellent Fiddler proxy.

Let's talk for a moment about why I chose Fiddler. First, I'm a .NET guy, and I try to look for solutions that don't require another runtime (like Java, Ruby, or Python) if possible. Second, Fiddler offers me the FiddlerCore component, which allows me to use an API-only version of Fiddler, and programmatic access to all of the proxy's settings and data. The API may be a little less polished than other .NET component APIs, but it does use an event-driven model, which appeals to me as a .NET developer. While Fiddler isn't open-source, it is free to use, with no feature restrictions based on free vs. paid use. With all of that in mind, let's begin. Here's the basic code that I want to run, using the standard WebDriver API:

private static void TestStatusCodes(IWebDriver driver)
{
    // Using Mozilla's main page, because it demonstrates some of
    // the potential problems with HTTP status code retrieval, and
    // why there is not a one-size-fits-all approach to it.
    string url = "http://www.mozilla.org/";
    driver.Navigate().GoToUrl(url);

    string elementId = "firefox-promo-link";
    IWebElement element = driver.FindElement(By.Id(elementId));
    element.Click();

    // Demonstrates navigating to a 404 page.
    url = "http://www.mozilla.org/en-US/doesnotexist.html";
    driver.Navigate().GoToUrl(url);
}

I'll be running this method from within a console application, with the main method looking something like this:

static void Main(string[] args)
{
    // Eventually, we will use different browsers to prove this
    // solution works cross-browser, but for now, we will use
    // Firefox only.
    IWebDriver driver = new FirefoxDriver();

    TestStatusCodes(driver);

    driver.Quit();

    Console.WriteLine("Complete! Press <Enter> to exit.");
    Console.ReadLine();
}

Let's look at how to integrate Fiddler in this solution. Starting the proxy server couldn't be easier. We'll create a method to do this. One thing to note in the method is that we can either specify a port for the proxy to listen on, or let Fiddler pick one for us.

private static int StartFiddlerProxy(int desiredPort)
{
    // We explicitly do *NOT* want to register this running Fiddler
    // instance as the system proxy. This lets us keep isolation.
    Console.WriteLine("Starting Fiddler proxy");
    FiddlerCoreStartupFlags flags = FiddlerCoreStartupFlags.Default &
                                    ~FiddlerCoreStartupFlags.RegisterAsSystemProxy;

    FiddlerApplication.Startup(desiredPort, flags);
    int proxyPort = FiddlerApplication.oProxy.ListenPort;
    Console.WriteLine("Fiddler proxy listening on port {0}", proxyPort);
    return proxyPort;
}
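
The Default & ~RegisterAsSystemProxy expression above is the usual bitwise idiom for "everything in the default set except one flag." Here is a minimal, self-contained sketch of that idiom. The StartupOptions enum and its members are hypothetical and invented for this sketch; they are not the real FiddlerCoreStartupFlags members or values.

```csharp
using System;

// Hypothetical flags enum for illustration; NOT FiddlerCore's actual values.
[Flags]
public enum StartupOptions
{
    None = 0,
    RegisterAsSystemProxy = 1,
    DecryptSsl = 2,
    AllowRemoteClients = 4,
    Default = RegisterAsSystemProxy | DecryptSsl
}

public static class FlagDemo
{
    // Clears the RegisterAsSystemProxy bit and leaves every other bit as-is.
    // ~ inverts the mask so that AND-ing keeps all bits except that one.
    public static StartupOptions WithoutSystemProxy(StartupOptions options)
    {
        return options & ~StartupOptions.RegisterAsSystemProxy;
    }

    // True when the given flag bit is set in options.
    public static bool IsSet(StartupOptions options, StartupOptions flag)
    {
        return (options & flag) == flag;
    }
}
```

In this sketch, applying WithoutSystemProxy to Default leaves only DecryptSsl set; applying it to a value that never had the bit set changes nothing.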

Technically speaking, we probably don't need to shut down the proxy, since it's the last thing we do before our main method exits, but we're going to be a good citizen and shut it down anyway.

private static void StopFiddlerProxy()
{
    Console.WriteLine("Shutting down Fiddler proxy");
    FiddlerApplication.Shutdown();
}

All that remains is to hook up an event handler so that we can analyze the traffic that comes through the proxy, and to make the Firefox driver aware of the proxy. We can do those things within the context of our main method. With all of that in place, the final main method looks like this:

static void Main(string[] args)
{
    // Note that we're using a desired port of 0, which tells
    // Fiddler to select a random available port to listen on.
    int proxyPort = StartFiddlerProxy(0);

    // Hook up the event for monitoring proxied traffic.
    FiddlerApplication.AfterSessionComplete += delegate(Session targetSession)
    {
        Console.WriteLine("Requested resource from URL {0}",
                          targetSession.fullUrl);
    };

    // We are only proxying HTTP traffic, but could just as easily
    // proxy HTTPS or FTP traffic.
    OpenQA.Selenium.Proxy proxy = new OpenQA.Selenium.Proxy();
    proxy.HttpProxy = string.Format("127.0.0.1:{0}", proxyPort);

    // Eventually, we will use different browsers to prove this
    // solution works cross-browser, but for now, we will use
    // Firefox only.
    FirefoxProfile profile = new FirefoxProfile();
    profile.SetProxyPreferences(proxy);
    IWebDriver driver = new FirefoxDriver(profile);

    TestStatusCodes(driver);

    driver.Quit();

    StopFiddlerProxy();
    Console.WriteLine("Complete! Press <Enter> to exit.");
    Console.ReadLine();
}

When we run our console application, we see something like this:

Starting Fiddler proxy
Fiddler proxy listening on port 62492
Navigating to http://www.mozilla.org/
Requested resource from URL http://www.mozilla.org/
Requested resource from URL http://mozorg.cdn.mozilla.net/media/css/tabzilla-min.css?build=c2a3f7a
Requested resource from URL http://mozorg.cdn.mozilla.net/media/js/site-min.js?build=c2a3f7a
Requested resource from URL http://mozorg.cdn.mozilla.net/media/css/responsive-min.css?build=c2a3f7a
Requested resource from URL http://mozorg.cdn.mozilla.net/media/img/favicon.ico
Requested resource from URL http://www.mozilla.org/en-US/

[... 
Many resources deleted for brevity
...]

Clicking on element with ID firefox-promo-link
Requested resource from URL http://mozorg.cdn.mozilla.net/media/fonts/Vollkorn-Regular-webfont.woff
Requested resource from URL http://mozorg.cdn.mozilla.net/media/fonts/Vollkorn-Bold-webfont.woff
Requested resource from URL http://mozorg.cdn.mozilla.net/media/img/home/promo/flicks/760.jpg
Requested resource from URL http://mozorg.cdn.mozilla.net/media/img/home/promo/android/760.jpg?2013-06
Requested resource from URL http://mozorg.cdn.mozilla.net/media/img/home/promo/makerparty/760.jpg
Navigating to http://www.mozilla.org/en-US/doesnotexist.html
Requested resource from URL http://www.mozilla.org/firefox/
Requested resource from URL http://www.mozilla.org/en-US/firefox/
Requested resource from URL http://mozorg.cdn.mozilla.net/media/css/firefox_fx-min.css?build=c2a3f7a
Requested resource from URL http://mozorg.cdn.mozilla.net/media/img/firefox/template/header-logo.png?2013-06
Requested resource from URL http://www.mozilla.org/en-US/firefox/fx/ 

[... 
Many resources deleted for brevity
...]

Shutting down Fiddler proxy
Complete! Press <Enter> to exit. 

Obviously, this particular example doesn't get us to our desired goal just yet. However, it does allow us to hook up a proxy. Next time, I'll show you how we can refine this solution to actually extract those status codes.