Wednesday, July 25, 2012

WebDriver: Y U NO HAVE HTTP Status Codes?!

There's a long-standing issue in the Selenium issue tracker dealing with the fact that the WebDriver API does not expose HTTP status codes to the user. For those of you who like to keep score at home, it's issue #141 in the tracker. The issue was opened in February 2009, and was closed as "Won't Fix" in December of that year. Even though many project contributors and the main project architect have said with finality that this feature will not be made available, the issue has continued to garner comments, many of them vehement, asking for reconsideration. I'm going to spend a little time making what I hope is a reasoned, rational argument for why this decision was made, and why the feature isn't needed in the project.

How Did We Get Here?

What follows is an oversimplified, brief recap of some history. In the beginning was Selenium RC, which was an API that grew organically during its existence, with no rhyme or reason for how methods were tacked onto its single object. Over time, it became a brilliant example of the God object anti-pattern, violating all kinds of object-oriented programming principles. One of the more obscure methods engendered by the organic growth of the project was called captureNetworkTraffic(). This method purported to capture all of the network traffic between the browser and the site being automated. It was possible because, in some browser configurations, Selenium acted as a proxy between the browser and the site, so all browser traffic passed through Selenium and could be captured or manipulated.

And so it came to pass that Selenium RC was found wanting, and lo, there was much weeping and wailing and gnashing of teeth in the browser automation community. And thus it was that the WebDriver project was born, and eventually merged into the Selenium project to become Selenium WebDriver. WebDriver was a completely different approach to browser automation, preferring to act more like a user, which solved the fundamental problems inherent in the Selenium RC approach. However, since much of the actual driving of the browser would now be done external to the browser itself, and with no proxy in between the browser and the site being automated, it made creating a method similar to captureNetworkTraffic() difficult at best, and impossible at worst. 

Where Are We Now?

This brings us to the current state of affairs, with HTTP status codes being unavailable in the WebDriver API. An architectural decision was made during the creation and development of the WebDriver API that this feature would not be implemented, and that it would be declared "out of scope" for Selenium WebDriver. This has caused some consternation among users, especially since the issue has been closed and it has been stated in extremely plain, even blunt, terms that the feature won't be implemented. There are some valid technical reasons why this decision was made. Here are just a few of them:
  • What about redirects? Do you just return the last code after the redirect, or the code that indicates a redirect? What do you say to those who disagree with your answer to that question?
  • Some browsers make it impossible to get the status code. Do you really want an API that works on some browsers, but not others?
  • WebDriver is concerned with driving the browser, not necessarily just web application testing. Does returning HTTP status codes really fit this mission?
  • WebDriver is concerned with driving the browser as a user would. Does a browser show HTTP status codes to the user, or just rendered pages?

Nevertheless, that hasn't stopped people from arguing that it should be included. Let's take a few minutes to examine some of those arguments, and then we'll take a look at what options there are for solving this issue.

"HTTP status codes are an important part of website testing."

Yes, I can see the argument that HTTP status codes could be an important part of testing your website. However, while web application testing is an important use case of the Selenium project, it's far from the only one. A frequent response to this is, "But your own web pages and literature say it's for web application testing!" That's a fair critique about documentation and public perception of the project, but that's really a separate discussion. HTTP status codes are not a required part of automating the browser.

"I don't care that it doesn't work on all browsers, I need those status codes."

One of the major advantages of the WebDriver API over what has come before is its elegance and purity. Implementing a solution that works in only some browsers is vastly inferior to one that works for all browsers, especially when the solution isn't in the core competency of the library. There are solutions that do work for all browsers without polluting the API, although they require integration with other tools that <gasp> are not Selenium. Tacking on this feature for some browsers in a suboptimal way that might miss important edge cases because it's not a core competency is akin to driving a wood screw into a board with a hammer because that's the only tool you have. Yes, it'll work, but it's not going to be pretty, and is likely to fail you somewhere down the line.

"Other parts of the WebDriver API let you do things a user can't do, why not expose status codes?"

Proponents of this argument usually cite manipulation of cookies, or finding of elements by anything other than visual inspection, or viewing a page's HTML source, or any number of other items. All of those items are germane to interacting with the page itself, with what is displayed. The HTTP status code is not directly concerned with what is displayed on the page.

"But so many people want the feature, you should really add it."

Ah, yes, the old, "But I want a pony," argument. Or its corollary, "20 million New Kids On The Block fans can't be wrong!" This argument is occasionally followed by a sometimes hostile, "Well, your lack of this feature makes your library completely useless to me," or even, "You'd better add it or else I'll stop using it!" This latter response is the equivalent of, "I'm going to hold my breath until you give me exactly what I want!" Just because something's popular doesn't make it a good idea.

Where Do We Go From Here?

Just as proponents of exposing HTTP status codes in the WebDriver API are convinced that the arguments against including them don't hold water, the members of the project team are equally convinced that adding them is a bad idea. So if you feel like you absolutely need to have them, what can you do? Well, you have a couple of options.

Remember how I said earlier that Selenium RC was able to give you this information because it acted as a proxy? You can recreate that exact same environment with WebDriver! Of course, you have to use a dedicated proxy to do it, but guess what? There are lots of software proxies around that make this really easy. Additionally, you're using the proxy to capture the traffic, not something half-baked that's been shoehorned into the WebDriver API, which is a little thing I like to call using the right tool for the job. As an example, the BrowserMob proxy is one that's being used successfully by lots of people. It's open-source, and it's written in Java, so if you're using Java, you can even control it directly from your existing code. If you're not using Java, fear not, as there are wrapper libraries written for many different languages, including Python, .NET, Ruby, and PHP, to name a few. The WebDriver library even allows you to set the browser you're automating to use the proxy.
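To make the proxy approach a little more concrete, here's a minimal sketch in Python of the part that happens after the proxy has done its job. It assumes the proxy can hand you the captured traffic in HAR (HTTP Archive) format, which BrowserMob can export, and that redirect targets are recorded as absolute URLs. The sample data, the example.test URLs, and the function names are all made up for illustration; none of this is part of Selenium or BrowserMob.

```python
import json

# Hypothetical HAR fragment; a real one would come from the proxy's HAR export.
SAMPLE_HAR = json.loads("""
{
  "log": {
    "entries": [
      {"request": {"url": "http://example.test/old"},
       "response": {"status": 301, "redirectURL": "http://example.test/new"}},
      {"request": {"url": "http://example.test/new"},
       "response": {"status": 200, "redirectURL": ""}},
      {"request": {"url": "http://example.test/logo.png"},
       "response": {"status": 404, "redirectURL": ""}}
    ]
  }
}
""")


def status_by_url(har):
    """Map each requested URL to the HTTP status code the proxy recorded."""
    return {
        entry["request"]["url"]: entry["response"]["status"]
        for entry in har["log"]["entries"]
    }


def redirect_chain(har, start_url):
    """Follow redirectURL links from start_url, returning (url, status) pairs."""
    responses = {e["request"]["url"]: e["response"] for e in har["log"]["entries"]}
    chain, url, seen = [], start_url, set()
    # Stop when the URL was never requested or when a redirect loop is detected.
    while url in responses and url not in seen:
        seen.add(url)
        response = responses[url]
        chain.append((url, response["status"]))
        url = response.get("redirectURL") or ""
    return chain
```

Notice that because the proxy records every response, you get to decide for yourself how to answer the redirect question from earlier: redirect_chain() hands you both the 301 and the final 200, and status_by_url() will happily tell you about the broken image that the page never showed you.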

"But I don't want to use a proxy! I only want to have to manage Selenium as a dependency," I hear you say. I've got a little secret for you: WebDriver is Open Source Software. It's even very liberally licensed. If you don't like a decision made by the project team, fork it, create a patch, and share with the world. I personally love innovation in the Open Source world. Don't tell me how you want to see it done, show me, with working code. Unless you can demonstrate your solution working for all browsers though, I'll probably use a proxy if I need this functionality, and that's my choice.