Thursday, November 1, 2012

.NET Bindings: Whaddaymean "No response for URL"?

The Selenium project just pushed out version 2.26.0 for all of its languages, after a months-long hiatus between releases. The delay wasn't intentional, but it happened, so it's been a while since the bindings were updated. As usual, the release was accompanied by a post to the user-facing mailing lists for the project. Also as usual, the first reply asked about something that didn't get fixed.

The issue, number 3719 in the issue tracker, involves the .NET bindings returning an intermittent failure for some operations. The message text of this error reads, "No response from server for url". The post to the mailing list basically asked why this issue wasn't fixed in the recent release. I had to struggle quite a bit not to summon the snark and respond,

"Sigh. How would you like this particular issue to be fixed?"

The .NET bindings use the .NET Framework's System.Net.HttpWebRequest class for communicating with a remote server that speaks the WebDriver JSON Wire Protocol. I must carefully note here that the term "remote server" can refer to many things. It can refer to an instance of the Java remote WebDriver server. It can also refer to an instance of IEDriverServer.exe or chromedriver.exe, the main components of the Internet Explorer and Chrome drivers, respectively. It can also refer to an instance of the Firefox extension that the FirefoxDriver uses internally to control Firefox. Note that in any of these cases, the "remote server" may, in fact, be running on the same machine as the client bindings. At present, almost all of the driver implementations use this architecture: a server component running an HTTP server that talks to the client bindings.

So the .NET bindings use the HttpWebRequest class to send commands to the "server" component. To get the response back from the HTTP server, we call the GetResponse() method. Now, in the normal case, everything is just fine. The bindings get a valid response back from the server, interpret that response, and everything moves right along. Sometimes, the method throws a System.Net.WebException, for example when the server is unreachable. The bindings know about that possibility and catch the exception. The exception even has a .Response property on it, which lets the bindings continue to use a valid System.Net.HttpWebResponse to interpret what the remote WebDriver HTTP server is trying to say.
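As a rough illustration of that flow, here is a minimal sketch. It is not the bindings' actual source: the class name, the method name, and the use of a plain GET are assumptions made for the example, and only the System.Net types named above are taken from the description.

```csharp
using System;
using System.Net;

// Illustrative sketch only; not the actual .NET bindings source.
public static class CommandSketch
{
    // Hypothetical helper: issues a GET and returns whatever response is available.
    public static HttpWebResponse GetResponseOrNull(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        try
        {
            // Normal case: the server answers with a success status.
            return (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex)
        {
            // An HTTP error status (404, 500, ...) still carries a response
            // whose body can be read. When no HTTP response arrived at all,
            // ex.Response is null.
            return ex.Response as HttpWebResponse;
        }
    }
}
```

The caller owns the returned response and should dispose it once the body has been read.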

Sometimes, however, the HTTP server doesn't return any response, and it doesn't throw an exception. It just goes off into the aether, never to return. In that case, our response object is null, and here's the real question: What do you expect the .NET bindings to do in that case? The bindings have no idea of the status of the immediately preceding request. They don't know if it succeeded or failed. They don't know if the server is even still breathing or not. That means blindly attempting a retry would be futile at best, and destructive at worst. All the bindings can do is say, "Hey, we sent off a request, like you asked us to, but we didn't get a response back. Don't know what else to tell you, we tried. Sorry it didn't work out."
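Reduced to code, the situation above is a guard along these lines. This is a sketch under stated assumptions, not the bindings' implementation: the class and method names are hypothetical, and InvalidOperationException stands in for whatever exception type the bindings actually throw.

```csharp
using System;
using System.IO;
using System.Net;

// Illustrative sketch only; not the actual .NET bindings source.
public static class ResponseGuardSketch
{
    // Hypothetical helper: returns the response body, or reports that
    // nothing came back for the given request URL.
    public static string ReadBody(HttpWebResponse response, Uri requestUri)
    {
        if (response == null)
        {
            // Nothing is known about the fate of the request at this point,
            // so all the client can do is report the missing response.
            throw new InvalidOperationException(
                "No response from server for url " + requestUri.AbsoluteUri);
        }

        // Dispose the response, its stream, and the reader once the body is read.
        using (response)
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Note what the guard does not do: it makes no attempt to resend the command, because it cannot tell whether the server already acted on the first one.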

The worst part about this is that it looks like the bindings are at fault. The bindings are only reporting what happened, and I'm not sure what other approach any sane client could possibly take. Of course, you reading this may disagree. If so, and you have concrete ideas for how to solve the problem that don't involve a blind retry or a complete rewrite of the .NET Framework's System.Net.HttpWebRequest class, I'd love to see the implementation. Show me the code; I love receiving patches.

18 comments:

  1. Jim,
    That was probably THE MOST ELABORATE AND PICTURESQUE explanation of this issue. No one could have put it better than you have. Thank you so much for taking the time to "rant" about it in a blog while still ensuring that the person who reads it doesn't sense one bit of a rant, but instead reads it as a nice little tutorial that genuinely explains what is going on behind the scenes!

  2. This is just awesome info that explains it all. This fix has been a saviour for me.

  3. Thanks for the explanation, Jim. I never realized what exactly the issue was, and understanding it makes it less annoying.

    In my environment, I found that moving from an old Celeron box to a brand new quad-core made the issues disappear. Is it reasonable to assume that the reason for the server not responding was resource-related?

    Replies
    1. I suppose it could be. I really wouldn't want to speculate too much on reasons why the server wouldn't fully respond to the HTTP request sent by the .NET bindings.

  4. Thanks, Jim! Your explanation helped a lot, as I've dealt with this issue for a while now and never really got the right idea of what is behind it. In this context, the behavior of the bindings makes total sense and I would not change it.

    I noticed while debugging one of my last failures that the browser on the test client machine was in an unresponsive state when I logged in right after a failure to see what happened. Since it is not uncommon on a Windows system that the browser intermittently does not respond for a short while, maybe that could be a contributing factor to this issue. It would explain the random occurrences of the missing response, where the same test case passes just moments later during another test run. And by the time someone takes a look at the client machine, the browser "recovered" and functions normally. If the NULL response only occurs in the .NET bindings but not in Java, I wonder if it is a behavior of the System.Net.HttpWebRequest to handle a timeout/stalled request in the way you are experiencing it, which makes me wonder if there are configuration settings (or properties on the HttpWebRequest) that can be set to change .NET's handling.

    Anyways, just wanted to share some thoughts as I am curious to hear what others think. Thanks again for the detailed post.

  5. I think the understanding of most people who encounter this error is that it should not occur at all, and that hence there has to be a bug somewhere in the system.

    If I run a test locally on a dedicated machine, it seems unlikely that the message to the Selenium server was lost in the network, as there is none. Nor should the server be unable to respond, as the machine is doing little other than running a browser, the Selenium server and the test suite.

    Some say the Selenium server cannot handle requests that arrive too frequently. If that is the case, shouldn't all calls to driver methods be blocking (somewhat like page navigation) to avoid any "congestion" which may cause missing responses?

    Replies
    1. You're right, in an ideal world, this should never happen, but I think you're missing the point a little. This problem doesn't necessarily happen with "the Selenium server". It may happen with chromedriver.exe, or with IEDriverServer.exe, or with the extension that WebDriver uses to drive Firefox. That is one of the things I was trying to communicate in this blog post.

      Another misconception is that a purely local run never touches the network. There's *always* interaction with the network stack. The .NET bindings (and the other client bindings too) use an HTTP client to communicate with the "server" component. As such, you *must* interact with the network stack. True, you may be only communicating with localhost, and the network stack may take pains to optimize that communication channel, but you're nevertheless interacting with the network. While it's unlikely the message was "lost in the network," it *is* possible that the server component didn't respond. Don't forget, either, that some of those components are not even created by the Selenium project, and that trend is only likely to continue. This is another point I tried to make in the blog post.

      The point is that it's not a problem in the .NET bindings per se, which is what most people assume, and what they complain about. Instead, it's more likely either a problem with whatever "server" component is being used, or it's a problem deeply embedded in the core .NET Framework System.Net.HttpWebRequest class. Either way, I don't see any way to right the problem from within the .NET bindings themselves.

      As always, though, I'm happy to be proven wrong. As I said in the last paragraph of the post, show me where to correct the code, if I'm wrong; I love patches.

    2. By the way, this problem went away for me when I updated from v2.25.1 to v2.28.0, so something relevant must have changed along the way.

      No idea why I was still using 2.25.1 back then...

      Maybe the 2.26 change "Bumping .NET HttpWebRequest service point connection limit to 2000." was relevant.

  6. Thanks for the explanation! However, from an end-user point of view, the issue must be fixed. What's the point of a testing tool if it's not reliable? We had to disable our 40+ WebDriver tests on CI and stopped writing new ones :(

    Can we narrow down the source of the problem? Can we say that it's not a bug in the browser? If yes, and we're talking about FF, it's either the WebDriver extension or the .NET bindings, right? Is the problem reported on platforms other than .NET? If not, we can say that WebDriver is OK, right?

    P.S.
    Did you try the new System.Net.Http.HttpClient from .NET 4.5?

    Replies
    1. You're asking all of the right questions. The first step to "narrow down the source of the problem" is having a consistently reproducible case. To date, no one has been able to provide one. I've never personally encountered the issue in a production environment, so I'm in no position to offer any further solutions. If you can provide a reproducible case, or can provide a solution to the problem, please share it with me. As I've said twice before on this page alone (once in the post, once in a reply to another comment), I'll be very happy to receive any patches that might mitigate the issue.

      Regarding .NET 4.5, I've not yet used it. I don't have a development environment that will allow me to do so, as the Visual Studio Express Editions are not sufficient for my work on WebDriver (I require the Professional Edition for my work on the IE driver). Feel free to try it out and report back your findings.

  7. Hi Jim,

    Thanks for that detailed information :)
    At least now I am feeling better that it is not the fault of my code :)

    But if you can help me resolve my issue, that would be great.

    In my application I am trying to automate an SSRS report.
    For that, I first go through all the open browser windows using their window handles and try to find the Report Viewer page.

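    // Assumed enclosing loop: visit every open window handle in turn.
    IWebDriver popup = null;
    foreach (string windowHandle in ApplicationInstanse.Driver.WindowHandles)
    {
        var window = ApplicationInstanse.Driver.SwitchTo().Window(windowHandle);
        if (window.Title.Contains("Report Viewer"))
        {
            // Remember the driver once it is focused on the report window.
            popup = window;
            break;
        }
    }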

    But it throws an error for the Report Viewer page, or you could say it is not able to locate that page.

    Error: "No response for URL"

    Replies
    1. This is an issue with the ChromeDriver. Fortunately, it was fixed in the latest version of ChromeDriver.

  8. Have you tried Disposing your HttpWebResponse instances properly? I saw an issue on our Live boxes where that class no longer responded to requests as it had an internal limit of X concurrent requests.

    If this is the class: https://github.com/SeleniumHQ/selenium/blob/master/dotnet/src/WebDriver/Remote/HttpCommandExecutor.cs

    There look to be a few using statements required.

  9. Thanks Jim! This helped me understand why one of my tests over Microsoft Test Manager was failing -- it was a proxy issue and not a code issue. Super helpful post!

  10. Sauce Labs will create a job via the HTTP port, but if the JSON packets are blocked by a network issue then your local test will appear to hang infinitely. The job is created but it does nothing, and there is no error in your test output. The client side of Selenium doesn't seem to be able to time out after trying to send JSON packets and failing. It will hang infinitely. Maybe the RemoteWebDriver class could be extended to time out and fail the test with a human-readable message after failing to deliver JSON packets within 30 seconds?

    Replies
    1. If you're seeing infinite hangs when using the RemoteWebDriver class, you've either done something to change the default timeout of the .NET bindings, or you've uncovered a previously unknown bug in either the .NET Framework's HttpWebRequest or HttpWebResponse class. The request timeout is already set for every HTTP request issued by the RemoteWebDriver class at 60 seconds. If those classes don't throw exceptions or properly return, I'm not sure how you'd expect the .NET bindings to monitor or notify on those conditions. I certainly don't think that rewriting the .NET Framework classes within the WebDriver project is a viable solution.

  11. Hi Jim. Thanks for the post; it helped me understand conceptually what's going on. I have a question. You claim that this post applies to the error message "No response from server for url", and you linked to issue 3719, but is this same issue also the underlying cause of the following error message?

    WebDriverException was unhandled
    The HTTP request to the remote WebDriver server for URL http://localhost:62179/session/db56b14a-0e9e-4831-b545-4aaf5093d928/element/fe109bb3-eb3a-4b6d-8850-1837c91ecc02/click timed out after 60 seconds.

    The issue seems conceptually similar, but since the actual error message is somewhat different, I wanted to confirm whether this is the same issue. My problem is very similar to issue 5071, but unfortunately, I am in the same boat as that poster: try as I might, I can't reproduce it on a public page.

  12. Hi Jim. This makes absolute sense. My client application did not receive a response from ChromeDriver and I received this same error message. We also started seeing errors like "socket could not be performed because the system lacked sufficient buffer space or because a queue was full" from my client app and other apps running on the server.

    This pointed to some memory leak or, for example, TCP/IP port exhaustion. It ended up being a 3rd party tool we have on our server that connects to Amazon S3. So it just makes sense that anything preventing the WebDriver from connecting or providing a response would cause this...
