Friday, June 22, 2012

What's Wrong With the Internet Explorer Driver?

The Internet Explorer driver in the Selenium WebDriver project has consumed far too much of my life over the last two years, ever since I first began investigating a rewrite of the driver to repair some of its shortcomings. During that time, I've learned far more about COM programming in the C/C++ world than I'd ever known before, and probably far more than I ever cared to know. The driver has been in widespread use as part of the regular Selenium releases for nearly 18 months now, and I think it's time to take stock of the currently known issues with the so-called "native events" used to interact with elements in the IE driver.

My purpose in bringing this up is twofold. First, it's a good place to acknowledge how far we still have to go to make the IE driver as good as possible for the users of Selenium. Second, it gives me the chance to reiterate the open invitation that I've always had for anyone who'd like to review the code of the IE driver, make improvements, and submit patches. Before launching into these challenges, I need to cover a few basic assumptions about why the driver is architected the way it is.

There are a few main principles of the WebDriver project that directly impact the decisions of how the IE driver is built. They are, in roughly priority order:
  1. The driver should be installable with an xcopy deployment mechanism, and be capable of execution without elevating to admin permissions. Sadly, a significant percentage of the users of Selenium do not have admin access to their Windows machines, which precludes the use of a plugin (Browser Helper Object or BHO) for IE, as these require admin access to the registry at the very least.
  2. The driver should use "native events" to interact with elements. This means using OS-level mechanisms to simulate user keyboard and mouse inputs. These contrast with "synthetic events," which are JavaScript simulations of those inputs and which struggle with accuracy of simulation and fidelity of operation.
  3. The driver should not require the browser instance being automated to be the focused window in the OS. It is expressly a goal of WebDriver that developers running WebDriver code will be free to use their machines for other purposes even while that code is running in the background.

Given these three requirements, there are two major problems with using native events in the IE driver. The first is that mouse clicks sent to the IE window get swallowed when the browser window does not have the system focus. In this case, the element in question has a focus rectangle around it, but no click appears to have happened.

[Image: Screenshot of an element click into a background IE window]

Let me say a brief word about how native events are implemented in the IE driver. The driver currently works by sending Windows messages to the IE window being driven. We know this is not necessarily the best approach, and that the "correct" way to simulate input for the IE window would be to use the SendInput() API. However, this would require the browser window to have focus, which opens up all manner of issues, particularly if you have two InternetExplorerDriver instances attempting to manipulate pages at the same time.
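
To make the window-message approach a little more concrete, here is a minimal sketch in Python of the message sequence a simulated left click might consist of. This is not the driver's actual code: the helper names `make_lparam` and `click_messages` are hypothetical, and the exact sequence the driver sends is an assumption. The constant values are the standard ones from the Windows headers, and mouse messages carry the client-area position packed into lParam, with x in the low word and y in the high word.

```python
# Standard Win32 message identifiers and key-state flag (values from winuser.h).
WM_MOUSEMOVE = 0x0200
WM_LBUTTONDOWN = 0x0201
WM_LBUTTONUP = 0x0202
MK_LBUTTON = 0x0001


def make_lparam(x, y):
    """Pack client-area coordinates like MAKELPARAM: x in the low word, y in the high word."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


def click_messages(x, y):
    """Hypothetical helper: (message, wParam, lParam) triples for one left click.

    The move/down/up sequence is an assumption made for illustration only.
    """
    pos = make_lparam(x, y)
    return [
        (WM_MOUSEMOVE, 0, pos),             # move the simulated pointer to the target
        (WM_LBUTTONDOWN, MK_LBUTTON, pos),  # left button pressed
        (WM_LBUTTONUP, 0, pos),             # left button released
    ]
```

On Windows, each triple could then be delivered to one specific window handle with PostMessage() or SendMessage(), which is why this approach does not need the window to be focused. SendInput(), by contrast, injects events into the system-wide input stream, so they land on whichever window currently has focus.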

[Image: The "flashing hover" problem]

The second issue is with mouse hovers using native events. The symptom is that when you execute a mouse hover using the Actions class with the IE driver, and your physical mouse pointer is within the bounds of the browser window, a menu that opens on hover will flash and immediately disappear. As near as I've been able to determine, IE does some sort of hit-testing (probably by calling the WindowFromPoint() API) when it receives a WM_MOUSEMOVE message, and redraws the canvas if it detects the physical cursor inside the window boundaries.
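
The suspected behavior can be captured in a toy model. The sketch below, in Python, only illustrates the hypothesis described above and says nothing verified about IE's internals. The names `Rect` and `effective_hover_point` are hypothetical, and all points are assumed to be in screen coordinates.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Window bounds in screen coordinates; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, point):
        x, y = point
        return self.left <= x < self.right and self.top <= y < self.bottom


def effective_hover_point(window, simulated, physical):
    """Toy model of the hypothesis: if the physical cursor is inside the
    window, the browser hit-tests against it and the simulated hover loses."""
    return physical if window.contains(physical) else simulated
```

With a window of `Rect(0, 0, 800, 600)`, a simulated hover at `(100, 50)` survives in this model when the physical cursor sits at `(1200, 300)`, outside the window, but is overridden when the cursor sits at `(400, 300)`. That matches the observed symptom of the menu flashing and then disappearing.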

Solving the above two problems would go a long way toward the long-term stability of the IE driver. Alternatively, if the IE team at Microsoft took over the maintenance of the driver, as the teams responsible for Chrome, Opera, and soon Firefox have done, then the true experts in the architecture of the Internet Explorer browser could bring all of their experience and expertise to bear, and help make Internet Explorer a first-class citizen in the world of automated web testing.
 

In fact, that would be my challenge to the IE team at Microsoft: step up, and contribute code to the IE driver. Take the existing code, and do something great and awe-inspiring with it. Become a leader in this space, instead of the lagging follower you currently are.

9 comments:

  1. Jim,
    That raises an interesting question. Do the folks at Microsoft know about this? Pardon me if this question sounds stupid, but have they been contacted regarding this?

    ~ Krishnan

    Replies
    1. We have, at various times, reached out to Microsoft regarding the IE driver and the issues we've encountered in producing it. Until recently, we've gotten very little traction within the IE team itself. There has been some movement within other groups at Microsoft, however, so perhaps we will see some gains in the relatively near future.

  2. Hi Jim,

    Is this principle of the WebDriver project the main reason behind the 2 problems? "The driver should not require the browser instance being automated to be the focused window in the OS". If so, is it a must-have principle?

    We're using other test automation tools which don't have this principle, but run well on IE. From our point of view, it is more important to be able to automate our tests on IE than to be able to run our tests while we're working on the same machine. We're willing to dedicate (lab) machines to test automation!

    If this principle is the only reason, we can argue it in the right place.

    Replies
    1. The "principle of no window focus" is the primary cause of the issues we've seen. It is an important design principle of the WebDriver project, and is codified in the WebDriver W3C standard (see http://www.w3.org/TR/webdriver/#running-without-window-focus). While I appreciate your point of view as to the priority of running without window focus, please understand that not every user of WebDriver shares it.

  3. The no-focus issue in IE is a real pain in the proverbial. The two hacks I have deployed with some success are as follows.

    Using driver.SwitchTo() to get a handle on the browser window, as the IE driver seems to lose focus after clicking any alert or pop-up window.

    Using JavaScript: driver.JavaScript("window.focus()");
    The JavaScript() method is an extension/convenience method that uses the IJavaScriptExecutor.

    By the way Jim, any update from the guys at Microsoft on the IE driver?

    Replies
    1. When there's something to update that I can share about Microsoft's participation in the IE driver, I'll be sure to post it publicly.

  4. Hi Jim,

    I was wondering what the pitfalls of disabling native events in the IE driver are.

    Can you please name a few issues that come up when running tests with native events disabled?

    -Manjeet

  5. I am using Selenium WebDriver + TestNG + Firefox 17.0.5 ESR.

    My scripts run successfully in Firefox 17.0.5 ESR, but not in Internet Explorer 8: while running in IE8 the pages flicker and execution stops partway through.
    Any suggestions?

    Regards
    Sureshkumar
