I'm very proud to announce the 4.0-alpha2 release of the Selenium .NET bindings! There are several exciting things to look forward to in this release. The first is fixes for several issues that cropped up because, beginning with version 75, Chrome (and ChromeDriver) use the W3C WebDriver Specification as the default protocol dialect for communication between Selenium and the driver. This led to a few issues, like the loggingPrefs capability being renamed (to goog:loggingPrefs) and the legacy logging APIs (driver.Manage().Logs) no longer working. That functionality should be restored in this release.
By far the biggest addition to this release of the .NET bindings is integration with the Chrome DevTools Protocol (CDP). We are extremely excited to be able to bring this feature to users of Selenium. It allows users, via their existing WebDriver instance, to instantiate and use a CDP session, including two-way communication of events. The .NET API for using CDP is still experimental, and may change between releases until the alpha/beta period is over. But to whet your appetite, here's what code using it looks like in 4.0.0-alpha02:
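(A caveat before the listing: because the API is experimental, the type, domain, and event names in this sketch are illustrative as I recall them for this alpha, and may not match later releases exactly.)

using System;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.DevTools;

namespace DevToolsExample
{
    class Program
    {
        static void Main(string[] args)
        {
            ChromeDriver driver = new ChromeDriver();
            try
            {
                // Drivers that can speak CDP implement IDevTools.
                IDevTools devToolsDriver = driver as IDevTools;

                // Opens the WebSocket connection to the browser.
                DevToolsSession session = devToolsDriver.CreateDevToolsSession();

                // Two-way communication: subscribe to a CDP event...
                session.Console.MessageAdded += (sender, e) =>
                {
                    Console.WriteLine("Console message: " + e.Message.Text);
                };

                // ...and send CDP commands. Note that the CDP API is async.
                session.Console.Enable().GetAwaiter().GetResult();

                driver.Navigate().GoToUrl("http://the-internet.herokuapp.com/");
                driver.ExecuteScript("console.log('Hello from CDP!');");
            }
            finally
            {
                driver.Quit();
            }
        }
    }
}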
The API uses the .NET System.Net.WebSockets.ClientWebSocket implementation for communication with Chrome, which means it's limited to Windows 8.1 and above. This is a limitation of the WebSocket implementation itself, so complaints about it should be directed toward Microsoft. Also, since ClientWebSocket exposes an asynchronous API, most of the CDP API is async, though the remainder of the WebDriver API is not.
Also, for the moment, the .NET bindings do not implement domains marked as "experimental" in the protocol definition. One thing we really do not want is for Selenium to be tied down to specific versions of Chrome. Since the DevTools Protocol is not subject to any standard, and can change at the whim of the Chromium developers, implementing those experimental domains would risk exactly that kind of coupling.
The CDP integration is something we'd really like to get feedback on, so give it a whirl, and let us know what you think.
I hope the preceding series of posts has been useful. To wrap things up, I want to share a GitHub repository that contains sample code for each of the items we've discussed. It includes an ASP.NET Core demo web site that implements Basic, Digest, and NTLM authentication, along with sample Selenium code using BenderProxy (version 1.1.2 or later) and PassedBall (version 1.2.0 or later) to automate the site. The Selenium code runs in a console application that waits for you to press the Enter key before shutting down the proxy and quitting the browser, allowing you to see the state of the browser before everything quits. Other features of the sample repo include working factory classes for Selenium sessions and the demo cases themselves.
To make the demo in the sample repo work properly, you must run it on Windows, because we are enabling NTLM authentication. You will also need administrative access on your Windows machine, which is unfortunate, but there is no other way to get the development web server to listen on a host name other than "localhost". If you change the test to navigate to the site on "localhost", the browser will likely bypass the proxy, because most browsers bypass proxies for localhost unless you take other configuration steps. By default, the demo project uses www.seleniumhq-test.test and port 5000, but you can use whatever you want. Here's how to configure your test environment so that the demo app will work properly:
From an elevated ("Run as Administrator") command prompt, edit your hosts file to contain a mapped entry for the host you wish to use. The hosts file can be edited in any text editor, including Notepad, so the following command will open it:
notepad.exe %WinDir%\System32\drivers\etc\hosts
Once open, add the following line:
127.0.0.1 <host name>
Be sure to substitute your preferred host name for <host name>. Save and close the hosts file. As an aside, this is a very useful technique for Selenium code to simulate navigation to external sites without actually having to navigate outside one's local machine.
Also in the elevated command prompt, execute the following command:
netsh http add urlacl url="http://<host name>:<port>/" user=everyone
Be sure to substitute your preferred host name and port for <host name> and <port> respectively. You should see a message that the URL reservation was successfully added. Now, this is a dangerous command, because it does open up a URL reservation for everyone, so you don't want to leave this permanently in place. You can remove it at any time after you're done using the sample by using another elevated command prompt to execute:
netsh http remove urlacl url="http://<host name>:<port>/"
Once you've added the hosts file entry and the URL ACL, you're ready to load and run the authentication tests. Open the solution in Visual Studio 2019, and you should be able to build and run it. When running, the solution runs a console application that will launch the test web app, start the proxy server, start a browser configured to use the proxy with Selenium, navigate to a protected URL for a specific authentication scheme, and then wait for the Enter key to be pressed. This lets you examine the browser to validate that, yes, the authentication succeeded. You can also examine the diagnostic output written to the console by the test code, which describes the WWW-Authenticate and Authorization headers being used. Once you've validated to your satisfaction that the browser really did authenticate using Selenium, without prompting the user, you can press Enter, which will quit the browser, stop the proxy server, and shut down the test web app. As an extra validation step, you can also start the test web app from Visual Studio and manually navigate to the URLs to validate that they really do prompt for credentials when browsed to.
Here's the Main method of the test app:
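(The repository has the authoritative version of this listing; the sketch below reproduces its shape, with stand-in enum types, so the surrounding discussion has something concrete to point at.)

using System;

class Program
{
    // Stand-in enums for illustration; the demo project defines its own.
    enum BrowserKind { Chrome, Firefox, InternetExplorer, Edge }
    enum AuthenticationKind { Basic, Digest, Ntlm }

    static void Main(string[] args)
    {
        // Choose the browser by changing which line is commented out.
        BrowserKind browserKind = BrowserKind.Chrome;
        // BrowserKind browserKind = BrowserKind.Firefox;
        // BrowserKind browserKind = BrowserKind.InternetExplorer;

        // Choose the authentication scheme to test the same way.
        AuthenticationKind authKind = AuthenticationKind.Basic;
        // AuthenticationKind authKind = AuthenticationKind.Digest;
        // AuthenticationKind authKind = AuthenticationKind.Ntlm;

        string hostName = "www.seleniumhq-test.test";
        int port = 5000;
        // If you used a different host name or port in your hosts file and
        // URL ACL, uncomment and edit these lines to match:
        // hostName = "www.my-custom-host.test";
        // port = 5001;

        // The real demo now starts the test web app and the proxy, launches
        // the chosen browser via Selenium, navigates to the protected URL
        // for the chosen scheme, and waits for Enter before tearing it all
        // down.
        Console.WriteLine("Testing {0} authentication with {1} at http://{2}:{3}/",
            authKind, browserKind, hostName, port);
        Console.ReadLine();
    }
}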
As you can see, you can change the browser being used and the authentication type being tested by changing which lines in the Main method are commented out. If you decided to use a different host name or port, you can also change those by uncommenting and editing the appropriate lines.
Hopefully, this series has given you some insights into how browsers perform authentication, and how it's possible to automate this using Selenium, without resorting to other UI automation tools. Happy coding!
Now that we've created a thorough intellectual framework for handling authentication requests using Selenium in combination with a web proxy, and, thanks to our last post, can handle more than Basic authentication, let's take things a step further and see how to use Selenium to automate pages secured with NTLM authentication. Before we can do that, though, we need to understand how NTLM authentication differs from the types of authentication we've used before.
NTLM authentication is a Microsoft-developed technology, originally implemented in the company's IIS web server product. It's not widely used on the public internet, but it integrates nicely with things like Active Directory, so it can be quite useful for web applications on company intranets that require security based on Active Directory credentials. This means we'll need a few things in place before we can provide sample code. First, we'll need a test website that we can run locally, on a server that implements NTLM authentication. Since we're working in C# in this series, we can create an ASP.NET Core web project to do that.
Second, we'll need to host the application using Windows. Even though an ASP.NET Core project can run against .NET Core, which runs on platforms other than Windows, we'll need to run on Windows to take advantage of NTLM authentication, unless we want to introduce a ton of complexity with Active Directory domains and the like (which we don't for this post).
Finally, most browsers bypass the use of a proxy when running strictly on localhost. This means that if you're running things all on the same system, you'll need to either configure the browser not to do this, or trick it into thinking the site the browser is connecting to isn't the local machine. The latter is far easier, since it only involves adding a line to the Windows hosts file (located at %WinDir%\System32\drivers\etc\hosts). On my test system, I've redirected www.seleniumhq-test.test to 127.0.0.1 by using the hosts file, and the sample code will reflect this.
NTLM authentication is a challenge-response based authentication scheme, and it differs from other HTTP authentication schemes in that it authenticates a connection, not an individual request. This means that the browser and server must support so-called "keep-alive," or persistent TCP connections between them. It also means that our proxy has to support persistent TCP connections, and must allow us to use that exact connection for making the requests. Fortunately, the proxy we've been using so far, BenderProxy, does support this.
The challenge-response mechanism used is complicated. Very complicated. So again, we'll be using the PassedBall library to parse authentication headers and generate authorization responses. It also requires multiple request/response round trips to perform the authentication handshake. Here's the implementation code for handling the NTLM authentication challenge for a sample site hosted on our local host machine:
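(What follows is a compressed sketch of that implementation. NTLM authenticates the connection, so every request below must reuse the same keep-alive connection the proxy already holds open to the server. The BenderProxy context members and PassedBall generator members named here are illustrative stand-ins, and the helper methods are elided; the libraries' documentation has the authoritative API.)

public static void OnResponseReceived(ProcessingContext context)
{
    if (context.ResponseHeader.StatusCode != 401)
    {
        return;
    }

    // Pick out the NTLM WWW-Authenticate header (see the note below about
    // responses carrying multiple such headers).
    string authenticateHeader = FindNtlmAuthenticateHeader(context.ResponseHeader);

    // Round trip 1: send the NTLM "type 1" (negotiation) message on the
    // connection the proxy holds open to the server.
    NtlmGenerator generator = new NtlmGenerator("username", "password");
    SendOnServerConnection(context, generator.GenerateType1Message());

    // The server answers 401 again, embedding a "type 2" challenge in a
    // new WWW-Authenticate header.
    string challenge = ReadType2Challenge(context);

    // Round trip 2: answer the challenge with the computed "type 3"
    // message; the server then responds 200 OK, and that response is what
    // the proxy forwards along to the browser.
    SendOnServerConnection(context, generator.GenerateType3Message(challenge));
}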
Note carefully that the initial 401 Unauthorized response may contain multiple WWW-Authenticate headers, so one may need to make sure the proper one is being used to interpret the response. Browsers, when faced with this, will usually choose what they perceive to be the "strongest" authentication method. In our case, we need to make that determination ourselves.
We'll wrap up this series with one more post, summing everything up.
In the last post in this series, we saw the general procedure for handling authentication requests with Selenium and a web proxy:
- Start the programmable proxy
- Start a Selenium session configuring the browser to use the proxy
- Wire up a method to intercept the 401 Unauthorized response
- Use the method to resend the request with the correct Authorization header value
As we noted previously, the Basic HTTP authentication scheme is rather weak. There are other authentication schemes that don't require sending a password in plain text over the wire. One such scheme is HTTP Digest authentication. Let's see what that looks like. First, let's navigate to a page that implements Digest authentication and examine what we see. As before, we'll use the hosted version of The Internet at http://the-internet.herokuapp.com/
Browser sends:
GET http://the-internet.herokuapp.com/digest_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Browser receives back:
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: text/plain
Content-Length: 0
Www-Authenticate: Digest realm="Protected Area", nonce="MTU1ODkwNDI2MyBkYjYzMTA0ZTY0NmZjNmZhNDljNzQ2ZGY0ZTc3NDM4OA==", opaque="610a2ee688cda9e724885e23cd2cfdee", qop="auth"
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Sun, 26 May 2019 20:57:43 GMT
Via: 1.1 vegur
Note the value of the WWW-Authenticate header, which is considerably more complex than in the Basic authentication scheme case. The algorithm for figuring out the correct value for the Authorization header is likewise much more complex. In the simplest case, it involves getting the MD5 hash of the string "userName:realm:password", then the MD5 hash of the HTTP verb and the URL of the resource being requested, then getting the MD5 hash of those two hashes combined with the "nonce" value sent in the authenticate header (and, because the header above specifies qop="auth", a nonce count and a client-generated nonce as well).
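To make that concrete, here is a sketch of the computation for the qop="auth" case, using nothing but the MD5 implementation built into the .NET framework. The nonce is the one from the trace above; the nonce count and client nonce are made-up values for illustration, and the credentials for The Internet's digest_auth page are admin/admin.

using System;
using System.Security.Cryptography;
using System.Text;

class DigestCalculation
{
    static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(input));
            StringBuilder builder = new StringBuilder();
            foreach (byte hashByte in hash)
            {
                builder.Append(hashByte.ToString("x2"));
            }

            return builder.ToString();
        }
    }

    static void Main()
    {
        string userName = "admin";
        string password = "admin";
        string realm = "Protected Area";
        string nonce = "MTU1ODkwNDI2MyBkYjYzMTA0ZTY0NmZjNmZhNDljNzQ2ZGY0ZTc3NDM4OA==";
        string uri = "/digest_auth";
        string qop = "auth";
        string nonceCount = "00000001";  // how many times we've used this nonce
        string clientNonce = "0a4f113b"; // arbitrary client-generated value

        // HA1 = MD5(userName:realm:password), HA2 = MD5(method:uri)
        string ha1 = Md5Hex(string.Format("{0}:{1}:{2}", userName, realm, password));
        string ha2 = Md5Hex(string.Format("{0}:{1}", "GET", uri));

        // For qop="auth", response = MD5(HA1:nonce:nc:cnonce:qop:HA2); the
        // resulting hex string goes into the "response" parameter of the
        // Authorization header, along with the nc and cnonce values.
        string response = Md5Hex(string.Format("{0}:{1}:{2}:{3}:{4}:{5}",
            ha1, nonce, nonceCount, clientNonce, qop, ha2));

        Console.WriteLine("response = " + response);
    }
}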
Whew. That's an awful lot to keep straight, and that sketch only covers a single path through the algorithm. It's probably a little too complicated to post code resolving all of its nuances within this blog post. So it's time to introduce a new library to add to our toolbox for calculating the authorization header value for any of a variety of authentication methods. That library is called PassedBall, and it's available both on GitHub and as a NuGet package. Since PassedBall supports Digest authentication, and using the same process as in our previous post, here's the implementation of the method to intercept and resend the HTTP request:
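(The BenderProxy and PassedBall member names in this sketch are approximations, and the request-resend helper is elided; PassedBall's documentation has the authoritative API.)

public static void OnResponseReceived(ProcessingContext context)
{
    if (context.ResponseHeader.StatusCode == 401)
    {
        // PassedBall parses the challenge and performs all of the hashing
        // described above, including the qop, nc, and cnonce handling.
        string authenticateHeader = context.ResponseHeader.WWWAuthenticate;
        DigestGenerator generator = new DigestGenerator(
            "admin", "admin", "GET", "/digest_auth", authenticateHeader);
        string authorizationValue = generator.GenerateAuthorizationHeaderValue();

        // Attach the computed value to the original request and re-send
        // it, so the authorized response is what reaches the browser.
        context.RequestHeader.Authorization = authorizationValue;
        ResendRequest(context); // helper elided for brevity
    }
}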
Now that we have a library and a generic framework for generating authorization header values for arbitrary authentication schemes, we'll look at one last approach to authentication, one that uses connection semantics: NTLM authentication.
As I mentioned in the immediately prior post in this series, the way to avoid having the browser prompt for credentials during a Selenium test is to supply the correct information in the Authorization header. Since Selenium's focus is automating the browser as closely as possible to the way a user uses it, there's no built-in way to examine or modify the headers. However, Selenium does make it very easy to configure the browser being automated to use a web proxy. A web proxy is a piece of software that stands between your browser and any request made of a web server, and it can be made to examine, modify, or even block requests based on any number of rules. When a browser is configured to use a proxy, every request it makes flows through the proxy. Many businesses use proxies to ensure that only authorized resources are being accessed via business computers, or that requests only come from authorized computers, or for any number of other legitimate business purposes.
How do you configure your browser to use a proxy with Selenium? The code looks something like this:
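Firefox is shown here, but the same Proxy object works with the options classes for the other browsers; assume for the moment that a proxy is already listening on 127.0.0.1:8080 (we'll start one programmatically below).

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

// Tell the browser to route its HTTP traffic through the proxy.
Proxy proxy = new Proxy();
proxy.HttpProxy = "127.0.0.1:8080";

FirefoxOptions options = new FirefoxOptions();
options.Proxy = proxy;

IWebDriver driver = new FirefoxDriver(options);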
For this to work, of course, we need a proxy to point the browser at. We'll want one that allows us to programmatically start and stop it, hook into the request/response chain from our code, and modify the results in order to interpret and replace headers as needed. Any number of proxies could be used for this; many Selenium users have had great success with BrowserMob Proxy, and there are commercial options like Fiddler. Since I personally prefer FOSS options, and don't want to leave the .NET ecosystem, we'll be using BenderProxy for the examples here. Here's the code for setting that up:
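(The BenderProxy type and member names below are as I recall them; the project's README has the authoritative API.)

using System;
using BenderProxy;

// Create and start the proxy server; Start() returns a wait handle that
// is signaled once the proxy is listening.
HttpProxyServer proxyServer = new HttpProxyServer("localhost", new HttpProxy());
proxyServer.Start().WaitOne();

// The proxy picks its port at startup; this is the address to plug into
// the Proxy object shown earlier.
Console.WriteLine("Proxy listening at {0}:{1}",
    proxyServer.ProxyEndPoint.Address, proxyServer.ProxyEndPoint.Port);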
Now, how do we wire up the proper processing to mimic the browser's handling of an authentication prompt? We need to add an Authorization header with the correct value for the authentication scheme requested by the server. BenderProxy's OnResponseReceived hook fires after the response has been received from the web server, but before it's forwarded along to the browser for rendering. That gives us the opportunity to examine the response, and to resend the request with the proper credentials in the proper format. We're using the Basic authentication scheme in this example, and once again using The Internet sample application. Here's the code for the method:
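(Again, the BenderProxy member names are approximate, and the resend helper is elided; the structure is what matters here.)

using System;
using System.Text;

public static void OnResponseReceived(ProcessingContext context)
{
    // Step in only when the server demands authentication.
    if (context.ResponseHeader.StatusCode == 401)
    {
        // Basic authentication: Base64-encode "userName:password"...
        string credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes("admin:admin"));

        // ...attach it to the original request as an Authorization header...
        context.RequestHeader.Authorization = "Basic " + credentials;

        // ...and re-send the request on the open server connection, so the
        // authorized response is what gets forwarded to the browser.
        ResendRequest(context); // helper elided for brevity
    }
}

The handler is attached to the proxy with a line like proxyServer.Proxy.OnResponseReceived = OnResponseReceived; before the browser is sent to the protected page.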
Running the code, we'll see that the browser shows the authorized page, as we intended. As you can tell from the implementation code, Basic authentication is pretty simple, sending the Base64 encoding of "userName:password". Its simplicity is also one reason it's not used very often, as it sends the credentials across the wire essentially in clear text. There are other, more secure authentication schemes available, and they can be automated in similar ways. The trick is knowing how to specify the value for the Authorization header. In the next post in the series, we'll look at another authentication mechanism, and how to handle something a little more complicated.
In order to understand how to use browser technologies to automate pages that use some form of authentication, it is useful to know what happens when you browse to such a page. What's actually happening when your browser prompts you for some form of credentials, usually a user name and password, before it will let you access a given resource on the web?
At the risk of dropping down to a ridiculously low level, let's talk about how browsers transfer data when browsing websites. First, an obligatory disclaimer: I'm going to deliberately gloss over pages served via secure HTTP ("https"), and I'm going to ignore binary protocols like HTTP/2 for this series. Those items, while important, and while they may impact the outcomes you see here, are beyond the scope of this series.
Most of the time, a browser is using the Hypertext Transfer Protocol (or HTTP) to communicate with a given web server. When you type in a URL in your browser's address bar, your browser sends off an HTTP request (that's what the "http://" means at the beginning of the URL), and receives a response from the server. For the following examples, we'll be using Dave Haeffner's excellent Selenium-focused testing site, The Internet, which is designed to provide examples of challenging things a user might encounter when automating web pages with Selenium, and a hosted version of which is available at http://the-internet.herokuapp.com. Here's what a typical exchange might look like:
Browser sends:
GET http://the-internet.herokuapp.com/checkboxes HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Browser receives back:
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Content-Length: 2008
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:44:54 GMT
Via: 1.1 vegur
<body of HTML page here>
This is what happens virtually every time a browser makes a request for a resource. The important thing to note is the first line of the response. The "200 OK" bit means that the server had the resource and was sending it in response to the request. Now let's look at a request for a resource that is protected by authentication:
Browser sends:
GET http://the-internet.herokuapp.com/basic_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Browser receives back:
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Www-Authenticate: Basic realm="Restricted Area"
Content-Length: 15
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:52:24 GMT
Via: 1.1 vegur
Note the all-important first line of the response, which says "401 Unauthorized". That tells us we have a page that requires authentication. If you had asked your browser to browse to http://the-internet.herokuapp.com/basic_auth, you would have been prompted for a user name and password. Note also the line in the response that says Www-Authenticate: Basic realm="Restricted Area". That tells the browser that the "Basic" authentication scheme is expected and that the user's user name and password are required, so the browser prompts you, and then re-sends the request to the server with an additional header. If you used the proper credentials for the aforementioned URL (user name: admin, password: admin), you'd see something like the following:
Browser sends:
GET http://the-internet.herokuapp.com/basic_auth HTTP/1.1
Host: the-internet.herokuapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Authorization: Basic YWRtaW46YWRtaW4=
Browser receives back:
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Content-Length: 1643
Server: WEBrick/1.3.1 (Ruby/2.2.5/2016-04-26)
Date: Thu, 23 May 2019 23:59:31 GMT
Via: 1.1 vegur
<body of HTML page here>
Clearly, that additional header that says Authorization: Basic YWRtaW46YWRtaW4= tells us that the browser must've done something with those credentials we gave it. If only we had a way to intercept the unauthorized response, calculate what needs to go into that authorization header, and resend the request before the browser had the chance to prompt us for credentials, we'd be golden. As luck (and technology) would have it, we do have exactly that ability, by using a web proxy. Every browser supports proxies, and Selenium makes it incredibly easy to use them with browsers being automated by it. The next post in this series will outline how to get that set up and working with Selenium.
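As an aside, you can verify what the browser did with those credentials yourself: the header value is nothing more exotic than the Base64 encoding of "admin:admin".

using System;
using System.Text;

// Prints "YWRtaW46YWRtaW4=", the exact value in the trace above.
Console.WriteLine(Convert.ToBase64String(Encoding.ASCII.GetBytes("admin:admin")));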
When you've been using Selenium for UI-based testing via browser automation as long as I have, you see certain questions repeat themselves. Nearly any website of any complexity has functionality that relies on user authentication. Nowadays, that's most often done with some sort of form-based UI, which creates a session that gets tracked by the browser via a cookie. The variations on that pattern are wide, but they don't require any additional tooling to handle via Selenium.
However, every now and again, someone will post to one of the Selenium mailing lists or on the IRC/Slack channel asking about a site they have to automate that relies on the browser itself asking for the credentials with which to attempt authentication. Selenium is very good at automating pages within a browser, but makes little or no attempt at automating the parts of the browser outside the page being displayed, like file download selection dialogs, print dialogs, or, most relevant here, browser-displayed credential dialogs. This usually means the user tries to find another way to manipulate the dialog, and that often means a language-specific tool (like Java's Robot class), a platform-specific one (like AutoIt), or an approach that once worked with browsers but has now been deprecated and disallowed by nearly all of them as a security risk (putting the user name and password directly in the URL). The challenge with these approaches is that they rarely scale well, and almost never work correctly in the remote or grid case.
This happens often enough that I'm going to start a series of posts about authentication, and how to effectively automate it with Selenium, without resorting to single-language or single-platform utilities. The last time I posted a series of blog posts about how to accomplish something using Selenium combined with other tools, one of the comments I got was that they were "not impressed it [took] three blog articles to explain how to accomplish" the task.
The blog post series in question could easily have been done in one post, with a simple code example, but without any explanation of what the code was actually doing. I don't believe that approach is worthwhile; in fact, I think it's actually detrimental to learning.
So let me lay this out ahead of time. The content of this post series could be done in a single post. It would be extraordinarily long, and a lot of it would end up saying, "go look at the sample code to get the full picture." I'd rather not take that approach. I'd much rather take my time, and at least attempt to give the relevant details in smaller, more quickly digested chunks. If you find that approach lacking, and would rather "just give me teh codez," I'll respectfully suggest heading elsewhere. As I add posts to the series, I'll try to keep an updated list of the parts of the series at the bottom of each post so that you can jump to the section you need.
One last thing: I'll be showing code using C#. Assuming you have similar libraries available in your language of choice, the code can be ported to other languages and libraries, but doing so is beyond the scope of this post series. Also, I'll be using a more verbose coding style than is possible with modern versions of C#. This is a stylistic choice for explicitness and clarity; you're welcome to use more modern syntax in your own code.
Posts in this series:
Prologue: This post
Part 1: How Does Browser Authentication Work Anyway?
Part 2: Using a Web Proxy for Basic Authentication
Part 3: Beyond Basic Authentication
Part 4: NTLM Authentication
Epilogue: Final Thoughts
I am very proud to announce the release of the first alpha version of the Selenium 4.0 .NET language bindings! These bindings have been years in the making, and are now available for the first time in alpha form. They are by no means finished, and new features will be available before release. Some things to note about the bindings:
- The bindings now only support .NET Framework 4.5 and above, and .NET Core 2.0 and above (via .NET Standard). This is to gain support for additional classes in the .NET Framework that are unavailable in previous versions of the framework.
- The internals of how the bindings communicate with the browser drivers have been completely rewritten to use System.Net.Http.HttpClient. I'm sure something in this conversion has been missed, so this needs thorough testing.
- The bindings now only support the W3C WebDriver Specification dialect of the wire protocol. This simplifies the code for the .NET bindings considerably.
- Methods and classes that were marked with the Obsolete attribute in version 3.141 of the .NET bindings have been fully removed. This includes the ExpectedConditions and PageFactory classes. If you want to continue to use those structures, the DotNetSeleniumExtras packages will be updated for the final release.
The complete list of changes is listed in the bindings' CHANGELOG. Please download and try out the bindings, and send your feedback. If you run into issues, you can file a new issue in the issue tracker at the Selenium project GitHub repository or you can contact me via Twitter (@jimevansmusic) or on the Selenium project's IRC or Slack channel. Happy automating!
Users who do not have the ability to properly set Internet Explorer's Protected Mode settings, usually because they are restricted by an overzealous IT department, have always faced challenges when using the IE driver. This has been a known issue ever since the rewritten driver was introduced in 2011. The technical reasons for requiring the Protected Mode settings are well-documented, and haven't changed in the intervening years.
In order to use the driver without setting the Protected Mode settings, the user had to resort to passing a capability into the driver session creation, but this was still dicey, because the driver could do nothing to mitigate the problems that arose when a Protected Mode boundary was crossed. Even then, it was possible, even likely, to receive errors like, "Unable to get current browser," or, "Unable to find element on closed window." There was no conceivable way to work around the issue and still support all of the versions of Internet Explorer that were required. Since, as of July 2019, the driver supports no versions other than IE 11, that landscape has changed somewhat. A change to the IE driver was recently (at the time of this writing) committed that at least attempts to make the experience somewhat better.
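For reference, opting out of the Protected Mode checks from the .NET bindings looks like the following; the deliberately unwieldy property name is a hint about the tradeoff being made.

using OpenQA.Selenium;
using OpenQA.Selenium.IE;

// Bypass the driver's Protected Mode checks. The property name is
// intentionally scary: you are trading away stability.
InternetExplorerOptions options = new InternetExplorerOptions();
options.IntroduceInstabilityByIgnoringProtectedModeSettings = true;
IWebDriver driver = new InternetExplorerDriver(options);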
Now, when the user does not set the Protected Mode settings of the browser and sends the capability to bypass the checks for those settings, the driver will attempt to predict when a Protected Mode boundary will be crossed, and set in motion a process to reattach itself to the newly created browser. This process is far from perfect. It is subject to really challenging race conditions that are impossible to eliminate entirely because of the architecture of the browser itself. Nevertheless, even in its flawed state, this is still a better outcome for users than before.
Please note that the advice and support policy of the IE driver will continue to be that the user must set the Protected Mode settings of the browser properly before using the driver. Any "issues" that arise from not having the settings set, but that disappear when the settings are corrected, are not considered by the project to be valid issues. This includes, but is not limited to, issues like abandoned browser instances not being closed, use of multiple instances of the driver where the wrong browser window is connected to and automated, and issues where the driver appears to hang upon navigation to a new page. If the problem disappears when the browser is properly configured, any issue reports will be immediately closed with a note to properly configure the browser and remove the capability.
The following situations should be at least partially mitigated by the change:
- Navigation to a new page
- Clicking on a link (specifically an <a> tag) that will lead to navigation to a new page
- Clicking a link that opens a new window
Other cases, like navigating backward and forward through the browser history, clicking an element that submits a form, and so on, may not be handled. In those cases, issue reports will be summarily closed unless a specific pull request fixing the issue is also provided. Additionally, use of things like proxies to capture traffic between the browser and web server may miss some traffic because of the race conditions inherent in the mechanism used to reattach to a newly created browser. Again, these race conditions are unavoidable, and issue reports based on them will be immediately closed with a note indicating that the browser must have its settings properly set. These strict guidelines are not intended to be harsh, and are not put in place with the intent of avoiding investigating and fixing issues; rather, they must be enforced because the underlying architecture of the browser makes the race conditions unavoidable.
While not perfect, it's hoped that these changes will make things a little easier for users who run against Internet Explorer, but are prevented by circumstances beyond their control from properly configuring the browser. If you're one of those unlucky users, I hope you'll give the driver a spin, and see how it works for you.