Web Screenshot API

Over the past decade in web front-end engineering we’ve learned a great many things about how best to test our applications. I still remember an article from maybe 2006 or 2008 (which I can’t find; maybe it was written by John Resig or Jörn Zaefferer?) arguing that what the browser reports about its own state generally matches reality, so interrogating the browser is a reliable way to test.

This approach works very well for testing JavaScript plugins, checking things like element existence and function correctness, but it falls completely flat when reviewing highly interdependent components or layout: it simply does not make sense to write JavaScript tests for every CSS declaration of every element. Further, the approach becomes absolutely nonsensical once multiplied across the myriad resolutions, pixel densities, operating systems, and browser combinations we encounter on the modern web.

A visual test is the only way to ensure layout correctness, and we need to make it easy to accomplish across a wide variety of devices.

WebDriver

The W3C WebDriver specification was born of the desire to allow external scripts to automate browsers for testing. This makes a lot of sense when you have a large website whose pages are server-constructed and loosely coupled: a developer can write tests individually for each page and navigate through the application programmatically. Because the server reconstructs the application’s state on every request, the continuity of an externalized test suite is incredibly beneficial. The authors and implementers of WebDriver recognized very early that the ability to take a screenshot is necessary to validate layout. Using something like Selenium Grid, a developer can run their tests across the wide variety of devices needed to confirm correctness.
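For reference, WebDriver exposes screenshots as a plain HTTP command: Take Screenshot is a GET to `/session/{session id}/screenshot`, and the JSON response carries a base64-encoded PNG in its `value` field. The sketch below only builds that URL and decodes a response body; the helper names and the `http://localhost:4444` base URL are hypothetical, and actually sending the request (with any HTTP client, against a running WebDriver server) is left out so the example runs offline.

```python
import base64
import json

# Every PNG file begins with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_endpoint(base_url: str, session_id: str) -> str:
    """Build the URL for WebDriver's Take Screenshot command (sent as GET).

    Hypothetical helper; base_url is wherever the WebDriver server listens.
    """
    return f"{base_url.rstrip('/')}/session/{session_id}/screenshot"


def decode_screenshot_response(body: str) -> bytes:
    """Extract raw PNG bytes from a Take Screenshot response body.

    WebDriver wraps results as {"value": ...}; for this command the value
    is a base64-encoded PNG.
    """
    payload = json.loads(body)
    png = base64.b64decode(payload["value"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("response did not contain a PNG image")
    return png


if __name__ == "__main__":
    # Hypothetical session id and server address, for illustration only.
    print(screenshot_endpoint("http://localhost:4444/", "abc123"))
```

The point for this discussion is that the screenshot is captured by the automation driver outside the page, so code running inside the page has no equivalent call to make.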

Modern Web Application Testing

Many current client-side framework testing approaches run as instrumentation inside the running application. This fits the model of interrogating the browser about its state, and it allows the framework to pause at the correct points in execution to run certain tests. With long-lived state, this gives the test suite the ability to test the entire application without relying on external tools. This is generally an improvement over external browser automation frameworks: it can reduce the “flakiness” of those tests and capture much more information about failing tests, such as the current call stack. Running tests across multiple devices becomes as simple as pointing them all at a specially crafted URL. One problem remains: there is currently no way to trigger a screenshot from testing code, which prevents us from testing layout.

We need .takeScreenshot() for the web.

As testing methodology has evolved, we need to make some testing-specific functionality, such as taking a screenshot, available as a web-facing API. Something like this raises numerous security issues, but they can be mitigated easily: the feature would only ever be enabled by developers, and enabling it could require command line flags, about:config, developer tools, and/or other esoteric configuration methods. Further, the spec for this functionality should exactly mirror WebDriver’s, making it easy to implement in any browser that already supports the WebDriver spec.

What needs to be done to make this happen?