This article is part 2 of a 5 part series by Mark B. Friedman (LinkedIn).
SPDY 101 – Here’s what Wikipedia (https://en.wikipedia.org/wiki/SPDY) has to say about SPDY.
SPDY (pronounced speedy) is an open networking protocol developed primarily at Google for transporting web content. SPDY manipulates HTTP traffic, with particular goals of reducing web page load latency and improving web security. SPDY achieves reduced latency through compression, multiplexing, and prioritization, although this depends on a combination of network and website deployment conditions. The name “SPDY” is a trademark of Google and is not an acronym.
Throughout the process, the core developers of SPDY have been involved in the development of HTTP/2, including both Mike Belshe and Roberto Peon. As of February 2015, Google has announced that following the recent final ratification of the HTTP/2 standard, support for SPDY would be deprecated, and that support for SPDY will be withdrawn completely in 2016.
SPDY – After extensive testing at Google and elsewhere, some clarity around SPDY performance has begun to emerge; we are starting to understand the characteristics of web applications that work well under SPDY and those that SPDY has little or no positive impact on. At a Tech Talk at Google back in 2011, the developers reported that implementing SPDY on Google’s web servers resulted in a 15% improvement in page load times across all of the company’s web properties. The SPDY developers did acknowledge that the experimental protocol did not help much to speed up Google Search, which was already highly optimized. On the other hand, SPDY did improve performance significantly at YouTube, a notoriously bandwidth-thirsty web application. Overall, Google’s testing showed SPDY required fewer TCP connections, fewer bytes transferred on uploads, and reduced the overall number of packets that needed to be transmitted by about 20%.
Google initially rolled out SPDY to great fanfare, publicizing the technology at its own events and at industry conferences like Velocity. At these events and on its web site, Google touted page load time improvements on the order of 50% or more in some cases, but did not fully explain what kinds of web site configuration changes were necessary to achieve those impressive results. Since then, there have also been several contrary reports, most notably from Guy Podjarny, a CTO at Akamai, who blogged back in 2012 that the touted improvements were “not as SPDY as you thought.” Podjarny reported, “SPDY, on average, is only about 4.5% faster than plain HTTPS, and is in fact about 3.4% slower than unencrypted HTTP” for a large number of real world sites that he tested. After extensive testing with SPDY, Podjarny observed that SPDY did improve page load times for web pages with either of the following two characteristics:
- Monolithic sites that consolidated content on a small number of domains
On a positive note, Podjarny’s testing did confirm that multiplexing the processing of Responses to GET Requests at the web server can boost performance when a complex web page is composed from many Requests that are mostly directed to the same domain, allowing HTTP/2 to reuse a single TCP connection for transmitting all the Requests and their associated Response messages.
As I will try to explain in further detail below, the HTTP/2 changes reflect the general trend toward building ever larger and more complex web pages and benefit the largest web properties where clustering huge numbers of similarly-configured web servers provides the ability to process a high volume of HTTP Requests in parallel. As for web pages growing more complex, the HTTP Archive, for example, shows the average web page increased in size from 700 KB in 2011 to 2 MB in 2015, with the average page currently composed of almost 100 HTTP objects. Internet access over broadband connections is fueling this trend, even with network latency acting as the principal constraint on web page load time.
A large web property (see Alexa for a list of top sites) maintains an enormous infrastructure for processing huge volumes of web traffic, literally capable of processing millions of HTTP GET Requests per second. The web site infrastructure may consist of tens of thousands (or more) individual web servers, augmented with many additional web servers distributed around the globe in either proprietary Edge networks or comparable facilities leased from Content Delivery Network (CDN) vendors such as Akamai. The ability to harness this enormous amount of parallel processing capability to respond to web Requests faster, however, remains limited by the latency of the network, which is physically constrained by signal propagation delays. A front-end resource of these infrastructures that is also constrained is the availability of TCP connections, which is limited by the width of the TCP Port number, which is 16 bits. That limitation in TCP cannot be readily changed, but the HTTP/2 modifications do address this constraint.
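To put the 16-bit limit in concrete terms, here is a back-of-the-envelope sketch. It assumes that all connections run between a single client-side IP address (for example, a proxy or NAT gateway) and one server address and port, so that only the 16-bit source port distinguishes one connection from another. The figure of 6 connections per client and the function name are illustrative assumptions, not numbers from the testing described above.

```python
PORT_BITS = 16                            # width of the TCP port number field
total_port_numbers = 2 ** PORT_BITS       # 65536 possible values
usable_ports = total_port_numbers - 1     # port 0 is reserved

def clients_behind_one_address(ports_available, connections_per_client):
    """How many clients can share one address if each opens the given
    number of connections to the same server address and port.
    (Hypothetical helper; ignores ports reserved by the operating system.)"""
    return ports_available // connections_per_client

print(usable_ports)                                   # 65535
print(clients_behind_one_address(usable_ports, 6))    # 10922
print(clients_behind_one_address(usable_ports, 1))    # 65535
```

Under these assumptions, moving from several connections per client to a single multiplexed connection stretches the same pool of port numbers across several times as many clients.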
SPDY also included server push and prioritization, but far less is known about the impact of those specific new features today. The final draft of the protocol specification is available at http://http2.github.io/http2-spec/.
What’s in Store: At this point I want to drill deeper into the major features in the HTTP/2 revision and then try to evaluate their tangible impact on web application performance. The HTTP/2 revision of the protocol features the following:
- Server push
- Header compression
- Streamlined SSL connections
Multiplexing over a single HTTP connection is the most important new change and the one we understand the most because of Google’s SPDY project.
- Multiplexed, interleaved streams for processing HTTP Requests in parallel at the web server. The capability to process multiple GET Requests in parallel over a single HTTP connection is the centerpiece of the protocol changes.
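The idea of interleaving several Responses over one connection can be sketched in a few lines of Python. This is a toy model, not the HTTP/2 wire format: real frames carry headers, and the protocol adds flow control and prioritization. The round-robin ordering, the function names, and the tiny frame size are all assumptions made for illustration. The odd stream numbers follow the HTTP/2 convention for streams opened by the client.

```python
from itertools import zip_longest

def interleave_streams(responses, frame_size):
    """Split each Response body into frames and interleave them round-robin,
    tagging every frame with the stream it belongs to."""
    per_stream = []
    for stream_id, body in responses.items():
        chunks = [body[i:i + frame_size] for i in range(0, len(body), frame_size)]
        per_stream.append([(stream_id, chunk) for chunk in chunks])
    frames = []
    for one_round in zip_longest(*per_stream):
        # Streams that have already finished contribute None; skip them.
        frames.extend(frame for frame in one_round if frame is not None)
    return frames

def reassemble(frames):
    """Receiver side: rebuild each Response body from its tagged frames."""
    bodies = {}
    for stream_id, chunk in frames:
        bodies[stream_id] = bodies.get(stream_id, b"") + chunk
    return bodies

responses = {1: b"aaaaaa", 3: b"bb", 5: b"cccc"}
frames = interleave_streams(responses, frame_size=2)
print(frames)
print(reassemble(frames) == responses)   # True
```

Because every frame is tagged with its stream, a short Response (stream 3 here) completes without waiting for a longer one that was requested earlier.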
Web pages are generally composed from multiple HTTP objects, but up until now, HTTP/1.x has been limited to the serial processing of individual HTTP GET Requests issued by the web client for objects as they are discovered in the HTML markup and added to the Document Object Model for rendering. This serial rendering process is depicted schematically in Figure 1.
Figure 1. The web client in HTTP/1.x issues GET Requests to a web server serially over a single TCP connection. A follow-up GET Request is delayed until the Response message from the previous Request is received. HTTP/1.x allows for multiple connections to the same domain in order to download content in parallel.
In the diagram in Figure 1, the Round Trip Time (RTT) is also indicated: the time for a message to be transmitted from one Host to the other and for a TCP packet acknowledging receipt of that message to be received back at the Sender. The network Round Trip Time also reflects the minimum amount of time that a client needs to wait for an HTTP Response message from the web server in response to an HTTP GET Request. Notice this minimum response time is 2 * the one-way network latency, independent of the bandwidth of the transmission medium. Network bandwidth only becomes a factor when the HTTP Request and Response messages are very large compared to the size of the segments TCP transmits, which are usually limited to 1460 bytes (or fewer, depending on the size of the IP and TCP headers) due to restrictions in the Ethernet protocol that limit the size of the Maximum Transmission Unit (MTU). Messages that are larger than this limit are broken into multiple segments, each carried in its own IP packet, before they are handed off to the network hardware, or Media Access Control (MAC) layer.
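The arithmetic in this paragraph is easy to check with a short sketch. The function names and the sample latency and message sizes are assumptions chosen for illustration; the 1460-byte segment size is the figure quoted above.

```python
import math

MSS_BYTES = 1460  # typical TCP payload per segment on Ethernet (from the text)

def min_response_time_ms(one_way_latency_ms):
    """Minimum wait for a Response: one trip out plus one trip back."""
    return 2 * one_way_latency_ms

def segments_needed(message_bytes, mss_bytes=MSS_BYTES):
    """Number of TCP segments required to carry a message of this size."""
    return math.ceil(message_bytes / mss_bytes)

print(min_response_time_ms(25))   # 50
print(segments_needed(500))       # 1
print(segments_needed(20_000))    # 14
```

A 500-byte GET Request fits in a single segment, so its cost is dominated by latency. Only the larger Response, spread over many segments, starts to make bandwidth matter.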
Network RTT is mainly a function of the physical distance separating the two machines, plus additional latency for each network hop along the route, where packets require some minimal amount of processing by IP routers in order to be forwarded to the next stop on the way to their ultimate destination. Web clients accessing web servers over the public Internet can expect to encounter RTTs in the range of 30-100 milliseconds. None of this, of course, changes under HTTP/2.
The ability to download content using multiple sessions is one form of parallel processing that is currently available to HTTP/1.x clients. Under HTTP/1.x, Requests are issued serially per connection, arriving one at a time at the web server where they are processed in the order in which they are received. The problem with that approach is that each concurrent session under HTTP/1.x requires the establishment of a separate TCP connection, something that is even more time-consuming under HTTPS. Multiple sessions can also be wasteful when the individual connections are only used to transfer a single HTTP object or are accessed sporadically.
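A rough model makes the trade-off visible. The sketch below assumes that every connection is opened at the same time, that each Request costs exactly one RTT, and that server processing time and bandwidth are negligible. The function name, the handshake costs (1 RTT for TCP alone, 3 RTTs as an illustrative figure for TCP plus a full TLS handshake), and the sample page of 100 objects are assumptions, not measurements.

```python
import math

def page_fetch_time_ms(num_objects, rtt_ms, connections=1, handshake_rtts=1):
    """Rough lower bound for fetching a page under HTTP/1.x.

    Each connection pays handshake_rtts round trips to open, then issues
    its share of the GET Requests one at a time, one RTT each.
    """
    requests_per_connection = math.ceil(num_objects / connections)
    return (handshake_rtts + requests_per_connection) * rtt_ms

# 100 objects at a 50 ms RTT
print(page_fetch_time_ms(100, 50))                                   # 5050
print(page_fetch_time_ms(100, 50, connections=6))                    # 900
print(page_fetch_time_ms(100, 50, connections=6, handshake_rtts=3))  # 1000
```

In this model, extra connections cut the elapsed time sharply, but each one pays its own handshake, and the handshake share of the total grows when the connection is encrypted.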
Of course, Figure 1 also greatly simplifies what the web infrastructure at a large web property looks like. The diagram depicts a single web server, which is how it appears to the web client. In actuality, there can be thousands of web servers configured in a single, co-located cluster that are each capable of responding to the Request. The HTTP/1.x protocol being both connectionless and sessionless means that each Request is an independent entity. The stateless character of the HTTP protocol is what makes it possible for any web server in the infrastructure to respond to any Request. This sessionless behavior is also the key factor that allows for applying parallel processing to web workloads on a massive scale. (See this link for more on this aspect of the HTTP protocol.) TCP, the underlying Transport layer, is connection-oriented, but HTTP/1.x is not.
However, there are many web applications that generate HTML Response messages dynamically based on session state, usually the identity of the customer and, often, the customer’s current location. HTTP allows the web application to store a cookie at the web client where data encapsulating the session state is encoded and made available to subsequent Requests. Cookie data is automatically appended to subsequent GET Requests issued for the same domain in one of the HTTP message header fields.
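The cookie exchange can be illustrated with Python's standard http.cookies module. This is a minimal sketch: the cookie name session_id, its value, the domain example.com, and both helper functions are hypothetical, and a real browser applies further rules (expiry, path and domain matching, security attributes) before it sends a cookie back.

```python
from http.cookies import SimpleCookie

def build_set_cookie(name, value, domain):
    """Server side: the value of a Set-Cookie Response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()

def cookie_header_for_request(set_cookie_value):
    """Client side: the value of the Cookie header that is sent back on
    subsequent Requests to the same domain (attributes are not echoed)."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

set_cookie_value = build_set_cookie("session_id", "abc123", "example.com")
print("Set-Cookie:", set_cookie_value)
print("Cookie:", cookie_header_for_request(set_cookie_value))
```

Only the name and value travel back to the server, but they do so on every Request to that domain, which adds to the header bytes each Request carries.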
Unfortunately, it is all too easy to overuse the Session object in ASP.NET to the point where the SQL Server access to Session data inhibits both the scalability and performance of the application. The popular Model-View-Controller (MVC) paradigm in ASP.NET is especially vulnerable to this condition.
Mark B. Friedman (LinkedIn) is a Principal and the CTO at Demand Technology Software, which develops Windows performance tools for computer professionals. As a professional software developer, he is also the author of two well-regarded books on Windows performance, as well as numerous other articles on storage and related performance topics. He was a recipient of the Computer Measurement Group’s A. A. Michelson lifetime achievement award in 2005. He currently lives near Seattle, WA and blogs at http://computerperformancebydesign.com.