A super-fast web transaction (and Google SPDY)

(Update: I had a formatting error in the original posting, this has been fixed.)

A few weeks ago when I wrote about the non-deployment of SSL, I touched on an old idea I had to make web transactions vastly more efficient. I recently read about Google's proposed SPDY protocol, which goes in a completely opposite direction, attempting to solve the problem of large numbers of parallel requests to a web server by multiplexing them all in a single streaming protocol that works inside one TCP session.

While calling attention to that, let me outline what I think would be the fastest way to do very simple web transactions. It may be that such simple transactions are no longer common, but it's worth considering.

Consider a protocol where you want to fetch the contents of a URL like "www.example.com/page.html" and you have not been to that server recently (or ever). You want only the plain page; you are not yet planning to fetch lots of images, stylesheets and javascript.

Today the way this works is pretty complex:

  1. You do a DNS request for www.example.com via a UDP packet to your DNS server. In the pure case this would mean first asking where ".com" is, but your DNS server almost surely knows that already, so instead it sends a UDP request to the ".com" master server.
  2. The ".com" master server responds with the address of the DNS server for example.com.
  3. You send a DNS request to the example.com DNS server, asking where "www.example.com" is.
  4. The example.com DNS server sends back a UDP response with the IP address of www.example.com.
  5. You open a TCP session to that address. First, you send a "SYN" packet.
  6. The site responds with a SYN/ACK packet.
  7. You respond to the SYN/ACK with an ACK packet. In the same step you send a packet with your HTTP "GET" request for "/page.html" -- a distinct packet, but since there is no extra round trip this can be viewed as one step. You may also close off your side of the connection with a FIN packet.
  8. The site sends back data with the contents of the page. If the page is short it may come in one packet. If it is long, there may be several packets.
  9. There will also be acknowledgement packets as the multiple data packets arrive in each direction. You will send at least one ACK. The other server will ACK your FIN.
  10. The remote server will close the session with a FIN packet.
  11. You will ACK the FIN packet.

You may not be familiar with all this, but the main thing to understand is that there are a lot of roundtrips going on. If the servers are far away and the time to transmit is long, it can take a long time for all these round trips.
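To make the cost concrete, here is a back-of-envelope sketch in Python. The round-trip tallies are my own rough reading of the steps above (real resolvers cache aggressively, so the DNS cost varies), and the 100 ms figure is just an illustrative RTT, not a measurement:

```python
# Rough round-trip counts for the classic fetch described above
# (hypothetical tallies; caching can eliminate the DNS round trips):
#   DNS: referral to .com plus the authoritative answer  -> 2 RTTs
#   TCP handshake (SYN / SYN-ACK)                        -> 1 RTT
#   GET request and first response data                  -> 1 RTT
CLASSIC_RTTS = 4

def fetch_time_ms(rtt_ms: float, round_trips: int) -> float:
    """Wall-clock time spent purely waiting on the network."""
    return rtt_ms * round_trips

# On a 100 ms path the classic flow waits about 400 ms before the first
# byte of the page arrives, versus 100 ms for a one-round-trip scheme.
classic = fetch_time_ms(100, CLASSIC_RTTS)
single = fetch_time_ms(100, 1)
```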

It gets worse when you want to set up a secure, encrypted connection using TLS/SSL. On top of all the TCP, there are additional handshakes for the encryption. For full security, you must complete the encryption setup before you send the GET, because the contents of the URL should be kept private.

A simple alternative

Consider a protocol for simple transactions where the DNS server plays a role, and short transactions use UDP. I am going to call this the "Web Transaction Protocol" or WTP. (There is a WAP variant called that but WAP is fading.)

  1. You send, via a UDP packet, not just a DNS request but your full GET request to the DNS server you know about, either for .com or for example.com. You also include an IP and port to which responses to the request can be sent.
  2. The DNS server, which knows where the target machine (or next-level DNS server) is, forwards the full GET request to that server for you. It also sends back the normal DNS answer to you via UDP, including a flag to say it forwarded the request (or that it refused to, which is the default for servers that don't know about this protocol). It is important to note that quite commonly the DNS server for example.com and the www.example.com web server will be on the same LAN, or even be the same machine, so there is no extra hop time involved.
  3. The web server, receiving your request, considers the size and complexity of the response. If the response is short and simple, it sends it to your specified address via UDP, in a single packet if it fits, or a few if not. If it receives no ACK within a reasonable time, it resends a few times until it gets one.
  4. When you receive the response, you send an ACK back via UDP. You're done.
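As a sketch, the combined request of step 1 might look like this in Python. The field names and JSON layout are purely illustrative assumptions -- nobody has specified a wire format for this:

```python
import json
import socket

def build_wtp_request(host, path, reply_ip, reply_port):
    # Hypothetical packet layout: one datagram carries both the DNS
    # question and the full GET, plus where responses should go.
    return json.dumps({
        "dns_query": host,                   # the normal DNS question
        "get": path,                         # the GET, piggybacked
        "reply_to": [reply_ip, reply_port],  # where responses are sent
    }).encode()

def send_wtp_request(dns_server, packet):
    # A single UDP datagram to the DNS server starts the transaction.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto(packet, dns_server)
    return s  # caller waits on this socket for the UDP answers
```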

The above transaction would take place incredibly fast compared to the standard approach. If you know the DNS server for example.com, it will usually mean a single packet to that server, and a single packet coming back -- one round trip -- to get your answer. If you only know the server for .com, it would mean a single packet to the .com server which is forwarded to the example.com server for you. Since the master servers tend to be in the "center" of the network and are multiplied out so there is one near you, this is not much more than a single round trip.

A few extra packets flow. The DNS server will probably have to send two packets (one to you, one to the next server) where before it just answered you. This is asynchronous, though, so it does not slow down the primary transaction. In theory the DNS server does not have to answer you at all, because the target web server or final DNS server could provide that answer along with the answer to the web request -- it could even come in the same packet. However, with only one server responding, the cause of a failure is harder to detect, and you don't get to cache the intermediate results to simplify retries after a failure.

Next, let's consider what should happen if the response is complex. A complex answer might be large (many packets) or generated slowly over time. More commonly, it might be an HTML page that embeds many images, stylesheets and javascript, invoking sessions. In this case, the response to the request would be to open a TCP session (not just UDP) to the target address and port, or possibly a different one if specified. This means the requester should listen both for UDP answers on the port and for attempts to open TCP sessions on it. If a TCP session is opened, it would be used for all the complex parts of the response, and for further requests. Other TCP sessions could also be opened by either end, though most commonly by the initial requester, who now knows the IP address and port to use from the initial response. (As noted, the response combines both an answer to the GET and the DNS answer.)
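The dual-listening requester described above can be sketched with Python's selectors module. This is a minimal illustration, assuming the caller has already bound one UDP and one TCP socket to the chosen response port:

```python
import selectors
import socket

def wait_for_response(udp_sock, tcp_sock, timeout=5.0):
    # After sending a WTP request, wait for either a UDP datagram
    # (a simple answer) or an incoming TCP connection (a complex one).
    sel = selectors.DefaultSelector()
    sel.register(udp_sock, selectors.EVENT_READ, "udp")
    sel.register(tcp_sock, selectors.EVENT_READ, "tcp")
    events = sel.select(timeout)
    sel.close()
    if not events:
        return ("timeout", None)
    key = events[0][0]
    if key.data == "udp":
        data, addr = udp_sock.recvfrom(65535)
        return ("udp", data)        # whole simple response in one datagram
    conn, addr = tcp_sock.accept()  # complex: continue over this session
    return ("tcp", conn)
```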

This has now become slightly slower, but not much, and it is still faster than the full DNS-and-TCP approach. However, in cases where the response is known to be complex, the old-style approach could still be used -- the requester simply opens a TCP socket to the server. It's not out of the question that URLs could be written in a form that tells the requester whether the response is likely to be simple or complex, so requesters would know which way to start. Of course, this protocol could also be given a different name (wtp:) and protocol designation, but that would only work once all web clients were expected to support it -- though it would be quite easy to support, since "fall back to HTTP" always works. Initially another trick might be useful, such as using "wtp" as the host name instead of www, or otherwise hiding the hint in the URL.

Inside these TCP sockets, protocols like Google's SPDY, mentioned above, could of course be used for further optimization.

Authentication and Security

This protocol should have encryption in it from the get-go, done in a way that is transparent and easy. This is particularly important because people will attempt to spoof requests and responses in any UDP-style protocol, and even probe ports at random in hope of finding clients waiting for WTP responses. Clients waiting for connections are ripe targets for security attacks and buffer-overflow exploits.

In many cases, requesters will know a public encryption key for the target server. This key might have been remembered from past transactions. It might have come in a DNS response. And it might even be embedded in the URL or in the link tag. (For example, consider writing a link tag as: <a href="wtp://example.com/page.html" key="nnnnnnnnnnnn">.)

If a key is known, the payload of the WTP request would be encrypted using that key. In the payload would be the rest of the URL (the site name has to be in the clear, or encrypted with the known key of the DNS server, which would be a different key). Also in the payload would be a randomly chosen symmetric key, to be used to encrypt responses and further traffic.

In this event all responses would be encrypted, and can thus also be verified as coming only from the owner of the public key in question. In fact the whole session would be encrypted -- with no extra round trips.
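As a toy illustration of the symmetric half of this handshake -- and only a toy: a SHA-256 counter keystream stands in here for a proper cipher such as AES-GCM, and the public-key wrapping of the initial request is omitted entirely:

```python
import hashlib
import secrets

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream: hash the key with a counter. NOT real cryptography;
    # it only illustrates symmetric encryption of the session traffic.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice recovers the plaintext.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# The requester picks the session key at random and (in the proposal)
# sends it inside the public-key-encrypted request payload.
session_key = secrets.token_bytes(32)
ciphertext = xor_crypt(session_key, b"<html>hello</html>")
plaintext = xor_crypt(session_key, ciphertext)  # XOR is its own inverse
```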

If a key is not known for the target, a few options are available. One would be to send the initial request in the clear, but to include in it a public key for the requester. All responses would then have their payload encrypted with that public key, and that payload would contain the symmetric key for further communication. In this case an attacker is able to see the initial request. However, if the request is something simple like "GET wtp://www.example.com/" -- i.e. the home page -- there is nothing in particular to protect, as a sniffer could already see where the request was going.

If a key is known for the DNS server, then the request to the DNS server (and response) can and should be encrypted, though the DNS server will unwrap that and forward the request on either in the clear, or using a key it knows for the target it is forwarding to. Hop-to-hop encryption is inferior, but still better than nothing.

Finally, the request could be encrypted using identity-based encryption, a clever technique where you can calculate a public key from the domain name of the site, and a system exists where that site, and only that site, can learn the matching private key from a special central server. The main problem with ID-based encryption is that it requires a central authority, or at least a cooperating group of central authorities, for any given identity. However, there is an out. The system could be built so that which central authority you use is based on a hash of the domain name. Since sites can choose the lower-level parts of their domain names (i.e. www1, www2, www3 and so on), they can choose one that will be bound to the central authority they prefer. This decentralizes the system and allows competition, but does require clients to be able to get and cache the key-generation parameters for each authority. Sites that wish to allow people to connect to "www" would not get to choose their authority, but could have all requests there redirected to a different domain -- losing some, but not all, of the benefit of the one-round-trip system.
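The hash-based authority selection can be sketched in a few lines. The authority names, the use of SHA-256, and the www1, www2, ... search are all illustrative assumptions:

```python
import hashlib

# Hypothetical set of competing central authorities.
AUTHORITIES = ["authority-a", "authority-b", "authority-c"]

def authority_for(domain: str) -> str:
    # Which authority serves a domain is fixed by a hash of the name.
    h = int.from_bytes(hashlib.sha256(domain.encode()).digest(), "big")
    return AUTHORITIES[h % len(AUTHORITIES)]

def pick_subdomain(base: str, wanted: str, limit: int = 1000) -> str:
    # A site tries www1.example.com, www2.example.com, ... until the
    # hash lands on the authority it prefers.
    for i in range(1, limit):
        candidate = f"www{i}.{base}"
        if authority_for(candidate) == wanted:
            return candidate
    raise RuntimeError("no matching subdomain found")
```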

In this way, all requests can be fully encrypted from the start. In fact, even the DNS server forwarding the request will not know the URL that is being fetched.

As noted, any code that is going to listen for a WTP response must be carefully written, anticipating attacks and spoofs. Of course, demanding that the response be encrypted using the provided key is an excellent way to quickly detect and discard any spoofs.

As noted, there are a few other ways to embed the key. The domain could be written as www.wtpknnnnnnnnnnnnnnnnnnnn.example.com in all places, though of course this means the URLs can't be typed by hand. The key could also be in the URL portion, such as wtp.example.com/page.html?wtpkey=nnnnnnnnnnnnnnnnnnnnn, and no ID encryption would be needed. If ID encryption were to be the norm, sites might need a way to say they don't want to use it, and prefer other methods. And the protocol and syntaxes would need some room to allow people to choose different, not yet invented authentication and encryption methods.

Of course, all keys could also be accompanied by certificates from trusted certificate authorities (CAs) so that users could not just trust their sessions, but be sure about who they are talking to, just as is done in TLS today.

NAT and Firewall

Most requesters are behind NAT and thus can't provide an open port for responses. However, almost all NATs will pass back a UDP response on the same port as an outgoing UDP request, at least if it comes from the server the request went to. They won't necessarily allow an incoming TCP open, or a response from a different server than the one that got the request, however.

As such, it becomes necessary for NATs to proxy this protocol. At a simple level, if they see an outgoing WTP request, they would open up a port to the requester, and rewrite the response IP and port fields so any responses (UDP or TCP) come there. This requires that the response IP and port be in the clear and rewritable, or that the protocol support NATs and firewalls adding an additional response port to any request, possibly encrypted independently by the NAT.

With existing NATs, the client making the request would need to be smart, possibly using a protocol like STUN to figure out their NAT and write the correct values in the response port.
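A NAT proxying the protocol, as described above, essentially rewrites the reply address and remembers the mapping. A minimal sketch, assuming the hypothetical request fields are in the clear and rewritable:

```python
# Hedged sketch of a WTP-aware NAT: rewrite the in-the-clear reply
# address on the way out, and keep a table so responses arriving at the
# public address can be forwarded back inside. Field names hypothetical.
def nat_rewrite(request: dict, public_ip: str, public_port: int,
                nat_table: dict) -> dict:
    inside = tuple(request["reply_to"])            # (private_ip, port)
    nat_table[(public_ip, public_port)] = inside   # remember the mapping
    rewritten = dict(request)
    rewritten["reply_to"] = [public_ip, public_port]
    return rewritten
```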

Finally, the DNS servers receiving WTP requests would of course notice that the response IP was unreachable, and know to rewrite to the actual source address and port from which the request was received. As long as the NAT was not fully symmetric, and allowed the remote web server to send UDP to that IP and port, the protocol would work.

It would only work for UDP with most NATs, however. As such the protocol would need a UDP response that says, "My answer is complex and I can't open a TCP session to you. Please open a TCP session to this address and port to continue our transaction." In most cases a client behind such a NAT will probably have decided in advance to default to the old HTTP method, and initiate all requests as TCP and HTTP.

Do we need this?

This protocol would have been a big win in the earlier, simple web. However, today, fewer and fewer web requests return a simple result. Most pages are full of included elements and images, and are quite long. The extra overhead of all the TCP is no longer as expensive on a relative basis.

However, there are still a number of requests which are simple. In particular, non-human requests such as web APIs (SOAP and friends) could use a quick protocol. AJAX requests, made by javascript code in web pages, could also use a quick transaction protocol in many cases. (Though it should be noted such requests are always made when you already know the server, and may well have a keep-alive TCP session going.)

It may also be the case that turning web transactions into a very quick 2,000 bytes followed by a TCP session with the rest of the results would generate a much faster web. The initial layout of more modest web pages would appear immediately, to be filled in later after all the TCP handshakes. Research would be needed to see if this is valuable.
