Waiting for API calls: how I learned to stop worrying and love WebSockets instead
On a recent project, I found myself working on the backend of a web-based application tool for one of our clients. At every step of the application process, the backend performs external API calls and internal calculations in the background; some of the requests involve heavy data analysis with processing times too long to handle synchronously. You’d think they wouldn’t want to make the user wait any longer than strictly necessary, right? I’m afraid not! To get the user’s updated status, the frontend uses short polling, resulting in redundant calls and waits. The experience gave me flashbacks of ordering beer in an Amsterdam bar, where speed is NOT a priority.
WebSockets explained with beer
While Amsterdam bartenders might make you wish you stayed home, my experience in Düsseldorf was very different. You get a glass of beer right when you are seated, and before you even finish it, a waiter will have placed a new one in front of you. To show that you don’t want to receive any more drinks, you place your coaster on top of the glass. As some of you might have noticed, the solution for the “beer mechanism” that they have in Düsseldorf can also be applied to my client-server communication situation.
Let’s circle back to the original problem. To retrieve the most up-to-date user state, the frontend sends a periodic HTTP request to the backend server every 15 seconds. That is, if the information is available after 16 seconds from the initial request, the user needs to wait an extra 14 seconds simply because the frontend queries for updates in intervals of 15 seconds. This was happening multiple times during the process. As a result, user retention suffered.
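To make that cost concrete, here is a small sketch of the worst-case arithmetic (the function name `extra_wait` is mine, not from the project):

```python
import math

def extra_wait(ready_at: float, interval: float) -> float:
    """Seconds the user waits beyond the moment the data is ready,
    assuming polls at t = interval, 2*interval, ... after the initial request."""
    next_poll = math.ceil(ready_at / interval) * interval
    return next_poll - ready_at

# Data ready 16 s in, polled every 15 s: it is only delivered at t = 30 s.
print(extra_wait(16, 15))  # 14.0
```

In the worst case the user waits almost a full interval on top of the actual processing time, at every step of the process.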
Besides the long, redundant waits, this polling strategy wastes network bandwidth and computing power. Firing API calls at regular intervals throughout the application process, even when the user leaves the website open and unused, generates hundreds or even thousands of unnecessary requests. On top of being energy-inefficient, it is cost-ineffective, placing needless load on the backend’s cloud infrastructure.
Back to our analogy:
- Amsterdam = Polling — keep asking for the information until you eventually get what you want.
- Düsseldorf = WebSockets — as soon as the information has become available, you get it. To stop the flow, you signal that you want to close the connection, i.e., place a coaster on top of the glass.
Are we there yet?
When thinking of polling, a useful analogy is a child in a car, constantly asking his mom “Are we there yet?”. It makes more sense for the kid to simply wait and let his mom notify him when they’ve arrived — which is exactly what WebSockets are all about. Like the mother informing her child that they’ve arrived, with WebSockets, the server provides the client with an update as soon as new information becomes available. When a connection is established between client and server, events can flow between them in an efficient, timely manner.
So, what should we do instead? Polling at larger intervals is not an option, as the user would have to wait even longer at each step, while smaller intervals waste even more energy and resources. There is no good tradeoff here; a different approach is needed. This is where WebSockets can come into play and provide a better solution.
Let’s take a few steps back and lay out the problem. The polling strategy that is currently in place is definitely not ideal for good UX. Studies show that the longer a site takes to load, the more users are going to drop off. Moreover, higher energy consumption is never a good thing, least of all for your device’s battery. On top of that, most of the calls are redundant as most responses are identical to their predecessors. Not a lot changes in a few seconds, but when things do change, the backend should notify the frontend right away.
Each time a request is made and a connection established, a TCP handshake is performed, HTTP headers are parsed, a new data query is executed, a response is generated, and the data is delivered.
Short polling is done with HTTP GET requests, where the client pings the server in an attempt to retrieve new data. There may or may not be new data available, but the requests are sent nonetheless. All of the steps mentioned above have to be done with every request, and then the connection must be closed, and resources cleaned up. Thus, repeating requests is a costly process.
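Stripped to its essence, a short-polling client is just a loop like the following sketch. The names `fetch_status` and `poll_until_done` are placeholders of mine, not the client’s actual API:

```python
import time

def fetch_status(step_id):
    """Placeholder for the HTTP GET that would hit the backend. Each real
    call pays for a TCP handshake, header parsing, a data query, response
    generation, and connection teardown."""
    ...

def poll_until_done(step_id, interval=15, fetch=fetch_status):
    """Ask the server repeatedly until the step is finished."""
    while True:
        status = fetch(step_id)   # one full request/response cycle, new or not
        if status == "done":
            return status
        time.sleep(interval)      # idle wait; data may become ready mid-sleep
```

Every iteration repeats the whole request lifecycle, whether or not anything changed on the server.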
Typical traditional request/response protocols don’t allow for connections to stay open like we need them to in our use case. To get around this, we could use long polling, which is a more efficient version of the polling techniques I am looking to replace. Instead of repeating the resource-wasting polling cycle outlined above, long polling allows us to perform one cycle and then hold the client connection open for as long as possible, sending a response only after response data is available or a timeout occurs.
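On the server side, long polling amounts to blocking on a result until data arrives or a timeout fires. This is a minimal sketch of that idea, using a `queue.Queue` as a stand-in for the pending result (the handler shape and status codes are illustrative assumptions):

```python
import queue

def handle_long_poll(updates: queue.Queue, timeout: float = 30.0):
    """Hold the 'request' open until an update arrives or the timeout hits.
    A real handler would then send the HTTP response, and the client
    would immediately open a new long-poll request."""
    try:
        return {"status": 200, "data": updates.get(timeout=timeout)}
    except queue.Empty:
        return {"status": 204, "data": None}  # no content: client re-polls

# Usage: another thread would put() the update as soon as it is computed.
q = queue.Queue()
q.put({"user_state": "step_3_complete"})
print(handle_long_poll(q, timeout=0.1))
```

One request now covers a whole waiting period instead of one fifteen-second slice of it, but each response still ends the connection.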
While this sounds helpful, it is resource-intensive on the server side. Message ordering can also become problematic, as multiple HTTP requests from a single client may be in flight concurrently. WebSockets solve both problems, eliminating the latency issues and the multiple-in-flight-message issues alike.
WebSockets allow for true two-way communication to take place between a client (i.e., browser) and server over a single TCP connection. This allows for low-latency data transfer in real time, unlike in short or long polling techniques. WebSockets are supported by essentially all modern browsers and servers.
Here’s how they work: The client initiates the handshake with the server, and they agree to upgrade the connection to WebSockets. The two parties can then send messages back and forth. Either side can choose when to communicate, and either can choose when to end communication, by sending a message to the other. For our use case, it’s the best of both worlds!
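The upgrade handshake itself is plain HTTP: the client sends a random `Sec-WebSocket-Key` header, and the server proves it understood the request by hashing that key with a fixed GUID defined in RFC 6455. A sketch of the server’s side of that computation, using only the standard library:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header value from the client's
    Sec-WebSocket-Key: SHA-1 over key + GUID, then base64."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The example key/accept pair given in RFC 6455 itself:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the server replies with `101 Switching Protocols` and this accept key, the HTTP exchange is over and both sides speak the WebSocket framing protocol over the same TCP connection.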
While WebSockets present clear advantages, not all infrastructure and middleware layers support them with ease. If you are limited to the HTTP/HTTPS protocol, have a look at Server-Sent Events (SSE) as an alternative to polling.
SSE, or Server-Sent Events, use a one-way channel for streaming text-based events from a server to a client. Unlike WebSockets, which use full-duplex communication, SSE are half-duplex. For our purposes, there is no need for the backend server to periodically receive data from the client while they are connected, and furthermore the events are all text-based. SSE may therefore be another good candidate with which to replace the polling.
The client establishes a persistent HTTP connection with the server via an HTTP GET request. Then, the server can push events whenever it likes with no initiation from the client for any further requests.
WebSockets vs Server-Sent Events
SSE may be preferable to WebSockets for a number of reasons. They operate on the same HTTP/HTTPS protocols that web applications already use, including the same techniques for proxying and authentication. This can offer benefits for both backend and frontend due to the low development overhead and ease of integration into existing HTTP-based practices. WebSockets, on the other hand, require the adoption of a new protocol that could be problematic when it comes to firewalls, authentication, and infrastructure.
However, SSE may introduce limitations on scalability. Over HTTP/1.1, browsers cap the number of concurrent connections per domain (typically at six), which also caps the number of open SSE streams, so its use cases might be limited, as in our case. Additionally, the keep-alive connection needs to be supported by both the client and server. Further, this one-way channel is not designed for binary data.
HTTP polling is a bandwidth hog in modern web architecture. With short polling, you have to estimate an interval from the application’s requirements, and no matter how accurate that estimate, the cost stays high because new connections must be opened continuously. Long polling can greatly reduce these costs by holding each connection open for a while longer, but it still relies on periodic HTTP requests.
WebSockets are designed for situations in which reacting to a change quickly is important, particularly when the client is unable to predict the change. By foregoing the need for an HTTP request/response with every message sent or received, WebSockets can operate with more efficiency and lower overhead. This is achieved by having the client send a single request, after which the server and client can push messages at will, significantly reducing latency relative to polling methods.
Server-Sent Events use one long-lived connection to stream updates from the server to the client. Similar to WebSockets, this allows headers to be passed only once, when the request is made, leaving only necessary data to pass through the connection. SSE are, also like WebSockets, native to existing HTTP specs and most modern browsers.
Without a doubt, real-time applications such as the use case I’ve presented here can greatly benefit from WebSockets and SSE for real-time client-server communication. Both options provide a scalable, efficient architecture for highly interactive applications on the web.
Now that we got that out of the way, I’m going to grab a beer (bars are still closed, don’t worry).