WebSockets in a Scalable Application

At Buutti, our consultants often face challenges related to scalable architecture. Nowadays, one of the common requirements for web applications is the ability to scale flexibly to accommodate a large number of users. A very typical approach to implementing this scalability is through horizontal scaling, where identical server instances are added alongside a single server as the user base grows. In this blog post, we’ll explore, through a real-world example, how to implement user communication via WebSocket connections in such a system: what kinds of issues are encountered and how to solve them.


What are WebSockets?

WebSockets are a protocol that enables bidirectional communication between a client and a server through a single persistent connection. Unlike the traditional HTTP protocol, where the client sends a request and the server responds, in a WebSocket connection, both the client and server can send and receive data at any time without the connection being interrupted. This makes WebSockets an efficient solution for real-time data transmission, where the server needs to send information to the client without separate requests.
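
As a minimal illustration of this bidirectionality, a browser client can open a single connection and both send and receive over it at any time. The URL and message shape below are hypothetical:

    // Open a persistent, bidirectional connection (hypothetical URL).
    const socket = new WebSocket("wss://example.com/chat");

    // The server can push data at any time; no request needs to be sent first.
    socket.addEventListener("message", (event: MessageEvent) => {
      console.log("Received from server:", event.data);
    });

    // The client can likewise send data whenever the connection is open.
    socket.addEventListener("open", () => {
      socket.send(JSON.stringify({ type: "greeting", text: "Hello, server!" }));
    });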

The advantage of WebSockets over HTTP is lower latency and less per-message overhead. Because the connection is kept open continuously, a persistent connection also saves bandwidth and server resources compared to repeatedly sent individual requests.

It is also possible to achieve similar behavior to WebSockets using the HTTP protocol through a technique called long polling. In long polling, the client sends an HTTP request that the server holds open until it has data to return, after which the client immediately sends a new request. However, this approach is more resource-intensive, introduces higher latency, and uses more bandwidth than WebSockets.
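
A rough client-side sketch of long polling, assuming a hypothetical /api/messages/poll endpoint that the server holds open until it has data:

    // Repeatedly issue a request that the server keeps open until data arrives.
    async function longPoll(): Promise<void> {
      while (true) {
        try {
          const response = await fetch("/api/messages/poll"); // hypothetical endpoint
          if (response.ok) {
            console.log("New messages:", await response.json());
          }
        } catch {
          // On a network error, back off briefly before retrying.
          await new Promise((resolve) => setTimeout(resolve, 1000));
        }
      }
    }

    void longPoll();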

Comparison of HTTP and WebSocket protocols in client-server communication.


A common use case for WebSockets is implementing real-time chat functionality. In a client project we worked on, WebSockets were used precisely for this purpose. The project aimed to avoid HTTP polling, as in the case of chat, continuous HTTP requests from clients would have placed a greater load on both the server and the database compared to WebSockets. Additionally, the goal was to ensure messages were delivered between users as close to real-time as possible.

On the downside, WebSockets are a more complex protocol than HTTP, making them more complicated to implement and monitor. For example, in unstable networks the connection may drop, so the client needs reconnection logic to handle this. WebSockets also require keeping the connection open for the entire session, which consumes server resources such as memory. Since they are a newer technology than plain HTTP requests, older browsers may not fully support them. Moreover, scaling WebSockets is more challenging than scaling HTTP requests in a REST architecture, as they break the stateless architecture model.
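
The reconnection logic mentioned above can be as simple as re-opening the socket with an exponential backoff. A sketch under that assumption, not the project's actual code:

    let retryDelayMs = 1000;

    function connect(): void {
      const socket = new WebSocket("wss://example.com/chat"); // hypothetical URL

      socket.addEventListener("open", () => {
        retryDelayMs = 1000; // reset the backoff after a successful connection
      });

      socket.addEventListener("close", () => {
        // Try again after a delay, doubling the wait up to a ceiling.
        setTimeout(connect, retryDelayMs);
        retryDelayMs = Math.min(retryDelayMs * 2, 30_000);
      });
    }

    connect();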


Scalable implementation

In the web application developed by Buutti, the WebSocket-based chat was implemented with the widely used Socket.IO library. The application's frontend sends chat messages to the server, which stores them in a database and notifies all clients who should see the incoming messages. Socket.IO's long polling support was used as a fallback for older browsers that do not support WebSockets.
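
In outline, the server side of such a Socket.IO chat could look roughly like the following. The event names and the saveMessage helper are illustrative assumptions, not the project's actual code:

    import { createServer } from "http";
    import { Server } from "socket.io";

    // Hypothetical persistence helper; the real application writes to a database.
    async function saveMessage(message: unknown): Promise<void> {
      /* ... store the message ... */
    }

    const httpServer = createServer();
    // Long polling is explicitly allowed as a fallback transport.
    const io = new Server(httpServer, { transports: ["websocket", "polling"] });

    io.on("connection", (socket) => {
      socket.on("chat:message", async (message) => {
        await saveMessage(message);       // persist the message first
        io.emit("chat:message", message); // then notify all connected clients
      });
    });

    httpServer.listen(8080);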

For a single server, the WebSocket-based chat implementation was straightforward. However, more planning was required to scale the application horizontally so that it could serve large numbers of users by flexibly adding new server instances alongside the original one. The application was built in a serverless manner on Google Cloud Run, which automatically scales the number of instances up and down with user load. Cloud Run fully manages the server infrastructure, while the application itself runs in Docker containers.

Simplified diagram of the implemented horizontally scalable application.


A key challenge with WebSockets in a scalable environment is synchronizing data between servers. When clients establish WebSocket connections to different servers, these connections operate in isolation and do not share information between servers. For instance, if a client sends a chat message to server A, that message is only broadcast via WebSocket to other clients connected to server A. Clients connected to server B will not receive any information about the message sent from the client on server A.

This issue does not occur with HTTP polling. When a message is sent, the first steps are the same as with WebSockets: the client connected to server A sends a chat message, and the server stores it in the database. The difference is that the client connected to server B polls the server periodically, so server B returns the updated chat conversation from the database on the next request.
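
With plain polling, the client connected to server B needs little more than a timer. A sketch with a hypothetical endpoint and UI function:

    // Hypothetical UI hook; the real application updates the chat view here.
    function renderConversation(conversation: unknown): void {
      console.log("Conversation:", conversation);
    }

    // Ask the server for the latest conversation at a fixed interval.
    setInterval(async () => {
      const response = await fetch("/api/chat/conversation"); // hypothetical endpoint
      renderConversation(await response.json());
    }, 5000);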

In the case of WebSockets, server B has no way of knowing that server A has updated the database. As a result, it cannot inform the clients connected to it.

Comparison of inter-server communication when using HTTP polling and WebSockets without additional synchronization.


To address the server synchronization issue, a messaging layer was integrated into the application using Redis and its Pub/Sub mechanism. Redis Pub/Sub enables the creation of message channels, allowing multiple publishers and subscribers to communicate seamlessly through these channels.

Through Pub/Sub, the application’s servers can synchronize information about incoming chat messages with each other. Each server instance in the application subscribes to a Pub/Sub channel and can also act as a publisher when a chat message is received from a client. Since Redis Pub/Sub operates with very low latency, the servers can efficiently synchronize messages with each other and inform their connected clients via WebSocket about new chat messages. While the Pub/Sub mechanism does not guarantee message persistence, this is handled by the application’s database, where incoming messages are stored.
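
A minimal sketch of this pattern with the node-redis client. The channel name, message shape, and the wiring into the Socket.IO server are assumptions rather than the project's actual code:

    import { createServer } from "http";
    import { Server } from "socket.io";
    import { createClient } from "redis";

    const httpServer = createServer();
    const io = new Server(httpServer);

    const publisher = createClient({ url: "redis://localhost:6379" });
    const subscriber = publisher.duplicate(); // Pub/Sub needs its own connection
    await publisher.connect(); // top-level await assumes an ES module context
    await subscriber.connect();

    // Every server instance listens on the same channel...
    await subscriber.subscribe("chat:messages", (raw) => {
      // ...and forwards messages published by any instance, including itself,
      // to its own WebSocket clients.
      io.emit("chat:message", JSON.parse(raw));
    });

    // When a client sends a message to this instance, publish it for all instances.
    io.on("connection", (socket) => {
      socket.on("chat:message", async (message) => {
        await publisher.publish("chat:messages", JSON.stringify(message));
      });
    });

    httpServer.listen(8080);

Socket.IO also ships an official Redis adapter (@socket.io/redis-adapter) that packages essentially this pattern behind the usual io.emit() call.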

Synchronization between servers and clients via the Redis Pub/Sub mechanism.


Testing the WebSocket solution

Reliably testing the WebSocket solution required more effort and thought from the development team compared to testing a purely HTTP-based application. While end-to-end (E2E) tests implemented with Cypress worked well for features based on traditional HTTP requests, verifying real-time communication proved to be a more challenging task. This was mainly because Cypress cannot handle two tabs or browser windows simultaneously. As a result, it was initially impossible to automatically verify that a chat message was received by another user after it was sent.

To resolve this issue, a separate WebSocket endpoint was implemented in the backend specifically for E2E tests. This endpoint is only active when tests are run, and messages sent to it are forwarded to clients connected to the chat. A custom Cypress command was also created, which sends a simulated chat message to this endpoint. Using this Cypress command and WebSocket endpoint in the E2E tests, the team was able to simulate user A sending messages and verify from the frontend, via Cypress, that the messages were successfully received by user B.
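
In outline, the custom command and a test using it could look like the following. The command name, endpoint, and selectors are illustrative assumptions, and the TypeScript declaration for the custom command is omitted for brevity:

    // cypress/support/commands.ts
    Cypress.Commands.add("sendSimulatedChatMessage", (text: string) => {
      cy.then(() => {
        // Connect to the test-only WebSocket endpoint and push a simulated message.
        const ws = new WebSocket("ws://localhost:8080/e2e-chat"); // hypothetical endpoint
        ws.addEventListener("open", () => {
          ws.send(JSON.stringify({ text }));
          ws.close();
        });
      });
    });

    // In a spec: simulate user A sending a message and verify user B's view.
    it("delivers chat messages in real time", () => {
      cy.visit("/chat");
      cy.sendSimulatedChatMessage("Hello from user A");
      cy.contains("Hello from user A").should("be.visible");
    });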

Testing real-time communication of a WebSocket-based chat with Cypress.


In addition, dedicated tests were implemented for the application's backend interfaces. These tests subscribe to Redis Pub/Sub channels, send chat API requests to the backend, and verify that the corresponding chat messages arrive on the Pub/Sub channel.
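
A sketch of such a backend test in a Jest-style framework; the channel name and API route are assumptions:

    import { createClient } from "redis";

    test("a chat API request is published on the Pub/Sub channel", async () => {
      const subscriber = createClient({ url: "redis://localhost:6379" });
      await subscriber.connect();

      // Resolve as soon as a message shows up on the channel.
      const received = new Promise<string>((resolve) => {
        void subscriber.subscribe("chat:messages", (raw) => resolve(raw));
      });

      // Send a chat message through the normal HTTP API (hypothetical route).
      await fetch("http://localhost:8080/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: "integration test" }),
      });

      expect(JSON.parse(await received).text).toBe("integration test");
      await subscriber.quit();
    });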

In the continued development of the application, testing will play a significant role in relation to WebSockets. The plan is to integrate tools like Artillery into the CI/CD pipeline to enable more complex load testing in the test environment.

Additionally, the goal is to create scheduled tests that simulate real users in the test environment, periodically checking whether the chat features continue to function as expected and reporting any errors. This approach helps detect and debug long-term instabilities in the application and its WebSocket communication.


What other challenges were encountered?

In addition to scalability and testing, other challenges arose with the WebSocket solution. One of them stemmed from the create-react-app project tool, which the application was initially built on. During local development, create-react-app runs the client and the server on different ports, so a proxy must be enabled to redirect client requests to the server and avoid CORS issues.

However, the proxy configuration couldn't be made to work with the WebSocket Secure (WSS) protocol. By default, the proxy setting in package.json only applies to the HTTP and HTTPS protocols, and the development server in create-react-app doesn't allow reconfiguring the underlying Webpack Dev Server proxy. The issue couldn't be resolved even with the http-proxy-middleware package, so the team resorted to impractical, manually configured WebSocket paths that were activated only when running the application locally. The problem was fully resolved only after migrating from create-react-app to Vite.
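
For comparison, Vite's development server proxies WebSocket traffic with a single flag. A sketch of the relevant part of vite.config.ts, with the path and port as assumptions:

    // vite.config.ts
    import { defineConfig } from "vite";

    export default defineConfig({
      server: {
        proxy: {
          // Forward Socket.IO traffic to the backend, including WebSocket upgrades.
          "/socket.io": {
            target: "http://localhost:8080",
            ws: true, // proxy the WebSocket protocol as well
            changeOrigin: true,
          },
        },
      },
    });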

In production, it was also observed that the chat feature always stopped working for all clients connected to a given server instance once the instance had been running for about a month and a half. A resource leak related to WebSockets or Socket.IO was suspected, though the root cause has not yet been identified. Restarting the instance resolved the issue, so the server instances were later configured to scale up and down at scheduled times according to the application's peak usage hours, which in Google Cloud Run also restarts the instances.

One ongoing challenge with WebSockets is monitoring them in the Google Cloud Run Metrics view, which does not differentiate between WebSocket and HTTP requests but counts both in the same request metrics. One consequence is that the Request latency view always displays the maximum value, i.e. the server's request timeout setting, since WebSocket connections stay open continuously.

The article was co-written by Buutti’s consultant Ali Abdollahi and CTO Miikka Salmi.