I am currently researching large-scale application optimisation and scaling, and through my research I have got to grips with the standard approach: DNS round robin to split load across load balancers, load balancers to divide traffic across web servers such as Nginx, which in turn use load-balancing techniques to spread traffic over pools of application servers, and so forth until you hit the databases.
This works well up to a point, where the load balancers themselves reach their limits. This is the point, I am told, where one typically introduces event queues and workers in place of the load balancers. The workers pick jobs from one or more queues, do the work and put the results on an outgoing queue. Other workers take those results, process them and put their output on yet another outgoing queue, and so forth until the final formatted data (JSON, HTML etc.) is sent to the client.
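To make it concrete, here is a rough sketch of the kind of worker pipeline I mean, using Python's in-process queue.Queue purely for illustration (in reality these would be separate services talking to a message broker):

```python
import json
import queue
import threading

incoming = queue.Queue()   # requests waiting to be processed
outgoing = queue.Queue()   # processed results waiting to be formatted/sent

def processing_worker():
    # Pull a job, do the heavy work, push the result downstream.
    while True:
        job = incoming.get()
        result = {"user_id": job["user_id"], "total": sum(job["values"])}
        outgoing.put(result)
        incoming.task_done()

def formatting_worker():
    # Pull a result and turn it into the final payload (JSON here).
    while True:
        result = outgoing.get()
        payload = json.dumps({"total": result["total"]})
        print("ready to send to client:", payload)
        outgoing.task_done()

threading.Thread(target=processing_worker, daemon=True).start()
threading.Thread(target=formatting_worker, daemon=True).start()

incoming.put({"user_id": 42, "values": [1, 2, 3]})
incoming.join()   # wait until the job has been picked up and processed
outgoing.join()   # wait until the result has been formatted
```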
This is where I am a bit confused at the moment. When one goes to these lengths to have a super scalable, non-blocking architecture all the way from front to back and to the front again, how does one take the final step and actually get the data to the user? The user connects to, say, server A, while the worker on server B might be the one finishing the data processing because it had spare capacity before server A did. How can server B get the response to the client without waiting for server A to have spare capacity?
I hope this makes sense. If not, feel free to ask follow ups.
Thanks!
Not sure what you mean by putting queues and workers "in place of the load balancers", though.
Usually, you would still have several web servers behind a load balancer to handle high volume, and when a user navigates through a site the requests can be sent to any of the web servers, unless you specifically configure the load balancer to pin each user session to the same server, which is not the best option from a scalability perspective. So let's consider an example:
You have two web servers: Web1 and Web2
Two Worker services: Worker1 and Worker2
Two queues: Q1 for incoming requests and Q2 for the results.
When a user hits a button to save some changes, the request is routed to, let's say, Web1. Web1 puts the request on Q1 and sends an OK status back to the client. Then either of the workers pulls the request from the queue, processes it and puts the result on Q2. Then one of the web servers, let's say Web2, pulls the result from Q2 and sends it to the client.
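Just to illustrate the flow (not a production implementation), here is a rough Python sketch, assuming the queues are plain in-process queue.Queue objects standing in for a real message broker:

```python
import json
import queue

q1 = queue.Queue()  # incoming requests
q2 = queue.Queue()  # processed results

def web1_handle_save(user_id, changes):
    # Web1: enqueue the request and immediately acknowledge the client.
    q1.put({"user_id": user_id, "changes": changes})
    return {"status": "accepted"}          # quick OK back to the client

def worker_process_one():
    # Worker1/Worker2: pull a request, do the work, publish the result.
    request = q1.get()
    result = {"user_id": request["user_id"], "saved": len(request["changes"])}
    q2.put(result)

def web2_deliver_result():
    # Web2 (or Web1): pull the finished result and send it to the client,
    # e.g. as the response to a polling request or over a WebSocket.
    result = q2.get()
    return json.dumps(result)

print(web1_handle_save(42, {"name": "Alice"}))  # {'status': 'accepted'}
worker_process_one()
print(web2_deliver_result())                    # {"user_id": 42, "saved": 1}
```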
I think it's obvious that both web servers have access to the result queue and can send the result back to the client. As far as I understand, what you would like to know is how that last step is achieved, i.e. how the result is sent to the client. There are a number of ways to do it, for example:

- Short polling
- Long polling
- WebSockets
- Server-Sent Events (SSE)
I'm not going to describe each option here as that's beyond the scope of the question, but you can google them to learn more. However, just to make things concrete, let's consider the same example from the client-side perspective using long polling.
When a user hits the button, a progress spinner is displayed and a request is sent to the backend. A quick response comes back saying the request has been queued successfully. Then we issue a polling request, which can be sent to any of the web servers; the spinner stays visible because we are still waiting for the result. As soon as the request has been processed on the backend, the polling request returns with the result, and we can remove the spinner and probably display a message.
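The client-side loop could look roughly like this in Python, using the requests library; the /save and /result/<job_id> endpoints and the response shape are made up for the sake of the example, and with long polling the server would hold the /result request open until the job finishes:

```python
import time
import requests

BASE_URL = "https://api.example.com"   # hypothetical backend

def save_changes(changes):
    # Submit the change; the backend only queues it and returns a job id.
    response = requests.post(f"{BASE_URL}/save", json=changes, timeout=5)
    return response.json()["job_id"]

def wait_for_result(job_id, poll_interval=1.0, max_attempts=30):
    # Poll until the backend reports the job is done (spinner stays visible).
    for _ in range(max_attempts):
        response = requests.get(f"{BASE_URL}/result/{job_id}", timeout=30)
        body = response.json()
        if body["status"] == "done":
            return body["result"]          # remove the spinner, show a message
        time.sleep(poll_interval)          # not done yet, keep waiting
    raise TimeoutError("job did not finish in time")

job_id = save_changes({"name": "Alice"})
print(wait_for_result(job_id))
```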
Please note, the scenario described is just one of many possible ones, but I hope it gives you a good picture of how things work in this case.