Most cloud providers have an easy means to scale up, or scale out, a web application. “Scaling up” means using a “gruntier” server, while “scaling out” means using more servers. Which is applicable depends on how your application is architected. However, be warned that extra resources mean extra cost! Before you even think about doing either of these things, there are a number of simple steps you can take to save money.
We recently had a client who wanted their web application to be able to handle 10,000 concurrent users. The first step was to work out what it could currently handle. We were hosting the application on an S1 Standard Microsoft Azure instance, which has 1 core and 1.75 GB of RAM.
We put a test scenario together in our load testing tool of choice, Load Impact, and fired the test up. We managed to get to about 300 concurrent users before we started getting HTTP timeouts. 300 users. Holy smoke, that's pathetic!
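If you want to try something similar, here's a minimal sketch of that kind of scenario written for k6 (the open-source successor to Load Impact). The URL and ramp-up profile below are placeholders, not the client's actual test plan.

```typescript
// A minimal k6 load-test scenario (illustrative only).
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "2m", target: 300 }, // ramp up to 300 virtual users
    { duration: "3m", target: 300 }, // hold them for the rest of the 5-minute run
  ],
};

export default function () {
  // Each virtual user repeatedly requests the home page.
  const res = http.get("https://example-client-app.azurewebsites.net/");
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1); // think time between page views
}
```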
Azure has some pretty useful tools for tracking down where the issue lay, and we soon realised it was the database. Each request to the server made a fairly complicated read across a number of database tables, and this was where the bottleneck was. Our first thought was to scale up the database.
We were running on an S0 database tier, which allows for 10 DTUs (Database Throughput Units; no one can quite explain what a DTU is, other than that it's a relative unit of performance). So we tested a number of different database tiers with differing numbers of DTUs. What we found was that performance scaled linearly with the number of DTUs.
Unfortunately, so did price. We could get to 600 concurrent users by going to 20 DTUs, but we were now paying $50/month instead of $25. Extrapolating that line out, 10,000 users would have needed well over 300 DTUs, which meant asking the client to pay about $1,500/month for a premium database tier! Clearly, not an option.
Now was the time to start looking at caching strategies in order to reduce the number of database requests. Bearing in mind that the client was on a tight budget, we needed to find a simple strategy that was easy to implement.
We could have looked at caching database results between reads, but by far the easiest option was to turn on page caching. We cached the page result for 10 seconds and re-ran the tests. The results were much better: we could now handle about 2,500 concurrent users, close to a tenfold increase. Not bad for one line of code. But still not good enough.
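The one-liner itself was framework-specific, so purely as an illustration, here is roughly what a 10-second page cache looks like if you hand-roll it as Express middleware in TypeScript. The route handler is a stub standing in for the expensive multi-table read.

```typescript
import express from "express";

const app = express();
const CACHE_TTL_MS = 10_000; // cache each rendered page for 10 seconds
const cache = new Map<string, { body: string; expires: number }>();

app.get("/", (req, res) => {
  const hit = cache.get(req.originalUrl);
  if (hit && hit.expires > Date.now()) {
    res.type("html").send(hit.body); // cache hit: no database work at all
    return;
  }
  // Cache miss: do the expensive work (here a stub stands in for the
  // complicated multi-table read), then store the rendered page.
  const body = `<html><body>Rendered at ${new Date().toISOString()}</body></html>`;
  cache.set(req.originalUrl, { body, expires: Date.now() + CACHE_TTL_MS });
  res.type("html").send(body);
});

app.listen(3000);
```

With a 10-second window, even thousands of concurrent users generate roughly one database read per page every 10 seconds, which is why such a small change made such a large difference.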
The issue now was that the server was maxing out on CPU. Each simulated user was requesting 7 resources from the server.
These consisted of images, CSS, JavaScript, HTML and so on. While each file was seemingly small, it quickly adds up. Over the course of a 5-minute test, up to 80 GB of data was being downloaded from the server (our tests assumed that each user was unique and that none of the files were cached in the browser).
We scaled the server up to an S2 tier, which doubled the RAM and the cores. Predictably, we got double the throughput, but as you may have guessed, at double the price. Once again, achieving the scale we wanted was going to mean a silly cost for the client. We had to reduce the number of requests to the server.
The easiest way to reduce server traffic in this instance was to make use of a Content Delivery Network (CDN). These are essentially highly optimised web servers that can be used to serve up static files. Azure makes it a doddle to set one up, and a short time later we had all our static resources being served via a CDN.
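The wiring varies by framework, but the idea is simply to make your templates reference static assets through the CDN hostname instead of the app server. A tiny illustrative helper (the CDN hostname below is a placeholder, not the client's actual endpoint):

```typescript
// Point static asset URLs at the CDN endpoint rather than the web server.
// The hostname is a placeholder for whatever endpoint Azure CDN gives you.
const CDN_BASE = process.env.CDN_BASE ?? "https://myapp.azureedge.net";

export function cdn(path: string): string {
  return new URL(path, CDN_BASE).toString();
}

// In a template: <link rel="stylesheet" href="${cdn('/css/site.css')}">
// which resolves to https://myapp.azureedge.net/css/site.css
```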
Each static file was now taking about 10-20 ms to download, compared to the 300-400 ms it was taking before. That's a phenomenal difference! Even more importantly, it wasn't taking up any of our server's CPU resources. So we held our breath and re-ran the tests. The difference was dramatic: the server could now handle 7,000 concurrent users without even breaking a sweat.
Unfortunately, 7,000 users was as high as we could go on the Load Impact pricing tier we were using, but the results were so emphatic that we felt confident the website would perform as expected on the day.
So, all in all, it was a great result. The website is running on an affordable combination of server and database tiers, so monthly costs are low, and thanks to page caching and CDN usage it handles a very high level of throughput.