IdeaCloud’s social media processing and analysis require intensive CPU time and high-frequency I/O. To keep this work from overloading the web servers, IdeaCloud must scale well and be able to distribute its workload. With scalability in mind, IdeaCloud was developed on the Microsoft Azure platform and fully leverages Azure VMs, Worker Roles, and the Queue and Table storage services.
Distributed Workload Architecture
Let’s look at the overall architecture of IdeaCloud on Azure. IdeaCloud’s front-end website runs on ASP.NET MVC 5 and is deployed on Azure VMs. Besides serving web content to users, the VMs are also responsible for preparing the workload requests. For example, when a user is viewing IdeaCloud, the website prepares separate requests to pull the Twitter, Instagram, Facebook, and Yammer feeds, and saves each of these requests to the Azure Queue.
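A minimal sketch of the front end’s side of this hand-off, assuming Python’s standard `queue.Queue` as a local stand-in for the Azure Queue (the real site is ASP.NET MVC and would use the Azure Storage client library). The `FeedRequest` fields and the `enqueue_feed_requests` helper are hypothetical; the point is that each feed becomes its own small, self-describing message.

```python
import json
import queue
from dataclasses import asdict, dataclass

# The four networks named in the post; the message layout itself is hypothetical.
NETWORKS = ["twitter", "instagram", "facebook", "yammer"]

@dataclass
class FeedRequest:
    board_id: str  # hypothetical id of the IdeaCloud the user is viewing
    network: str   # which social network the SMP should poll
    query: str     # hypothetical hashtag or account to pull

def enqueue_feed_requests(q: "queue.Queue[str]", board_id: str, query: str) -> int:
    """Put one JSON message per network on the queue and return how many were added."""
    for network in NETWORKS:
        q.put(json.dumps(asdict(FeedRequest(board_id, network, query))))
    return len(NETWORKS)

work_queue: "queue.Queue[str]" = queue.Queue()  # local stand-in for the Azure Queue
enqueue_feed_requests(work_queue, "board-42", "#azure")  # adds four messages
```

Keeping one message per feed lets different SMPs pull the four networks in parallel, and a slow network only delays its own message.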
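The Social Media Processor (SMP) is a Worker Role that reads the requests stored in the Azure Queue. Each SMP takes one message at a time, polls the corresponding social media feed, performs the analytics, and then stores the results in Azure Tables. When an SMP retrieves a message, Azure Queue hides it from the other SMPs for a visibility timeout, so two SMPs normally do not work on the same request at the same time. The SMP deletes the message once it has finished; if it crashes before doing so, the message becomes visible again after the timeout and another SMP picks it up. Because a request can therefore occasionally be processed more than once, the processing should be safe to repeat.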
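The retrieval behaviour can be illustrated with a toy in-memory queue. `InMemoryVisibilityQueue` and `smp_step` are hypothetical names, not the Azure SDK; the receipt check is loosely modeled on the pop receipt that Azure Queue returns with each retrieved message and requires for deletion.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

@dataclass
class _Entry:
    body: str
    invisible_until: float = 0.0   # message is hidden from get() before this time
    receipt: Optional[int] = None  # receipt handed out by the most recent get()

class InMemoryVisibilityQueue:
    """Toy queue with a visibility timeout; a model, not the Azure SDK."""

    def __init__(self) -> None:
        self._entries: Dict[int, _Entry] = {}
        self._ids = itertools.count()
        self._receipts = itertools.count()

    def put(self, body: str) -> None:
        self._entries[next(self._ids)] = _Entry(body)

    def get(self, now: float, visibility_timeout: float) -> Optional[Tuple[int, int, str]]:
        """Return (msg_id, receipt, body) for the earliest-enqueued visible message,
        or None. The message stays hidden until now + visibility_timeout."""
        for msg_id, entry in self._entries.items():
            if entry.invisible_until <= now:
                entry.invisible_until = now + visibility_timeout
                entry.receipt = next(self._receipts)
                return msg_id, entry.receipt, entry.body
        return None

    def delete(self, msg_id: int, receipt: int) -> bool:
        """Delete only if the caller holds the latest receipt for this message."""
        entry = self._entries.get(msg_id)
        if entry is None or entry.receipt != receipt:
            return False
        del self._entries[msg_id]
        return True

    def __len__(self) -> int:
        return len(self._entries)

def smp_step(q: InMemoryVisibilityQueue, now: float,
             handler: Callable[[str], None], visibility_timeout: float = 30.0) -> bool:
    """One SMP iteration: take a message, process it, delete it only after success.
    If handler raises, the message is not deleted and reappears after the timeout."""
    got = q.get(now, visibility_timeout)
    if got is None:
        return False
    msg_id, receipt, body = got
    handler(body)  # in a real SMP: poll the feed, run analytics, write to Azure Tables
    return q.delete(msg_id, receipt)
```

In this model, a message retrieved at `now=0` with a 30-second timeout is invisible to other SMPs until `now=30`. If its SMP never deletes it, the next `get` after that returns it again with a fresh receipt, and the crashed SMP’s stale receipt can no longer delete it.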
Finally, the website on the Azure VMs reads the data stored in Azure Tables and builds the HTML5 animated social feed for users.
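A toy model of the results store, assuming entities keyed by `PartitionKey` and `RowKey` as in Azure Table storage. The `InMemoryTable` class, the key scheme (board id as partition, network plus post id as row) and the `Sentiment` field are hypothetical. Writing with insert-or-replace semantics means a request that is processed twice overwrites its earlier rows instead of duplicating them, and the front end can read one board’s feed with a single partition query.

```python
from typing import Dict, List, Tuple

class InMemoryTable:
    """Toy stand-in for an Azure Table: entities keyed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def upsert(self, entity: dict) -> None:
        # Insert-or-replace: writing the same keys twice leaves a single entity,
        # so an SMP that re-processes a redelivered message does not duplicate posts.
        key = (entity["PartitionKey"], entity["RowKey"])
        self._rows[key] = dict(entity)

    def query_partition(self, partition_key: str) -> List[dict]:
        # Return every entity in one partition, ordered by RowKey.
        rows = [e for (pk, _), e in self._rows.items() if pk == partition_key]
        return sorted(rows, key=lambda e: e["RowKey"])

table = InMemoryTable()
table.upsert({"PartitionKey": "board-42", "RowKey": "twitter_1", "Text": "hi", "Sentiment": 0.8})
table.upsert({"PartitionKey": "board-42", "RowKey": "twitter_1", "Text": "hi", "Sentiment": 0.8})
table.upsert({"PartitionKey": "board-42", "RowKey": "instagram_7", "Text": "pic", "Sentiment": 0.1})
table.upsert({"PartitionKey": "board-99", "RowKey": "yammer_3", "Text": "x", "Sentiment": 0.0})
feed = table.query_partition("board-42")  # what the website would render for board-42
```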
As a result of this architecture, the front end (Azure VMs) and the back-end SMPs (Worker Roles) can be scaled independently. Using Azure’s auto-scaling feature, the solution automatically turns VMs on and off in response to front-end traffic, and independently deploys more SMPs when the number of queued messages grows. This removes the need for an operator to control scaling by hand.
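Azure’s auto-scaling is configured rather than written as code; the function below only sketches the arithmetic of a queue-length rule like the one described above. `target_instance_count` and its parameter names are hypothetical.

```python
import math

def target_instance_count(queue_length: int, target_per_instance: int,
                          min_instances: int, max_instances: int) -> int:
    """SMP instances needed so each handles about `target_per_instance` queued
    messages, clamped to the range [min_instances, max_instances]."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(queue_length / target_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, with 100 messages per instance and bounds of 2 to 10 instances, an empty queue keeps 2 SMPs running, 450 queued messages call for 5, and 5,000 are capped at 10. The lower bound keeps a redundant pair alive even when the queue is idle.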
A very sudden spike in traffic may arrive faster than auto-scaling can react. This does not cause much of a problem, because the front end and the SMPs are decoupled by the Azure Queue. While the front end keeps adding requests, the queue depth grows; once auto-scaling adds more SMP instances, messages are read faster and the queue shrinks again.
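A discrete-time sketch of that behaviour. The arrival numbers, per-instance rate and scale-out step are made up for illustration, and the abrupt switch from `instances_before` to `instances_after` is a simplification of auto-scaling lag; `simulate_queue_depth` is a hypothetical helper.

```python
from typing import List

def simulate_queue_depth(arrivals: List[int], rate_per_instance: int,
                         instances_before: int, instances_after: int,
                         scale_out_at: int) -> List[int]:
    """Queue depth after each time step. The SMP pool switches from
    instances_before to instances_after at step `scale_out_at`."""
    depth = 0
    history = []
    for step, arrived in enumerate(arrivals):
        depth += arrived  # the front end keeps enqueuing regardless of backlog
        instances = instances_before if step < scale_out_at else instances_after
        depth -= min(depth, instances * rate_per_instance)  # SMPs drain what they can
        history.append(depth)
    return history

depths = simulate_queue_depth([100, 500, 500, 500, 500, 100, 100, 100], 100, 2, 6, 3)
```

With two SMPs handling 100 messages per step each and a spike of 500 messages per step, the queue grows to 600 before the scale-out to six SMPs at step 3, then drains: `depths` is `[0, 300, 600, 500, 400, 0, 0, 0]`. Requests are delayed during the spike, but none are dropped.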
Another benefit of this architecture is high availability. Because each VM and each SMP works independently, communicating only through Azure storage (requests through the Queue, results through Tables), we can achieve redundancy simply by deploying multiple VMs and SMPs.
The Microsoft Azure platform provides tools and services that make it easier for web applications to meet their scalability targets, and IdeaCloud is a good example of how to do so. In general, a distributed workload solution should have these characteristics:
- Increasing the number of instances increases the capacity to handle the workload linearly
- When a worker role crashes, another instance can pick up its load and continue the work
- When the workload increases, the system scales automatically
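Under the simplifying assumption that each instance processes a constant r messages per unit time, independently of the others (the symbols r, n, λ and B are introduced here for illustration, not taken from IdeaCloud measurements), the first characteristic and the drain time after a spike can be written as:

```latex
T(n) = n\,r
\qquad
t_{\text{drain}} = \frac{B}{n\,r - \lambda}, \quad \text{valid only when } n\,r > \lambda
```

Here T(n) is the throughput with n instances, B the queued backlog and λ the arrival rate. For example, draining B = 1200 messages with r = 100 messages per minute while λ = 300 messages per minute keep arriving requires more than λ/r = 3 instances; with n = 6 it takes 1200 / (600 − 300) = 4 minutes.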