A large number of websites that became popular and now attract thousands of users began as a simple LAMP architecture: Linux, Apache, MySQL, and PHP. Facebook was no different. Its rapid growth demanded faster and more effective data access than direct access to the database (which, in their case, was also MySQL). And memcache was the answer for that scenario.
How memcache works
Memcache is software that stores key-value pairs in memory. It offers a number of advantages for architectures that use it:
- Memory access is faster than disk access, so it provides a better experience for users;
- It serves as a “shield” for database servers, which receive fewer requests, since database access is reduced to only the less frequently requested data;
- Since the databases are accessed less often, their servers can be less powerful and have a longer lifetime. These factors combine to lower costs for this kind of hardware, which is generally more robust and therefore more expensive than standard servers.
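The usual way to get these benefits is the cache-aside pattern: the application tries the cache first and only queries the database on a miss. A minimal sketch in Python, using a plain dict as a stand-in for a real memcache client and a hypothetical `fetch_user_from_db` function to represent the expensive database query:

```python
import time

# Stand-in for a memcache client; in production this would be a real
# client (e.g. pymemcache) talking to a memcached server over TCP.
cache = {}

def fetch_user_from_db(user_id):
    # Hypothetical, expensive database query (names are illustrative).
    time.sleep(0.01)  # simulate disk/network latency
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside lookup: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:          # cache miss: hit the database once...
        value = fetch_user_from_db(user_id)
        cache[key] = value     # ...and populate the cache for next time
    return value
```

After the first call for a given user, subsequent lookups are served from memory and never reach the database.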
A consequence of using caching technology is that, as your infrastructure grows, caching changes from an optimization into a necessity. And, like any part of an architecture that is vital to the operation of your system, the cleaner and more scalable it is, the better. So how do you grow your cache when one computer is no longer enough to hold all the data you need?
Engineers often create more complexity than necessary by implementing the logic that decides where to send each request in the web application layer. At least in the beginning, it is always like that. However, simplicity is needed to ensure scalability and reliability at the system level.
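The kind of routing logic that accumulates in the web tier often looks like the sketch below: hash the key and pick a server modulo the server count (server names are illustrative). It works, but every service has to carry a copy of it, and adding or removing a server remaps most keys:

```python
import hashlib

# Hypothetical list of memcache servers known to the application.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]

def pick_server(key):
    """Naive application-level sharding: hash the key and take it
    modulo the number of servers. Deterministic, but brittle: when
    SERVERS changes, almost every key maps to a different server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Moving this decision out of the application and into a dedicated router is exactly the simplification discussed next.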
How mcrouter works
At Facebook, the answer to this challenge came in the form of an engineering solution I have been working with every day: mcrouter. Mcrouter is a memcache protocol router, used at Facebook to route all traffic to and from our cache servers. At peak, mcrouter routes just under 5 billion requests per second. On September 15, it was released as free software and is available on GitHub under a BSD license.
Basically, mcrouter receives all memcache requests and redirects them based on routing rules you define. These rules range from connection pooling – to reuse connections with memcache servers – to different schemas for distributing data among the members of your memcache cluster.
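Mcrouter is configured with a JSON file that declares pools of servers and a route that decides how requests reach them. A minimal sketch (the addresses are placeholders; `PoolRoute` distributes keys among the pool's servers, by default using consistent hashing):

```json
{
  "pools": {
    "A": {
      "servers": ["192.168.1.1:11211", "192.168.1.2:11211"]
    }
  },
  "route": "PoolRoute|A"
}
```

The application then talks plain memcache protocol to mcrouter, which takes care of choosing the destination server for every key.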
Using mcrouter, you can create replicated pools – a response to the challenge of making the caching layer highly available – by having mcrouter write data to different replicas of a pool in parallel. It can also route based on the prefix of stored keys, and thus dedicate selected servers to similar workloads, combining requests with the same load pattern or data type.
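A replicated pool can be sketched with the `AllSyncRoute` route handle, which sends each request to all of its children (again, pool names and addresses are placeholders; real deployments typically also split reads from writes, which this minimal example omits):

```json
{
  "pools": {
    "east": { "servers": ["10.0.0.1:11211"] },
    "west": { "servers": ["10.0.1.1:11211"] }
  },
  "route": {
    "type": "AllSyncRoute",
    "children": ["PoolRoute|east", "PoolRoute|west"]
  }
}
```

With this in place, a set lands on both replicas, so either one can serve subsequent gets.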
Mcrouter also monitors the target servers and marks those that are not “healthy” as unavailable. When this happens, it automatically fails over to another target. Once mcrouter's internal monitoring reports that the original destination is working again, requests are directed back to it. Other important features include support for IPv6 (Internet Protocol version 6) and SSL (Secure Sockets Layer). For those interested in caching technology who like to play with large-scale, highly available infrastructures, I strongly recommend trying this “little toy” and analyzing its benefits for your particular architecture.
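The failover behavior described above corresponds to the `FailoverRoute` route handle: requests go to the first child, and on error mcrouter tries the next one in order (a sketch with placeholder pools):

```json
{
  "pools": {
    "primary": { "servers": ["10.0.0.1:11211"] },
    "backup":  { "servers": ["10.0.0.2:11211"] }
  },
  "route": {
    "type": "FailoverRoute",
    "children": ["PoolRoute|primary", "PoolRoute|backup"]
  }
}
```

Combined with the health monitoring, this means an unhealthy primary is skipped automatically and picked up again once it recovers.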
And every time you access Facebook, rest assured that each cache request is going through that code path.
More information is available at https://code.facebook.com/posts/296442737213493/introducing-mcrouter-a-memcached-protocol-router-for-scaling-memcached-deployments/