Wednesday, May 1, 2013

InfoQ: Scaling Pinterest

Some things I learnt from the InfoQ video about scaling Pinterest.

1. Clustering is complicated
This changed my whole point of view on high availability. The talk argues that it is bad for your application to depend on a complicated algorithm that synchronises nodes and keeps every node in a correct state, because when anything goes wrong you are in a lot of trouble. That is a problem with a lot of clustering solutions: they are complicated to set up, and when something breaks you have a hard time figuring out what went wrong. Most of the time I just pray and reboot the servers, hoping that everything comes back.

Compare that to sharding, where you know exactly how your data is stored and partitioned (you wrote the algorithm). The catch is that your application has to be smarter: it needs to know where to find the data and how to perform joins across shards, and you need a robust plan for data migration. Plan for future capacity when you design the shards so that you don't have to migrate data too often.
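To make the capacity-planning point concrete, here is a minimal sketch of one common approach (not necessarily Pinterest's): create many more logical shards than you have database servers, and keep a small table mapping logical shards to physical hosts. The shard count, the host names and the MD5-based hash are all hypothetical choices for illustration.

```python
import hashlib

# Hypothetical: a generous, fixed number of logical shards chosen up front,
# so capacity can grow without re-partitioning the keys.
NUM_LOGICAL_SHARDS = 4096

# Hypothetical mapping of logical shard ranges to physical database hosts.
SHARD_MAP = [
    (range(0, 2048), "db1.example.internal"),
    (range(2048, 4096), "db2.example.internal"),
]


def logical_shard(user_id: int) -> int:
    """Map a user ID to a logical shard using a stable hash."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def host_for_shard(shard: int) -> str:
    """Look up which physical host currently holds a logical shard."""
    for shard_range, host in SHARD_MAP:
        if shard in shard_range:
            return host
    raise KeyError(f"no host configured for shard {shard}")


def host_for_user(user_id: int) -> str:
    """Route a query for this user to the right database host."""
    return host_for_shard(logical_shard(user_id))
```

With this layout, adding a server means moving a range of logical shards to it and updating `SHARD_MAP`. No key ever changes its logical shard, so migrations move whole shards instead of re-hashing individual rows.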

The video shows how they designed their sharding algorithm and the considerations that went into it.
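One way an application can find data without asking a central lookup service is to embed the shard number in every object ID. The sketch below assumes a hypothetical 64-bit layout (16 bits of shard ID, 10 bits of object type, 36 bits of local ID); the field widths are assumptions for illustration only.

```python
# Hypothetical bit layout (an assumption, not a documented scheme):
#   bits 46..61: shard ID  (16 bits)
#   bits 36..45: type ID   (10 bits)
#   bits  0..35: local ID  (36 bits)
# 62 bits in total, so the ID stays positive in a signed 64-bit column.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36


def make_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Pack shard, type and local ID into a single integer ID."""
    for value, bits, name in ((shard_id, SHARD_BITS, "shard_id"),
                              (type_id, TYPE_BITS, "type_id"),
                              (local_id, LOCAL_BITS, "local_id")):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(obj_id: int) -> tuple[int, int, int]:
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = obj_id & ((1 << LOCAL_BITS) - 1)
    type_id = (obj_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (obj_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id
```

Because the shard is recoverable from the ID alone, `split_id(obj_id)[0]` tells the application which database to query.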

2. Keep things simple in the technology stack
This has got to be the rule that everyone knows but that is damn hard to follow. The Pinterest guys started off with a complicated stack but eventually settled on technology known for its maturity, stability and simplicity: MySQL, Redis and Memcached. All of these are well-known systems that are simple to understand.

3. Cache, Cache, Cache
For any large system, caching your reads is important for performance. (But you already know that :))
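The usual way to cache reads is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry. In this minimal sketch an in-process dictionary stands in for Memcached or Redis, and `load_from_db` is a hypothetical callback for whatever query hits MySQL.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside reads: try the cache, load from the database on a miss."""

    def __init__(self, load_from_db: Callable[[Hashable], Any],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db      # hypothetical database loader
        self._ttl = ttl_seconds        # how long an entry stays valid
        self._clock = clock            # injectable for testing
        # Stand-in for Memcached/Redis: key -> (value, expires_at)
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]            # cache hit, database untouched
        value = self._load(key)        # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key, e.g. right after writing it to the database."""
        self._entries.pop(key, None)
```

Call `invalidate` after writing to the database so the next read fetches fresh data instead of serving a stale entry until the TTL expires.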