
The Ultimate Guide to Managing Millions of Key Value Pairs in Redis

Rahul Srivastava


Redis, as we all know, is an in-memory data store that is widely used as a cache, message broker, and database. One of its strengths is its ability to handle millions of key-value pairs in a performant and scalable way. When it comes to managing large amounts of data, Redis provides several features that make it an ideal solution. In this blog post, we will explore one such feature, which helps store a huge amount of data very efficiently while using little memory, something that matters because memory is the key resource in an in-memory database.

Whether you are building a web application or a real-time system that requires fast and reliable data access, Redis can be a valuable tool in your technology stack.


Problem Statement


Redis plays a key role in storing data that is accessed very frequently and requires blazing-fast access. To return results in the blink of an eye, Redis needs to keep its data in memory. Since memory is not cheap, we have to be careful in deciding what data to keep in the Redis cache. For tech giants like Instagram this problem is very real: they need to keep response times to a minimum, but storing huge amounts of data in memory quickly becomes challenging.


Redis Hashes


The most common way to use Redis is to store data as plain key-value pairs, but while prototyping, the developers at Instagram found that storing one million media_id to user_id pairs as individual keys needed around 70 MB of memory. Extrapolating to their actual use case of 300 million keys, that came to around 21 GB of memory, which was beyond their threshold.

With the help of Pieter Noordhuis, one of Redis's core developers, they started exploring Redis hashes. Hashes in Redis are dictionaries that can be encoded in memory very efficiently; the Redis setting 'hash-zipmap-max-entries' configures the maximum number of entries a hash can have while still using this compact encoding. After a few tweaks and tests, 1000 turned out to give the best results, as anything above that caused CPU spikes.
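As a quick illustration of how this threshold is tuned, here is a minimal sketch assuming the redis-py client and a local Redis server. Note that hash-zipmap-max-entries is the name from the Redis version Instagram used at the time; later releases call the equivalent setting hash-max-ziplist-entries (hash-max-listpack-entries from Redis 7).

import redis

r = redis.Redis(host="localhost", port=6379)

# On recent Redis versions the compact-hash threshold is hash-max-ziplist-entries
# (kept as an alias of hash-max-listpack-entries in Redis 7); older versions used
# hash-zipmap-max-entries, the setting mentioned in the original Instagram post.
r.config_set("hash-max-ziplist-entries", 1000)
print(r.config_get("hash-max-ziplist-entries"))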


Saving data in Hash Buckets


The entire collection of media IDs was distributed into buckets of 1000 (bucket = media_id / 1000). Within a bucket, media_id served as the field and user_id as its value. For example, a media_id of 1153158 would fall into bucket 1153158 / 1000 = 1153.


HSET "mediabucket:1153" "1153158" "939"
HGET "mediabucket:1153" "1153158" 
> "939"

But what did they get for going through all this trouble? Remember that the ultimate goal was to store the data in the most optimised way possible. Using Redis hashes, Instagram was able to store 1 million keys in a mere 16 MB of memory and 300 million keys in under 5 GB, which means a dataset of that size fits on an economical m5.large AWS ElastiCache node. And the cherry on top: lookups are still O(1), keeping access blazing fast.
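If you want to sanity-check numbers like these on your own data, the server's memory statistics are easy to read back. The snippet below is a rough sketch using redis-py; actual savings will vary with your key sizes and Redis version.

import redis

r = redis.Redis(host="localhost", port=6379)

# Overall memory footprint of the instance (compare before and after loading the hashes).
print(r.info("memory")["used_memory_human"])

# Per-key check: MEMORY USAGE reports the bytes consumed by a single bucket hash.
print(r.memory_usage("mediabucket:1153"))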


Conclusion


In conclusion, having such optimisation techniques at your disposal can help you build smooth, blazing-fast systems while keeping the budget under control. I personally find exploring such optimisation techniques quite intriguing and can't wait to use this one in some part of a system I design.

