Hashing an email (or username) to store in redis hash buckets

I am writing a node.js application that relies on redis as its main database, and user info is stored in this database.

I currently store the user data (email, password, date created, etc.) in a hash named user:(incremental uid), along with a key email:(email) whose value is that same incremental uid.

When someone logs in, the app looks up email:(email) to get the (incremental uid), then reads the user data from user:(incremental uid).
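In code, that two-step lookup is roughly the following (a sketch assuming the node-redis v4 promise client; the placeholder email and variable names are illustrative):

```js
import { createClient } from 'redis';

const client = createClient();
await client.connect();

const email = 'someone@example.com'; // illustrative placeholder

// login: email -> incremental uid -> user hash
const uid = await client.get(`email:${email}`);    // the email:(email) key
const user = await client.hGetAll(`user:${uid}`);  // the user:(incremental uid) hash
```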

This works great; however, if the number of users reaches into the millions (possible, but a somewhat distant concern), the database size will increase dramatically and I'll start running into problems.

I'm wondering how to hash an email down to an integer that I can use to sort into hash buckets like this (pseudocode):

hash([email protected]) returns 1234  
1234 % 3 or something returns 1
store { [email protected] : (his incremental uid) } in hash emailbucket:1

Then when I need to look up this uid for email [email protected], I use a similar procedure:

hash([email protected]) returns 1234  
1234 % 3 or something returns 1
lookup [email protected] in hash emailbucket:1 returns his (incremental uid)
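Put together, here is a sketch of what I have in mind (using Node's built-in crypto module to turn the email into an integer, and the node-redis v4 client; the MD5 prefix and the bucket count of 3 are just placeholders):

```js
import { createHash } from 'crypto';

const NUM_BUCKETS = 3; // placeholder from the pseudocode; a real deployment would fix this up front

// Hash the email down to an integer, then reduce it to a bucket index.
function emailBucket(email) {
  const digest = createHash('md5').update(email.toLowerCase()).digest('hex');
  const asInt = parseInt(digest.slice(0, 8), 16); // first 32 bits of the digest
  return asInt % NUM_BUCKETS;
}

// store { email: uid } in hash emailbucket:<bucket>
async function storeEmail(client, email, uid) {
  await client.hSet(`emailbucket:${emailBucket(email)}`, email, String(uid));
}

// look the uid back up for the same email
async function lookupUid(client, email) {
  return client.hGet(`emailbucket:${emailBucket(email)}`, email);
}
```

My understanding is that each emailbucket:N hash only stays memory-efficient while it remains under Redis's hash-max-ziplist-entries limit, so the bucket count would have to scale with the expected number of users.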

So, my questions in list form:

  1. Is this practical / is there a better way?
  2. How can I hash the email to a few digits?
  3. What is the best way to organize these hashes into buckets?

There are 2 answers

Tim Brown
  1. It probably won't end up mattering that much. Redis doesn't have an integer type, so you're only saving yourself a few bytes (and less each time your counter rolls over to the next digit). Doing some napkin math, at a million users the difference in actual storage would be ~50 MB. With hard drives in the < $1/GB range, it's not worth the time it would take to implement.
  2. As a thought experiment, you could maintain a key that is your current user counter, and just GET and INCR each time you add a new user.
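A minimal sketch of that counter (node-redis v4 style; the key name is illustrative). Note that INCR is atomic and returns the new value, so a separate GET isn't strictly needed:

```js
// Allocate the next incremental uid; INCR is atomic and returns the incremented value.
async function nextUserId(client) {
  return client.incr('user:next_id'); // 'user:next_id' is an illustrative counter key
}
```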
NeiL

Yes, this is a good way to store millions of key-value pairs in hashes. You need to design the bucketing algorithm yourself. For example, you can use a timestamp to create a bucket value that changes after every 1000 values. There are many other ways.

For more background, read this article: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value
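Roughly, the pattern that article describes, sketched here with illustrative key names (not the article's own code): group numeric ids into hashes of about 1000 fields each, so each bucket stays small enough for Redis's compact hash encoding.

```js
// Group numeric ids into hashes of ~1000 fields each, in the spirit of the article,
// so each bucket stays small enough for Redis's memory-efficient hash encoding.
const BUCKET_SIZE = 1000;

async function setMapping(client, id, value) {
  const bucket = Math.floor(id / BUCKET_SIZE);
  await client.hSet(`bucket:${bucket}`, String(id), String(value));
}

async function getMapping(client, id) {
  const bucket = Math.floor(id / BUCKET_SIZE);
  return client.hGet(`bucket:${bucket}`, String(id));
}
```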