Database and cache design for a user system. This blog is based on our experience and materials collected online. Written by Les_baguettes and Charle.
Design User System
- Authentication service
- Friendship service
Scenarios
- Sign up, login, request user profile, update profile. Requesting the user profile is the most frequent operation; most operations need it.
- Support 100M DAU.
- Sign up, login, and update profile are less frequent:
100M * 0.1 (once every 10 days) / 86,400 (seconds in a day) ≈ 115 QPS
- Request user profile:
100M * 100 (100 times a day) / 86,400 (seconds in a day) ≈ 115,740 QPS
Peak ≈ 100k * 3 = 300k QPS (peak traffic is typically estimated at 2–3× the average)
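The arithmetic above can be checked with a quick script (a sketch; the constants are the assumptions stated above):

```python
# Back-of-the-envelope QPS estimates for 100M DAU (illustrative).
DAU = 100_000_000
SECONDS_PER_DAY = 86_400

write_qps = DAU * 0.1 / SECONDS_PER_DAY    # sign up / login / update: once every ~10 days
read_qps = DAU * 100 / SECONDS_PER_DAY     # profile reads: ~100 per user per day
peak_read_qps = 100_000 * 3                # round average down to ~100k, then 3x for peak

print(int(write_qps), int(read_qps), peak_read_qps)
```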
Services
- Authentication service: Sign up, login
- User service: User profile CRUD
- Friendship service
Storage
- MySQL/PostgreSQL: ~1k QPS
- MongoDB/Cassandra (NoSQL): ~10k QPS
- Redis/Memcached: ~100k–1M QPS

These are rough estimates; actual throughput varies with the data model, schema design, hardware, etc.
So what do we need for these two groups of services?
- Sign up, login, update profile: relational database (MySQL/PostgreSQL) is enough.
- Request user profile: This service has many more read operations than write operations. At 300k QPS, we can use a cache (Redis/Memcached).
Cache
Cache is a concept, not a specific technology. It can be an in-memory cache (Redis/Memcached) or a distributed cache such as a CDN; it can even be a file system, and CPUs have caches too. Caching is not only for the server side; the client side caches as well (browser cache, local storage).
Cache-aside pattern
Cache and database are two different systems. They communicate via a web server. First, check the cache; if not found, then get it from the database and set it in the cache.
```python
cache.set("key", "value")
cache.get("key")
cache.set("foo", "bar", time=60)  # expires in 60 seconds
```
```python
class UserService:
    def get_user(self, user_id):
        """Cache-aside read: check the cache first, fall back to the database."""
        key = f"user:{user_id}"
        user = cache.get(key)
        if user:
            return user
        user = database.get(user_id)
        cache.set(key, user)
        return user

    # Four candidate write orderings (compared in the scenarios below):
    def set_userA(self, user_id):
        """Delete cache, then write database."""
        key = f"user:{user_id}"
        cache.delete(key)
        database.set(user_id)

    def set_userB(self, user_id):
        """Write database, then set cache."""
        key = f"user:{user_id}"
        database.set(user_id)
        cache.set(key)

    def set_userC(self, user_id):
        """Set cache, then write database."""
        key = f"user:{user_id}"
        cache.set(key)
        database.set(user_id)

    def set_userD(self, user_id):
        """Write database, then delete cache (the recommended ordering)."""
        key = f"user:{user_id}"
        database.set(user_id)
        cache.delete(key)
```
Problem:
Concurrent update and read can lead to inconsistency between cache and database.
Scenario 1
sequenceDiagram
autonumber
participant R as Thread 1 (read get_user)
participant C as Cache (Redis/Memcached)
participant D as Database
participant W as Thread 2 (write set_user)
R->>C: GET user:{user_id}
C-->>R: miss
R->>D: SELECT user WHERE id=user_id
D-->>R: user=v1 (old)
Note over R,W: Context switch / interleaving to another thread
W->>C: DEL user:{user_id}
W->>D: UPDATE user(id)=v2 (new)
D-->>W: ok
Note over R,W: Context switch / interleaving to original thread
R->>C: SET user:{user_id}=v1 (old)
R-->>R: return v1
Failure mode (stale cache written after a concurrent update):
- `get_user` reads a cache miss and fetches `v1` from the database.
- Before `get_user` can populate the cache, a concurrent `set_user` deletes the cache key and writes `v2` to the database.
- `get_user` resumes and writes its previously fetched `v1` back into the cache.
- Now the database has `v2` but the cache has `v1`, so most subsequent reads hit the cache and see stale data until the key is invalidated again or expires (TTL).
Scenario 2
sequenceDiagram
autonumber
participant W as Thread 1 (write set_user)
participant C as Cache (Redis/Memcached)
participant D as Database
participant R as Thread 2 (read get_user)
W->>C: DEL user:{user_id}
Note over W,R: Context switch / interleaving to another thread
R->>C: GET user:{user_id}
C-->>R: miss
R->>D: SELECT user WHERE id=user_id
D-->>R: user=v1 (old)
W->>D: UPDATE user(id)=v2 (new)
D-->>W: ok
Note over W,R: Context switch / interleaving to old thread
R->>C: SET user:{user_id}=v1 (old)
R-->>R: return v1
Failure mode (read repopulates cache with stale data during a write):
- `set_user` deletes the cache key first (intending to force the next read to go to the DB).
- A concurrent `get_user` lands after the delete but before the DB update; it misses the cache and reads `v1` from the database.
- The write completes and commits `v2` to the database.
- The read (still holding `v1`) writes `v1` back into the cache, recreating a stale cache entry even though the write succeeded.

These steps are not atomic, and there may be failures in between, so we may read an old user even after setting a new one.
We can get data inconsistency and dirty data. Of the two operations in a write (database write and cache invalidation), we are not worried about the first failing: it raises an exception and does not dirty the data in the database. What concerns us is the second failing, which leaves the cache and database inconsistent.
Consistency Solutions:
Can we add a lock to make these four steps atomic? The answer is no.
- The database and cache are two different systems; we cannot have a distributed transaction across them.
- A third-party distributed lock system like ZooKeeper will slow down performance and may also fail.
- The processes do not share the same memory space, so in-memory lock will not work.
Best practice:
- database.set(key, user)
- cache.delete(key)
Why set before delete
When using cache-aside, a write needs to update the database and invalidate the cache. Two common orders:
- Delete then set (invalidate → write): `cache.delete(key)` → `database.set(key, user)`
- Set then delete (write → invalidate, recommended): `database.set(key, user)` → `cache.delete(key)`
Assume:
- `t_cache` = cache op latency (GET/SET/DEL), usually sub-millisecond
- `t_db_r` = database read latency
- `t_db_w` = database write latency, typically `t_db_w >> t_cache`
- Reads are much more frequent than writes (`λ_read >> λ_write`)
Why “set then delete” is more optimal
- With delete → set, you create a cache-miss window of roughly `t_db_w` (after the delete, before the DB write commits). During this window, a read is forced to the DB and can still see the old value `v1`, then repopulate the cache with `v1`. This can "poison" the cache with stale data after the write commits.
- With set → delete, the database is already `v2` before invalidation. A cache miss will read `v2`, so you avoid stale-cache poisoning under normal conditions. The remaining inconsistency is a short stale-read window where some requests can still hit old cache data until the delete happens (≈ `t_cache`), or a rare case where the delete fails (then stale data lasts until the TTL expires).
You can estimate “how often it happens” per write by comparing windows:
- delete → set: expected affected reads ≈ `λ_read * t_db_w`
- set → delete: expected affected reads ≈ `λ_read * t_cache`

Since `t_db_w >> t_cache`, the first order exposes many more reads, and it can create longer-lived stale cache entries.
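Plugging in illustrative numbers (assumed for the sketch, not measurements) shows the two windows differ by more than an order of magnitude:

```python
# Expected number of reads exposed to staleness per write, for each ordering.
# All constants below are illustrative assumptions.
lambda_read = 100_000   # reads per second at peak
t_db_w = 10.0           # database write latency, ms
t_cache = 0.5           # cache operation latency, ms

exposed_delete_then_set = lambda_read * t_db_w / 1000   # window ≈ t_db_w
exposed_set_then_delete = lambda_read * t_cache / 1000  # window ≈ t_cache

print(exposed_delete_then_set, exposed_set_then_delete)  # 1000.0 50.0
```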
Potential inconsistency flows
A) Delete then set (invalidate → write): stale cache poisoning
sequenceDiagram
autonumber
participant W as Writer (set_user)
participant C as Cache
participant D as Database
participant R as Reader (get_user)
W->>C: DEL user:{id}
Note over W,R: During t_db_w (DB write not committed yet), in another thread
R->>C: GET user:{id}
C-->>R: miss
R->>D: SELECT user:{id}
D-->>R: v1 (old)
Note over W,R: Original thread successfully updated db
W->>D: UPDATE user:{id} = v2 (new)
D-->>W: ok
Note over W,R: Another thread already have old user data hence polluting the cache
R->>C: SET user:{id} = v1 (old)
Note over C,D: Cache now has v1 while DB has v2
In this order, cache inconsistency is common because the vulnerable window is about t_db_w (relatively large), and reads are frequent.
B) Set then delete (write → invalidate): rare inconsistency
sequenceDiagram
autonumber
participant B as Reader B (get_user)
participant C as Cache
participant D as Database
participant A as Writer A (set_user)
B->>C: GET user:{id}
C-->>B: miss
B->>D: SELECT user:{id}
D-->>B: v1 (old)
Note over A,B: Context switch / interleaving
A->>D: UPDATE user:{id} = v2 (new)
D-->>A: ok
A->>C: DEL user:{id}
Note over A,B: Reader B resumes with v1 in memory
B->>C: SET user:{id} = v1 (old)
Note over C,D: Cache now has v1 while DB has v2
Failure mode (reader overwrites cache with old data after invalidation):
- Reader B misses the cache and reads `v1` from the database.
- Writer A updates the database to `v2`.
- Writer A deletes the cache key.
- Reader B (still holding `v1`) writes `v1` into the cache, making the cache stale again.
For this to happen, Reader B’s end-to-end execution time must fully overlap and outlast Writer A’s work (DB write + cache delete). Under typical latency assumptions (t_db_w and t_cache are small compared to long-tail delays), this is usually rarer than scenario A, but when it does occur it is worse, because it recreates a stale cache entry (not just a transient stale read).
We need to set a TTL for the cache to avoid stale data. For example, TTL = 7 days; after 7 days, the cache will expire, and we will get the latest data from the database. The inconsistency will last at most 7 days. This is called eventual consistency.
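A minimal sketch of a cache with per-key TTL (plain Python standing in for Redis's `SET key value EX ttl`; the class and its names are illustrative, not a real client API):

```python
import time

class TTLCache:
    """In-memory cache where each key can carry an expiration time."""
    def __init__(self):
        self._store = {}  # key -> (value, expire_at or None)

    def set(self, key, value, ttl=None):
        expire_at = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expire_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expire_at = entry
        if expire_at is not None and time.monotonic() >= expire_at:
            del self._store[key]  # lazily drop expired entries
            return None
        return value

cache = TTLCache()
cache.set("user:1", {"name": "alice"}, ttl=0.05)  # 50 ms TTL for the demo
assert cache.get("user:1") == {"name": "alice"}
time.sleep(0.06)
assert cache.get("user:1") is None  # expired: the next read must go to the database
```

After expiry the next `get_user` misses the cache and repopulates it from the database, which is exactly how a stale entry eventually heals.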
Strong consistency vs. eventual consistency?
Cache can optimize read performance, but it cannot improve write performance because we still need to write to the database.
Cache-through pattern
All read and write operations go through the cache. The cache is the only entry point to the database. Redis is often the choice for this pattern. Cache and database will be in sync at all times.
Write-through bottleneck
Every write still hits the database synchronously, so write throughput is bounded by the database; scaling writes requires more database capacity.
Authentication service
Session
After login, we need to create a session object for the user and send the session key back to the browser as a cookie. The browser will send the cookie back to the server for each request. The server will get the session key from the cookie, look up the user_id in the session table, and check if the user_id exists and is not expired.
Session table
The server stores the session info in a session table.
| Field | Type | Description |
|---|---|---|
| session_key (Token) | string | A hash value, globally unique, unpredictable (UUID, etc) |
| user_id | Foreign Key | References the User table |
| expire_at | timestamp | Expiration time |
| device_token | string | optional, used to identify the device |
- The server will not delete the session key normally; it will just check whether it is valid or expired.
- Single-device login vs. multi-device login
- Single-device login: when a user logs in from a new device, invalidate the old session key. The server can also send a notification to the old device like what we have in WhatsApp or WeChat.
- Multi-device login: allow users to log in from multiple devices; each device has its own session key.
- What data system to use for session table?
- For a small-scale system, we can put the session table in cache; even if it disappears, users can log in again. We can issue a new session key based on user ID and password.
- For a large-scale system, we can put the session table in a relational database like MySQL/PostgreSQL and add cache.
- We can have an index on the user_id to quickly query all sessions for a user.
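The login/validate flow above can be sketched with a dict standing in for the session table (function and variable names here are illustrative assumptions):

```python
import secrets
import time

sessions = {}  # session_key -> {"user_id": ..., "expire_at": ...}

def login(user_id, ttl_seconds=3600):
    """Create a session and return the session key to set as a cookie."""
    session_key = secrets.token_urlsafe(32)  # globally unique, unpredictable
    sessions[session_key] = {"user_id": user_id,
                             "expire_at": time.time() + ttl_seconds}
    return session_key

def authenticate(session_key):
    """Validate the session key from the cookie; return user_id or None."""
    session = sessions.get(session_key)
    if session is None or session["expire_at"] < time.time():
        return None  # invalid or expired -> force re-login
    return session["user_id"]

key = login(42)
print(authenticate(key), authenticate("bogus"))  # 42 None
```

Note the server never deletes the key on expiry here; it just treats expired keys as invalid, matching the behavior described above.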
Cookie
The session_key is stored in a cookie on the browser side. A cookie is a small piece of data stored on the client (browser) and sent to the server with every request; you can treat it as a hash table stored on the client side. It is used to maintain session state and store user preferences. Because the cookie travels with every request, the larger it is, the more bandwidth it consumes and the slower each request becomes.
Friendship service
- Unidirectional friendship: follower/following model like Twitter, Instagram.
- Bidirectional friendship: like Facebook, WeChat.
- Mixed friendship: like LinkedIn; follow first, then become friends after acceptance.
unidirectional friendship
Friendship table:
| from_user_id | to_user_id | Description |
|---|---|---|
| 1 | 2 | 1 is following 2 |
bidirectional friendship
Once accepted, both sides will have an entry in the table.
strategy 1: one entry per friendship, storing the smaller user_id as from_user_id and the bigger user_id as to_user_id.
Friendship table:
| smaller_user_id | bigger_user_id | Description |
|---|---|---|
| 1 | 2 | 1 and 2 are friends |
strategy 2: two entries for each friendship. Change the non-directional friendship table to a bidirectional friendship table.
| from_user_id | to_user_id | Description |
|---|---|---|
| 1 | 2 | 1 and 2 are friends |
| 2 | 1 | 2 and 1 are friends |
The second strategy makes querying a user's friend list easier, at the cost of double the storage. We prefer the second one: storage is cheap, and queries are faster.
We need to use transactions to make sure both entries are created or none.
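A minimal sketch of the transactional two-entry insert, using an in-memory SQLite database purely as a stand-in for the real SQL store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE friendship (
    from_user_id INTEGER,
    to_user_id INTEGER,
    PRIMARY KEY (from_user_id, to_user_id)
)""")

def add_friend(a, b):
    # `with conn` wraps both inserts in one transaction:
    # commit on success, rollback if either insert fails.
    with conn:
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (a, b))
        conn.execute("INSERT INTO friendship VALUES (?, ?)", (b, a))

add_friend(1, 2)
print(conn.execute("SELECT * FROM friendship ORDER BY from_user_id").fetchall())
# [(1, 2), (2, 1)]
```

If the second insert fails, the first is rolled back too, so the table never ends up with only one direction of a friendship.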
Cassandra
Cassandra is a three-layer NoSQL database:
- row_key
- column_key
- value
So the key = row_key + column_key, and one value per key.
How to store value: we need to serialize the value object (binary tree) to a string or byte array. Common serialization methods are JSON, Protobuf, Thrift, Avro, etc.
Row key
We call it a Hash Key or Partition Key. It determines how data is distributed and on which machine it is stored. Cassandra uses consistent hashing to map a row key to a machine, so we need to choose a row key that distributes data evenly across all machines. Note that Cassandra does not support range queries on row keys. The user ID is a good choice for a row key.
Column key
`Insert(row_key, column_key, value)`
The column key is sorted, and it can be a composite key, so we can do range queries on it: `Query(row_key, start_column_key, end_column_key)`. A good choice for a column key is a timestamp, so we can do range queries over time.
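The row-key/sorted-column-key model can be sketched with a dict of sorted lists standing in for Cassandra (the `insert`/`query` helpers are illustrative, not a real driver API):

```python
import bisect

table = {}  # row_key -> sorted list of (column_key, value) pairs

def insert(row_key, column_key, value):
    # bisect.insort keeps each row's columns sorted by column_key
    bisect.insort(table.setdefault(row_key, []), (column_key, value))

def query(row_key, start, end):
    """Query(row_key, start_column_key, end_column_key): a range scan."""
    return [(ck, v) for ck, v in table.get(row_key, []) if start <= ck <= end]

insert("user1", "2023-01-15", "friend_42")
insert("user1", "2023-03-01", "friend_7")
insert("user1", "2023-01-03", "friend_99")
print(query("user1", "2023-01-01", "2023-02-01"))
# [('2023-01-03', 'friend_99'), ('2023-01-15', 'friend_42')]
```

Because columns are kept sorted per row, a timestamp range query is a contiguous scan within one partition, which is what makes it cheap.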
example 1
How to get all friends of user_id = 1 between timestamp 2023-01-01 and 2023-02-01
CREATE TABLE relationship (
user_id text,
timestamp date,
friend_user_id int,
value text,
PRIMARY KEY (user_id, timestamp, friend_user_id)
);
flowchart TB
A[relationship table]
A --> B[user_id - partition key]
A --> C[timestamp - clustering key 1]
A --> D[friend_user_id - clustering key 2]
A --> E[value]
B --> F[Partition: all rows for one user_id]
C --> G[Sorted by timestamp]
D --> H[friend_user_id within timestamp]
- partition key: user_id
- clustering key 1: timestamp
- clustering key 2: friend_user_id
- value
The column key is now a composite key of (timestamp, friend_user_id), and it is sorted by timestamp first, then friend_user_id.
But this way we cannot get all friends of user_id = 1 directly; we would have to do a range query over a very large timestamp range. So we create a second table that stores all friends of a user without the timestamp.
example 2
- Store all user data inside `UserTable`.
- Store tweets in Redis with key = tweet_id, data = tweet_data.

Data can be stored in the cache, so we just need to get the owner_id and tweet IDs from Cassandra, then get the data from the cache without going to Cassandra again.
SQL vs NoSQL
- SQL columns are predefined in the schema; NoSQL columns are dynamic.
- NoSQL is more scalable than SQL. Columns are dynamic and can be added or removed easily.
- A record in SQL is a row; a record in NoSQL is a key-value pair. In Cassandra, a record is a row_key + column_key + value.
- The column_key type should be defined in advance, but the column_key itself is dynamic.
Rules for choosing SQL vs NoSQL
- What type of database should be used for the friendship table? In most cases, both SQL and NoSQL can be used to store a friendship table.
- But if you need to support transactions, NoSQL generally cannot be used.
- SQL is easier for data model relationships and indexing (CP system); NoSQL is easier for scalability, flexible schemas, and replication (AP system).
- Normally a website will use several databases for different services. For example, the user service can use an SQL database, and the friendship service can use a NoSQL database.
Normally, for the user table, we will use an SQL database like MySQL/PostgreSQL. For the friendship table, we can use a NoSQL database like Cassandra/MongoDB, as the data structure is simple and there are only key-value requests.
Q1: Store unidirectional friendship in Cassandra
- Query follower list.
- Query following list.
- Query whether user A is following user B. List tables and table structure.
We need two tables to support these three queries.

- Table 1 (keyed by follower): from_user_id (follower) → to_user_id (followed) → value; e.g. userid_1 → userid_2 → {} means 1 follows 2.
- Table 2 (keyed by followed): to_user_id → from_user_id → value; e.g. userid_2 → userid_1 → {} means 1 follows 2.

Queries:
- Q1 (follower list): query table 2 where to = user_id.
- Q2 (following list): query table 1 where from = user_id.
- Q3 (is A following B?): look up table 1 where from = A and to = B and check whether a value exists.
Redis:
- key = user_id, value = set of friend_user_ids.
- Check whether user A is following user B with the SISMEMBER command: is user B in user A's set?

Cassandra:
- row_key = user_id, column_key = friend_user_id, value = the data we want to store for the friendship, like the follow time; it can be empty.
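The Redis set model can be sketched with plain Python sets standing in for Redis (SADD ↔ `set.add`, SISMEMBER ↔ a membership test):

```python
following = {}  # key: user_id -> value: set of followed user_ids

def follow(user_id, target_id):
    """SADD equivalent: add target to the follower's set."""
    following.setdefault(user_id, set()).add(target_id)

def is_following(user_id, target_id):
    """SISMEMBER equivalent: O(1) membership check."""
    return target_id in following.get(user_id, set())

follow(1, 2)
print(is_following(1, 2), is_following(2, 1))  # True False
```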
Q2: NoSQL to store user profile
How to store a user profile in a NoSQL database that does not support multi-indexes? How to search for a user by email, username, phone ID?
- Have one table for user profiles with user_id as row_key.
- Create three tables for email, username, and phone ID; each stores the user_id of the user profile.
flowchart LR
C[(Cassandra: user_profile)]
R1[(Redis: email_lookup)]
R2[(Redis: username_lookup)]
R3[(Redis: phone_lookup)]
R1 -->|email -> user_id| C
R2 -->|username -> user_id| C
R3 -->|phone_id -> user_id| C
Q3: Mutual Friends
List mutual friends between user A and user B and the number of mutual friends. WeChat scenario
Friend table: from_user_id → to_user_id → value; e.g. userid_1 → userid_2 → {} and userid_2 → userid_1 → {} mean 1 and 2 are friends.
Use SQL, as we need transactions to make sure both entries are created or none.
```sql
-- mutual friends of A and B, using a subquery
select to_user_id from friendship
where from_user_id = A
  and to_user_id in (select to_user_id from friendship where from_user_id = B);

-- alternative: use a CTE
with B_friends as (
  select to_user_id from friendship where from_user_id = B
)
select to_user_id from friendship
where from_user_id = A and to_user_id in (select to_user_id from B_friends);

-- SQL Server variant: materialize B's friends into a temp table first
-- select to_user_id into #B_friends from friendship where from_user_id = B;

-- alternative: use a self-join
select f1.to_user_id
from friendship f1
join friendship f2 on f1.to_user_id = f2.to_user_id
where f1.from_user_id = A and f2.from_user_id = B;
```
If we use NoSQL
- Get all friends of user A
- Get all friends of user B
- Get the intersection of two sets to get mutual friends.
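The three NoSQL steps above, sketched with in-memory sets (the sample data is assumed for illustration):

```python
friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
}

def mutual_friends(u, v):
    # set intersection of the two friend lists
    return friends.get(u, set()) & friends.get(v, set())

m = mutual_friends("A", "B")
print(sorted(m), len(m))  # ['C'] 1
```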
Q4: LinkedIn connection levels
- 1st degree: direct connection
- 2nd degree: connection of a 1st-degree connection
- 3rd degree: connection of a 2nd-degree connection
- …
Can we use BFS to find the connection level between user A and user B? Graph problem. BFS can find the shortest path between two nodes.
Precompute user → set of friends, and from it derive (user, friend) → degree. Update it every several days.
Scenario
- Find the degree between you and another user in the search bar.
- Number of users > 100m.
- Average friends per user: 1000.
- Expected DB query time < 20 queries (constant time).
We work on the fixed user list, as it is impossible to get all users’ friends.
Full BFS is not feasible because the data is too large to fit in memory: the first level has 1,000 friends, the second level 1,000 * 1,000 = 1M, and the third level 1,000 * 1,000 * 1,000 = 1B, which is impossible. Each lookup must be answered with a constant number of database queries (O(1)).
One possible solution is to take A and B, find A’s friends and then B’s friends, and then find the intersection of the two sets. If the intersection is not empty, then they are 1st-degree connections. If empty, then find A’s friends’ friends and B’s friends’ friends, and find the intersection again. Repeat until we find the connection or reach the max level.
What is 3+ degree connection? Beyond 3rd degree, we can just show “3+ degree connection” as it is not very meaningful to show the exact degree. So for the first operation, if it is not found, we can just return 3+ degree connection directly.
Method:
- Precompute and store 1st- and 2nd-degree connections for each user.
- First-degree table: user_id -> set of 1st-degree friend_user_ids.
- Second-degree table: user_id -> set of 2nd-degree friend_user_ids. These two tables can be built offline and updated every day.
- Query the first-degree and second-degree tables for A, and the first-degree table for each candidate B. A search typically returns about 10 candidates (B1, B2, …, B10); for each B_i we check whether A and B_i are 1st- or 2nd-degree connections.
- Concretely, intersect A's precomputed 1st- and 2nd-degree sets with each B_i's 1st-degree set.

We need 10 + 2 queries: 2 to fetch A's 1st- and 2nd-degree sets, and 1 per B_i for its 1st-degree set.
This method is called bi-directional BFS.
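A minimal sketch of the online check, assuming the two precomputed tables are available as in-memory sets (sample data assumed; the convention here is that a shared direct friend means a 2nd-degree connection):

```python
# Assumed precomputed tables (built offline): user_id -> set of user_ids.
first_degree = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}
second_degree = {"A": {"B"}, "B": {"A"}, "C": set()}

def degree(a, b):
    """Classify the connection degree between a and b with O(1) set lookups."""
    f_a = first_degree.get(a, set())
    f_b = first_degree.get(b, set())
    if b in f_a:
        return "1st"
    if f_a & f_b:                           # shared direct friend
        return "2nd"
    if second_degree.get(a, set()) & f_b:   # A's 2nd-degree set meets B's friends
        return "3rd"
    return "3+"

print(degree("A", "B"))  # 2nd (they share friend C)
```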
For example, if you want to find how many degrees (1, 2, 3) for A and B, you can refer to the following diagram. This diagram summarizes the approach: precompute first- and second-degree sets offline, then answer online queries by intersecting A’s precomputed sets with B’s first-degree set to decide 1st, 2nd, or 3+ degree quickly.
flowchart TB
%% Offline preprocessing
subgraph Offline[Offline Precompute - daily]
U[(Users)]
F[(Friendship edges)]
T1[(First-degree table)]
T2[(Second-degree table)]
U --> F
F --> T1
T1 --> T2
end
%% Online query
subgraph Online[Online Query - A and B]
Q[Input: A and B]
A1[Fetch A 1st-degree set]
A2[Fetch A 2nd-degree set]
B1[Fetch B 1st-degree set]
I1{A1 intersect B1?}
I2{A2 intersect B1?}
R1[Return: 1st-degree]
R2[Return: 2nd-degree]
R3[Return: 3+ degree]
Q --> A1
Q --> A2
Q --> B1
A1 --> I1
B1 --> I1
I1 -->|non-empty| R1
I1 -->|empty| I2
A2 --> I2
B1 --> I2
I2 -->|non-empty| R2
I2 -->|empty| R3
end
%% Data sources for online query
T1 --> A1
T2 --> A2
T1 --> B1