When we talk about storage in the context of system design, we refer to the methods and technologies used to store data persistently. This is a crucial aspect of any application, as it determines how data is saved, retrieved, and managed over time.
Depending on the required data throughput, we need to scale the storage horizontally or vertically. In the case of horizontal scaling, we can use distributed databases or sharding techniques to spread the data across multiple nodes. Vertical scaling involves upgrading the existing storage system to handle more data or faster access speeds.
Replication
Replication is a way to ensure data availability and reliability by creating copies of the data across multiple storage systems. The most common structure for replication is the Leader-Follower paradigm.
Leader-Follower Structure
The leader node is responsible for handling write operations. If you have an application that collects data, the leader node receives that data first. The follower nodes replicate the data from the leader and handle read operations. The large volume of users who consume the collected data have their endpoints pointed at the follower databases. This setup provides load balancing and fault tolerance: if one node fails, the others can still serve requests. We can also promote a follower to become the new leader, since it holds almost all of the data. This is a very simple way to scale the read operations of your application.
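As a minimal sketch of this routing, assuming hypothetical node objects that expose `put` and `get` methods, the application-side logic might look like this:

```python
import random

class ReplicatedStore:
    """Routes writes to the leader and reads to the followers (illustrative sketch)."""

    def __init__(self, leader, followers):
        self.leader = leader        # single node that accepts writes
        self.followers = followers  # read replicas of the leader

    def write(self, key, value):
        # All writes go through the leader, which replicates to followers.
        self.leader.put(key, value)

    def read(self, key):
        # Reads are load-balanced across followers to scale read traffic.
        return random.choice(self.followers).get(key)

    def promote_new_leader(self):
        # On leader failure, promote a follower that holds (almost) all the data.
        self.leader = self.followers.pop()
```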
Asynchronous and Synchronous replication
Asynchronous replication means that the leader node does not wait for the follower nodes to acknowledge a write operation before responding to the client. This makes write operations faster but can result in temporary inconsistencies between the leader and its followers. The synchronization is done by a scheduled task that runs periodically with some delay. Classic scheduling patterns are real-time (every few seconds or minutes), incremental (daily), and full sync (weekly).
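A rough sketch of the scheduled-sync idea, assuming hypothetical `changes_since`, `apply`, and `latest_id` methods on the nodes:

```python
import threading
import time

def sync_followers(leader, followers):
    # Pull the writes each follower has not seen yet and apply them.
    for follower in followers:
        changes = leader.changes_since(follower.last_synced_id)  # hypothetical API
        follower.apply(changes)
        follower.last_synced_id = leader.latest_id()

def start_sync_loop(leader, followers, delay_seconds=60):
    # Real-time pattern: re-run the sync every few seconds or minutes;
    # a much longer delay would give the incremental or full-sync patterns.
    def loop():
        while True:
            sync_followers(leader, followers)
            time.sleep(delay_seconds)
    threading.Thread(target=loop, daemon=True).start()
```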
Synchronous replication, on the other hand, requires the leader to wait for all followers to acknowledge the write operation before responding. This ensures data consistency but can introduce latency in write operations. One way to implement it is to expose an API on the follower nodes: whenever new data is added to the leader, the leader calls that API with the record's primary key, and the followers update their data immediately. The trade-off is increased write latency.
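As an illustrative sketch of that approach (the `/replicate` endpoint, follower URLs, and `leader_db.insert` helper are made up for this example), a synchronous write might look like:

```python
import requests  # assumes followers expose a hypothetical /replicate endpoint

FOLLOWER_URLS = ["http://follower-1/replicate", "http://follower-2/replicate"]

def write_synchronously(leader_db, record):
    # 1. Write to the leader first.
    leader_db.insert(record)
    # 2. Push the new record to every follower and wait for each acknowledgment.
    #    The client only gets a response after all followers confirm, which
    #    guarantees consistency but adds latency to every write.
    for url in FOLLOWER_URLS:
        response = requests.post(url, json={"primary_key": record["id"], "data": record})
        response.raise_for_status()  # fail the write if any follower cannot apply it
```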
Leader-Leader Structure
In a leader-leader structure, multiple nodes can handle write operations simultaneously. This setup is more complex and requires conflict-resolution mechanisms to ensure data consistency. It is often used in distributed databases where high availability and scalability are critical, or where each leader is set up in a different region and serves the users of that region.
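One common (if simplistic) conflict-resolution strategy is last-write-wins. A sketch, assuming each record carries an `updated_at` timestamp:

```python
def resolve_conflict(version_a, version_b):
    # Last-write-wins: keep the version with the newer timestamp.
    # Simple but lossy; real systems may use vector clocks or
    # application-specific merge logic instead.
    if version_a["updated_at"] >= version_b["updated_at"]:
        return version_a
    return version_b
```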
Sharding
When the data volume is too large to be handled by a single database instance, sharding is used to split the data across multiple databases. Each shard contains a subset of the data, and the application logic determines which shard to access based on the data being queried. This allows for horizontal scaling and improved performance.
For example, if you have a user database, you can shard it by user ID, where users with IDs 1-1000 go to shard 1, 1001-2000 go to shard 2, and so on. This way, each shard only contains a portion of the total data, making it easier to manage and query. You can also shard by region, where users from Europe go to one shard, users from Asia go to another, and so on. This allows for better performance and reduced latency for users in different regions. Hashing is often used to determine which shard a particular piece of data belongs to, ensuring an even distribution of data across shards.
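To make the three strategies concrete, here is a small sketch of the routing functions; the shard count and region mapping are invented for the example:

```python
import hashlib

NUM_SHARDS = 4  # invented for the example

def shard_by_id_range(user_id, shard_size=1000):
    # Range sharding: IDs 1-1000 -> shard 0 (the first shard),
    # IDs 1001-2000 -> shard 1, and so on.
    return (user_id - 1) // shard_size

def shard_by_region(region):
    # Region sharding: each region maps to its own shard.
    return {"europe": 0, "asia": 1, "americas": 2}[region]

def shard_by_hash(key):
    # Hash sharding: a stable hash spreads keys evenly across shards.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```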
Designing a good sharding strategy is much like designing a good partitioning strategy, except that the latter applies within a single database instance. You need to consider the access patterns of your application, the size of the data, and the expected growth rate. A well-designed sharding strategy can significantly improve the performance and scalability of your application.
It is also important to note that after sharding your data, the join operations between the shards can be more complex and may require additional logic in your application. This is because each shard is a separate database instance, and you cannot perform joins across shards like you would in a single database instance. Therefore, it is essential to design your data model and access patterns with sharding in mind to minimize the need for cross-shard joins.
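For illustration, a cross-shard "join" done in application code might look like the sketch below, reusing the `shard_by_hash` routing from the earlier example; `fetch_orders` and `fetch_user` are hypothetical helpers on the shard objects:

```python
def get_orders_with_users(order_shard, user_shards, user_ids):
    # Fetch each side of the "join" from its own shard ...
    orders = order_shard.fetch_orders(user_ids)               # hypothetical helper
    users = {
        uid: user_shards[shard_by_hash(uid)].fetch_user(uid)  # route via hash sharding
        for uid in user_ids
    }
    # ... then combine the rows in application code instead of in SQL.
    return [dict(order, user=users[order["user_id"]]) for order in orders]
```

Each extra cross-shard lookup adds a network round trip, which is why the data model should keep rows that are queried together on the same shard whenever possible.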
CAP Theorem
CAP stands for Consistency, Availability, and Partition Tolerance. It is a fundamental principle in distributed systems: the CAP theorem states that a distributed data store can only guarantee two of these three properties at any given time. In practice, partition tolerance is a must for any distributed system, so the theorem effectively says that you must choose between consistency and availability when a network partition occurs.
Consistency
All nodes see the same data at the same time. This means that when a write operation is performed on the leader node, all follower nodes must reflect that change before it becomes visible to reads. In a consistent system, reading from any node always returns the most recent version of the data. This is crucial for applications where data integrity and accuracy are paramount, such as financial systems or inventory management.
Availability
Every request receives a response, whether success or failure; availability is often measured as the percentage of requests that receive a response. This means the system is always operational and can handle requests from clients. In an available system, even if some nodes are down or unreachable, the system can still respond to requests. This is important for applications that require high uptime and responsiveness, such as e-commerce platforms or social media sites.
For example, if we have partition tolerance and allow users to read only from the leader node (because the follower nodes are not consistent with the leader), the system is consistent and partition tolerant, but requests that go to the follower nodes no longer receive a response, so availability is sacrificed.
Partition Tolerance
The system continues to operate despite network partitions: if the network connection between one or more nodes is cut, the system as a whole can still work. For example, if we cut the connection between the leader and follower nodes, the leader can still accept write operations, and the followers can still serve read operations based on the last data they received. In this case the system has partition tolerance and availability, but consistency is not guaranteed.
Generally, in a distributed system, it is impossible to achieve all three properties simultaneously. Therefore, system designers must make trade-offs based on the specific requirements of their applications. For example, if your application requires strong consistency, you may need to sacrifice availability during network partitions. On the other hand, if your application can tolerate eventual consistency, you can prioritize availability and partition tolerance.
PACELC
The PACELC theorem extends the CAP theorem by adding a consideration for latency. It states that in the presence of a partition (P), a system must choose between availability (A) and consistency (C); else (E), even when there is no partition, the system must still trade off latency (L) against consistency (C).
Object Storage
Object storage is a data storage architecture that manages data as objects, as opposed to file systems or block storage.
Objects are stored in a flat address space. For example, on AWS S3 each object is identified by a unique key, which is used to retrieve it. The folders you see in an S3 bucket are just key prefixes, not real folders like in a file system.
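For example, with the AWS SDK for Python (the bucket and key names here are made up), you can see that a "folder" is nothing more than a key prefix:

```python
import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# "reports/2024/" looks like a folder in the console, but it is only a key prefix.
response = s3.list_objects_v2(Bucket="my-bucket", Prefix="reports/2024/")
for obj in response.get("Contents", []):
    print(obj["Key"])  # e.g. "reports/2024/january.csv" -- one flat key, no real folders

# Each object is retrieved by its full key, not by navigating directories.
data = s3.get_object(Bucket="my-bucket", Key="reports/2024/january.csv")["Body"].read()
```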
When we talk about objects in object storage, we mean BLOBs (Binary Large Objects) such as images, videos, and backup files. We can write and read objects, but we cannot update an object in place; to change one, we upload a replacement. Each object has a globally unique identifier, which is used to retrieve it from the storage system. This allows for easy access and management of large amounts of unstructured data.
BLOBs are not database-friendly: they are unstructured, and reading and writing them efficiently requires very high throughput. So reads and writes of these large files should go directly to the object storage system, with a simple request such as an HTTP request to the object storage API.
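A common pattern is to let clients upload directly to object storage via a presigned URL, so the large file never passes through your application servers or database. A sketch using boto3 (the bucket, key, and file names are invented):

```python
import boto3
import requests

s3 = boto3.client("s3")

# Generate a short-lived URL so the client can upload the BLOB directly
# to object storage, bypassing the application database entirely.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "videos/demo.mp4"},  # hypothetical names
    ExpiresIn=3600,  # URL is valid for one hour
)

# The client then performs a plain HTTP PUT against the object storage API.
with open("demo.mp4", "rb") as f:
    requests.put(upload_url, data=f).raise_for_status()
```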