Facebook uses a relational database to keep the primary data.
Facebook uses fork of MySql 5.6 to keep the social graph and facebook messenger data (more than 1B users). They use Innodb storage for the social graph (B+ tree index, fast reads and slow writes) and RocksDb storage for the messenger data (LSM tree index, fast writes and slow reads). MySQL is running on pure flash. Initially facebook messenger kept all data in HBase but was migrated to MySQL to save storage space, reduce read latency and improve support experience.
This is an example (probably outdated) of the objects that Facebook keeps in MySql:
You can find more information here. Performance numbers are impressive (as of 2014):
Query response times: 4ms reads, 5ms writes.
Rows read per second: 450M peak
Network bytes per second: 38GB peak
Queries per second: 13M peak
Rows changed per second: 3.5M peak
InnoDB disk ops per second: 5.2M peak
Facebook uses more than 1000 MySql servers to keep the data (it’s so called universal (multi tenant) database). The database is heavily sharded and replicated. Related data is kept together in a shard to reduce the number of shards they need to hit to find the data.
They also have a lot of binlog consumers. Binlog is a replication mechanism of MySql that independent of a storage type.
The MySql database is heavily cached so more than 99% requests hit the cache.
The backend looks like this:
HPHP is a frontend application, cluster and region caches are instances of memcache that use leases instead of locks to avoid race conditions.
Cluster cache keeps very hot information that can have 10 – 100 million accesses per second and these data are cached for each region. Region cache keeps less accessed data and they keep only single copy of the data.
TAO is graph service that is based on memcache server but understands the facebook graph. It allows to optimize the performance of the cache. It also has read after write consistency that means that if you update something, you must always see the result of this update (however other users may not see the most recent update).
Wormhole is publisher-subscriber system. MySql publishes binlog changes to this system so other applications can consume the data.
Other databases at Facebook.
Hadoop and Hive are used for aggregation and reporting. They stores less valuable data like user’s clicks.
LogDevice is a distributed data store for logs based on RocksDb.
HydraBase a database on top of HBase where Facebook used to keep messages (deprecated).
Beringei is an in-memory time series storage engine. Beringei stores up to 10 billion unique time series and serves 18 million queries per minute. It is used to track and store system measurements such as product stats (e.g., how many messages are being sent per minute), service stats (e.g., the rate of queries hitting the cache tier vs the MySQL tier), and system stats (e.g., CPU, memory, and network usage), so that we can see the real-time load on our infrastructure and make decisions about how to allocate resources.
Presto is an open source, high-performing, distributed query engine targeted at making SQL analytics over big data at Facebook. Facebook is using Presto to accelerate a massive batch pipeline workload in their Hive Warehouse. Presto is also used to support custom analytics workloads.
Twitter keeps primary data including interest graph, timelines, user data and tweets in their own version of MySql 5.6. Twitter stores data in hundreds of schemas and their largest cluster is thousands of nodes serving millions of queries per second. The graph service is called a flockdb.
MySql handles ~50M QPS, mainly reads and stores petabytes of data.
The database cluster is heavily sharded and Twitter uses its own sharding framework. They also use standard MySql replication.
The most of the data Twitter keeps in Manhattan and Hadoop.
Manhattan is a multi tenant, highly available low latency key value storage (replaces Cassandra that was used for the same purpose before). It is used to keep all tweets, messages and accounts at Twitter. It has three storage engines:
- seadb, read-only file format for batch processed data from hadoop
- sstable, log-structured merge tree based format for heavy write workloads
- btree, btree based format for heavy read, light write workloads
Like Cassandra, Manhattan is replicated and offers eventual consistency as the default mode of operation. Keys are mapped to shards and each shard is replicated based on a per-shard distributed log (built on top of Apache BookKeeper) and a consensus protocol. Like Cassandra, Manhattan has two-level keys, with the first key (partitioning key) identifying the shard and the second key (local key) identifying values on a shard.
Redis is used to keep your timeline in memory and fork of memcache (twemcache) to cache the data like users, tweets, etc.
Blobstore is a photo/video storage at Twitter.
Vertica is a column store that used as a backend for Tableau.
Relational databases, like MySQL and PosgreSQL are used where Twitter needs strong consistency like managing ads campaign, ads exchange as well as internal tools.
The backend looks like this:
The data distribution across all data stores:
For more details click to this link.
Google uses a lot of different databases for different products.
Google search uses Bigtable for web indexing and to keep unstructured data like web logs for Google Analytics. Google also used Megastore/MySql for many services but they were deprecated in favor of Spanner.
Spanner is a NewSQL database (means it is almost relational :)). Originally it was developed to keep data for AdWords. Since it supports ACID transactions and it’s pretty easy to develop web applications on top of it, it became very popular. Right now Spanner is available as a part of google cloud platform. Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database. Data is stored in schematized semi-relational tables; data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps. Spanner supports general-purpose transactions, and provides a SQL-based query language.
Some applications at Google use both, Spanner and Bigtable to keep different type of data.