假设我关注了 1000 多个用户。当我登录 Twitter 时,我会看到我关注的 1000 位用户的按时间顺序排列的推文列表。 Twitter 如何获取这些推文。
我知道他们可以做这样的事情:
select * from tweets where userID in (id_1,id_2,...,id_n) limit 50 order by time;
但是,“in”子句可能真的很慢。
how twitter works, does it use "in" to select tweets for followers?
和
Twitter like Model using SQL Server/Azure or Graph DB
说出相互矛盾的事情,引用的链接已经过时了。
那么 Twitter 如何高效地获取我关注的人的所有推文?
最佳答案
有一个 blog post深入了解 Twitter 的数据库基础设施:
When you tweet it's stored in an internal system called T-bird, which is built on top of Gizzard. Secondary indexes are stored in a separate system called T-flock, which is also Gizzard based. Unique IDs for each tweet are generated by Snowflake, which can be more evenly sharded across a cluster. FlockDB is used for ID to ID mapping, storing the relationships between IDs (uses Gizzard). Gizzard is Twitter's distributed data storage framework built on top of MySQL (InnoDB). InnoDB was chosen because it doesn't corrupt data. Gizzard is just a datastore. Data is fed in and you get it back out again. To get higher performance on individual nodes a lot of features like binary logs and replication are turned off. Gizzard handles sharding, replicating N copes of the data, and job scheduling. Gizzard is used as a building block for other storage systems at Twitter.
您发布的第一个链接似乎是正确的,第二个链接只是建议您可以用来实现类似 Twitter 的数据库结构的模式。
因此,为了回答您的问题,所有推文都根据发布推文的时间存储在 block 中,并赋予唯一 ID,这些 ID 已编入索引。然后,FlockDB 管理 ID 之间的关系,即决定谁关注谁以及谁看到推文的关系。
关于mysql - Twitter 如何有效地检索用户的推文提要?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22390562/