Twitter, Rails, Hammers, and 11,000 Nails per Second
There’s an interesting kerfluffle going on regarding the scaling woes that Twitter.com is going through, especially since it’s built on Ruby On Rails. Here’s the original interview with one of the Twitter coders, the somewhat evasive reply by the lead Rails architect, and Mark Pilgrim’s cruel-but-funny dissection of the latter.
Ruby is a lovely language and Rails is a lovely framework, but both of them trade performance for aesthetics and convenience. That is, they’re slow. No problem, though — in the magical world of web apps, you just solve performance problems by throwing hardware at the problem, usually in the form of more and more web servers. (If I sound sarcastic, I’m just envious because I write client-side software, where we don’t have that convenient fallback.)
Then, since 99.99% of web apps use a SQL database, the next bottleneck is the database server that all the web servers are talking to. The next step of course is to buy more database servers, but then you have to distribute/replicate the database across those servers, which is even more challenging. (Or if you’re a crazed wunderkind like LiveJournal founder Brad Fitzpatrick, you invent a memory-based distributed hashtable as a cache to put in front of the database.)
SQL as the universal hammer
This whole thing has me wondering why it is that SQL databases are used as the all-purpose hammer for solving all data storage problems for web apps. To some degree this has to be a historical accident, because I don’t think it’s necessarily obvious. My guess is that (a) a lot of companies already used these databases for storing their, uh, data; (b) in the mid-’90s all these companies were desperate to Get On The Web; so (c ) countless web developers wrote “three-tier” frameworks for gluing those SQL databases onto web servers.
I’ve been using SQL a lot (via sqlite, as a data store for the Syndication framework) and it’s quite nice. But I don’t think SQL — or any sort of query-based database system — is the right tool for every job, in particular these kinds of social-messaging apps like Twitter.
“11,000 requests per second”
The figure that stands out to me is “…up to 11,000 requests per second”. Jesus Christ, that’s a lot. Where does that come from? Even assuming a million members who each post ten times a day, that’s only about 100 posts per second. If each member runs a client app polling for updates every 15 minutes, 24 hours a day, that’s still only about 1,000 hits/sec. There’s still an order of magnitude unaccounted for. (Maybe the apps poll every 1.5 minutes?)
But think about the polling again. Each of these requests just boils down to the client asking “what messages did I get since the last time I asked?” The SQL way of answering that question is to run a complicated join that looks up all of the members I’m subscribed to, looks up all the messages of those members, and selects the ones whose timestamp is later than the date of the client’s last request. Sure, there are indexes, and databases are optimized for this stuff, but it’s still going on a thousand times a second.
Now, a more sane way to do this (if I may be so bold) is to keep a message queue for every member, and whenever someone posts a new message, copy a pointer to that message into the queue of every member who’s subscribed. Then when someone polls, you just have to get the messages out of their queue and send them along. If that sounds kind of familiar, it’s because that’s exactly how email works — the Twitter posting interface is like a listserv, and the polling interface is like a POP server.
The cool thing is that you don’t even need a database to do this. You can store all of this stuff using very simple file formats, with one file per message queue. Since this is so much simpler than triple-decker SQL joins, one server can handle more requests; and if that’s too much, you can use a networked filesystem (or even a SAN) to distribute it across servers.
And of course you could keep going with this novel “distributed” idea, and make this into an actually-distributed system, where individual users (or groups of them) can run their own Twitter servers that queue their incoming messages and relay their outgoing ones. Imagine!
“We’re all doomed,” sighed Eeyore
The trouble is, the idea of centralizing is so entrenched, and there’s so much industry support for “scalability” by means of piling on more servers and moving to bigger data-centers with fatter pipes, that I can see that this is going to go on for some time. (I’ve heard that AIM handles over a billion IMs a day, every one of which streams in and out of an unfathomably fat pipe in a huge AOL data center in Virginia. Shudder.)
In some moods (the same sorts of moods where I fantasize about how $5-per-gallon gas would kick-start desperately needed changes in our transportation system) I think about what would happen if all the DNS servers mysteriously vanished, all the data centers blew up, and we had to re-implement everything in a distributed peer-to-peer fashion. It’s actually really interesting …
May 1st, 2007 at 1:18 PM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second — Thought Palace There’s an interesting kerfluffle going on regarding the scaling woes that Twitter.com is going through, especially since it’s built on Ruby On Rails. Here’s the original interview with one of the Twitter coders, the somewhat evasive reply by the le (tags: mooseyard.com 2007 twitter ruby_on_rails blog_post) […]
May 4th, 2007 at 11:49 AM
Relational database management systems are not the only kinds of databases out there. If you need persistence, scalability and performance, there are lots of products that fit the bill … object-oriented databases. What’s the first thing Twitter had to do? Put an object cache in front of their RDBMS. Why not avoid that pain from the outset and start with an OODBMS?
May 29th, 2007 at 10:29 PM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second — Thought Palace (tags: design rails performance) […]
July 5th, 2007 at 5:53 PM
One of the reasons that SQL is popular on the web is the existence of default connectors from the application side. If the default connector was to a navigational dbms, then that is what might have become more prevalent.
At the inception of the web, SQL was NOT as well used as some would have us believe. One billion dollar company had only 1 SQL database for a minor support application, everything else was run on the AS/400 native file system, where everything including the file system is a database. So, if you sell a billion dollars worth of goods through a 300 location discount store operation, and *every* item is a line item entry taken from point of sale reporting boxes, that’s a *lot* of inserts. No way was SQL ever going to handle this volume in a 5 hour processing window.