Twitter, Rails, Hammers, and 11,000 Nails per Second
There’s an interesting kerfluffle going on regarding the scaling woes that Twitter.com is going through, especially since it’s built on Ruby On Rails. Here’s the original interview with one of the Twitter coders, the somewhat evasive reply by the lead Rails architect, and Mark Pilgrim’s cruel-but-funny dissection of the latter.
Ruby is a lovely language and Rails is a lovely framework, but both of them trade performance for aesthetics and convenience. That is, they’re slow. No problem, though — in the magical world of web apps, you just solve performance problems by throwing hardware at the problem, usually in the form of more and more web servers. (If I sound sarcastic, I’m just envious because I write client-side software, where we don’t have that convenient fallback.)
Then, since 99.99% of web apps use a SQL database, the next bottleneck is the database server that all the web servers are talking to. The next step of course is to buy more database servers, but then you have to distribute/replicate the database across those servers, which is even more challenging. (Or if you’re a crazed wunderkind like LiveJournal founder Brad Fitzpatrick, you invent a memory-based distributed hashtable as a cache to put in front of the database.)
SQL as the universal hammer
This whole thing has me wondering why it is that SQL databases are used as the all-purpose hammer for solving all data storage problems for web apps. To some degree this has to be a historical accident, because I don’t think it’s necessarily obvious. My guess is that (a) a lot of companies already used these databases for storing their, uh, data; (b) in the mid-’90s all these companies were desperate to Get On The Web; so (c ) countless web developers wrote “three-tier” frameworks for gluing those SQL databases onto web servers.
I’ve been using SQL a lot (via sqlite, as a data store for the Syndication framework) and it’s quite nice. But I don’t think SQL — or any sort of query-based database system — is the right tool for every job, in particular these kinds of social-messaging apps like Twitter.
“11,000 requests per second”
The figure that stands out to me is “…up to 11,000 requests per second”. Jesus Christ, that’s a lot. Where does that come from? Even assuming a million members who each post ten times a day, that’s only about 100 posts per second. If each member runs a client app polling for updates every 15 minutes, 24 hours a day, that’s still only about 1,000 hits/sec. There’s still an order of magnitude unaccounted for. (Maybe the apps poll every 1.5 minutes?)
But think about the polling again. Each of these requests just boils down to the client asking “what messages did I get since the last time I asked?” The SQL way of answering that question is to run a complicated join that looks up all of the members I’m subscribed to, looks up all the messages of those members, and selects the ones whose timestamp is later than the date of the client’s last request. Sure, there are indexes, and databases are optimized for this stuff, but it’s still going on a thousand times a second.
Now, a more sane way to do this (if I may be so bold) is to keep a message queue for every member, and whenever someone posts a new message, copy a pointer to that message into the queue of every member who’s subscribed. Then when someone polls, you just have to get the messages out of their queue and send them along. If that sounds kind of familiar, it’s because that’s exactly how email works — the Twitter posting interface is like a listserv, and the polling interface is like a POP server.
The cool thing is that you don’t even need a database to do this. You can store all of this stuff using very simple file formats, with one file per message queue. Since this is so much simpler than triple-decker SQL joins, one server can handle more requests; and if that’s too much, you can use a networked filesystem (or even a SAN) to distribute it across servers.
And of course you could keep going with this novel “distributed” idea, and make this into an actually-distributed system, where individual users (or groups of them) can run their own Twitter servers that queue their incoming messages and relay their outgoing ones. Imagine!
“We’re all doomed,” sighed Eeyore
The trouble is, the idea of centralizing is so entrenched, and there’s so much industry support for “scalability” by means of piling on more servers and moving to bigger data-centers with fatter pipes, that I can see that this is going to go on for some time. (I’ve heard that AIM handles over a billion IMs a day, every one of which streams in and out of an unfathomably fat pipe in a huge AOL data center in Virginia. Shudder.)
In some moods (the same sorts of moods where I fantasize about how $5-per-gallon gas would kick-start desperately needed changes in our transportation system) I think about what would happen if all the DNS servers mysteriously vanished, all the data centers blew up, and we had to re-implement everything in a distributed peer-to-peer fashion. It’s actually really interesting …
April 19th, 2007 at 9:06 AM
addy: “Notice that you have upped the storage requirements by a factor of 15 (!!), not to mention the increased IO throughput needed to support writing all that data.”
And, you destroy your credibility. You were so intent on cheerleading for RDBMS (which needs no cheerleaders, it’s already omnipresent) that you missed what the author plainly wrote. The idea was compared to email, but it was specifically stated that the queues in this case would contain POINTERS. You’re conflating number of “inserts” with storage capacity, putting them on a 1-to-1 standing, when that is simply wrong. There would not be 15 full copies of every message.
The rest of your diatribe was just as silly, but there’s a glaring example just for any impressionable readers who might think you’re right just because you’re self-assured.
April 19th, 2007 at 9:07 AM
Databases are useful for general-purpose storage, but they tend to fall down because they’re very general-purpose. After reading through the BigTable paper once I thought “what is the point?” but after reading through it a second time I can see the brilliance — they’ve kinda isolated out the important bits of a database (primary keys, a magical “store this data for me, I don’t care where” API) and left out all the rest to get something much more performant and scalable.
April 19th, 2007 at 12:06 PM
I recently posted an article on Rails vs PHP Scaling after I had reading the original twitter article about scalability problems. I know they do have quite a lot of traffic but to be frank the type of page they are rendering does not require much power. They have very simplistic templates and the AJAX requests are tiny. I think its actually an abuse of Rails as the site does not take advantage of the power of Rails.
I completely agree that it could be implemented using a filesystem, its a similar approach to large sites that move out their image serving onto a seperate faster http server (from apache) because the images are static and just get served up again and again. Twitter is one big rolling log file (as such).
April 19th, 2007 at 1:31 PM
[…] April 19th, 2007 in Links Twitter, Rails, Hammers, and 11,000 Nails per Second pokes the thinking stick at the Twitter + Rails-suckfest by doing the arithmetic. The claim is that Rails sucks based on poor performance at 11k “somethings” per second, but 11k of what? It’s a good question. […]
April 19th, 2007 at 2:18 PM
Ok, several people have already addressed this, but still:
We *used* to store data in the way you describe. Back in the ’70s. We had many ad hoc systems for storing data in text files, the file system etc. It was a bloody mess. Everyone had to solve the same problems for each application. Some wheren’t even solvable.
That’s why we evolved and invented RDBMSes. Not only a lot of thought have gone into their design, not only they are scalable, support data integrity, automatically manage constraints and concurrency etc, but they are also the only storage mechanism based on a *strict*, *mathematical* model for data and querying, namely the relational model. Relational modelling is mathematics. Everything else is ad hoc junk glued together.
Seems that those that don’t know history (i.e the dark ages of application specific storage before DBMSes), are doomed to repeat it.
Also, your take on twitter is quite narrow. Today, a simple text file or filesystem strategy might work. Can you really guess their tommorow needs? What about having the same data used by other web applications? How about adding several additional features?
Here are some further pointers:
A Call for Knowledge and Reason
DBDebunk
April 19th, 2007 at 3:08 PM
Twitter doesn’t just use mysql as a backend. They’ve also got memcached and a custom written version of map-reduce. The stuff you get on the first page, your last 24 hours of updates from friends, and the most recent updates by anybody are NOT pulled from the database. Those are stored in distributed in memory caches (object stores really).
If you want to know how twitter scales, come to the talk Blaine is giving at the Silicon Valley Ruby Conference on Sunday in San Jose.
April 19th, 2007 at 3:12 PM
@foljs - You are kidding right? RDBMS’s were designed more than a quarter of a century ago, dealing with things like low available memory — 640K ought to be enough for everyone. RDBMS’s were designed with transactional capabilities in mind - I would argue that in many of todays applications ACID semantics for transactions has no meaning. OK, so you drop one out of a 1000 messages, or (more likely) send them in the wrong order. Geez, would that cause a difference in the quality of user experience?
I dont think anyone would advocate a pure file system based approach, but there are more elegant design decisions possible using a combinary of memory based offloading and database /file system backends.
April 19th, 2007 at 3:43 PM
[…] More on Twitter, Rails, and Scaling Filed under: Dynamic Languages — ashebanow @ 2:45 pm Just a brief update to point to some more good discussion on the issues with scaling Twitter to handle its load better, as discussed in my earlier post on Friction(less) Rails. The first thing I wanted to point out is that a lot of people have called Twitter’s purported load into question. The original source for the 11,000 requests/second number was DHH’s original post as far as I can tell. That’s a heck of a lot of load, and as others have pointed out, its a nice problem to have. Jens Alfke of Apple has written a nice post describing the problem and ways to handle it that do not involve databases. Its an interesting read, and the comments are worth a look as well. […]
April 19th, 2007 at 3:51 PM
Can’t knock you for trying to challenge the established thinking, at least :)
April 19th, 2007 at 5:45 PM
I liked the post. I think it makes good sense, to at least evaluate the filesystem. Specific problems have specific solutions. Rails is a generic web-app framework. I’m not an expert in database design, but to me it seems that the defacto rails database schema is not optimised for large number of queries. Maybe the flat file or flat db (eg star schema) structure for the read-bit of the database would be the way to go?
April 19th, 2007 at 6:25 PM
First of all, the 11,000 requests per second number is a mis-quote. Facebook does 17,000 requests per second (1.5 billion page views per day). We’re not that big, yet.
As Rabble points out, we’re not just using MySQL - Jens is absolutely correct that using SQL exclusively is the wrong approach here. There are simpler/better solutions than using the filesystem, however. I’ll be talking about these and other problems at the SDForum Silicon Valley Ruby Conference this weekend.
Re: the pull versus push question, we already do push - we support Jabber and AIM, and will be adding support for additional protocols soon. I’ll be talking about using Jabber for Social Software at XTech in Paris next month with Kellan Elliott-McCrea of Flickr.
Further, regarding centralization, I’d note that we do nothing to constrain the construction of bridges between social messaging applications. Jabber is an open protocol, and we are de-facto federated with all other open Jabber providers. The sky’s the limit, as they say.
April 20th, 2007 at 12:20 AM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second — Thought Palace Didn’t hierarchical storage die off with mainframes? (tags: twitter rails hierachical mainframe storage scaling performance via:afongen) […]
April 20th, 2007 at 12:34 AM
[…] I have not got involved in the Ruby/Rails community purely down to time constraints and just keeping up with PHP is enough for me at the moment. But this has given me a bit of an insight into how Rails is developing. The case for a file-based database for Twitter continues to rage on Thought Palace […]
April 20th, 2007 at 2:51 AM
I’ll add a real world example - a team where I work decided, appropos of nothing, to ‘avoid going to the database’ from their web app, and to store data locally in files. The next point was realising that requests could come in to any web server, so the files needed to be replicated or on a shared directory.
After a few months the customer started to complain about slow-down - which was eventually tracked down to the speed of file access - unless you’re using an indexed file system, opening file X in a directory containing 10000s of files, takes longer than opening file X in a directory of 10. Ditto having one folder per user. And if you have an indexed file system, you’re a good way towards re-implementing a database.
The best bit of the story is that the customer then came up with a requirement for a summary view, so we ended up having to open 100s of files to get out 1 value per row on the summary. At which point it really made sense to move over to a database table.
I’d guess that at the very least if you’ve abstracted your object persistence, switching from file storage to D/B storage should be less of a problem. (On the other hand, most ORM systems add a significant overhead anyway).
I’d also add that your traditional enterprise RDBMS systems tend to have memory cacheing, etc, in there already, so depending on which system you use there may be no advantage to putting a cache in front.
Which isn’t to say that there aren’t more effective architectures for Twitter - I don’t know a whole lot about it - but just that there are good reasons why the D/B - as a whole - is in the position it’s in, which I believe people need to understand before they choose to go with file storage or d/b storage. Informed decision making and all that. (Interesting to note how many of Apple’s client products have started making use of a D/B too).
SQL on the other hand, is another story - SQL is where it is by historical accident (being the language used by the first succesful implementation of an RDBMS) - much to the frustration of relational theorists, who would like to see the death of ‘NULL’.
April 20th, 2007 at 10:45 AM
[…] I saw an interesting update on the Twitter story. One of the best parts though was the link they provided to this article. […]
April 20th, 2007 at 4:37 PM
I think your analysis is short sighted and devoid of any reasonable knowledge of the problem at hand. First, we don’t even know what twitter’s architecture even looks like! How can you assume that the clients are making a request every 15 minutes? Did you know that Twitterrific default is ONE minute???
The original Twitter complaint was that scaling the DB tier is difficult with Rails OOTB, not whether or not it should use a DB. I am still stunned by this suggestion for a site which is getting 11000 hits per second and even more discouraged by the number of people who think using files and filesystem in the manner described is a viable architecture option - let alone an alternative - for a Twitterlike site.
April 21st, 2007 at 12:20 AM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second — Thought Palace […]
April 21st, 2007 at 3:24 AM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second […]
April 21st, 2007 at 3:37 AM
After reading the comment from ‘Little Boy Jebu’ I had to write a followup.
April 25th, 2007 at 9:05 AM
[…] Twitter, Rails, Hammers, and 11,000 Nails per Second — Thought Palace: […]