I gave another talk about Couchbase/CouchDB at the Keeping It Realtime conference this week in Portland. This one is titled “_ch_ch_changes: CouchDB/Couchbase Notifications And Replications”, and the slides are now up on slideshare.
I had a great time. The conference itself was pretty exciting, even if some of the content was over my head (I’m not primarily a web developer, server-side isn’t how I roll, and I’ve only just started learning about node.js this week!) Plus: Portland. OMG, I love Portland.
The documentation for Rdio’s new API begins:
It’s simple to make requests to Rdio’s REST API. It’s built on widely used standards and conventions so there are libraries for most common web development platforms. All method calls are made as POST requests to http://api.rdio.com/1/. Arguments are sent as application/x-www-form-urlencoded, just like when a browser submits a form. The name of the method is passed as the ‘method’ argument. [Emphasis mine.]
What’s wrong with this? Well, the first bolded point is immediately contradicted by the ones that follow. Specifically, this cannot be a REST API, because it uses only one URL and one HTTP method. Two of the key features of HTTP-based REST are that
So in a real REST API there would be a URL representing “my friends”, and a client could GET that URL to retrieve a list of friends, POST to it to add a friend (resulting in a new URL/resource representing that friend), PUT to a friend’s URL to update details, and DELETE to that URL to remove the friend relation. Instead, the actual Rdio API has a couple of dozen ad-hoc verbs including “addFriend” and “removeFriend”. I didn’t see one to get a list of friends or get info about a friend, but there is an Rdio-specific “get” verb that might work for those things. “Get” seems nicely general, but then for some reason there are also sixteen other specific getters ranging from “getActivityStream” to “getTracksInCollection”.
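To make the contrast concrete, here is a minimal sketch of the resource-oriented shape described above, written as an in-memory toy. The `/friends` URL layout, the `FriendsResource` class and the particular status codes are illustrative assumptions on my part; none of this comes from Rdio's API.

```python
import itertools


class FriendsResource:
    """Toy in-memory model of a RESTful 'my friends' collection (hypothetical)."""

    def __init__(self):
        self._friends = {}                  # friend id -> dict of details
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Dispatch on (HTTP method, URL) and return (status, payload)."""
        parts = [p for p in path.split("/") if p]
        if parts == ["friends"]:
            if method == "GET":             # list the collection
                return 200, dict(self._friends)
            if method == "POST":            # add a friend, yielding a new URL
                fid = str(next(self._ids))
                self._friends[fid] = dict(body or {})
                return 201, "/friends/" + fid
            return 405, None                # known URL, unsupported method
        if len(parts) == 2 and parts[0] == "friends":
            fid = parts[1]
            if fid not in self._friends:
                return 404, None
            if method == "GET":             # info about one friend
                return 200, self._friends[fid]
            if method == "PUT":             # replace that friend's details
                self._friends[fid] = dict(body or {})
                return 200, self._friends[fid]
            if method == "DELETE":          # remove the friend relation
                del self._friends[fid]
                return 204, None
            return 405, None
        return 404, None                    # no such resource


api = FriendsResource()
status, friend_url = api.handle("POST", "/friends", {"name": "alice"})
```

The point of the sketch is that the four standard methods plus two URL patterns cover everything that “addFriend”, “removeFriend” and a pile of specific getters do in the RPC style.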
So it’s clear that Rdio’s “REST” API, like many other recent “REST” APIs such as DropBox’s*, isn’t REST at all. It’s more of an ad-hoc RPC scheme, which is ironic because there’s traditionally been a lot of enmity between REST proponents and those of RPC protocols like XML-RPC and SOAP. The SOAP boffins must be chortling behind their WSDLs at this.
Maybe we should just give up on the term REST, since it’s become so diluted as to mean nothing more than “HTTP API that’s not as hard to use as SOAP”?
I have backed up all the tweets from my Twitter account (@snej) to a local file, and am now mass-deleting all of them. This is a venerable form of protest that goes back to early BBSs like the WELL. Basically, I am no longer willing to donate my ‘valuable’ user-generated content to a centralized service that issues fuck-yous of this magnitude to its developers and users.
I could rant at length about the arrogance, stupidity and just plain creepiness of that message and the policies behind it, but I don’t know that it’s even worth it. Others have already done a pretty good job of deconstructing its marketroid Newspeak. I just can’t resist pointing out that two of the major components of Twitter’s content model—the @-mention and the #hashtag—were invented by early users and app developers, not by Twitter itself, then later integrated directly into the system to make them more useful. That’s a great example of collaborative development. Now, perversely, Twitter sees fit to tell app developers exactly how they can and can’t represent those same features in their UIs.
And yes, this is enforceable, because thanks to OAuth they can and will revoke an app’s access to Twitter at the flick of a switch. They brag about how they “revoke literally hundreds of API tokens / apps a week” [ibid]. I just now realized the implications of this, actually. OAuth may be more secure than traditional HTTP auth in that it doesn’t give apps access to your account password, but the centralization of control that it gives to service providers is really disturbing.
“But Jens”, you say, “you still have accounts on other centralized social networking sites such as Facebook, Tumblr, LiveJournal and flickr, many of which have also shown a similar disregard for users and developers. Why aren’t you deleting those accounts?”
Good question, anonymous readership. It comes down to three factors:
The big question in my mind is what to replace Twitter with. Ironically (and perhaps pathetically) I think I will end up reading Facebook more, because some of my Twitter friends are also there. At least until the next time Facebook does something egregiously evil.
In a larger sense, it should not be rocket science to build some plumbing that does what Twitter does—publish and subscribe small blobs—with an actually-decentralized architecture. There are a lot of smart developers out there, but to some extent we’ve been seduced into suckling at the proprietary API teats of big providers, at the expense of developing the next generation of open protocols.
Yeah, in my current day job I’m as guilty of this as anyone else. But at home I’ve got a garage full of various pieces of half-built tech that attempt to solve that problem in one form or another, if I could ever finish any of them. A lot of the trouble is motivation. Anyone want to help out?
Brent “NetNewsWire” Simmons raises the idea of an open protocol for syncing RSS/Atom subscriptions, that is, a way of keeping multiple local newsreader apps (like on a Mac and an iPhone) in sync with each other, so that they share the same set of subscribed feeds, and remember which articles have already been read. You can think of it as “IMAP for RSS”.
NetNewsWire already does this using Google Reader as an intermediary, and Apple’s PubSub framework (which is what Safari and Mail use) shares the read/unread state using MobileMe. But it would be nice to have an open protocol.
I have some experience with this, having implemented the sync system used by PubSub. It’s an interesting problem—you might think I would have just used Apple’s SyncServices, and it’s true that it would have worked great for the subscription list, but it doesn’t scale well to huge numbers of rapidly-changing “read/unread” flags.
I have two suggestions (which I would have made on Brent’s blog, except he doesn’t allow comments anymore.)
CouchDB is an awesome web-centric database engine. It doesn’t use SQL; instead, it’s a glorified key-value store whose values are arbitrary JSON objects, and which uses map-reduce for efficient querying. The basic API is pure REST, though glue libraries for many languages exist.
CouchDB natively supports syncing data through distributed groups of servers. It’s sort of like the way distributed version-control systems like Git or Mercurial work: multiple CouchDB instances each store a replica of the same data set, but can “pull” changes from each other over HTTP to stay in sync.
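As a rough illustration of what a “pull” looks like on the wire, the sketch below builds (but does not send) a request to CouchDB's `_replicate` endpoint. The host names and the `feeds` database name are made up for the example, and the exact options accepted vary between CouchDB versions, so treat it as a sketch rather than a recipe.

```python
import json
import urllib.request


def build_replication_request(local_server, source_db_url, target_db_url):
    """Build (but do not send) a POST to a CouchDB server's /_replicate."""
    payload = {"source": source_db_url, "target": target_db_url}
    return urllib.request.Request(
        url=local_server.rstrip("/") + "/_replicate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Pull changes from a hypothetical remote replica into a local database.
req = build_replication_request(
    "http://localhost:5984",
    source_db_url="http://example.com:5984/feeds",
    target_db_url="http://localhost:5984/feeds",
)
# urllib.request.urlopen(req) would actually start the replication.
```

Swapping the source and target URLs turns the same call into a “push”, which is what makes the Git comparison apt.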
CouchDB is pretty lightweight and is already being used on the desktop by client apps: GNOME has been integrating it into the Linux desktop to use as a shared store for user data like contacts and bookmarks. It plays a similar role to SyncServices on Mac OS, but it’s all open source and any two instances can sync with each other instead of requiring a proprietary server. I hear this is already shipping in the latest Ubuntu releases.
It doesn’t look as though anyone’s designed a schema for storing RSS subscriptions this way, but it would be pretty easy to define one. You then need a local agent running CouchDB (it can be stripped down to be pretty small), a client library for Cocoa apps, and an upstream CouchDB server to sync to.
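For the sake of argument, one possible shape for such a schema is sketched below. Every field name here (`type`, `feed_url`, `read_article_ids`, and the `subscription:` ID prefix) is invented for illustration; only `_id` is a real CouchDB convention.

```python
import json


def subscription_doc(feed_url, title):
    """A hypothetical CouchDB document describing one subscribed feed."""
    return {
        "_id": "subscription:" + feed_url,   # one document per feed
        "type": "subscription",
        "feed_url": feed_url,
        "title": title,
        "read_article_ids": [],              # IDs of articles already read
    }


def mark_read(doc, article_id):
    """Return a copy of the document with one more article marked as read."""
    updated = dict(doc)
    ids = list(doc["read_article_ids"])
    if article_id not in ids:                # marking twice is harmless
        ids.append(article_id)
    updated["read_article_ids"] = ids
    return updated


doc = subscription_doc("http://example.com/feed.xml", "Example Feed")
doc = mark_read(doc, "guid-1")
print(json.dumps(doc, indent=2))
```

A real design would probably keep read-state in smaller documents than this, since two clients editing the same list would produce replication conflicts; the sketch only shows that the data fits naturally into JSON.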
This protocol is similar to what I came up with for PubSub. It’s a simple extension of REST, but I haven’t heard of it being used elsewhere. The idea is that you model an append-only log file as an HTTP resource. The items that are logged are ‘events’ describing changes in the data model, in this case the subscriptions and articles.
The sync algorithm looks like this:
You can think of the log file as a queue or message stream that’s being collaboratively read and written by all of the clients. This sounds like something you’d need a fancy web-app to manage, but it turns out that all it takes is a typical HTTP 1.1 server and a trivial server-side script.
The download is a conditional GET, as used for fetching feeds themselves. The difference is that you use a “Range:” header to request only the bytes past the last known EOF. For example, if the last time you read the log it was 123456 bytes long, you add the header “Range: bytes=123456-” to the request. This ensures that you only get back the new bytes that were added to the end. (And since this is a conditional GET, if the file hasn’t changed at all you just get back an empty 304 response.)
That’s all you need to do to track changes. Since the file is append-only, the only bytes you need to read are the ones added to the end. This request efficiently sends you just those bytes.
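Here is a minimal sketch of the client side of that download. The function names and the example URL are mine, and note that HTTP's byte-range syntax is `bytes=N-`. The code builds the request and interprets the response status without touching the network.

```python
import urllib.request


def build_tail_request(url, known_length, etag=None):
    """Build a GET that asks only for bytes past the last known end of file."""
    headers = {"Range": "bytes=%d-" % known_length}
    if etag is not None:
        headers["If-None-Match"] = etag      # makes the GET conditional
    return urllib.request.Request(url, headers=headers, method="GET")


def apply_response(status, body, local_log):
    """Fold a response into the local copy of the log (a bytes object)."""
    if status == 206:          # Partial Content: body holds only the new bytes
        return local_log + body
    if status in (304, 416):   # unchanged, or nothing past our offset yet
        return local_log
    if status == 200:          # server ignored Range and sent the whole file
        return body
    raise ValueError("unexpected HTTP status %d" % status)


req = build_tail_request("http://example.com/sync.log", 123456, etag='"abc"')
```

Handling a plain 200 matters because a server is allowed to ignore the Range header, in which case the body is the entire log rather than just the tail.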
What’s cool is that this requires no server-side software. If the log is a static file, any regular HTTP server like Apache will automatically handle GET requests for it, even byte-range ones. (Ranges are already used by browsers to resume interrupted downloads.) And it sends the response at high speed, since the server’s just streaming from a file, without multiple back-and-forth requests and without expensive database queries.
How about writing? Ideally you’d use the same approach, with a byte-range PUT that specifies that the request body should go at the end of the file. Unfortunately most servers don’t support this for static files, even though it’s basically just HTTP 1.1. But it’s really easy to implement. Any PHP crufter should be able to whip up a one-page script that simply responds to a POST by reading the request body and appending it to a local file (while doing the necessary ETag and range verification.) The great thing is that this script doesn’t have to know anything at all about RSS or subscriptions or unread counts; it’s completely generic. You can upgrade the data model without having to touch the script, and you could use the same script to sync anything, not just RSS.
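The core of such a script might look something like the sketch below, written in Python rather than PHP. It assumes the client tells the server how long it believes the log is (standing in for the ETag and range verification), and it leaves out the HTTP plumbing and file locking a real deployment would need. `append_to_log` and `Conflict` are names I've made up.

```python
import os
import tempfile


class Conflict(Exception):
    """Raised when the log has grown since the client last read it."""


def append_to_log(path, body, expected_length):
    """Append body to the log only if the file is still expected_length bytes.

    Returns the new length of the log. A real script would also hold a
    file lock between the size check and the write; that is omitted here.
    """
    current = os.path.getsize(path) if os.path.exists(path) else 0
    if current != expected_length:
        raise Conflict("log is %d bytes, client expected %d"
                       % (current, expected_length))
    with open(path, "ab") as log:            # append-only, binary-safe
        log.write(body)
    return current + len(body)


# Two successive uploads against a scratch file.
path = os.path.join(tempfile.mkdtemp(), "sync.log")
length = append_to_log(path, b"event-1\n", 0)
length = append_to_log(path, b"event-2\n", length)
```

A client that gets a `Conflict` simply downloads the bytes it is missing and retries. Notice that nothing in the function knows what the bytes mean.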
(Yes, there is a semi-obvious drawback to this protocol: the file grows without limit. Surprisingly, this is not a problem most of the time, since clients only upload or download new data; the only real limit is the maximum file size or disk quota allowed by the server. But it does present a problem for a new client, whose first-time sync would download the entire file. This can be worked around by having new clients ignore very old data (only download the latest 10MB, say) or by periodically writing a compact subscription list to a separate URL.)
As everyone knows who works in the pet-food industry (or computer software for that matter), it can be hard to start eating your own dogfood. Case in point: I just this week set Chrome to be my default browser, though I’ve been working on it for four months now.
Partly that’s because when I started in July the Mac version of Chrome was too immature; and partly it’s because a web browser is something you need to have running and working all the time—especially since the Chrome project’s bug tracker and code-review tool are web-based.
But Mac Chrome is quite stable enough to use now, and as I haven’t been doing much Chrome development on this MacBook Pro lately (it takes too long to compile compared to my souped-up Mac Pro) I’ve installed the latest dev-channel build and replaced Safari with it in my Dock and as my default browser.
It’s hard to get used to a new browser, after all these years. I remember that I dropped IE 5 like a hot rock as soon as Safari became useable, but that’s because IE sucked so badly on OS X (as you youngsters may not remember.) But Safari is a great browser. Chrome’s great too, but in different ways, and the Mac version’s not finished yet so there are some missing bits.
I should note that I generally don’t work on the user-visible parts of Chrome, rather the underlying WebKit engine; so I haven’t been focusing on the UI much, or noticing the features being added, until experiencing them as an end-user.
In Chrome’s favor:
Rough edges (remember, this is a pre-beta build):
I’ve been impressed by Chrome’s stability too, for a pre-beta development build. The app hasn’t crashed once, and I haven’t even gotten the “Oh, snap!” page that shows that a renderer process crashed. (I’ve seen plugins crash a few times, but that’s probably Flash’s fault, and as in Safari on 10.6, this doesn’t affect the browser or even the rest of the page.)
One thing I’m really looking forward to is extensions. Safari’s a closed system, and I’ve long been envious of the plethora of cool plug-ins available for Firefox. I’m looking forward to using, and maybe developing, extensions for Chrome. (In the current dev channel Mac release, extensions can be installed, but the ones I’ve tried don’t do anything yet.)
I’ve been working at Google since last August. The Big G’s hiring process is rather weird—when you interview, it’s not for any specific team. It’s only after you get an offer that you decide which team to join, of the ones with open positions.
I decided on Google Sites, which I knew and liked from its days as JotSpot, a hosted wiki with some powerful features. It ended up not being the right place for me, for a couple of reasons:
The good news is that Google encourages transfers between teams, and makes it easy to do so. The near-total transparency inside the company makes it easy to find out everything that’s going on, and there’s a well-designed website for engineering transfers that helps you find teams that need people. I even went to an internal job fair.
I’ve now ended up working on Chrome, Google’s web browser. The team I’m on is responsible for implementing HTML 5 features, as well as designing and implementing other new features (for standardization) that will help web apps become as powerful as native apps. Much of what we do will go into the WebKit source tree, where it will also directly benefit Safari, Android, the Palm Pre, and other WebKit-based browsers.
In fact, everything I work on (more or less) is going to be open source. Both the WebKit and Chromium source trees are public. You can view, if you care to, the one patch I’ve contributed so far and the one that’s currently out for review. That’s kind of mind-blowing, in a good way, to me, steeped as I am in the secrecy of Apple.
I’m pretty excited by this. There are quite a lot of things I’m interested in working on—client-side storage, local apps, drag-and-drop, better font support, menus, even far-out stuff like peer-to-peer networking. Forward in all directions!
So, Web 2.0’s heyday is over, and somewhere out there, Web 3.0 is slouching toward us waiting to be born. What will it be?
There’s really no such single thing as “Web x”, of course. And all predictions are really just wishes. That being said, my wish is that Web 3.0 will be about distributed systems. To oversimplify:
Web 1.0 built up big brand-name websites with their own content—things written by them, or repurposed from the media companies that owned them, or stuff to buy.
Web 2.0 embraced “user-created content” and interaction between users. The content creation has become less centralized, outsourced to whomever wants to register an account and post stuff, but the sites managing, storing and serving the content are still centralized.
Web 3.0, I hope, will take the decentralization to the software, and the storage. Monolithic web apps run by huge server farms—Facebook, Blogger, Twitter, Flickr, etc.—will be at least in part supplanted by apps that users run locally (or at least ‘nearby’) and which share data among each other.
Decentralized systems need well-defined protocols and data formats for communicating. We’ve been making headway with that as part of Web 2.0—there’s an arsenal of technologies like REST, Atom, AtomPub, OpenID, OAuth, RDF, JSON and so on—but they’re not well integrated with each other. And we need higher level abstractions.
I’ve been researching CouchDB this week, and I’m getting more and more excited by it the more I learn. It combines data storage, REST-based APIs, scalability and data propagation through replication, and even application hosting. It’s actually a lot like Google’s internal infrastructure, but in an open and modular form.
You can use CouchDB as the back end of a traditional web service, glomming more and more instances of the server together for scalability; that’s the kind of architecture that Google and Amazon use. But you can also run instances independently from each other, and have them pull data from each other, very much like the way distributed version control systems like Git and Mercurial operate. As I’ve said before, once you have a decentralized system, you can easily design centralized systems of any form as special cases.
Since each CouchDB instance also runs as a web server, that means I can run my social network from my machine, and you can run yours from yours, and yet they can be the same social network. But I can keep my private data private, and I can hack on my software if I want, and the load on my server only scales with the size of my friend list, no matter how big the entire global network grows.
These are things I’ve been thinking of for a while (and my unfinished Cloudy app includes some of them), but CouchDB comes closer than any other software platform I’ve seen to making them implementable. It’s still unfinished (nearing version 0.9 right now), and some of the authentication and replication features that would be needed for this aren’t ready yet, but it really sounds like the people developing CouchDB Get It, and are working to make this vision of Web 3.0 come true.
[If this sounds interesting to you, go and read the preliminary draft of the upcoming O’Reilly book on CouchDB. Only the first few chapters exist yet, but they’re well-written and lay out the basics pretty well.]
Ruby has a wide variety of HTML/XML templating engines, but none of the ones I’ve found work the way I’d like. It’s quite possible I’ve overlooked some, though.
My current gold standard for templaters is the Python library Genshi (which was inspired by Kid)—what I like about these is that your templates are valid XML: the parameters and control structures are expressed as special XML attributes and tags. This makes it easy to edit your templates in syntax-checking XML editors, and guarantees that your app serves valid XML or HTML.
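To show what I mean by “valid XML”, here is a small template written with Genshi-style `py:for` and `py:content` attributes. The sketch deliberately does not run Genshi itself; it only uses Python's standard XML parser to demonstrate that the template is well-formed and that the control structure lives in an ordinary namespaced attribute. The `items` and `item.title` names are placeholders.

```python
import xml.etree.ElementTree as ET

PY_NS = "http://genshi.edgewall.org/"    # Genshi's directive namespace

TEMPLATE = """\
<ul xmlns:py="http://genshi.edgewall.org/">
  <li py:for="item in items" py:content="item.title">placeholder</li>
</ul>
"""

# Because the template is itself XML, any XML tool can check or edit it.
root = ET.fromstring(TEMPLATE)
li = root.find("li")
loop_expr = li.get("{%s}for" % PY_NS)
content_expr = li.get("{%s}content" % PY_NS)
print(loop_expr, "|", content_expr)
```

Compare that with an ERB-style template, where the loop would sit between the tags as raw Ruby and the file would no longer parse as XML at all.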
But as I said, none of the Ruby template engines I’ve seen work this way. They’re all either generic macro systems that intermix markup and Ruby (like ERB), or they’re Ruby APIs that output markup (like Builder, HAML and Markaby). I don’t really like either approach.
If you’ve got a good suggestion, I’ve got a gold star ready to stick on your forehead!
I haven’t used Chrome yet, though I know people who work on it, and it looks like a good browser with some good new ideas. But I’m unsure of the benefits of one of its main talking points: that what web applications really need is to have less browser “chrome” around them. As I put it in an IM to Julian Missig yesterday:
I think the problem isn’t that the browser chrome has too much [UI], it’s that the apps inside have too little.
Too little what? What are the web apps lacking? Since you ask:
A menu bar. Despite today’s fashion to get rid of them, menu bars are really useful. They’re a great interface for discovering the capabilities of an app, and for accessing rarely-used but important features. That’s why they were invented. Denied standard menus, web apps have tended either to splatter buttons all over the screen, or implement their own menus out of DHTML (usually doing a terrible job of even basic usability.) Keyboard shortcuts, if present, are divorced from menus so they’re not discoverable; and on the Mac at least, they awkwardly use a different modifier key (Ctrl, vs. Cmd) than “real” shortcuts.
A filing system. Web apps let you manage content, which often takes the form of documents, aka files (word processing, pictures, etc.), but there’s no standard mechanism to manage these files—the whole document model introduced by the Xerox Star and popularized by the Mac’s Finder doesn’t exist. Instead each such web-app has its own incompatible (and mostly lame) way to list, rename, copy or delete documents. If you’re lucky, it will support some modern conveniences like direct manipulation or a trash. What none of them give you is interoperability: if you want to take a document from one web-app, like Flickr, and manipulate it using a different app, like GMail or Photoshop Express, you can’t drag it across, or even use a standard file picker dialog box. Your only options are to copy and paste a long cryptic URL (and deal with cross-site authentication), or to take the document on a scenic detour through your computer’s real filesystem, downloading it from one app and uploading it to the other.
Offline access. There are in fact still times when your computer doesn’t have access to a network. (Even at über-connected Google, the commuter buses are prone to WiFi outages.) Or the network might just be infuriatingly slow or laggy. Losing access to your data in those situations is bad. Particularly in the case of email, I’ve been spoiled by IMAP, and Mail.app’s excellent offline support, long enough that the idea of using webmail is a complete non-starter for me. (This problem, fortunately, is being addressed, both in the form of Google Gears and in the parallel, but hopefully converging, offline-storage functionality being drafted for HTML 5.)
More than four fonts. The last time I had only four fonts available for documents was in 1985, before the LaserWriter Plus arrived. Currently on my computer I have an embarrassing overabundance, several hundred. Too bad web apps can’t actually discover what fonts are installed; instead they fall back on the deathly dull overused set of “safe” fonts like Times and Arial.
I’m sure there’s more to add to this list, but I’ll stop now. Suffice it to say that in too many ways web-apps are still like the Emperor’s new clothes, or like Samuel Johnson’s dog walking on its hind legs (“it is not done well; but you are surprised to find it done at all”), and the enthusiasm for them often seems to be proportional to the shittiness of the native UI of the enthusiast’s OS platform of choice. (I first noticed this during my brief stint at Sun, whose workstations all ran Motif, which made even Web 1.0 UIs look like a lovely opium dream by comparison.)
Don’t get me wrong, some web apps are great, and they all have a lot of potential, but taking away features from the browser wasn’t high on my list of what they chiefly need.
FYI, I ended up taking the position at Google. I started two weeks ago, and it’s been quite exciting, despite (or because of) the “drinking from a fire-hose” aspect of learning my way around the big G.
I’m on the Google Sites team. I’ve been interested in wikis for years, and now I get to actually work on one. (Although Sites, née JotSpot, is not a typical wiki.)
I could write a lot about my experience of Google so far. It’s quite an interesting place. Merely learning about how some of their internal systems operate has been jaw-dropping. (Do you have any idea how much hard disk space Google has? Or how many CPUs? Or how many search queries they handle? Unfortunately I don’t think I’m allowed to tell…)
For now I just wanted to say that I’m not in the job market anymore. Also, that I really like all the free food :-d