Facebook and Decentralized Identifiers [Thought Palace]

I finally made myself a Facebook account, mostly to see what it’s like. Overall, I’m pretty impressed: the UI is nicer than that of most such sites, particularly the still-antiquated LiveJournal and the disaster that is MySpace. The biggest issue seems to be that the main profile page absolutely doesn’t scale up to handle the exploding number of apps and widgets people are stuffing into it, so you end up with mile-long profiles containing box after box of junk.

But the most interesting thing I noticed is how the service has no visible identifiers for user identities. Unlike most centralized services, there’s no unique username to pick. I assume that, internally, each account requires a unique email address, but that address plays very little role in the user experience, apart from its use in helping people find their existing contacts’ profiles. The service does assign a unique number to every profile, and this shows up in profiles’ URLs, but it never seems to appear in the page itself. So there’s no obvious way to say “this is my Facebook ID”, other than pasting in the completely non-mnemonic URL of your profile page. And conversely, the visible identifiers you see for other members are simply their real names (plus photos/icons.)

This makes sense, since Facebook is all about the social network. It may have eight hundred million members (assuming the account IDs are assigned serially), but there’s generally no need to know about, identify or refer to some random member, since it’s almost certainly someone you don’t know or care about. The people in your social network can be referred to by real name, because at that scale the number of name conflicts is so much lower, and you or your friends presumably ruled out spoofs before friending that person. Real names work for the same reason that they work in real-life social interactions.

This avoids a lot of problems. Unique usernames, while deliciously old-school-geeky, don’t scale well to the modern Internet, as has been apparent since the rise of AOL 15 years ago. A username like “snej” or “JohnWilson” has great mnemonic value, but after a while everything has been taken except the likes of “fuzzi_bunney_37327” or “JohnWilson9284”, which are far less useful.

Using email addresses as IDs adds more room with a second level of namespace, and removes the need for people to make up new IDs for every site, but in practice it exposes the existing flat-namespace problem of the big ISPs and portals. “fuzzi_bunney_373@gmail.com” isn’t much of an improvement. There’s also the privacy and spam problem of exposing those emails to others.

There is no unique identifier.

It turns out there really isn’t any good solution for universal unique personal identifiers. This has been one of the factors blocking the utopian goal of a global Public Key Infrastructure (PKI): to be useful, a certificate has to assure you of its owner’s identity, which means it has to contain some signed identifier that you can associate with a real person (or persona). But, as argued convincingly by Carl Ellison in his paper Improvements on Conventional PKI Wisdom, there isn’t any such identifier that always works. Real names don’t work because they’re not unique enough, not globally or even in any large organization. Ellison calls this the “John Wilson problem”, named after eight Intel employees with that name, who kept getting misdirected email. (I think of it as the “Steve Smith” problem, after an old co-worker of mine at Apple.) Adding middle initials doesn’t help, of course, because most people don’t know their friends’ or co-workers’ middle initials. Ellison concludes:

“Human beings do not use names the way we want them to. … Computer developers … think of names the way we do variable names or path names. That is, a name is some string, unique within its block or directory or context, that unambiguously identifies some object. … Compilers and operating systems may behave this way, but human users do not.”

Of course, the certificate can include more forms of identification. A photo would help a lot, as would an email address. So would the name of your employer and (for co-workers) what department you work in. So would your hometown and phone number. Or your SSN or driver’s license number. But no one form of ID is sufficient for everyone: I don’t know most of my co-workers’ home phone numbers. I have personal friends whose current employment I’m not sure of. I have online friends who are nonetheless pseudonymous enough that I don’t know what they look like.

And you can’t just put in all (or even most) of those forms of ID, or your certificate becomes a privacy nightmare, a dossier worthy of a police state. The conclusion? There’s really no good way to prove identity using a self-contained certificate. QED.

This quandary has been expressed as Zooko’s Triangle, which in textual form says: no single form of identification can be simultaneously globally unique, decentralized, and human-meaningful. It can have at most two of those attributes.

Distributed identification

The best approach to identification seems to be to give up on having the identifier try to prove its own relationship to a “principal” (person or entity). Instead, the relationships are given as external statements made by other principals. The identifier itself can then just be a random unique number (such as a public key, which is easy to generate) with no intrinsic mnemonic value.
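To make that concrete, here’s a minimal Python sketch of turning a key into a short printable identifier like the F837CA77B6 in the next example. It rests on stated assumptions: Python’s standard library can’t generate real keypairs, so 32 random bytes stand in for a public key, and the hypothetical `fingerprint` function truncates a SHA-256 hash to ten hex digits purely to match that example. A real system would keep many more bits.

```python
import hashlib
import secrets


def new_public_key() -> bytes:
    """Stand-in for generating a keypair: 32 random bytes playing the role of a
    public key (the size of an Ed25519 public key). A real system would create
    an actual keypair and keep the private half secret."""
    return secrets.token_bytes(32)


def fingerprint(public_key: bytes, digits: int = 10) -> str:
    """Hash the key and keep the first `digits` hex characters, uppercased.
    Ten digits only mirrors the example in the text; real systems keep far more."""
    return hashlib.sha256(public_key).hexdigest()[:digits].upper()


key = new_public_key()
ident = fingerprint(key)
print(ident)  # ten random-looking hex digits, different on every run
```

The identifier carries no meaning at all, which is exactly the point: anyone can mint one without asking a central registry, and its uniqueness comes from the randomness rather than from anyone’s choice of name.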

So for example, if I get a message from F837CA77B6, I have no idea who that is. But if I already know Karen, and Karen makes a signed assertion that F837CA77B6 is her new husband Michael, then I can trust that the message is from Michael. I can add him to my address book as “Michael Jones”, and never have to see that random number again. It doesn’t matter if there are thousands of other Michael Joneses in the world, because I only know one. Even if some of my friends know another Michael Jones, I just have to refer to him as “Michael Jones, my sister Karen’s husband” to disambiguate.
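Here’s one way the Karen-vouches-for-Michael step could look, as a Python sketch. Everything is illustrative: the `AddressBook` class and its methods are hypothetical, and the signatures use textbook RSA with the tiny classroom primes 61 and 53, which is trivially breakable and only there so the code runs on the standard library. A real system would use a vetted scheme such as Ed25519. The point is the rule in `accept_introduction`: a new identifier gets a human name only when someone I already know has signed a statement saying who it is.

```python
import hashlib
from dataclasses import dataclass

# Toy textbook RSA with the classroom primes p=61, q=53. Trivially breakable;
# it is here only so the sketch runs on the standard library. A real system
# would use a vetted signature scheme such as Ed25519.
KAREN_N, KAREN_E, KAREN_D = 61 * 53, 17, 2753  # public (n, e), private d


@dataclass(frozen=True)
class PublicKey:
    n: int
    e: int

    def fingerprint(self) -> str:
        """Short printable identifier derived from the key (10 hex digits)."""
        return hashlib.sha256(f"{self.n}:{self.e}".encode()).hexdigest()[:10].upper()


def _digest(message: str, n: int) -> int:
    """Hash the message and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % n


def sign(message: str, n: int, d: int) -> int:
    return pow(_digest(message, n), d, n)


def verify(message: str, signature: int, key: PublicKey) -> bool:
    return pow(signature, key.e, key.n) == _digest(message, key.n)


class AddressBook:
    """Hypothetical address book: keys of people I know, plus my names for ids."""

    def __init__(self) -> None:
        self.keys: dict[str, PublicKey] = {}
        self.names: dict[str, str] = {}

    def add_known(self, key: PublicKey, name: str) -> None:
        self.keys[key.fingerprint()] = key
        self.names[key.fingerprint()] = name

    def name_for(self, ident: str) -> str:
        return self.names.get(ident, ident)  # strangers show up as raw digits

    def accept_introduction(self, introducer: str, subject: str,
                            name: str, signature: int) -> bool:
        """Name `subject` only if someone already known signed "<subject> is <name>"."""
        key = self.keys.get(introducer)
        if key is None or not verify(f"{subject} is {name}", signature, key):
            return False
        self.names[subject] = name
        return True


karen = PublicKey(KAREN_N, KAREN_E)
book = AddressBook()
book.add_known(karen, "Karen")

michael = "F837CA77B6"
print(book.name_for(michael))  # F837CA77B6
signature = sign(f"{michael} is Michael Jones", KAREN_N, KAREN_D)  # on Karen's machine
book.accept_introduction(karen.fingerprint(), michael, "Michael Jones", signature)
print(book.name_for(michael))  # Michael Jones
```

An introduction from an identifier that isn’t already in the address book is simply refused, so a stranger can’t vouch for another stranger. This sketch only records the name; a fuller version would also store Michael’s key so his own messages could be checked.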

In a nutshell, this approach uses the social network to manage identity, shrinking the problem space by about seven orders of magnitude. It’s perfectly feasible to keep track of the identity of a few hundred people using familiar attributes like names, faces and personal relationships: humans have been doing it for literally hundreds of thousands of years. Some evolutionary biologists argue that managing social relationships in tribes was a key factor in the development of human language: in other words, gossip was the killer app for the neocortex.
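The rough arithmetic behind that figure, assuming round numbers of my own choosing: a world population of several billion, taken here as 6 × 10^9, against a personal circle of a few hundred, taken here as 600:

$$
\log_{10}\frac{N_{\text{world}}}{N_{\text{circle}}}
\approx \log_{10}\frac{6 \times 10^{9}}{6 \times 10^{2}}
= \log_{10} 10^{7} = 7
$$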

Steve Dohrmann and Carl Ellison’s 2002 paper Public-key Support for Collaborative Groups describes a prototype system that uses such a distributed identity system to implement a peer-to-peer collaborative space that provides access control and trust without needing any centralized PKI.

Marc Stiegler, in his 2005 essay An Introduction To Petname Systems, coined the term “petname” for such a private mnemonic name that each person locally assigns to their contacts; for example, Karen’s husband might be “Michael” to me, while I’d refer to a different friend named Michael as “Mikey”. Petnames are (locally) unique and memorable, and make it easy to work with identifiers that are intrinsically just long strings of digits. Using petnames and raw identifiers together gets around the limitation of Zooko’s Triangle.
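A petname table can be sketched in a few lines of Python. The class and the identifiers are hypothetical; the only rules it enforces are the ones described above: each petname is unique within one person’s own table, and anyone without a petname is displayed as their raw identifier. Two people’s tables are independent, so the same key can be “Michael” to me and something else entirely to Karen.

```python
from typing import Optional


class PetnameTable:
    """One person's private mapping between raw identifiers and petnames.
    Petnames only have to be unique within this one table."""

    def __init__(self) -> None:
        self._name_of: dict[str, str] = {}  # identifier -> petname
        self._id_of: dict[str, str] = {}    # petname -> identifier

    def assign(self, ident: str, petname: str) -> None:
        holder = self._id_of.get(petname)
        if holder is not None and holder != ident:
            raise ValueError(f"petname {petname!r} is already used for {holder}")
        previous = self._name_of.get(ident)
        if previous is not None:
            del self._id_of[previous]  # renaming someone frees the old petname
        self._name_of[ident] = petname
        self._id_of[petname] = ident

    def display(self, ident: str) -> str:
        """Show the petname if there is one, otherwise the raw identifier."""
        return self._name_of.get(ident, ident)

    def lookup(self, petname: str) -> Optional[str]:
        return self._id_of.get(petname)


mine, karens = PetnameTable(), PetnameTable()
mine.assign("F837CA77B6", "Michael")
mine.assign("0C41A9E2D3", "Mikey")  # a different Michael (hypothetical id)
karens.assign("F837CA77B6", "Mike")

print(mine.display("F837CA77B6"))    # Michael
print(karens.display("F837CA77B6"))  # Mike
print(mine.display("5D6E7F8091"))    # 5D6E7F8091 (no petname yet)
```

Note that the uniqueness check is purely local: nothing stops a million other people from also having a “Michael” in their tables, and that’s fine.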

Bryan Ford et al., in Persistent Personal Names for Globally Connected Mobile Devices (2007), describe an ongoing project called Unmanaged Internet Architecture that provides a more general naming, discovery and communications protocol for people’s multiple networked devices and those of their friends and colleagues; it’s a bit like Apple’s Bonjour, but it follows your social network, not Ethernet.
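The “my sister Karen’s husband” trick generalizes: a name can be a chain of local names, each looked up in the namespace of the person reached so far. Below is a small Python sketch of that idea. It is a generic illustration, not UIA’s actual protocol, and the namespaces, names and identifiers are all made up. In a real system each namespace would live on its owner’s devices and be fetched and signature-checked rather than sitting in a dict.

```python
# Each principal's private namespace: local name -> identifier.
# All identifiers and names here are hypothetical.
namespaces: dict[str, dict[str, str]] = {
    "ME":         {"Karen": "A1B2C3D4E5", "Mikey": "0C41A9E2D3"},
    "A1B2C3D4E5": {"husband": "F837CA77B6", "laptop": "77E0D1C2B3"},
    "F837CA77B6": {"phone": "5D6E7F8091"},
}


def resolve(path: list[str], start: str = "ME") -> str:
    """Follow a chain of local names, each interpreted in the namespace of the
    principal reached so far, e.g. ["Karen", "husband"]."""
    current = start
    for name in path:
        try:
            current = namespaces[current][name]
        except KeyError:
            raise LookupError(f"{name!r} is not defined by {current}") from None
    return current


print(resolve(["Karen", "husband"]))           # F837CA77B6
print(resolve(["Karen", "husband", "phone"]))  # 5D6E7F8091
```

Nobody needs a globally unique name here: “husband” only has to mean something inside Karen’s namespace, and the chain ends at an identifier that is unique.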

Oh yeah, we were talking about Facebook

Facebook, of course, is centralized. (Well, the add-on apps aren’t, but even they get proxied through facebook.com.) But since a central aspect of its data model — user identities — works in much the same manner as these distributed systems, I believe that implies that a very Facebook-like social network could be built as a distributed architecture that didn’t rely on a central server or organization. It might not even look that different; you’d just notice (or not) that, as you surfed from one friend’s profile to another, the domain name in the address bar changed.
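Here’s a rough Python sketch of what that could mean for the data model, under my own assumptions: each profile lives on whatever server its owner chooses, and a friend link carries both the friend’s key fingerprint (the identity that matters) and the URL where their profile currently lives. All the hosts, identifiers and class names are hypothetical, and `fetch` stands in for an HTTP request plus a signature check. Following links hops from host to host, which is the address-bar effect described above.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class FriendLink:
    ident: str  # friend's key fingerprint: the identity that actually matters
    url: str    # where that friend happens to host their profile right now


@dataclass
class Profile:
    ident: str
    display_name: str
    friends: list[FriendLink] = field(default_factory=list)


# Hypothetical profiles, each hosted on its owner's own server.
profiles = {
    "https://me.example.net/profile": Profile("9A8B7C6D5E", "Me", [
        FriendLink("A1B2C3D4E5", "https://karen.example.org/me")]),
    "https://karen.example.org/me": Profile("A1B2C3D4E5", "Karen", [
        FriendLink("F837CA77B6", "https://mjones.example.com/")]),
    "https://mjones.example.com/": Profile("F837CA77B6", "Michael Jones"),
}


def fetch(url: str) -> Profile:
    """Stand-in for an HTTP GET plus a check of the owner's signature."""
    return profiles[url]


def surf(start_url: str, hops: int) -> list[str]:
    """Follow the first friend link on each page; return the hosts visited."""
    url = start_url
    hosts = [urlparse(url).hostname]
    for _ in range(hops):
        friends = fetch(url).friends
        if not friends:
            break
        link = friends[0]
        if fetch(link.url).ident != link.ident:  # identity is the key, not the host
            raise ValueError(f"{link.url} is not serving {link.ident}")
        url = link.url
        hosts.append(urlparse(url).hostname)
    return hosts


print(surf("https://me.example.net/profile", 2))
# ['me.example.net', 'karen.example.org', 'mjones.example.com']
```

Because the link is anchored on the fingerprint, a friend could move their profile to a new host and only the URL half of the link would need updating.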

So what? If it doesn’t change the user experience, why does this matter, given that existing monolithic sites can already do all this without any fancy new protocols?

Because there are too many of these sites, and none of them speak to each other, so you either have to make redundant accounts on all of them or just not connect with the friends who aren’t on the one you use.

One reason there are so many is that users are fickle and the sites keep booming and busting. Friendster famously imploded; Orkut became Big In Brazil but not elsewhere; MySpace was famous yesterday; Facebook is the darling today.

Actually, the users aren’t just being fickle. They leave sites because of real problems. Sometimes they’re technical: running a website with a complex data model, highly sticky content, and millions of users is very difficult. Half of what killed Friendster was an inability to keep the servers running reliably. Today’s Facebook app developers run into the same issue, as any halfway-popular app generates enough traffic to swamp most hosted servers.

And then there are the social and legal problems. Friendster’s other fatal problem was that it drove away its most rabid users by deleting their more creative profiles. MySpace and Facebook are currently testing the limits of how much their users’ eyeballs will tolerate being advertised to. LiveJournal keeps making ham-fisted crackdowns on inappropriate content, to the outrage of some of its core constituencies. Given the social, monetary and legal issues faced by huge public websites, these kinds of behavior aren’t going to go away.

“Just get a blog, dude” (or not)

At this point the elite members of the “blogosphere” [sorry] will be nodding smugly. You don’t need these monolithic services with all their problems if you just set up your own blog on your own website. All you need is a little bit of skill with FTP, and MySQL, and PHP, and HTML, and CSS, and that’s it! You’re free! No more being hassled by The Man.

Standalone blogs just don’t have the social features, though — they’re too standalone. I’ve had this blog for years, and it feels like a soapbox in a big city, where I can occasionally rant about something. I’ve also had accounts on one or two social-network sites for years, and those feel very different: they’re deeply personal, and the presence of friends (and their friends) is always very apparent. They’re like town squares, or big cocktail parties, where I can talk with a few friends or listen in on the buzz going on nearby.

The standalone blogs just don’t have enough protocols to enable the kind of rich interpersonal interaction that centralized social-network sites have. The venerable protocols like RSS/Atom, ping, and trackback are the kind of stuff described as the simplest thing that could possibly work (if you’re charitable) or “something scrawled on the back of a napkin in crayon” (if you’re not.) More recently, OpenID is a big step forward in making identities transportable between sites. Google’s OpenSocial might help; I confess I haven’t yet examined it closely enough to tell if it’s useful for more than just creating yet more Facebook-style widgets.
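For a sense of how thin these protocols are, here’s a Python sketch of a TrackBack ping as I understand the TrackBack specification: the sender POSTs a form-encoded `url` (required) plus optional `title`, `excerpt` and `blog_name` to the target post’s TrackBack URL, and the receiver answers with a tiny XML document whose `<error>` element is 0 on success. The helper names and example URLs are mine, and the sketch only builds the request; `urllib.request.urlopen(req)` would actually send it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request


def trackback_request(trackback_url: str, url: str, title: str,
                      excerpt: str, blog_name: str) -> Request:
    """Build (but don't send) a TrackBack ping: a form-encoded HTTP POST."""
    body = urlencode({"url": url, "title": title,
                      "excerpt": excerpt, "blog_name": blog_name}).encode("utf-8")
    return Request(trackback_url, data=body, method="POST", headers={
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8"})


def trackback_ok(response_xml: str) -> bool:
    """The reply is a small <response> document; <error>0</error> means success."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


req = trackback_request("https://other.example.com/trackback/42",
                        url="https://example.net/post",
                        title="Facebook and Decentralized Identifiers",
                        excerpt="Real names work for the same reason...",
                        blog_name="Thought Palace")
print(req.get_method(), req.full_url)
print(trackback_ok("<response><error>0</error></response>"))  # True
```

That really is the whole exchange. Note there’s nothing in it about who is sending the ping.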

In any case, there’s more work to be done to bring social networking into the decentralized blogging environment. I think we may be at a tipping point now — people seem to be increasingly aware of the problems that centralization brings, and in their aftermath I keep running across discussions of how it should be possible to do this stuff without pesky corporate overlords messing it up. So there may be enough demand, now, to balance out the intrinsic difficulty of building open standards. I’m hopeful!