Gossip For Lakitu
Aug 16th, 2009 by jens

Last year I wrote a series of blog posts about a peer-to-peer system called Cloudy that I was developing. I was going up the stack, from messaging to identity, but didn’t finish documenting all the layers I’d built. I mostly stopped working on Cloudy after I went back to gainful employment, but I keep thinking about this stuff.

“Lakitu”?

I’ve since heard about another unrelated project nicknamed Cloudy; and the whole term “cloud” has gotten so debased in the past year that it now stands for outsourcing to giant hidden server farms, which is the antithesis of what I stand for. So I’ve decided to use the name Lakitu instead. Nintendo fans will recognize Lakitu as a bit character in the Mario games—he’s a goggled turtle who rides a little one-seater cloud. This makes him an appropriate mascot for P2P technologies, I think.

[I’m sure Nintendo has a trademark on the character, but they don’t appear to have copyrighted the word “Lakitu”. He’s not even known by that name in Japan, where he’s called “ジュゲム” or “Jugem”. I have been unable to find out what “Lakitu” means or why they decided to use it in the English translation. I could also note threateningly that I have some intellectual-property issues of my own with Nintendo’s depiction of Lakitu’s smiling cloud, which is clearly infringing on my son’s comic-strip character Cloudy. So let’s call it a draw, Iwata-san?]

My last Cloudy post was about verifying people’s identities, and the next one was going to be about gossip. I’ve become unhappy about the rather kludgy way I designed gossip in Cloudy, so yesterday I started designing a new protocol for it, which I’m going to write about.

“Gossip”?

A gossip protocol is a means of broadcasting information in a distributed system. Pairs of computers periodically connect and swap new bits of information with each other; the result is that the information gets dispersed through the whole network (provided it’s a connected graph.) The tricky part is avoiding infinite loops and combinatorial explosions, and optimizing the way pairs of computers swap messages so it scales well.

I started defining a protocol, based on stuff I’ve been thinking about for a while. I don’t think it’s as advanced as what’s reported in research papers, but I’m hoping it will work well enough when used in a socially-driven network—one where the connections between machines are driven by the social connections between their users. Social networks have short horizons, so any particular participant only “sees” a constrained number of near-neighbors even though the entire network may be huge.

I’m making this protocol agnostic as to the type of messaging being used. BLIP will work well, but it ought to be possible to use Jabber or even email; anything that can send messages between two participants. It’s also agnostic as to message content, beyond a few simple assumptions that a message has an author, a timestamp, and some arbitrary “topic” tags.

For example, it ought to work fine at distributing tweet-like micro-blog posts.
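
As a rough illustration of the pairwise swap described above, here is a minimal C++ sketch. It is not the Lakitu protocol, which isn’t specified in this post: the `Peer` and `exchange` names, the `body` field, and the choice of (author, timestamp) as a message’s identity are all assumptions made for the example. Remembering which messages a peer already holds is what keeps a message from circulating forever.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A message with an author, a timestamp, and topic tags.
// The body field is an assumption added for the example.
struct Message {
    std::string author;
    std::uint64_t timestamp;
    std::set<std::string> topics;
    std::string body;
};

// Assumption: (author, timestamp) uniquely identifies a message.
using MessageId = std::pair<std::string, std::uint64_t>;

class Peer {
public:
    // Stores a message; returns false if it was already known.
    bool receive(const Message& m) {
        return store_.emplace(MessageId{m.author, m.timestamp}, m).second;
    }

    // IDs of everything this peer currently holds.
    std::set<MessageId> knownIds() const {
        std::set<MessageId> ids;
        for (const auto& entry : store_) ids.insert(entry.first);
        return ids;
    }

    // Messages this peer holds that the other side has not seen.
    std::vector<Message> missingFrom(const std::set<MessageId>& theirs) const {
        std::vector<Message> out;
        for (const auto& entry : store_)
            if (theirs.count(entry.first) == 0) out.push_back(entry.second);
        return out;
    }

    std::size_t size() const { return store_.size(); }

private:
    std::map<MessageId, Message> store_;
};

// One gossip round between two peers: each sends what the other lacks.
// Returns the number of messages transferred.
std::size_t exchange(Peer& a, Peer& b) {
    const std::set<MessageId> idsA = a.knownIds();
    const std::set<MessageId> idsB = b.knownIds();
    std::size_t moved = 0;
    for (const Message& m : a.missingFrom(idsB))
        if (b.receive(m)) ++moved;
    for (const Message& m : b.missingFrom(idsA))
        if (a.receive(m)) ++moved;
    return moved;
}
```

Calling `exchange(a, b)` twice in a row transfers nothing the second time, and a later `exchange(b, c)` carries a’s messages on to c without a and c ever connecting. Swapping complete ID sets is the naive approach and would not scale to long histories; it is only meant to show the shape of the exchange.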

Right now I have the protocol written down as an outline in Notebook. I’ll flatten it out, expand it and post it here in a day or two.

I’m Building Me A B-Tree
Aug 14th, 2009 by jens

The other day I took it into my head to implement a B+tree. Why? Because they sound neat, and I’ve done hardly any serious programming with trees in my career. (Someone, I think Buzz Andersen, once noted that there are two kinds of programmers: those who do think in terms of trees, and those who do everything with hash tables. I’m in the latter camp.)

And also because I’m a big fan of CouchDB, and really admire its elegant storage model. It’s an on-disk B-tree (no surprises there), but the file is append-only, which makes it impervious to crash-related corruption, provides nearly lockless concurrency, and makes it easy to access earlier revisions.

[In a nutshell: Updated data values or tree nodes are appended to the file instead of overwriting the earlier versions. Since updating a node changes its location, its parent node needs to be updated too to point to the new location. This recurses up the tree, meaning any change ends up with a new root node written at the very end of the file. In fact, when you open the file you find the root by looking at the very end. Since no data is ever changed, once you open the file you’re impervious to changes made by other writers since they don’t affect anything you’re looking at.]
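
The append-only idea can be sketched in a few dozen lines of C++. To keep it short, this uses a plain binary search tree rather than a B+tree, and a `std::vector` stands in for the file, so `AppendOnlyTree`, `Offset` and the method names are illustrative assumptions and not CouchDB’s actual format. What it does show is the path copy: every update appends new nodes, the new root lands at the very end, and a reader holding an older root offset keeps seeing the older data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Offset = long;            // position of a record in the "file"
constexpr Offset kNone = -1;    // no child / empty tree

struct Node {
    int key;
    std::string value;
    Offset left;
    Offset right;
};

class AppendOnlyTree {
public:
    // Offset of the newest root: always the last record written.
    Offset root() const {
        return log_.empty() ? kNone : static_cast<Offset>(log_.size()) - 1;
    }

    // Inserts or updates a key by appending nodes; nothing is overwritten.
    void put(int key, const std::string& value) {
        putAt(root(), key, value);
    }

    // Looks up a key starting from any root offset, current or historical.
    // Returns true and fills `out` if the key is found.
    bool get(Offset from, int key, std::string& out) const {
        while (from != kNone) {
            const Node& n = log_[static_cast<std::size_t>(from)];
            if (key == n.key) {
                out = n.value;
                return true;
            }
            from = key < n.key ? n.left : n.right;
        }
        return false;
    }

    std::size_t recordCount() const { return log_.size(); }

private:
    // Copies the path from `at` down to the key. Children are appended
    // first, so the copy of the root is the last record written.
    Offset putAt(Offset at, int key, const std::string& value) {
        Node copy{key, value, kNone, kNone};
        if (at != kNone) {
            copy = log_[static_cast<std::size_t>(at)];  // copy by value
            if (key == copy.key)
                copy.value = value;
            else if (key < copy.key)
                copy.left = putAt(copy.left, key, value);
            else
                copy.right = putAt(copy.right, key, value);
        }
        log_.push_back(copy);
        return static_cast<Offset>(log_.size()) - 1;
    }

    std::vector<Node> log_;
};
```

A reader that saved `root()` before a later `put` can keep passing that offset to `get` and will never observe the newer values, which is the property that makes concurrent readers cheap. A real on-disk version would also need a trailer or checksum to tell a complete root from a half-written one.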

I’d love to use something like that for various projects, but as CouchDB is implemented in the exotic functional language Erlang, I can’t really use its storage layer as-is. So: could I implement something like it in C++?

Thus far I have a working in-memory B+tree implementation. Inserts and deletes work, and I’m working on the iterator. Even this much was harder to get working than it should have been, or so it feels. But that always seems to be true—algorithms sound straightforward when you read about them, but putting them into practice exposes you to all the details and subtleties inherent in hand-waving like “now merge the node with a neighbor”.

Actually I haven’t implemented a straight B+tree, but rather a ‘top-down’ variant described by Ohad Rodeh that’s better suited to this type of application because it changes fewer interior nodes during an update.

What’s next?

  • Support for string keys (so far it just handles ints)
  • Serializing nodes to/from disk
  • Keeping track of which nodes are touched during an operation, and appending those to the file
  • Writing a trailer to the file to mark a successful update (and to link back to the previous trailer, for historical purposes.)

Sounds straightforward, but of course the devil’s in the details.

iTunes 9 Deja Vu
Aug 11th, 2009 by jens

AppleInsider reports on the iTunes 9 rumors:

“The social networking integration that we reported iTunes 9 would have seems to be part of a bigger social networking push by Apple,” the report states. “We’ve been informed that Apple has plans to tie iTunes 9 into a “Social” application that they plan to release in the future.”

This sounds like the kind of app (though separate from iTunes) that Jessica Kahn and I kept trying in vain to get Apple to build, circa 2003-2005. Maybe they’ll get some use out of our abandoned prototypes.

The report goes on to say that the new application would “allow users to share their listening habits with friends [and] send music to friends”.

Mike Estee and I had actually prototyped this in iChat in 2003, but the feature never got approved since there were so many more important things to add, like 3-way video conferencing. (Plus the fact that Apple execs turned white as a sheet if you said the words “send music” near them.)

Anyway, personal bitterness aside, I think it’s really amusing that Apple keeps shoving the kitchen sink into iTunes, since that has to be the single nastiest, hardest-to-extend codebase they have — it’s their last remaining Carbon app, with a foundation that dates back to Casady & Greene’s SoundJam, circa 1998.

The Exact Inverse of GeekGameBoard
Aug 11th, 2009 by jens

iPhone playing cards by Meninos:

Security: Not Quite Getting It
Jul 13th, 2009 by jens

I got an iPhone 3GS yesterday (yes, it totally rules.) While setting up online account access for billing, AT&T had me enter a password.

There was one of those colored password-strength meters next to the text field, and it said the password I entered was “weak”. Alright, I changed it to add some commas and dashes.

Then I hit Submit, and was told that passwords can only contain letters and digits.

sigh.

The Subtle Dangers Of Distributed Objects
Jul 5th, 2009 by jens

Introduction: I wrote this as part of a reply on Apple’s bonjour-dev mailing list, then decided it might be worth publishing more visibly. I’ve found that Cocoa’s Distributed Objects technology is immediately attractive to many developers, while those who’ve used it end up finding that it’s much more complex than it looks. But I haven’t seen much written about the caveats of using it.
I am not saying “don’t use DO” or “DO is broken”! It has valid uses, and it works as designed. But you should be aware of the less-obvious complexities. If you have a single GUI app and a single background agent, that’s a great use-case. If the agent communicates with multiple apps (like the iChatAgent), things get trickier. If you’re going to use DO over the network, you’ve got to be really, really careful.

Distributed Objects is not as simple as it looks at first glance, especially for use over a network. Here are some of the issues I’ve run into:

  • Ref-counting bugs can be really hard to track down, because remote objects can be holding onto references to your local objects via their proxies. It’s possible to have reference-loops that span two machines! Note that this means a buggy client can cause its server to leak memory.
  • Any message sent to a remote object can potentially throw an exception if there’s a network problem or the remote peer disconnects. To make your app robust you have to handle all such exceptions and clean up gracefully. (A nasty case of this that I’ve seen is where a client’s crash causes the server to crash, which then causes all the other clients to crash…)
  • Sending a non-oneway message to a remote object blocks the thread indefinitely until the remote peer sends a response. This effectively lets the peer hold your thread hostage, and can cause your app to lock up if the peer is overloaded or buggy or actively hostile. You can also end up with deadlocks that span multiple computers—good luck debugging those! (In the app I shipped that used DO, we ended up using only oneway messages in our API for this reason.)
  • Even oneway message sends can fail if DO’s send buffer fills up. The Mach queue is fixed size; I’m not sure if this applies to TCP too. We found it necessary to build a wrapper layer for sends that would catch the resulting exception and re-send the message after a delay.
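
The retry wrapper mentioned in the last point might look roughly like this. It is a generic C++ sketch, not the Cocoa code from the shipped app: `SendBufferFull` is a hypothetical exception standing in for whatever the transport raises, and `sendWithRetry` is an invented name. It also sleeps on the calling thread for simplicity, where a real app would more likely schedule the retry on a timer.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical exception standing in for whatever a transport raises
// when its send buffer is full.
struct SendBufferFull : std::runtime_error {
    SendBufferFull() : std::runtime_error("send buffer full") {}
};

// Calls send() until it succeeds or maxAttempts is reached, sleeping
// between attempts. Returns true on success, false if every attempt
// hit a full buffer. Any other exception propagates to the caller.
template <typename SendFn>
bool sendWithRetry(SendFn send, int maxAttempts,
                   std::chrono::milliseconds delay) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            send();
            return true;
        } catch (const SendBufferFull&) {
            // Only wait if another attempt is coming.
            if (attempt < maxAttempts) std::this_thread::sleep_for(delay);
        }
    }
    return false;
}
```

Capping the number of attempts matters: an unbounded retry loop against a peer that never drains its queue just moves the hang from the send call into the wrapper.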

It gets worse if you’re using DO over a network. In most cases, especially in a P2P app, you have to consider the possibility of malicious peers. (Even if your app will only be used in controlled environments, a buggy peer can have similar effects.) This means you can’t trust any input you get from a peer without validating it first. A distributed object API can be really dangerous in such an environment because it blurs the line between local/trusted and remote/untrusted code and data. It makes it harder to identify the points in your code where you have to verify.

Here are some of the possible security problems:

  • Any remotely-accessible method has to handle arbitrary parameter values without ill effects. If it takes an NSString*, it has to survive being passed nil. If that causes a crash, it’s a denial-of-service attack. If it throws an exception, you have to make sure all of your code cleans up state on the way out, otherwise corrupted state could lead to denial of service or worse.
  • If a remotely-accessible method allocates nontrivial amounts of memory (like creating new objects), then a malicious peer could call it in an infinite loop and run your app out of memory, most likely crashing it. Another DOS attack.
  • It’s easy to fall into the assumption that a remote object behaves the way your implementation of it says it does. This isn’t true, in the malicious case, because an attacker could implement their own version of the same interface with arbitrary behavior. A particularly stupid example would be a RemoteClient interface with a boolean isLoggedIn property. You expect that this will return NO until you set it to YES, but what if someone implemented it to always return YES?
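
To make the first two points concrete, here is a small C++ sketch of a remotely callable method that treats its argument as untrusted and caps how much a single peer can make it allocate. `PeerSession`, `addNote`, `Status` and the limits are all hypothetical names chosen for the example; nothing here is part of DO.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Status { Ok, InvalidArgument, QuotaExceeded };

// Hypothetical per-peer session object; the limits are arbitrary.
class PeerSession {
public:
    PeerSession(std::size_t maxNotes, std::size_t maxLength)
        : maxNotes_(maxNotes), maxLength_(maxLength) {}

    // Remotely callable: every argument is treated as untrusted.
    Status addNote(const std::string* text) {
        // A null pointer (the analogue of nil) must not crash the server.
        if (text == nullptr) return Status::InvalidArgument;
        // Reject empty or oversized payloads before storing anything.
        if (text->empty() || text->size() > maxLength_)
            return Status::InvalidArgument;
        // Cap per-peer allocations so a loop of calls cannot exhaust memory.
        if (notes_.size() >= maxNotes_) return Status::QuotaExceeded;
        notes_.push_back(*text);
        return Status::Ok;
    }

    std::size_t noteCount() const { return notes_.size(); }

private:
    std::size_t maxNotes_;
    std::size_t maxLength_;
    std::vector<std::string> notes_;
};
```

Every check happens before any state changes, so a rejected call leaves the session exactly as it was. The third point has no such fix: anything the remote side reports about itself has to be treated as a claim and tracked on your own side instead.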

You can argue that this just calls for good unit testing and black-box testing as with any public API. Which is partly true; except that you can’t get away with simply stating that “nil values are not allowed for this parameter” or “the effect of calling this twice is undefined”. You have to expect anything. And worse, any bugs not found in testing are not just mundane customer-support issues, but potential priority-zero security holes that could cause really serious problems.

The end result of my experiences is that I don’t think I would use DO again. By the time you’ve refactored the API to be all-oneway, and written wrappers to delegate messaging to background threads, it doesn’t look like regular message-sends anymore. In other words, instead of writing

    result = [remoteObject doSomething: param using: param2];

by the time you’ve added the delegation, asynchrony and error handling you end up with something like:

    NSError *error = nil;
    // Hand the call to a dispatcher that sends it asynchronously
    if (![dispatcher sendMessage: @selector(doSomething:using:)
                        toObject: remoteObject
                      withObject: param
                      withObject: param2
                          target: self
                          action: @selector(didSomething:)
                           error: &error])
        [self handleError: error];
    // now keep going while you wait for the -didSomething: call ...

    ...

    - (void) didSomething: (NSString*)result {
        //...now handle the result
    }

So you might as well use something lower-level to send the commands over the socket and save yourself a lot of complexity.

Career Update, Part ++n
Jul 3rd, 2009 by jens

I’ve been working at Google since last August. The Big G’s hiring process is rather weird—when you interview, it’s not for any specific team. It’s only after you get an offer that you decide which team to join, of the ones with open positions.

I decided on Google Sites, which I knew and liked from its days as JotSpot, a hosted wiki with some powerful features. It ended up not being the right place for me, for several reasons:

  • Currently, Sites’ priorities are in website publishing, as a replacement for Google Page Creator (which is being phased out soon.) It’s quite good at it, but I’m less interested in that than in collaboration features.
  • Google’s server-side infrastructure is really, really, really huge and complex. There is an endless landscape of internal technologies and tools—the few that have been described in public (MapReduce, BigTable, Chubby, etc.) are just the tip of the iceberg. I have discovered that I am not very interested in this kind of stuff, and I quickly became frustrated by the deluge of technologies I needed to learn to get things done.
  • Running a large web service is like running a nuclear power plant or an electric power grid. It requires 24/7/365 monitoring, and at that scale, anything that can go wrong will go wrong, and frequently does. I do not have the right temperament for working with this, especially not when it comes to taking turns carrying a pager that wakes me up at 4AM because some service’s latency has gone above 300ms.
  • I’ve been vocal about my frustration with centralized systems; Google’s websites are kind of the ultimate in that (even though, ironically, they’re implemented as P2P-like networks internally, as I believe all large web operations are today.)

The good news is that Google encourages transfers between teams, and makes it easy to do so. The near-total transparency inside the company makes it easy to find out everything that’s going on, and there’s a well-designed website for engineering transfers that helps you find teams that need people. I even went to an internal job fair.

I’ve now ended up working on Chrome, Google’s web browser. The team I’m on is responsible for implementing HTML 5 features, as well as designing and implementing other new features (for standardization) that will help web apps become as powerful as native apps. Much of what we do will go into the WebKit source tree, where it will also directly benefit Safari, Android, the Palm Pre, and other WebKit-based browsers.

In fact, everything I work on (more or less) is going to be open source. Both the WebKit and Chromium source trees are public. You can view, if you care to, the one patch I’ve contributed so far and the one that’s currently out for review. That’s kind of mind-blowing, in a good way, to me, steeped as I am in the secrecy of Apple.

I’m pretty excited by this. There are quite a lot of things I’m interested in working on—client-side storage, local apps, drag-and-drop, better font support, menus, even far-out stuff like peer-to-peer networking. Forward in all directions!

Is There Any Point To Using The Keychain API On iPhone?
Jun 14th, 2009 by jens

I’ve always liked the Keychain technology in Mac OS X. Sure, the API is notoriously confusing and awkward, but the end-user benefits are compelling:

  1. Secure, encrypted storage for all passwords and keys.
  2. Items can be shared between applications—so in principle you don’t have to enter a given password more than once, since other apps will find the existing item in the keychain.
  3. Items have access control lists, so they can be restricted to certain apps.
  4. The user can “lock” the keychain, requiring a passphrase to be entered before there’s any further access to it. This happens by default when the system goes to sleep, which is a good security feature especially for laptops.
  5. If an app’s code changes, it has to ask permission to use the keychain again (protects against malicious code patches)

For the past few weeks’ worth of Copious Spare Time, I’ve been trying to get my MYCrypto framework, which is in part a friendly API to the Keychain, to run on iPhone. The iPhone has a Keychain API, but it’s a different API than the Mac OS one. At first glance it looks simpler and easier to use, and maybe it would be if it were properly documented, but in practice the item-storage part of it (the SecItem* functions) is incredibly frustrating because the documentation is both incomplete and just plain wrong.

Currently I’ve gotten a lot of it working, but I’m stuck on some issues that seem like either major Keychain bugs or philosophical differences (parts of the API don’t seem to work at all with items that exist in memory but haven’t been persistently added to the Keychain store.) I’ve filed at least six bug reports with Apple in the last week, including the kind of basic unit tests that I would have hoped Apple QA engineers would have written before iPhone 2.0 ever went to developers. I’m very frustrated.

All this for what?

After finishing the bug reports, I had the crazy idea: why should I be using the Keychain store at all on iPhone? Going through my above list of benefits, I realized that hardly any of them apply:

  1. The iPhone security model relies on app sandboxing to protect data. Even malicious app code can’t reach the keychain file because it’s outside the app sandbox. (I have some data that implies that the file is in fact just a plaintext SQLite database, not the fancy encrypted store it is on OS X.) [ Update: I now have confirmation that the Keychain file is encrypted on the device and in backups, making it secure against most attacks.]
  2. iPhone apps can’t share keychain items. Every app effectively has its own sandboxed keychain. So there’s no usability benefit of putting passwords in it.
  3. No sharing means no point in access control lists, of course.
  4. There’s no keychain passphrase or lock/unlock behavior on iPhone. Once you unlock the iPhone itself, all apps can access their keychains freely.
  5. Since all apps are signed, there’s no question of malicious patches.

So much for that. What you’re left with is a rudimentary flat-file database, specialized for just a few data types, with a really clunky and badly documented API. Other than the fact that it happens to already exist, there’s nothing about it that’s as good as something you could write in a few hours using CoreData, or your favorite high-level SQLite API like FMDB or QuickLite. Heck, for simple needs you could just use a property list, for example an NSDictionary mapping URLs to [username, password] pairs. It ought to be just as secure, because the sandbox prevents any other apps from being able to access the file.

Updated: What you’re left with is an encrypted flat-file database, specialized for just a few data types, with a really clunky and badly documented API. As I wrote above, its functionality could be duplicated, with a better API, without much effort. The encryption part is significant, though, since its primary purpose is to keep keys and passwords safe. A DIY key database could be protected by encrypting it with a symmetric key, and then putting that key in the Keychain.

I’m not sure where that leaves MYCrypto. I’m not sure I’m motivated to write a general-purpose version of this data store myself. If I can get some good answers and workarounds to my Keychain API bug reports, I may just continue to tough it out.

Chatty + MYNetwork
May 24th, 2009 by jens

As foreshadowed, I’ve created a modified version of the Chatty iPhone sample app, which uses the MYNetwork library instead of custom networking code. You can get it off of Bitbucket.

Apple Never Promised Us It Wouldn’t Be Evil
May 22nd, 2009 by jens

Here is the latest absurdity to come out of Apple’s deeply, endemically fucked-up App Store approval process: Jamie Montgomerie’s Eucalyptus app, an e-book reader that can download public-domain books from Project Gutenberg—about the most innocuous thing you could imagine, right?—gets rejected not once but three times for containing “obscene, pornographic, offensive or defamatory content”.

Leaving aside the issue of whether Apple has any business deciding what constitutes obscenity (a task that’s driven grown Supreme Court justices to drink)—
And leaving aside also the fact that Apple’s censors have three times now been too dim to comprehend that the application does not contain any books, obscene or otherwise, but downloads them from the Internet much like Safari—
No, the really outrageous issue is that the supposed obscenity here consists of a text-only English translation of the Kama Sutra. Apple specifically called out some pages of steamy advice for “when a man wishes to enlarge his lingam”. (No, really.) [1]

Now, Richard Burton had to get his 1883 translation of this ancient text printed privately when no publishers would accept it, but that was in the Victorian era. The current authoritative translation is nowadays published by that infamous smut peddler, Oxford University Press. Much harder-core fare like Ulysses and Lady Chatterley’s Lover—books which go so far as to use recognizable English names of the naughty bits—were judged after some controversy to be free of obscenity in the 1930s. By 1970 the obscenity statutes had been lifted from nearly all printed material, and nowadays anything goes—take a look at some of the e-books available for sale on the iTunes store.

Montgomerie has now, humiliatingly, been driven to self-censorship: his latest message to Apple states “I have now submitted a new version that specifically blocks access to the Kama Sutra book you identified. Is this what you mean?”

[Update, May 24: Montgomerie reports that on the 23rd he “received a phone call from an Apple representative. He was very complimentary about Eucalyptus.” The whole matter was, of course, resolved, and Eucalyptus is now available for purchase. A happy ending, certainly, and congratulations on the release! ...But I don’t believe it invalidates my argument below. After all, this isn’t the first time an outrageous rejection has been reversed after mass humiliation of Apple. It’s the overall default policies and behaviors, and their chilling effects, that I’m complaining about, and there’s still no sign of those changing.]

The “E” word.

I don’t think anyone but a card-carrying member of the Christian Coalition or Taliban would disagree that this was a stupid decision on Apple’s part. (And it was a considered decision, not a ‘glitch in the approval process’, given that Apple repeated it twice.)

I feel the need to step up to a stronger word. How does evil sound?

Hear me out. I’m not talking “evil” as in killing babies or nuclear blackmail, rather in the sense it’s meant in Google’s corny motto “Don’t Be Evil”, or alluded to in the older proverb “With great power comes great responsibility”. But yes, I do mean “evil” as malign, the opposite of good, etc. etc.

Last year Apple put itself into an ethically very delicate situation with the App Store, by creating a market in which it has the sole power to make 3rd party software available (or to take it away). As has been amply discussed before, iPhone developers have no choice (if they want to be iPhone developers) but to put tremendous effort into developing the product, only finding out at the very end whether or not Apple will let it be sold.[2]

There are definitely some good reasons for such a model, primarily that it helps keep the platform secure from malware, and that by preventing piracy it allows developers to collect a lot more revenues per user, allowing them to set prices far lower than those in other software markets.

But in return Apple had the obligation to be very, very careful to be ethical, upright and transparent in its dealings with developers and the public, to minimize the dangers (of censorship, of conflicts of interest, of stifling innovation) inherent in its position.

And being Apple, it completely and utterly fucked it up. Because it’s in Apple’s genetic code to be about as transparent as a lead brick. This has always annoyed the press, and it has frequently enraged developers, who suffer from the consequences of blank silence from Apple in between carefully-scripted WWDC keynotes and PR-scrubbed announcements. But in the context of the App Store, Apple’s inscrutability and arbitrariness have become actively malign.

Evil is as evil does.

I’m not saying that Apple is evil; it isn’t run by bluestockings or monopolists or cackling supervillains. But evil is a result of what you do [3], and actions are not excused by good intentions; in the real world those who do evil (excepting psychopaths) uniformly believe they’re working for the good.

Apple’s App Store approval process has, over the past year, shown that the company is:

  • acting like a Victorian-era book censor;
  • quashing competition by blocking apps that improve on Apple’s products;
  • blocking innovation by denying 3rd party apps access to the user’s legally-owned data (such as MP3s);
  • attempting to deny end-users the freedom to do what they want with the hardware they bought and paid for (viz. its current efforts to have jailbreaking declared illegal);
  • and causing undue hardship to small developers by arbitrarily withholding their ability to sell the apps they’ve developed.

Nor has Apple engaged in the slightest bit of dialog with its developers and users to work through any of these issues. The best that’s happened is that, after much public ridicule, Apple has without comment released some apps that it had previously blocked.

Maybe you think “evil” is too strong or melodramatic or exaggerated a word for this. Then feel free to substitute something that has fewer loaded connotations to you—“unethical” or “anticompetitive” or whatever. But if you’re one of those who, like me, has applied the “e” word to the past actions of Microsoft, or to groups that try to ban books from libraries, then I think there’s really no option but to use the same blunt language here and now.


[1] By these standards, Apple should have banned its own Mail app, too. It sends me these kinds of lingam-enlargement messages all the time.

[2] This is arguably worse than the console video-game industry’s similar monopoly on approving games, because those companies listen to developer pitches up-front before development. (Also, this kind of restraint is much nastier when applied to all types of software, including books, than just to games.)

[3] This semantic distinction is one I’m not sure Google gets either. “Don’t Do Evil” would have been a better motto. But in its defense, Google does in practice seem to understand its responsibilities and is admirably open about its actions.
