SIDEBAR
»
S
I
D
E
B
A
R
«
I’m Building Me A B-Tree
August 14th, 2009 by jens

The other day I took it into my head to implement a B+tree. Why? Because they sound neat, and I’ve done hardly any serious programming with trees in my career. (Someone, I think Buzz Andersen, once noted that there are two kinds of programmers: those who do think in terms of trees, and those who do everything with hash tables. I’m in the latter camp.)

And also because I’m a big fan of CouchDB, and really admire its elegant storage model. It’s an on-disk B-tree—no surprises there—but the file is append-only, which both makes it impervious to crash-related corruption, provides nearly lockless concurrency, and makes it easy to access earlier revisions.

[In a nutshell: Updated data values or tree nodes are appended to the file instead of overwriting the earlier versions. Since updating a node changes its location, its parent node needs to be updated too to point to the new location. This recurses up the tree, meaning any change ends up with a new root node written at the very end of the file. In fact, when you open the file you find the root by looking at the very end. Since no data is ever changed, once you open the file you’re impervious to changes made by other writers since they don’t affect anything you’re looking at.]

I’d love to use something like that for various projects, but as CouchDB is implemented in the exotic functional language Erlang, I can’t really use its storage layer as-is. So: could I implement something like it in C++?

Thus far I have a working in-memory B+tree implementation. Inserts and deletes work, and I’m working on the iterator. Even this much was harder to get working than it should have been, or so it feels. But that always seems to be true—algorithms sound straightforward when you read about them, but putting them into practice exposes you to all the details and subtleties inherent in hand-waving like “now merge the node with a neighbor”.

Actually I haven’t implemented a straight B+tree, rather a ‘top-down’ variant described by Ohad Rodeh that’s better suited to this type of application because it changes fewer numbers of interior nodes during an update.

What’s next?

  • Support for string keys (so far it just handles ints)
  • Serializing nodes to/from disk
  • Keeping track of which nodes are touched during an operation, and appending those to the file
  • Writing a trailer to the file to mark a successful update (and to link back to the previous trailer, for historical purposes.)

Sounds straightforward, but of course the devil’s in the details.


8 Responses  
  • Cody Brimhall writes:
    August 14th, 200911:46 AMat

    The last time I used C++ was back in University, where we were instructed to write (part of) a B-Tree. To make things easier on us, we didn’t have to implement a couple of funtions, delete() in particular—which disappointed me. The devil’s in the details, but so’s the fun!

  • Dave Camp writes:
    August 14th, 200911:49 AMat

    Back in the day, I implemented my own B*Tree code in C that mirrored the functionality of the HFS structures. It was for DiskFix, which was a disk repair utility that competed with Norton Utilities on the Mac (back when drive corruption was common in the System 7/8/9 days).

    Fun! But also not Fun.

  • Jens Alfke writes:
    August 14th, 20092:38 PMat

    Cody — Yeah, the deletion part is the hardest. Especially because the top-down type of B+tree doesn’t link the leaf nodes together.

    (C++ is a little awkward, but I might want to use this for stuff at work.)

  • dete writes:
    September 5th, 200910:51 AMat

    Any chance this stuff will get open-sourced?

    We’ve got a couple of internal projects at my work that could really make use of a peer-to-peer database solution. (I didn’t even know that’s what we needed, but when I read about Prophet, I realized it was ideal…)

    Anyhow, Prophet is Perl, which is embeddable, but hardly ideal for, say, an iPhone application. I considered rolling my own, and it would ideally be built on an append-only database like you’re talking about here (having an arbitrary history horizon is necessary for syncing).

    The upshot is that I’d love to look at what you’re doing here!

    BTW - I hates me the C++, but I agree that for code that requires portability, it’s probably the best choice.

  • Jens Alfke writes:
    September 5th, 20096:45 PMat

    dete — I hadn’t heard of Prophet before, so thanks for the link! I’ve just opened the home page, and it looks pretty interesting. (Perl, though, yuck.)

    Yes, my work’s going to be open-sourced, as soon as it’s stable. What I have now runs, but I keep refactoring all the code and jiggling the file format, so it’d be frustrating for anyone else to use.

    In the past few weeks I’ve been focusing more on a hashtable version of the same thing, because I think we’re going to need that sooner at work, but the tree stuff is nearly as far along. (In principle both should be able to coexist in a file at the same time, with some more work.)

  • dete writes:
    September 7th, 200910:07 AMat

    Yeah, Prophet seems promising… if one can easily bring the ideas into C/C++/Obj-C. Perl seems an odd choice these days, I thought Python was all the rage for “annoying to use in real software” development. (Haters: I’m kidding… mostly.)

    If you don’t announce the release of the DB stuff, please shoot me an e-mail when it’s available. Thanks!

  • jens writes:
    September 7th, 200910:21 AMat

    Perl’s not too popular these days, but everyone’s got their favorite language and I’m not going to begrudge them it. It does have the advantage that most OS’s already come with the runtime installed, which is more than you can say for CouchDB (written in Erlang.)

    I wasn’t able to find any real information about Prophet, other than the slideshows and some sketchy notes in the doc/ subfolder. Do you know of any?

  • dete writes:
    September 7th, 200911:36 AMat

    No; as far as I can tell there IS no real information about Prophet except in the code and the originator’s heads. To be fair, it’s obviously early days for the project and I’d rather they release something without docs than not release, but it’s hard not to be impatient: They’ve built something that I want to use RIGHT NOW… :-)


»  Substance:WordPress   »  Style:Ahren Ahimsa