Oct
19
2009
I’ve pushed a new update of my Ottoman storage library.
What’s new is the option to have individual additions to the dictionary written directly to disk, instead of being buffered in memory until you save a new version. This helps performance if you’re writing a ton of data all at once. And there’s a simple demo called PackDir that writes a ton of data all at once: it packs the contents of an entire directory tree into an Ottoman file, with keys being filenames and values being file contents.
To implement this I had to violate the lockless semantics a bit, because having multiple clients appending key/value pairs to the file at the same time would make it a lot harder to keep their revisions straight. So before being able to do any direct puts, you have to enter an [ugh] ExclusiveTransaction.
(Of course, the issues with locking don’t matter if the file is only being used by one process. Or even if there are multiple readers but only one writer, since the locks don’t interrupt readers.)
no comments | tags: ottoman, storage | posted in Computers
Sep
24
2009
I’ve pushed out a few updates to the Ottoman library today. These fix a couple of embarrassing bugs, and also add an API in regular ol’ Objective-C for those of you who aren’t into that funky “C++” stuff.
(The Cocoa stuff uses MYUtilities, so if you’re not already using that, please read the setup instructions.)
no comments | tags: ottoman, storage | posted in Computers
Sep
20
2009
It’s got a moose! But then I found the ottoman with hanging files inside and it was even more perfect…
I thought this one was pretty damn stylish too:
no comments | tags: ottoman, storage | posted in Computers
Sep
20
2009
Ottoman is a lightweight, reliable key-value store with multi-version concurrency control.

A key-value store is a persistent on-disk dictionary that maps arbitrary keys (usually short text strings) to arbitrary values (binary data of any length). There’s already a large and venerable family of libraries that do this, often referred to by the name “db”. Notable members include dbm, Berkeley DB, cdb and Tokyo Cabinet. What makes Ottoman different?
7 comments | tags: ottoman, storage | posted in Computers, Me
Sep
5
2009
I know, three weeks ago I said I was building me a B-Tree. I did build it, even the parts I listed under “What’s next?” in that post, and it works. But it became apparent there was a more urgent need for a hash table, for work-related reasons, so I switched gears to build one of those on the same principles.
The biggest principle is Append-Only Storage, as described in the prior post. So I thought back to the simplest on-disk hash table I know of—Dan Bernstein’s CDB—which is very clever, but read-only. I implemented something similar, and then mashed in the CouchDB-like approach of incrementally appending only the modified sub-components.
Initially I made the file a series of key-value pairs, followed by the hash-table index as an array of {hash code, position} structures, each of which pointed to the position of the corresponding key and value. Very simple. To save changes, I’d write out the changed pairs, followed by a new copy of the index. The problem with that was that the index gets large as the number of records increases, so with a 100,000-record file, changing even one record would append almost a megabyte.
I had a brainstorm about how to [...]
4 comments | tags: ottoman, storage | posted in Computers
Aug
14
2009
The other day I took it into my head to implement a B+tree. Why? Because they sound neat, and I’ve done hardly any serious programming with trees in my career. (Someone, I think Buzz Andersen, once noted that there are two kinds of programmers: those who do think in terms of trees, and those who do everything with hash tables. I’m in the latter camp.)
And also because I’m a big fan of CouchDB, and really admire its elegant storage model. It’s an on-disk B-tree—no surprises there—but the file is append-only, which both makes it impervious to crash-related corruption, provides nearly lockless concurrency, and makes it easy to access earlier revisions.
[In a nutshell: Updated data values or tree nodes are appended to the file instead of overwriting the earlier versions. Since updating a node changes its location, its parent node needs to be updated too to point to the new location. This recurses up the tree, meaning any change ends up with a new root node written at the very end of the file. In fact, when you open the file you find the root by looking at the very end. Since no data is ever changed, once you open the [...]
8 comments | tags: ottoman, storage | posted in Computers, Me