Doing application updates via version-control
July 4th, 2008 by jens

I just had an interesting idea, brought on by a post to an Apple developer list asking about software-update mechanisms for Mac applications. The library everyone uses for this is Sparkle, which is wonderful in all ways except bandwidth usage: it updates the app by downloading an entire zip archive of the new version. With many apps nowadays being 10MB or even 100MB downloads, that’s pretty significant.

This could clearly be improved a lot by downloading a delta instead, then using that to patch the current copy of the app. In most cases, using a good algorithm like xdelta3 or zdelta, the data transmitted will be orders of magnitude smaller than the entire app. (Nothing new here; many app updaters already do this, especially for games, and Microsoft apparently has some sophisticated delta-based software update tools in Windows.)
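To make "download a delta, then patch" concrete, here is a toy delta encoder in Python. It is not xdelta3 or zdelta. It uses the standard library's `difflib.SequenceMatcher` to describe the new file as "copy" ranges taken from the old file plus literal "insert" bytes, loosely resembling the COPY/ADD instructions of the VCDIFF format (RFC 3284) that xdelta3 emits. The function names are hypothetical, and the matching is far too slow for a 100MB app; it is only meant to show the shape of a delta.

```python
from difflib import SequenceMatcher

# A delta is a list of ("copy", start, length) and ("insert", literal_bytes) operations.

def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as byte ranges copied from `old` plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The client already has these bytes; only their position is sent.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes the client lacks; these are what actually cost bandwidth.
            ops.append(("insert", new[j1:j2]))
        # "delete": bytes of `old` that are simply never copied.
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file plus a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

Patching 7 bytes in the middle of a 4,000-byte file yields a delta whose only literal payload is at most those 7 bytes; everything else is a few copy instructions.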

Of course, the delta to be downloaded is specific to which version of the program you already have, as well as which one you’re updating to. This means the server will need to keep a number of deltas on hand, and it and the client need to negotiate which one to use.
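One hypothetical way to handle that negotiation: the client reports the version it has, and the server searches the deltas it keeps on hand for the shortest chain leading to the latest release, falling back to the full zip when no chain exists. The version strings, the function name, and the choice of breadth-first search (which minimizes the number of deltas, not their total size) are illustrative assumptions, not how Sparkle or any real updater works.

```python
from collections import deque

def choose_deltas(available, installed, latest):
    """Pick the shortest chain of stored deltas that takes `installed` to `latest`.

    `available` is an iterable of (from_version, to_version) pairs the server has
    deltas for. Returns the pairs to download in order, [] if the client is
    already current, or None if no chain exists (fall back to the full zip).
    """
    if installed == latest:
        return []
    edges = {}
    for src, dst in available:
        edges.setdefault(src, []).append(dst)
    came_from = {installed: None}   # breadth-first search => fewest deltas
    queue = deque([installed])
    while queue:
        version = queue.popleft()
        for nxt in edges.get(version, []):
            if nxt in came_from:
                continue
            came_from[nxt] = version
            if nxt == latest:
                # Walk back to the installed version, then reverse into download order.
                chain = []
                while came_from[nxt] is not None:
                    chain.append((came_from[nxt], nxt))
                    nxt = came_from[nxt]
                return chain[::-1]
            queue.append(nxt)
    return None
```

With deltas for 1.0→1.1, 1.1→1.2, 1.0→1.2 and 1.2→1.3 on the server, a 1.0 client would be sent 1.0→1.2 and then 1.2→1.3.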

An additional problem with using deltas for updating Mac applications is that, on OS X, an app isn’t a single file but a directory tree masquerading as a file. This means that a patch would have to consist of a tree of deltas, one per file.
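A minimal Python sketch of the first half of that job: walk the old and new bundle trees, hash every file, and classify each path as added, removed, or modified. Only the modified files would need a per-file delta; added files must be sent whole, and removed ones deleted. The function names are hypothetical, SHA-1 content hashes are assumed to be good enough for change detection, and symlinks, permissions, and extended attributes get no special handling, although a real bundle updater would have to preserve them.

```python
import hashlib
import os

def manifest(root):
    """Map each file's path (relative to root) to the SHA-1 of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result

def diff_trees(old_root, new_root):
    """Return sorted (added, removed, modified) relative paths between two trees."""
    old, new = manifest(old_root), manifest(new_root)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, modified
```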

I started toying with ways of implementing this, but a minute later my brain chimed in with the observation “Hey Jens! This is just like what a version-control system does when updating a working tree from a repository!” That’s what I love about my brain: some fraction of the factoids I cram into it daily will percolate inside and pop out later on at some useful moment. Thanks, brain!

Here’s My Idea.

Release your application in the form of a working tree of a distributed version-control system [DVCS] like Mercurial (or Git, if you must.) That is, the app bundle’s “Contents” directory has a “.hg” subdirectory containing the usual Mercurial metadata. This is sort of unusual in that you’ve checked in the compiled app rather than the source code; but modern DVCSs work fine with binary files.

You also maintain on your server a repository containing all the versions of the application. Whenever you release a new version, you simply check it into that repository as an update of the previous release.

Now, when a copy of the app wants to update itself, it simply does an “hg pull” (or equivalent) of itself from the repository URL. This efficiently determines which files have changed and, for each file, which deltas to download and apply. And it does this without affecting the running app, because the deltas go into the metadata in the “.hg” directory. Then, when the pull is complete, the app can prompt the user that it’s time to relaunch, run “hg update” to patch the actual files, and quit and relaunch itself.
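Here is a rough Python sketch of that two-phase flow, shelling out to the `hg` command-line tool. It assumes `hg` is installed or bundled with the app and that the bundle's “Contents” directory is the working copy; the function names and the injectable `runner` parameter are hypothetical, not part of Sparkle or Mercurial. Quitting and relaunching are left out.

```python
import subprocess

def run_command(cmd):
    """Run a command and return its exit status (a fake can be swapped in for testing)."""
    return subprocess.run(cmd).returncode

def fetch_update(repo, url, runner=run_command):
    """Phase 1, while the app keeps running: pull new changesets into .hg only.

    Returns True if an update was downloaded, False if the app is already current.
    """
    # `hg incoming` exits 0 if the remote has changesets we lack, 1 if it has none.
    status = runner(["hg", "-R", repo, "incoming", "--quiet", url])
    if status == 1:
        return False
    if status != 0:
        raise RuntimeError("hg incoming failed with status %d" % status)
    # `hg pull` writes only to the .hg metadata; the running app's files are untouched.
    if runner(["hg", "-R", repo, "pull", url]) != 0:
        raise RuntimeError("hg pull failed")
    return True

def install_update(repo, runner=run_command):
    """Phase 2, after the user agrees to relaunch: patch the working files."""
    # --clean overwrites any locally modified file instead of trying to merge into it.
    if runner(["hg", "-R", repo, "update", "--clean"]) != 0:
        raise RuntimeError("hg update failed")
```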

This has a number of really nice features:

  1. Checking whether an update is present is quick (DVCSs are optimized for this).
  2. Downloading updates uses minimal bandwidth because only compressed deltas are sent.
  3. The DVCS automatically handles updating from any old version.
  4. Users can even downgrade easily to previous versions. (In fact, switching between previously-downloaded versions can be done offline, since the DVCS metadata retains all the necessary diffs!)
  5. This mechanism can easily handle beta versions. By treating a beta as a branch, betas can easily be made “opt-in”, with users able to switch between beta and stable mode.
  6. If one user has downloaded an update, another nearby user could efficiently apply the update by pulling directly from that user’s copy of the app over the LAN, instead of having to go back to the original server. (Or alternatively, the DVCS can package the update into a patch-file that can be sent to the other computer and applied there.)
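Features 4 through 6 correspond to stock Mercurial commands. The Python sketch below only assembles the argument lists an updater might pass to something like `subprocess.run`; the function names, the “stable”/“beta” branch names, and tags named after version numbers are hypothetical conventions, not anything Mercurial or Sparkle defines.

```python
def switch_channel(repo, branch):
    """Feature 5: check out the newest revision on a named branch such as "beta"."""
    return ["hg", "-R", repo, "update", "--clean", branch]

def downgrade(repo, tag):
    """Feature 4: go back to an already-pulled release; works offline."""
    return ["hg", "-R", repo, "update", "--clean", "-r", tag]

def serve_to_lan(repo, port=8000):
    """Feature 6: publish this copy over HTTP so a nearby machine can `hg pull` from it."""
    return ["hg", "-R", repo, "serve", "--port", str(port)]

def export_update(repo, peer_revision, bundle_file):
    """Feature 6, offline variant: write the changesets a peer lacks into a file."""
    # --base names a revision the destination is assumed to already have.
    return ["hg", "-R", repo, "bundle", "--base", peer_revision, bundle_file]

def import_update(repo, bundle_file):
    """Add a received bundle's changesets; an `hg update` afterwards applies them."""
    return ["hg", "-R", repo, "unbundle", bundle_file]
```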

The only drawbacks I can see to this are:

  1. The DVCS software needs to be available on every user’s machine. Since none of these ship with the OS, it would probably have to be packaged inside the app as part of the software-update library. That’s a significant amount of code (Mercurial is about 2MB, and I think Git is a lot bigger.)
  2. The local app’s revision metadata is a full repository that includes prior versions. You clearly don’t want to ship it that way. I believe DVCSs all support a way of pruning old revisions from a repository, though.

I’m not promising to implement this, but it’s so cool that someone needs to give it a try. And if it’s been done already, I’d be interested to hear about how well it worked.


38 Responses  
  • Daniel writes:
    July 4th, 2008 at 11:00 AM

    Interesting idea, except one issue:
    Not every dev has a version-control repo available, so what about building it around Bazaar (which works over FTP)?

    It would be cool if you’d join forces with Andy:
    http://andymatuschak.org/articles/2008/06/01/a-guide-to-contributing-to-sparkle/

  • Daniel writes:
    July 4th, 2008 at 11:02 AM

    Just thought about codesigning…
    Could cause problems; on the other hand, the “new signing” would probably be included in the diff.

  • Clark Cox writes:
    July 4th, 2008 at 11:04 AM

    What a great idea.

    But why not just use plain ol’ svn? While I can see the advantages of a DVCS in that situation, I would think that the fact that svn is included with Leopard would outweigh many of the advantages.

  • Andy Matuschak writes:
    July 4th, 2008 at 11:07 AM

    That’s a really neat idea.

    I’d totally accept a patch to Sparkle that worked this way! :)

    In the meantime, delta updates are near.

  • Matthieu Cormier writes:
    July 4th, 2008 at 11:19 AM

    It is a good idea, especially allowing users to revert versions.

    However, reverting can be tricky if changes were made to the user’s data when they ran the new version. I’ve been doing this transparently with Core Data migration. Rolling back these changes opens a whole new can of worms.

    For a simple first implementation there might have to be constraints imposed by the developer as to which version you can roll back to.

  • Elliott Harris writes:
    July 4th, 2008 at 12:08 PM

    To my knowledge, most VCSes that ship with a bundled diff utility don’t have a solid binary diff. If you were using something like Mercurial, you could use an external binary diff app, such as the ones you’ve listed.

    I’d be worried about the actual binary merge though, and how well that’d work. Most of the time to do a binary merge, you need to know everything about the file format (Mach-O Executable being the tough one here), and how to not completely hose the file during a merge.

    Then again, if both of these problems were easily solvable, it isn’t hard to imagine that support for this within Sparkle could be easily implemented.

  • Jesus writes:
    July 4th, 2008 at 12:26 PM

    Hey, nice problem to think about, but release your app instead of thinking about updating it. Bandwidth is cheap and I doubt that Cloudy is more than 10 MB.
    But to add another thought on this: how about using rsync (I don’t know if it does binary merges, though) or a simple hashing scheme like BitTorrent’s, which downloads only the parts that changed, using the existing copy as a reference?

    J.
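    The BitTorrent-style hashing suggestion, sketched in Python under stated assumptions: split the file into fixed-size pieces, hash each piece (BitTorrent v1 uses SHA-1 per piece), and download only the pieces whose hashes differ from the local copy. The 16 KB piece size and the function names are arbitrary choices for illustration.

```python
import hashlib

PIECE_SIZE = 16 * 1024  # arbitrary for illustration; torrents typically use larger pieces

def piece_hashes(data, size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece, as in a BitTorrent v1 .torrent file."""
    return [hashlib.sha1(data[i:i + size]).digest() for i in range(0, len(data), size)]

def pieces_to_fetch(local, remote_hashes, size=PIECE_SIZE):
    """Indices of remote pieces that differ from the local piece at the same position."""
    local_hashes = piece_hashes(local, size)
    return [
        i for i, digest in enumerate(remote_hashes)
        if i >= len(local_hashes) or local_hashes[i] != digest
    ]
```

    A one-byte edit in place costs a single piece, but inserting even one byte near the front shifts every later piece boundary, so every piece after it looks changed. Handling such shifts is what rsync's rolling checksum is for.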

  • Jens Alfke writes:
    July 4th, 2008 at 1:28 PM

    Whew! Great comments.

    @Daniel — Mercurial makes it easy to set up a server-side repo by uploading a single Python CGI script. I believe there is also a facility for “dumb” pulls over plain HTTP. Code-signing should work fine as long as you can keep the VCS metadata (like the “.hg” dir) outside the tree that gets scanned; I’m not sure if this is possible.

    @Clark — You’re right, SVN should work too, it’s just more difficult to set up the server-side repo. But having it included in the OS is a big point in its favor.

    @Matthieu — Good point; this would definitely exercise the app’s file-versioning code more than a typical upgrade path.

    @Elliott — Binary diffs are quite reliable on all file types. You’re right that merging is problematic, but that’s only an issue if the local file has been modified, which shouldn’t be the case with an app. (Or if it got changed somehow, the VCS can just do a “revert” on it before applying the update.)

    @Jesus — Don’t worry, I’m still working on Cloudy! My mind’s just prone to tossing up random ideas like this, and they can be useful in the future so I like to run with them for at least an hour or two while they’re fresh in my mind.

  • fluffy writes:
    July 4th, 2008 at 1:38 PM

    I would just use rsync. It handles binary and tree merges and doesn’t need to maintain a delta list - it just looks to see which parts of the tree are different with a fairly clever two-level algorithm (md5 for the file as a whole, and then a sliding-window CRC-esque thing to determine within a KB or so of where a change occurred).
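    For the curious, here is a simplified Python sketch of the rsync idea. rsync's weak checksum is a cheap Adler-32-style pair of sums that can be "rolled" forward one byte at a time, so blocks of the old file can be found at any offset in the new one; candidate hits are then confirmed with a strong hash (MD5 here; rsync itself has used MD4 and, in newer versions, MD5). Real rsync also skips ahead a whole block after a match and sends literal data for the gaps; both are omitted, and the function names are hypothetical.

```python
import hashlib

M = 1 << 16  # both halves of rsync's weak checksum are kept modulo 2**16

def weak_checksum(block):
    """a = sum of the bytes; b = sum of the bytes weighted by distance from the end."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one position right in O(1) instead of O(n)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def find_block_matches(old, new, n):
    """Map each offset in `new` where a block of `old` occurs to that block's index."""
    table = {}
    for index, start in enumerate(range(0, len(old) - n + 1, n)):
        block = old[start:start + n]
        table.setdefault(weak_checksum(block), []).append((index, hashlib.md5(block).digest()))
    matches = {}
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    offset = 0
    while True:
        # Cheap weak-checksum lookup first; compute the strong hash only on a hit.
        for index, strong in table.get((a, b), []):
            if hashlib.md5(new[offset:offset + n]).digest() == strong:
                matches[offset] = index
                break
        if offset + n >= len(new):
            return matches
        a, b = roll(a, b, new[offset], new[offset + n], n)
        offset += 1
```

    If five bytes are prepended to a 4 KB file, every 512-byte block moves, yet all eight blocks are still found, at offsets 5, 517, 1029, and so on.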

  • Pedro Melo writes:
    July 4th, 2008 at 1:41 PM

    Hi,

    The idea is good, but you’ll have problems with most version-control systems out there. Although they are optimized to transfer deltas (except Git, which always transfers gzip’ed full objects), none of the ones mentioned supports deltas of binary blobs.

    In the case of binary blobs, SVN keeps full copies; hg I don’t know well, and I could not understand their explanation at http://www.selenic.com/mercurial/wiki/index.cgi/BinaryFiles. As for Git, I’m sure it keeps the entire file for each revision. That’s the way Git works: it always keeps the full file.

    So in the end you’ll be downloading entire files, and unless you have a lot of frameworks that don’t change much, you won’t save a lot.

    The original plan was the best one: simple and effective enough, an xdelta of the ZIP file.

    You might gain something with a zip of xdeltas but I’m not sure.

    Of course, you could use a different SCM that understands binary blobs better. I personally don’t know of any (cvs, darcs, svn, git, tla, and arch are the ones I know well, and they don’t do what you want).

    Best regards,

  • Pedro Melo writes:
    July 4th, 2008 at 1:43 PM

    Hi, sorry to double post, but fluffy hit the jackpot. Rsync (and librsync) will do exactly what you want.

    Best regards,

  • Jens Alfke writes:
    July 4th, 2008 at 2:05 PM

    @fluffy — rsync is definitely a possibility; I know it works well on binary files because I’ve used it to sync MP3 libraries (it’s very fast when only the ID3 tags have changed.) I’m concerned about how much CPU horsepower it would use on the server, though, compared to a VCS.

    @Pedro — Mercurial uses the same diff algorithm on all files, binary or not. It’s line-oriented, so in a binary file it ends up finding the 0x0A or 0x0D bytes and treating everything between them as “lines” that it compares. Supposedly it does a reasonable job in most cases, certainly better than copying the whole file. But it’s (apparently) also possible to specify an external diff tool, like xdelta.
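    To see how a line-oriented diff behaves on binary data, here is a rough Python approximation. It uses `difflib` rather than Mercurial's own diff code, and the function name is hypothetical: the bytes are split into "lines" at line-ending bytes, and the cost is the total size of the new "lines" that differ.

```python
from difflib import SequenceMatcher

def line_diff_cost(old, new):
    """Bytes a line-oriented diff must send to turn `old` into `new`.

    bytes.splitlines() ends a "line" at 0x0A, 0x0D, or the pair 0x0D 0x0A, so in a
    binary file the line lengths depend on where those byte values happen to fall.
    """
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(
        len(b"".join(new_lines[j1:j2]))
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("replace", "insert")
    )
```

    A one-byte edit costs only the "line" containing it (31 bytes in a file made of 200 lines of 31 bytes each), while a file with no line-ending bytes at all is a single "line" that must be resent whole. Uniformly random bytes hit 0x0A or 0x0D about once every 128 bytes on average, so real binaries fall somewhere between those extremes depending on how often those values occur.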

  • fluffy writes:
    July 4th, 2008 at 6:38 PM

    Well, as another pro-rsync data point, Fink has been using an rsync server to distribute Fink updates for years. They started out using just CVS (which of course is purely diff-based) but they found that was too slow and they prefer people to use the rsync server, and this isn’t even for binaries but for source patches and the like.

    Of course, if you’re just doing binary releases, I don’t see what’s so hard about just having the app be smart enough to know which build version it’s at and request a difflist for releases since that build, with the diffs built via xdelta or the like.

  • Jens Alfke writes:
    July 4th, 2008 at 7:56 PM

    @fluffy — Read my paragraph #4. What’s hard is that an application on OS X isn’t a single file but a directory tree, so the patch would have to consist of a set of diffs for each modified file in the tree, with special cases for added and removed files. Which isn’t rocket science, but once I started musing on possible file formats for such a diff-list I realized that this kind of patching is exactly what any VCS does when it updates, so why re-invent the wheel?

  • fluffy writes:
    July 4th, 2008 at 8:03 PM

    I was aware of that bit, yes. I assumed that the difflist would contain diffs for multiple files, not just one big monolithic one.

  • Jean-Daniel Dupas writes:
    July 5th, 2008 at 12:34 AM

    I have only one thing to say about bundle update: Don’t bother with binary diff.

    It is far too much pain to handle, and has no real benefit for bundles. In a bundle, each file contains one resource. That’s not like OS 9, where updating one window or the application’s icon altered only one small part of a monolithic file (or fork, if you want). Now, updating the icon changes only the icon file, updating a window changes only one nib file, and so on.
    You can significantly reduce the size of an update by writing an updater that replaces whole files (instead of the whole bundle), and it will be far easier to implement, especially for creating a combo update and supporting atomic updates.
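    The whole-file approach, sketched in Python under stated assumptions: the updater already knows which files were added, changed, or removed (for instance from a combo list covering every release since the installed one), rebuilds the bundle in a staging copy beside it, and swaps it in with renames. The function name, the “.updating”/“.previous” suffixes, and the in-memory `new_files` mapping are all hypothetical; a real updater would also have to deal with permissions, ownership, and code signing.

```python
import os
import shutil

def apply_whole_file_update(bundle, new_files, removed):
    """Apply a whole-file update to `bundle` via a staging copy.

    new_files: relative path -> complete new contents (added or changed files)
    removed:   relative paths that no longer exist in the new version
    """
    bundle = os.path.normpath(bundle)      # so the sibling names below are well-formed
    staging = bundle + ".updating"
    backup = bundle + ".previous"
    for leftover in (staging, backup):     # debris from an interrupted earlier attempt
        shutil.rmtree(leftover, ignore_errors=True)
    shutil.copytree(bundle, staging, symlinks=True)  # keep framework symlinks as links
    for rel in removed:
        os.remove(os.path.join(staging, rel))
    for rel, data in new_files.items():
        dest = os.path.join(staging, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(data)
    # Swap directories with two renames: the bundle path briefly does not exist,
    # but it never contains a half-written mix of old and new files.
    os.rename(bundle, backup)
    os.rename(staging, bundle)
    shutil.rmtree(backup)
```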

  • technorati.com/people/technorati/lordgilman writes:
    July 5th, 2008 at 2:34 AM

    On the topic of reinventing the wheel, Firefox has a binary diff update service that works with Mac OS X bundles, and they’ve got a pretty comprehensive wiki page about it as well.

  • Marc-Antoine Parent writes:
    July 5th, 2008 at 4:03 AM

    zsync is similar to rsync, but shifts the computation burden from the server to the client.

  • Pedro Melo writes:
    July 5th, 2008 at 4:44 AM

    Going back to the original problem, it boils down to the size of the download.

    Is this the major problem to solve?

    I was wondering whether using BitTorrent to distribute the full ZIP, or even an xdelta patch, would be enough.

    It would ease the load on the server…

    Best regards,

  • Jens Alfke writes:
    July 5th, 2008 at 8:29 AM

    @Jean-Daniel: I disagree. Binary diff is easy to do; I’ve used zdelta in the past, which is small and quite straightforward to use. And while you’re right about bundles having lots of small files, the glaring exception is the app executable itself. For example, I cracked open Pixelmator (because it updates often and is an annoyingly big download), and its Contents/MacOS/Pixelmator is 26MB! In many cases, if only localized changes are made to the code in a bug-fix release, a delta would be only a few kbytes in size; that’s a big win.

    @lordgilman, Marc-Antoine: Wow, thanks! Those are both really interesting things I’d never heard of. It’s ironic how writing a blog helps you hear about cool stuff… =)

    @Pedro — I think download size is a problem, but mostly on the client side. But maybe I just feel that way because I have one of those slow American broadband connections (a meager 1.5 megabits per second)!
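    To put that connection speed in perspective, here is the arithmetic for the 26MB Pixelmator executable mentioned above versus a delta of a few kilobytes. The 5 KB delta size is an assumed figure, 1 MB is taken as 10^6 bytes, and protocol overhead and latency are ignored.

```latex
\begin{aligned}
t_{\text{full}}  &= \frac{26 \times 10^{6}\ \text{bytes} \times 8\ \text{bits/byte}}{1.5 \times 10^{6}\ \text{bits/s}}
                  = \frac{208 \times 10^{6}}{1.5 \times 10^{6}}\ \text{s} \approx 139\ \text{s} \\
t_{\text{delta}} &= \frac{5 \times 10^{3}\ \text{bytes} \times 8\ \text{bits/byte}}{1.5 \times 10^{6}\ \text{bits/s}} \approx 0.027\ \text{s}
\end{aligned}
```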

