Jul 4 2008

Doing application updates via version-control

I just had an interesting idea, brought on by a post to an Apple developer list asking about software-update mechanisms for Mac applications. The library everyone uses for this is Sparkle, which is wonderful in all ways except bandwidth usage: it updates the app by downloading an entire zip archive of the new version. With many apps nowadays being 10MB or even 100MB downloads, that’s pretty significant.

This could clearly be improved a lot by downloading a delta instead, then using that to patch the current copy of the app. In most cases, using a good algorithm like xdelta3 or zdelta, the data transmitted will be orders of magnitude smaller than the entire app. (Nothing new here; many app updaters already do this, especially for games, and Microsoft apparently has some sophisticated delta-based software update tools in Windows.)

Of course, the delta to be downloaded is specific to which version of the program you already have, as well as which one you’re updating to. This means the server will need to keep a number of deltas on hand, and it and the client need to negotiate which one to use.
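To make the negotiation concrete, here is a minimal sketch in Python. It assumes the server publishes an index keyed by (installed version, target version); the index format, the file names, and the `choose_download` helper are all hypothetical, not part of Sparkle or any existing updater.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index published by the server:
# (installed_version, target_version) -> name of the delta file.
DeltaIndex = Dict[Tuple[str, str], str]


def choose_download(index: DeltaIndex, installed: str, latest: str,
                    full_archive: str) -> Optional[str]:
    """Return the file the client should fetch, or None if it is up to date."""
    if installed == latest:
        return None
    # Prefer a delta built for exactly this version pair; if the server
    # has none on hand, fall back to the complete archive.
    return index.get((installed, latest), full_archive)


index = {
    ("1.0", "1.2"): "app-1.0-to-1.2.xdelta",
    ("1.1", "1.2"): "app-1.1-to-1.2.xdelta",
}
from_1_1 = choose_download(index, "1.1", "1.2", "app-1.2.zip")
from_0_9 = choose_download(index, "0.9", "1.2", "app-1.2.zip")
print(from_1_1)
print(from_0_9)
```

The fallback matters: a client running a version the server no longer keeps a delta for still gets an update, just a bigger one.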

An additional problem with using deltas for updating Mac applications is that, on OS X, an app isn’t a single file but a directory tree masquerading as a file. This means that a patch would have to consist of a tree of deltas, one per file.

I started toying with ways of implementing this, but a minute later my brain chimed in with the observation “Hey Jens! This is just like what a version-control system does when updating a working tree from a repository!” That’s what I love about my brain: some fraction of the factoids I cram into it daily will percolate inside and pop out later on at some useful moment. Thanks, brain!

Here’s My Idea.

Release your application in the form of a working tree of a distributed version-control system [DVCS] like Mercurial (or Git, if you must.) That is, the app bundle’s “Contents” directory has a “.hg” subdirectory containing the usual Mercurial metadata. This is sort of unusual in that you’ve checked in the compiled app rather than the source code; but modern DVCSs work fine with binary files.

You also maintain on your server a repository containing all the versions of the application. Whenever you release a new version, you simply check it into that repository as an update of the previous release.

Now, when a copy of the app wants to update itself, it simply does an “hg pull” (or equivalent) of itself from the repository URL. This efficiently determines which files have changed and, for each file, which deltas to download and apply. And it does this without affecting the running app, because the deltas go into the metadata in the “.hg” directory. Then when the pull is complete, the app can prompt the user that it’s time to relaunch, then run “hg update” to patch the actual files and quit and re-launch itself.
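As a rough sketch of that two-phase flow, the updater could shell out to the `hg` command-line tool. This is only an illustration in Python: the bundle path, the repository URL, and the helper names are made up, and it assumes an `hg` executable is available to the app.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(cmd))


def hg_command(contents_dir: str, *args: str) -> List[str]:
    # --repository (-R) tells hg which repository root to operate on.
    return ["hg", "--repository", contents_dir, *args]


def fetch_update(contents_dir: str, repo_url: str, runner: Runner = run) -> bool:
    """Phase 1: pull new changesets into .hg; the app's files are untouched.

    True means the pull succeeded, not that a newer version exists.
    """
    return runner(hg_command(contents_dir, "pull", repo_url)) == 0


def apply_update(contents_dir: str, runner: Runner = run) -> bool:
    """Phase 2: rewrite the files in the working tree (do this just before relaunch)."""
    return runner(hg_command(contents_dir, "update")) == 0


# Dry run with hypothetical paths: record and print the commands instead of running hg.
issued: List[List[str]] = []


def dry_run(cmd: Sequence[str]) -> int:
    issued.append(list(cmd))
    print(" ".join(cmd))
    return 0


contents = "/Applications/Example.app/Contents"        # hypothetical install location
fetch_update(contents, "https://example.com/updates/example", runner=dry_run)
apply_update(contents, runner=dry_run)
```

Keeping the two phases as separate functions mirrors the point above: the pull can run in the background at any time, while the update waits until the user agrees to relaunch.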

This has a number of really nice features:

  1. Checking whether an update is present is quick (DVCSs are optimized for this).
  2. Downloading updates uses minimal bandwidth because only compressed deltas are sent.
  3. The DVCS automatically handles updating from any old version.
  4. Users can even downgrade easily to previous versions. (In fact, switching between previously-downloaded versions can be done offline, since the DVCS metadata retains all the necessary diffs!)
  5. This mechanism can easily handle beta versions. By treating a beta as a branch, betas can easily be made “opt-in”, with users able to switch between beta and stable mode.
  6. If one user has downloaded an update, another nearby user could efficiently apply the update by pulling directly from that user’s copy of the app over the LAN, instead of having to go back to the original server. (Or alternatively, the DVCS can package the update into a patch-file that can be sent to the other computer and applied there.)

The only drawbacks I can see to this are:

  1. The DVCS software needs to be available on every user’s machine. Since none of these ship with the OS, it would probably have to be packaged inside the app as part of the software-update library. That’s a significant amount of code (Mercurial is about 2MB, and I think Git is a lot bigger.)
  2. The local app’s revision metadata is a full repository that includes prior versions. You clearly don’t want to ship it that way. I believe DVCSs all support a way of pruning old revisions from a repository, though.

I’m not promising to implement this, but it’s so cool that someone needs to give it a try. And if it’s been done already, I’d be interested to hear about how well it worked.


38 Responses to “Doing application updates via version-control”

  • Jean-Daniel Dupas Says:

    OK, I will have a look at zdelta. It will maybe help me to change my mind about binary diff ;-)

  • Robert Lee Says:

    We do something similar with SVN for automatic testing/deployments.

    Our test server (VM) automatically polls trunk every 15 minutes and runs the test suite against it when it changes. The deployment log and test results are sent to the dev mailing list. We also use this server for running performance tests, selenium tests, and so on. Trunk rarely changes as we use dev branches religiously (merging is always done by a pm when tickets are marked as resolved).

    Staging server (VM) polls for new tags every 15 minutes and does the same.

    Production polls for new tags every Wednesday morning at 2am. The deployment log is sent to the dev mailing list. We test this manually using a selenium test when we come in.

    In the repo we have a “change-scripts” folder for executable scripts. These are mostly for database updates (.sql files). These are executed during deployment as well.

  • Aaron Says:

    @Pedro I think you are wrong about Subversion’s binary file handling. See the Subversion 1.4 Release Notes and scroll down to “Binary Delta Encoding Improvements (client and server)”. They are using xdelta to compute differences between all files (text and binary) and use that for storage on the server as well as transfer over the wire.

  • sofoz (LJ) Says:

    Interesting idea. We are using SVN to promote changes to production. We have a repository of config files (different XML templates), so when we are done testing we just check into SVN, and the production offices sync their local data with SVN once an hour. This gives us full control over what has changed, when, and by whom.

  • Kai Backman Says:

    I did this for a live product a few years ago using subversion. In the end it turned out version control systems add a lot of things that aren’t strictly needed by a simple updater, plus the svn client libraries at the time had quite a few bugs. You also end up running svn on your webservers, which might or might not be a problem. In the end the svn-updater was replaced by a simple three-phase system where the client first downloaded an initial file listing hashes and paths for the most recent files, then downloaded all the changed individual files by comparing to locally calculated hashes, and finally did an atomic update when the product was started. The benefits were: downloading in the background, simple static files to publish on the webserver, and one or two orders of magnitude less code to debug. If I had to do it again I would still roll my own based on some similarly simple strategy and ignore delta updating unless it really turned out to be a problem.
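The comparison step of such a hash-manifest updater might look roughly like the Python sketch below. The manifest layout (relative path mapped to a hex digest) and the choice of SHA-256 are assumptions made for illustration; the comment does not say which hash or file format the real system used.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_download(install_root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest paths whose local copy is missing or has a different hash.

    `manifest` maps a relative path to the expected hex digest (assumed format).
    """
    needed = []
    for rel_path, expected in sorted(manifest.items()):
        local = install_root / rel_path
        if not local.is_file() or file_digest(local) != expected:
            needed.append(rel_path)
    return needed


# Small demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.bin").write_bytes(b"old build")
    (root / "help.txt").write_bytes(b"unchanged")
    manifest = {
        "app.bin": hashlib.sha256(b"new build").hexdigest(),
        "help.txt": hashlib.sha256(b"unchanged").hexdigest(),
        "extra.dat": hashlib.sha256(b"added in this release").hexdigest(),
    }
    needed = files_to_download(root, manifest)

print(needed)
```

Only the listed files would then be fetched as plain static downloads, which is what keeps the server side so simple.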

  • two_pi_r (LJ) Says:

    With Git, you don’t necessarily have to include the entire suite; you could just ship all the plumbing that the git-pull porcelain depends upon.

  • Jean-Denis Muys Says:

    Just one comment on Rsync. It has the nasty feature of using its own network port. This means the app could not update itself from any workplace (such as mine) that blocks most network ports.

    Case in point: I could not install, and can not update, my MacPorts database from work. Fortunately, my Mac is a MacBook Pro, and I simply do it from home. If I had a desktop Mac, I would be screwed.

    And I will not even start to tell you about the mail exchange with the support department at my company about opening the rsync port. It would make everybody cry.

    So please restrain yourself to the plain old port 80. Yeah, it’s a shame.

  • tom Says:

    I think the guys from Oddlabs have used Subversion to add an update feature to their game “Tribal Trouble” … http://tribaltrouble.com

  • jakub Says:

    Hi,
    I originally posted the question on the Apple developer list.

    I think the idea is nice, but I would rather prefer a straightforward system of doing things:

    1. In configuration give the app a server and name of RSS file describing new versions.
    2. the app automatically checks for and downloads the needed update files.
    3. automatically unpack the download
    4. automatically run a script altering the app tree if needed and applying binary or any other diffs in any format you want (e.g. bsdiff - a great tool for small binary diffs)

    Updating from v1.0 to 1.5 will go something like download update to 1.1, 1.2, 1.3, 1.4, 1.5 and run all the scripts in that order. I know it doesn’t support downgrades, but e.g. with antivirus software, downgrade is a thing you really don’t want to let your customers do.
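The chained update described here can be sketched as a small planning function in Python. The release list and the `plan_updates` name are hypothetical; the sketch only works out which steps to run and in what order, not how each patch is applied.

```python
from typing import List, Tuple


def plan_updates(releases: List[str], installed: str) -> List[Tuple[str, str]]:
    """Return the (from, to) steps, in order, from `installed` to the newest release.

    `releases` must be listed oldest first.
    """
    if installed not in releases:
        raise ValueError(f"unknown version: {installed}")
    start = releases.index(installed)
    # Pair each release with its successor: 1.0->1.1, 1.1->1.2, ...
    return list(zip(releases[start:], releases[start + 1:]))


releases = ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5"]
steps = plan_updates(releases, "1.0")
for old, new in steps:
    print(f"download and apply update {old} -> {new}")
```

A client several versions behind pays for every intermediate step, which is the trade-off of this scheme compared with a delta built for one specific version pair.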

    I think using svn, rsync or whatever can be difficult for the software providers. Suppose you have a product used by millions of users that needs updating every day (e.g. antivirus again). Running your own reliable rsync server is really a complicated task.

    On the other hand, there are some providers (e.g. akamai) who specialize in this and can make your updates available to the world reliably. But these companies bill you per megabyte of data downloaded from their servers. So you just want to post update files that are as small as possible, not deploy a whole rsync server.

  • Pedro Melo Says:

    @Aaron, I didn’t know about binary delta in svn 1.4. I stand corrected.

    Thanks,

  • n[ate]vw Says:

    Daniel mentioned code signing in the very second comment, and from what I can tell it won’t be an issue. I was worried that the presence of the repository metadata (which could vary based on local configuration) would mess up the signing, but you can omit parts of your bundle from being verified.

    Apple themselves point out you can patch the signature file itself (http://developer.apple.com/documentation/Security/Conceptual/CodeSigningGuide/Procedures/chapter_3_section_5.html). So code signing should, as Daniel said, not be a problem.

    That said, bending a version control system to update an app doesn’t seem like a great win for me. You don’t need merge tracking, shelving, cloning, pushing and could even go without history. All most apps need is a simple stack of binary diffs, treated like an Undo/Redo queue or even just a Redo queue. Seems like, compared to getting a SCM system updating seamlessly and efficiently, that’s not such a big deal.

  • John Joyce Says:

    How about something more practical?
    Just improve Sparkle.
    If your update is so large and it is a concern, (Apple’s own range from 5mb - 1 or 2 gb!)
    Simply break the file into parts to download (like RAR does) then download, reassemble, confirm success, install.
    Version control is too unreliable.

  • Clark Cox Says:

    John,

    how is version control unreliable? Things like this are well within the primary use of version control (i.e. distributing only the small deltas when something changes so as to not have to upload/download the entire blob of data).

    Breaking the file into parts does nothing to solve the problem, unless the parts are small enough, or the changes localized enough, that most of the parts do not have to be redownloaded. After implementing the code to decide which parts to download, and which parts not to download, you’ve basically reinvented the wheel.

  • Jens Alfke Says:

    @John — What Clark said. Breaking into pieces only makes the total download [slightly] larger. Version control is extremely reliable; zillions of developers rely on it every day, and modern systems use digests to verify the integrity of updates.

  • donutello Says:

    This is very close to a problem that I’m trying to solve right now. I’m trying to limit the size of my patch downloads but I need to do it with a static installer because of IT requirements. I have already released several versions of the product:

    0.0 -> 0.1 -> 1.0 -> 1.1 -> 2.0

    The arrows denote the patches I’ve already released, which are built using Package Maker and essentially include an entire copy of the files that have changed. These are huge. I’d like to release a single update to update everything up to the latest respective version. One thing I’ve considered is to create several diffs:
    diff0.0-0.1, diff0.1-1.0, diff1.0-1.1, diff1.1-2.0

    Then my update will consist of applying these diffs in succession. The diff technology needs to be able to update an entire tree and needs to succeed even for a partial tree. What that means is that if someone removed a file foo.txt from their install, what they should end up with is version 2.0 without the file foo.txt. Also, this should work regardless of what version you have right now so applying diff0.0-0.1 to someone who has 0.1 or 1.0 should do nothing.

    I started looking at rsync yesterday and it appears to do exactly what I want. I have run into several bugs in the version of rsync that’s shipped with Leopard but I suppose I could always ship a newer version of rsync with my updater. I’m going to continue investigating rsync.

    My question is: Are there any other binary diff technologies that I should consider? Using rsync seems a little like using a battle tank to flatten a house - it will do the job but is not exactly what it was designed to do.

  • Jens Alfke Says:

    donutello: Hm, I don’t know of a tool that’ll do exactly that. For generating and applying deltas, xdelta (http://xdelta.org) is I think the best that’s out there. I believe the tool can generate and apply diffs to an entire directory tree. If it tries to patch a file that’s been deleted by the user, the patch obviously can’t succeed, but I don’t know whether the tool treats this as a fatal error or not. (In any case it’s open source so you could modify it to ignore missing files.)

    There are a couple of “update maker” apps for Windows that do exactly what you want … they’re often used by PC game vendors distributing updates. I haven’t heard of a Mac version of any of them.

  • donutello Says:

    Jens, thank you very much for your response. I ended up abandoning using a tree diff solution and instead will be using bsdiff/bspatch for just the executable binaries and shipping all the files for everything else. bsdiff/bspatch are awesome for diffing of executable binaries. The diffs generated are tiny and bspatch comes with all Macs.
