Coroutines, pt. 2
[I just posted these questions to Apple’s darwin-userlevel mailing list, but they’re also worth cc:ing here as a follow-up to my last post.]
I’ve been experimenting this week with coroutines. Typically these are implemented as a type of cooperative thread, since each coroutine needs a separate stack and register set. I adapted Steve Dekorte’s libCoroutine, which basically just uses ucontext, with malloc’ed stacks.
It strikes me that ucontext is basically no lighter-weight than a pthread, in terms of address-space usage and context switch speed. Is that true? Or is there additional overhead to pthreads besides the stack + registers?
If so, then it might be simpler just to use pthreads, since the API is already in place, and existing system facilities (like ObjC and C++ exceptions, and Cocoa autorelease pools) already know how to work with them. But the cooperative scheduling of coroutines is a bonus in some ways, as it makes the flow of control more deterministic and reduces the need for complex locking and synchronization.
So my second question is whether there’s a clean way to implement cooperative scheduling of pthreads1, i.e. to have a set of threads that transfer control within themselves only via some sort of “yield” call, not by pre-emption? I’ve seen this implemented, for tutorial purposes, in Java using a shared lock. However, a coroutine transfers control to an explicitly-named other coroutine; it’s not at the whim of the thread scheduler. How would one implement that in pthreads? I’m guessing it would involve a lock per thread; but I’m not familiar with the details of the pthreads primitives or API, so advice would be appreciated.
—Jens
1 Yes, I’m aware that cooperative scheduling means you don’t get all the performance benefits of multicore CPUs. I’m not concerned about that because, frankly, my application code uses minimal CPU time; it’s always waiting for sockets, CoreAudio background threads, or user input. So the benefits of threading my code aren’t worth the complexity and error-prone-ness of thread-safety.
May 3rd, 2008 at 9:57 AM
I believe the shared lock technique is also used for Thread Manager threads under OS X, for non-tutorial purposes.
At the Cocoa level, what you’re asking could be done with a condition lock. Each task would be given a numeric ID, and use that as a condition for -lockWhenCondition:. Yielding to a specific routine would be implemented on -unlockWithCondition:, and round-robin scheduling could be easily built on top of that.
I assume there’s a pthread-level equivalent to NSConditionLock, but I haven’t looked.
May 3rd, 2008 at 10:02 AM
Oho! I had been thinking about using a lock per thread, but your idea of a numeric ID on a single lock is very clever. Thanks!
(And yeah, probably no need to use pthreads API if I can do it with Foundation classes.)
May 3rd, 2008 at 10:19 AM
I think pthreads have a lot more overhead than ucontext. You might not be interested in huge numbers of co-routines, but when you create a thread you get its /proc entries, it gets references to all the file descriptors, preemptive context switching might occur during floating point operations, which forces saving the FPU state. From a pure size, do a sizeof(uncontext_t) and then imagine all the places in the kernel that know about your threads… Also, the kernel tracks use/system space execution time for threads.
That said, it hard to imagine coros NOT wreaking havoc on the obj-c runtime!
May 4th, 2008 at 12:17 AM
We’ve had to implement our own super light-weight threading library (just cooperative of course, through a “yield” call) by using ucontext as a university assignment. Cool to see that the pros really are doing this kind of stuff as well :)
Anyway, I don’t remember the details, but I do remember that pthreads has a significant overhead when compared to ucontext.
May 4th, 2008 at 3:40 AM
Probably not quite what you were thinking, but have you looked at Lua? Coroutines were added in Lua 5. Lua is written in 100% ANSI C, so it’s extremely portable. Since Lua’s strength is as an embedded language, you could consider factoring your code such that you invoke Lua to take the responsibility of directly creating/managing coroutines, and then have each coroutine call back into Obj-C to do your real work.
Chapter 9 of the Programming in Lua Book (version 5.0 free online) covers different examples of using coroutines in Lua.
May 4th, 2008 at 8:21 AM
E.Wing — Yes, I’ve been interested in Lua for a while. In the PubSub framework we used Lua as the templating engine for converting news articles into HTML pages for Safari RSS to display.
I don’t believe Lua’s coroutines affect the native (C) execution context, though. They just switch Lua’s interpreted stack around. So calling back into native code from a Lua coroutine wouldn’t be anything like using native coroutines.
May 8th, 2008 at 10:27 AM
Eigenclass had a good survey recently of lightweight threads packages: http://eigenclass.org/hiki/threadring-with-protothreads
May 8th, 2008 at 2:04 PM
Warren — Thanks for the link! Yes, I’ve heard about protothreads-style designs before; they exploit a weird (ab)use of C switch statements that’s called a “Duff’s Device”.
The problem with these is that none of the local variables in your functions are preserved across context switches! It’s really just using a ‘goto’ to jump back into the middle of the function when control returns. They’re definitely as light-weight as you can get, but I’ll bet they’re very hard to program for.