The Subtle Dangers Of Distributed Objects
Introduction: I wrote this as part of a reply on Apple’s bonjour-dev mailing list, then decided it might be worth publishing more visibly. I’ve found that Cocoa’s Distributed Objects technology is immediately attractive to many developers, while those who’ve used it end up finding that it’s much more complex than it looks. But I haven’t seen much written about the caveats of using it.
I am not saying “don’t use DO” or “DO is broken”! It has valid uses, and it works as designed. But you should be aware of the less-obvious complexities. If you have a single GUI app and a single background agent, that’s a great use-case. If the agent communicates with multiple apps (like the iChatAgent), things get trickier. If you’re going to use DO over the network, you’ve got to be really, really careful.
Distributed Objects is not as simple as it looks at first glance, especially for use over a network. Here are some of the issues I’ve run into:
- Ref-counting bugs can be really hard to track down, because remote objects can be holding onto references to your local objects via their proxies. It’s possible to have reference-loops that span two machines! Note that this means a buggy client can cause its server to leak memory.
- Any message sent to a remote object can potentially throw an exception if there’s a network problem or the remote peer disconnects. To make your app robust you have to handle all such exceptions and clean up gracefully. (A nasty case of this that I’ve seen is where a client’s crash causes the server to crash, which then causes all the other clients to crash…)
- Sending a non-oneway message to a remote object blocks the thread indefinitely until the remote peer sends a response. This effectively lets the peer hold your thread hostage, and can cause your app to lock up if the peer is overloaded or buggy or actively hostile. You can also end up with deadlocks that span multiple computers—good luck debugging those! (In the app I shipped that used DO, we ended up using only oneway messages in our API for this reason.)
- Even oneway message sends can fail if DO’s send buffer fills up. The Mach queue has a fixed size; I’m not sure whether this applies to TCP too. We found it necessary to build a wrapper layer for sends that would catch the resulting exception and re-send the message after a delay.
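As a rough sketch of the exception and blocking points in the list above, the helper below wraps one synchronous call in @try/@catch and sets the connection's request and reply timeouts, so that a silent peer raises an exception instead of blocking the thread forever. The Greeter protocol, SafeGreeting, LocalGreeter, and the five-second limits are all made up for illustration; only the NSConnection timeout setters and the exception handling are real API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical interface vended by the remote peer (not a real API).
@protocol Greeter
- (NSString *)greetingForName:(NSString *)name;
@end

// Sends one synchronous message defensively. Returns nil if the call
// raised for any reason (timeout, dead connection, misbehaving peer).
// `connection` is the proxy's NSConnection; nil is fine for local objects.
static NSString *SafeGreeting(id<Greeter> target,
                              NSConnection *connection,
                              NSString *name) {
    // Arbitrary 5-second limits, chosen only for this sketch.
    [connection setRequestTimeout:5.0];
    [connection setReplyTimeout:5.0];
    @try {
        return [target greetingForName:name];
    } @catch (NSException *e) {
        // Log and report failure instead of letting the exception escape.
        NSLog(@"Remote call failed: %@ (%@)", e.name, e.reason);
        return nil;
    }
}

// Local stand-in, present only so the helper can be exercised
// without a network peer.
@interface LocalGreeter : NSObject <Greeter>
@end

@implementation LocalGreeter
- (NSString *)greetingForName:(NSString *)name {
    // Raises NSInvalidArgumentException if name is nil.
    return [@"Hello, " stringByAppendingString:name];
}
@end
```

With a real proxy, the connection would typically come from [(NSDistantObject *)proxy connectionForProxy]. Every call site still has to decide what a nil result means, which is part of why the API stops looking like ordinary message sends.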
It gets worse if you’re using DO over a network. In most cases, especially in a P2P app, you have to consider the possibility of malicious peers. (Even if your app will only be used in controlled environments, a buggy peer can have similar effects.) This means you can’t trust any input you get from a peer without validating it first. A distributed object API can be really dangerous in such an environment because it blurs the line between local/trusted and remote/untrusted code and data. It makes it harder to identify the points in your code where you have to verify.
Here are some of the possible security problems:
- Any remotely-accessible method has to handle arbitrary parameter values without ill effects. If it takes an NSString*, it has to survive being passed nil. If that causes a crash, it’s a denial-of-service attack. If it throws an exception, you have to make sure all of your code cleans up state on the way out, otherwise corrupted state could lead to denial of service or worse.
- If a remotely-accessible method allocates nontrivial amounts of memory (like creating new objects), then a malicious peer could call it in an infinite loop and run your app out of memory, most likely crashing it. Another DOS attack.
- It’s easy to fall into the assumption that a remote object behaves the way your implementation of it says it does. This isn’t true, in the malicious case, because an attacker could implement their own version of the same interface with arbitrary behavior. A particularly stupid example would be a RemoteClient interface with a boolean isLoggedIn property. You expect that this will return NO until you set it to YES, but what if someone implemented it to always return YES?
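One way to make the verification points in the list above explicit is to pass every remotely supplied argument through a small validator before using it. The following is only a sketch: ValidatedName and the 64-character limit are hypothetical, and a real app would need checks that match its own data.

```objc
#import <Foundation/Foundation.h>

// Arbitrary limit chosen for this sketch.
static const NSUInteger kMaxNameLength = 64;

// Returns a local copy of `candidate` if it is an acceptable name,
// or nil if it should be rejected.
static NSString *ValidatedName(id candidate) {
    if (candidate == nil) {
        return nil;                      // the peer passed nil
    }
    if ([candidate isProxy]) {
        return nil;                      // refuse proxies: every message to one
                                         // is another round trip to the peer
    }
    if (![candidate isKindOfClass:[NSString class]]) {
        return nil;                      // wrong type
    }
    // Take a local copy so later code works on a value we own.
    NSString *name = [NSString stringWithString:candidate];
    if (name.length == 0 || name.length > kMaxNameLength) {
        return nil;                      // empty or oversized input
    }
    return name;
}
```

A remotely accessible method would call ValidatedName first and bail out on nil before touching any state, so that bad input cannot leave the object half-updated.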
You can argue that this just calls for good unit testing and black-box testing as with any public API. Which is partly true; except that you can’t get away with simply stating that “nil values are not allowed for this parameter” or “the effect of calling this twice is undefined”. You have to expect anything. And worse, any bugs not found in testing are not just mundane customer-support issues, but potential priority-zero security holes that could cause really serious problems.
The end result of my experiences is that I don’t think I would use DO again. By the time you’ve refactored the API to be all-oneway, and written wrappers to delegate messaging to background threads, it doesn’t look like regular message-sends anymore. In other words, instead of writing

    result = [remoteObject doSomething: param using: param2];

by the time you’ve added the delegation, asynchrony and error handling you end up with something like:
NSError *error;
if (![dispatcher sendMessage: @selector(doSomething:using:)
                    toObject: remoteObject
                  withObject: param
                  withObject: param2
                      target: self
                      action: @selector(didSomething:)
                       error: &error])
    [self handleError: error];
// now keep going while you wait for the -didSomething: call
...
...
- (void) didSomething: (NSString*)result {
//...now handle the result
}
So you might as well use something lower-level to send the commands over the socket and save yourself a lot of complexity.
July 5th, 2009 at 4:44 PM
I’ve always considered DO to be a massively leaky abstraction, and this is further evidence in support. It would be much better to have an API that understands the differences between local and remote messages rather than trying to hide them.
July 6th, 2009 at 12:40 AM
“Sending a non-oneway message to a remote object blocks the thread indefinitely until the remote peer sends a response.”
Actually you can set up timeouts using setRequestTimeout: and setReplyTimeout:. You then have to catch the exception in case a timeout actually happens. That way you’re not blocking the thread indefinitely.
July 6th, 2009 at 6:24 AM
Actually, distributed objects have been difficult regardless of language and technology, whether it be CORBA, Java EJB, etc. I think the core of the problem is that most developers treat this interaction as if it were a standard in-process synchronous request-response. However, the complexity of the interaction has gone up substantially. For instance, as one commenter pointed out, what happens with timeouts? Should the interaction really be synchronous? Also, what if the distributed object is deployed horizontally: is state interaction required, and if so, how does state get updated across the horizontal deployment? What about interactions between the local and distributed objects? If all objects are local, the interaction is often very verbose and chatty. Do you still want to do this over the network to a DO?
I think there is a very important place in development for DOs. However, I think most deployments don’t consider the range of issues. Additionally, it is true that coding DOs can be far more complex, even with “code” generating IDE help.
July 6th, 2009 at 8:19 AM
David — I agree totally. The same issue even comes up in the SOAP-vs-REST debate. There’s a kind of idealistic belief that if you’re clever enough you can just abstract away the network and make distributed coding as simple as local coding. Real life is messier.
July 6th, 2009 at 1:41 PM
You forgot to mention byref/bycopy issues. I recently discovered that by default, whatever you ask for (byref or bycopy), DO will send references unless the object being sent overrides -replacementObjectForPortCoder: to return self.
The problem is that you have no way to know which classes override it and which do not. For instance, NSURL does not support ‘bycopy’ and is sent ‘byref’. But I think all the property list classes support bycopy.
July 6th, 2009 at 2:02 PM
Jean-Daniel: I don’t think that’s quite true. If an object implements NSCoding, it can be sent bycopy. (Because DO has to archive the object to send a copy over the wire.)
I remember the problem with NSURL, though! We definitely ran into that in the iChatAgent. It took us a while to discover that all URLs were being sent as proxies, resulting in tons of unnecessary IPCs. We had to change all of those parameters to NSStrings.
July 7th, 2009 at 12:53 AM
The NSObject doc is clear about it:
“NSObject’s implementation returns an NSDistantObject object for the object returned by replacementObjectForCoder:, enabling all objects to be distributed by proxy as the default.
Subclasses that want to be passed by copy instead of by reference must override this method and return self.”
Implementing NSCoding is not enough to tell DO that your object can be sent bycopy, as DO uses a coder that does not support keyed coding.
And I didn’t manage to figure out the exact behavior for ‘deep copy’. I have a case where [encoder isBycopy] returns YES for my top-level object and NO for second-level objects. I checked with GDB that it was the same NSCoder instance.
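For reference, the override described in the documentation quoted above might look roughly like the sketch below. The Label class and its text property are hypothetical and exist only for illustration; the sketch assumes ARC and uses unkeyed coding, since DO's coder reportedly does not support keyed coding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical value class used only for illustration. Assumes ARC.
@interface Label : NSObject <NSCoding>
@property (nonatomic, copy) NSString *text;
@end

@implementation Label

// Unkeyed encoding: values are written and read back in the same order.
- (void)encodeWithCoder:(NSCoder *)coder {
    [coder encodeObject:self.text];
}

- (instancetype)initWithCoder:(NSCoder *)coder {
    if ((self = [super init])) {
        _text = [[coder decodeObject] copy];
    }
    return self;
}

// Opt in to pass-by-copy: when the message asks for bycopy, send the
// object itself (archived) rather than the default NSDistantObject proxy.
- (id)replacementObjectForPortCoder:(NSPortCoder *)coder {
    if ([coder isBycopy]) {
        return self;
    }
    return [super replacementObjectForPortCoder:coder];
}

@end
```

Whether objects nested inside a copied object are themselves copied is a separate question, and this sketch does not try to answer it.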
July 7th, 2009 at 6:06 AM
Hi
Most distributed technologies work by reference. It doesn’t make sense to have the local application take over the object life cycle which would be the case with by copy. Typically the distributed environment takes over the responsibility for life cycle management so that local applications are “finding” or looking up the DO.
If you get a DO by copy then the local application is responsible for destroying or freeing the object up explicitly. All in all, this generally adds network traffic and/or increases distributed memory usage. Some application always messes up the life cycle management; that is not a criticism but a reality.
This is typical for most (if not all) other distributed technologies. From the perspective of the local application it should not matter as long as the distributed environment ensures consistency for each distributed call - meaning that the reference used by instance A does not interfere with the reference used by instance B. This consistency and separation has been guaranteed by the distributed environment that I have worked with. Again, from the local perspective it shouldn’t matter whether the distributed environment shares the same object with different local instances as long as there is consistency from each call to each instance. I know this sounds complicated but this is not unusual with distributed object pools which are also fairly common. Good luck.
David
July 7th, 2009 at 8:33 AM
I saw something awhile back about “myths of networking” or something like that, about why RPC/RMI/Rwhatever is complicated (“latency is zero”, “all clients are friendly”, etc.); this seems to be another instance of the same basic problem. Attempts to make remote calls look like local ones are probably always doomed.
July 7th, 2009 at 12:13 PM
Hi
I agree with the first sentence of the last comment: this is a recurring theme with many similar false mantras. I am not sure I quite agree that they are doomed.
Instead, I think that most environments lack the sophistication needed to make it work correctly. This has to do with many factors, but there is a dearth of developers/software architects with the requisite background, knowledge, and experience.
Any form of remoting is far more complex than most people realize. The “marketing” of it always glosses over the complexity in favor of the “hyped” benefits. There are other approaches and architectural styles that can accomplish remoting, but again these get more complex.
This has been a very fun topic to discuss.
David
March 3rd, 2010 at 10:24 AM
Jens, I agree very much so. In fact, once 10.6 came out and gave us blocks, I wrote a very thin wrapper on top of AsyncSocket that attempts to achieve much of the same “transparency” as DO when communicating over a network (it’s effectively overkill for local IPC) without pretending it’s something that it’s really not. The class is called SDConnection and is modeled closely on NSConnection, with a few exceptions and a lot more transparency (it’s built directly on top of AsyncSocket). You can get the source code at http://files.degutis.org/ and there’s more explanation of the pros and cons of using my API in this tumblog post: http://sdegutis.tumblr.com/post/313809023/asynchronous-objc-method-returning-over-a-network