your sword is glowing with a faint blue glow
May 1st, 2006 by jens

I dabbled in Interactive Fiction, aka Text Adventures, long ago: I played Adventure on my Apple ][ and Dungeon/Zork on a VAX; I wrote a primitive game in BASIC; later, in college, I partially implemented a language for building games in yacc; and after graduating, my first serious Mac program was a souped-up and nearly finished version of that language. After that I was too busy with “real” jobs, but others kept the flame alive even after Infocom tanked, building their own adventure-design languages like TADS and Inform and spawning a cult scene of increasing complexity and literary merit. I kicked the tires of TADS and Inform a few years back, then got distracted by other shiny things. You know how it is.

Anyway: now I turn around and there’s Inform 7, a thing of splendor beyond my dreams. Not only does it have an IDE with a really interesting form of integration testing, but the syntax itself has become an ambitious attempt at natural language. I haven’t started coding yet—I have a dreamlike apprehension that the whole concept will melt like cotton-candy if I touch it—but as an example here is an unmodified section of the source code of a real game that I’ve just been playing:


Section 2 – Smells

A thing has a property called scent. The scent of a thing is usually “nothing”.

A procedural rule: ignore the block smelling rule.

Carry out smelling something:
    say “From [the noun] you smell [scent of the noun].”

Instead of smelling a room:
    if a scented thing can be touched by the player, say “You smell [the list of scented things which can be touched by the player].”;
    otherwise say “The place is blissfully odorless.”

Definition: a thing is scented if the scent of it is not “nothing”.

Before printing the name of something scented while smelling a room: say “[scent] from the ”

Now that’s wild!

Signing XML
Mar 10th, 2006 by jens

I’d just begun to muse about signing Atom/RSS articles when Johannes Ernst began blogging about the topic. I had assumed there must be some easy, standard way to do it; it turns out there is a standard, but (according to Johannes) it’s so far from easy that it’s nearly unusable.

 (The problem in a nutshell: Digital signatures operate on raw data, so to sign something you have to be able to convert it to a sequence of bytes to stream through the signature algorithm. Crucially, to verify the signature you have to be able to convert the something you received into the exact same sequence of bytes. That’s no problem for JPEGs or HTTP bodies. But XML describes an abstract tree of nodes and attributes, with many possible text representations for the same data. If you parse some XML and then turn those data structures back into XML, the text will probably not be exactly the same. Specifying a canonical way to textualize an XML document turns out to be really hard since it has to take into account namespaces, entities, whitespace, character encodings and more. Yeesh!)
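To make the problem concrete, here is a minimal sketch in Python (3.8 or later, for the standard library's `xml.etree.ElementTree.canonicalize`). The two sample documents are made up for illustration; they describe the same tree but differ in attribute order, quoting and whitespace. Hashing the raw text gives different digests, while hashing a canonical form gives the same one. This only touches the easy cases; it says nothing about the harder ones like entities and character encodings.

```python
import hashlib
from xml.etree.ElementTree import canonicalize

# Two made-up serializations of the same abstract tree.
doc_a = '<entry b="2" a="1"><title>Hi</title></entry>'
doc_b = "<entry a='1'   b='2'><title>Hi</title></entry>"

def digest(text: str) -> str:
    """Hash the UTF-8 bytes of a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The raw byte sequences differ, so a signature over one would not verify on the other.
raw_match = digest(doc_a) == digest(doc_b)

# Canonicalizing first (C14N 2.0) maps both documents to the same text.
canon_match = digest(canonicalize(doc_a)) == digest(canonicalize(doc_b))

print("raw digests match:", raw_match)
print("canonical digests match:", canon_match)
```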

The more I think about this the more worried I get. We are increasingly using XML-based formats for communication: Atom, RSS, Jabber. These formats contain multiple messages in the same document. Distributing these messages may involve copying them from one document into another: for example, when news feeds are aggregated or articles forwarded, and whenever a Jabber message is routed through a server. If we care about the integrity and identifiability of these messages (and the lesson from the current death-throes of email is that we damn well should), we need to sign them, and the original signatures of course need to travel with the messages. But when an XML message/article/entry element is copied from one place to another, its physical manifestation as a byte-sequence will typically change … leading to the XML-signatures quandary.

Johannes’s suggested XML-RSig (“Really Simple Signature”) solution is to avoid transforming the message’s byte sequence. The bytes to sign are the encoded characters from the opening “<” to the closing “>”, and the new sub-element containing the signature is spliced in by character insertion. Anyone copying the message to another XML document has to use an X-Acto knife to cut out the exact message text and insert it into the destination, rather than allowing it to be transformed in any way by an XML processor. (In fact, the destination document even needs to have the same character encoding.)
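Here is a rough sketch of the byte-splicing idea, not of the actual XML-RSig format, whose details I'm not reproducing. The assumptions are loud ones: the `<rsig>` element name is hypothetical, and an HMAC with a shared demo key stands in for a real public-key signature so that the example runs on the Python standard library alone.

```python
import hashlib
import hmac
import re

KEY = b"demo-shared-secret"  # assumption: stand-in for a real signing key

# Hypothetical signature element: <rsig> followed by 64 hex digits.
SIG_RE = re.compile(rb"<rsig>([0-9a-f]{64})</rsig>")

def sign_entry(entry: bytes, closing_tag: bytes) -> bytes:
    """Sign the exact bytes of `entry`, then splice the signature in
    just before the final closing tag by plain byte insertion."""
    sig = hmac.new(KEY, entry, hashlib.sha256).hexdigest().encode("ascii")
    cut = entry.rindex(closing_tag)
    return entry[:cut] + b"<rsig>" + sig + b"</rsig>" + entry[cut:]

def verify_entry(signed: bytes) -> bool:
    """Cut the signature element back out and check the remaining bytes."""
    match = SIG_RE.search(signed)
    if match is None:
        return False
    original = signed[:match.start()] + signed[match.end():]
    expected = hmac.new(KEY, original, hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(match.group(1), expected)

entry = b"<entry><title>Hi</title></entry>"
signed = sign_entry(entry, b"</entry>")

# An XML processor might legally rewrite "<title>" as "<title >"; the tree is
# unchanged but the bytes are not, so verification fails.
reserialized = signed.replace(b"<title>", b"<title >")
```

With this sketch, `verify_entry(signed)` succeeds and `verify_entry(reserialized)` fails, which is exactly why the copy has to be byte-for-byte.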

 I have mixed emotions about this. On the one hand, it certainly is clear and simple. (See Johannes’s list of benefits at the bottom of the post.) What bothers me:


  1. It requires a recipient to keep the original source byte-sequence of the message, if it might ever need to forward/aggregate the message, or even re-verify the signature later. That means altering its storage schema for messages to add a potentially-large blob.

  2. Conversely, it has to generate the new document by splicing in the original message contents. If it normally uses an XML-generation API, that might be awkward to do.

  3. Adding any XML sub-elements or attributes to the original message breaks the signature. A specific example is the “atom:source” element that is added to an Atom entry when it’s copied to another feed, to preserve the identity and metadata of the feed it came from.

I don’t have any good suggestions at this point. I’m writing this as a brain-dump. I’m posting it here because (a) Johannes’s blog doesn’t allow comments, and (b) it seems to be the Blog Way to reply to other people’s posts on your own front page, even if it’ll baffle the rest of your readers…

Just Like The Cool Kids
Jan 7th, 2006 by jens

Like most geeks, as a kid I not only despised the Cool Kids but also wanted to be one of them. My own school-age development trajectory took me from a state of total ignorance of what that required[1], to brave attempts to fit in[2], to a realization that different was cool[3].

Anyway: these days being a Cool Kid is within every geek’s reach. Perhaps that’s because the shared culture has exploded into an uncountable number of fragments, each of which is a tribe with its own parallel hierarchies of coolness. Amen to that.

Lesser-known scripting languages
Jun 29th, 2005 by jens

Just when it seemed, a decade ago, that the programming world had settled on C++ as the lingua franca, the One Language To Rule Them All, instead we got an explosion of new high-level languages that have risen to popularity. Why did this happen? Chiefly because the World-Wide Web has conditioned users to expect five-second delays before any responses to their actions, which provides an environment ideally suited for interpreted, garbage-collected scripting languages. This movement has been encouraged by server vendors like Sun and IBM who are eager to show Web developers the productivity increases they can get by using such languages, especially after they then install massively powerful servers.

Lua and unique strings
Jun 28th, 2005 by jens

Lua is an interesting scripting language. I can’t say I have much familiarity with it; I’ve only read the book, and a couple of papers, and downloaded and built the interpreter (which takes less than a minute). But what I’ve seen of it gives me a warm feeling, like reading a concise little poem, a haiku. It’s a small language, but what’s there is well-considered, and it appears that you can build bigger things (like object models, whether class- or prototype-based) out of its building blocks pretty easily.

The implementation of the Lua 5.0 runtime is also interesting, as described in an excellent paper. One of the smaller details that’s been fascinating me is that Lua, it turns out, uses unique string objects.

When you use any kind of garbage-collected (or ref-counted) framework, string objects accumulate like dust bunnies. I’ve profiled Java apps and seen tens of thousands of instances of java.lang.String. Cocoa apps also have large numbers of NSStrings lying around. Some of these are temporary strings that just haven’t been garbage-collected (or drained from an autorelease pool) yet; but I think a lot more of them are duplicates.

Lua’s approach, which I have to say I hadn’t thought of before, is to make all string instances unique: there are never two different string objects with the same contents. In practice this means that the runtime maintains a big hash table (a Set) of weak references to all the string objects, so before it creates a new string it can first check if one with the same contents already exists. This seems kind of expensive. But consider the advantages:

  • You save memory because there are never duplicate copies of a string lying around. This is a big deal: I know that OS X in particular tends to be RAM-constrained, and often the best way to make something run faster is to reduce its footprint.
  • Allocating fewer objects means less work for the garbage collector.
  • Comparing strings, an extremely common operation, is practically free: it’s a simple pointer compare.
  • It’s easy to implement the Big Hashtable such that the hash codes live inside the string objects themselves. This speeds up the use of these strings as keys for other hash tables, since you don’t have to spend time hashing them over and over. This is a big deal in Lua since its only data structure is a PHP-like combination hashtable/array, which is used everywhere.
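The scheme above can be sketched in a few lines of Python. This is an illustration of the idea, not how Lua's runtime actually does it: the `UniqueString` class and its field names are made up, and Python's `weakref.WeakValueDictionary` plays the part of the big table of weak references. The hash code is computed once and stored inside the object, and equality is just an identity check.

```python
import weakref

class UniqueString:
    """Hypothetical wrapper: at most one live instance per distinct contents.
    Treat instances as immutable; mutating `text` would break uniqueness."""
    __slots__ = ("text", "hash_code", "__weakref__")

    # Weak table: an entry disappears once nothing else references the string.
    _table = weakref.WeakValueDictionary()

    def __new__(cls, text):
        existing = cls._table.get(text)
        if existing is not None:
            return existing           # reuse the one instance with these contents
        obj = super().__new__(cls)
        obj.text = text
        obj.hash_code = hash(text)    # computed once, stored in the object
        cls._table[text] = obj
        return obj

    def __hash__(self):
        return self.hash_code         # no re-hashing when used as a dict key

    def __eq__(self, other):
        return self is other          # comparison is a pointer compare

a = UniqueString("he" + "llo")
b = UniqueString("hello")
same_object = a is b
```

Here `a` and `b` end up as the same object, so only one entry lives in the table no matter how many times "hello" is requested.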

One of the secondary issues, which only occurred to me after a little further reflection, is that this really works only if strings are all immutable. Otherwise you can take one string and change it to be identical to another string, yet still a distinct object. This is a bit of a drawback; I have to say that I prefer the Cocoa API, where the immutable NSString has a subclass NSMutableString, to the Java API where the mutable StringBuffer class is not type-compatible with the regular String. (Speaking of which, a Lua tech note presents a typically elegant way to implement an efficient appendable string buffer.)

I wouldn’t be at all surprised if the unique-strings approach predates Lua, perhaps going back to some primordial string-processing language like SNOBOL. Still, it was new to me, and it’s been interesting to think about.
