<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: 96 Characters Ought To Be Enough For Anyone</title>
	<atom:link href="http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/feed/" rel="self" type="application/rss+xml" />
	<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/</link>
	<description>Little boxes made of words, by Jens Alfke</description>
	<lastBuildDate>Sun, 14 Mar 2010 11:32:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2307</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Fri, 01 Feb 2008 22:42:57 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2307</guid>
		<description>@Andrew Thompson — You took the words out of my mouth. The mainstream Unicode APIs I&#039;m aware of (on Mac OS X, Windows, Java) basically do this. 16-bit Unicode isn&#039;t perfect, but people seem to have settled on it as the sweet spot where you get almost all the necessary functionality without so much pain. (There are APIs in CoreFoundation to deal with the edge cases like normalization, but I&#039;ve never had to use them, fortunately.)

On the other hand, 8-bit encodings are troublesome. There actually is a reason why Python and Ruby are putting in all the effort to get off of 8-bit strings. UTF-8 isn&#039;t generally too horrible, but there are too many not-uncommon text manipulations where you operate on characters and character positions, and then it becomes a pain in the ass. Worst of all, if you&#039;re an English-speaking developer, you don&#039;t hit most of those edge cases very soon, unless you&#039;re very careful about I18N testing. Then your app barfs in weird ways when you get users or installations in the rest of the world.

(And using HTML-style escaping as an encoding for working with 8-bit strings in memory, as &quot;Anonymous&quot; suggested, is just nuts. HTML-escaped text is nasty to deal with — even simple stuff like string comparisons becomes difficult and expensive, because there are so many ways to represent any one string.)</description>
		<content:encoded><![CDATA[<p>@Andrew Thompson — You took the words out of my mouth. The mainstream Unicode APIs I&#8217;m aware of (on Mac OS X, Windows, Java) basically do this. 16-bit Unicode isn&#8217;t perfect, but people seem to have settled on it as the sweet spot where you get almost all the necessary functionality without so much pain. (There are APIs in CoreFoundation to deal with the edge cases like normalization, but I&#8217;ve never had to use them, fortunately.)</p>
<p>On the other hand, 8-bit encodings are troublesome. There actually is a reason why Python and Ruby are putting in all the effort to get off of 8-bit strings. UTF-8 isn&#8217;t generally too horrible, but there are too many not-uncommon text manipulations where you operate on characters and character positions, and then it becomes a pain in the ass. Worst of all, if you&#8217;re an English-speaking developer, you don&#8217;t hit most of those edge cases very soon, unless you&#8217;re very careful about I18N testing. Then your app barfs in weird ways when you get users or installations in the rest of the world.</p>
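<p>A minimal Python sketch of the character-position problem with UTF-8, assuming a made-up sample string and a hypothetical helper name (neither comes from the thread). It shows character indexes and byte offsets drifting apart as soon as a non-ASCII character appears:</p>

```python
# Hypothetical sample text: "naive cafe" with two accented letters.
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

def utf8_byte_offset(s, char_index):
    """Return the UTF-8 byte offset at which the character at char_index starts."""
    return len(s[:char_index].encode("utf-8"))

# Character count and byte count differ: each accented letter takes two bytes.
char_count = len(text)
byte_count = len(encoded)

# The "c" is character 6 but starts at byte 7, because the i-diaeresis before it is two bytes wide.
offset_of_c = utf8_byte_offset(text, 6)

# Cutting the raw bytes at a character index can split a character in half.
try:
    encoded[:3].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
```

<p>Pure ASCII text never triggers the mismatch, which is why English-only testing tends to miss it.</p>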
<p>(And using HTML-style escaping as an encoding for working with 8-bit strings in memory, as &#8220;Anonymous&#8221; suggested, is just nuts. HTML-escaped text is nasty to deal with — even simple stuff like string comparisons becomes difficult and expensive, because there are so many ways to represent any one string.)</p>
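<p>A rough Python sketch of why comparing HTML-escaped strings is awkward. The sample strings and the helper name are assumptions for illustration; <code>html.unescape</code> is the standard-library call. The same text has several escaped spellings, so a comparison has to decode both sides first:</p>

```python
import html

# Three character references for the same letter: named, decimal, hexadecimal.
refs = ["eacute", "#233", "#xE9"]
# Four spellings of the same two-character text; the last one is unescaped.
variants = ["&" + ref + ";!" for ref in refs] + ["\u00e9!"]

def same_text(a, b):
    """Compare two HTML-escaped strings by decoding both of them first."""
    return html.unescape(a) == html.unescape(b)

# A plain string comparison treats the spellings as different strings.
raw_equal = all(v == variants[0] for v in variants)

# Decoding first shows that they all represent the same text.
decoded_equal = all(same_text(v, variants[0]) for v in variants)
```

<p>Every comparison pays for two decoding passes, which is the cost the comment is pointing at.</p>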
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Thompson</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2306</link>
		<dc:creator>Andrew Thompson</dc:creator>
		<pubDate>Fri, 01 Feb 2008 22:29:56 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2306</guid>
		<description>Is there some reason not to just make characters 16 bits wide, specify that they are in fact UTF-16 (which is indistinguishable from UCS-2 in all the common cases), and provide library support for the ugliness that occurs if people actually use something outside the Basic Multilingual Plane (which means China and Japan, probably)?

Unicode normalization is a real pain, but that usually only matters when doing I/O and is a library issue.</description>
		<content:encoded><![CDATA[<p>Is there some reason not to just make characters 16 bits wide, specify that they are in fact UTF-16 (which is indistinguishable from UCS-2 in all the common cases), and provide library support for the ugliness that occurs if people actually use something outside the Basic Multilingual Plane (which means China and Japan, probably)?</p>
<p>Unicode normalization is a real pain, but that usually only matters when doing I/O and is a library issue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob K.</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2305</link>
		<dc:creator>Bob K.</dc:creator>
		<pubDate>Fri, 01 Feb 2008 15:01:16 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2305</guid>
		<description>Real Unicode support is pretty hard if your language has a &quot;character&quot; object. One nice bit of fun is that Unicode encodings can represent the same character in more than one way once it actually needs to be exchanged with another implementation (which could be an API, a network protocol, or an application data format). First, you need to choose an encoding scheme, like UTF-7, UTF-8, or UTF-16, and then, within some of those schemes, there are multiple choices like: is &quot;á&quot; one code point (as in ISO Latin-1, which is also the first block of Unicode), or is it a + accent, or accent + a? Look at &lt;a href=&#039;http://unicode.org/reports/tr15/&#039; rel=&quot;nofollow&quot;&gt;this&lt;/a&gt; if you dare. So any language implementation is going to have to deal with these issues once it promises &quot;Unicode support.&quot; Also, it looks like if you&#039;re willing to use IBM&#039;s ICU, &lt;a href=&#039;http://icu-project.org/userguide/normalization.html&#039; rel=&quot;nofollow&quot;&gt;there is help&lt;/a&gt;.

I&#039;d probably choose a canonical representation of 32 bits for a Unicode character, but perhaps that&#039;s hopeless from a performance point of view.</description>
		<content:encoded><![CDATA[<p>Real Unicode support is pretty hard if your language has a &#8220;character&#8221; object. One nice bit of fun is that Unicode encodings can represent the same character in more than one way once it actually needs to be exchanged with another implementation (which could be an API, a network protocol, or an application data format). First, you need to choose an encoding scheme, like UTF-7, UTF-8, or UTF-16, and then, within some of those schemes, there are multiple choices like: is &#8220;á&#8221; one code point (as in ISO Latin-1, which is also the first block of Unicode), or is it a + accent, or accent + a? Look at <a href='http://unicode.org/reports/tr15/' rel="nofollow">this</a> if you dare. So any language implementation is going to have to deal with these issues once it promises &#8220;Unicode support.&#8221; Also, it looks like if you&#8217;re willing to use IBM&#8217;s ICU, <a href='http://icu-project.org/userguide/normalization.html' rel="nofollow">there is help</a>.</p>
<p>I&#8217;d probably choose a canonical representation of 32 bits for a Unicode character, but perhaps that&#8217;s hopeless from a performance point of view.</p>
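<p>A small Python sketch of the &#8220;á&#8221; question above, using the standard <code>unicodedata</code> module; the variable names are assumptions for illustration. In Unicode the combining accent follows its base letter, and the two spellings only compare equal after both are normalized to the same form:</p>

```python
import unicodedata

composed = "\u00e1"      # a-acute as a single code point (also in ISO Latin-1)
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT

# Both render as the same letter, but they are different code point sequences.
equal_raw = composed == decomposed
lengths = (len(composed), len(decomposed))

# Normalizing both sides to one form (NFC or NFD) makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed
```

<p>Whichever form is picked, it has to be applied consistently wherever text is exchanged or compared.</p>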
]]></content:encoded>
	</item>
	<item>
		<title>By: cimota.com &#187; Blog Archive &#187; Remove your assumptions</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2304</link>
		<dc:creator>cimota.com &#187; Blog Archive &#187; Remove your assumptions</dc:creator>
		<pubDate>Fri, 01 Feb 2008 08:53:53 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2304</guid>
		<description>[...] Jens Alfke&#8217;s latest blog post rambles about a couple of things but finishes on something that I really empathised with: Apple engineer: …and the layout needs to take into account ligatures and contextual forms, where adjacent letters change glyphs depending on neighboring characters, or even merge into a single glyph. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Jens Alfke&#8217;s latest blog post rambles about a couple of things but finishes on something that I really empathised with: Apple engineer: …and the layout needs to take into account ligatures and contextual forms, where adjacent letters change glyphs depending on neighboring characters, or even merge into a single glyph. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ndanger.organism :: blog :: Arc: the blogosphere rages</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2290</link>
		<dc:creator>ndanger.organism :: blog :: Arc: the blogosphere rages</dc:creator>
		<pubDate>Fri, 01 Feb 2008 05:04:40 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2290</guid>
		<description>[...] 96 Characters Ought To Be Enough For Anyone: Jens&#8217; Unicode retort [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] 96 Characters Ought To Be Enough For Anyone: Jens&#8217; Unicode retort [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: newlisp</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2303</link>
		<dc:creator>newlisp</dc:creator>
		<pubDate>Thu, 31 Jan 2008 22:03:10 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2303</guid>
		<description>#!/usr/bin/env newlisp

(constant (global &#039;☼)  MAIN)

(context &#039;☺)

(define (☻ ✄ ☁ ⍾)
    (print ✄ ☁ ⍾))

(define (‽)
  (println {‽}))

(context ☼)

(set &#039;℥ &quot;what &quot;  &#039;ᴥ &quot;the &quot; &#039;ᴒ &quot;dickens&quot;)

(☺:☻ ℥ ᴥ ᴒ)

(☺:‽)
(exit)</description>
		<content:encoded><![CDATA[<p>#!/usr/bin/env newlisp</p>
<p>(constant (global &#039;☼)  MAIN)</p>
<p>(context &#039;☺)</p>
<p>(define (☻ ✄ ☁ ⍾)<br />
    (print ✄ ☁ ⍾))</p>
<p>(define (‽)<br />
  (println {‽}))</p>
<p>(context ☼)</p>
<p>(set &#039;℥ &quot;what &quot;  &#039;ᴥ &quot;the &quot; &#039;ᴒ &quot;dickens&quot;)</p>
<p>(☺:☻ ℥ ᴥ ᴒ)</p>
<p>(☺:‽)<br />
(exit)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tonetheman</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2302</link>
		<dc:creator>tonetheman</dc:creator>
		<pubDate>Thu, 31 Jan 2008 02:40:28 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2302</guid>
		<description>my snarky note is that he is right.. ascii was good enough for a long time and in the length of time you have to work on a language you are much better off spending your time implementing something that matters. Later you can go back and fix character issues. Getting the language right is more important in the scheme of things.

as for HTML... CSS sucks. period. If you think otherwise you have not had to write it and account for all the variations of browsers. CSS was meant to be written by tools not humans... that said CSS sadly is becoming the black magic of choice for some reason...</description>
		<content:encoded><![CDATA[<p>my snarky note is that he is right.. ascii was good enough for a long time and in the length of time you have to work on a language you are much better off spending your time implementing something that matters. Later you can go back and fix character issues. Getting the language right is more important in the scheme of things.</p>
<p>as for HTML&#8230; CSS sucks. period. If you think otherwise you have not had to write it and account for all the variations of browsers. CSS was meant to be written by tools not humans&#8230; that said CSS sadly is becoming the black magic of choice for some reason&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: notbrainsurgery (LJ)</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2301</link>
		<dc:creator>notbrainsurgery (LJ)</dc:creator>
		<pubDate>Wed, 30 Jan 2008 21:58:04 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2301</guid>
		<description>This was my sentiment exactly, after reading Paul&#039;s announcement.

Also, his argument seems a little backwards. It goes like this:

1. Python was not designed with the right character representation in the beginning, and it took a year to change that.

2. I would not want to do that, so I will use a character representation which is unlikely to satisfy most of the world&#039;s population and is pretty much obsolete, so that I would not have the problem.

Vadim</description>
		<content:encoded><![CDATA[<p>This was my sentiment exactly, after reading Paul&#8217;s announcement.</p>
<p>Also, his argument seems a little backwards. It goes like this:</p>
<p>1. Python was not designed with the right character representation in the beginning, and it took a year to change that.</p>
<p>2. I would not want to do that, so I will use a character representation which is unlikely to satisfy most of the world&#8217;s population and is pretty much obsolete, so that I would not have the problem.</p>
<p>Vadim</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2289</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Wed, 30 Jan 2008 21:48:48 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2289</guid>
		<description>Joe: I think people mean lots of different things by &quot;using unicode&quot;.

Some are talking about string-representation, and string-access time. Others are talking about automatic io-coding. Both of these things have further questions: what to do for invalid codings, or how to handle un-normalized code points.

But even before you get into what you think the answers to these questions are, start with what exactly you need the language to help with. What part of Unicode is least pleasant to deal with directly, so much so that language support could help? Then how would you keep your new feature from pissing off the people whom that &lt;i&gt;doesn&#039;t&lt;/i&gt; bother?

I don&#039;t mean to suggest there aren&#039;t answers: just that they aren&#039;t obvious; this isn&#039;t a &quot;simple&quot; thing by a long shot.</description>
		<content:encoded><![CDATA[<p>Joe: I think people mean lots of different things by &#8220;using unicode&#8221;.</p>
<p>Some are talking about string-representation, and string-access time. Others are talking about automatic io-coding. Both of these things have further questions: what to do for invalid codings, or how to handle un-normalized code points.</p>
<p>But even before you get into what you think the answers to these questions are, start with what exactly you need the language to help with. What part of Unicode is least pleasant to deal with directly, so much so that language support could help? Then how would you keep your new feature from pissing off the people whom that <i>doesn&#8217;t</i> bother?</p>
<p>I don&#8217;t mean to suggest there aren&#8217;t answers: just that they aren&#8217;t obvious; this isn&#8217;t a &#8220;simple&#8221; thing by a long shot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://jens.mooseyard.com/2008/01/96-characters-ought-to-be-enough-for-anyone/comment-page-1/#comment-2300</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Wed, 30 Jan 2008 20:45:25 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/01/96-characters-ought-to-be-enough-for-anyone/#comment-2300</guid>
		<description>I don&#039;t know what&#039;s worse, PG not understanding Unicode, or all the people whining about it not understanding it either.  Using 16-bit characters doesn&#039;t solve anything, as it still won&#039;t hold tons of code points that are above 16 bits, so you still need to use a variable-length encoding.  At which point you might as well just use UTF-8.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know what&#8217;s worse, PG not understanding Unicode, or all the people whining about it not understanding it either.  Using 16-bit characters doesn&#8217;t solve anything, as it still won&#8217;t hold tons of code points that are above 16 bits, so you still need to use a variable-length encoding.  At which point you might as well just use UTF-8.</p>
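<p>A minimal Python sketch of the point about code points above 16 bits. The helper name and the sample character (U+20000, a CJK ideograph outside the Basic Multilingual Plane) are assumptions chosen for illustration. UTF-16 has to split such a code point into a surrogate pair, so it is a variable-length encoding too:</p>

```python
def utf16_code_units(ch):
    """Encode one code point as a list of 16-bit UTF-16 code units."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Outside the BMP: split the 20 remaining bits into a surrogate pair.
        cp -= 0x10000
        return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
    return [cp]

bmp_char = "\u4e2d"          # fits in a single 16-bit unit
non_bmp_char = "\U00020000"  # does not fit in 16 bits

bmp_units = utf16_code_units(bmp_char)
non_bmp_units = utf16_code_units(non_bmp_char)

# Cross-check against Python's own encoders: 2 bytes per UTF-16 code unit.
utf16_unit_count = len(non_bmp_char.encode("utf-16-le")) // 2
utf8_byte_count = len(non_bmp_char.encode("utf-8"))
```

<p>Here the non-BMP character takes two UTF-16 units (four bytes) and also four bytes in UTF-8, so the fixed-width advantage of 16-bit characters only holds inside the BMP.</p>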
]]></content:encoded>
	</item>
</channel>
</rss>