<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: hash musings</title>
	<atom:link href="http://jens.mooseyard.com/2008/02/hash-musings/feed/" rel="self" type="application/rss+xml" />
	<link>http://jens.mooseyard.com/2008/02/hash-musings/</link>
	<description>Little boxes made of words, by Jens Alfke</description>
	<lastBuildDate>Sat, 04 Feb 2012 05:05:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: George Bailey</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2328</link>
		<dc:creator>George Bailey</dc:creator>
		<pubDate>Wed, 20 Feb 2008 14:03:28 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2328</guid>
		<description>Considering the size of the library, you&#039;d have more collisions than a Costa Rican highway.</description>
		<content:encoded><![CDATA[<p>Considering the size of the library, you&#8217;d have more collisions than a Costa Rican highway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fluffy</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2327</link>
		<dc:creator>fluffy</dc:creator>
		<pubDate>Tue, 19 Feb 2008 21:10:38 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2327</guid>
		<description>Well, I think the whole relevance of the universal library as a thought experiment (in a CS context, anyway) is that it&#039;s an application of the pigeonhole principle and basically says that the only way to index the content is to have the content itself.</description>
		<content:encoded><![CDATA[<p>Well, I think the whole relevance of the universal library as a thought experiment (in a CS context, anyway) is that it&#8217;s an application of the pigeonhole principle and basically says that the only way to index the content is to have the content itself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2321</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Tue, 19 Feb 2008 21:03:22 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2321</guid>
		<description>Fluffy — I know. The QR-Code has lots of overhead for error-correction, which I&#039;m sure would mess up the pretty pictures. But somehow this seems like a minor issue compared with the feasibility of the Universal Library in the first place.</description>
		<content:encoded><![CDATA[<p>Fluffy — I know. The QR-Code has lots of overhead for error-correction, which I&#8217;m sure would mess up the pretty pictures. But somehow this seems like a minor issue compared with the feasibility of the Universal Library in the first place.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fluffy</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2320</link>
		<dc:creator>fluffy</dc:creator>
		<pubDate>Tue, 19 Feb 2008 17:10:18 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2320</guid>
		<description>6 megapixels != 6 million bits of useful information, though. Also, no two scans of a book (much less photographs) will have the same bit-for-bit information even if all alignment/registration/etc. is perfect. There&#039;s always sensor noise and so on, and that noise means that even if you heavily quantize things, the fringe area between quanta will still vary.</description>
		<content:encoded><![CDATA[<p>6 megapixels != 6 million bits of useful information, though. Also, no two scans of a book (much less photographs) will have the same bit-for-bit information even if all alignment/registration/etc. is perfect. There&#8217;s always sensor noise and so on, and that noise means that even if you heavily quantize things, the fringe area between quanta will still vary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2326</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Tue, 19 Feb 2008 16:22:14 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2326</guid>
		<description>Another thought — ISBNs reminded me of barcodes. Borges&#039;s librarians would appreciate having barcodes on their books so they can scan them when they&#039;re checked out or returned. But the barcode would need to be something like six million bits long, to number each book. A UPC-style barcode of that capacity would be about ten miles long and would stick off the edge so far they&#039;d have to wrap it around until the book looked like one of those prizewinning balls of string. Not feasible.

But if you used a two-dimensional barcode like a &lt;a href=&quot;http://en.wikipedia.org/wiki/QR_Code&quot; rel=&quot;nofollow&quot;&gt;QR-Code™&lt;/a&gt;, it might be feasible. You can fit six megapixels on the back of a book; we do it all the time when we print out our snapshots. You&#039;d need a really high-resolution barcode scanner, though.

Then there&#039;s the fun of considering that the barcodes would contain every possible combination of those six million black and white pixels, meaning every possible halftoned image at that resolution. Which of course means that...

&lt;i&gt;They print the barcode on the &lt;b&gt;front&lt;/b&gt; of the book, and make it the cover.&lt;/i&gt;

The question is, how many of the vast numbers of books with a reasonable facsimile of the Mona Lisa as the cover barcode have contents that discuss the Mona Lisa, or Leonardo da Vinci (or Marcel Duchamp, for that matter)?</description>
		<content:encoded><![CDATA[<p>Another thought — ISBNs reminded me of barcodes. Borges&#8217;s librarians would appreciate having barcodes on their books so they can scan them when they&#8217;re checked out or returned. But the barcode would need to be something like six million bits long, to number each book. A UPC-style barcode of that capacity would be about ten miles long and would stick off the edge so far they&#8217;d have to wrap it around until the book looked like one of those prizewinning balls of string. Not feasible.</p>
<p>But if you used a two-dimensional barcode like a <a href="http://en.wikipedia.org/wiki/QR_Code" rel="nofollow">QR-Code™</a>, it might be feasible. You can fit six megapixels on the back of a book; we do it all the time when we print out our snapshots. You&#8217;d need a really high-resolution barcode scanner, though.</p>
<p>Then there&#8217;s the fun of considering that the barcodes would contain every possible combination of those six million black and white pixels, meaning every possible halftoned image at that resolution. Which of course means that&#8230;</p>
<p><i>They print the barcode on the <b>front</b> of the book, and make it the cover.</i></p>
<p>The question is, how many of the vast numbers of books with a reasonable facsimile of the Mona Lisa as the cover barcode have contents that discuss the Mona Lisa, or Leonardo da Vinci (or Marcel Duchamp, for that matter)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2325</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Tue, 19 Feb 2008 16:07:50 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2325</guid>
		<description>No cigar. There&#039;s nothing intrinsically binary about a hash function; the algorithms are just specified and implemented that way to make them fast on computers.

I was thinking of the fact that any usable hash function would completely break down on a data set the size of Borges&#039;s library. According to the Wikipedia article, it contains nearly 2 x 10^1,834,097 books. But even the longest SHA function only has 2^512 (about 10^154) possible values. That means there are very roughly 10^1,833,943 collisions for &lt;i&gt;every&lt;/i&gt; hash value. Oops. To get around that, we&#039;ll need an SHA-10,000,000 algorithm.

But then again, you could argue that Dewey Decimal (or Library of Congress) numbers aren&#039;t unique either; they just narrow down a particular topic, so there are probably dozens of books with any particular number. Which makes SHA a good analogy. The real fallacy would be to equate SHA with ISBN, I suppose.</description>
		<content:encoded><![CDATA[<p>No cigar. There&#8217;s nothing intrinsically binary about a hash function; the algorithms are just specified and implemented that way to make them fast on computers.</p>
<p>I was thinking of the fact that any usable hash function would completely break down on a data set the size of Borges&#8217;s library. According to the Wikipedia article, it contains nearly 2 x 10^1,834,097 books. But even the longest SHA function only has 2^512 (about 10^154) possible values. That means there are very roughly 10^1,833,943 collisions for <i>every</i> hash value. Oops. To get around that, we&#8217;ll need an SHA-10,000,000 algorithm.</p>
<p>But then again, you could argue that Dewey Decimal (or Library of Congress) numbers aren&#8217;t unique either; they just narrow down a particular topic, so there are probably dozens of books with any particular number. Which makes SHA a good analogy. The real fallacy would be to equate SHA with ISBN, I suppose.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fluffy</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2324</link>
		<dc:creator>fluffy</dc:creator>
		<pubDate>Tue, 19 Feb 2008 16:05:04 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2324</guid>
		<description>Or is the fallacy that the DDC is intended for browsable categorization and a hash is only useful for a lookup if you already know the content?</description>
		<content:encoded><![CDATA[<p>Or is the fallacy that the DDC is intended for browsable categorization and a hash is only useful for a lookup if you already know the content?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rosyna</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2323</link>
		<dc:creator>Rosyna</dc:creator>
		<pubDate>Tue, 19 Feb 2008 15:58:25 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2323</guid>
		<description>Pfft, would have been more amusing if you chose two quotes that have the same hash.</description>
		<content:encoded><![CDATA[<p>Pfft, would have been more amusing if you chose two quotes that have the same hash.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jcburns</title>
		<link>http://jens.mooseyard.com/2008/02/hash-musings/comment-page-1/#comment-2322</link>
		<dc:creator>jcburns</dc:creator>
		<pubDate>Tue, 19 Feb 2008 14:10:47 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2008/02/hash-musings/#comment-2322</guid>
		<description>...that hashing isn&#039;t &#039;decimal&#039; by definition, if it&#039;s binary? Not very &#039;dewey,&#039; either.</description>
		<content:encoded><![CDATA[<p>&#8230;that hashing isn&#8217;t &#8216;decimal&#8217; by definition, if it&#8217;s binary? Not very &#8216;dewey,&#8217; either.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

