<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Dream apps and the perils of screen-scraping</title>
	<atom:link href="http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/</link>
	<description>Little boxes made of words, by Jens Alfke</description>
	<lastBuildDate>Sat, 04 Feb 2012 05:05:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1675</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Wed, 18 Oct 2006 04:15:51 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1675</guid>
		<description>SuitCase — it doesn&#039;t take rewriting the forums, and it won&#039;t require writing all-new protocols. Most of the job can be done using &lt;a href=&quot;http://tools.ietf.org/html/rfc4287&quot; rel=&quot;nofollow&quot;&gt;Atom syndication&lt;/a&gt; (which many forums already support, although buggily) with the special secret-sauce of the &lt;a href=&quot;http://tools.ietf.org/wg/atompub/draft-snell-atompub-feed-thread-12.txt&quot; rel=&quot;nofollow&quot;&gt;threading extensions&lt;/a&gt;.

These happen to be &lt;a href=&quot;http://benjamin.smedbergs.us/wordpress-atom-1.0/&quot; rel=&quot;nofollow&quot;&gt;implemented for WordPress already, as a plugin&lt;/a&gt;.

That covers a lot of the subscription side of what Hijack would need. There&#039;s more to do; I&#039;m going to write a post describing this in more detail.</description>
		<content:encoded><![CDATA[<p>SuitCase — it doesn&#8217;t take rewriting the forums, and it won&#8217;t require writing all-new protocols. Most of the job can be done using <a href="http://tools.ietf.org/html/rfc4287" rel="nofollow">Atom syndication</a> (which many forums already support, although buggily) with the special secret-sauce of the <a href="http://tools.ietf.org/wg/atompub/draft-snell-atompub-feed-thread-12.txt" rel="nofollow">threading extensions</a>.</p>
<p>These happen to be <a href="http://benjamin.smedbergs.us/wordpress-atom-1.0/" rel="nofollow">implemented for WordPress already, as a plugin</a>.</p>
<p>That covers a lot of the subscription side of what Hijack would need. There&#8217;s more to do; I&#8217;m going to write a post describing this in more detail.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SuitCase</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1674</link>
		<dc:creator>SuitCase</dc:creator>
		<pubDate>Tue, 17 Oct 2006 12:35:07 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1674</guid>
		<description>Jens - Hey, thanks for the response. Reading myself again I hope I didn&#039;t come off too hostile, but I&#039;m passionate about this idea and don&#039;t want it killed :P

Grammatically &quot;apathetic&quot; looks a bit awkward there but I still meant it, though by using the word I was exaggerating your position in an attempt to show that for this app your &quot;programmer&#039;s logic&quot; didn&#039;t fit. While the nicest way to do this would be through standards, this is not in the interest of a user who wants to monitor their favourite forums _now_, and not just a scant few that are modeled on some sort of new protocol (which, as a commenter above mentioned, is kind of already done with Usenet.)

As for the idea that Hijack is a kludge, I dunno. A lot of apps I use feel like they&#039;re messed beyond recognition and could be nicer in an ideal world, but they cope with the problems and I make use of them very effectively - my web browser and my mail client, for one. I really don&#039;t see a future where web forums become abandoned for rich clients, I see them as a minority thing for the quirky power users on quirky platforms like the Mac, and thus I think it&#039;s acceptable that they require some maintenance and not be as perfect and simple as they could.

It&#039;s funny that I&#039;m arguing that for a &quot;My Dream App&quot; competition, but I honestly think it&#039;s far less feasible for a client based on standard forum-provided data to be a good product than one that spends a lot of time trudging through messy forum templates. There&#039;s not enough demand for a forum aggregator for a standard to be adopted, basically, and so I don&#039;t think the examples you cited are directly comparable as that kind of ground-up approach would result in a client for a specification nobody cares about and never will.</description>
		<content:encoded><![CDATA[<p>Jens - Hey, thanks for the response. Reading myself again I hope I didn&#8217;t come off too hostile, but I&#8217;m passionate about this idea and don&#8217;t want it killed :P</p>
<p>Grammatically &#8220;apathetic&#8221; looks a bit awkward there but I still meant it, though by using the word I was exaggerating your position in an attempt to show that for this app your &#8220;programmer&#8217;s logic&#8221; didn&#8217;t fit. While the nicest way to do this would be through standards, this is not in the interest of a user who wants to monitor their favourite forums _now_, and not just a scant few that are modeled on some sort of new protocol (which, as a commenter above mentioned, is kind of already done with Usenet.)</p>
<p>As for the idea that Hijack is a kludge, I dunno. A lot of apps I use feel like they&#8217;re messed beyond recognition and could be nicer in an ideal world, but they cope with the problems and I make use of them very effectively - my web browser and my mail client, for one. I really don&#8217;t see a future where web forums become abandoned for rich clients, I see them as a minority thing for the quirky power users on quirky platforms like the Mac, and thus I think it&#8217;s acceptable that they require some maintenance and not be as perfect and simple as they could.</p>
<p>It&#8217;s funny that I&#8217;m arguing that for a &#8220;My Dream App&#8221; competition, but I honestly think it&#8217;s far less feasible for a client based on standard forum-provided data to be a good product than one that spends a lot of time trudging through messy forum templates. There&#8217;s not enough demand for a forum aggregator for a standard to be adopted, basically, and so I don&#8217;t think the examples you cited are directly comparable as that kind of ground-up approach would result in a client for a specification nobody cares about and never will.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eduo</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1673</link>
		<dc:creator>Eduo</dc:creator>
		<pubDate>Tue, 17 Oct 2006 12:24:19 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1673</guid>
		<description>Shadownight: That may have been stated, but DOM scraping is really not different from HTML scraping unless you&#039;re working on a standard (and then, it isn&#039;t either, just more efficient).

It&#039;s not possible to build a &quot;universal scraper&quot; of any sort when there is no standard for what&#039;ll be scraped.

Unless every forum in the world decides to go the route of tagging specific parts of the forum with IDs or with classes this won&#039;t be feasible without hordes of people creating &quot;plug-ins&quot; for each different forum (bear in mind that even generic forum software like phpbb, yabb and bbpress allow for theme customisation, where the theme can be pretty much whatever you want).

I&#039;d dare to say that probably bbpress would be the best candidate for an app of this sort, as it can provide RSS feeds of its forums and posts, and posting can be done through the methods defined in the RSS. But that means the best candidate is the newcomer in the arena. Existing major forum apps would need to change their releases to add a functionality like this and even then older forums that won&#039;t be updated won&#039;t benefit from it.

The route to do this is not to make an app that does scraping, it&#039;s to propose a standard for it and lobby this standard to the major forum programmers (and to bbpress, I insist. WordPress&#039; weight is too much to ignore and integration with WP itself makes bbpress a probable quick riser to the top five). Along with the protocol an app can be designed that takes advantage of it, but the protocol needs to exist so other platforms and other forum bases can adopt it as well.

That&#039;s how RSS started, by the way.</description>
		<content:encoded><![CDATA[<p>Shadownight: That may have been stated, but DOM scraping is really not different from HTML scraping unless you&#8217;re working on a standard (and then, it isn&#8217;t either, just more efficient).</p>
<p>It&#8217;s not possible to build a &#8220;universal scraper&#8221; of any sort when there is no standard for what&#8217;ll be scraped.</p>
<p>Unless every forum in the world decides to go the route of tagging specific parts of the forum with IDs or with classes this won&#8217;t be feasible without hordes of people creating &#8220;plug-ins&#8221; for each different forum (bear in mind that even generic forum software like phpbb, yabb and bbpress allow for theme customisation, where the theme can be pretty much whatever you want).</p>
<p>I&#8217;d dare to say that probably bbpress would be the best candidate for an app of this sort, as it can provide RSS feeds of its forums and posts, and posting can be done through the methods defined in the RSS. But that means the best candidate is the newcomer in the arena. Existing major forum apps would need to change their releases to add a functionality like this and even then older forums that won&#8217;t be updated won&#8217;t benefit from it.</p>
<p>The route to do this is not to make an app that does scraping, it&#8217;s to propose a standard for it and lobby this standard to the major forum programmers (and to bbpress, I insist. WordPress&#8217; weight is too much to ignore and integration with WP itself makes bbpress a probable quick riser to the top five). Along with the protocol an app can be designed that takes advantage of it, but the protocol needs to exist so other platforms and other forum bases can adopt it as well.</p>
<p>That&#8217;s how RSS started, by the way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: fluffy</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1672</link>
		<dc:creator>fluffy</dc:creator>
		<pubDate>Tue, 17 Oct 2006 01:42:15 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1672</guid>
		<description>Yeah, except most forum software doesn&#039;t put out compliant XHTML, so you need to use a smile-and-nod HTML parser which will convert to DOM, or a prefilter which de-moronizes the HTML.</description>
		<content:encoded><![CDATA[<p>Yeah, except most forum software doesn&#8217;t put out compliant XHTML, so you need to use a smile-and-nod HTML parser which will convert to DOM, or a prefilter which de-moronizes the HTML.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shadownight</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1671</link>
		<dc:creator>shadownight</dc:creator>
		<pubDate>Mon, 16 Oct 2006 23:09:30 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1671</guid>
		<description>Dear blogger and commenters,

 Jason Harris, Developer at My Dream App, has repeatedly stated that Hijack &lt;b&gt;is&lt;/b&gt; feasible. He suggests doing it not through HTML, but through DOM scraping. For more info, you can check out the Official Feasibility thread in the MDA forums and other threads in the Hijack section: http://mydreamapp.com/forums/viewtopic.php?id=1263</description>
		<content:encoded><![CDATA[<p>Dear blogger and commenters,</p>
<p> Jason Harris, Developer at My Dream App, has repeatedly stated that Hijack <b>is</b> feasible. He suggests doing it not through HTML, but through DOM scraping. For more info, you can check out the Official Feasibility thread in the MDA forums and other threads in the Hijack section: <a href="http://mydreamapp.com/forums/viewtopic.php?id=1263" rel="nofollow">http://mydreamapp.com/forums/viewtopic.php?id=1263</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eduo</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1670</link>
		<dc:creator>Eduo</dc:creator>
		<pubDate>Mon, 16 Oct 2006 21:36:30 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1670</guid>
		<description>I keep scanning mydreamapp.com periodically and keep voting for the seemingly least-popular projects. There were some nifty ideas among all the crap sent and, to me, the filtering hasn&#039;t been that good but, then again, that speaks about the voters more than anything.

I was about to submit an idea to it back when it started, but after seeing what people were raving about and which pitches were the most successful I realised mine wasn&#039;t as flashy and was, indeed, too techy for most of the reviewers.

Several other people did try to do this, and seemingly failed because their mockups weren&#039;t pretty (there isn&#039;t any project right now in the semifinals that isn&#039;t shiny and/or animated). In the end I didn&#039;t even see if anyone had proposed anything like I had thought (some did propose other ideas I had thought of, and were summarily buried to the bottom of the lists).

*sigh*, maybe next time.</description>
		<content:encoded><![CDATA[<p>I keep scanning mydreamapp.com periodically and keep voting for the seemingly least-popular projects. There were some nifty ideas among all the crap sent and, to me, the filtering hasn&#8217;t been that good but, then again, that speaks about the voters more than anything.</p>
<p>I was about to submit an idea to it back when it started, but after seeing what people were raving about and which pitches were the most successful I realised mine wasn&#8217;t as flashy and was, indeed, too techy for most of the reviewers.</p>
<p>Several other people did try to do this, and seemingly failed because their mockups weren&#8217;t pretty (there isn&#8217;t any project right now in the semifinals that isn&#8217;t shiny and/or animated). In the end I didn&#8217;t even see if anyone had proposed anything like I had thought (some did propose other ideas I had thought of, and were summarily buried to the bottom of the lists).</p>
<p>*sigh*, maybe next time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Step</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1669</link>
		<dc:creator>Step</dc:creator>
		<pubDate>Mon, 16 Oct 2006 15:58:34 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1669</guid>
		<description>Hmm, my links got eaten in my last comment, and I don&#039;t see a way to edit.

Either way, I see from your last response that you&#039;re approaching this from a different perspective.  I can appreciate that, and I agree that I&#039;d like to see forum software change to include an appropriate standard.  I&#039;m sure it could happen, too, just not sure what would push it forward.  You say &quot;today&quot; and &quot;tomorrow&quot;, but in reality it took a couple years to catch on.  Forum software could take just as long, maybe it will transition faster, maybe slower.  I&#039;ve never created and moderated my own forum, so I don&#039;t really know.

I hope my comments have been helpful.  I&#039;m mainly concerned that it seems your article unfairly attacks two ideas (the whole MDA competition, and Hijack) based on their weaknesses, without seeing if those weaknesses were acknowledged or addressed in any way.  In other words, I&#039;d hope for more balance.  But then again, you&#039;re coming from quite a different perspective, and I&#039;m not a regular reader of your blog, so perhaps I&#039;m missing something vital to the discussion here.</description>
		<content:encoded><![CDATA[<p>Hmm, my links got eaten in my last comment, and I don&#8217;t see a way to edit.</p>
<p>Either way, I see from your last response that you&#8217;re approaching this from a different perspective.  I can appreciate that, and I agree that I&#8217;d like to see forum software change to include an appropriate standard.  I&#8217;m sure it could happen, too, just not sure what would push it forward.  You say &#8220;today&#8221; and &#8220;tomorrow&#8221;, but in reality it took a couple years to catch on.  Forum software could take just as long, maybe it will transition faster, maybe slower.  I&#8217;ve never created and moderated my own forum, so I don&#8217;t really know.</p>
<p>I hope my comments have been helpful.  I&#8217;m mainly concerned that it seems your article unfairly attacks two ideas (the whole MDA competition, and Hijack) based on their weaknesses, without seeing if those weaknesses were acknowledged or addressed in any way.  In other words, I&#8217;d hope for more balance.  But then again, you&#8217;re coming from quite a different perspective, and I&#8217;m not a regular reader of your blog, so perhaps I&#8217;m missing something vital to the discussion here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jens Alfke</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1668</link>
		<dc:creator>Jens Alfke</dc:creator>
		<pubDate>Mon, 16 Oct 2006 15:08:18 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1668</guid>
		<description>&lt;b&gt;Maciej:&lt;/b&gt; My guess is that adding semantic markup to the HTML page (like hAtom) is about the same difficulty as adding an RSS/Atom feed. But it&#039;s less flexible, since it is constrained to show only the same set of entries that are on the page, and the content only in its post-themed state.

&lt;b&gt;SuitCase:&lt;/b&gt; You raise some good points, although I am still not convinced it will work well enough. But perhaps some others have a looser definition of &quot;well enough&quot;.

By &quot;apathetic&quot; I think you mean &quot;antithetical&quot;? I don&#039;t think we&#039;re really at odds in what we want. I&#039;m just looking at it from what I think is a broader perspective: I would rather not have a ton of work go into something that is at heart a kludge.

Remember, today&#039;s &quot;hyper-cool Web _.0 ... weird new protocol&quot; often becomes tomorrow&#039;s everyday reality. RSS used to be that weird new protocol. So did CSS, Flash, JavaScript, tables, inline JPEGs. I&#039;m sure when Mosaic came out the Gopher/Archie folks said &lt;i&gt;&quot;forget this foo-foo ivory-tower &#039;hypertext&#039; stuff, we just want something that works with the FTP sites we already have.&quot;&lt;/i&gt;</description>
		<content:encoded><![CDATA[<p><b>Maciej:</b> My guess is that adding semantic markup to the HTML page (like hAtom) is about the same difficulty as adding an RSS/Atom feed. But it&#8217;s less flexible, since it is constrained to show only the same set of entries that are on the page, and the content only in its post-themed state.</p>
<p><b>SuitCase:</b> You raise some good points, although I am still not convinced it will work well enough. But perhaps some others have a looser definition of &#8220;well enough&#8221;.</p>
<p>By &#8220;apathetic&#8221; I think you mean &#8220;antithetical&#8221;? I don&#8217;t think we&#8217;re really at odds in what we want. I&#8217;m just looking at it from what I think is a broader perspective: I would rather not have a ton of work go into something that is at heart a kludge.</p>
<p>Remember, today&#8217;s &#8220;hyper-cool Web _.0 &#8230; weird new protocol&#8221; often becomes tomorrow&#8217;s everyday reality. RSS used to be that weird new protocol. So did CSS, Flash, JavaScript, tables, inline JPEGs. I&#8217;m sure when Mosaic came out the Gopher/Archie folks said <i>&#8220;forget this foo-foo ivory-tower &#8216;hypertext&#8217; stuff, we just want something that works with the FTP sites we already have.&#8221;</i></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SuitCase</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-2/#comment-1667</link>
		<dc:creator>SuitCase</dc:creator>
		<pubDate>Mon, 16 Oct 2006 11:41:10 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1667</guid>
		<description>Part of the Daring Fireball flood, sorry if you have ten million messages to moderate.

I am immensely in love with the idea of Hijack, I&#039;ve been wishing for a rich client that accesses content from forums for many years and had fantasies about learning how to code myself just so I could try and make it.

Yes, the collection of data will be hard; however, I think you are exaggerating the difficulty it would involve. That Dapper thing looks promising enough, but it already seemed mostly feasible to me because some forums use semantic markup (or at least, informative CSS stylings), offer information-rich RSS feeds, or (probably the easiest thing) a default skin/the option to choose the default skin in the user prefs. Imagine - if you had support for the subSilver theme of phpBB alone, you&#039;d probably then support 10% of the forums on the internet. Do the same for the default looks of vBulletin, punbb, Invision and.. heck, I dunno, UBB? Then throw in the ones that can&#039;t be changed, like ezboard and proboards.. And you&#039;ve probably covered 50% or more of all forums, and I bet those &quot;forum definitions&quot; for stuff like subSilver would work with a lot of custom themes as they are almost always heavily based on the default. So even if we take it as &quot;50% can feasibly work&quot;, it&#039;s a useful app already, and all that needs to be accomplished after that is a training mode and/or definition file sharing community, both of which are being discussed in the forum for the app.

Regardless of &quot;rarely listen to the user&quot; philosophy, I think your idea that people should start writing protocols and microformats or whatever for forums clearly shows a programmer-oriented perspective that is apathetic to what users like me really want from this application. I don&#039;t care to use an app that works with the five hyper-cool Web 2.0 sites that bothered to implement a weird new protocol. I want something that works with the web as a whole, and I think that&#039;s a lot more feasible than you&#039;re indicating.</description>
		<content:encoded><![CDATA[<p>Part of the Daring Fireball flood, sorry if you have ten million messages to moderate.</p>
<p>I am immensely in love with the idea of Hijack, I&#8217;ve been wishing for a rich client that accesses content from forums for many years and had fantasies about learning how to code myself just so I could try and make it.</p>
<p>Yes, the collection of data will be hard; however, I think you are exaggerating the difficulty it would involve. That Dapper thing looks promising enough, but it already seemed mostly feasible to me because some forums use semantic markup (or at least, informative CSS stylings), offer information-rich RSS feeds, or (probably the easiest thing) a default skin/the option to choose the default skin in the user prefs. Imagine - if you had support for the subSilver theme of phpBB alone, you&#8217;d probably then support 10% of the forums on the internet. Do the same for the default looks of vBulletin, punbb, Invision and.. heck, I dunno, UBB? Then throw in the ones that can&#8217;t be changed, like ezboard and proboards.. And you&#8217;ve probably covered 50% or more of all forums, and I bet those &#8220;forum definitions&#8221; for stuff like subSilver would work with a lot of custom themes as they are almost always heavily based on the default. So even if we take it as &#8220;50% can feasibly work&#8221;, it&#8217;s a useful app already, and all that needs to be accomplished after that is a training mode and/or definition file sharing community, both of which are being discussed in the forum for the app.</p>
<p>Regardless of &#8220;rarely listen to the user&#8221; philosophy, I think your idea that people should start writing protocols and microformats or whatever for forums clearly shows a programmer-oriented perspective that is apathetic to what users like me really want from this application. I don&#8217;t care to use an app that works with the five hyper-cool Web 2.0 sites that bothered to implement a weird new protocol. I want something that works with the web as a whole, and I think that&#8217;s a lot more feasible than you&#8217;re indicating.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maciej Stachowiak</title>
		<link>http://jens.mooseyard.com/2006/10/dream-apps-and-the-perils-of-screen-scraping/comment-page-1/#comment-1666</link>
		<dc:creator>Maciej Stachowiak</dc:creator>
		<pubDate>Mon, 16 Oct 2006 09:35:04 +0000</pubDate>
		<guid isPermaLink="false">http://mooseyard.com/Jens/2006/10/dream-apps-and-the-perils-of-screen-scraping/#comment-1666</guid>
		<description>How about getting forum software to add &lt;a href=&quot;http://microformats.org/wiki/hatom&quot; rel=&quot;nofollow&quot;&gt;hAtom&lt;/a&gt; markup? Then you can unambiguously pull content from the actual web pages without screen scraping hacks, and the software does not have to be updated to provide the data via alternate formats or protocols.

I&#039;ll add though that I think &quot;screen scraping&quot; on the web (more accurately it would be called markup scraping) is fundamentally different than the original meaning -- trying to infer information from pixels. You already have the text, you&#039;re just trying to scrape the semantics - and markup is supposed to be about semantics.

This is why I think microformats are a good approach because they make the markup richer to let you extract the semantics you want in a reliable way, without the need for out-of-band metadata.</description>
		<content:encoded><![CDATA[<p>How about getting forum software to add <a href="http://microformats.org/wiki/hatom" rel="nofollow">hAtom</a> markup? Then you can unambiguously pull content from the actual web pages without screen scraping hacks, and the software does not have to be updated to provide the data via alternate formats or protocols.</p>
<p>I&#8217;ll add though that I think &#8220;screen scraping&#8221; on the web (more accurately it would be called markup scraping) is fundamentally different than the original meaning &#8212; trying to infer information from pixels. You already have the text, you&#8217;re just trying to scrape the semantics - and markup is supposed to be about semantics.</p>
<p>This is why I think microformats are a good approach because they make the markup richer to let you extract the semantics you want in a reliable way, without the need for out-of-band metadata.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

