What’s a scrAPI? A scrAPI, which at this point is more of an idea than a thing, was recently described by Thor Muller in his blog as a type of community-built API that provides a programming layer above web sites that don’t otherwise have an API. This intermediate layer, which exists independently of the source web site, does the dirty work of screen-scraping the site’s raw HTML and returns just the relevant data in a cleaner XML format. The result is a collaboratively built and maintained set of code for accessing data from any source.
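To make the idea concrete, here is a minimal sketch of what such an intermediate layer might do, using only the Python standard library. Everything specific in it is an assumption made for illustration: the sample page, the `item`, `title`, and `price` class names, and the shape of the output XML are all invented, and a real scrAPI would fetch live pages and have to cope with markup that changes without notice.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListingScraper(HTMLParser):
    """Collects title/price text from a hypothetical listings page."""

    FIELDS = {"title", "price"}  # assumed class names, not from any real site

    def __init__(self):
        super().__init__()
        self.items = []      # one dict per <li class="item">
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "item":
            self.items.append({})
        elif tag == "span" and cls in self.FIELDS and self.items:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            item = self.items[-1]
            item[self._field] = item.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_xml(html):
    """Turn raw HTML into a small XML document holding only the relevant data."""
    parser = ListingScraper()
    parser.feed(html)
    parser.close()
    root = ET.Element("items")
    for item in parser.items:
        node = ET.SubElement(root, "item")
        for name, value in item.items():
            ET.SubElement(node, name).text = value.strip()
    return ET.tostring(root, encoding="unicode")


# Invented sample page: navigation clutter plus two listings.
SAMPLE_HTML = """
<html><body><div id="nav">Home | About</div>
<ul>
  <li class="item"><span class="title">Used bike</span> <span class="price">$40</span></li>
  <li class="item"><span class="title">Desk lamp</span> <span class="price">$5</span></li>
</ul></body></html>
"""

print(scrape_to_xml(SAMPLE_HTML))
```

The caller only ever sees the `items` XML, never the page markup, so in principle the scraping rules could be fixed by the community when the source site changes without breaking anyone’s code.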
It’s an interesting idea, though of course it comes with many complications. Not the least of these is that many companies object to scraping, whether for reasons of load, stability, or copyright. Craigslist vs. Oodle is a good example.
In this follow-up post Thor notes that the term was originally coined by Paul Bausch back in 2002, in reference to scraping Amazon data. And interestingly, it was just this sort of scraping that was a key driver in leading Amazon to build a real API: people are going to do it anyway, so let’s formalize and leverage it.