Diffbot has come out of beta, announcing its API’s ability to extract content from sites that fit two page types: article and front page. The Diffbot engine can determine what type of page it is looking at just by rendering the page and examining the result. Is it an article or the front page of a news site? Maybe it’s a profile page from a social network. Diffbot’s artificial brain has been trained to know the difference. Developers can make 50,000 calls to the Diffbot API per month for free, with additional calls available for fractions of a cent. This pricing should encourage wide adoption and experimentation.
This release is just the tip of the iceberg. The minds at Diffbot have identified 30 page types in all, including product pages, social network profiles, and event pages. Support for additional page types will be released over time, giving Diffbot a robust collection of specialized page analytics. Diffbot is language-agnostic because it understands a page by seeing its layout: it discerns the page components using design patterns that cross language and culture.
Diffbot will be a great democratizer. I can see the Diffbot service being employed regularly to extract content for presentation on alternative platforms. For instance, why depend on in-house developers to produce a mobile version of your site? Given the motivation, any developer could use Diffbot to create a mobile version of an existing site without changing anything about the original. MooReader and Editions are two applications in the Diffbot showcase that are already going down this road. Editions by AOL uses the Diffbot API to gather articles as it creates a personalized magazine for its users, and it also uses the Diffbot front page API to detect the current headline news stories. MooReader is a simple Safari bookmark that passes a web article through Diffbot’s Article API and presents it with an easier-to-read layout and font.
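To make the mobile-version idea concrete, here is a minimal Python sketch of the two steps involved: building the request for one extraction call, then re-rendering the extracted article as a bare single-column page. Everything specific in it is an assumption for illustration: the endpoint `https://api.example.com/article` is a placeholder rather than Diffbot’s documented URL, the `token` and `url` parameter names and the `title` and `text` response fields are guesses at a plausible shape, and the sample response is hand-written instead of fetched.

```python
import html
import json
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not a documented URL.
ARTICLE_ENDPOINT = "https://api.example.com/article"


def build_article_request(page_url: str, token: str) -> str:
    """Return the GET URL for one extraction call (parameter names are assumed)."""
    return ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def render_mobile(article: dict) -> str:
    """Turn an extracted article (assumed 'title'/'text' fields) into a bare HTML page."""
    title = html.escape(article.get("title", "Untitled"))
    # Treat each non-empty line of the extracted text as one paragraph.
    paragraphs = [p.strip() for p in article.get("text", "").split("\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta name="viewport" content="width=device-width">'
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n{body}\n</body></html>"
    )


# A hand-written sample response, standing in for what a real call might return.
SAMPLE_RESPONSE = json.loads(
    '{"title": "Robots & the Web", "text": "First paragraph.\\n\\nSecond paragraph."}'
)

if __name__ == "__main__":
    print(build_article_request("http://example.com/story?id=1", "YOUR_TOKEN"))
    print(render_mobile(SAMPLE_RESPONSE))
```

The sketch deliberately leaves out the network call, so it runs without a token. In a real application the URL from `build_article_request` would be fetched and the JSON body handed to `render_mobile`, with the field names adjusted to whatever the service actually returns.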
I had the great pleasure of talking with Diffbot CEO and Co-Founder Mike Tung earlier this week. I enjoyed the opportunity to pick the brain of a man who’s leading innovation in the “internet for robots” space. I asked him about the impact of Diffbot outputs on web standards, wondering whether the Diffbot JSON output could become a standard of its own through sheer popularity. Tung brushed that idea aside, saying that Diffbot would instead focus on providing data in existing and emerging web formats, such as those over at Schema.org. That’s a real team-player attitude.
Diffbot’s function could prove to be a double-edged sword. What happens when the content from any web page can be lifted off cleanly with one simple API call? How will content providers react to unsolicited extraction of their content? Would they embrace it or reject it? Would they direct their reaction toward the app designer or toward Diffbot? My advice to Tung on this one: use the old tried-and-true phrase, “Guns don’t kill people, people kill people.” You can’t control all uses of the service you provide. The best you can do is to be responsive to the needs of the community and act wherever possible on reasonable claims. Like any game-changing technology, Diffbot pushes the envelope, provokes discussion, and challenges existing modes of thinking.
The Diffbot API is one of nine extraction APIs, surely a space that is poised for growth as the dawn of “the internet for robots” continues.