Your Data May Not Be Safe From Unofficial APIs

Patricio Robles, December 20th, 2013

“Should we offer an API?” is a question more and more executives are asking. Data is digital gold, and successfully controlling and monetizing access to data is creating new revenue opportunities for many businesses. But executives who assume that they get to answer the question “Should we offer an API?” may be in for a rude awakening as they discover that third parties are increasingly capable of answering it for them.

Case in point: last year, a service called PadMapper became involved in a highly publicized battle with Craigslist over PadMapper’s use of apartment listings culled from Craigslist. PadMapper’s apartment search engine, which sported a user experience many believed superior to Craigslist’s, had become incredibly popular, and the online classifieds giant was apparently not happy that data it believed it owned was being used by a potential competitor.

When Craigslist blocked PadMapper from accessing its website, PadMapper’s existence might have been threatened had it not been for 3taps, “an exchange platform dedicated to keeping public facts publicly accessible.” 3taps gathers data from a variety of sources, including Craigslist, and makes that data available to developers via an open and free API.

Not surprisingly, 3taps quickly found itself in the middle of the spat between Craigslist and PadMapper, a spat that centered on a simple question: did Craigslist own the copyright to the content submitted to it by its users?

In April of this year, a federal court ruled that Craigslist could not use copyright law to prevent PadMapper and 3taps from using its apartment listings. Craigslist countered that, by trying to circumvent its attempts to block 3taps’ access to its service, 3taps had violated the Computer Fraud and Abuse Act (CFAA), a law passed in 1986 to address certain computer-related crimes.

A federal judge found Craigslist’s CFAA argument more convincing and ordered 3taps to cease accessing Craigslist’s servers, but that wasn’t the end of the story. An undeterred 3taps indicated that while it would respect the court’s order, it intended to use indirect methods, such as crowdsourcing and the scraping of public search results, to obtain the Craigslist data it continues to provide via its API to this day.

Your data but not your API

Craigslist’s legal battle highlights a challenge more and more companies will face in the coming years. The amount of digital data is skyrocketing, and as increasingly sophisticated technology gives companies the ability to mine that data for insight, the demand for data will only continue to grow. But it isn’t just popular services like Craigslist that will be forced to think about how they provide their data and how they try to control its use. Companies of all sizes may soon find that their most attractive data is available through an API not of their own making.

If that is worrisome to executives, even more worrisome is the fact that “unofficial” APIs no longer require significant technical effort to create, thanks to scraping-as-a-service companies like Priceonomics, Grepsr, Scrapinghub and Import.io. ProgrammableWeb took a look at how companies such as these are increasingly allowing web-accessible data to be used in commercial products.


Priceonomics, which originally launched as an online price guide, recently decided to leverage its technology to launch a data services offering because of the “enormous” demand it was seeing from companies looking to obtain data from around the web in a structured format. According to Rohin Dhar, the company’s CEO, “Most of our clients were dedicating large swathes of their engineering teams to acquiring data and it was really costly and difficult. We could come in and delight them by solving the issue of getting uninterrupted feeds of data.”

Dhar says uptake of his service by hedge funds and startups has been particularly strong. “Hedge funds are looking to get an informational edge to inform some of the trades they are considering. Startups need a wide variety of data, but often it’s pricing related data so their marketplaces work better,” he explained.

Priceonomics and Grepsr work with their customers to meet their data acquisition needs. For individuals and companies interested in a self-serve approach, companies like Scrapinghub and Import.io provide tools that allow customers to visually train their systems to crawl a website and organize the scraped data into a format that can be accessed through an API. APIs created using Import.io can be kept private or made available to the public. Scrapinghub’s tool is based on Slybot, an open source crawler, so customers can export their scraping rules and use them on their own.
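To make the idea concrete, the core step these tools automate, turning a web page into structured records that an API can serve, can be sketched in a few lines. This is a minimal illustration, not how Scrapinghub, Import.io or Slybot actually work: the markup (`li class="listing"` with `title` and `price` children) and the names `ListingParser` and `extract_listings` are hypothetical, and it uses only Python's standard-library `html.parser`.

```python
import json
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects field text from a hypothetical listings page.

    Assumed markup (illustrative only): each record is an element with
    class "listing", containing child elements with class "title" and
    "price" whose text is not nested inside further tags.
    """

    FIELDS = {"title", "price"}  # hypothetical "scraping rules"

    def __init__(self):
        super().__init__()
        self.records = []     # one dict per listing, in page order
        self._current = None  # the listing currently being filled in
        self._field = None    # the field whose text is being captured

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "listing" in classes:
            self._current = {}
            self.records.append(self._current)
        elif self._current is not None:
            for name in classes:
                if name in self.FIELDS:
                    self._field = name

    def handle_endtag(self, tag):
        # Any closing tag ends the capture of the current field.
        self._field = None

    def handle_data(self, data):
        if self._current is not None and self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data


def extract_listings(html):
    """Return a list of {field: text} dicts scraped from the page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in rec.items()} for rec in parser.records]


if __name__ == "__main__":
    page = (
        '<ul><li class="listing"><span class="title">1BR near park</span>'
        '<span class="price">$1,800</span></li></ul>'
    )
    # The JSON string is what an "unofficial API" would hand to its clients.
    print(json.dumps(extract_listings(page), indent=2))
```

A hosted service would wrap output like this behind an HTTP endpoint and re-run the extraction on a schedule; the visual "training" described above is, in effect, a friendlier way of writing the field rules that `FIELDS` and the class checks hard-code here.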

As for the legal and ethical issues these services create, Import.io suggests that it’s the responsibility of its customers to adhere to the terms of the services they use Import.io to scrape. That’s a suggestion many companies are likely to balk at and perhaps even challenge successfully in the courts if the legal battle between Craigslist and 3taps is any indication.

However, even if the law is on the side of companies that possess data, it would appear that the cat is already out of the bag. As more and more services and tools for collecting data and making it available through public and private APIs become available, the question for data-rich companies may not be “Should we offer an API?” but rather “Will we own our API?”

Tags: Issues

ProgrammableWeb
APIs, mashups and code. Because the world's your programmable oyster.