How far back do your newspaper’s online archives go? The U.S. Library of Congress has a database of thousands of newspapers dating back to the 1800s. In the case of over 150 of the papers, there are also digitized, searchable pages. And it’s all available via its new API for Chronicling America (details at our Chronicling America API profile).
Looking into the past using the tools of today is the most fun part of this project. Search results on the web site appear with the search terms highlighted. The API does not return highlight information, but it does include thumbnails. Each page has a permalink back to the Library of Congress site, which displays the page in a zoomable, draggable viewer similar to Google Maps.
You can control highlighting inside the viewer by passing terms, such as this link to the above basketball story. Full imagery is also available as PDF or JPEG 2000.
If the openness of the feature set doesn’t make it clear, the Library of Congress is focused on making these public domain works widely available. Accordingly, the API requires no registration or key. That’s pretty wide open.
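To make the "no key" point concrete, here is a minimal sketch of building a page search request in Python. The base URL, the `/search/pages/results/` path, and the `andtext` and `format` parameter names are assumptions drawn from the API's public documentation and may have changed, so check the current docs before relying on them. The helper name `build_page_search_url` is ours, purely for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL for Chronicling America; verify against the current docs.
BASE_URL = "https://chroniclingamerica.loc.gov"


def build_page_search_url(terms: str, fmt: str = "json") -> str:
    """Build a full-text page search URL (hypothetical helper).

    Note that no API key or auth token is added anywhere:
    the request is just a plain query string.
    """
    # 'andtext' and 'format' are assumed parameter names.
    query = urlencode({"andtext": terms, "format": fmt})
    return f"{BASE_URL}/search/pages/results/?{query}"


if __name__ == "__main__":
    # Print the URL; fetch it with any HTTP client you like.
    print(build_page_search_url("basketball"))
```

Because there is no registration step, the resulting URL can be pasted straight into a browser or passed to any HTTP client.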
As the Dead Librarian points out, we’re only looking at the first round of imagery. The goal is huge:
The Library of Congress would like to digitize every newspaper that is in the public domain and is available on microfilm.
Among the interesting technical details is that the API can return linked data via RDF. It’s good to see reference sites, especially government ones, support semantic web formats (there are now 20 APIs in our directory with RDF support).
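As a rough illustration of the linked data option, the sketch below builds the URL for a newspaper title's RDF view. It assumes that titles are addressed by their LCCN under `/lccn/` and that appending `.rdf` selects the RDF representation; both are assumptions to confirm against the current documentation. The LCCN shown is a placeholder, not a real identifier.

```python
from urllib.parse import quote

# Assumed base URL for Chronicling America; verify against the current docs.
BASE_URL = "https://chroniclingamerica.loc.gov"


def build_title_rdf_url(lccn: str) -> str:
    """Build the RDF URL for a title identified by LCCN (hypothetical helper).

    The '/lccn/<id>.rdf' pattern is an assumption about the URL scheme.
    """
    # quote() guards against stray characters in the identifier.
    return f"{BASE_URL}/lccn/{quote(lccn, safe='')}.rdf"


if __name__ == "__main__":
    # 'sn00000000' is a placeholder LCCN for illustration only.
    print(build_title_rdf_url("sn00000000"))
```

The response, if the pattern holds, can be handed to any RDF parser for further processing.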
While your favorite old paper might not yet be available in digital form, at least you can learn about the historic publishers in your area. Part of the API lets you search titles. There you can learn what years a paper was published and where microfilm is available. Of course, if there are images somewhere, there’s a good chance they will eventually make it onto the web and be accessible from your applications.
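A title search might look something like the following sketch. The `/search/titles/results/` path, the `terms` and `format` parameters, and the `title`, `start_year`, and `end_year` field names are assumptions based on the public documentation, and the helper names are hypothetical. Treat it as a starting point and not as the definitive interface.

```python
from urllib.parse import urlencode

# Assumed base URL for Chronicling America; verify against the current docs.
BASE_URL = "https://chroniclingamerica.loc.gov"


def build_title_search_url(terms: str, fmt: str = "json") -> str:
    """Build a newspaper title search URL (hypothetical helper)."""
    # 'terms' and 'format' are assumed parameter names.
    query = urlencode({"terms": terms, "format": fmt})
    return f"{BASE_URL}/search/titles/results/?{query}"


def describe_title(item: dict) -> str:
    """Summarize one search result as 'Title (start-end)'.

    The keys read here are assumed field names; missing keys
    fall back to placeholders instead of raising an error.
    """
    title = item.get("title", "unknown title")
    start = item.get("start_year", "?")
    end = item.get("end_year", "?")
    return f"{title} ({start}-{end})"


if __name__ == "__main__":
    print(build_title_search_url("oakland"))
```

Each item in the decoded JSON response could then be passed to `describe_title` to list the years a paper ran.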