As the 2013 government shutdown (it already has its own Wikipedia page) heads into its eighth day, the degree to which it has imperiled open government application programming interfaces (APIs), and the applications and services that depend on them, is becoming better understood. That understanding began with the think tanks and media outlets that have attempted to explain the nature of, and rationale for, what appear to be arbitrary Web site closures.
It’s a bit hard to make sense of why some sites remain up (some with a “no new updates” banner) while others are redirected to a shutdown notice page—and in many cases it’s puzzling why a shutdown would be necessary at all….
…. For agencies that directly run their own Web sites on in-house servers, shutting down might make sense if the agency’s “essential” and “inessential” systems are suitably segregated. Running the site in those cases eats up electricity and bandwidth that the agency is paying for, not to mention the IT and security personnel who need to monitor the site for attacks and other problems. Fair enough in those cases.
But Sanchez has a more difficult time explaining away the various anomalies that he and others have observed. For example, while NASA.gov’s home page redirects to a “site down” message, some of its sub-domains remain fully operable. Equally confounding, the FTC’s Web site responds as it should to most page requests, but only for a split second before its own “site-down” notice is displayed.
Over on Ars Technica (see Shutdown of US government websites appears bafflingly arbitrary), Cyrus Farivar posted an incredibly detailed examination of more than 50 .gov sites, showing which were “down for the count,” which appeared wholly functional, and which were crippled in some way (for example, no updates). For those looking to keep tabs on what exactly is up, what’s down, and everything in between, there’s also a fair amount of Twitter activity on the hashtag #datadown.
Less discussed so far has been the fate of the government APIs that so many applications and other sites depend on. Not all downed government sites behave this way, but when a Web server responds to a request with a “site-down” message at all, that suggests the server, and any APIs associated with the site, could in fact still be up and running. Redirecting a site’s home page and other pages to such a message is child’s play for most site administrators. Depending on what steps are taken to indicate a site’s inoperable state, the APIs connected with that site might or might not remain active.
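One practical consequence is that a client can't tell from a site's home page whether its API is still serving data. The minimal Python sketch below classifies a single fetched response as real data, a “site-down” notice served by a running server, or a genuine outage. The FetchResult type, the phrase list, and the convention of recording an unreachable host as status 0 are illustrative assumptions, not anything the agencies published.

```python
from dataclasses import dataclass

# Assumed phrases; actual shutdown notices were worded differently by each agency.
SHUTDOWN_PHRASES = ("shutdown", "lapse in appropriations", "appropriations lapse")


@dataclass
class FetchResult:
    status: int          # HTTP status; 0 is a hypothetical marker for "no response"
    content_type: str    # value of the Content-Type header
    body: str            # decoded response body


def classify(result: FetchResult) -> str:
    """Return 'down', 'notice', or 'data' for one fetched URL."""
    if result.status == 0 or result.status >= 500:
        return "down"    # unreachable or erroring server
    is_html = "text/html" in result.content_type.lower()
    text = result.body.lower()
    if is_html and any(phrase in text for phrase in SHUTDOWN_PHRASES):
        return "notice"  # server is running, but serving a "site-down" page
    if 200 <= result.status < 300:
        return "data"    # something other than a notice came back successfully
    return "down"        # 3xx/4xx without usable content
```

Because a notice on the home page says little about other paths, each API endpoint would need to be checked on its own; a “notice” result is itself evidence that the server behind it is alive.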
According to Sunlight Foundation software developer Eric Mill, what’s up and what’s down in the way of government sites and APIs is indeed somewhat arbitrary but “not capriciously so.” The nonpartisan nonprofit Sunlight Foundation’s mission is to “use the power of the Internet to catalyze greater government openness and transparency.” Although it’s a third party, the Sunlight Foundation publishes several APIs that provide programmatic access to government data. In some cases, these APIs roll up data drawn from multiple government APIs, data scraped from government Web sites (where no APIs exist), and bulk data downloads.
For sites like the Library of Congress that don’t have an API, the Sunlight Foundation rolls its own using hand-written, Python-based screen scrapers. Of course, it doesn’t matter how good your screen scraper is; if the site you’re scraping is down, so is your ability to extract any data from it. In a telephone interview, Mill told me that “each agency has a lot of discretion to decide what is essential and what is not. There’s also [the matter of] committing to maintenance or fixing it if it breaks.”
Mill blipped onto my radar after publishing a blog post titled Government APIs Aren’t A Backup Plan. One might expect that, to the extent Sunlight’s APIs depend on the reliability of both government APIs and government sites (the latter for scraping), a shutdown involving seemingly arbitrary site failures would be disastrous. But the Sunlight Foundation was prepared for such a contingency. Even amid myriad government site and API shutdowns, the organization crossed a major milestone last week when it responded to its billionth API request. It could do so because the Sunlight Foundation keeps a cache of the data it retrieves, and its APIs work off that cache instead of directly off any real-time data sources.
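The pattern described here, where API answers come from a local copy and a separate job refreshes that copy, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sunlight's actual code: the CachedSource class, the fetch_all callable (standing in for a government API call, a scraper, or a bulk download), and the single JSON snapshot file are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict


class CachedSource:
    """Serve API answers from a local snapshot; refresh it separately."""

    def __init__(self, store: Path, fetch_all: Callable[[], Dict[str, dict]]):
        self.store = store          # hypothetical on-disk snapshot file
        self.fetch_all = fetch_all  # hypothetical upstream fetcher (API, scraper, bulk file)

    def refresh(self) -> bool:
        """Try to update the snapshot; on any upstream failure keep the old one."""
        try:
            records = self.fetch_all()
        except Exception:
            return False  # upstream is down; the previous snapshot stays in place
        tmp = self.store.with_suffix(".tmp")
        tmp.write_text(json.dumps({"refreshed_at": time.time(), "records": records}))
        tmp.replace(self.store)  # swap in the new snapshot with a single rename
        return True

    def serve(self, key: str) -> dict:
        """Answer a request without touching the upstream source at all."""
        snapshot = json.loads(self.store.read_text())
        return snapshot["records"][key]
```

The key design choice is that serve() never calls upstream, so an outage only makes the data older; it doesn't take the API down with it.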
But Mill says the government shutdown also reveals a flaw in the thinking that an API can serve as a wholesale substitute for access to data in bulk. Whereas the Sunlight Foundation caches the data it retrieves, other consumers of government APIs may not. Where applications and other API aggregators rely on live, uncached data, those applications and services go down the minute the API does. In a post titled Government Shutdown Sets Off Data and API Scramble, Miranda Neubauer reported:
Among the many casualties of the government shutdown are the websites and data sources that researchers, civic hackers and others use on a regular basis for a variety of online applications, visualization projects and studies.
The disappearance of resources like data.gov and census.gov has forced those relying on the data to act quickly to find creative solutions or work together to gather backed-up information.
Neubauer goes on to report how Code for America CTO Michal Migurski has organized one such creative solution: a pool of as many backups of the census data as he can find, gathered from sources and organizations that were smart enough to keep relatively recent copies of that data on hand. According to the page he set up on the Code for America site, “The files linked here are backups of Census data that happened to be available when the servers went dark.”
According to the Sunlight Foundation’s Mill, census.gov is fully down: the site is down, the API is down, and so is its bulk repository. “They removed the data and APIs in full,” said Mill. But Mill also cautions against being lured into a false sense of security should someone happen upon a site with APIs that are still running.
One example is federalregister.gov. For all intents and purposes, federalregister.gov is version 2.0 of the older, stodgier ofr.gov, the site that keeps track of the government’s business (what new regulations are on the docket, what meetings are coming up, and so on). According to Mill, ofr.gov is legally the official Web site for the Federal Register, while federalregister.gov is still a work in progress and is therefore considered the unofficial site. That is despite its being the newer, slicker version: it is not only easier for laypeople to use but also offers a very high-quality API for developers (something ofr.gov lacks).
Unlike other federal sites that are down, federalregister.gov is up. The problem, according to Mill, is that even though it’s up, there’s no guarantee that it will get updated. The home page of the site makes this clear, saying “Due to an appropriations lapse, the Office of the Federal Register (OFR) is not updating this site.” A blog post on the site that was prepared before the shutdown goes into further detail:
FR 2.0 is not an official edition at this time, and as such, would be expendable. The basic FR posting process is completely automated using programming that was completed many months ago. But if XML bulk data feeds are not available from GPO due to staff furloughs, FederalRegister.gov will not be updated. In that case, please use the official edition of the Federal Register on FDsys, which will remain in service to publish documents that relate to the protection of life and/or property. If FederalRegister.gov experiences a system outage, we will not be able to restore service until funding is provided.
In other words, so long as the government is shut down, federalregister.gov could go down at any moment with no one to restore it. That would mean lights-out for the API as well, but the point is somewhat moot: if the site doesn’t receive updates, the API is of little use anyway (although apps relying on the live API might not experience a disruption). (See the important update below.)
The same could be said of the Bureau of Labor Statistics’ Web site. The site is up and so is its API, but neither is being updated. According to the site’s home page:
This website is currently not being updated due to the suspension of Federal government services. The last update to the site was Monday, September 30. During the shutdown period BLS will not collect data, issue reports, or respond to public inquiries. Updates to the site will start again when the Federal government resumes operations. Revised schedules will be issued as they become available.
Mill rhetorically asks, “So, is the site up? Or down? I guess it depends on how you define that.”
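One way for an API consumer to answer Mill's question is to track two things separately: whether a source responds, and when it last changed. The Python sketch below labels a source “down,” “stale,” or “live.” The function name and the max_age threshold are hypothetical; a sensible threshold depends on how often a given dataset is normally published.

```python
from datetime import datetime, timedelta
from typing import Optional


def source_status(reachable: bool, last_updated: Optional[datetime],
                  now: datetime, max_age: timedelta) -> str:
    """Label a data source 'down', 'stale', or 'live'.

    max_age is an assumed per-dataset threshold: a daily feed might use a
    couple of days, a monthly statistics release a few weeks.
    """
    if not reachable:
        return "down"
    if last_updated is None or now - last_updated > max_age:
        return "stale"  # responding, but not (known to be) updated recently
    return "live"
```

Applied to the BLS case, with a last update on Monday, September 30 and a check on October 8, a two-day threshold yields “stale” even though the site and API both respond.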
What are the key takeaways, whether or not you’re part of a government organization? First and foremost, if you have applications that make use of API-based data, consider caching that data so that an API outage doesn’t disrupt the functionality of those apps or services. Second, if you’re an API provider, consider making your data available in bulk.
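On the provider side, “available in bulk” can be as simple as a periodically regenerated snapshot file that anyone can download and mirror, much like the census backups Migurski collected. The Python sketch below writes records as gzip-compressed JSON Lines (one JSON object per line) and reads them back; the format and both function names are assumptions chosen for illustration, not something any agency mentioned here has specified.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, List


def write_bulk_snapshot(records: Iterable[dict], out: Path) -> int:
    """Write every record as one JSON object per line, gzip-compressed.

    Returns the number of records written.
    """
    count = 0
    with gzip.open(out, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count


def read_bulk_snapshot(path: Path) -> List[dict]:
    """Consumer side: load the whole snapshot into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

A line-oriented format lets consumers stream very large dumps record by record, and a single static file keeps working on any mirror even after the original API is switched off.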
Update (10/8/2013 4:41pm ET): The Sunlight Foundation’s Eric Mill emailed ProgrammableWeb to say that the text on the home page of federalregister.gov (saying the site isn’t being updated) is misleading. While some parts of federalregister.gov are updated by hand, other parts of the site are updated automatically from an XML feed provided by gpo.gov (the US Government Printing Office). So far, the GPO’s XML feed is still receiving updates, which means that federalregister.gov’s API will remain current so long as (1) gpo.gov remains alive and continues to be updated, (2) its XML feed to federalregister.gov remains operational, and (3) federalregister.gov remains operational. According to FederalRegister.gov’s blog, “The basic FR posting process is completely automated using programming that was completed many months ago. But if XML bulk data feeds are not available from GPO due to staff furloughs, FederalRegister.gov will not be updated.”
By David Berlind. David is the editor-in-chief of ProgrammableWeb.com. You can reach him at firstname.lastname@example.org. Connect to David on Twitter at @dberlind or Google+, or friend him on Facebook.