Open data—from both government and private sources—has great potential for creating new products, reducing the costs of doing business, and improving people’s lives. But for open data to truly benefit both business and local communities, there are still some questions that will need to be answered.
Two central sticking points at present are how to ensure an ongoing supply of government open data when that supply may be affected by outside politics or internal inertia, and how to define a viable business model when open data is the key raw material. By providing a reliable, up-to-date API for monitoring U.S. court decisions, the CourtListener team is forging a path that helps resolve both of these barriers to seizing the open data opportunity.
The CourtListener website is a not-for-profit project managed by the Free Law Project. It collates data from court websites and other sources, aiming to provide a comprehensive database of all court opinions issued in the United States. So far, the database covers all Federal Appeals Court decisions and is steadily adding state court decisions. (Some financial barriers prevent full data extraction: for example, the Federal District Court charges 10 cents a page, which prevents the not-for-profit from extracting decisions from this court via web scraping.) CourtListener started with the Bulk Data API, which provides downloadable access to the full database in XML format, while the newly released CourtListener REST API offers seven endpoints for querying court decision data.
“With our bulk API, it is a giant XML file that people have been using for a couple of years now. It’s used a lot in research, and we track the number of downloads to get a feel for its use,” the CourtListener co-founders, Mike Lissner and Brian Carver, told ProgrammableWeb.
“With the new REST API, we did a soft release a few weeks ago and we’ve had 3 or 4 [early adopters] working on it. One used it as part of his Y Combinator pitch, for example; it’s been used in conjunction with the State Decoded Project to pull in relevant data; and a developer with the Sunlight Foundation is using our data. In general, the trend for using our data is up; our traffic goes up every week.”
Originally, CourtListener focused on providing a daily update service to alert subscribers to new Federal Court decisions, but as the project built a clearer understanding of its potential audience, it realized historical data would be just as important. Lissner and Carver explain: “One of the end user groups we thought of was journalists: you can do a search on a topic within your beat, for example, or you can set up alerts to receive details of particular court decisions. But for the daily alerting feature, that meant we only had court opinions from the date we started going forward. For the trend analysis needed in journalism, for example, we really needed to get the back catalog, so the project became about putting all the historical data online as well. Our hope is that it makes the law more accessible, and more accessible to analyze amongst not-for-profits.”
In this way, CourtListener is mining available government open data sources and making them more accessible for end users. By building an independent database from the source material, the CourtListener team is also ensuring more reliable access to the data, without depending on government support, whether financial or political.
On data scraping: CourtListener uses web-scraping tools to collate court decisions and opinions, but has also relied on a network of volunteers to help clean the data. As with other web-scraping projects, a key barrier can be the lack of scalability in collating and cleaning data. “We use our own CourtListener web-scraping tool, called Juriscraper, that will dish out Python code via a custom library. There aren’t really readymade tools for this type of web-scraping: there are some general problems, but way more specific problems. Sometimes other people have done the heavy-lifting in scraping the data, but when we looked at it we had to do things like correct the spelling of the word ‘September’, which for some reason, people tend to spell incorrectly. So there’s a certain point where you can spend an hour coding a solution, or 45 minutes to go through the data and correct each line.”
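To make the "hour of coding" side of that trade-off concrete, here is a minimal sketch of how misspelled month names might be repaired automatically. It is not Juriscraper's code: the helper names `correct_month` and `fix_date_string`, the 0.75 similarity cutoff, and the assumption that dates arrive as "Month day, year" strings are all hypothetical choices made for illustration, using only Python's standard `difflib` module.

```python
import difflib

# Canonical month names to match against.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def correct_month(word, cutoff=0.75):
    """Return the closest canonical month name, or the word unchanged.

    The cutoff is an assumed threshold; too low a value would start
    "correcting" ordinary words that merely resemble a month name.
    """
    matches = difflib.get_close_matches(
        word.capitalize(), MONTHS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else word


def fix_date_string(date_string):
    """Repair the month in a date assumed to look like 'Month day, year'.

    Only the first token is touched, so the rest of the string is
    passed through exactly as it was scraped.
    """
    month, separator, rest = date_string.partition(" ")
    return correct_month(month) + separator + rest


if __name__ == "__main__":
    print(fix_date_string("Septmber 12, 2012"))
```

Restricting the correction to the field known to hold a month is a deliberate choice in this sketch: fuzzy matching applied to every word of a document would also rewrite surnames and ordinary words that happen to sit close to a month name.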
On analytics: For now, the CourtListener team is not heavily invested in monitoring data around how the CourtListener API is being used: “We know there’s a big push towards analytics but from our perspective, we don’t really do much,” Lissner and Carver said. “We throttle at 1,000 hits on an endpoint in an hour, and we monitor general usage patterns. That’s probably where we will let it sit for now.”
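A throttle of the kind described can be sketched in a few lines. The following is an illustrative fixed-window counter, not CourtListener's implementation: the class name `HourlyThrottle`, the choice to count per client and per endpoint, and the in-memory storage are assumptions, since the article gives only the figure of 1,000 hits on an endpoint in an hour.

```python
import time


class HourlyThrottle:
    """Fixed-window request counter (hypothetical sketch).

    Each (client, endpoint) pair may make `limit` requests per window.
    Counting per client is an assumption; the article only says
    "1,000 hits on an endpoint in an hour".
    """

    def __init__(self, limit=1000, window_seconds=3600, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._state = {}    # (client, endpoint) -> [window index, count]

    def allow(self, client_id, endpoint):
        """Record one request and report whether it is within the limit."""
        window = int(self.clock() // self.window_seconds)
        key = (client_id, endpoint)
        state = self._state.get(key)
        if state is None or state[0] != window:
            # First request in a new window: start counting from zero.
            state = [window, 0]
            self._state[key] = state
        if state[1] >= self.limit:
            return False
        state[1] += 1
        return True


if __name__ == "__main__":
    throttle = HourlyThrottle()
    print(throttle.allow("example-client", "search"))
```

A fixed window is the simplest option and allows a short burst across a window boundary; a sliding window or token bucket would smooth that out at the cost of more bookkeeping.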
On creating the REST API: Lissner and Carver said: “We used the Tastypie toolkit to create the API. It helps you split your data into models and schemas. Tastypie is an extension of Django that can help you create an API in about 20 minutes work. It also let us include a search-powered endpoint.”
Some of CourtListener’s current users include private businesses that mine the data for specific industry verticals. SumoBrain, for example, use the APIs to enhance their search products used by patent attorneys, corporate researchers and inventors.
The State Decoded project – aimed at making legal documents across the States more accessible in API format – also draws on CourtListener.
“We use CourtListener’s API to show site visitors the most prominent court decisions that have cited a given law,” State Decoded Founder Waldo Jaquith told ProgrammableWeb. “When somebody comes to a State Decoded site, and looks up a law, this will provide them with the context that they need in order to understand how that law is actually interpreted by courts. Some laws have even been struck down by courts, but remain on the books because legislators are unwilling to remove them. For these sorts of laws, it’s enormously important to be able to give people immediate access to the relevant court opinion.
“Implementing their API was very easy, and it’s extremely lightweight for folks using The State Decoded—just a few lines of code. CourtListener’s bulk download of court decisions is, necessarily, an enormous file, and some non-trivial computing power would be required to provide the same information that their API provides in a fraction of a second.
“As far as folks visiting State Decoded sites know, there is no API. It’s completely seamless. CourtListener’s API allows people to get all relevant legal information about a single law in one place, without having to pay LexisNexis a subscription fee. That’s very powerful. Nothing like this has been done before.”
“CourtListener is making legal opinions more accessible on a number of fronts,” Raymond Yee, visiting scholar/lecturer in the School of Information at the University of California at Berkeley, told ProgrammableWeb. Yee runs an annual course in open data where students are encouraged to design commercially viable products built off open data sources.
“First, CourtListener provides a single point of access so someone can come to this one site to find decisions from many [courts] without having to personally hunt them down on myriad web sites.
“Second, by providing a single point of access, CourtListener lets users see a larger, unifying context in which individual decisions and courts can fit (getting a feel for the overall structure of the legal system is especially important to the non-specialist public).
“Finally, CourtListener is making all this data available for bulk download as well as through its new API to accommodate a range of data analysis scenarios.
“Everyone in a free society should be able to know and understand the laws that govern that society. CourtListener lowers some of the barriers for that access: financial, intellectual, and computational.”
Yee also believes that access to this legal data via API can go beyond fostering a more aware, participatory civic society.
“I have naive understandings about the American legal system, but given how court decisions (especially at the Supreme Court level) can fundamentally restructure our country and its economic/business life, there must be a lot of money riding on understanding, predicting, and influencing how the courts make decisions. What an opportunity to computationally compare legal decisions across jurisdictions with the CourtListener aggregated data, which is actually also near-real time! I imagine someone would want to put in some seed investments to develop machine learning algorithms based on the CourtListener dataset to uncover latent patterns in the history of Supreme Court decisions. (I wouldn’t be surprised if this has been attempted already, but CourtListener opens that game to many more people.) Concretely, as I’ve heard from Brian Carver, we should even be able to compute something from this data to help us win at FantasySCOTUS!
“On a more prosaic level, there might be business opportunities around building tools to assist jurists to find relevant decisions in their own decision making. For journalists monitoring legal decisions around the country, the alert system could come in handy. Entrepreneurs familiar with the legal system should look immediately at the CourtListener data and API and start daydreaming.”
Waldo Jaquith agrees. He believes CourtListener will have an enormous impact on how future open data projects are approached: “The open data movement needs less talk and more action. Combining legal codes and court decisions is a patently obvious thing to do. Surely people have envisioned this for decades. What’s different about CourtListener and The State Decoded is that we actually did it. It’s not perfect, it’s not comprehensive, but it exists, and that’s better than anything else that anybody else has done. That’s how we make people aware of the power of tools like the CourtListener API: by implementing those tools, and telling everybody about it. In 24 months, it will seem quaint that this was considered interesting in 2013,” Jaquith said.
The recent release of the CourtListener API demonstrates how APIs are instrumental in unlocking key data sources. From its value in enhancing civil liberties, to providing a powerful resource for journalists and business, to its potential in helping entrepreneurs create new commercial products, the CourtListener API is a good example of what we can expect from open data projects providing access to source materials via API.