Using Wikipedia as a Web Database

John Musser, August 9th, 2007

Ever want to programmatically query Wikipedia? It’s a tempting dataset with over 1.6 million articles, yet it has no official API. While there’s been a rumor that the Wikipedia team will supply an API at some point, for now you can use an API we just listed here: the DBpedia API. It’s a project headed by a team of German university researchers, and as they describe it: “DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.” More from their Introduction:

Wikipedia is by far the largest publicly available encyclopedia on the Web. Wikipedia editions are available in over 100 languages, with the English edition accounting for more than 1.6 million articles. Wikipedia has the problem that its search capabilities are limited to full-text search, which allows only very limited access to this valuable knowledge base.

Semantic Web technologies enable expressive queries against structured information on the Web. The Semantic Web has the problem that there is not much RDF data online yet and that up-to-date terms and ontologies are missing for many application domains.

The DBpedia.org project approaches both problems by extracting structured information from Wikipedia and by making this information available on the Semantic Web. DBpedia.org allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to DBpedia data.

Wikipedia articles consist mostly of free text, but also contain different types of structured information, such as infobox templates, categorisation information, images, geo-coordinates and links to external Web pages. This structured information can be extracted from Wikipedia and can serve as a basis for enabling sophisticated queries against Wikipedia content.

The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web. We use the SPARQL query language to query this data.
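To make the RDF-plus-SPARQL idea concrete, here is a minimal sketch of how a program might send a SPARQL query to DBpedia over plain HTTP. The endpoint URL, the `dbo:` ontology prefix, and the result format are assumptions drawn from DBpedia’s published examples rather than from this article, so check the current dataset documentation before relying on them.

```python
# Minimal sketch: building an HTTP request for DBpedia's public SPARQL
# endpoint. The endpoint URL and the dbo: ontology namespace are
# assumptions based on DBpedia's own examples, not guarantees.
import urllib.parse

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query as a GET request URL, asking for JSON results."""
    params = urllib.parse.urlencode({
        "query": sparql,
        "format": "application/sparql-results+json",
    })
    return f"{DBPEDIA_ENDPOINT}?{params}"

# Example query: ten resources typed as films (hypothetical class name).
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?film WHERE {
  ?film a dbo:Film .
}
LIMIT 10
"""

url = build_query_url(query)
```

Fetching `url` (for example with `urllib.request.urlopen`) would return the matching resources as structured JSON rather than wiki markup, which is exactly the kind of access full-text search can’t provide.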

The DBpedia dataset currently consists of around 91 million RDF triples, which have been extracted from the English, German, French, Spanish, Italian, Portuguese, Polish, Swedish, Dutch, Japanese and Chinese versions of Wikipedia. The DBpedia dataset describes 1,600,000 concepts, including at least 58,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It contains 557,000 links to images, 1,300,000 links to relevant external web pages, 207,000 Wikipedia categories and 75,000 YAGO categories.

The project also has some interesting utilities like an integrated online debugger and a tool called the Relationship Finder that lets you explore the relationship between any two things in their dataset. In the example below you can see N degrees of separation between Kevin Bacon and Johnny Cash.



It will be interesting to see what sorts of applications get built on this API and if we start to see more public SPARQL/RDF APIs appearing.


14 Responses to “Using Wikipedia as a Web Database”

August 9th, 2007
at 4:01 am
Comment by: Danny

Hi John, great to see this on ProgrammableWeb!

See also: LinkingOpenData project, which includes dbpedia – it’s 20 or so independent datasets, all interlinked (scroll down for a diagram). “Collectively, the datasets consist of over one billion RDF triples, which are interlinked by 250,000 RDF links (July 2007).”

August 9th, 2007
at 7:19 am
Comment by: compuneo

Informative article. Also, readers can check http://www.freebase.com

It would be really great to have APIs and other schematic versions of huge knowledge sources like Wikipedia.

August 9th, 2007
at 10:00 am
Comment by: John Musser

Hi Danny, good to hear from you and thanks for the pointer to the LinkingOpenData project. Very interesting to see the scale of the linked datasets. And yes, handy diagram to boot! Very good resource to know about.

August 18th, 2007
at 1:35 pm
Comment by: Using Wikipedia as a Web Database | techtrends - exploring new technologies in social, software and media

[...] In the example below you can see N degrees of separation between Kevin Bacon and Johnny Cash. via ProgrammableWeb These icons link to social bookmarking sites where readers can share and discover new web [...]

August 19th, 2007
at 1:30 pm
Comment by: Jimmy

dbpedia: lovely stuff. Real lovely. 10 points to them.

August 24th, 2007
at 1:29 pm
Comment by: DBPedia - a New Way to Play with Wikipedia | Geeks and Technology - Linux Windows Unix system and Making money online

[...] Programmable Web has announced the availability of a new API for automating queries to Wikipedia. That may not sound very exciting, but stay with me – it gets better. [...]

September 28th, 2007
at 12:31 am
Comment by: Kaizenlog » Database 10/08/2007

[...] Using Wikipedia as a Web Database By John Musser dbpedia Ever want to programmatically query Wikipedia? It’s a tempting dataset with over 1.6 million articles but yet no official API. While there’s been a rumor that the Wikipedia team will supply an API at some point, for now you can … ProgrammableWeb – http://blog.programmableweb.com [...]

April 16th, 2008
at 4:01 am
Comment by: Maria Grineva

I would like to introduce an alternative: querying Wikipedia with XQuery.

Here is a demo of WikiXMLDB – a Wikipedia dump parsed into XML and loaded into the Sedna XML database.

http://wikixmldb.dyndns.org/

Enjoy!

February 20th, 2009
at 6:05 am
Comment by: Lenen

Wow, very interesting! I’m probably able to use this in one of my next projects! Cheers

August 8th, 2009
at 8:17 am
Comment by: Prova Articolo - Bloggerman

[...] make mashups out of it. (See Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird ; Using Wikipedia as a database). It should be easier to make those mashups by just pulling RDF (maybe using RDFa or GRDDL) or [...]




