“Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), and events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’).”
Developers can call either a SOAP or a REST-based service with plain text or XML documents and receive the results of the metadata analysis in RDF format. The initial semantic analysis categories are geared towards business-related people and events, with more specialized metadata to come. English is the only language supported today, but the product roadmap indicates that this year will see releases for Japanese, Spanish, and French, along with further capabilities for automatic metatagging of visual and audio content. The semantic metadata flows both ways: publishers who submit text for analysis can upload their own metadata, and the service will combine that information with the metadata it generates.
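To make the request and response flow concrete, here is a minimal Python sketch of what a REST call and RDF parsing step might look like. The endpoint URL, the `api_key` and `content` parameter names, and the sample RDF vocabulary are all hypothetical placeholders, not the documented Calais interface; check the official API documentation for the real values.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical endpoint, for illustration only (not the real Calais URL).
ENDPOINT = "https://api.example.com/calais/rest"


def build_request(text, api_key):
    """Build (but do not send) a form-encoded POST carrying the document text.

    The parameter names "api_key" and "content" are assumptions.
    """
    body = urllib.parse.urlencode({"api_key": api_key, "content": text})
    return urllib.request.Request(ENDPOINT, data=body.encode("utf-8"), method="POST")


def extract_descriptions(rdf_xml):
    """Return (subject URI, {property: value}) pairs from an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    results = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Strip the namespace from each child tag to get a short property name.
        props = {child.tag.split("}")[-1]: (child.text or "").strip() for child in desc}
        results.append((about, props))
    return results


# Made-up RDF/XML standing in for a service response.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.com/schema#">
  <rdf:Description rdf:about="http://example.com/entity/1">
    <ex:type>Company</ex:type>
    <ex:name>Reuters</ex:name>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    request = build_request("Reuters launched a new web service.", "YOUR_KEY")
    print(request.get_method(), request.full_url)
    for subject, props in extract_descriptions(SAMPLE_RDF):
        print(subject, props)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the response body to `extract_descriptions`; the sketch stops short of that because the endpoint here is a placeholder.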
Calais is offering a bounty program for developers who make creative use of the API. The first contest offers a $5,000 prize to the developer who creates the best WordPress plugin that provides auto-suggestion of semantic categories, a semantic tag cloud, and placement of a global identifier (GUID). This is now listed on our Contests page.
The Reuters technology appears to be based on the company's 2007 acquisition of ClearForest, whose API and 10 mashups are cataloged here, including the example below, TopicTrends. The API itself is managed via ProgrammableWeb sponsor Mashery.
Open API developers have had access to the Yahoo term extraction service since 2005, but Calais ups the ante with a service goal of responses in under one second, a strong feature set, and terms of service that allow for commercial exploitation.