Social data has become critical to developing business strategies, marketing brands and products, and gaining insights into consumers’ thought processes and buying patterns. The availability of real-time social data, along with the growing trend of Data as a Service (DaaS), has led to the development of platforms that provide business-critical social data on demand. Many platforms not only provide access to streams of social data, but incorporate business intelligence (BI) technologies and analytics capabilities as well.
Social data platforms allow users to access the continuously moving, fast-paced stream of real-time social data from a single location. Eliminating the need for businesses to spend time finding and gathering social data allows those businesses to spend more time analyzing and gaining insights from social data. Gartner briefly mentions the Data as a Service (DaaS) trend in a recently published news article:
The emerging Data as a Service trend is anticipated to significantly grow the market for BI and analytic platforms. Today, the business model is largely “build” driven. Organizations license software capabilities to build analytic applications. However, organizations increasingly will subscribe to industry-specific data services that bundle a narrow set of data with BI and analytic capabilities embedded.
This article takes a detailed look at DataSift and Gnip, two of the leading enterprise social data platforms that provide APIs. To create a side-by-side comparison, online public information was reviewed including official company websites, platform documentation, developer documentation, Quora, LinkedIn and other sources. The comparison charts were created using Google Charts, which is part of the Google Visualization API.
I chose to compare DataSift and Gnip because, at the time of this writing, they are two of only three certified Twitter data resellers, and both have access to the full Twitter firehose. The third, NTT DATA, is based in Japan and focuses on aggregating data from Japanese Tweets.
Although there appear to be many companies that specialize in social media analytics, there seem to be very few that provide social network data aggregation services. Two other examples of enterprise-grade social network data aggregation platforms are HP Media Aggregation Service and Attensity Pipeline.
DataSift, based in Reading, U.K., was founded by Nick Halstead in 2007 and today is one of the leading enterprise social data platforms. The DataSift platform provides access to both real-time and historical social data as well as the ability to aggregate, filter and extract insights from the most popular social networks and many other sources.
In December 2013, DataSift announced the addition of a new data source to the platform, Sina Weibo, a microblogging site and one of the most popular social network sites in China. DataSift CEO Rob Bailey states in the press release:
China is a high growth and strategic market for many companies, and the ability to better understand Chinese market dynamics, audiences and segments is a strategic priority for many of our customers. With our leading social data platform, customers can now leverage Sina Weibo data to enable better strategy development, build custom analytics and drive improved operational execution based on real time market data.
DataSift also announced in December that it had obtained $42 million in Series C financing. The company plans to use the financing to further develop the platform and the platform ecosystem, and will hire new team members in both the U.K. and U.S. In a blog post about the announcement, Bailey writes:
The speed at which social data is transforming information-based industries is positively staggering. If we look at just three of the core use cases for social, you’ll start to get an idea of just how massive it will be. Brand monitoring is expected to grow 300% to $6B in the next few years and social advertising will grow from $7B [according to BIA/Kelsey] to $24B [eMarketer]. The use of social data in global call centers, a $150B industry that will grow to $340B, will be big as well. Let’s assume that only 10% of the total business has a social use case, it’s still a $34B market to serve. In short, from just three use cases, social data will power a $70B economy.
Back in December, ProgrammableWeb reported that DataSift launched VEDO, a new core processing engine that makes it possible for DataSift users to utilize machine learning so that social data can be automatically categorized based on meaning and context.
Gnip is based in Boulder, Colo., and was founded in 2008 by Eric Marcoullier and Jud Valeski. Gnip is a leading social data provider and the platform offers both real-time and historical social data from networks including Twitter, Tumblr, Facebook, Foursquare, Instagram and many others.
Last month, Gnip announced the addition of VK as a new data source. VK is one of the largest social networks in Europe and is the largest social network in Russia. VK data is offered via the Gnip Data Collector product and VK was added to the platform just in time for the 2014 Sochi Winter Olympics. Gnip CEO Chris Moody states in the press release:
Social media has seen tremendous growth worldwide, and our customers are craving access to social conversations happening around the globe. VK provides unparalleled social data from more than 60 million user accounts in Russia alone, and is one of the best sources available to companies looking to understand their Russian- and Ukrainian-speaking audiences.
In September 2013, Gnip announced the launch of the Search API for Twitter, which makes it possible to programmatically access historical Twitter data and have it delivered in real time. The Gnip Search API provides query-based access to 30 days of historical Twitter data, and the data is made available within 30 seconds of being generated on Twitter. Moody states in the press release:
Until today, a user experience built on the indexed firehose of Tweets was only available to a select few companies who could afford to build and maintain such a costly product. Gnip is making this available to any product or software company to incorporate into their solution, driving the creation of a whole range of new products.
In April 2013, ProgrammableWeb reported that Gnip added six social data sources to the Enterprise Data Collector product: bitly, Instagram, Reddit, Stack Overflow, Panoramio and Plurk.
Both DataSift and Gnip provide social data from many of the most popular social networks such as Twitter, Facebook, Google+ and Instagram. Gnip offers two types of data sources: complete access sources and managed public API sources. A complete list of social data sources and their access type can be viewed on the Gnip website. DataSift also offers managed sources for Facebook Pages, Google+, Instagram and Yammer.
At the time of this writing, Gnip offers twice as many real-time streams of social data as DataSift. Gnip offers real-time streams of social data in the Firehose, PowerTrack, and Data Collector products. Both companies provide Streaming / Real-Time APIs as a way to access the platforms and capture information in real time.
Both DataSift and Gnip are certified Twitter data resellers with access to the full Twitter firehose.
Both DataSift and Gnip provide data enrichments such as language detection and expansion of shortened URLs. At the time of this writing, DataSift offers additional data enrichments not currently available from Gnip such as sentiment analysis, topic detection, and entity extraction.
DataSift also offers automatic categorization of social data based on its meaning via DataSift VEDO.
Both DataSift and Gnip offer social data aggregation that can be accessed using a single API connection. DataSift has developed its own programming language called Curated Stream Definition Language (CSDL) that is used for concise filtering and augmenting objects in the stream.
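To make the idea of filtering and augmenting objects in a stream more concrete, here is a minimal Python sketch. It is not CSDL and does not use any DataSift API; the field names (`content`, `language`), the `tag` key and both function names are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

# A stream object is modeled as a plain dict; the keys are hypothetical.
Interaction = Dict[str, str]


def make_filter(keyword: str, language: str) -> Callable[[Interaction], bool]:
    """Build a predicate that keeps items mentioning `keyword` in `language`."""
    def matches(item: Interaction) -> bool:
        content = item.get("content", "").lower()
        return keyword.lower() in content and item.get("language") == language
    return matches


def filtered_stream(
    stream: Iterable[Interaction],
    predicate: Callable[[Interaction], bool],
    tag: str,
) -> Iterator[Interaction]:
    """Yield only matching items, each augmented with a `tag` field."""
    for item in stream:
        if predicate(item):
            # Copy the item so the original object is left untouched.
            yield {**item, "tag": tag}


if __name__ == "__main__":
    sample = [
        {"content": "Loving the new API docs", "language": "en"},
        {"content": "Nothing relevant here", "language": "en"},
        {"content": "API nueva", "language": "es"},
    ]
    for hit in filtered_stream(sample, make_filter("api", "en"), "api-mentions"):
        print(hit)
```

A dedicated filtering language expresses the same kind of condition declaratively, so the platform can evaluate it against the stream before any data reaches the customer.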
At the time of this writing, neither DataSift nor Gnip offers freemium accounts; however, both companies offer a free trial of their platforms.
DataSift offers a limited “Pay-As-You-Go” pricing plan in addition to a subscription plan. Both companies charge additional fees for items such as licensing and enriched data.
Both DataSift and Gnip offer essentially two types of APIs: real-time streaming and historical. Both companies offer a single API connection for accessing their platforms.
Gnip currently has three sets of APIs available:
DataSift has several APIs available including REST API, Streaming API, Managed Sources API, Historics API, and Push API.
Developer support differs somewhat between DataSift and Gnip. DataSift has a separate developer portal that includes API documentation, a developer forum, a blog, an API console, client libraries, and other developer information.
Gnip does not have a separate site for developers; however, its documentation is easily accessible for developers to reference.
DataSift has a Twitter account specifically for developers, while Gnip has a new engineering Twitter account that posts about developer-related topics and developer events.
At the time of this writing, it appears that DataSift does not have any developer events scheduled yet for 2014. Gnip appears to be participating in the 2014 SXSW Interactive Festival and is one of the sponsors of the DeveloperWeek Hackathon that will take place on February 15-16, 2014 at the Rackspace offices in San Francisco, CA.
Both DataSift and Gnip offer many of the same social data sources and there are some similarities between the two platforms. However, the two companies seem to be taking different approaches to providing social data.
DataSift has created its own programming language that can be used to create complex data filters and to augment the data. DataSift also offers advanced social data enrichments and utilizes machine learning, which allows social data to be automatically categorized based on meaning and context.
Gnip offers several social data products including Data Collector, a turn-key solution that provides users a way to collect social data from up to six different social APIs using a single connection (the Gnip API). Data Collector also removes duplicate data and normalizes the format across all APIs.
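The deduplication and normalization steps described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Gnip's implementation or API; the source names, the per-source field names in `FIELD_MAP` and the common record shape are all hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Hypothetical mapping from each source's field names to a common schema.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "source_a": {"id": "id_str", "text": "text", "author": "user_name"},
    "source_b": {"id": "post_id", "text": "body", "author": "owner"},
}


def normalize(source: str, raw: Dict[str, Any]) -> Dict[str, str]:
    """Convert one raw payload into the common record shape."""
    mapping = FIELD_MAP[source]
    return {
        "source": source,
        "id": str(raw[mapping["id"]]),
        "text": str(raw[mapping["text"]]),
        "author": str(raw[mapping["author"]]),
    }


def collect(items: Iterable[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, str]]:
    """Normalize every payload and drop repeats of the same (source, id)."""
    seen: Set[Tuple[str, str]] = set()
    records: List[Dict[str, str]] = []
    for source, raw in items:
        record = normalize(source, raw)
        key = (record["source"], record["id"])
        if key in seen:
            continue  # duplicate delivery of an item already collected
        seen.add(key)
        records.append(record)
    return records


if __name__ == "__main__":
    incoming = [
        ("source_a", {"id_str": "1", "text": "hi", "user_name": "ann"}),
        ("source_b", {"post_id": 7, "body": "yo", "owner": "bob"}),
        ("source_a", {"id_str": "1", "text": "hi", "user_name": "ann"}),
    ]
    for record in collect(incoming):
        print(record)
```

Keying duplicates on the source together with the item ID means that items from different networks that happen to share an ID are still treated as distinct.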
Choosing a social data platform really depends on the specific project requirements, business goals, marketing strategies and budget. DataSift and Gnip are both well-established, reputable companies that provide secure and reliable social data solutions.
By Janet Wagner. Janet is a data journalist and full stack developer based in Toledo, Ohio. Her focus revolves around APIs, open data, data visualization and data-driven journalism. Follow her on Twitter, Google+ and LinkedIn.