Developers looking for trends in Twitter search are finding it more difficult now that the micro-blogging site has decreased the search history to four days. The backlog, which once went back weeks and months, has steadily shrunk and is now too short for some types of applications. At the same time, the newer streams have become the go-to API for former search use cases.
Back when you searched Twitter with Summize, the history went back several months. Once Twitter acquired Summize, that continued. As Twitter became more popular, though, the history shortened. Its length appears to fluctuate based on how long it takes the index to hit its maximum storage. Twitter stores tweets in MySQL and recently dropped a planned move to Cassandra, a system open sourced by Facebook, which some believe could improve Twitter’s search performance.
Twitter refused to comment, citing a policy against making user statistics public. However, its own documentation lists the limit at 1.5 weeks. The same page claimed a history of one month, then three weeks, in early 2009. It was last updated in March. The drop to less than one week likely happened just last month. Damon Cortesi noted the change in a tweet.
Cortesi’s company makes RowFeeder, a tool that performs social media monitoring, adding references to a spreadsheet. The service uses Twitter, among other sources, and now uses the streaming API for the bulk of its work. However, “when new customers sign up,” Cortesi told us, “they ask if we can get back data.” For that operation, RowFeeder uses Twitter search, which is subject to what is now a four-day limit. Cortesi said sometimes it goes back up to five days.
Twitter streams, which we’ve covered previously, are a “push” technology. Rather than an application polling for the latest data, it registers to receive specific searches automatically. Then, when there is new content, Twitter sends it over a persistent connection.
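The difference between polling and push can be sketched in a few lines. This is a toy, in-memory illustration and not Twitter's actual API: the `TweetStream` class and its `register` and `publish` methods are hypothetical names invented for the example, and a real stream would deliver matches over a persistent network connection rather than a function call.

```python
from typing import Callable, Dict, List


class TweetStream:
    """Toy stand-in for a push-style stream (hypothetical, not Twitter's API)."""

    def __init__(self) -> None:
        # Maps a lowercase keyword to the callbacks registered for it.
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def register(self, keyword: str, callback: Callable[[str], None]) -> None:
        """Ask to be notified whenever a new tweet contains `keyword`."""
        self._subscribers.setdefault(keyword.lower(), []).append(callback)

    def publish(self, tweet: str) -> None:
        """Simulate a new tweet arriving: push it to every matching subscriber."""
        text = tweet.lower()
        for keyword, callbacks in self._subscribers.items():
            if keyword in text:
                for callback in callbacks:
                    callback(tweet)


# The application registers once and never polls.
received: List[str] = []
stream = TweetStream()
stream.register("api", received.append)

# New content arrives; only the matching tweet is delivered.
stream.publish("The streaming API is live")
stream.publish("Lunch was great")
```

The application only sees tweets that arrive after it registers, which is why a push stream cannot fill in history and a service like RowFeeder still falls back on search for older data.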
Update. Twitter’s Matt Harris chimed in on the dev list:
To answer your question about the search index history, we don’t publish that information. The size of the index fluctuates based on the number of Tweets being made which means, the more Tweets there are the shorter the index period is. We’re working to improve the duration of the search index and improve the relevance of the results.
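The relationship Harris describes, where more tweets mean a shorter index period, can be sketched as simple arithmetic if we assume the index holds a roughly fixed number of tweets. Both that assumption and every figure below are made up for illustration. Twitter has not published its index capacity or the volumes behind it, and `index_window_days` is a hypothetical helper.

```python
def index_window_days(capacity_tweets: int, tweets_per_day: int) -> float:
    """Days of history a fixed-size index can hold at a given tweet rate."""
    if tweets_per_day <= 0:
        raise ValueError("tweets_per_day must be positive")
    return capacity_tweets / tweets_per_day


# Hypothetical capacity; not a real Twitter figure.
CAPACITY = 200_000_000

# Doubling the (equally hypothetical) daily volume halves the searchable window.
slow_days = index_window_days(CAPACITY, 25_000_000)
busy_days = index_window_days(CAPACITY, 50_000_000)
print(f"{slow_days:.1f} days at the lower rate, {busy_days:.1f} days at the higher rate")
```

Under this model the window shrinks in direct proportion to growth in volume, which is consistent with a limit that drifts between four and five days rather than staying fixed.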