Metrics for Content APIs: An NPR Case Study

Guest Author, September 15th, 2010

This guest post comes from Daniel Jacobson, Director of Application Development for NPR. Daniel leads NPR’s content management solutions, is the creator of the NPR API and is a frequent contributor to the Inside NPR.org blog.

In my previous post, I discussed how companies can make money by using their content APIs to improve internal processes, enable rapid product development and extend their reach. Doing this successfully, however, also requires a strong plan for capturing the appropriate metrics for the API.

At least from NPR’s perspective, the primary goal of the API is to get as many eyeballs on the content as possible. To achieve this goal, there are several ways to track the content as it travels through the API, each of which serves its own role. The following are the four key metric types that NPR is targeting:

  • Request
  • Response
  • Impression
  • Loyalty

Each of these is important in determining the true reach of the API, although their respective values to the overall equation differ. Moreover, each comes with its own challenges in capturing and parsing the data. Below is NPR’s definition of each of these metrics, along with some basic data that NPR has (or doesn’t have), so you can see the relevance of each to our API strategy.

Request
The marketplace standard right now for tracking API metrics is based on API requests. While this metric is very useful and important, it is only a segment of the metrics needed to really determine the success of a content API. This is because requests do not translate into actual consumption – they merely create opportunities for consumption. To put it another way, tracking requests reveals information about how developers use the API, even though the API itself is really just a means to get content in front of consumers. So, it is critical for producers of content APIs to be able to track how the content is consumed when distributed through the API.
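To make the request metric concrete, here is a minimal sketch of per-developer request counting — the kind of tally that request-based tracking gives you. The log format and key names are hypothetical, not NPR’s actual scheme:

```python
from collections import Counter

def count_requests(log_lines):
    """Tally API requests per API key from simple access-log lines.

    Each line is assumed to look like: "<timestamp> <api_key> <endpoint>".
    """
    totals = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            totals[parts[1]] += 1
    return totals

log = [
    "2010-09-01T12:00:00 key123 /query",
    "2010-09-01T12:00:01 key123 /query",
    "2010-09-01T12:00:02 key456 /list",
]
print(count_requests(log))  # Counter({'key123': 2, 'key456': 1})
```

Counts like these answer "how busy is the API?" and "which developers use it most?" – but, as discussed above, nothing about whether anyone actually saw the content.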

The following chart details the growth of the NPR API over time in terms of API requests.

Response
Although the request metrics tell us what the developer asked for, they do not tell us what was delivered to the developer. Depending on the nature of the API, the response may include multiple items for each request, as well as warning or error codes and other information that gets returned to the user. A common example of this is an RSS feed, which receives a single request but can deliver many stories. If the API captures only request metrics, it is missing the specifics of what was returned to the API developer.

The response data is critical as it tells you what content is potentially available to end-users.

Although NPR received more than 72 million requests to the API in August, it delivered over 1.3 billion stories over that same timeframe. This translates into roughly 18 stories per request. Clearly, by capturing only the request data, you are missing a very important part of the story.
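A response-aware counter can capture both numbers at once. The sketch below is illustrative, not NPR’s actual implementation; it records requests alongside the items each response delivered, so the stories-per-request ratio can be reported:

```python
class ApiMetrics:
    """Track requests and stories delivered so the ratio can be reported."""

    def __init__(self):
        self.requests = 0
        self.stories_delivered = 0

    def record_response(self, stories):
        """Call once per API response with the list of items it contained."""
        self.requests += 1
        self.stories_delivered += len(stories)

    def stories_per_request(self):
        return self.stories_delivered / self.requests if self.requests else 0.0

m = ApiMetrics()
m.record_response(["story"] * 25)  # an RSS-style response carrying 25 items
m.record_response(["story"] * 11)
print(m.stories_per_request())  # 18.0
```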

Impression
Impressions are the first point in the metrics calculation where actual consumption is captured. By an impression, I mean a page view (or equivalent) where an end-user experiences the content that was delivered by the API. Generally, the way this metric gets captured in APIs is by putting an image beacon in a piece of the content. The beacon renders from the API’s server when it gets presented by the calling app, providing you with information about the content and its consumption every time it is viewed.
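As a rough illustration of the beacon technique, the snippet below builds a 1x1 image tag pointing at a hypothetical metrics endpoint (the host and parameter names are assumptions, not NPR’s). If a calling application renders this tag along with the story, the metrics server logs one impression:

```python
from urllib.parse import urlencode

# Hypothetical beacon endpoint; a real deployment would point at the
# API provider's own metrics server.
BEACON_HOST = "https://metrics.example.org/b.gif"

def beacon_tag(story_id, partner_id):
    """Return a 1x1 image tag that phones home when the story is rendered."""
    qs = urlencode({"story": story_id, "partner": partner_id})
    return f'<img src="{BEACON_HOST}?{qs}" width="1" height="1" alt="" />'

print(beacon_tag("128234708", "iphone-app"))
```

The beacon only fires if the consuming application actually renders the field it lives in – which is precisely the tracking challenge discussed later in this post.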

This is a very important metric because it is the impression that measures the number of eyeballs that see the content delivered by the API. For example, some requests never result in an impression because the calling application never presents the returned content to a user. Conversely, the calling application may cache a response and present it to users multiple times for that one request. Moreover, because a single request could return multiple items in a response, depending on how the requesting application handles it, there could be many impressions for that single request. As a result, the impression numbers could be substantially higher or lower than the request and/or response totals, depending on how the calling application interacts with the API. Because advertising revenues for many content APIs are dependent on actual consumption numbers (and not server traffic), the impression metric is much more important than the request or response totals.

The above image demonstrates how the NPR News iPhone App accesses the NPR API. In our app, a single API request is made to present the screen on the left. In that request, 25 stories get returned. Each of those stories contains the full story content, including images, audio and full text. The list view of all 25 stories garners a single page view. Clicking through to any one of those stories results in the screen on the right, which is the full story page. The full story page garners another page view even though the iPhone app does not make another API request for it. In fact, if I launched the app, went to the Science page and looked at every story page from that list, it would result in 26 page views, all stemming from a single API request.

Loyalty
Once an impression is realized by the API, the next step is to create some relationships and loyalty around your content. After the user consumes a piece of content, did they carry on to another piece? Or do they have trackable sessions in the system already, perhaps from a different platform (whether delivered from the API or not)? There are several ways to try to make these relationships, but this is quite challenging and NPR is in the very early stages of trying to handle this. Our approach so far has been to use impression-related data mixed with query string parameters and session-related data (such as cookies).

A tangible example of this is if the content that gets delivered contains an audio or video asset. Generating an impression on the story is the first step. If the user then clicks on the audio, that click-through should also be attributed to the session attached to the API impression by passing tracking information to the audio URL so the audio piece can be related to the page view. By creating opportunities for the API content to create serendipitous experiences with other content, you are building a strong, more sellable content API.
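A sketch of that attribution step, using hypothetical parameter names: the impression identifier is appended to the media URL, so a later request for the audio can be joined back to the page view that produced it:

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def tag_media_url(url, impression_id):
    """Append an impression identifier to a media URL so the click-through
    can be related back to the page view that produced it."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["imp"] = impression_id  # hypothetical tracking parameter
    return urlunparse(parts._replace(query=urlencode(query)))

print(tag_media_url("https://media.example.org/story.mp3", "abc123"))
# https://media.example.org/story.mp3?imp=abc123
```

On the server side, audio requests carrying the same `imp` value as an earlier beacon hit can then be attributed to that impression’s session.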

As I mentioned before, capturing data for each of these metrics offers unique challenges. For example, to improve performance on our APIs, NPR uses a suite of caching layers. Moreover, the API has a lot of rights exclusion algorithms and transformations. As a result, it is increasingly difficult to ensure successful tracking of all of the metrics for all of the requests. Tracking impressions from APIs offers unique challenges since much of the content is getting distributed in XML, JSON or something comparable. How do you put a tracking beacon in the content? In which field should it go and how can you be sure that the calling application will consume that field? If you put it in multiple fields to ensure consumption, how do you prevent duplicate hits for a single page view? Finally, assuming that you are successful in accurately tracking metrics for each of the above, how do you convert them into a compelling story, one that offers value to the business?
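One illustrative answer to the duplicate-hit question is to stamp each response with a per-impression identifier and de-duplicate beacon hits on the server. This is a sketch under that assumption, not NPR’s actual solution:

```python
def dedupe_hits(hits):
    """Collapse beacon hits that share an impression ID, so a beacon placed
    in several content fields still counts as a single page view."""
    seen = set()
    unique = []
    for hit in hits:
        imp = hit["imp"]  # hypothetical per-response impression identifier
        if imp not in seen:
            seen.add(imp)
            unique.append(hit)
    return unique

hits = [
    {"imp": "a1", "field": "teaser"},
    {"imp": "a1", "field": "fullText"},  # same page view, second field
    {"imp": "b2", "field": "teaser"},
]
print(len(dedupe_hits(hits)))  # 2
```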

I do not want to imply that NPR has solved all of these problems. Rather, we have built systems that help us capture information about all phases of these metrics. These systems are not bullet-proof. They do, however, give us more data about content consumption from the API than merely request-based metrics, allowing us to learn more about how the API helps us achieve our greater goal: to increase the total number of eyeballs on our content.


2 Responses to “Metrics for Content APIs: An NPR Case Study”

September 15th, 2010
at 8:28 am
Comment by: John S. Erickson, Ph.D.

Great post, Daniel!

I’m particularly interested in two of the issues you raised towards the end concerning the challenges of tracking, including:

1. Caching layers
2. Rights exclusion algorithms and transformations

You note that these make it “difficult to ensure successful tracking of all of the metrics for all of the requests…”

I wonder if you could be more specific, and/or give an example? Thanks!

John

September 19th, 2010
at 10:21 pm
Comment by: Daniel Jacobson

John,
Thanks for the comment. Here are a few examples that demonstrate some of the difficulty…

If the system has multiple tiers of caching, you need to make sure that the requests, responses and impressions all get captured. It may not be enough to capture requests at the Apache level, because Apache may never see a request if a caching tier sits in front of it.

Similarly, in the NPR system, we produce responses based on the initial query and then cache the results. After that caching, we run the response through an exclusion/transformation engine that can eliminate items based on rights. Those results then get cached again. Because the two sets of cached results can differ, it is important to know which one accurately represents what was delivered to the user, which becomes difficult as the caching layers and exclusion rules grow more complicated.
