The Emerging Social Data Ecosystem

It’s Social Data Week, and I spent Monday at DataSift’s San Francisco conference. Like Big Boulder (which is produced by Gnip and is now entering its third year), Social Data Week is focused on the emerging dialogue around social data, its stakeholders, challenges, opportunities, use cases, best practices and, most critically, its emerging ecosystem.

To some degree, these recent conversations around social data remind me of food. (Stay with me; I have a point.) It’s hard to throw a rock in San Francisco these days without hitting a restaurant whose menu gives as much attention to its sources (Dirty Girl tomatoes, Star Route arugula, Point Reyes blue cheese) as it does to its preparation. And it’s in response to customer demand; today, many of us want to know where our food comes from, what’s in it, and, as importantly, what isn’t.

For business, the provenance of social data is becoming critically important because social media has proliferated across the enterprise.

Consider this. Social networks (Facebook, Twitter, Tumblr, LinkedIn, Pinterest, etc.) collectively generate billions of interactions every day. The same goes for social software platforms such as Lithium and Jive that cater to specific customer or community groups, and for enterprise collaboration platforms such as Chatter, Yammer and Socialcast. The data generated can be a post, a tweet, a share, a like or a comment. Some of it is structured; some is not. It’s an enormous data set, and it’s being created almost entirely outside the organization’s walls and control.

Then there are social applications such as listening platforms (Salesforce/Radian6, NetBase, Sysomos), social media management platforms (Spredfast, Hootsuite, Sprinklr) and publishing platforms (Salesforce/Buddy Media, OfferPop, Wildfire/Google), which use that data to provide specific capabilities. And, as we saw in Social Data Intelligence, enterprise applications such as CRM, business intelligence, market research and others are endeavoring to integrate this social data to deliver better customer experiences and make better-informed decisions.

Here’s a very simple (and by no means exhaustive) representation of how the nascent ecosystem around social data is shaping up:

[Figure: a simple representation of the emerging social data ecosystem]

Now that social data is becoming business-critical (hundreds of case studies by Altimeter and others illustrate this point), it must become enterprise-class.

This means that business people who rely on social data need to take a page from the sustainability movement and–for the health of their organization–make the effort to understand where their data comes from, and what that means for the quality of the downstream decisions it will inevitably inform.

Social Data Sources

This is a very basic summary of the various sources of social data. Note that here I’m talking only about the sources, rather than the tools that use social data, such as listening, publishing, engagement or analytics platforms.

Directly from the social network, via public API

Description: API stands for “Application Programming Interface.” In the simplest terms, an API is a set of rules that govern the way software programs communicate with each other. In the social data world, this is important because it standardizes the way applications access data from social networks. For example, Twitter has a public API that enables developers to access approximately one percent of the Twitter “firehose” (every single tweet, delivered at or near real time).

Facebook recently opened its public API to a small set of initial partners, but it is not widely available.

Pinterest does not yet have a public API.

Considerations:
  • Not all social networks provide public API access, and the amount and type of data available varies among social networks.
  • The complexity of managing multiple data sources makes scalability a challenge.
  • Limited data access may distort findings. For example, because the public Twitter API typically delivers about 1% of the full firehose, niche or B2B brands may not be able to detect enough conversation volume to make informed decisions.
  • Rules around API use change, which can make it challenging for developers to build and maintain apps that use that data.

Directly from the social network, via full firehose access

Description: Full firehose access directly from a social network delivers every single social post and action created on that platform.

Considerations:
  • Not every social network offers full firehose access, and those that do acknowledge that it is rare and costly. For example, Twitter provides full firehose access to a limited group of partners, but cautions developers that it is hard to come by and quite expensive.
  • For privacy reasons, the same would not be possible with Facebook (at least for data that is private or falls under the category of personally identifiable information, aka PII), so the samples are by nature quite different.

Via a social data platform or provider

Key players: DataSift, Gnip, Topsy Labs.

Description: These companies resell access to data from social networks. They support different sets of social networks, are built on different technologies and provide varying types and levels of data access. The most important distinction from public API and firehose access, however, is that they provide a level of consistency, as well as value-added services such as filtering, URL expansion and access to historical data.

Considerations:
  • Data quality and consistency
  • Single source of social data and standard formats reduce complexity
  • Breadth (social networks supported) and depth (public API versus firehose) of data sources. For example, Topsy is Twitter-only.
  • Type of data access provided: percentage or keyword-based, or both
  • Availability of historical as well as real-time data to enable time-based comparisons
  • Availability and fit of value-added and professional services
  • Ease of purchasing and working with the vendor versus with individual social networks
  • Cost and pricing model

Data/screen scraping

Description: A technique used to extract data from websites.

Considerations:
  • Prone to error, not to mention potential ethical and legal repercussions. See this useful article in ReadWriteWeb and this one in the New York Law Journal for a fuller explanation of the tradeoffs and potential ramifications of data scraping.
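To put the 1% figure from the public API considerations in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, as a simplification, that the public sample behaves like an independent, uniform 1% draw from the firehose; the daily mention volumes for the two brands are hypothetical and chosen only for illustration.

```python
import math

# Assumption: the public sample behaves like an independent, uniform
# 1% draw from the full firehose (a simplification for illustration).
SAMPLE_RATE = 0.01


def expected_sampled_mentions(firehose_mentions: int, rate: float = SAMPLE_RATE) -> float:
    """Mean number of a brand's mentions that land in the sample."""
    return firehose_mentions * rate


def sampled_std_dev(firehose_mentions: int, rate: float = SAMPLE_RATE) -> float:
    """Binomial standard deviation of the sampled mention count."""
    return math.sqrt(firehose_mentions * rate * (1 - rate))


def prob_no_sampled_mentions(firehose_mentions: int, rate: float = SAMPLE_RATE) -> float:
    """Probability that none of the brand's mentions appear in the sample."""
    return (1 - rate) ** firehose_mentions


# Hypothetical daily mention volumes on the full firehose.
brands = {"large_consumer_brand": 200_000, "niche_b2b_brand": 300}

for name, daily in brands.items():
    mean = expected_sampled_mentions(daily)
    spread = sampled_std_dev(daily) / mean  # day-to-day noise relative to the mean
    p_zero = prob_no_sampled_mentions(daily)
    print(f"{name}: ~{mean:.0f} sampled mentions/day, "
          f"relative spread {spread:.0%}, P(zero mentions) = {p_zero:.3f}")
```

Under these assumptions, the hypothetical large brand’s sampled count varies by only about 2% from day to day, while the niche brand’s varies by more than half of its expected value and comes up empty on roughly one day in twenty, which is the kind of distortion the public API considerations describe.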

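The “percentage or keyword-based” distinction in the provider considerations above can be illustrated as two different filters over the same stream of posts. This Python sketch is hypothetical throughout: the `Post` record, its fields and the sample posts are invented for illustration and do not reflect any provider’s actual API or data format. Modeling percentage access by hashing post IDs into 100 buckets is one common way to draw a repeatable sample, not a description of how DataSift, Gnip or Topsy implement it.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical normalized post record; real provider formats differ."""
    post_id: str
    network: str
    text: str


def in_percentage_sample(post: Post, percent: int) -> bool:
    """Percentage-based access: keep a post if its ID hashes into the first
    `percent` of 100 buckets. Hashing instead of random choice makes the
    sample repeatable: the same post is always in or always out."""
    digest = hashlib.sha256(post.post_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def matches_keywords(post: Post, keywords: set[str]) -> bool:
    """Keyword-based access: keep a post if any keyword appears as a whole
    word in its text (case-insensitive, surrounding punctuation ignored)."""
    words = {w.strip(".,!?;:\"'").lower() for w in post.text.split()}
    return any(k.lower() in words for k in keywords)


posts = [
    Post("1", "twitter", "Loving the new Acme widget!"),
    Post("2", "tumblr", "Weekend photos from the coast"),
    Post("3", "twitter", "acme support was slow today"),
]

# Keyword-based: every post in the stream that mentions the keyword.
keyword_hits = [p.post_id for p in posts if matches_keywords(p, {"Acme"})]
# Percentage-based: a fixed slice of all posts, regardless of topic.
sampled = [p.post_id for p in posts if in_percentage_sample(p, 10)]
print(keyword_hits, sampled)
```

The trade-off shows up directly: a keyword filter returns every post that mentions the brand but nothing else, while a percentage sample returns a representative slice of all conversation in which any one brand’s mentions may be sparse.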
The ecosystem around social data is just starting to take shape, and I’m happy to see it emerging, along with an important industry conversation (among application developers, social networks, social data providers, data scientists, end users, academics, analysts and others) about this critical business asset.

I’ll continue to think and write about the evolving social data landscape, so please feel free to comment, disagree, or add any additional perspective you have. I’ll also link to substantive discussions below, as I usually do.

About susanetlinger

Industry Analyst at Altimeter Group
This entry was posted in Analytics, Big Data, Facebook, Social Analytics, Social Data, Social media measurement, Twitter, Uncategorized.

18 Responses to The Emerging Social Data Ecosystem


  2. Hi Susan, to me this conversation feels like old news rather than an emerging one 🙂 Back in February last year, I wrote a post about data quality, specifically in relation to Radian6, but the points are applicable to the broader space.


    • Hi Matt, I’m not surprised you’d say that, given that you work in analytics. But this conversation is just getting started among business people in the enterprise, and there’s a lot of work to do to clarify the basics of what social data is, where it comes from and how organizations can use it for multiple purposes.


      • shannon says:

        Absolutely! This is a well organized piece to share with those whose sole job is not necessarily Social media or online engagement.



  4. Rafa Ramirez says:

    Susan, your clear explanation gives me a broad view of what’s brewing inside the “firehose”: a paramount potential that may turn to ashes if it’s not selected properly. The “impulse” decision behind every tweet is the key. Is it made as a plain “impulse” or “reaction,” or is it made as a “conscious” impulse? Is there any way to know and unveil the disguise?


