It’s Social Data Week, and I spent Monday at DataSift’s San Francisco conference. Like Big Boulder (which is produced by Gnip and is now entering its third year), Social Data Week is focused on the emerging dialogue around social data, its stakeholders, challenges, opportunities, use cases, best practices and, most critically, its emerging ecosystem.
To some degree, these recent conversations around social data remind me of food. (Stay with me; I have a point.) It’s hard to throw a rock in San Francisco these days without hitting a restaurant whose menu gives as much attention to its sources (Dirty Girl tomatoes, Star Route arugula, Point Reyes blue cheese) as it does to its preparation. And it’s in response to customer demand; today, many of us want to know where our food comes from, what’s in it, and, as importantly, what isn’t.
For business, the provenance of social data is becoming critically important because social media has proliferated across the enterprise.
Consider this. Social networks (Facebook, Twitter, Tumblr, LinkedIn, Pinterest, etc.) collectively generate billions of interactions every day. The same goes for social software platforms such as Lithium and Jive that cater to specific customer or community groups, and for enterprise collaboration platforms such as Chatter, Yammer and Socialcast. The data generated can be a post, a tweet, a share, a like or a comment. Some is structured, some is not. It’s an enormous data set, and it’s being created almost entirely outside the organization’s walls and beyond its control.
Then there are social applications such as listening (Salesforce/Radian6, NetBase, Sysomos), social media management (Spredfast, Hootsuite, Sprinklr) and publishing platforms (Salesforce/Buddy Media, OfferPop, Wildfire/Google), which use that data to provide specific capabilities. And, as we saw in Social Data Intelligence, enterprise applications such as CRM, business intelligence, market research and others are endeavoring to integrate this social data to deliver better customer experience and make better-informed decisions.
Here’s a very simple (and by no means exhaustive) representation of how the nascent ecosystem around social data is shaping up:
Now that social data is becoming business-critical (hundreds of case studies by Altimeter and others illustrate this point), it must become enterprise-class.
This means that business people who rely on social data need to take a page from the sustainability movement and–for the health of their organization–make the effort to understand where their data comes from, and what that means for the quality of the downstream decisions it will inevitably inform.
Social Data Sources
This is a very basic summary of the various sources of social data. Note that here I’m talking only about the sources themselves, not the tools that use social data, such as listening, publishing, engagement or analytics platforms.
|Directly from the social network, via public API||API stands for “Application Programming Interface.” In the simplest terms, an API is a set of rules that governs the way software programs communicate with each other. In the social data world, this is important because it standardizes the way applications access data from social networks.||For example, Twitter has a public API that enables developers to access approximately one percent of the Twitter “firehose” (every single tweet, delivered in or near real time). Facebook recently opened its public API to a small set of initial partners, but it is not widely available. Pinterest does not yet have a public API.|
|Directly from the social network, via full firehose access||Full firehose access directly from a social network delivers every single social post and action created on that platform.||Not every social network offers full firehose access, and those that do acknowledge that it is rare and costly. For example, Twitter provides full firehose access to a limited group of partners, but cautions developers that it is hard to come by and quite expensive. For privacy reasons, the same would not be possible with Facebook (at least for data that is private or falls under the category of personally identifiable information, aka PII), so the samples are by nature quite different.|
|Via a social data platform or provider||Key players: DataSift, Gnip, Topsy Labs. These companies resell access to data from social networks. They support different sets of social networks, are built on different technologies and provide varying types and levels of data access.||The most important distinction from public API and firehose access, however, is that they provide a level of consistency, as well as value-added services such as filtering, URL expansion and access to historical data.|
|Data/Screen Scraping||A technique used to extract data from websites.||Prone to error, not to mention potential ethical and legal repercussions. See this useful article in ReadWriteWeb and this useful article in the New York Law Journal for a fuller explanation of the tradeoffs and potential ramifications of data scraping.|
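For readers who handle this data directly, here is a minimal Python sketch of one way to carry provenance along with each record, so that a downstream report can state where its data came from. Everything in it is an assumption made for illustration: the class names, field names and sample records are hypothetical and not taken from any network’s or vendor’s API, and the caveat text simply paraphrases the table above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    """The four source categories from the table above (names are hypothetical)."""
    PUBLIC_API = "public API"
    FIREHOSE = "full firehose"
    DATA_PROVIDER = "social data platform or provider"
    SCRAPING = "data/screen scraping"


# Caveats paraphrased from the table; not an authoritative assessment.
CAVEATS = {
    SourceType.PUBLIC_API: "may be only a sample of all activity",
    SourceType.FIREHOSE: "complete, but rare and costly to obtain",
    SourceType.DATA_PROVIDER: "resold access, often filtered or enriched",
    SourceType.SCRAPING: "prone to error; possible legal and ethical issues",
}


@dataclass(frozen=True)
class SocialRecord:
    """A single social interaction, tagged with how it was obtained."""
    network: str
    text: str
    source: SourceType


def provenance_summary(records):
    """Count records per source type."""
    return Counter(record.source for record in records)


# Invented sample records, for illustration only.
records = [
    SocialRecord("Twitter", "Loving the new menu!", SourceType.PUBLIC_API),
    SocialRecord("Twitter", "Service was slow today.", SourceType.PUBLIC_API),
    SocialRecord("Tumblr", "Photo post about lunch.", SourceType.DATA_PROVIDER),
]

summary = provenance_summary(records)
for source, count in summary.items():
    print(f"{source.value}: {count} record(s); note: {CAVEATS[source]}")
```

The point of the design is simply that the source label travels with the data, so anyone reading an analysis built on these records can see how much of it came from a sample, a reseller or a scraper.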
The ecosystem around social data is just starting to take shape, and I’m happy to see it emerge alongside a substantive industry conversation (with application developers, social networks, social data providers, data scientists, end users, academics, analysts and others) about this critical business asset.
I’ll continue to think and write about the evolving social data landscape, so please feel free to comment, disagree, or add any additional perspective you have. I’ll also link to substantive discussions below, as I usually do.