From Shopping Carts to Poisoned Names, Every Data Point Tells a Story

Every so often, I’d like to profile someone who’s doing interesting things with data. Meet Hilary Parker of Etsy (yes, that’s her in the photo).

While at Strata & Hadoop World last week, I had the chance to attend Ignite, a pecha-kucha-like event in which speakers present one idea, on twenty slides, in five minutes (no pressure). One of my favorites was by Hilary Parker, a data analyst at Etsy and Ph.D. in biostatistics who spends her days trying to understand how people use Etsy, guiding experiments, consulting with development teams and generating new hypotheses for further investigation.

One example of the types of questions Parker tries to answer is whether new features are performing as expected (do they increase conversions?) or whether they are causing other, unanticipated outcomes. The goal, essentially, is to get at the root of the user’s behavior: how she’s interacting with the website, and whether that’s different from what the team expected. It requires an open mind and a lot of curiosity, perseverance and attention to detail, not to mention some serious statistical modeling skills.

For example, one of the metrics that ecommerce companies like to measure is average shopping cart value, compared to average order value. If your shopping cart value (let’s say $250) consistently exceeds your actual order value (let’s say $75), that means that items are being added to carts but are being removed before checkout. Why would that be?

One possible reason, Parker posits, could be that people are adding items to the cart to bookmark them for later viewing. Another one (and one that I am personally guilty of) is inadvertently adding the same item multiple times. Either way, the end result should be a user interface change: perhaps a way to bookmark items or, in my case, an alert that I am about to spend the equivalent of the national debt on a standing army of home appliances.
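To make the cart-versus-order comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the session structure, the item names and the prices are made up, and this is not how Etsy measures it. It computes average cart value and average order value, and flags items that were added to a cart more than once.

```python
from collections import Counter
from statistics import mean

# Hypothetical sessions: items added to the cart and items actually ordered.
# Each item is (sku, price). All values are invented for illustration.
sessions = [
    {"cart": [("lamp", 40.0), ("lamp", 40.0), ("rug", 120.0)], "order": [("lamp", 40.0)]},
    {"cart": [("mug", 15.0), ("print", 60.0)], "order": [("mug", 15.0), ("print", 60.0)]},
    {"cart": [("quilt", 200.0), ("vase", 50.0)], "order": [("vase", 50.0)]},
]


def total(items):
    """Sum the prices of a list of (sku, price) pairs."""
    return sum(price for _, price in items)


def cart_vs_order(sessions):
    """Return (average cart value, average order value) across sessions."""
    avg_cart = mean(total(s["cart"]) for s in sessions)
    avg_order = mean(total(s["order"]) for s in sessions)
    return avg_cart, avg_order


def duplicate_adds(session):
    """List the SKUs that appear more than once in a session's cart."""
    counts = Counter(sku for sku, _ in session["cart"])
    return sorted(sku for sku, n in counts.items() if n > 1)


avg_cart, avg_order = cart_vs_order(sessions)
print(f"average cart: {avg_cart:.2f}, average order: {avg_order:.2f}")
print("duplicates in first session:", duplicate_adds(sessions[0]))
```

A large gap between the two averages only says that something is happening between cart and checkout. Separating "bookmarking" from "accidental duplicates" takes the second kind of check, or a closer look at individual sessions.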

Hilary’s talk at Strata, intriguingly entitled “Hilary: The Most Poisoned Baby Name in US History,” documents her investigation into the popularity (or extreme lack thereof) of her given name.  As a seven-year-old in 1992, she suddenly found herself being teased by other children, who called her “Hillary Clinton.”  Later, in college, she Googled her name and came across a blog post that said that Hilary was the most poisoned baby name, meaning that it had been severely undermined by the unpopularity of the then First Lady.

So she got curious. Earlier this year, Parker decided to perform her own analysis using data from the Social Security Administration, which initially revealed that Hilary was, in fact, only the 6th most poisoned baby name. So I asked Hilary what made her suspicious that this wasn’t telling the whole story. Her answer: “the names, for one. It was a somewhat peculiar list.”

To wit: numbers one through five were, in order, Farrah, Dewey, Catina, Deneen and Khadija. So she decided to graph the data to see what was going on. Once she could visualize the data, she says, “I saw a crazy pattern. I started Googling the names and seeing why they were popular.” You can probably guess why and when Farrah became popular. Khadija was a no-brainer for me, as I clearly remember Queen Latifah in that role on the sitcom “Living Single” from 1993-1998. The rest you’ll have to read for yourself on Hilary’s blog.

But, says Parker, “‘Hilary’…was clearly different than these flash-in-the-pan names. The name was growing in popularity (albeit not monotonically) for years.” So she decided to re-run the analysis using only names that were in the top 1000 for more than 20 years, and updated the graph accordingly. Here’s what she found:


So, says Parker, “I can confidently say that, defining “poisoning” as the relative loss of popularity in a single year and controlling for fad names, “Hilary” is absolutely the most poisoned woman’s name in recorded history in the US.”
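For readers who want to see the shape of that definition, here is a rough Python sketch. It is not Parker's code and does not use her data: the names and numbers below are synthetic, and treating "years present in the data" as "years in the top 1000" is my simplifying assumption. It computes the largest single-year relative loss for each name and drops names with 20 or fewer years of history.

```python
def relative_loss(shares):
    """Return (year, loss) for the largest single-year relative drop.

    `shares` maps year -> popularity (here, births per 100,000).
    loss = (previous - current) / previous, for consecutive years only.
    """
    worst_year, worst_loss = None, 0.0
    years = sorted(shares)
    for prev, curr in zip(years, years[1:]):
        if curr != prev + 1 or shares[prev] == 0:
            continue  # skip gaps in the record and division by zero
        loss = (shares[prev] - shares[curr]) / shares[prev]
        if loss > worst_loss:
            worst_year, worst_loss = curr, loss
    return worst_year, worst_loss


def most_poisoned(data, min_years=20):
    """Rank names by worst single-year loss, skipping short-lived (fad) names."""
    ranked = []
    for name, shares in data.items():
        if len(shares) <= min_years:  # keep only names with MORE than min_years
            continue
        year, loss = relative_loss(shares)
        ranked.append((name, year, loss))
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Synthetic data, NOT Social Security Administration figures.
steady = {year: 100 for year in range(1970, 1995)}  # 25 years in the rankings
steady[1993] = 40                                   # one sudden drop
slow = {year: 100 - (year - 1970) for year in range(1970, 1995)}  # gradual decline
fad = dict(zip(range(1976, 1981), [10, 200, 150, 20, 5]))         # 5-year flash

data = {"Steady": steady, "Slow": slow, "Fad": fad}

for name, year, loss in most_poisoned(data):
    print(f"{name}: worst year {year}, relative loss {loss:.0%}")
```

Calling `most_poisoned(data, min_years=0)` puts the fad name back on top. That is the same effect Parker saw in her first pass, before she controlled for flash-in-the-pan names.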

I love this experiment because it shows the value of following hunches. It also shows the beauty of visualization for large data sets.  As Parker says, “statistics is as much an art as it is a science.”

You can read her full analysis here.

Her slides from Ignite are here.

You can ask her about the roller derby photo yourself.

Find her at

Posted in Analytics, Data Science, Research, Uncategorized | 1 Comment

Social Data Intelligence: Survey Says

Back in July, when we published Social Data Intelligence, we were curious to discover how organizations would rank themselves using the criteria in our maturity map. How many companies are in the “ad-hoc” stage? How many consider themselves to be “formalized”? Who’s integrating social data with enterprise data? And has anyone reached the nirvana of the “holistic” category?

Several of my colleagues just completed Altimeter Group’s Digital Strategy Survey for Q313, and there are some interesting findings. The one I’d like to speak to today is in reference to social data maturity, because even as a self-reported finding it gives us some insight into organizations’ progress and aspirations when it comes to social data.

As a refresher, here’s the maturity map from Social Data Intelligence:

[Figure: Maturity Map]

No Judgment

I do want to emphasize that this is a “no judgment” maturity model: as I’ve said many times, the path to social data maturity is complex and rife with organizational and technical challenges. Each of these stages has value, from organizational learning in the first, to rigor about business outcomes in the second, to a more organization-wide view in the third, to scale in the fourth. They all have value and they all contribute–even if that contribution is hard-won–to organizational transformation around data.

Survey Says…

So, now that that’s out of the way, how did our survey respondents stack up?

[Chart: survey respondents by social data maturity stage]

No big surprises here: the majority of companies we surveyed fall into the “ad-hoc” category, 29 percent into “formalized,” 11 percent into “integrated,” and five percent into “holistic.” To be honest, I want to drill into the self-reporting at the holistic stage, simply because the tools to facilitate scale (the key criterion) are still quite nascent. But that’s less important than the fact that, yes, we’re mostly learning how to do this and operationalize it–from a business, process and technical standpoint.

When I look at the chart above, I see two things coming up fast:

  1. A wall of blue water. Call it what you will: blue water, green field, but I’m speaking to companies daily–Ekho and Informatica most recently–who are tackling this integration challenge in different ways, seeking to facilitate the integration of social and other enterprise data and, most salient to business people, take a lot of the manual labor and interpretive squish out of the process. From a market perspective, expect more middleware players to articulate how they can become force multipliers in the social and big data universe.
  2. Social data makes strange bedfellows. I’ve worked in marketing organizations and I’ve worked in IT organizations, and I can tell you this: these communication challenges are nothing new. But now more than ever, IT and marketing need to find a common language to instill technical rigor into business planning, and business context into technology planning. IMO, there is no other option as social data, and other big data types, take up residence in enterprise organizations. We’ve heard a lot about the “consumerization of IT.” It works both ways: technology is driving business strategy too, and it has to be this way because of the complexity of the problems that need to be solved. There is no magic dashboard.

So this is why big data is so–that word again–disruptive. It really is changing organizational processes and decision-making and culture. The challenge, with apologies to Jimmie Dale Gilmore, is to decide whether you’re just the wave, or you’re the water.

Thanks to Jess Groopman, Christine Tran and the Altimeter team for fielding the research for the Digital Buyer Survey.

Posted in Analytics, Big Data, Predictive Analytics, Real-Time Enterprise, Social Analytics, Social Data, Social media, Uncategorized | 2 Comments

The Emerging Social Data Ecosystem

It’s Social Data Week, and I spent Monday at DataSift’s San Francisco conference. Like Big Boulder (which is produced by Gnip and is now entering its third year), Social Data Week is focused on the emerging dialogue around social data, its stakeholders, challenges, opportunities, use cases, best practices and, most critically, its emerging ecosystem.

To some degree, these recent conversations around social data remind me of food. (Stay with me; I have a point.) It’s hard to throw a rock in San Francisco these days without hitting a restaurant whose menu gives as much attention to its sources (Dirty Girl tomatoes, Star Route arugula, Point Reyes blue cheese) as it does to its preparation. And it’s in response to customer demand; today, many of us want to know where our food comes from, what’s in it, and, as importantly, what isn’t.

For business, the provenance of social data is becoming critically important because social media has proliferated across the enterprise.

Consider this. Social networks (Facebook, Twitter, Tumblr, LinkedIn, Pinterest, etc.) collectively generate billions of interactions every day. The same goes for social software platforms such as Lithium and Jive that cater to specific customer or community groups. This is also true of enterprise collaboration platforms such as Chatter, Yammer and Socialcast. The data generated can be a post, a tweet, a share, a like, a comment. Some is structured, some is not. It’s an enormous data set and it’s being created almost entirely outside the organization’s walls–and control.

Then there are social applications such as listening (Salesforce/Radian6, NetBase, Sysomos), social media management (Spredfast, Hootsuite, Sprinklr) and publishing platforms (Salesforce/Buddy Media, OfferPop, Wildfire/Google), which use that data to provide specific capabilities. And–as we saw in Social Data Intelligence–enterprise applications such as CRM, business intelligence, market research and others are endeavoring to integrate this social data to deliver better customer experience and make better-informed decisions.

Here’s a very simple (and by no means exhaustive) representation of how the nascent ecosystem around social data is shaping up:

[Figure: the emerging social data ecosystem]

Now that social data is becoming business-critical (hundreds of case studies by Altimeter and others illustrate this point), it must become enterprise-class.

This means that business people who rely on social data need to take a page from the sustainability movement and–for the health of their organization–make the effort to understand where their data comes from, and what that means for the quality of the downstream decisions it will inevitably inform.

Social Data Sources

This is a very basic summary of the various sources of social data. Note that here I’m talking only about the sources versus the tools that use social data, such as listening, publishing, engagement or analytics platforms.

Source: Directly from the social network, via public API

Description: API stands for “Application Programming Interface.” In the simplest terms, an API is a set of rules that govern the way software programs communicate with each other. In the social data world, this is important because it standardizes the way applications access data from social networks. For example, Twitter has a public API that enables developers to access approximately one percent of the Twitter “firehose” (every single tweet, delivered at or near real-time).

Facebook recently opened its public API to a small set of initial partners, but it is not widely available.

Pinterest does not as yet have a public API.

Considerations:

  • Not all social networks provide public API access, and the amount and type of data available varies among social networks.
  • Complexity of managing multiple data sources makes scalability a challenge.
  • Limited data access may distort findings. For example, because the public Twitter API typically includes 1% of the full firehose of data, niche or B2B brands may not be able to detect sufficient conversation volume to make informed decisions.
  • Rules around API use change, which can make it challenging for developers to build and maintain apps that use that data.

Source: Directly from the social network, via full firehose access

Description: Full firehose access directly from a social network delivers every single social post and action created on that platform.

Considerations: Not every social network offers full firehose access, and those that do acknowledge that it is rare and costly. For example, Twitter provides full firehose access to a limited group of partners, but cautions developers that it is hard to come by and quite expensive. For privacy reasons, the same would not be possible with Facebook (at least for data that is private or falls under the category of personally identifiable information, aka PII), so the samples are by nature quite different.

Source: Via a social data platform or provider

Description: Key players: DataSift, Gnip, Topsy Labs.

These companies resell access to data from social networks. They support different sets of social networks, are built on different technologies and provide varying types and levels of data access. The most important distinction from public API and firehose access, however, is that they provide a level of consistency, as well as value-added services such as filtering, URL expansion and access to historical data.

Considerations:

  • Data quality and consistency
  • Single source of social data and standard formats reduce complexity
  • Breadth (social networks supported) and depth (public API versus firehose) of data sources. For example, Topsy is Twitter-only.
  • Type of data access provided: percentage or keyword-based, or both
  • Availability of historical as well as real-time data to enable time-based comparisons
  • Availability and fit of value-added and professional services
  • Ease of purchasing and working with the vendor versus with individual social networks
  • Cost and pricing model

Source: Data/Screen Scraping

Description: A technique used to extract data from websites.

Considerations: Prone to error, not to mention potential ethical and legal repercussions. See this useful article in ReadWriteWeb and this useful article in the New York Law Journal for a fuller explanation of the tradeoffs and potential ramifications of data scraping.
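The point about limited data access distorting findings is easy to put rough numbers on. The sketch below is back-of-the-envelope arithmetic in Python, not a description of how Twitter actually samples. It assumes each post has an independent 1% chance of landing in the sample, and the daily mention counts are hypothetical.

```python
def expected_sampled_mentions(daily_mentions, sample_rate=0.01):
    """Expected number of a brand's posts that show up in the sample."""
    return daily_mentions * sample_rate


def prob_no_mentions(daily_mentions, sample_rate=0.01):
    """Chance that NONE of the brand's posts land in the sample.

    Assumes every post is sampled independently at `sample_rate`,
    which is a simplification made for illustration.
    """
    return (1 - sample_rate) ** daily_mentions


# Hypothetical daily mention volumes for a niche brand and a large one.
for label, mentions in [("niche B2B brand", 50), ("large consumer brand", 5000)]:
    expected = expected_sampled_mentions(mentions)
    p_zero = prob_no_mentions(mentions)
    print(f"{label}: ~{expected:.1f} sampled mentions/day, "
          f"P(no mentions at all) = {p_zero:.3f}")
```

Under these assumptions, a brand with 50 mentions a day would see nothing at all in the sample on roughly six days out of ten, while a brand with thousands of mentions would almost never come up empty. That is the practical meaning of "insufficient conversation volume."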

The ecosystem around social data is just starting to take shape, and I’m happy to see it accompanied by serious industry conversation–with application developers, social networks, social data providers, data scientists, end users, academics, analysts and others–about this critical business asset.

I’ll continue to think and write about the evolving social data landscape, so please feel free to comment, disagree, or add any additional perspective you have. I’ll also link to substantive discussions below, as I usually do.

Posted in Analytics, Big Data, Facebook, Social Analytics, Social Data, Social media measurement, Twitter, Uncategorized | 16 Comments

Facebook Opens its Data API to (Some) Media Partners

Today, Facebook made an announcement that should interest anyone who uses social data. Effective immediately, Facebook will be opening its data API to a select group of media partners. According to today’s blog post, organizations that are part of this initial group will have two ways to gain access to Facebook data:

  1. From the public API, which, per Facebook, includes “only public posts from pages and profiles of those with ‘Follow’ turned on.”
  2. From the keyword API, which aggregates the total number of posts on a specific topic, and provides the ability to display “anonymous, aggregated results based on gender, age, and location.” This will include up to 12 days of historical data at launch.

This means that users would be able to see, for example, that among people who talk about Breaking Bad on Facebook, the majority are female, 35-55 and are based on the West Coast (still traumatized by last night’s episode). If this sounds familiar, Facebook has done this type of thing a few times before; with the Talk Meter around the Oscars, for example. The difference is that this time the company is making the data available to media partners directly.

Initial Implications

Right now, this announcement has limited direct impact beyond initial partners, although the company says it is “beginning discussions with other media partners and preferred marketing developers and will make it available to additional partners in the coming weeks.” As of today, media partners include the following:

  • CNN
  • NBC’s Today Show
  • BSkyB
  • Buzzfeed
  • The Guardian
  • Slate
  • Mass Relevance

While it’s logical that Facebook began this process with media companies (especially given that Twitter has the lead here, having cultivated relationships with news organizations for quite some time), this move will likely put pressure on Facebook to provide public API access to more organizations, and on other social networks (such as Pinterest, which as yet has no public API) to follow suit.

Another implication–one that I discuss in Social Data Intelligence–is that organizations that view social data as business-critical (clearly the media industry has made this leap) must now treat it as a strategic enterprise asset. This leads to a whole ‘nother conversation about where social data comes from, what’s important to know when sourcing it, and implications and caveats galore.  I’ll tackle this topic in more depth in future research.

As always, I welcome your thoughts. Please leave comments, and I’ll link to substantive discussions below.

Posted in Analytics, Big Data, Facebook, Listening, Sentiment Analysis, Social Analytics, Social Graph, Social media, Social media measurement, Twitter, Uncategorized | 2 Comments

Build Credibility by Prioritizing Your Social Data

One of the most frequent challenges of analytics teams, particularly those who handle social data, is the ad-hoc nature of report requests. When a manager or colleague needs a report (a) in the next hour (b) by the end of the day (c) rightthissecond, it can be awkward to explain which metrics are automated and easy to deliver, and which require hours of manual work to pull the relevant data from disparate sources, enter them into an Excel pivot table and analyze the results. It can lead to frustration, crossed signals, inefficient use of resources and even employee turnover, as talented analysts leave for more strategic opportunities.

The most problematic outcome, however, is that the teams who best understand social data may have the least opportunity to share that knowledge. This is a huge risk for organizations, as interpreting social data is becoming a dependency for strategic decision-making.

It’s time for data analysts to claim a leadership role by leading the most important conversations about the value and tradeoffs of social media metrics and facilitating organizational alignment around where to focus first.

In Social Data Intelligence: Integrating Social and Enterprise Data for Competitive Advantage, we included a Metrics Scorecard, pictured above. So here’s my Wednesday gift to you: a Metrics Scorecard worksheet that you can download to help you start the conversation.

This is a sample only; use this worksheet to catalog your existing social data metrics, rate them, and determine which ones your organization is most able to deliver and which require additional criteria to be met before they can be adopted and shared. Keep in mind that sometimes the most challenging metrics to deliver may be most valuable, so it’s important  to balance efficiency and value. Whatever the results, you’ll be initiating a critical conversation for your organization.

Here’s how to use the scorecard.

  1. List the core set of metrics you would like to evaluate.
  2. Score them on a scale of one to five, where one is lowest and five is highest. Ideally, you’ll want to do this in collaboration with key stakeholders, so you get the most representative view possible.
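If you would rather prototype the scoring before opening the worksheet, the two steps above can be sketched in a few lines of Python. This is only an illustration: the metric names and scores are hypothetical, and averaging stakeholder scores is my assumption, not something the worksheet prescribes.

```python
from statistics import mean

# Hypothetical metrics, each scored by three stakeholders (1 = lowest, 5 = highest).
scores = {
    "Share of voice": [4, 5, 3],
    "Sentiment": [2, 3, 2],
    "Response time": [5, 4, 5],
}


def validate(scores):
    """Reject any score outside the one-to-five scale."""
    for metric, values in scores.items():
        for value in values:
            if not 1 <= value <= 5:
                raise ValueError(f"{metric}: score {value} is outside 1-5")


def rank_metrics(scores):
    """Average each metric's scores and sort from highest to lowest."""
    validate(scores)
    averaged = {metric: mean(values) for metric, values in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


for metric, average in rank_metrics(scores):
    print(f"{metric}: {average:.2f}")
```

A simple average hides disagreement, so it is worth looking at the spread of scores as well. A metric that one stakeholder rates a five and another rates a one is exactly the conversation you want to have.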

Download the Altimeter Group Metrics Worksheet. I’m happy to answer questions in the comments, and please let us know how it’s going.

Posted in Altimeter, Analytics, Big Data, Research, Social Analytics, Social media, Social media measurement | Leave a comment

It’s Time To Get Smart About Social Data Intelligence

I spend a lot of time reading and thinking about social data: what it is, what it isn’t, how to measure it, where it’s going. But even the best strategy for collecting, analyzing and interpreting social data is just a set of pretty charts unless it is connected in a meaningful way to business objectives and, as importantly, actual business data.

In the past year, we’ve seen social media proliferate throughout the organization. The average enterprise-class company has 178 owned social media accounts. Thirteen different departments–from Marketing to Customer Support to Legal and HR–are actively engaged in social. At the same time, we’re seeing early momentum toward  integration of social and enterprise data.

  • Customer Relationship Management, Business Intelligence and market research providers are working to bring social into the mix, with varying results.
  • Analysts are viewing dashboards from social platforms side-by-side with other in-house reports to determine whether there are meaningful relationships.
  • Businesses are manually entering social data into their enterprise systems, whether to improve customer experience, brand reputation, financial performance or for other purposes.

They want to understand how signals from the outside–signals that the company didn’t solicit or structure–square with what they’re hearing from their own internal systems. A  real desire is emerging for  social data intelligence, which we define as follows:

Insight derived from social data that organizations can use confidently, at scale,
and in conjunction with other data sources to make strategic decisions. 

For this report, my colleague Jessica Groopman and I spoke with a range of companies–business-to-business, business-to-consumer, technology vendors and agencies–to see what they are doing to bring social data into the enterprise mainstream. While this is still a relatively nascent phenomenon, there are promising results and similarities: Caesars Entertainment, Parasole and Symantec are all intent on using social data to understand and optimize the customer experience, although they’re taking different approaches to doing so.

Through this research, we found that social data has now become business-critical. To derive value and mitigate risk, organizations must treat it as a core enterprise asset. This requires strategy, discipline, governance and executive sponsorship. We also noted that organizations working with social data fall into four distinct stages of maturity, supported by six key dimensions:

[Figure: Maturity Map]

It’s also time to recognize that in developing a strategy for collecting, interpreting and acting on social data intelligence, organizations are taking the first steps toward developing a strategy for what is generally known as “big data.” After all, social data is a type of big data, as it fulfills the criteria in the widely-shared definition by Gartner Group.

Today, organizations have a choice to define a common approach to social data collection, institute processes to share and interpret it, and set clear criteria for action. They’re going to have to decide how proactive they wish to be and how much investment and organizational disruption they are willing to face.

But the downside of not developing a social intelligence strategy and instead taking  a laissez-faire approach is significant: to brand reputation, customer experience, risk avoidance and financial performance.  It’s a pay-me-now or pay-me-later proposition,  with real  advantages for the companies who begin this process sooner rather than later, and thus have the benefit of early learning.

You are welcome to download the full report here:

I’d like to thank Jessica Groopman, senior researcher and sanity-preserver extraordinaire, Charlene Li, a generous and rigorous editor, Susan Wu, who made all of the interviews (and many, many other things) happen with grace and a smile, and my other  colleagues at Altimeter Group who questioned my thinking, provided thoughtful input and ideas and who make me proud and grateful to be their colleague every day. And, of course, many thanks to the companies and individuals who contributed their experiences and ideas throughout this process. I value  you all immensely, and any errors are my own.

In the interest of continuing meaningful discussions of this topic, please feel free to send me the URL to your posts on this report and I will link to them below.


Collaborative Storytelling: A View from Startupfest 2013

Long time no talk, but the good news is that I’ve been heads down on my latest research report, launching soon. Watch this space.

I had the opportunity to speak at Startupfest 2013 this morning on the topic of “Collaborative Storytelling.” Seemingly a little out of my usual wheelhouse, but actually quite relevant: the relationship between storytelling and data is important. The digital world makes our ability to build relationships with our community–whether they’re customers, investors, partners or prospects–all the more critical. We’re not in it for the quick transaction anymore, nor should we be. Everything we as organizations and individuals do or say needs to be relevant and/or useful to what our community is thinking and feeling in our connected, digital world.

So here is the talk. It’s a little more of a balance of art and science than usual, but born out of a real passion for flipping the conversation about customer relationships. Think Galileo: customers don’t revolve around us. We revolve around them. And how do we know how to be relevant? By listening, by collecting data, by seeing what resonates and what doesn’t.

Here are the slides:

Posted in Uncategorized | 1 Comment