TED Talk: What do we do with all this big data?

On September 23, 2014, I had the honor of presenting a talk on Big Data at TED@IBM at the SFJAZZ Center in San Francisco, California, alongside an amazing group of people, including Altimeter’s own Charlene Li.

It was an incredible day; I walked away with so many new ideas and questions about data, society, technology, culture and where we’re headed. Today, the team at TED posted my talk on TED.com. I am beyond honored and grateful, particularly to Juliet Blake and Anna Enerio at TED, and Michela Stribling and Jacqueline Saenz at IBM for coaching me through what I can only call a transformative experience.

This talk comes from my head and my heart, and I hope you enjoy it.




Data Everywhere: Lessons from Big Data in the TV Industry

During the past several years, the television industry has changed dramatically, spurred by device proliferation, changing distribution methods, and the increasing popularity of social media.

Today, TV is everywhere. It’s on your phone, your tablet, your gaming console and someday will be on devices that are yet to be invented. It’s non-linear, time-shifted, multi-screen, and it’s creating new streams of digital data that were unimaginable even a few short years ago.

For our new research report, Data Everywhere: Lessons From Big Data in the Television Industry, Altimeter Group interviewed television brands, technology innovators and industry thought leaders to better understand industry drivers, new consumer behaviors and the data impacts of these shifts.

We looked at how TV is changing, and how new streams of data are transforming the business, from programming and distribution decisions to promotion and ratings:

  • Programming: ideation or validation of a programming decision;
  • Distribution: where to distribute content, whether it is syndicated entertainment or other types of owned media;
  • Promotion: how and where to identify influencers and develop, time, promote, and target content; and
  • Ratings and Performance Evaluation: new and augmented performance insight.

We identified emerging best practices from industry leaders, laid out the data sources that inform their strategies, and incorporated examples from the media and music industries, as well as many from TV.

While TV is a unique industry in many ways, many of the lessons learned about the challenges and opportunities to extract insight and take action are universal:

  • The delicate balance of data and creativity in programming;
  • The role of data in ideating or validating product decisions;
  • The many and complex facets of content strategy;
  • How data can be used to acquire audiences, target ads, map influence and inform many other aspects of marketing strategy; and
  • How we gauge the performance of all of these strategies as they support business objectives.

As with all Altimeter Group research, Data Everywhere: Lessons From Big Data in the Television Industry is available at no cost under Creative Commons. Please feel free to read and share it, and please let us know your reactions, as well as how these lessons apply to your own organization.




Facebook’s “Emotional Contagion” Experiment: Was it Ethical?


Update, June 29: Co-author Adam D. I. Kramer posts a response here.

By now, you’ve probably heard that data scientists at Facebook recently published a study in the Proceedings of the National Academy of Sciences revealing that, in their words, “emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness.” Or, to be blunt, seeing more negative stories on Facebook can make you sad.

Multiple news outlets covered the results, which broke a couple of weeks ago, but in the last day the focus has shifted to the methodology of the study, revealing that:

  • 689,003 Facebook accounts were used for the study
  • The researchers “manipulated the extent to which people…were exposed to emotional expressions in their News Feed.”
  • According to the study, “No text was seen by the researchers. As such, it was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.”

I’m not going to focus too much on the ethics and quality of the science here–others are ably doing that (see links below)–but I do want to speak to the way in which user data was used, and the problematic precedent that sets for the ethical use of social data in general.

In the proposed Code of Ethics that the Big Boulder Initiative has drafted (still open to feedback before we finalize), we laid out four specific mandates for social data use: Privacy; Transparency and Methodology; Education; and Accountability.


Privacy

While the experiment aggregated data such that researchers could not identify individual posts, it breaches users’ expectations about how such data will be used. The Big Boulder Initiative draft Code of Ethics states that, “in addition to honoring explicit privacy settings, organizations should do their best to honor implicit privacy preferences where possible.”

In the section of the Facebook privacy page entitled “Information We Receive and How it is Used,” however, Facebook focuses primarily on the advertising uses of social data, with the exception of a brief bullet point at the end, which states that Facebook may use data:

“…for internal operations, including troubleshooting, data analysis, testing, research and service improvement.”

While the word “research” is there in black and white, there is no description of the nature of any potential research. This raises an important point related to privacy: ethical use should anticipate not only the implicit (downstream) implications of an explicit privacy setting, but a reasonable user’s expectations as well.

Says Sherry Emery, Principal Investigator at the Health Media Collaboratory, who works regularly with social data, “the fact that the researchers justified their use of the data by saying that it complied with the ‘terms of use’ highlights how ineffective–useless, even–the ‘terms of use’ agreement is.”

Transparency and Methodology

The experiment relies on Facebook’s Data Use Policy to argue for transparency, but, says Emery, “It’s one thing to observe and make inferences about human behavior in a ‘naturalistic setting’.  It’s another to manipulate subjects without their knowledge.”

Part of the challenge is that the research study raises questions about the proper use of social data within the social and behavioral sciences. But, while social data is relatively new, social science is not. The National Science Foundation commentary on informed consent provides a clear guideline:

“IRBs [Institutional Review Boards] and researchers should not defeat the purpose of informed consent by substituting a legalistic consent form for an effective communication process.” (Informed Consent in Social and Behavioral Science)

And the study has wider implications. We have to ask how, ultimately, these findings may be used. Does this set a precedent to use Facebook or other data to manipulate individuals’ emotional states for commercial or other purposes via “contagion”? (That term is really not helping, btw).

Whatever our personal standards for ethical use of data in general, the fact remains that social data is new and complex, and it carries with it a slew of implications that we are only just beginning to understand. “If this doesn’t spark a huge debate about data ethics,” Emery says, “I’ll be surprised.  I’ve been waiting, a little bit worried, for public outcry about data science if we didn’t get out ahead of the curve and establish guidelines for ethics in data research.  I think this might be the thing that starts the debate–with a bang.”

Please leave your thoughts here, and contribute to the Code of Ethics; the more specific and evidence-based, the better. I will link to substantive related posts below.

Last updated 8:50 PM June 24


Social Data for Social Good: Five Lessons from Academia

In business, we spend a lot of time trying to understand social data and other forms of big data for strategic advantage: to make business decisions, increase revenue, reduce costs, improve relationships with customers and spur innovation.

But what about other uses of social data—for example, for first responders? Human rights? Cancer prevention? The Health Media Collaboratory at the University of Illinois Institute for Health Research and Policy is focused on understanding social data for the public good. The team, led by Principal Investigator Sherry Emery, includes Deputy Director Glen Szczypka, Social Media Strategist Eman Aly, and others focused on using social data to “positively impact the health behavior of individuals and communities.”

In the broadest sense, the mission of the Health Media Collaboratory (HMC) is to develop and propagate a new paradigm for health media research that takes into account the massive disruption in media consumption that the Internet and social media have enabled. More specifically, they’re looking at how people talk about quitting smoking on Twitter, and what they and the Centers for Disease Control and Prevention (CDC) can learn about how to promote behavior change.

Recently, HMC engaged with the CDC on a project to pose two questions about the impact, if any, of social data on smoking cessation. Their initial research questions were: How much electronic cigarette promotion is there on Twitter? And how much ‘organic’ conversation about electronic cigarettes is there on Twitter? (Short answer: A lot.) Subsequently, they looked at whether Twitter could be used as a tool to evaluate the efficacy of health-oriented campaigns, and discovered that yes, in fact, it can. (I highly recommend you take a look at these studies here.)

I spoke recently with Sherry Emery and Eman Aly on what they’re doing, and what we as an industry can learn from academia about effective and ethical use of social data. Here are five lessons you can start integrating now:

1. Have a relentlessly clear objective 
In business, we talk a lot about ROI and trying to attribute conversion to social posts, but the real objective—behavior change—is fundamentally similar to what Emery and her team are trying to achieve.  Says Emery, “You can’t just skim the top off social data; we really have to dig deep and get immersed and comfortable in the data, then put it into our system of analysis to find ways to support behavioral changes.”

I’d argue that this is an important distinction for anyone working with any kind of data for any objective: do we want awareness? Or are we trying to influence behavior in some way, whether it is to share, sign up for, comment on or buy something? Despite recent advances in conversion attribution, we often have a very incomplete picture of how a specific post (for example, on Facebook) may have influenced a specific action (for example, buying a new washing machine) and, ultimately, future behavior related to the brand. But we can more easily see some of the other indications of awareness, preference and advocacy via social signals.

2. Develop your methodology as if everyone can see it (and will use it)
Even though, as Emery says, “Behavior change is a lot squishier than product purchase,” her team’s research must still stand up to the most rigorous academic standards. Unlike the business world, in which work products and processes are typically only shared internally, the research of the HMC and similar groups is open to the highest levels of scrutiny among academic publications and journals.  “Everything we do has to be replicable,” says Eman Aly. “In the corporate world, that is proprietary information. So we have to make sure that it’s possible.”

3. Know and disclose the limits of your dataset
As anyone who’s worked with social data and other forms of big data knows, there are limits to what is ultimately knowable and provable. Human language is complicated and ever changing. You must measure and disclose the limits of the data. Ask yourself: are the conclusions we’re drawing valid inferences to make based on the quality and reliability of the data?

HMC must be extremely rigorous in the way they analyze metadata, extract relevant tweets and determine the extent to which what they are analyzing is what they want. For example: how much of the conversation about smoking is spam, how much is off-topic (“smoking ribs,” “smoking hot girls”), and how much is relevant (“I’ve really got to quit smoking cigarettes”)?
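The kind of triage described above can be sketched with a few keyword rules. This is purely illustrative: the patterns and category names below are invented for this post, not HMC’s actual methodology, and a production system would need far more sophisticated language handling.

```python
import re

# Hypothetical pattern lists -- illustrative only, not HMC's real rules.
SPAM_PATTERNS = [r"\bbuy now\b", r"\bdiscount\b"]
OFF_TOPIC_PATTERNS = [r"\bsmoking ribs\b", r"\bsmoking hot\b"]
RELEVANT_PATTERNS = [r"\bquit(ting)? smoking\b", r"\bsmoking cigarettes?\b"]

def classify_tweet(text: str) -> str:
    """Bucket a tweet as 'spam', 'off-topic', 'relevant', or 'unclear'."""
    t = text.lower()
    if any(re.search(p, t) for p in SPAM_PATTERNS):
        return "spam"
    if any(re.search(p, t) for p in OFF_TOPIC_PATTERNS):
        return "off-topic"
    if any(re.search(p, t) for p in RELEVANT_PATTERNS):
        return "relevant"
    return "unclear"

print(classify_tweet("I've really got to quit smoking cigarettes"))  # relevant
print(classify_tweet("These smoking ribs are amazing"))              # off-topic
```

Even a toy version like this makes the point: the order of the checks and the coverage of the patterns are editorial decisions, and they shape what ends up in the analysis.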

They also need to know what part of the overall conversation they’re capturing. Do they have information about regions, demographics? And, most fundamentally, are the conclusions they’re drawing valid? All of this must be transparently disclosed.
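As a toy example of that kind of disclosure, the composition of a collected sample can be reported alongside any findings. The labels and counts below are invented for illustration:

```python
from collections import Counter

# Hypothetical labels for a sample of collected tweets (invented numbers,
# not HMC's actual data); the point is to report composition, not hide it.
labels = ["relevant"] * 420 + ["off-topic"] * 310 + ["spam"] * 270

def composition(labels):
    """Return each bucket's share of the total, for transparent disclosure."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(n / total, 3) for label, n in counts.items()}

print(composition(labels))  # e.g. {'relevant': 0.42, 'off-topic': 0.31, 'spam': 0.27}
```

Publishing numbers like these with the research lets readers judge for themselves whether the conclusions rest on a representative slice of the conversation.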


4. Context is everything
Social data doesn’t live in a vacuum. To understand behavior, HMC, like any data science team, must take into account external factors that may contribute to a particular outcome. In the case of cigarette smoking, this would include laws about who can sell cigarettes, how they may be advertised, applicable taxes, ad campaigns sponsored by tobacco companies or the CDC, and other perhaps unknowable factors. “The challenge,” says Emery, “is feeling confident about the links we draw from media and behavior.”

5. Have a bias for continuous learning
While the CDC campaign performed very well, Emery’s team learned quite a bit about which strategies to use to optimize future campaigns. Especially with such a broad topic, a specific hashtag on the ad, crisp, searchable and memorable branding, and calls to action to share experiences can all help amplify the message. At the same time, it became critical here, as it is for brands, to understand how much of the conversation was organic versus paid, and whether there were indications that new people, who perhaps hadn’t discussed this topic online before, were joining the conversation.

As a result, the team began working with computer scientists to develop classifiers to eliminate spam and distinguish organic from promotional conversation. The most important part here—no matter where you begin or end—is the understanding that you will learn something through the process, and that something will help improve the quality of your results the next time.
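HMC’s actual classifiers aren’t public, but the general approach of distinguishing promotional from organic conversation can be sketched with a minimal bag-of-words naive Bayes model. Everything here is invented for illustration (the training examples, the labels, the helper names), and a real system would use far more data and richer features:

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: list of (text, label) pairs. Returns a simple model."""
    word_counts = defaultdict(Counter)  # per-label word frequencies
    label_counts = Counter()            # per-label document counts
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def predict(model, text):
    """Pick the label with the highest log prior + log likelihood."""
    words = text.lower().split()
    total_docs = sum(model["label_counts"].values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Add-one smoothing keeps unseen words from zeroing out a label.
        score = math.log(n_docs / total_docs)
        for w in words:
            score += math.log((counts[w] + 1) / (total_words + len(model["vocab"])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training examples, for illustration only.
docs = [
    ("huge sale on e-cigs free shipping", "promotional"),
    ("best vape deals click here", "promotional"),
    ("trying to quit smoking wish me luck", "organic"),
    ("day three without a cigarette feeling good", "organic"),
]
model = train(docs)
print(predict(model, "quit smoking for good this week"))  # organic
```

The interesting work, of course, is not the algorithm but the labeled data: deciding what counts as “promotional” is itself a research judgment that has to be documented and disclosed.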

I can’t stress strongly enough how important it is for anyone—not only academics, but all of us working with large and complex data sets—to balance seemingly conflicting but complementary dynamics: digging deep, accepting the limits of the data, and approaching the science of social and other forms of big data with an appetite for continuous learning, whether the goal is to sell a pair of shoes or help prevent cancer.

Thank you so much to Sherry, Eman, Glen and the HMC team for sharing their research and insights with us.


Twitter Buys Gnip: What’s Next for the Social Data Ecosystem?

Today, Twitter announced its intent to purchase social data provider Gnip, one of its certified reseller partners, for an undisclosed sum. It’s not a surprising move in many ways, given the increasing pressure on Twitter to monetize its content. At the same time, the purchase raises important questions about the future of social data: for Twitter, for other social platforms, and for the organizations that use it.

Here is a quick breakdown of how the acquisition could play out; for Twitter and Gnip certainly, but also for competitors, partners and customers.


Twitter

The Good

  • While relatively small at $47M (roughly 10% of revenue), data licensing has become a meaningful revenue source for Twitter.
  • Gives Twitter more control over the ways in which they can monetize their content.
  • More direct ways to sell advertising. From Twitter: “Together we plan to offer more sophisticated data sets and better data enrichments, so that even more developers and businesses big and small around the world can drive innovation using the unique content that is shared on Twitter.”
  • Acquires a number of social data-savvy engineers.
  • Validates the value of social data.

The Bad

  • Twitter and Gnip have a lot of work ahead to provide the level of technology and data enrichment that DataSift currently offers.
  • Potential to alienate Gnip’s other social platform customers and thereby constrict the value of Gnip as a trusted social data source.

The (Potentially) Ugly

  • Twitter could increase its cost structure but fail to deliver on its revenue-generation goals.


Gnip

The Good

  • Potentially, the ability to focus on providing a deeper level of value with social data, given Twitter’s deeper pockets and vested interest in monetizing its data.
  • Per Gnip’s blog post: “We’ll be able to support a broader set of use cases across a diverse set of users including brands, universities, agencies, and developers big and small.”

The Bad

  • Loses its neutrality and potentially its ability to engender trust; goes from being “The World’s Largest and Most Trusted Provider of Social Data” to being the world’s largest and most trusted provider of Twitter data.

The (Potentially) Ugly

  • It’s a safe bet that many customers’ legal teams will be taking a look at their change-of-control clauses today.


DataSift

The Good

  • Only freestanding multi-platform player in the market, now that Topsy (which was Twitter-only) is part of Apple and Gnip is owned by Twitter.
  • Strong platform and enrichments; Gnip will have quite a bit of engineering work to do to catch up.
  • Use cases requiring the ability to view multiple data sets in context (including mapping social to enterprise data) currently favor DataSift.
  • Validates the value of social data.

The Bad

  • Risk (shared with NTT Data of Japan) of being disenfranchised by Twitter.
  • Other than NTT (important in Japan but relatively unknown in US), last man standing in the “social data reseller” market.
  • Pricing pressure as Twitter/Gnip will be holding the cards on Twitter data; enrichments and value-add will have to be far superior to justify cost.

The (Potentially) Ugly

Customers of Gnip (social technology companies)

The Good

  • Potentially better access and support for Twitter data; better services, more support and a clearer and more secure roadmap.

The Bad

  • Potentially reduced access and support for non-Twitter data.
  • Will be hard to trust services, support and roadmap for non-Twitter data.
  • Need to diversify data streams to mitigate risk; at the same time, if they buy Twitter data from Twitter/Gnip and other social data from DataSift, they are potentially left with inconsistent data sets.

The (Potentially) Ugly

  • Reduces the outlook for a truly open ecosystem of social data, potentially constraining competition and ability to innovate.

Customers of Gnip (Enterprise, agency, systems integrators)

The Good

  • Reduces complexity with regard to sourcing Twitter data; fewer players to deal with, more support and a clearer and more secure roadmap.

The Bad

  • Need to diversify data streams to mitigate risk; at the same time, if they buy Twitter data from Twitter/Gnip and other data from DataSift, they are potentially left with inconsistent data sets.
  • No services—at least not yet—that enable users to view Twitter in context of other data.

The (Potentially) Ugly

  • Social data fragmentation could lead to increased complexity, missed opportunity, risk and lack of insight.

Social Platforms (Facebook, Instagram, Tumblr, etc.)

The Good

  • Could provide useful insight into monetizing their own social data.

The Bad

  • Hard to justify exposing their data to a competitor.

The (Potentially) Ugly

  • More fragmentation and incoherence could confuse the market, making it harder to convey a strong value proposition and ROI to the business.

Where to go from here

Right now, we’re fairly limited in what we can know about Twitter/Gnip’s plans, so a clear set of recommendations by audience would be premature. Nonetheless, I’ve laid out some possible scenarios based on currently available information. As we gain more insight into integration and roadmap plans, I’ll continue to post on this topic. In the meantime, I recommend you do your own scenario planning, knowing that you have incomplete information at this point.

Thoughts? Agreements? Disagreements? Questions? As always, I’d love to hear from you.

[Update 4/16: added “where to go” paragraph; reformatted to paragraphs rather than table.]


New Report, New Service Offerings for Social Data

Late last year, I started wondering about social media command centers. Salesforce had launched one, as had Brandwatch, but I wondered: were they really still relevant? Were companies investing in command center deployments, or had interest subsided since their heyday in 2010?

I started talking to clients and vendors to take the pulse of how people were thinking about command centers: what they looked like, what the use cases were, and how they were calculating value. I looked at several deployments: some were “pop-ups,” intended to support conversation during sporting or other events, and some were day-to-day operations.

I decided to dig a little deeper. Soon, Wells Fargo, MasterCard and eBay agreed to speak with me about what they were doing and connect me with the social technology companies powering their deployments. I subsequently spoke with Dell, arguably the pioneer of the command center concept, about their original deployment, and their current service offering. The result is the report embedded below. It’s not intended to be a buyer’s guide or technology evaluation; rather, the intent is to lay out the most salient use cases and provide an inside view into how three leading brands are approaching social data in the enterprise:

At the same time, I began working with my colleague Jess Groopman on an Altimeter Group service offering based on our Social Data Intelligence report from last summer. Like Altimeter’s Social Readiness Roadmap, the Social Data Intelligence Roadmap is a diagnostic tool to help enterprise organizations evaluate their readiness (in this case, related to social and other digital data), and use that assessment to plan roadmaps, resources and investments.

Fortuitously, the two projects converged into the two announcements we’re making today: one, a new report on the social media command center and its potential to become a digital intelligence hub for the enterprise; the other, a diagnostic tool that illuminates the issues–regarding scope, context, strategy, governance, metrics and data–that large organizations must address to extract insight from social and other digital data at scale.

My deepest thanks to everyone who contributed ideas and insights for this report; any mistakes are mine alone. I hope you’ll find these tools useful, provide feedback on them and share them with others in the industry.

We will be hosting a webinar on this new report and offering on April 3 at 10:00 am PST. To register, please click here.

Box link to PDF.

To speak with Altimeter about the Social Data Intelligence offering, click here.

We are happy to cross-link to discussions on this report and service offering below.

Hootsuite posts on the report here.
