In business, we spend a lot of time trying to understand social data and other forms of big data for strategic advantage: to make business decisions, increase revenue, reduce costs, improve relationships with customers and spur innovation.
But what about other uses of social data—for example, for first responders? Human rights? Cancer prevention? The Health Media Collaboratory at the University of Illinois Institute for Health Research and Policy is focused on understanding social data for the public good. The team, led by Principal Investigator Sherry Emery, includes Deputy Director Glen Szczypka, Social Media Strategist Eman Aly, and others focused on using social data to “positively impact the health behavior of individuals and communities.”
In the broadest sense, the mission of the Health Media Collaboratory (HMC) is to develop and propagate a new paradigm for health media research that takes into account the massive disruption in media consumption that the Internet, and social media in particular, have enabled. More specifically, the team is looking at how people talk about quitting smoking on Twitter, and what they and the Centers for Disease Control and Prevention (CDC) can learn about how to promote behavior change.
Recently, HMC engaged with the CDC on a project to pose two questions about the impact, if any, of social data on smoking cessation. Their initial research questions were: How much electronic cigarette promotion is there on Twitter? And how much ‘organic’ conversation about electronic cigarettes is there on Twitter? (Short answer: A lot.) Subsequently, they looked at whether Twitter could be used as a tool to evaluate the efficacy of health-oriented campaigns, and discovered that yes, in fact, it can. (I highly recommend taking a look at these studies.)
I spoke recently with Sherry Emery and Eman Aly on what they’re doing, and what we as an industry can learn from academia about effective and ethical use of social data. Here are five lessons you can start integrating now:
1. Have a relentlessly clear objective
In business, we talk a lot about ROI and about attributing conversions to social posts, but the real objective, behavior change, is fundamentally similar to what Emery and her team are trying to achieve. Says Emery, “You can’t just skim the top off social data; we really have to dig deep and get immersed and comfortable in the data, then put it into our system of analysis to find ways to support behavioral changes.”
I’d argue that this is an important distinction for anyone working with any kind of data for any objective: do we want awareness? Or are we trying to influence behavior in some way, whether it is to share, sign up for, comment on or buy something? Despite recent advances in conversion attribution, we often have a very incomplete picture of how a specific post (for example, on Facebook) may have influenced a specific action (for example, buying a new washing machine) and, ultimately, future behavior related to the brand. But we can more easily see some of the other indications of awareness, preference and advocacy via social signals.
2. Develop your methodology as if everyone can see it (and will use it)
Even though, as Emery says, “Behavior change is a lot squishier than product purchase,” her team’s research must still stand up to the most rigorous academic standards. Unlike the business world, in which work products and processes are typically shared only internally, the research of the HMC and similar groups is open to the highest levels of scrutiny in academic publications and journals. “Everything we do has to be replicable,” says Eman Aly. “In the corporate world, that is proprietary information. So we have to make sure that it’s possible.”
3. Know and disclose the limits of your dataset
As anyone who’s worked with social data and other forms of big data knows, there are limits to what is ultimately knowable and provable. Human language is complicated and ever-changing. You must measure and disclose the limits of the data. Ask yourself: are the conclusions we’re drawing valid inferences, given the quality and reliability of the data?
HMC must be extremely rigorous in the way it analyzes metadata, extracts relevant tweets and determines how much of what it has collected is actually on topic. For example, how much of the conversation about smoking is spam, how much is off-topic (“smoking ribs,” “smoking hot girls”), and how much is relevant (“I’ve really got to quit smoking cigarettes”)?
They also need to know what part of the overall conversation they’re capturing. Do they have information about regions, demographics? And, most fundamentally, are the conclusions they’re drawing valid? All of this must be transparently disclosed.
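To make the spam / off-topic / relevant split above a little more concrete, here is a minimal Python sketch that triages tweets with hand-written keyword rules and then reports what share of a sample falls into each bucket. The keyword lists and the function names `triage` and `share_by_label` are hypothetical, chosen only for illustration; this is not HMC’s actual method, which the interview does not describe.

```python
import re
from collections import Counter

# Hypothetical keyword rules, for illustration only (not HMC's pipeline).
# Rules are checked in order: spam first, then off-topic, then relevant.
SPAM_PATTERNS = [r"\b(promo code|discount|buy now)\b", r"\bfollow (me|back)\b"]
OFF_TOPIC_PATTERNS = [r"\bsmoking (hot|ribs|brisket|meat)\b", r"\bsmoked (salmon|ribs)\b"]
RELEVANT_PATTERNS = [r"\bquit(ting)? smoking\b", r"\bcigarettes?\b", r"\be-?cig(arette)?s?\b"]

RULES = (
    ("spam", SPAM_PATTERNS),
    ("off_topic", OFF_TOPIC_PATTERNS),
    ("relevant", RELEVANT_PATTERNS),
)


def triage(tweet: str) -> str:
    """Assign a tweet to the first bucket whose patterns match."""
    text = tweet.lower()
    for label, patterns in RULES:
        if any(re.search(p, text) for p in patterns):
            return label
    return "unclassified"


def share_by_label(tweets: list[str]) -> dict[str, float]:
    """Return the fraction of tweets that lands in each bucket."""
    if not tweets:
        return {}
    counts = Counter(triage(t) for t in tweets)
    return {label: n / len(tweets) for label, n in counts.items()}


if __name__ == "__main__":
    sample = [
        "I've really got to quit smoking cigarettes",
        "Smoking ribs all afternoon",
        "Buy now! Use promo code SMOKE10",
        "Thinking about switching to an e-cig",
    ]
    print(share_by_label(sample))
```

Keyword rules like these are easy to audit and to disclose, but they break as soon as people phrase things in ways the rules did not anticipate, which is one reason trained classifiers become attractive as a project matures.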
4. Context is everything
Social data doesn’t live in a vacuum. To understand behavior, HMC, like any data scientist, must take into account external factors that may contribute to a particular outcome. In the case of cigarette smoking, this would include laws about who can sell cigarettes, how they may be advertised, applicable taxes, ad campaigns sponsored by tobacco companies or the CDC, and other, perhaps unknowable, factors. “The challenge,” says Emery, “is feeling confident about the links we draw from media and behavior.”
5. Have a bias for continuous learning
While the CDC campaign performed very well, Emery’s team learned quite a bit about which strategies to use to optimize future campaigns. Especially with such a broad topic, a specific hashtag on the ad, crisp, searchable and memorable branding, and calls to action to share experiences can help amplify the message. At the same time, it became critical here, as it is for brands, to understand how much organic versus paid conversation was occurring, and whether there were indications that new people (ones who maybe hadn’t discussed this topic online before) were joining the conversation.
As a result, the team began working with computer scientists to develop classifiers to eliminate spam and distinguish organic from promotional conversation. The most important part here—no matter where you begin or end—is the understanding that you will learn something through the process, and that something will help improve the quality of your results the next time.
I can’t stress enough how important it is for anyone, not only academics but all of us working with large and complex data sets, to balance two seemingly conflicting but complementary dynamics (digging deep and accepting the limits of the data) and to approach the science of social and other forms of big data with an appetite for continuous learning, whether the goal is to sell a pair of shoes or to help prevent cancer.
Thank you so much to Sherry, Eman, Glen and the HMC team for sharing their research and insights with us.