To trust or not to trust – How reliable are social analytics?

Rohini K. Srihari, chief scientist, SmartFocus.

Nowadays, virtually every brand you’ve heard of is monitoring social media feedback. Some businesses rely on a manual approach, requiring staff to monitor multiple social media sites.

Others use automated ‘listening’ tools that track brand mentions and sentiment through interactive dashboards, says Rohini K. Srihari, Professor of Computer Science and Engineering at the University at Buffalo and chief scientist at SmartFocus.

It is a telling observation that 78% of companies say they have dedicated social media teams, yet only 26% integrate social media fully into their marketing strategies.

This shows a large gap between businesses

(i) recognising the importance of social media, and

(ii) having sufficient trust in the data to make business decisions.

As analytics move beyond simple measurement (e.g. counting brand mentions or growth in followers), trust becomes increasingly important. In this article, I will look at the criteria for evaluating the reliability of social media analytics, particularly when this information is used to tailor marketing campaigns or make other critical business decisions.

[Figure: ‘Simple Reach’ vs ‘Curated Reach’ for Swatch mentions on Twitter during April]

What’s in a name?

One major issue concerns the accuracy of detecting brand mentions. Most companies that offer social media monitoring rely on one of two strategies for this:

(i) restricting to hashtagged mentions, a strategy that yields high precision (finding only relevant mentions) at the cost of many missed mentions, or

(ii) unrestricted keyword search, an approach that can generate numerous false positives.

The graph above illustrates the ‘reach’ of Swatch during the month of April on Twitter. Reach is defined as the calibrated ratio of brand mentions to the total number of posts over a given time period. The graph plots ‘Simple Reach’, which is based on raw keyword mentions, against ‘Curated Reach’. Content curation is the process of filtering potential brand mentions by requiring appropriate contextual clues (positive or negative). For example, a true mention of Swatch should contain some reference to watch, time, strap, or the activity of wearing. A spike can be observed on April 24th for ‘Simple Reach’: this can be attributed to discussions of a new ‘colour swatch’ released by a cosmetics brand. It is an example of misleading analytics caused by the lack of proper content curation. Brand names are quite often simple words (such as the detergent brand All), so some context checking is obviously required.
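To make the curation idea concrete, here is a minimal sketch in Python. The clue list is invented for illustration, following the article’s Swatch example; a production system would learn such context from data rather than rely on a hand-picked word list.

```python
import re

# Illustrative clue list: a true Swatch mention should carry some
# watch-related context (per the example above).
BRAND = "swatch"
CONTEXT_CLUES = {"watch", "watches", "time", "strap", "wrist", "wear", "wearing"}

def is_curated_mention(post: str) -> bool:
    """True if the post mentions the brand AND contains a contextual clue."""
    tokens = set(re.findall(r"[a-z]+", post.lower()))
    return BRAND in tokens and bool(tokens & CONTEXT_CLUES)

posts = [
    "Loving the strap on my new Swatch!",
    "Gorgeous new colour swatch from my favourite lipstick brand",
]
for post in posts:
    print(is_curated_mention(post), "-", post)
# True  - Loving the strap on my new Swatch!
# False - Gorgeous new colour swatch from my favourite lipstick brand
```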

Do the numbers make sense?

The graph above also illustrates the need for informative metrics. Some vendors are choosing to calibrate raw mention counts into meaningful, normalised indexes. This type of index has the advantage of being relatively stable with respect to modest day-to-day fluctuations, so significant changes are easily discernible. It also facilitates comparison across brands, time intervals, content sources and demographics. The index should take into account several features, such as share of voice, sentiment and sudden spikes. Recently there have been efforts to validate such metrics by attempting to correlate social media trends with hard data, such as movements in Dow Jones industry indexes. For most businesses, the ultimate validation comes when they see positive outcomes of marketing and advertising campaigns that can be attributed to strategy recommendations based on such analytics.
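To make the reach definition concrete, the sketch below computes a per-million daily index and a simple trailing average. The scaling factor and smoothing window are assumptions for illustration, not any vendor’s actual calibration.

```python
def reach_index(brand_mentions, total_posts, scale=1_000_000):
    """Daily reach: brand mentions per million posts scanned."""
    return [m / t * scale for m, t in zip(brand_mentions, total_posts)]

def smoothed(series, window=7):
    """Trailing moving average, which damps modest day-to-day noise
    while leaving genuine spikes discernible."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

mentions = [120, 135, 128, 900, 140, 133, 125]   # note the day-4 spike
totals = [5_000_000] * 7                          # posts scanned per day
daily = reach_index(mentions, totals)
print([round(x, 1) for x in daily])               # spike is obvious
print([round(x, 1) for x in smoothed(daily, 3)])  # index stays stable otherwise
```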

What about accuracy?

Of course, an index is only as good as the data that goes into it. Apart from correctly tagging brand mentions, the accuracy of automatically added metadata, such as sentiment, may be questioned. DataSift, an aggregator of social media content, claims a sentiment analysis accuracy of 70%. While sentiment analysis will never reach human performance, it can still add valuable insight. It is best used to analyse trends in public perception, particularly sharp upticks or downturns. More recently, some vendors have started capturing different nuances of sentiment, for example

(i) sentiment associated with specific aspects such as customer service, product quality and price, and

(ii) intensity, i.e. extremely positive or negative sentiment, both of which may warrant social outreach, as the sketch below illustrates.

Apart from sentiment, accuracy issues can also arise when relying on demographic data such as age, gender and location.
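As a rough illustration of the intensity point above, a pipeline might route only the most extreme posts, positive or negative, to a human outreach team. The scorer below is a toy stand-in and the threshold is an assumption; a real system would call a trained sentiment model.

```python
import re

POSITIVE = {"love", "great", "amazing", "perfect"}
NEGATIVE = {"hate", "awful", "terrible", "broken"}

def sentiment_score(post: str) -> float:
    """Stand-in for a trained sentiment model: a toy word-count heuristic
    returning a score in [-1, 1]. For demonstration only."""
    words = re.findall(r"[a-z]+", post.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def needs_outreach(post: str, threshold: float = 0.15) -> bool:
    """Flag extreme sentiment in either direction: strong advocacy and
    strong complaints may both warrant a human response."""
    return abs(sentiment_score(post)) >= threshold

for post in [
    "I love this watch, amazing quality",
    "Arrived broken, awful service",
    "It keeps time, I suppose",
]:
    print(needs_outreach(post), "-", post)
```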

Not all sources are alike.

The first two content sources that come to mind when discussing social media analytics are Twitter and Facebook. Both are high-velocity, high-volume sources, but they require different types of handling. Data on Twitter is for the most part publicly viewable and accessible; Facebook data, on the other hand, is subject to numerous restrictions arising from privacy and other policies.

A recent study by the Pew Research Center classified six types of communities observed in social media. Of the six, two are relevant to the topic of source selection:

(i) Tight Crowds, representing highly interconnected people discussing focused topics (including brands) in a conversational manner

(ii) Brand Clusters, a large disconnected group of people all independently describing their experiences and opinions.

The first group is reflected in sources such as review sites, specialised discussion forums, blogs and private Facebook pages. These sources are worth considering for the quality of their comments, as well as for potential lead generation. The second group is reflected in Twitter users: the volume of data here is useful for aggregate analytics such as share of voice, sentiment and demographics. In other words, different sources contribute to different analytics.

Not all samples are alike.

Once content sources have been selected, the next issue relates to sampling methodology. This applies to high-velocity, high-volume sources, such as Twitter, where processing the entire feed is prohibitively expensive and sampling is necessary.

Marketers sometimes ask whether analytics are based on processing the entire Twitter firehose. This is not necessary: a statistical sample of 10% of the firehose, known as the Decahose (about 50 million posts per day), is sufficient to reliably generate analytics. This broader pipeline permits discovery of socially trending phrases, emerging memes and the like.
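The arithmetic behind this is straightforward: with a uniform 10% sample, scaling observed counts by ten gives an unbiased estimate of full-firehose counts. A minimal simulation (the feed size here is invented):

```python
import random

SAMPLE_RATE = 0.10   # the Decahose keeps roughly 1 post in 10

def sample_stream(posts, rate=SAMPLE_RATE, seed=7):
    """Simulate a uniform random sample of a firehose."""
    rng = random.Random(seed)
    return [p for p in posts if rng.random() < rate]

def estimate_full_count(sampled_count, rate=SAMPLE_RATE):
    """Scale a count observed in the sample back up to a full-feed estimate."""
    return sampled_count / rate

firehose = ["post"] * 500_000                 # stand-in for one day's feed
decahose = sample_stream(firehose)
print(len(decahose))                          # roughly 50,000
print(estimate_full_count(len(decahose)))     # roughly 500,000
```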

Other analytics vendors rely on data feeds generated through keyword searches. For example, one could ‘pull’ only those posts associated with a particular hashtag or keyword. While this generates far less data, it does not permit discovery of trends. When computing analytics based on demographics, sampling rates again pose an issue. As an example, only 1% of Twitter data is location-stamped; for location-based analytics, it is necessary to ensure that sufficient samples have been obtained for the period of analysis.
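One way to make ‘sufficient samples’ concrete is to compute the margin of error on a proportion estimated from the geo-tagged subset, using the normal approximation to the binomial. All counts below are invented for illustration:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion p_hat estimated from n samples
    (normal approximation to the binomial)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Suppose 1% of a day's 50,000 sampled posts are location-stamped,
# and 30% of those geo-tagged mentions come from the target region.
geo_tagged = int(50_000 * 0.01)   # 500 usable posts
p_hat = 0.30
moe = margin_of_error(p_hat, geo_tagged)
print(f"{p_hat:.0%} +/- {moe:.1%} from {geo_tagged} geo-tagged posts")
# 30% +/- 4.0%
```

If the resulting margin of error is too wide for the decision at hand, the period of analysis should be extended until enough geo-tagged posts have accumulated.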

In this article I have presented several criteria for evaluating the reliability of social media analytics. This is not meant to be a checklist for businesses evaluating different vendors; rather, it is meant to raise awareness and call for more transparency in the methodology used to generate analytics. Depending on the size of your business, and its capacity to tailor marketing and advertising strategies based on such information, the importance of these issues will vary. Larger enterprises that rely on daily or weekly analytics reports for business and marketing intelligence should obviously pay more attention to reliability issues.

A recent blog in the WSJ titled “Analytics and Big Data; the new Kale?” questioned whether analytics was just a passing fad that would soon be abandoned. The conclusion is that, like kale, analytics has nutritional value, but only if treated as a hard science rather than as a fad. To that end, there is a need for well-documented and justifiable methodology to promote confidence among the customers who consume this data. The real value of social media analytics may come when it is integrated with traditional, transactional business data, including sales figures.

The author of this blog is Rohini K. Srihari, Professor of Computer Science and Engineering at the University at Buffalo and chief scientist at SmartFocus.
