Twitter produces a vast amount of data, with estimates that its 200 million or so users are producing over 500 million updates every day. Such a treasure trove of real-time data is incredibly attractive to researchers hoping to mine it for trends on everything from the spread of flu to the potential for box office success.
Last year the site announced a partnership with Gnip, an organization that touts itself as the source for open social data, whereby select research institutions would be granted access to Twitter data from 2006 to the present day.
So what kind of insights could be gleaned from the site? A couple of recent studies highlight the breadth of understanding we can gain from trawling the Twittersphere.
The first, from researchers at Penn Medicine, looked at whether our tweets could help to predict enrollment levels in Obamacare.
It emerged that a greater frequency of tweets was linked to higher enrollment.
Nearly 1 million Obamacare-related tweets posted during March 2014 were analyzed to compare state-level activity on Twitter with enrollment rates in each state.
The tweets were also measured for sentiment using the National Research Council sentiment lexicon.
The data revealed that a 0.10 increase in the positive sentiment of tweets was linked to a 9 percent rise in healthcare enrollment in any given state.
“The correlation between Twitter sentiment and the number of eligible individuals who enrolled in a marketplace plan highlights the potential for Twitter to be a real-time monitoring strategy for future enrollment periods,” the authors explain.
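To make the idea concrete, here is a minimal sketch of lexicon-based sentiment scoring followed by a state-level correlation. It is not the Penn Medicine team's method: the six-word lexicon, the function names and the per-state averaging are all assumptions made for illustration, and a real analysis would use a full sentiment lexicon and official enrollment figures.

```python
import math
import re

# Hypothetical toy lexicon (an assumption, not the lexicon used in the study):
# each word maps to +1 (positive) or -1 (negative).
TOY_LEXICON = {
    "great": 1, "affordable": 1, "love": 1,
    "broken": -1, "expensive": -1, "hate": -1,
}

def tweet_sentiment(text, lexicon=TOY_LEXICON):
    """Mean lexicon score of the words in a tweet; 0.0 if no word is in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def state_sentiment(tweets_by_state):
    """Average tweet sentiment per state, skipping states with no tweets."""
    return {
        state: sum(tweet_sentiment(t) for t in tweets) / len(tweets)
        for state, tweets in tweets_by_state.items()
        if tweets
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Given a dictionary of tweets per state and a matching list of enrollment rates, `pearson` applied to the per-state sentiment values and the enrollment rates gives a rough measure of how closely the two move together.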
Spotting terrorists early on
Whilst that’s really interesting, a potentially life-saving project has been highlighted by a recent study into the Twitter behavior of people prior to joining the Islamic State terrorist organization.
The researchers looked at a whole host of tweets in Arabic around IS from both a positive and negative standpoint to try and understand what distinguishes each group, and indeed what they have in common.
They were then able to look back through the tweet history of each group to see if there was anything in a user’s pre-IS content that might mark them out as someone who would go on to support or oppose the group.
Over 3 million tweets were analyzed over a three-month period at the end of 2014. Of the 250,000 users examined, some 165,000 had accounts that predated the emergence of IS.
The analysis revealed that supportive tweets for IS nearly always used the full name of Islamic State, whereas opposing tweets used the abbreviated term.
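A naming rule like this is simple to express in code. The sketch below is only an illustration of the idea: the English terms stand in for the Arabic ones the researchers worked with, the label names are hypothetical, and since the pattern held "nearly always" rather than always, anything ambiguous is left unlabeled.

```python
import re

# Hypothetical English stand-ins for the Arabic terms (assumptions for illustration).
FULL_NAME_TERMS = ("islamic state",)
ABBREVIATED_TERMS = ("isis",)

def label_tweet(text):
    """Heuristic label: full name only -> 'support', abbreviation only -> 'oppose'."""
    lowered = text.lower()
    has_full = any(term in lowered for term in FULL_NAME_TERMS)
    # Word boundaries stop the abbreviation matching inside longer words.
    has_abbrev = any(
        re.search(rf"\b{re.escape(term)}\b", lowered) for term in ABBREVIATED_TERMS
    )
    if has_full and not has_abbrev:
        return "support"
    if has_abbrev and not has_full:
        return "oppose"
    return "unknown"  # neither term, or both: the heuristic does not decide
```

Labels produced this way are noisy, so they are better treated as a starting point for further analysis than as a verdict on any individual account.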
After identifying supporters and opponents of the group, they set out to explore any trends in the data.
“Anti-ISIS tweets generally peaked when news of ISIS human rights violations emerged such as the killing of hostage, accounts of torture, or reports of the enslavement of Yazidi women,” the authors say. “On the other hand, pro-ISIS tweets generally peaked in conjunction with the release of propaganda videos and major military achievements.”
The researchers then used 7,000 or so IS supporters to train a machine learning algorithm to try and spot whether someone will become a supporter or not.
Interestingly, it emerged that the hashtags people used were a good indicator of potential support (or opposition).
“Looking at discriminating hashtags suggested that a major source of support for ISIS stems from frustration with the missteps of the Arab Spring,” the authors say. “As for opposition to ISIS, it is linked with support for other rebel groups, mostly in Syria, that have been targeted by ISIS, support for existing Middle Eastern regimes, and Shia sectarianism.”
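The article does not say which algorithm the researchers trained, so the following is just one plausible way to learn from hashtags: a small naive Bayes classifier with add-one smoothing, written in plain Python. The class name, the label strings and the input layout (a list of tweets plus a label per user) are all assumptions.

```python
import math
import re
from collections import Counter

def hashtags(text):
    """Lower-cased hashtags in a tweet; \\w is Unicode-aware, so Arabic tags match too."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

class HashtagNaiveBayes:
    """Multinomial naive Bayes over hashtags, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.tag_counts = {}          # label -> Counter of hashtag frequencies
        self.user_counts = Counter()  # label -> number of training users
        self.vocab = set()            # every hashtag seen during training

    def fit(self, users):
        """users: iterable of (list_of_tweets, label) pairs, one pair per user."""
        for tweets, label in users:
            self.user_counts[label] += 1
            counter = self.tag_counts.setdefault(label, Counter())
            for tweet in tweets:
                for tag in hashtags(tweet):
                    counter[tag] += 1
                    self.vocab.add(tag)
        return self

    def predict(self, tweets):
        """Most likely label for a user's tweets, or None if the model is untrained."""
        # Hashtags never seen in training carry no information, so drop them.
        tags = [t for tweet in tweets for t in hashtags(tweet) if t in self.vocab]
        total_users = sum(self.user_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.tag_counts.items():
            score = math.log(self.user_counts[label] / total_users)
            denominator = sum(counter.values()) + len(self.vocab)
            for tag in tags:
                score += math.log((counter[tag] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training on users' pre-IS tweets and predicting their later stance would mirror the setup described above; the per-label hashtag counts also make it easy to inspect which hashtags discriminate most strongly between the two groups.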
At the Social Media In Law Enforcement event last year, a presentation was given on the use of analysis like this in policing, and whilst there are risks of Minority Report-style thought police, it seems inevitable that law enforcement agencies will use social data together with artificial intelligence to try and spot potentially negative events before they occur.
Whether it’s predicting health insurance enrollment, or predicting terrorist support, it’s clear that social media data analysis will be an increasingly prominent trend in the coming years, and certainly an area to watch with interest.
Kinda scary. There are so many issues about privacy online where we have information we don't want in 3rd party hands, it's staggering to see so many give up so much about themselves willingly.
And I suspect most folks are completely unaware of how much they do give away.