Twitter produces a vast amount of data, with estimates suggesting that its 200 million or so users produce over 500 million updates every day. Such a treasure trove of real-time data is incredibly attractive to researchers hoping to mine it for trends on everything from the spread of flu to the potential for box office success.
Earlier this year the site announced a partnership with Gnip, an organisation that touts itself as the source for open social data, whereby select research institutions would be granted access to Twitter data from 2006 to the present day.
The pilot program, known as Twitter Data Grants, hopes to boost research by giving academics access to this wealth of data. In the first few months of the pilot, Twitter revealed that it had already received over 1,300 research proposals from around the globe. The projects announced include, for instance, one to study foodborne gastrointestinal illness, whilst another will look to measure urban happiness.
How do people feel about their social posts being used in this way? After all, participation in most forms of research requires the consent of those taking part. Is it right that our data should be unceremoniously mined, even if it furthers scientific understanding?
A report from earlier this year set out to explore just that question and found that, generally speaking, opinion amongst social media users is mixed. The small study, involving just 34 people, asked participants for their opinions on a range of topics surrounding the use of social media data in academic research. This might include, for instance, analysing tweets to gauge sentiment towards popular events, or mining social profiles to see whether we present ourselves differently on social and professional networks.
Central to this, of course, is the issue of privacy, and how comfortable users are with managing their privacy settings online. The users in the study had generally neither tried to update their own settings nor felt particularly comfortable doing so. This suggests it is dangerous for researchers to assume that users are comfortable with their content being in the public domain simply because they have not taken steps to prevent it.
Other participants, however, were much more comfortable with the public nature of the web, arguing that it is inherently public and that you therefore shouldn’t post anything online you wouldn’t be comfortable having in the public domain. This was nevertheless a minority view, with most believing that researchers should seek consent before including users in their studies. Needless to say, for studies with large samples, this would represent quite a hurdle for academics.
One potential compromise is to anonymise the harvested data so that users’ reputations and security are not compromised, although this is sometimes easier said than done.
Interestingly, however, the biggest concern of those in the study was neither the privacy nor the security of their data, but rather scepticism that their data would be of any use. They reported that enough of their posts are exaggerated, incomplete or false to make drawing conclusions from them fraught with danger.
Couple this with the gap between how we often behave online and how we behave offline, and it suggests that relying on such data is far from safe. After all, if we don’t behave offline as we do online, which version is the more accurate one to draw conclusions from?
The sample is undoubtedly small, but in the absence of any other real research in this area it does nevertheless represent a starting point to build from. This is almost certain to be a growth area, so I’m sure further research will be undertaken to gain a better understanding of the role social data can play in academic research and the various challenges that surround it.