Why the web is crucial to our understanding of society

The internet has given us a greater insight into the human world than we’ve ever had before, and a recent Stanford study suggests we would do well to harvest the digital footprints we leave behind for research purposes.

The authors urge sociologists and social psychologists to focus their efforts on online research to further advance our understanding of social interaction and structure.

Learning from industry

The authors believe researchers can learn a great deal from industry, where companies have long conducted A/B testing to better understand their customers. By contrast, the social sciences still rely heavily on the face-to-face laboratory experiment.

“What I think is exciting is that we now have data on interactions to a level of precision that was unthinkable 20 years ago,” the authors say.

The authors suggest that researchers could better partner with existing communities on the web in the construction of their field experiments. These would involve the creation of a controlled online situation to engage participants.

“The internet is not just another mechanism for recruiting more subjects,” they say. “There is now space for what we call computational social sciences that lies at the intersection of sociology, psychology, computer science and other technical sciences, through which we can try to understand human behavior as it is shaped and illuminated by online platforms.”

A big-data-based approach

By working in this way, the authors believe researchers can tap into both big data and the latest AI-based techniques. It does, however, require a closer relationship with the platforms themselves in order to recruit (and retain) the right kind of participants for a study, and this kind of relationship can take time to establish.

The authors believe it’s worth the effort, however, as such platforms afford insight into human behaviors that are incredibly hard to replicate in a lab. For instance, how and why we trust each other is much better explored in an online environment. This is because it opens up the complex web of relationships between us, whereas in a lab researchers can only explore the ‘trust’ between relative strangers.

The authors themselves have used such environments in recent experiments. For instance, one recent study explored trust on sharing economy platforms such as Airbnb.

Ethical issues

Of course, the use of such data comes with enhanced ethical responsibilities. It’s an issue I’ve touched on previously with regard to medical research, but the same applies to social research, as researchers require access to fairly private information.

One solution that protects participants’ privacy is linking their information, such as names or email addresses, to unique identifiers, which could be a set of letters or numbers assigned to each research subject. The administrators of the platform would then provide those identifiers to researchers without compromising privacy.
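To make the idea concrete, here is a minimal sketch of how a platform might swap email addresses for opaque identifiers before handing records to researchers. It is an illustration under stated assumptions, not the procedure the authors describe: the keyed hash (HMAC), the function names, the `participant_id` field, and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Map a name or email to a stable, opaque participant ID.

    Assumption: the platform keeps secret_key private, so researchers
    cannot recompute the mapping from a list of known email addresses.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize(records, secret_key, id_field="email"):
    """Return copies of the records with id_field replaced by participant_id."""
    released = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["participant_id"] = pseudonym(record[id_field], secret_key)
        released.append(cleaned)
    return released


# Hypothetical key and records, for illustration only.
PLATFORM_KEY = b"hypothetical-secret-held-by-the-platform"
records = [
    {"email": "ana@example.com", "bookings": 3},
    {"email": "Ana@Example.com ", "bookings": 1},  # same person, messy input
    {"email": "ben@example.com", "bookings": 7},
]
released = pseudonymize(records, PLATFORM_KEY)
```

The same person always receives the same identifier, so researchers can still follow an individual across records. Note that this only hides direct identifiers; the remaining fields could still reveal who someone is if they are distinctive enough.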

I recently touched on an interesting alternative approach developed by MIT researchers. The approach involves using real data to generate realistic, yet totally fabricated, data on which to perform research.

The system is known as a Synthetic Data Vault (SDV), and it utilizes machine learning to build models out of real databases in order to create artificial data. The algorithm itself is a form of recursive conditional parameter aggregation, which exploits the hierarchical nature of data. For instance, the researchers reveal that it can easily take a customer transaction table, and then form a multivariate model for each customer based on his or her transactions.

The model is capable of capturing correlations between fields, so it would, for instance, capture purchase amount and type together with the time of each transaction. After a model has been created for each customer, the system can then essentially model the entire database, replicating it with reliable, if made-up, data.
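The per-customer modeling step can be illustrated with a deliberately simplified sketch. It is not the SDV algorithm or its API: it swaps in a plain two-variable Gaussian per customer, fitted to purchase amount and hour of day, and samples made-up transactions from it. The table layout, function names, and sample values are assumptions for illustration.

```python
import math
import random
from collections import defaultdict


def fit_customer_model(rows):
    """Fit a two-variable Gaussian to one customer's (amount, hour) pairs.

    Assumes the customer has at least two transactions.
    """
    n = len(rows)
    mean_a = sum(a for a, _ in rows) / n
    mean_h = sum(h for _, h in rows) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in rows) / (n - 1)
    var_h = sum((h - mean_h) ** 2 for _, h in rows) / (n - 1)
    cov = sum((a - mean_a) * (h - mean_h) for a, h in rows) / (n - 1)
    return {"mean": (mean_a, mean_h), "var": (var_a, var_h), "cov": cov}


def sample_customer(model, count, rng):
    """Draw synthetic (amount, hour) pairs that keep the fitted correlation."""
    mean_a, mean_h = model["mean"]
    sd_a, sd_h = math.sqrt(model["var"][0]), math.sqrt(model["var"][1])
    rho = model["cov"] / (sd_a * sd_h) if sd_a > 0 and sd_h > 0 else 0.0
    rho = max(-1.0, min(1.0, rho))  # guard against rounding error
    spread = math.sqrt(max(0.0, 1.0 - rho * rho))
    samples = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        amount = mean_a + sd_a * z1
        hour = mean_h + sd_h * (rho * z1 + spread * z2)  # continuous value
        samples.append((amount, hour))
    return samples


# Hypothetical transaction table: (customer_id, amount, hour_of_day).
transactions = [
    ("c1", 20.0, 9), ("c1", 25.0, 10), ("c1", 30.0, 11), ("c1", 35.0, 12),
    ("c2", 120.0, 20), ("c2", 110.0, 21), ("c2", 100.0, 22),
]

by_customer = defaultdict(list)
for customer_id, amount, hour in transactions:
    by_customer[customer_id].append((amount, hour))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
models = {cid: fit_customer_model(rows) for cid, rows in by_customer.items()}
synthetic = [
    (cid, amount, hour)
    for cid, rows in by_customer.items()
    for amount, hour in sample_customer(models[cid], len(rows), rng)
]
```

The synthetic table has the same shape as the original and similar per-customer patterns, but none of its rows is a real transaction. A real system would also need to handle categorical fields, such as purchase type, and the relationships between multiple tables.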

Could this work effectively for social research, thus bypassing prickly privacy and security issues?