The richness of the data we publish online is a topic that I’ve touched on a few times, as our online behavior gives a good indication of who we are.
A recent study from the University of Pennsylvania suggests that the things we tweet can act as an accurate proxy for our income levels.
A wealth of social data
The potential to connect our words with various demographic traits has long fascinated linguists. The researchers took their experiment to Twitter, monitoring the content of some 5,000 users and aiming to predict their income levels from what they shared online.
The project was part of Penn’s World Well-Being Project and began its life by exploring the self-described occupations of Twitter users.
These occupations were then classified according to the nine-class job code system used in the UK, and the classification was used to determine the average income for each code.
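As a rough illustration of that mapping step, the sketch below turns a self-described occupation into a class code and then into an average income. It is not the study's code: the keyword table, the function names and the income figures are all hypothetical placeholders, and a real lookup would cover all nine classes with official statistics.

```python
from typing import Optional

# Hypothetical keyword -> class code table (only three of the nine classes shown).
KEYWORDS_TO_CLASS = {
    "director": 1,
    "engineer": 2,
    "cleaner": 9,
}

# Placeholder average incomes per class code: made-up numbers, not real statistics.
MEAN_INCOME_BY_CLASS = {1: 100.0, 2: 80.0, 9: 30.0}


def classify_occupation(description: str) -> Optional[int]:
    """Return the class code of the first known occupation keyword, or None."""
    for word in description.lower().split():
        code = KEYWORDS_TO_CLASS.get(word.strip(".,!|"))
        if code is not None:
            return code
    return None


def estimated_income(description: str) -> Optional[float]:
    """Look up the placeholder average income for a self-described occupation."""
    code = classify_occupation(description)
    return None if code is None else MEAN_INCOME_BY_CLASS[code]
```

Every user in a class is assigned that class's average, so the result is a coarse proxy for income rather than a measurement of any individual's earnings.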
Next, the team created a natural language processing algorithm that analyzed the words that users from each class were using. Needless to say, many words were common across classes, so it was crucial to accurately identify those that set one class apart from the others.
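To give a feel for how distinctive words can be separated from common ones, here is a minimal sketch. The article does not describe the researchers' actual algorithm, so this swaps in a simple technique: a smoothed log-ratio of how often a word appears inside a class versus in all the other classes. The function name, parameters and toy inputs are assumptions made for illustration.

```python
import math
from collections import Counter


def distinctive_words(docs_by_class, top_n=3, smoothing=1.0):
    """Rank words per class by smoothed log-ratio of in-class vs. out-of-class frequency."""
    # Count whitespace-separated, lower-cased tokens for each class.
    counts = {
        label: Counter(w for doc in docs for w in doc.lower().split())
        for label, docs in docs_by_class.items()
    }
    vocab = set().union(*counts.values())
    result = {}
    for label, own in counts.items():
        # Pool the counts of every other class.
        rest = Counter()
        for other_label, other in counts.items():
            if other_label != label:
                rest.update(other)
        own_total = sum(own.values()) + smoothing * len(vocab)
        rest_total = sum(rest.values()) + smoothing * len(vocab)
        scores = {
            w: math.log((own[w] + smoothing) / own_total)
            - math.log((rest[w] + smoothing) / rest_total)
            for w in vocab
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:top_n]
    return result
```

Words shared evenly across classes score close to zero and fall down the ranking, while words concentrated in one class rise to the top. The smoothing term keeps words that are absent from a class from producing a division by zero.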
Knowns and unknowns
So what did the analysis reveal? A mixture of the expected and the surprising. It confirmed what has long been known about the link between the words we use and our age or gender, and the relationship between those traits and our income.
There were some interesting findings, however. For instance, wealthier users appeared to be angrier online than their poorer peers. Those who appeared optimistic usually had lower incomes, and poorer users also littered their tweets with more swear words.
The wealthy tended to discuss things such as politics, business and charitable topics, but there was also a more fundamental difference in the way users approached the site.
“Lower-income users or those of a lower socioeconomic status use Twitter more as a communication means among themselves,” the authors say. “High-income people use it more to disseminate news, and they use it more professionally than personally.”
There have been plenty of examples of online data being used to sense patterns and draw inferences about future behavior, and it seems we might gain a similar level of understanding from exploring the linguistic patterns on Twitter.
Of course, this might only be used to target advertising by the social network itself, but maybe someone will devise a slightly more impactful application in time.