What Words Can Tell Us About Misinformation Online

The spread of misinformation has been one of the most interesting trends of recent years. While a tweet riddled with misinformation can obviously be judged on its facts, researchers from Erasmus University set out to explore whether the language used might also provide some pointers.

The researchers used the tweets of Donald Trump as their Petri dish, given the huge quantity of false statements he posted during his presidency. The authors explain that in his final year he tweeted more than 33 times per day on average, and many of those tweets were demonstrably false.

The anatomy of misinformation

The results show that Trump used very different words when he shared tweets that he knew were not true. This allowed the researchers to develop a model to predict whether any single tweet was true or false, which they hope might help social networks and fact checkers root out misinformation in the future.

“We created a personalized language model that could predict which statements from the former president were correct and which potentially deceitful,” they explain. “His language was so consistent that in about three quarters of the cases, our model could correctly predict if Trump’s tweets were factual or not based solely on his word use.”

The researchers gathered two data sets, each with around 3 months’ worth of tweets from Trump. The tweets were then cross-referenced with fact-checking data from the Washington Post to determine whether each one was correct. The comparison revealed a significant difference in language between factually correct and factually incorrect tweets, and that difference underpins the resulting model.
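To make the idea of predicting truthfulness from word use alone more concrete, here is a minimal sketch of a word-frequency classifier. It is not the researchers' model: it swaps in a simple naive Bayes approach with add-one smoothing, and the function names (tokenize, train, predict), the labels and the four example sentences are all invented for illustration rather than taken from the study's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a piece of text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_texts):
    """Count word use per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of texts with that label
    for text, label in labelled_texts:
        word_counts.setdefault(label, Counter()).update(tokenize(text))
        doc_counts[label] += 1
    return word_counts, doc_counts


def predict(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the prior: how common this label is overall.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing so unseen words do not zero out the score.
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch (not real tweets).
toy_data = [
    ("the committee meets on tuesday to review the budget", "factual"),
    ("the report was published by the office this morning", "factual"),
    ("everyone says this is the biggest and greatest ever", "incorrect"),
    ("nobody has ever seen anything so totally unbelievable", "incorrect"),
]

word_counts, doc_counts = train(toy_data)
print(predict("the budget report was published on tuesday",
              word_counts, doc_counts))  # prints: factual
```

In a real setting the labels would come from an independent fact-checking source, as in the study, and a model trained on one person's language would not be expected to transfer to anyone else.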

Predicting misinformation

“Using this model, we could predict how truthful Trump was in three out of four tweets,” the authors explain. “We also compared our new personalized model with other similar detection models and found it outperformed them by at least 5 percentage points.”
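For readers unfamiliar with how such figures are calculated, the short sketch below shows how an accuracy of "three out of four" and a gap in percentage points can be computed. The helper names (accuracy, percentage_point_gap) and the eight example labels are made up for illustration and do not reproduce the study's results.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the fact-checked labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def percentage_point_gap(acc_a, acc_b):
    """Difference between two accuracies, expressed in percentage points."""
    return (acc_a - acc_b) * 100


# Hypothetical labels and predictions, invented for this sketch.
truth        = ["T", "F", "T", "F", "T", "F", "T", "F"]
personalized = ["T", "F", "T", "F", "T", "F", "F", "T"]  # 6 of 8 correct
baseline     = ["T", "F", "T", "F", "T", "T", "F", "T"]  # 5 of 8 correct

acc_personalized = accuracy(personalized, truth)  # 0.75, i.e. three in four
acc_baseline = accuracy(baseline, truth)          # 0.625
print(percentage_point_gap(acc_personalized, acc_baseline))  # prints: 12.5
```

A percentage-point gap is an absolute difference between two rates, so a model at 75% beats one at 70% by 5 percentage points, not by 5 percent.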

They believe their model could help to better distinguish fact from lies, if only in future communication from Donald Trump himself, and they are confident that similar models could be developed for other politicians too.

“Our paper also constitutes a warning for all people sharing information online,” the researchers conclude. “It was already known that information people post online can be used against them. We now show, using only publicly available data, that the words people use when sharing information online can reveal sensitive information about the sender, including an indication of their trustworthiness.”