As the impact of fake news has grown, so too have attempts to detect and remove it. I wrote recently about an AI-driven approach developed by the University of Michigan, which is able to accurately spot fake news stories around 76% of the time.
A second system, developed by a team from MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) and the Qatar Computing Research Institute (QCRI), attempts to do likewise. Their approach focuses less on the reliability of individual claims and more on the general reliability of the news sources themselves.
“If a website has published fake news before, there’s a good chance they’ll do it again,” the researchers explain. “By automatically scraping data about these sites, the hope is that our system can help figure out which ones are likely to do it in the first place.”
The team believe this approach is highly accurate, and it requires only around 150 articles to determine the reliability of a news source.
Reliable sources
The project began with data collected from Media Bias/Fact Check (MBFC), a fact-checking website that rates the accuracy and bias of over 2,000 news sites. This data was used to train a Support Vector Machine classifier to classify news sites in the same way as MBFC.
During testing, the system was able to predict whether a news source had a high, medium or low level of reliability with around 65% accuracy, whilst it could predict its political leanings with 70% accuracy.
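To give a feel for what training a Support Vector Machine on labelled news sites involves, here is a minimal sketch in plain Python. It is not the researchers' code: the two features, the toy numbers, the function names `train_linear_svm` and `predict`, and the training settings are all assumptions made for illustration, and the three reliability levels are collapsed into two (high versus low) to keep the example short.

```python
import random

def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Fit a binary linear SVM (labels +1 / -1) by SGD on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularisation shrinks the weights a little on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # The hinge loss only contributes when the example is inside the margin.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(model, x):
    """Return +1 (high reliability) or -1 (low reliability) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Invented toy data, one row per news site:
# [emotional words per 100 words, average sentence length / 10]
X = [
    [5.0, 1.0], [6.0, 1.5], [5.5, 0.8], [4.8, 1.2],   # labelled low reliability
    [0.5, 2.2], [1.0, 2.5], [0.8, 2.0], [1.2, 2.4],   # labelled high reliability
]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

model = train_linear_svm(X, y)
print(predict(model, [6.0, 1.0]))   # a site with very emotional language
print(predict(model, [0.5, 2.5]))   # a site with sober language
```

In practice the labels would come from a source such as MBFC, and the feature vectors would be computed from each site's articles rather than typed in by hand.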
It emerged that the best way to gauge the reliability of a source was to look at the linguistic features of its stories, such as their sentiment, complexity or structure.
Publications that regularly released fake news stories tended to use hyperbolic, subjective and emotional language, for instance. There was also valuable information to be gained from an outlet’s Wikipedia page, with longer entries generally indicating a more credible outlet, and from the presence of terms such as ‘conspiracy theory’.
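As a rough illustration of how such signals could be turned into numbers, the sketch below computes a few simple features from an article and from a Wikipedia entry. The word lists, the feature names and the functions `linguistic_features` and `wikipedia_features` are hypothetical choices for this example; the article does not say which lexicons or measures the researchers actually used.

```python
import re

# Hypothetical word lists, chosen purely for illustration.
HYPERBOLIC = {"shocking", "unbelievable", "outrageous", "destroyed", "incredible"}
SUBJECTIVE = {"i", "we", "believe", "think", "feel", "obviously", "clearly"}

def linguistic_features(text: str) -> dict:
    """Compute crude proxies for hyperbole, subjectivity, emotion and complexity."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    n_sentences = max(len(sentences), 1)
    return {
        "hyperbole_rate": sum(w in HYPERBOLIC for w in words) / n_words,
        "subjectivity_rate": sum(w in SUBJECTIVE for w in words) / n_words,
        "exclamations_per_sentence": text.count("!") / n_sentences,
        "avg_sentence_length": len(words) / n_sentences,
        "avg_word_length": sum(len(w) for w in words) / n_words,
    }

def wikipedia_features(page_text: str) -> dict:
    """Summarise an outlet's Wikipedia entry by its length and one flagged term."""
    lowered = page_text.lower()
    return {
        "length_in_words": len(lowered.split()),
        "mentions_conspiracy_theory": "conspiracy theory" in lowered,
    }

sober = "The council approved the budget on Tuesday. Spending rises by two percent."
loud = "Shocking! You will not believe this outrageous scandal! We think it is unbelievable!"

print(linguistic_features(sober))
print(linguistic_features(loud))
print(wikipedia_features("The site has been described as promoting a conspiracy theory."))
```

Averaging features like these over the articles published by one outlet would give a single feature vector per source, which is the kind of input a classifier needs.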
Early days
Suffice it to say, the system is still at a very early stage, and the team admit that a lot of work is needed to improve the accuracy of the algorithm. As such, they believe it works best when used alongside human fact-checkers.
“If outlets report differently on a particular topic, a site like Politifact could instantly look at our ‘fake news’ scores for those outlets to determine how much validity to give to different perspectives,” they explain.
The team hope to test whether the system, which was developed to analyze sources written in English, can work as well with news publications in other languages.