Using big data to make better predictions

Recent times have seen our predictive capabilities take a bit of a battering. Numerous political polls called events ranging from the Brexit referendum to the Trump election badly wrong, and senior political figures have cast doubt on the ability of ‘experts’ as a result.

However, researchers from Columbia, Harvard and Princeton have recently devised a method that they believe will help us make more accurate predictions in areas from healthcare to politics.

The approach, documented in a recently published paper, builds on previous work by the team showing that some variables that appear significant are not particularly useful for making predictions, whilst others that appear insignificant can be very important.

Finding the key variables

These early studies raised the question of just what makes a variable useful when forming predictions. Traditional methods have tried to assign significance to each variable before putting it into a model.

To provide a more robust approach, the researchers propose a new metric known as the influence score, or I-score, which looks solely at the ability of a variable to predict outcomes. When tested, it reliably distinguished between noisy and predictive variables, improving prediction rates quite significantly. Indeed, in one test the prediction rate for breast cancer leapt from 70% to 92%. The researchers are confident the approach can be applied to various fields with similar results.

“The practical implications are what drove the project, so they’re quite broad,” they say. “Essentially anytime you might be interested in predicting and identifying highly predictive variables, you might have something to gain by conducting variable selection through a statistic like the I-score, which is related to variable predictivity. That the I-score fares especially well in high dimensional data and with many complex interactions between variables is an extra boon for the researcher or policy expert interested in predicting something with large dimensional data.”
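
To make the idea of scoring variables by predictivity a little more concrete, here is a minimal Python sketch. It is an illustration, not the researchers' code: the article does not give the formula, so this assumes one form of the score found in the partition-based literature, where the data are split into cells by the values of the chosen variables and the score sums n_j² × (cell mean − overall mean)² over the cells. The 1/n scaling, the function name `influence_score` and the toy data are all my own choices for the example.

```python
from collections import defaultdict


def influence_score(rows, outcomes, columns):
    """Illustrative partition-based score (an assumed form, not the paper's code).

    Rows are grouped into cells by their values in `columns`; each cell
    contributes n_j^2 * (cell mean - overall mean)^2. The 1/n scaling is
    an arbitrary choice made for this sketch.
    """
    n = len(outcomes)
    overall_mean = sum(outcomes) / n

    # Group outcomes by the combination of values in the chosen columns.
    cells = defaultdict(list)
    for row, y in zip(rows, outcomes):
        cells[tuple(row[c] for c in columns)].append(y)

    total = 0.0
    for ys in cells.values():
        n_j = len(ys)
        cell_mean = sum(ys) / n_j
        total += n_j ** 2 * (cell_mean - overall_mean) ** 2
    return total / n


# Toy data: the outcome depends only on the interaction of columns 0 and 1
# (it is 1 when they differ); column 2 is pure noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
outcomes = [a ^ b for a, b, _ in rows]

print(influence_score(rows, outcomes, (0,)))       # column 0 on its own
print(influence_score(rows, outcomes, (2,)))       # the noise column
print(influence_score(rows, outcomes, (0, 1)))     # the interacting pair
print(influence_score(rows, outcomes, (0, 1, 2)))  # the pair plus noise
```

In this toy case neither interacting variable scores anything on its own, the pair together scores highest, and adding the noise variable to the pair lowers the score. That is the flavour of behaviour the quote describes, although real data will of course be far messier.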

Will it make us any better at predicting election results? Time will tell, I suppose.