How Reliable Are AI-Based Recidivism Tools?

AI has been growing in popularity in law enforcement, whether it's predicting the best locations for officers to patrol or estimating the likelihood that offenders will commit crimes again in the future. The reliability of these recidivism prediction tools has come under scrutiny in a recent study conducted by researchers at Dartmouth College.

The researchers tested the popular Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software, which is used by courts to predict the risk of recidivism. Their analysis found that the system was no more likely to get the answer right than a non-expert, even though those individuals had access to none of the hundred-plus pieces of information the software makes use of. Indeed, the non-experts could perform well knowing just the defendant's age and number of prior convictions.
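To illustrate just how little information seems to be needed, here is a minimal sketch of a predictor built on only those two variables. It assumes a hypothetical CSV file and column names ("age", "priors_count", "reoffended") and is not the study's own data or method.

```python
# Minimal two-variable recidivism predictor (illustrative sketch only).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("defendants.csv")  # hypothetical file, not the study's data
X = df[["age", "priors_count"]]     # the only two features used
y = df["reoffended"]                # 1 = new felony within two years, 0 = none

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(f"Two-feature accuracy: {model.score(X_test, y_test):.1%}")
```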

“It is troubling that untrained internet workers can perform as well as a computer program used to make life-altering decisions about criminal defendants,” the authors say. “The use of such software may be doing nothing to help people who could be denied a second chance by black-box algorithms.”

Decision support

Software like COMPAS is often used in pretrial, parole and sentencing hearings to predict criminal behavior. The software is asked to gauge, among other things, whether someone will fail to appear in court or will reoffend. The belief is that by taking a big-data approach, such tools are more accurate and less biased than human judgment.

The researchers recruited a pool of participants from Mechanical Turk to predict the likelihood of recidivism in a number of past cases. The participants were given a short description of each individual, including their sex, age and previous criminal history, and were then asked to predict whether that person would commit another felony within two years.

This compares with the 137 different variables used by COMPAS when analyzing each case. The accuracy of the crowd was then compared with that of COMPAS, both in terms of false positives (defendants predicted to recidivate who did not) and false negatives (defendants predicted not to recidivate who did).
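The metrics behind that comparison are straightforward to compute. The sketch below shows one way to do it, using made-up prediction and outcome arrays (1 = predicted or observed to recidivate) rather than the study's data.

```python
# Accuracy, false positive rate and false negative rate from labeled predictions.
import numpy as np

def error_rates(predicted, actual):
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    accuracy = np.mean(predicted == actual)
    fpr = np.mean(predicted[actual == 0] == 1)  # predicted to recidivate, did not
    fnr = np.mean(predicted[actual == 1] == 0)  # predicted not to recidivate, did
    return accuracy, fpr, fnr

# Example with invented values, purely for illustration
acc, fpr, fnr = error_rates([1, 0, 1, 0, 1], [1, 0, 0, 0, 1])
print(f"accuracy={acc:.0%}, false positives={fpr:.0%}, false negatives={fnr:.0%}")
```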

The results revealed that despite having considerably less information to go on, the crowd was slightly more accurate than the software (67% vs 65.2%). Suffice it to say, this level of accuracy is not confined to COMPAS, for the authors report that a similar analysis of eight of the nine most popular software programs on the market performed similarly poorly.

“The entire use of recidivism prediction instruments in courtrooms should be called into question,” they say. “Along with previous work on the fairness of criminal justice algorithms, these combined results cast significant doubt on the entire effort of predicting recidivism.”

Bias in the system

Of course, one of the main selling points of automated systems is that they're supposed to remove bias from the process, yet the analysis revealed that both the crowd and COMPAS showed significant disparities in their judgments of black and white defendants, suggesting that the software was not removing these biases at all.
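Such disparities are typically surfaced by computing the error rates separately for each group. The sketch below shows the idea with a hypothetical DataFrame and invented values; the column names and numbers are assumptions, not figures from the study.

```python
# Group-wise false positive and false negative rates (illustrative only).
import pandas as pd

df = pd.DataFrame({
    "race":      ["black", "black", "black", "white", "white", "white"],
    "predicted": [1, 1, 0, 0, 0, 1],   # made-up predictions
    "actual":    [0, 1, 0, 0, 1, 1],   # made-up outcomes
})

for race, group in df.groupby("race"):
    fpr = ((group["predicted"] == 1) & (group["actual"] == 0)).sum() / (group["actual"] == 0).sum()
    fnr = ((group["predicted"] == 0) & (group["actual"] == 1)).sum() / (group["actual"] == 1).sum()
    print(f"{race}: false positive rate={fpr:.0%}, false negative rate={fnr:.0%}")
```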

Given that the results from the two approaches (software vs. untrained individuals) are barely distinguishable, it raises the question of what the best approach might be. Other applications of AI have recorded superior outcomes when AI tools are used by trained individuals to support their work, so perhaps that is the moral of this story: AI shouldn't be thought of as replacing people so much as providing them with support in making their decisions.
