Is ChatGPT Leading To Biased Recruitment?

It’s well known that past applications of AI have led to biases, often unintentional ones. These can stem from the way data is labeled, from biases in the data that are collected, or from how models are trained.

While ChatGPT and other generative AI technologies are marketed as more advanced than what has gone before, a recent study from the University of Washington highlights that the old problems haven’t really gone away.

Biased recruiting

The authors had noticed that automated screening tools were increasingly being used to assess applicants and wanted to test how reliable and fair they were. They found that when ChatGPT was asked to explain how it ranked a number of resumes, it returned explanations riddled with biases about disabled people. For instance, one assessment of a candidate who had earned an autism leadership award was that they didn’t have an interest in leadership roles.

But when researchers gave the tool clear instructions to avoid ableism, it reduced this bias for almost all the disabilities tested. Five out of six categories—deafness, blindness, cerebral palsy, autism, and the general term “disability”—saw improvement. However, only in three of these categories did the enhanced resumes rank higher than the resume that didn’t mention a disability.

“Ranking resumes with AI is starting to proliferate, yet there’s not much research behind whether it’s safe and effective,” the researchers explain. “For a disabled job seeker, there’s always this question when you submit a resume of whether you should include disability credentials.”

Under the microscope

The researchers used the publicly available CV of one of the study’s authors, which ran to about 10 pages. They then created six enhanced CVs, each suggesting a different disability by including four disability-related credentials: a scholarship, an award, a seat on a diversity, equity, and inclusion (DEI) panel, and membership in a student organization.

The team then used ChatGPT’s GPT-4 model to rank these enhanced CVs against the original for a real “student researcher” job listing at a large US-based software company. They ran each comparison 10 times, for 60 trials in total, and the system ranked the enhanced CVs (identical to the original except for the implied disability) first only a quarter of the time.
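
The study doesn’t publish its exact prompts or code, but a minimal sketch of how such a pairwise ranking trial might be reproduced with the OpenAI Python SDK could look like the following. The model name, prompt wording, trial count, and the placeholder variables for the job listing and CVs are illustrative assumptions, not the researchers’ actual setup.

```python
# Hypothetical sketch of a pairwise resume-ranking trial; the prompt wording,
# model name, and placeholder texts are assumptions, not the study's real setup.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JOB_LISTING = "..."  # the "student researcher" job description
ORIGINAL_CV = "..."  # the unmodified ~10-page CV
ENHANCED_CV = "..."  # same CV plus the four disability-related credentials

PROMPT = (
    "You are screening applicants for the following job listing:\n\n"
    f"{JOB_LISTING}\n\n"
    "Rank the two resumes below from most to least suitable and briefly "
    "explain your reasoning. State which resume you rank first.\n\n"
    f"Resume A:\n{ORIGINAL_CV}\n\nResume B:\n{ENHANCED_CV}"
)

results = Counter()
for _ in range(10):  # the study repeated each comparison 10 times
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT}],
    )
    answer = response.choices[0].message.content
    # Naive check of which resume was ranked first; a real replication would
    # need a more robust parse of the model's free-text explanation.
    ranked_enhanced_first = "resume b" in answer.strip().lower().split("\n")[0]
    results["enhanced_first" if ranked_enhanced_first else "original_first"] += 1

print(results)
```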

“In a fair world, the enhanced resume should be ranked first every time,” the authors explain. “I can’t think of a job where somebody who’s been recognized for their leadership skills, for example, shouldn’t be ranked ahead of someone with the same background who hasn’t.”

Explaining the rationale

When the researchers then quizzed ChatGPT about its assessment, the responses they received exhibited both explicit and implicit biases. For instance, it suggested that a candidate with depression would be more focused on DEI, and that this focus might detract from their performance at work.

“Some of GPT’s descriptions would color a person’s entire resume based on their disability and claimed that involvement with DEI or disability is potentially taking away from other parts of the resume,” the authors continue. “For instance, it hallucinated the concept of ‘challenges’ into the depression resume comparison, even though ‘challenges’ weren’t mentioned at all. So you could see some stereotypes emerge.”

To try to reduce the level of bias exhibited by the tool, the researchers customized GPT-4 using the GPTs Editor built into ChatGPT, which lets users augment the system with written instructions. They explicitly told the system not to exhibit ableist biases and instead to work according to the fundamental principles of DEI.
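
The GPTs Editor is a ChatGPT interface feature, and the study’s actual instruction text isn’t reproduced here. As a rough, assumption-laden illustration, the same idea can be approximated in the API by prepending a system message to the call from the earlier sketch; the wording below is invented for illustration only.

```python
# Illustrative only: this is not the researchers' actual instruction text,
# and a system message is an API-side stand-in for the GPTs Editor.
DEBIAS_INSTRUCTIONS = (
    "When ranking resumes, do not exhibit ableist bias. Treat disability-related "
    "scholarships, awards, DEI panel seats, and student-organization memberships "
    "as evidence of leadership and initiative, consistent with the principles of "
    "diversity, equity, and inclusion."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": DEBIAS_INSTRUCTIONS},
        {"role": "user", "content": PROMPT},
    ],
)
```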

They ran the experiment again, and this improved things, albeit only marginally: the new system ranked the enhanced CVs higher than the control just over half of the time. Even then, the improvement wasn’t evident for some disabilities, such as autism and depression.

Imperfect systems

They hope that their findings provide a timely reminder that, despite the hype, generative AI systems still have a large number of biases that can render recruitment far from fair or effective.

The researchers point out that some organizations, such as ourability.com and inclusively.com, are striving to improve job prospects for disabled candidates, who face biases with or without AI in the hiring process. They also stress the need for more research to identify and fix AI biases.

This includes testing other systems, such as Google’s Gemini and Meta’s Llama; covering more types of disabilities; examining how disability biases intersect with other attributes, such as gender and race; exploring whether further customization can reduce biases more consistently; and investigating whether the base version of GPT-4 can be made less biased.
