How Reliable Is Mechanical Turk For Research?

In recent years Amazon’s Mechanical Turk platform has become a staple for the research community, as it provides ready access to study participants at low cost and with minimal time or effort. What’s more, the platform rates each worker, so researchers can restrict their tasks to so-called “Masters” — in this study, workers with an approval rating of 98% or more.

While one might reasonably assume that these “elite” Turkers will be reliable research participants, a new study from Cambridge Judge Business School shows that they can often be inattentive to the tasks given them, and therefore anything but reliable.

Not paying attention

Of the 564 participants tracked for the research, 126 failed at least one of the three attention checks they were given. What’s more, a further 94 failed an honesty check, 31 a logic check, and 27 a time check.

“When sourcing participants through Amazon Mechanical Turk, researchers expect the participants to be ‘attentive’ and answer the questions diligently and in good faith,” the researchers say. “Yet we found that a significant number of the premium ‘top workers’ were ‘inattentive’ and this can seriously undermine research projects.”

The findings suggest that new approaches are needed if researchers are to feel confident that the participants in their study are giving the project their full attention.

“We recommend that new attention checks be added into the process, irrespective of participants’ presumed quality based on MTurk criteria including ‘Master’ – and these measures, such as identifying participants who don’t pay attention and recruiting additional participants to replace submissions that must be rejected, would require researchers to adjust their proposals to account for this additional effort and cost,” the researchers say.
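The recommendation above — embedding attention checks regardless of a worker's MTurk ranking, then flagging failed submissions for replacement — can be sketched as a simple post-collection filter. The check names, expected answers, and response fields below are hypothetical illustrations, not details from the study.

```python
# A minimal sketch of post-hoc attention-check screening, assuming each
# response is a dict of answers and each check has one known correct answer.
# All field names here (e.g. "attention_1") are illustrative, not from the study.

def flag_inattentive(responses, check_answers):
    """Split responses into attentive and flagged lists.

    responses: list of dicts, one per participant
    check_answers: dict mapping check-question key -> expected answer
    """
    attentive, flagged = [], []
    for r in responses:
        failed = [key for key, expected in check_answers.items()
                  if r.get(key) != expected]
        record = {**r, "failed_checks": failed}
        (flagged if failed else attentive).append(record)
    return attentive, flagged

# Example: an embedded-instruction check ("select 'strongly agree' here").
checks = {"attention_1": "strongly agree"}
data = [
    {"id": "w1", "attention_1": "strongly agree"},
    {"id": "w2", "attention_1": "neutral"},
]
attentive, flagged = flag_inattentive(data, checks)
# attentive contains w1; flagged contains w2 with the failed check listed
```

A flagged list like this also gives researchers a concrete count of how many replacement participants they need to recruit, which is the extra effort and cost the researchers suggest budgeting for.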

What’s more, while the average Turker doesn’t earn a great deal (the average rate is just over $3 per hour), paying more didn’t seem to make any difference. Indeed, the researchers offered participants $22 per hour, significantly higher than both the average rate on the platform and the average hourly wage in the United States, yet this didn’t produce better-quality results.

“The findings demonstrate that one cannot use money as a motivator to resolve the inattentiveness problem by selecting the ‘Master’ ranking (which Amazon charges for through a 20% extra fee for those workers), nor the approval rating or number of HITs filtering mechanisms,” the researchers conclude.

“Consequently, there is no quick fix to resolve inattentiveness. The only option left for researchers is to accept that they will have inattentive subjects within their MTurk sample and address the situation accordingly.”
