A recent study by researchers at the University of California San Diego sheds light on the challenge of distinguishing human-written text from text produced by artificial intelligence (AI).
The experiment involved administering pairs of essays, one composed by a high school student and the other generated by ChatGPT, to teachers and students at a regional high school. Participants were then asked to judge which essay was written by a human and which was produced by the AI system.
Telling the difference
Results revealed that the teachers achieved a success rate of approximately 70%, while the students fared slightly lower with an average score of 62%. While these figures may initially appear satisfactory, the researchers contend that they would need to exceed 90% if telling the two apart were genuinely easy.
Notably, confidence did not align with accuracy. Participants who were more certain of their ability to spot the chatbot's work did no better than those who were less confident.
The study underscores the ongoing difficulty of distinguishing human from AI-generated text and points to the need for further research in this area. As AI systems continue to advance, the ability to accurately identify machine-generated content becomes increasingly important, particularly for content moderation and information verification.
“We were surprised that teachers who had experience with ChatGPT or a history of teaching high school English found the task so challenging,” the researchers explain.
Widespread concerns
The researchers believe that their findings reflect more widespread concerns about students submitting essays generated by AI in a bid to boost their scores.
“But also,” they continue, “one of the most interesting—and troubling—aspects of our study is that teachers performed worse on the identification task when the pair of essays included a student essay that was particularly well-written. In fact, many teachers said they guessed that the better-written essay was generated by ChatGPT. This finding suggests that teachers are more likely to ‘accuse’ a well-written essay of being produced by AI—which also has some potentially concerning implications in a real-world classroom setting.”
The researchers are confident that their work highlights both the challenges and the opportunities that technologies such as ChatGPT potentially bring to the world of education.
“We’re on the verge of a major shift in educational practices as high-quality human-like content becomes increasingly available for anyone to use,” they conclude. “How exactly we handle this transition raises important ethical considerations. For example, the fact that the paid subscription version of ChatGPT performs better on many standardized tests than the freely available version could exacerbate already existing concerns about equity in education.”