Artificial intelligence chatbots that can talk like humans are here. These chatbots rely on powerful language models, the computational "brains" behind the conversation. But a new study from Columbia University shows that these models sometimes mistake nonsense for natural language. That flaw could help researchers build better chatbots, and it may also reveal something about how people understand language.
In the study, scientists tested nine different language models. They presented each model with pairs of sentences and asked human participants to judge which sentence in each pair sounded more like something people say every day. Then they checked whether each model's preferences matched the human choices.
To the test
In this test, the more advanced AI systems, based on "transformer neural networks," did better than simpler models that score sentences by how often their word pairs appear on the internet. Even so, all of the models sometimes got it wrong, picking sentences that humans found nonsensical.
“That some of the large language models perform as well as they do suggests that they capture something important that the simpler models are missing,” the researchers explain. “That even the best models we studied still can be fooled by nonsense sentences shows that their computations are missing something about the way humans process language.”
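To make the contrast concrete, here is a minimal sketch of the kind of word-pair counting the simpler models rely on. The toy corpus, the scoring function, and the smoothing constant are illustrative assumptions, not the models used in the study, which drew their statistics from internet-scale text.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Toy corpus for illustration only; the study's simpler models used
# word-pair counts gathered from the internet, not a couple of sentences.
corpus = (
    "that is the narrative we have been told "
    "this is the week we have been waiting for"
).split()

bigram_counts = Counter(pairwise(corpus))
unigram_counts = Counter(corpus)
vocab_size = len(unigram_counts)

def pair_score(sentence: str, alpha: float = 1.0) -> float:
    """Average smoothed bigram probability: higher means the sentence's
    word pairs are more common in the corpus (a crude proxy for naturalness)."""
    words = sentence.lower().rstrip(".").split()
    probs = [
        (bigram_counts[(w1, w2)] + alpha) / (unigram_counts[w1] + alpha * vocab_size)
        for w1, w2 in pairwise(words)
    ]
    return sum(probs) / len(probs)

for s in ("That is the narrative we have been sold.",
          "This is the week you have been dying."):
    print(f"{pair_score(s):.4f}  {s}")
# The higher-scoring sentence is the one this word-pair model would call
# more natural, whether or not humans agree.
```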
Consider two sentences that both the human participants and the models evaluated in the study:
- “That is the narrative we have been sold.”
- “This is the week you have been dying.”
When the human participants compared these sentences, they generally judged the first one to sound more like everyday speech. BERT, one of the models tested, instead rated the second sentence as more natural. GPT-2, another widely used model, agreed with the human raters and picked the first sentence as the more typical one.
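As a rough illustration of how such a comparison could be run, the sketch below scores both sentences with GPT-2's average token log-probability using the Hugging Face transformers library. This is a sketch under our own assumptions: the study's exact scoring procedure may differ, and the numbers depend on the model checkpoint.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_prob(sentence: str) -> float:
    """Average per-token log-probability under GPT-2 (higher = more 'natural')."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the returned loss is the mean
        # negative log-likelihood per predicted token.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

for s in ("That is the narrative we have been sold.",
          "This is the week you have been dying."):
    print(f"{avg_log_prob(s):+.3f}  {s}")
# The sentence with the higher (less negative) score is the one the model
# treats as more likely everyday language.
```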
Clear blind spots
“Every model exhibited blind spots, labeling some sentences as meaningful that human participants thought were gibberish,” the researchers explain. “That should give us pause about the extent to which we want AI systems making important decisions, at least for now.”
One thing that particularly interested the researchers is that some models perform well, but none perform perfectly. They want to understand why some models are better than others, since that understanding could help make chatbots better still.
The researchers also wonder whether the computations these models perform could offer new ideas about how the human brain works. Could studying chatbots help neuroscientists learn more about the brain's wiring?
To explore these questions, the researchers plan to dig deeper into what separates the stronger models from the weaker ones and into how their underlying computations work.
“Ultimately, we are interested in understanding how people think,” they conclude. “These AI tools are increasingly powerful but they process language differently from the way we do. Comparing their language understanding to ours gives us a new approach to thinking about how we think.”