Artificial intelligence (AI) is a hot topic in the news, with stories ranging from content licensing deals to AI mistakes. Now, a new study from the USC Viterbi School of Engineering and other researchers shows that it’s fairly easy to teach major AI models to adopt the talking points of political partisans, even when given neutral data.
“Bad actors can potentially manipulate large language models for various purposes<” the authors explain. “For example, political parties or individual activists might use LLMs to spread their ideological beliefs, polarize public discourse, or influence election outcomes; Commercial entities, like companies, might manipulate LLMs to sway public opinion in favor of their products or against their competitors, or to undermine regulations detrimental to their interests.”
Led astray
The study found that all large language models (LLMs) are “vulnerable to ideological manipulation.” Researchers looked at ChatGPT’s free version (ChatGPT 3.5) and Meta’s Llama 2-7B. They found that out of 1000 responses from each AI, there was a noticeable left-leaning bias in the context of U.S. politics.
The authors note that left-leaning biases in AI training data are not new. However, their focus was on how easily these biases could be manipulated using a technique called fine-tuning. Fine-tuning retrains an AI for a specific task, which can change its outputs. This can be for harmless purposes, like training an AI to answer questions about skincare products.
The authors explain that LLMs are trained on massive amounts of data. However, new biases introduced during fine-tuning can do more than correct old ones—they can shift the entire model. This process, called “poisoning,” can embed new biases with just 100 examples, changing the AI’s behavior. The study found that ChatGPT was more prone to manipulation than Llama.
The researchers conducted this study to show the vulnerabilities in LLMs and to contribute to AI safety. They believe there is a lot at stake and emphasize the need for strong safeguards to prevent AI misuse.
“The danger of manipulating LLMs lies in their ability to generate persuasive, coherent, and contextually relevant language, which can be used to craft misleading narratives at scale,” the authors conclude. “This could lead to misinformation, erosion of public trust, manipulation of stock markets, or even incitement of violence.”