Using Generative AI To Power Scientific Discovery

An international team of scientists, including researchers from the University of Cambridge, has begun building a new tool based on the same technology that underlies ChatGPT. The tool will use artificial intelligence to help scientists with their research.

Unlike ChatGPT, which deals with words and sentences, this AI will learn from numerical data and simulations drawn from various scientific fields. It will be used to model phenomena such as supergiant stars and the Earth’s climate.

Polymathic AI

The team has launched the project, called “Polymathic AI,” alongside a series of related scientific papers. The idea behind it is similar to the way it is easier to learn a new language when you already have a few under your belt.

Starting from a large, pre-trained model, known as a foundation model, can be faster and more accurate than building a scientific model from the ground up. This can hold true even when the training data does not seem directly related to the problem at hand.

“It’s been tough to conduct academic research on full-scale foundation models because of the enormous computing power they require,” explain the researchers. “Our partnership with the Simons Foundation has given us the resources to start testing these models for basic science, which researchers worldwide can build upon – it’s exciting.”

Linking fields

“Polymathic AI can reveal commonalities and connections between different fields that might otherwise be overlooked,” they continue. “In the past, some of the most influential scientists were polymaths with a broad understanding of various fields. This allowed them to spot connections that inspired their work. As scientific domains become more specialized, it’s increasingly challenging to stay at the forefront of multiple fields. AI can help by gathering information from many disciplines.”

The Polymathic AI team includes researchers from the Simons Foundation, its Flatiron Institute, New York University, the University of Cambridge, Princeton University, and the Lawrence Berkeley National Laboratory. The team comprises experts in physics, astrophysics, mathematics, artificial intelligence, and neuroscience.

While scientists have used AI tools before, these have mostly been purpose-built and trained on data specific to the task. “Despite the rapid progress in machine learning in recent years across various scientific fields, machine learning solutions are typically designed for specific applications and trained on very specific data,” the researchers explain. “This creates boundaries both within and between disciplines, meaning that scientists using AI for their research miss out on information that may exist, but in a different format or field.”

Widespread data

The Polymathic AI project will learn from a variety of data sources in physics and astrophysics (and eventually expand to fields like chemistry and genomics, according to its creators) and apply this multidisciplinary knowledge to a wide range of scientific problems. The project aims to “connect seemingly unrelated subfields into something greater than the sum of their parts,” the researchers explain.

ChatGPT has well-known limitations when it comes to accuracy; for example, it can make errors in simple arithmetic. The Polymathic AI project plans to avoid many of these issues by treating numbers as actual numerical values rather than as characters to be read like letters and punctuation, and by training the model on scientific datasets that capture the physics of the cosmos.
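
The difference between reading a number as text and treating it as a numeric value can be illustrated with a minimal Python sketch. This is not the project's code. The function names (char_tokens, shared_prefix, numeric_gap) are hypothetical, and the character-by-character split is only a rough stand-in for how a text model sees digits.

```python
# Illustrative sketch only: not the Polymathic AI implementation.
# All function names here are hypothetical.

def char_tokens(text):
    """Split a number's written form into single characters, the way text is read."""
    return list(text)


def shared_prefix(a, b):
    """Count how many leading tokens two token sequences have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def numeric_gap(a, b):
    """Distance between two numbers when they are treated as numeric values."""
    return abs(float(a) - float(b))


pairs = [("0.999", "1.000"), ("1.000", "1.999")]
for a, b in pairs:
    overlap = shared_prefix(char_tokens(a), char_tokens(b))
    print(f"{a} vs {b}: shared leading characters = {overlap}, "
          f"numeric gap = {numeric_gap(a, b):.3f}")
```

In this sketch, 0.999 and 1.000 share no leading characters even though they differ by only about 0.001, while 1.000 and 1.999 share two leading characters yet differ by almost 1. Similarity on the page is therefore a poor guide to numerical closeness, which is one reason a model that handles numbers as values might behave more reliably on scientific data.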

Transparency and openness are core principles of the project, according to the researchers. “We want to make everything public. We aim to democratize AI for science so that, in a few years, we can provide a pre-trained model to the community to enhance scientific analysis across a wide range of problems and domains.”