Making deep learning more efficient

I wrote recently about the work undertaken by supercomputing giant Cray to develop machines specifically for powering the kind of data-intensive AI algorithms so popular today. Whilst providing more computational grunt is one way to power our AI revolution, researchers from Rice University believe more efficient coding can also play a big role.

In a recent paper, they describe a new technique for faster, more efficient data lookup, which saves a lot of time and energy when performing deep learning.

“This applies to any deep-learning architecture, and the technique scales sublinearly, which means that the larger the deep neural network to which this is applied, the more the savings in computations there will be,” the researchers say.

Making AI smarter

The approach adapts the traditional hashing method used in data indexing to significantly reduce the computational overhead required by deep learning. Hashing is a process whereby smart hash functions are used to convert data into more manageable chunks, known as hashes. These then act as a kind of index for the data.

“Our approach blends two techniques—a clever variant of locality-sensitive hashing and sparse backpropagation—to reduce computational requirements without significant loss of accuracy,” the team say. “For example, in small-scale tests we found we could reduce computation by as much as 95 percent and still be within 1 percent of the accuracy obtained with standard approaches.”
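To make the idea more concrete, here is a minimal sketch of how locality-sensitive hashing could be used to pick which neurons to evaluate. It is not the researchers' implementation: it assumes a plain random-hyperplane (SimHash) scheme with a single hash table, and the function names, layer size and bit count are all hypothetical choices made for illustration.

```python
import random

def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random hyperplane normals (assumed SimHash scheme)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def simhash(vector, hyperplanes):
    """One sign bit per hyperplane; vectors at a small angle tend to share bits."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_table(weights, hyperplanes):
    """Index every neuron under the hash of its weight vector."""
    table = {}
    for neuron_id, w in enumerate(weights):
        table.setdefault(simhash(w, hyperplanes), []).append(neuron_id)
    return table

def sparse_forward(x, weights, table, hyperplanes):
    """Compute ReLU activations only for neurons in the input's bucket."""
    active = table.get(simhash(x, hyperplanes), [])
    return {
        i: max(0.0, sum(w * v for w, v in zip(weights[i], x)))
        for i in active
    }

# Hypothetical layer: 1,000 neurons, 16 inputs, 6 hash bits.
rng = random.Random(0)
dim, num_neurons, num_bits = 16, 1000, 6
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_neurons)]
planes = make_hyperplanes(num_bits, dim, rng)
table = build_table(weights, planes)

x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
outputs = sparse_forward(x, weights, table, planes)
print(f"evaluated {len(outputs)} of {num_neurons} neurons")
```

In a sparse backpropagation step, only the neurons returned by `sparse_forward` would have their weights updated, after which they would need to be re-hashed. A practical system would likely use several hash tables to reduce the chance of missing a strongly activated neuron; the single table here is a simplification.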

Most AI is based upon ‘neurons’ that become specialized as they are trained on vast quantities of data. Low-level neurons usually perform very simple tasks, with their output then passed to the next layer, which performs its own specialized searches. Current models only require a few of these layers to perform reasonable feats such as image recognition.

In theory, there is no limit to the scale of these neural networks, but the larger they get, the more computational power is required to run them.

“Most machine-learning algorithms in use today were developed 30-50 years ago,” the authors say. “They were not designed with computational complexity in mind. But with ‘big data,’ there are fundamental limits on resources like compute cycles, energy and memory. Our lab focuses on addressing those limitations.”

With the more efficient method outlined in the paper, however, it should be possible for researchers to work with extremely large deep networks.

“The savings increase with scale because we are exploiting the inherent sparsity in big data,” they conclude. “For instance, let’s say a deep net has a billion neurons. For any given input—like a picture of a dog—only a few of those will become excited. In data parlance, we refer to that as sparsity, and because of sparsity our method will save more as the network grows in size. So while we’ve shown a 95 percent savings with 1,000 neurons, the mathematics suggests we can save more than 99 percent with a billion neurons.”
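As a rough back-of-the-envelope illustration of those figures, the snippet below converts a percentage saving into the amount of work left over. It assumes, purely for illustration, that computation is proportional to the number of neurons evaluated; the function name is hypothetical and nothing here comes from the paper itself.

```python
def remaining_neurons(total_neurons, percent_saved):
    """Neuron-equivalents of work left after saving percent_saved percent.

    Assumes cost scales linearly with neurons evaluated (an illustrative
    simplification, not a claim from the paper).
    """
    return total_neurons * (100 - percent_saved) // 100

small = remaining_neurons(1_000, 95)          # the quoted small-scale case
large = remaining_neurons(1_000_000_000, 99)  # the quoted projection
print(small, large)
```

Under that assumption, a 95 percent saving on 1,000 neurons leaves about 50 neurons' worth of work, while a 99 percent saving on a billion neurons leaves about 10 million. Since the researchers say "more than 99 percent", the second figure is an upper bound on the remaining work.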