Using AI to help improve lip reading

AI has made some impressive strides in understanding and processing speech in recent times. For instance, I wrote recently about a startup that is using machine learning to analyze our conversations for signs of neurological disorders such as Alzheimer’s and Parkinson’s.

Another fascinating project has emerged from researchers at Oxford University. They have developed a new system, called LipNet, which they believe can lip read considerably better than existing software.

Read my lips

The software, which is described in a recent paper, is claimed to be the most accurate yet at working out what someone is saying just by tracking their lips. The team claim that it does so with 93.4% accuracy, which is not only considerably better than most other lip reading applications on the market, but also far superior to the best human lip readers, who manage an accuracy of around 52%.
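
The article does not say how that accuracy figure is calculated. One common way to score sentence-level recognition is word accuracy, meaning one minus the word error rate, and the sketch below assumes that definition purely for illustration. The function names and the sample sentences are made up and are not taken from the paper.

```python
def word_errors(reference, hypothesis):
    """Count word-level edits (substitutions, insertions, deletions)
    needed to turn the hypothesis into the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (word errors / reference length)."""
    n_words = len(reference.split())
    if n_words == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n_words


# Made-up example: one wrong word out of six.
spoken = "bin blue at f two now"
decoded = "bin blue at s two now"
print(f"{word_accuracy(spoken, decoded):.1%}")
```

Under this assumed definition, a score of 93.4% would correspond to getting roughly one word in fifteen wrong.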

“Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end,” the authors write.

In other words, rather than classifying one word at a time, the researchers had the system take in an entire sentence, and used that context combined with deep learning to decipher the individual words within it.
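
The "connectionist temporal classification" mentioned in the quote is what allows a model to produce a guess for every video frame and still end up with a sentence of a different length. As a rough illustration, here is a minimal sketch of the simplest way to read out such per-frame guesses (greedy decoding): merge consecutive repeats, then drop a special "blank" label. This is not LipNet's code; the blank symbol, the function name and the frame labels are invented for the example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical stand-in for the CTC "blank" label


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse per-frame labels into text.

    Step 1: merge runs of the same label (the mouth holds a shape
            across several frames).
    Step 2: remove blanks, which mark "no new character here" and
            separate genuinely repeated letters.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Invented per-frame predictions, one character per video frame.
frames = list("bb_iii_nn__  _bb_ll_uu_ee")
print(ctc_greedy_decode(frames))  # prints "bin blue"
```

Because the blank sits between repeats, a word with a doubled letter survives the merge: the labels `l`, `_`, `l` decode to "ll", while `l`, `l` decode to a single "l".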

The team believe that the technology could eventually be offered to the hearing impaired as a smartphone-style service to help them lip read better.

It’s certainly a fascinating project and a further indication of the progress that’s being made.  Check out the video below to see LipNet in action.
