On a recent trip to Grenoble I met with the team behind a ‘digital nose’ designed to detect smells digitally (you can read about it here). Such digitally augmented sensing is clearly something the area specializes in, as a team of researchers from the GIPSA-Lab in Grenoble have also developed a virtual tongue.
The work, documented in a recently published paper, uses an ultrasound probe positioned under the jaw, with a machine learning algorithm converting the probe data into the movements of a virtual avatar. The avatar is capable of replicating the movements of the face, lips, tongue and teeth. The researchers believe this visual biofeedback system could provide valuable information for applications such as speech therapy.
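To make the data flow concrete, here is a minimal, hypothetical sketch of that kind of pipeline in Python: an ultrasound frame is reduced to a feature vector, a trained mapping turns those features into articulatory parameters, and the avatar is posed from the result. None of the names or the linear mapping below come from the GIPSA-Lab system; they are stand-ins for illustration only.

```python
import numpy as np

def extract_features(frame: np.ndarray, n_features: int = 30) -> np.ndarray:
    """Crudely compress one ultrasound frame into a feature vector.

    A real system would use a learned reduction (e.g. PCA or a neural encoder);
    here we simply flatten and truncate so the example stays runnable.
    """
    return frame.astype(np.float32).ravel()[:n_features]

class ArticulatoryMapping:
    """Stand-in for the trained mapping from ultrasound features to avatar parameters."""

    def __init__(self, weights: np.ndarray, bias: np.ndarray):
        self.weights = weights  # shape: (n_params, n_features)
        self.bias = bias        # shape: (n_params,)

    def predict(self, features: np.ndarray) -> np.ndarray:
        # A simple linear mapping; the actual system uses a probabilistic model.
        return self.weights @ features + self.bias

def biofeedback_step(frame: np.ndarray, mapping: ArticulatoryMapping) -> np.ndarray:
    """One cycle of the loop: ultrasound frame in, avatar pose parameters out."""
    return mapping.predict(extract_features(frame))
```

Run once per frame, such a mapping would hand the rendering layer a short vector of tongue, lip and jaw parameters to animate.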
Understanding the tongue
Speech therapy typically relies on repetition exercises, with the therapist analyzing the patient's pronunciation before explaining how to position the tongue to pronounce words more effectively. The problem is that we have little awareness of the position of our tongue at any particular moment, yet effective therapy requires the patient to be able to adjust their tongue according to the therapist's feedback.
The new system aims to help by providing visual biofeedback, allowing patients to actually see how their tongue is moving in real time, with the hope that this extra awareness will help them overcome their pronunciation problems faster.
Biofeedback via ultrasound has been done for a few years, but the images returned suffer from a number of problems. Their quality is often poor, which makes them difficult to use for adjusting tongue position, and they seldom show the location of the palate or teeth.
By contrast, the Grenoble team improve upon this by automatically animating a talking virtual avatar from the ultrasound images captured in real time. The result is a ‘virtual clone’ of the patient that provides a natural visualization of their movements.
A learning system
The strength of the system comes from the machine learning algorithm that powers it. It's capable of processing articulatory movements that users can't yet achieve when they begin using the system, which is crucial for the therapeutic process, and the team believe it will significantly improve what is possible. The algorithm uses a probabilistic model trained on a large articulatory database containing the correct pronunciation of all of the sounds in multiple languages. The model adapts to the morphology of each individual patient after a short calibration session with the system.
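As a rough illustration of how such a probabilistic mapping can work, the sketch below fits a Gaussian mixture model on joint (ultrasound-feature, articulatory-parameter) vectors from a reference database and predicts via the conditional expectation E[y | x]. This is a standard GMM-regression formulation, not necessarily the exact model from the paper, and the per-patient calibration step is omitted for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMArticulatoryMapper:
    """GMM regression from ultrasound features x to articulatory parameters y."""

    def __init__(self, n_components: int = 8):
        self.gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        self.dim_x = None  # dimensionality of the ultrasound feature vector

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "GMMArticulatoryMapper":
        """Fit the joint density p(x, y) on the reference articulatory database."""
        self.dim_x = X.shape[1]
        self.gmm.fit(np.hstack([X, Y]))
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Conditional mean E[y | x] under the fitted mixture."""
        d = self.dim_x
        weights, means, covs = self.gmm.weights_, self.gmm.means_, self.gmm.covariances_
        resp = np.zeros(len(weights))
        cond_means = np.zeros((len(weights), means.shape[1] - d))
        for k in range(len(weights)):
            mx, my = means[k][:d], means[k][d:]
            Sxx, Sxy = covs[k][:d, :d], covs[k][:d, d:]
            inv_Sxx = np.linalg.inv(Sxx)
            diff = x - mx
            # Unnormalized responsibility of component k for this input
            # (the shared (2*pi)^(-d/2) factor cancels after normalization).
            resp[k] = weights[k] * np.exp(-0.5 * diff @ inv_Sxx @ diff) \
                      / np.sqrt(np.linalg.det(Sxx))
            cond_means[k] = my + Sxy.T @ inv_Sxx @ diff
        resp /= resp.sum()
        return resp @ cond_means
```

In this framing, the short calibration session would supply a handful of speaker-specific frames used to adapt the mixture (for instance, shifting the component means) before the model is used to drive the avatar.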
The system has so far been validated in a laboratory environment with healthy speakers, and is now being tested in a simplified clinical trial with patients who've had tongue surgery. The team are also working on a second version that doesn't rely on ultrasound imaging and can instead create the virtual articulatory head purely from the patient's voice.
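If that voice-only version works along the lines of classic acoustic-to-articulatory inversion, it might look roughly like the sketch below: acoustic features are extracted from the microphone signal, and a trained regressor maps each frame to the same articulatory parameters that drive the avatar. The function names and the use of MFCCs via librosa are illustrative assumptions, not details from the team.

```python
import numpy as np
import librosa

def acoustic_features(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """MFCC frames (time x coefficients) from a mono audio buffer."""
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T

def animate_from_voice(audio: np.ndarray, regressor) -> np.ndarray:
    """Predict one articulatory parameter vector per acoustic frame.

    `regressor` is any model with a scikit-learn-style predict() trained to map
    acoustic features to articulatory parameters (acoustic-to-articulatory inversion).
    """
    return regressor.predict(acoustic_features(audio))
```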