Our brain does the heavy lifting when it comes to speech. It subconsciously coordinates the intricate movements of the lips, tongue, pharynx, and jaw required to pronounce words. And it keeps issuing those commands even in people who are paralyzed and unable to carry them out.
Now, scientists have developed brain implants that convert this neural activity into text with unprecedented speed and precision. In two new studies published today in Nature, the devices allowed two people to “speak” for the first time in more than a decade. The implants produced speech from brain activity with roughly 75% accuracy and at nearly half the pace of natural speech, a significant improvement over prior technology.
Vikash Gilja, an electrical engineer at the University of California (UC), San Diego, who was not involved in the studies, says, “It’s a game changer for the population that doesn’t have better options at this time.” He adds, “We are within striking distance” of transforming the technology into commercially viable medical devices.
Most previous attempts to build brain-computer interfaces for speech relied on electrodes implanted in the brains of people with epilepsy to monitor seizures. The resulting speech was slow and error-prone, but the technology was promising enough for researchers to launch clinical trials with people who cannot speak.
In 2021, a team led by neurosurgeon Edward Chang of UC San Francisco reported that a paralyzed participant could generate up to 18 words per minute in sentences. Since then, the team has doubled the number of electrodes in the implant and enhanced the algorithm used to infer words from brain signals.
In the new study, the team tested its system on a woman named Ann, whose ability to transmit motor signals from her brain to the rest of her body was impaired by a stroke 18 years ago.
The researchers positioned paper-thin, cigarette-sized electrode arrays on the surface of the brain regions that control the vocal muscles. For two weeks, the scientists asked Ann to try to say words displayed on a screen while their algorithm learned which of her neural signals corresponded to 39 distinct phonemes, the sounds that make up words. The algorithm then predicted likely subsequent words in sentences, much as ChatGPT does.
The team reports that, given a vocabulary of 1024 words, the algorithm matched Ann’s neural activity to the word she was most likely attempting to say with 95% accuracy. The researchers predict that expanding the program’s vocabulary to 39,000 words would yield 72% accuracy. Using Ann’s neural signals, the algorithm even accurately identified words it had not been specifically trained to recognize.
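To make that pipeline concrete, here is a minimal sketch in Python of how phoneme-level probabilities might be matched to words in a fixed vocabulary. It is a hypothetical illustration, not the study’s actual decoder: the phoneme labels, the lexicon, and the scoring scheme are all invented, and the phoneme probabilities a real classifier would derive from neural activity are replaced with random stand-ins.

```python
# Hypothetical sketch of phoneme-to-word decoding, not the authors' actual
# pipeline. It assumes a classifier has already turned neural activity into
# a probability distribution over 39 phonemes at each time step; here those
# probabilities are random placeholders.
import numpy as np

PHONEMES = [f"ph{i}" for i in range(39)]      # placeholder phoneme labels
IDX = {p: i for i, p in enumerate(PHONEMES)}

# Toy pronunciation lexicon: each word is a sequence of phonemes. A real
# system's lexicon would cover the full vocabulary (e.g., 1024 or 39,000
# words); a word never seen during training is still decodable as long as
# its phonemes are listed here.
LEXICON = {
    "hello": ["ph0", "ph1", "ph2"],
    "water": ["ph3", "ph4", "ph5"],
    "family": ["ph6", "ph1", "ph7"],
}

def word_log_likelihood(word, frame_probs):
    """Score a word by aligning its phonemes to equal slices of the frames."""
    phones = LEXICON[word]
    slices = np.array_split(frame_probs, len(phones))
    score = 0.0
    for phone, chunk in zip(phones, slices):
        # Average probability of this phoneme over its assigned frames.
        score += np.log(chunk[:, IDX[phone]].mean() + 1e-9)
    return score

def decode_word(frame_probs):
    """Pick the vocabulary word that best explains the phoneme probabilities."""
    return max(LEXICON, key=lambda w: word_log_likelihood(w, frame_probs))

rng = np.random.default_rng(0)
frames = rng.dirichlet(np.ones(39), size=12)  # 12 time steps of fake output
print(decode_word(frames))
```

Because any word can be scored as long as its pronunciation is in the lexicon, a decoder of this shape can recognize words it never saw in training, and a language model’s next-word predictions can be layered on top to favor words that fit the sentence, as the article describes.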
The translation was also far faster than with previous systems, running at 78 words per minute. Natural speech typically exceeds 150 words per minute, whereas a partially paralyzed person using small muscle movements to select letters may manage only a few words per minute.
Additionally, Chang’s team used the neural recordings to predict Ann’s intended facial expressions and to animate an avatar that spoke with a voice synthesized from decades-old recordings of her own. Chang suggests such an avatar could fit in well on a Zoom call, enabling a person to express their thoughts more naturally in near-real time. Ann told the researchers in an interview that she aspires to become a counselor and that an avatar could help put her clients at ease.
In a second study, Frank Willett, a neural prosthetics researcher at Stanford University, and his colleagues obtained very similar results using a different type of electrode array. Their implant, significantly smaller than the other team’s, penetrates deeper into the brain, where it records the activity of individual neurons with greater precision. The researchers evaluated their system with Pat Bennett, who has had amyotrophic lateral sclerosis for 11 years and has lost the ability to control her facial muscles. They achieved 91% accuracy when Bennett attempted to read from a list of 50 words chosen to help express needs such as “thirsty” and “family.” When the word bank grew to 125,000 words, accuracy dropped to 76.2%.
Willett finds it encouraging that, after so many years, the women’s brains have not lost the capacity to encode speech. “It is remarkable that this neural representation is still preserved,” he says. Implanting the devices in more patients will help researchers determine how much the architecture of this speech-controlling brain region varies from person to person, and how much the algorithm and implant must be customized for each.
Gilja says that each system has benefits and drawbacks. The surface electrodes used by Chang’s team pick up less information from individual neurons, so the system relies more heavily on sentence-prediction algorithms. That may introduce a longer delay than the more sensitive penetrating electrodes used by Willett’s team, which enable real-time prediction of individual syllables. Willett notes, however, that the needlelike electrodes may drift slightly over time and record from different neurons, requiring updates to the algorithm.
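The drift problem Willett describes amounts to a recalibration loop: as the electrodes shift and record from different neurons, decoding accuracy falls until the model is refit on fresh labeled trials. The sketch below is a hypothetical illustration of that loop, with simulated features standing in for neural recordings and a simple logistic-regression classifier standing in for whatever decoder the team actually uses.

```python
# Hypothetical sketch of decoder recalibration under electrode drift; the
# model, features, and drift simulation are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

N_FEATURES, N_PHONEMES = 64, 39
rng = np.random.default_rng(0)

# Each phoneme gets a characteristic neural-feature pattern. Drift is
# mimicked by blending toward a second, unrelated set of patterns, as if
# the electrodes were gradually picking up different neurons.
MEANS_OLD = rng.normal(size=(N_PHONEMES, N_FEATURES))
MEANS_NEW = rng.normal(size=(N_PHONEMES, N_FEATURES))

def session_data(n, drift):
    """Simulate one session of prompted-speech trials at a given drift level."""
    y = rng.integers(0, N_PHONEMES, size=n)
    means = (1 - drift) * MEANS_OLD[y] + drift * MEANS_NEW[y]
    X = means + rng.normal(scale=0.8, size=(n, N_FEATURES))
    return X, y

decoder = LogisticRegression(max_iter=1000)
X0, y0 = session_data(400, drift=0.0)
decoder.fit(X0, y0)

for day, drift in enumerate([0.0, 0.3, 0.6], start=1):
    X, y = session_data(200, drift)
    acc = decoder.score(X, y)
    print(f"day {day}: drift {drift:.1f}, accuracy {acc:.2f}")
    if acc < 0.70:          # arbitrary threshold for this sketch
        decoder.fit(X, y)   # refit on the fresh session's labeled trials
        print(f"day {day}: recalibrated")
```

In this toy setup, refitting on the new session’s trials restores accuracy, much as the periodic algorithm updates Willett describes would in a real system.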
Alexander Huth, a neuroscientist and computer scientist at the University of Texas at Austin who was not involved in either study, calls the two findings “a quantum leap.” The implanted systems are far more accurate than a nonimplanted device he developed earlier this year, which deciphers word meanings from higher-level brain activity rather than motor signals, he notes, although the latter approach may work better in people with damage to the brain regions that control movement.
Melanie Fried-Oken, a speech-language pathologist at Oregon Health & Science University who serves as a consultant to the research consortium behind the technology Willett’s team used, notes that the people who will eventually use these technologies must be involved in the research process to determine how best to meet their needs. Those who cannot speak must be able not only to communicate their basic requirements, but also to have meaningful, personal conversations, which may demand greater speed or accuracy.
Currently, these systems rely on wires that pass through the skull to connect the implant to a processor, and a team of technicians is required to monitor and fine-tune them. Researchers are now developing wireless transmission methods for the implanted devices. In the future, Gilja says, companies may be able to create portable speech synthesis systems that users can control on their own. “That’s the future I’m excited for.”