Last week I wrote about an AI startup building technology that can change the accent of a person’s speech in real time. But what if the goal for AI were instead to let people speak however they speak, to be understood as they are, and to remove some of the bias inherent in many AI systems in the process? There is a big need for that, too, and now a British startup called Speechmatics, which has built AI to transcribe speech to text regardless of accent or how the person speaks, is announcing $62 million in funding to expand its business.
Susquehanna Growth Equity, out of the U.S., led the round, with UK investors AlbionVC and IQ Capital also participating. This Series B is a big step up for Speechmatics. The company was originally founded in 2006 out of AI research in Cambridge by Dr. Tony Robinson, and before this round it had raised only about $10 million (Albion and IQ are among those past backers, along with CIA-backed In-Q-Tel and others).
In the meantime, it has built a customer base of some 170, selling B2B only, to power consumer or business services. While it doesn’t disclose the full list, some names include what3words, 3Play Media, Veritone, Deloitte UK and Vonage. They use the technology in various ways: not just to make transcriptions in the traditional sense, but also to capture spoken words in support of other app functions, such as automatic captioning, or to enable broader accessibility features.
The current engine can transcribe speech to text in 34 languages. In addition to using the funding to keep improving accuracy there and for business development, the company will also add more languages and look at different use cases, such as building speech-to-text that can work in the more challenging environment of motor vehicles (where engine noise and vibration affect how well AI can pick up sounds).
“What we’ve done is collect millions of hours of data to address AI bias. Our goal is to understand every voice, in multiple languages,” said Katy Wigdahl, the startup’s CEO (a title she previously shared with Robinson, who has since stepped down from an executive role).
This is reflected in the company’s product focus as well as its mission, which it is also looking to expand.
“The way we look at language is global,” Wigdahl said. “Google has a different package for each version of English, but our one package will understand them all.” Initially, the company made its technology available only through a private API that it sold to customers. Now, in an effort to bring in more users and potentially more paying customers, it is also offering more open API tools for developers to try out the technology, along with a drag-and-drop sampler on its site.
Indeed, if one of Speechmatics’ challenges is to train AI to be more human in understanding how people speak, the other is to make a name for itself against the other major suppliers of speech-to-text technology.
Wigdahl said the company today competes with “big tech,” that is, companies like Amazon, Google and Microsoft (which now owns Nuance) that have built speech recognition engines and offer the technology as a service to third parties.
But it says it consistently outperforms these in tests of understanding languages in the many ways they are actually spoken. (One test it cited to me was Stanford’s “Racial Disparities in Speech Recognition” study, where it recorded “an overall accuracy of 82.8% for African American voices compared to Google (68.6%) and Amazon (68.6%).” It said that this “equates to a 45% reduction in speech recognition errors, the equivalent of three words in an average sentence.”) It also provided TC with a “weighted competitive average.”
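For readers who want to check the math on that 45% figure, here is a minimal sketch. It assumes the claim refers to the relative drop in error rate (100 minus accuracy) between the 68.6% quoted for the competitors and the 82.8% quoted for Speechmatics; the company did not spell out its method, and the function name is purely illustrative.

```python
def relative_error_reduction(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Return the fractional drop in error rate between two accuracy scores (in percent)."""
    # Assumption: error rate is simply 100% minus the reported accuracy.
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


# Accuracy figures as quoted in the article.
reduction = relative_error_reduction(baseline_accuracy=68.6, improved_accuracy=82.8)
print(f"{reduction:.1%}")  # prints 45.2%
```

Under that reading, the quoted accuracy numbers and the 45% claim are consistent with each other.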
There’s a huge opportunity here, though. Between smaller developers and outsized tech giants like Apple, Google, Microsoft and Amazon sit hundreds of large companies that may not have the capacity (or the interest) to build in-house AI for this purpose. Take a company like Spotify, for example: it would definitely be interested in the technology, and it certainly would not want to depend on those big companies, which are sometimes its competitors and sometimes its outright foils. (To be clear, Wigdahl didn’t tell me Spotify was a customer, but said it is typical of the kind of size and situation where someone might knock on Speechmatics’ door.)
That’s partly why investors are so eager to fund this company. Susquehanna has a history of backing companies that look like they can give the power players a run for their money (it was an early and big backer of TikTok).
“The Speechmatics team is undoubtedly a different pedigree of technologists,” Jonathan Klahr, MD of Susquehanna Growth Equity, said in a statement. “We started following Speechmatics when our portfolio companies told us that Speechmatics wins time and again on accuracy over all other options, including those of ‘Big Tech’ players. We are ready to work with the team to help more companies learn about and adopt this superior technology.” Klahr will join the board with this round.
As technology becomes more naturalized and those who build it look for more ways to reduce the friction of using it, voice has emerged as a major opportunity, as well as a pain point. Technology that is good at “reading” and understanding all kinds of voices could potentially be applied in all sorts of ways.
“Our vision is that speech will become the increasingly dominant human-machine interface and Speechmatics are the category leaders in applying deep learning to speech, with category-defining accuracy and understanding for all use cases and industry requirements,” added Robert Whitby-Smith, a partner at AlbionVC. “We have witnessed impressive team and product growth in recent years since our Series A investment in 2019, and as responsible investors, we are excited to support the company’s inclusive mission to understand every voice globally.”