Natural Language Processing (NLP) in Speech Recognition for Kids | by SoapBox | Jun, 2021
Welcome to “Lessons from our Voice Engine,” a series of blog posts by members of our Engineering and Speech Tech teams that explain, at a high level, how our voice engine works.
This first lesson comes from Nick Parslow, a Computational Linguist and member of our Speech Tech team at SoapBox Labs.
What is NLP?
Natural Language Processing (NLP) is about the interface between human language and machine language. This can mean taking a written sentence, just like this one, and extracting the key information from it, such as the semantic ideas or, for a command like “left,” “right,” “open,” or “close,” the intent. Going in the opposite direction, NLP can mean taking data and generating human-readable text from it.
In speech recognition systems like the SoapBox voice engine, NLP is used to build what are called language models: statistical models of language that can predict the next word based on the context. Language models are essential for disambiguating similar-sounding phrases. A great example of this is “white shoes” and “why choose.”
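To make the idea concrete, here is a minimal sketch of a bigram language model choosing between two similar-sounding transcripts. It is not the SoapBox implementation: the toy corpus, the function names, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy training corpus, invented for illustration only.
CORPUS = [
    "she wore white shoes to school",
    "he has white shoes and a red hat",
    "why choose the blue one",
    "tell me why you choose that book",
    "i like my white shoes",
]


def train_bigram_model(sentences):
    """Count single words and adjacent word pairs, with sentence boundary markers."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the bigram model with add-one smoothing."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_transcript(candidates, unigrams, bigrams):
    """Return the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))


UNIGRAMS, BIGRAMS = train_bigram_model(CORPUS)
best = pick_transcript(
    ["she wore white shoes", "she wore why choose"], UNIGRAMS, BIGRAMS
)
print(best)
```

With this toy corpus, the model prefers “she wore white shoes” because “wore white” and “white shoes” were both seen in training, while “wore why” never was. A production system would train on far more text and combine this score with the acoustic evidence.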
Building a language model requires “normalization” of text — taking, for example, all instances of “ice-cream” and “icecream” and converting them to the same form. Without that normalization, the computer thinks of them as completely unrelated words.
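A minimal sketch of that normalization step might look like the following. The variant table, the choice of “ice cream” as the canonical form, and the `normalize` function are hypothetical; a real system would need a much larger set of rules.

```python
import re
from collections import Counter

# Hypothetical variant table; the canonical form chosen here is arbitrary.
CANONICAL_FORMS = {
    "ice-cream": "ice cream",
    "icecream": "ice cream",
}


def normalize(text):
    """Lowercase, drop punctuation, and map known spelling variants to one form."""
    text = text.lower()
    # Keep letters, digits, apostrophes, and hyphens; turn everything else into spaces.
    text = re.sub(r"[^a-z0-9'\- ]+", " ", text)
    tokens = [CANONICAL_FORMS.get(token, token) for token in text.split()]
    return " ".join(tokens)


sentences = ["I love Ice-Cream!", "Icecream, please.", "More ice cream?"]
counts = Counter(normalize(sentence) for sentence in sentences)
print(counts)
```

After normalization, all three sentences contain the same two tokens, “ice” and “cream,” so the language model can pool their counts instead of treating each spelling as a separate word.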
Another use of NLP in speech recognition, in particular for English, is to work out the pronunciation of a word. This can be difficult for a machine to do automatically (compare “though” and “tough,” for example), so it may involve a mix of manual and automatic estimation.
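One way to picture that mix is a hand-written lexicon for irregular words with an automatic fallback for everything else. The sketch below is an assumption-laden illustration: the phoneme symbols are simplified ARPAbet-style labels without stress marks, and the letter-by-letter fallback is deliberately naive, standing in for a trained grapheme-to-phoneme model.

```python
# Hand-checked entries for words whose spelling is misleading (hypothetical lexicon).
MANUAL_LEXICON = {
    "though": ["DH", "OW"],
    "tough": ["T", "AH", "F"],
}

# Naive one-letter-at-a-time guesses; a stand-in for a real automatic model.
LETTER_TO_PHONE = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def guess_pronunciation(word):
    """Automatic estimate: map each letter to a phoneme, ignoring context."""
    phones = []
    for letter in word.lower():
        if letter in LETTER_TO_PHONE:
            phones.extend(LETTER_TO_PHONE[letter].split())
    return phones


def pronounce(word):
    """Prefer the manual entry; fall back to the automatic guess otherwise."""
    key = word.lower()
    if key in MANUAL_LEXICON:
        return MANUAL_LEXICON[key], "manual"
    return guess_pronunciation(key), "automatic"


for word in ["though", "tough", "cat"]:
    print(word, pronounce(word))
```

The naive guess works for a regular word like “cat” but gets “tough” wrong, since it reads the “gh” as two separate sounds. That gap is why the manual entries take priority.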
What role does NLP play at SoapBox?
NLP plays a critical role at SoapBox because it links the spoken version of language with the written form, thereby allowing our voice engine to better understand and assess what a child is saying — their intent, their pronunciation, and much more — beyond just the words themselves.