
The role of voice technology in IoT expands beyond consumers

Voice and IoT go hand in hand, freeing users from being tied down by screens and data input.

Not only are more people talking about IoT; increasingly, they are talking to it. Voice technology has emerged as a key ingredient not only in consumer segments -- think, "Alexa, turn on the lights" -- but also in industrial applications. On the shop floor, talking to machines keeps workers' hands free, or at least free to pound out more parts.

However, voice technology has traditionally been overlooked in IoT because the prevailing view is that the internet of things is primarily about data, said Andrew Brown, executive director of enterprise and IoT research at Strategy Analytics.

"You capture information from endpoints, analyze the data in all its disparate forms and take action on the data that you analyze," he said. "That's the theory."

Most of the focus in IoT has been on data, but there are plenty of opportunities to integrate voice into IoT applications. Voice offers user experiences that are more flexible, and often less expensive, than alternatives such as touchscreens or manual data entry, Brown said. That's because it's a natural mode of communication -- and it's also useful when people's hands or eyes are otherwise occupied.

Voice + IoT consumer electronics = success

One company incorporating voice technology into its products is Sensory Inc., a Santa Clara, Calif.-based provider of vision and voice technologies. Sensory's products are deployed in consumer electronics applications, including mobile, automotive, wearables, toys and home electronics.

"We started the company 23 years ago. We were the first company to put a speech recognition focus on consumer electronics," said Sensory CEO Todd Mozer. "Everything we do is embedded; we're not a cloud-based recognition company. Privacy, power consumption, being always-connected and having a faster response time are the key reasons for doing these things on devices."

Sensory is best known for its TrulyHandsFree technology, a small-footprint embedded engine for high-accuracy, low-power speech recognition with low memory requirements. The company also offers TrulySecure, a speaker and face verification technology, and TrulyNatural, which Mozer said is the first embedded, large-vocabulary, continuous speech recognition system to use less power and less memory than cloud-based technologies.

"In terms of the specifics of IoT, we've done a lot. We received recent acclaim for working with the Amazon Alexa infrastructure," he said. "We also participated with Amazon on the Raspberry Pi project; you can use Sensory's technology to wake up Alexa by just saying 'Alexa.'"

Working with Amazon, Sensory developed voice models for Alexa that are part of its TrulyHandsFree speech recognition engine. Through their partnership, Amazon and Sensory enable consumer electronics manufacturers to speed up the development of voice-controlled, Alexa-enabled products, Mozer said.
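The wake-word step Mozer describes can be pictured as a small, always-on loop running on the device itself. The sketch below is a hypothetical illustration, not Sensory's engine: `score_fn` stands in for a tiny on-device acoustic model that rates each short audio frame, and detection requires several consecutive high-confidence frames to suppress false triggers.

```python
# Hypothetical wake-word loop: a small on-device model scores each audio
# frame, and the word is "detected" only after several consecutive
# high-confidence frames, which limits false triggers.
class WakeWordDetector:
    def __init__(self, score_fn, threshold=0.8, consecutive=3):
        self.score_fn = score_fn        # stand-in for a tiny acoustic model
        self.threshold = threshold      # confidence required per frame
        self.consecutive = consecutive  # frames in a row required to fire
        self._streak = 0

    def feed(self, frame):
        """Return True when the wake word is confirmed on this frame."""
        if self.score_fn(frame) >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.consecutive:
            self._streak = 0            # reset so the device re-arms
            return True
        return False

# Simulated per-frame confidences standing in for a real microphone stream.
scores = [0.1, 0.9, 0.95, 0.2, 0.85, 0.9, 0.92, 0.1]
detector = WakeWordDetector(score_fn=lambda frame: frame)
hits = [detector.feed(s) for s in scores]
print(hits)
```

The debounce (requiring a streak of confident frames) is one common way low-power spotters trade a few milliseconds of latency for far fewer accidental wake-ups.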

One of the companies using this technology is Nucleus, which developed an Alexa-enabled, touchscreen-based IoT intercom with features including HD video and voice.

"It's very much an IoT product that allows you to talk to the people you're connected to, such as a relative or friend," Mozer said. "When you say 'Alexa' to it, that's Sensory's technology that wakes up and calls to Alexa."

Sensory also partners with other companies on voice technology, including VoiceBox Technologies Corp. and Nuance Communications Inc.

"We're the only company that can do this voice wake-up and do it really, really well," Mozer said. "VoiceBox works with Sensory, so it has the right to use our voice wake-up technology. And for many products that Nuance has been in, we've been the voice wake-up."

VoiceBox: Bringing voice and IoT to cars and mobile devices

Since it was founded in 2001, VoiceBox, a provider of contextual voice and natural language understanding (NLU) technologies, has partnered with car, smartphone and wearable manufacturers on its speech recognition technology, said Mike Kennewick, company co-founder, chairman and CEO.

VoiceBox also provides intelligent, contextual voice technologies for original equipment manufacturers that are looking to integrate voice controls across their IoT-enabled products.

"We started with the connected car in 2008," Kennewick said. "Today, we're in every vehicle Toyota makes in North America, including Lexus and Scion. We and Nuance have the biggest market share in automotive."

VoiceBox launched the VoiceBox Automotive Software Development Kit version 5.0 for Windows, Linux and Android platforms in 2016. Through full integration of its embedded automatic speech recognition engine and context management, VoiceBox offers automakers a single product for powering in-car voice systems.

Since the emergence of IoT, VoiceBox has taken its connected car platform and extended it to mobile phones. For example, VoiceBox does natural language understanding for Bixby, the Siri competitor Samsung ships in all its top-end smartphones.

"Our vision of a cross-device user experience is starting to happen," Kennewick said. "And the way you're going to interact with those devices is through voice and natural language understanding."

VoiceBox is extending that vision by adopting standards-bridging platforms such as Artik Cloud, Samsung's open data exchange platform for IoT, which provides interconnectivity to a wide range of IoT devices, Kennewick said. By publishing its tools and capabilities into the Artik platform, VoiceBox can connect to even more applications and devices.

According to Kennewick, VoiceBox was the first company to provide natural language understanding, and it still considers itself to have the most robust industrial NLU on the market.

"We are evolving and putting artificial intelligence into our NLU," he said. "We call this Voice AI, and we're going to ship it this year. As far as I know, we'll be the first to ship a voice AI product. That will enable us to do something no one else can -- we'll be able to resolve ambiguous requests using AI techniques."

Voice AI mines structured and unstructured data from a variety of sources to answer conversational, and even complex, questions. For instance, a driver can ask his in-vehicle system to find the sushi restaurant that's next to the Chevron station on Main Street.

"That's the way we would talk to a cab driver or a concierge. So, using AI, we can resolve that the sushi restaurant is called Ginza," Kennewick said. "The user could also ask the system to send the address and phone number of the restaurant to two other individuals."

That's more complicated because it's two tasks; the system has to get the information, and then figure out how it's going to send it to both people.
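One way to picture that two-task request is a handler that resolves the first task against contextual constraints and keeps the result in a shared context, so the second task can refer back to it without repeating the restaurant's name. All names and structures below are hypothetical illustrations, not VoiceBox's API:

```python
# Hypothetical compound-request handler: task 1 finds a place using
# contextual constraints; task 2 reuses the resolved place from context.
def find_place(kind, near, street, places):
    # Resolve "the sushi restaurant next to the Chevron on Main Street".
    for p in places:
        if p["kind"] == kind and p["near"] == near and p["street"] == street:
            return p
    return None

def handle_compound(request, places):
    context = {}   # carries results between the two tasks
    results = []
    for task in request["tasks"]:
        if task["intent"] == "find_place":
            context["place"] = find_place(task["kind"], task["near"],
                                          task["street"], places)
            results.append(("found", context["place"]["name"]))
        elif task["intent"] == "share":
            place = context["place"]    # "it" resolved from context
            for person in task["recipients"]:
                results.append(("sent", place["address"], person))
    return results

places = [{"name": "Ginza", "kind": "sushi", "near": "Chevron",
           "street": "Main Street", "address": "12 Main St"}]
request = {"tasks": [
    {"intent": "find_place", "kind": "sushi", "near": "Chevron",
     "street": "Main Street"},
    {"intent": "share", "recipients": ["Alice", "Bob"]},
]}
print(handle_compound(request, places))
```

The interesting part is the shared `context`: once the first task resolves "the sushi restaurant" to Ginza, the second task's pronoun-style reference costs nothing to resolve.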

"There's a reasoning function in there that learns from your usage and will help resolve those kinds of ambiguous requests," Kennewick said. "We think that will be transformative in terms of making the IoT experience -- the connected car is one node in the IoT -- much more human-like."

Providing voice technology building blocks

For Kenn Harper, vice president of devices and ecosystem at Nuance, the possibilities for voice technology in IoT are endless. Nuance, a provider of cloud-based voice and language products for businesses and consumers, makes the Dragon speech recognition software.

"IoT can include automobiles and devices in your smart home, as well as more emerging markets, such as robotics, augmented reality and virtual reality," Harper said. "And one thing that is common across all these devices is that speech is becoming the primary interface."

In the smart home, for example, there needs to be an interface that is instinctive to use, and that can provide a common way for people to talk to various smart devices without having to resort to companion apps on their mobile phones. Speech and natural language can drive access to TV content, but the TV can also serve as the primary user interface for the smart home.

Consider this scenario from the company: A user performs a search for TV content, saying, "Find me comedies with Will Ferrell." Then, perhaps he hears a knock on the door, so he says something like, "Who is at the front door?" This could serve up a live video stream from an integrated home security system. At this point, he sees that his friend has arrived and quickly asks the TV to "dim the lights" to control his smart lighting system, making for a more suitable environment at the start of the movie.
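The plumbing behind such a scenario can be sketched as a simple intent router: one spoken front end dispatches each utterance to whichever smart home "device" claims it, so the user never switches companion apps. The patterns and device handlers below are illustrative assumptions, not Nuance's interface:

```python
# Hypothetical intent router for the TV-as-hub scenario: each utterance is
# matched against device-owned patterns and dispatched to the first handler
# that claims it.
import re

HANDLERS = [
    (re.compile(r"find me (.+)"),       lambda m: f"tv: searching for {m.group(1)}"),
    (re.compile(r"who is at the (.+)"), lambda m: f"camera: streaming {m.group(1)}"),
    (re.compile(r"dim the lights"),     lambda m: "lights: dimming"),
]

def route(utterance):
    text = utterance.lower().rstrip("?")
    for pattern, handler in HANDLERS:
        m = pattern.fullmatch(text)
        if m:
            return handler(m)
    return "unrecognized"

actions = [route(u) for u in [
    "Find me comedies with Will Ferrell",
    "Who is at the front door?",
    "Dim the lights",
]]
print(actions)
```

A production system would replace the regexes with a natural language understanding model, but the dispatch shape -- one interface, many device back ends -- is the point of the scenario.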

That's where speech, conversation and artificial intelligence can really play a big role and solve a huge need, Harper said. "And that's been a focus of ours for a very long time, and it's continuing to be a focus of ours as we move forward."

Nuance's approach is to give its customers the pieces they need to build their own custom conversational interfaces -- for the home, for the car or for an emerging technology such as a robotics platform -- that reflect their brands and the services they care about.

"Our approach is a little bit different in the breadth of what we have," Harper said. "We provide the building blocks to our customers so they can work with a single provider. Our customers don't have to stitch together lots of different technologies from many different parties."

Next Steps

Learn more about the benefits of a voice technology user interface in IoT -- and why voice interfaces remain a challenge
