It’s happened so organically that it’s easy to overlook just how impressive the advancements in voice interfaces truly are. In May, the conversational capabilities of Assistant held center stage at the Google I/O developer conference with a powerful display of the maturation of search and natural language processing, and, in the words of Google CEO Sundar Pichai, evoked “an ambient experience that extends across devices.” And Google is just one of the latest entrants to this arena: The ascendance of voice as user interface is already manifested in Apple’s Siri, Microsoft’s Cortana and (of course) Amazon’s Alexa.
Our technology is more personal and ubiquitous than ever before, particularly as IoT use cases evolve in the home environment, and voice enablement is helping to pave the way. Vocal control opens the home to a level of natural interaction that has the potential to fundamentally change how people perceive these new systems, expanding IoT to a much larger audience. Voice interfaces are already transformative for a number of people with limited mobility or sight, enabling them to independently manage tasks that would otherwise require human assistance. That said, there is still a very long way to go between today's Amazon Echo or Google Now and the ultimate dream of a virtual concierge like Tony Stark's Jarvis or Friday. We still need to perfect all the elements that could enable that reality (fluid individual identification, in-home location tracking, unsupervised pattern recognition, deep learning), all the while maintaining privacy and data integrity, which are crucial for long-term consumer trust.
While voice is a critical piece of this puzzle, it is only one of the pieces. There are times when voice command is undesirable: You can’t talk because it’s late and other people are sleeping, the environment is too noisy, the request is too private or too complex, you have a cold and can’t enunciate, or you went to a concert the night before and lost your voice. Or, as a small slice of respondents to a recent voice UI early adopters survey noted, the novelty of using voice control simply wears off.
So as much as talking to a machine in conversational language feels like magic today, it is not the be-all and end-all of user interface.
There are other hitches in voice interfaces. They remain awfully rigid and limited and, as Tom Goodwin noted in a recent Forbes lament, sometimes simply asking devices to "repeat what they said" spins them out of control. Further, to mimic human interaction and add warmth, current voice-first UIs tend to be chatty. Either by design or because of recognition errors, this can come off as an exhibition of "attitude" that feels antagonistic and can add a layer of frustration to very basic interactions, like turning the lights on or off. There's also the issue of machine learning and predictive automation being incorporated into these systems to make them more effective and assistive, but such presumptuousness on the part of our devices can feel creepy and unsettling.
Anthropomorphic issues aside, simply flicking a switch is pretty much always considerably faster than waking up a voice-based system, asking it to perform an action and having to listen to its response and repeat until it confirms success. Tangible controls (actual physical buttons and switches) are still very important for quick and direct interactions. And tangible controls coupled with basic feedback mechanisms, such as rudimentary audio or light cues, can prove even more direct and powerful. They aren't loquacious and don't require you to refine your commands; they just let you know that your command went through.
People will point to mobile devices and their tap-and-swipe interfaces as the answer to voice’s limitations. They are, indeed, the central vehicle for current IoT interfaces, whether at home (for scene or task creation) or away (for status and control). But a mobile phone is not the same as a designated in-home control and interaction device — sure, it can do the job, but it’s not the best fit. Maybe your phone is charging on the kitchen counter. Maybe it is still in your jacket in the closet. Chances are you are not carrying your phone around with you at home all the time. And picture having to take it out, unlock it, find the app you need, launch, search for the device you want to deal with and, finally, enter your command. This is a huge burden for a small interaction like adjusting the thermostat.
I often feel we are neglecting an important device category. We have IoT platforms that connect devices, clouds that connect services, voice-control systems and mobile apps that tie it all together. But we're still missing the integration of a simple, unsexy button that a user can just press to initiate actions. I envision a large button/dial, with an OLED screen on top or at least a ring with multicolored LEDs. The user could rotate to find a function and press to activate. The system could change the default function depending on the time of day, the room location or any desired parameter. And a user could override the dynamic default by simple rotation, with the screen or light combination displaying the current or selected function.

Say the button is on the coffee table. Press the button to turn on the TV. Rotate the dial, press the button and turn off the lights. Rotate the dial, press the button and it also closes the shades. If someone rings the doorbell or calls on the phone while you're watching TV, the button would flash and, if pressed, connect to a security camera on the porch for a video link to your doorstep or pick up the phone call on the voice system. You can imagine endless dynamic scenarios where such a button could be the main interaction model in a specific context, and where light or sound or text display could be exploited to most discreetly close the feedback loop.
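The dial's "dynamic default" behavior can be sketched as a small state machine: the device proposes a context-dependent default, rotation overrides it with an explicit choice, and a press fires whatever is selected. This is purely a hypothetical illustration of the interaction model, not a real product API; the `SmartDial` class, the action names and the time-of-day rule are all invented here.

```python
from datetime import datetime

class SmartDial:
    """Hypothetical context-aware button/dial controller (illustration only).

    A real device would dispatch these action names to an IoT hub and
    confirm success with a light or sound cue.
    """

    ACTIONS = ["tv_power", "lights_off", "shades_close", "thermostat_up"]

    def __init__(self):
        self.selection = None  # None means "use the contextual default"

    def default_action(self, now=None):
        # Pick a default from context; time of day is just one possible rule.
        hour = (now or datetime.now()).hour
        return "lights_off" if hour >= 22 else "tv_power"

    def rotate(self, steps=1):
        # Rotating overrides the dynamic default with an explicit choice,
        # which the OLED screen or LED ring would display.
        current = self.selection or self.default_action()
        idx = self.ACTIONS.index(current)
        self.selection = self.ACTIONS[(idx + steps) % len(self.ACTIONS)]
        return self.selection

    def press(self):
        # Pressing fires the selected (or default) action, then resets
        # back to the contextual default for the next interaction.
        action = self.selection or self.default_action()
        self.selection = None
        return action
```

The key design point is that the user never has to "navigate": doing nothing and pressing always yields a sensible contextual action, while a quick rotation handles everything else.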
There are some Kickstarter efforts to build this new category, though at present they’re focused on miniature and limited implementations. We already have the Flic portable button, which you can carry with you, or it can be built into clothes or jewelry. It gives you three press options for commands that you can set and use for all kinds of IoT functions, such as providing personal and programmable shortcuts to controlling an environment. Arrive home after dark, press the button to turn on the outside light; wake up in the morning, press the button to start the coffee machine; leaving for a night out, press the button to lock up the house.
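The Flic-style model (a handful of press patterns, each bound to one user-configured shortcut) can be expressed in a few lines. Flic's actual SDK is not shown here; the snippet below is a generic sketch of the idea, with invented press-type names and placeholder strings standing in for real IoT actions.

```python
# Generic sketch of a three-press programmable button (not Flic's real API).

def make_button(bindings):
    """Return a handler that maps press types to user-configured actions."""
    def handle(press_type):
        action = bindings.get(press_type)
        if action is None:
            return "unbound press: " + press_type
        return action()  # a real binding would call out to an IoT service
    return handle

# Each press pattern gets one shortcut, set up once by the user.
evening_button = make_button({
    "click":        lambda: "porch light on",
    "double_click": lambda: "coffee machine started",
    "hold":         lambda: "house locked",
})
```

The appeal is exactly its rigidity: three memorized gestures, zero discovery cost, and no conversation required.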
Buttons are simple, straightforward and tactile, but there are other interfaces worth investing in. Gesture control in an in-home context should also be explored, though it presents a number of implementation challenges that have, so far, proven to be serious usability hurdles. Microsoft Kinect, for instance, produced a lot of interest and was the focus of much user-experience experimentation and development, but hasn't led to a usable framework or solution beyond gaming. Full-room dynamic gesture interfaces are still a long way off, but maybe a more localized "gesture pad" interface could work.
And good old keyboards still provide huge value as interfaces for some of the steps in any IoT experience, specifically setup and scene/task creation as performed on tablets or laptops. While many of these functions can also be achieved with voice, doing so quickly becomes burdensome, and you currently can't beat having a screen and some kind of keyboard for many activities.
There is endless opportunity for refining and reimagining the human-technology interface. Eye-tracking control is being developed in the VR-headset world, clothes will soon be able to detect very small muscle movements, and there are already some mind-control games on the market. My point here is simply that humans have many senses, and we use all of them to communicate and interact with our world in various ways. Context and communication are the key factors in determining which interface is most comfortable and convenient for us.
So while all the progress being made in voice interfaces is certainly exciting and laudable, let’s not neglect to invest the same ingenuity and vision into revolutionizing other forms of user interface.
After all, variety is the spice of life.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.