
Embedded vision in IoT

From computers to robots to artificial intelligence, multiple key technology advances have been inspired by the need to replicate or emulate human intelligence, sensing capabilities and behavior.

Various sensors, such as acoustic, vision and pressure sensors, have been inspired by human hearing, vision and pressure-sensing abilities.

Undoubtedly, one of the most important human sensing abilities is vision. Vision allows humans to see the environment, interpret and analyze it, and take action.

Human vision is a remarkably complex, intelligent “machine” that occupies a significant part of the brain. Neurons dedicated to vision processing take up close to 30% of the cortex.

Making devices, objects and things “see” the environment, as well as analyze and interpret what they see, has been a key research area for a number of years.

Technological complexity, massive computation power requirements and prohibitive costs previously restricted vision sensing and intelligence to security and surveillance applications using surveillance cameras. The situation has changed dramatically, however; the vision sensor market has exploded. Cameras are being embedded everywhere and in all sorts of devices, objects and things, both moving and static. In addition, the computation power available at the edge and in the cloud has increased dramatically. This has triggered the embedded vision revolution.

The right sensor/camera price point, advances in vision sensor resolution and dynamic range, and the amount of computation power available to process video and images are leading to mind-boggling growth and diverse applications.

Vision intelligence, enabled through a combination of classical image processing and deep learning, has become possible in today’s world of connected embedded systems, devices and objects, taking advantage of both edge computing power on the device itself and cloud computing power.

This has triggered rapid growth in self-driving vehicles, drones, robots, industrial applications, retail, transportation, security and surveillance, home appliances, medical/healthcare, sports and entertainment, consumer augmented and virtual reality, and, of course, the ubiquitous mobile phone. Vision and vision intelligence are taking the IoT world by storm, and their reach is only going to increase.

In fact, no other sensor has had such a dramatic impact. So prevalent has video become in day-to-day life that most people take it for granted. From live video streaming to video on demand to video calling, it’s easy to forget the dramatic impact vision sensors have had in a world of internet-connected environments and devices; the vision sensor is truly the unsung hero of IoT. Combine it with vision intelligence and the whole market takes on a new dimension.

The growing prevalence of embedded vision has its roots in the explosive growth of mobile phones with embedded cameras. Prior to the mobile phone revolution, video cameras and intelligence remained associated with security and surveillance. But then mobile phones with embedded cameras arrived, aligning with simultaneous massive growth in computation power for video analytics and intelligence, both at the edge and in the cloud. This explosive combination has led to remarkable growth, and vision sensors started getting embedded everywhere, from robots and drones to vehicles, industrial machine vision applications, appliances and more.

There are various types of vision sensors, but complementary metal-oxide semiconductor, or CMOS, sensors by far have had the largest impact and have led this explosive growth of vision sensors in various embedded systems and smartphones.

Sensors are everywhere — and are numerous. Self-driving cars today have more than 10 video cameras, drones have three to four video cameras, security surveillance cameras are everywhere, mobile phones are streaming live video. Video data from these sources is streamed for further intelligence in the cloud, while real-time edge processing is happening on devices and things themselves.

Vision sensor resolution, dynamic range and the number of vision sensors continue to scale up with no end in sight. With massive amounts of video data being produced by these sensors, the computation power required is naturally huge, as are the transmission and storage requirements.
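To make that concrete, here is a back-of-the-envelope sketch of what a single camera can generate; the resolution, frame rate and compression ratio are illustrative assumptions, not figures from any particular deployment:

```python
# Back-of-the-envelope video bandwidth/storage estimate.
# All parameters are illustrative assumptions, not measured figures.

WIDTH, HEIGHT = 1920, 1080   # 1080p frame
FPS = 30                     # frames per second
BYTES_PER_PIXEL = 1.5        # raw YUV 4:2:0 format

raw_bps = WIDTH * HEIGHT * BYTES_PER_PIXEL * 8 * FPS
print(f"Raw:        {raw_bps / 1e6:.0f} Mbit/s")          # ~746 Mbit/s

# Assume roughly 150:1 compression (a ballpark for H.264 at this resolution)
compressed_bps = raw_bps / 150
print(f"Compressed: {compressed_bps / 1e6:.1f} Mbit/s")   # ~5 Mbit/s

# Storage for one camera streaming around the clock for a day
day_bytes = compressed_bps / 8 * 86_400
print(f"Per day:    {day_bytes / 1e9:.0f} GB per camera") # ~54 GB
```

Even at roughly 5 Mbit/s after compression, a single always-on camera produces tens of gigabytes per day, and real deployments count cameras in the dozens per vehicle or millions per city.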

Previously, there was a rush to stream video to the cloud for real-time or stored vision analytics. The cloud offered immense computation power, but the bandwidth necessary for transmission, even after compression, was high. Huge storage requirements, the latencies involved, and security and privacy concerns are making customers rethink the cloud and instead consider vision analytics at the device/object level, with offline video processing done in the cloud.

With the promise of low-latency, high-speed 5G connectivity, there is interest in distributing real-time video processing between the edge and the cloud. However, it remains to be seen how much of this is possible, if it is at all, and whether it makes sense for millions of endpoints to transmit real-time compressed video to the cloud, hogging transmission bandwidth.

The importance of edge analytics has created a market for various system-on-a-chip (SoC), graphics processing unit (GPU) and vision accelerator offerings. The cloud, with GPU acceleration, is used for non-real-time video analytics or for training neural networks on large amounts of data, while real-time inference happens at the edge on these accelerators.

With SoCs optimized for deep learning now available, along with vision accelerators for classical image processing, the edge analytics trend is likely to continue, with events, parameters and intelligence pushed to the cloud for further analysis and correlation. The cloud will remain important for offline analysis of stored video, while some systems can still do real-time analysis there.
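As a minimal sketch of this train-in-the-cloud, infer-at-the-edge pattern, the snippet below runs a TensorFlow Lite model on-device; the model file detector.tflite is hypothetical, standing in for a network trained and exported in the cloud:

```python
# Minimal edge-inference sketch using TensorFlow Lite.
# "detector.tflite" is a hypothetical model trained in the cloud and
# deployed to the device; its input/output shapes depend on the model.
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="detector.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Stand-in for a camera frame, resized/normalized to the model's input shape
frame = np.zeros(inp["shape"], dtype=np.float32)

interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()                      # runs locally on the edge CPU/accelerator
scores = interpreter.get_tensor(out["index"])

# Only lightweight results (not raw video) need to be pushed to the cloud
print("class scores:", scores)
```

The design point is that raw frames never leave the device; only the inference results, which are orders of magnitude smaller, travel over the network.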

Vision applications in the real world

Vision and the vision intelligence market continue to evolve rapidly. There are some striking technology trends happening, and they are expected to fuel the next massive growth of vision over the years. Here are a few examples:

3D cameras and 3D sensing. 3D cameras, or more generally 3D sensing technology, allow depth calculation in a scene and the construction of 3D maps of it. The technology has been around for a while, popularly used in gaming devices such as Microsoft Kinect and, more recently, in the iPhone X’s 3D sensing for biometrics. Here again, the market is on the cusp of taking off, with smartphones providing the acceleration needed for a much wider set of applications. In addition, robots, drones and self-driving cars with 3D cameras can recognize the shape and size of objects for navigation, mapping and obstacle detection; a minimal depth-from-disparity sketch follows this list of examples. Likewise, 3D cameras and stereoscopic cameras are the backbone of augmented, virtual and mixed reality.

Deep learning on the edge and in the cloud. Neural network-based AI has taken the world by storm, and again it is the computation power available today that makes deep learning networks practical. Other contributing factors have driven the growth of neural networks in practical applications, including the massive amounts of data (videos, photos, text) available for training and cutting-edge R&D by universities and tier 1 companies, along with their contributions to open source. This in turn has triggered many practical applications for neural networks. In fact, for robots, autonomous vehicles and drones, deep learning inference running on GPUs/SoCs at the edge has become the norm. The cloud will continue to be used to train deep learning networks, as well as for video processing of offline stored data. Split-architecture processing between the edge and cloud is also possible, as long as network latencies and video pipeline delays are considered acceptable.

SLAM in automotive, robots, drones. Simultaneous localization and mapping, or SLAM, is a key component of self-driving vehicles, robots and drones, which are fitted with various types of cameras and sensors, such as radar, lidar, ultrasonic and more.

AR/VR and perceptual computing. Consider Microsoft HoloLens; what’s behind it? Six cameras with a combination of depth sensors. Microsoft even announced the opening of a computer vision research center for HoloLens in Cambridge, U.K.

Security/surveillance. This article does not focus on this traditionally video- and video analytics-dominated area, which is a large market in itself.

Mobile phone- and embedded device-based biometric authentication. Biometric authentication can trigger the next wave of mobile apps, and again it is the camera sensor, combined with video analytics at the edge and in the cloud, driving this. As the technology matures, it will spread into various embedded devices.

Retail. The Amazon Go store is an example of using cameras and high-end video analytics. Soon we are going to have robots in aisles assisting humans, all outfitted with multiple cameras and vision intelligence along with other sensors.

Media. Video-based intelligence is already used heavily in the media industry. Video analytics can allow you to search through large video files for a specific topic, scene, object or face.

Sports. Real-time 3D video, video analytics and virtual reality are going to enable the next generation of personalized sports and entertainment systems.
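As promised in the 3D sensing item above, here is a minimal sketch of the classic depth-from-disparity relationship (Z = f·B/d) that stereoscopic and 3D cameras rely on; the focal length, baseline and disparity values below are illustrative assumptions:

```python
# Depth from stereo disparity: Z = f * B / d
# f: focal length in pixels, B: baseline between the two cameras (meters),
# d: disparity in pixels between matching points in the left/right images.
# All numbers below are illustrative assumptions.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return scene depth in meters for one matched pixel pair."""
    return focal_px * baseline_m / disparity_px

f_px = 700.0      # focal length of a hypothetical camera, in pixels
baseline = 0.12   # 12 cm between the stereo pair
for d in (70.0, 35.0, 7.0):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(f_px, baseline, d):.2f} m")
# Larger disparity means the object is closer; small disparities map to far depths.
```

The inverse relationship also explains why stereo depth accuracy degrades with distance: far objects produce tiny disparities, where a one-pixel matching error translates into a large depth error.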

Road ahead, challenges, motivations and concerns

The need for ever-higher video resolution, wide dynamic range, high frame rates and video intelligence has created an ever-growing appetite for computation power, transmission bandwidth and storage capacity. And it is hard to keep up continuously.

A few companies are taking a different path to solve this problem. In the same way neural networks are biologically inspired, bio-inspired vision sensors, which respond to changes in a scene and output a small stream of events rather than a sequence of images, have started appearing through ongoing research and commercialization. This can result in a large reduction in both video data acquisition and processing needs.

This approach is promising and can fundamentally change the way we acquire and process video. It has high potential to reduce power consumption as a result of much-reduced processing requirements.
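A rough way to build intuition for the event-based idea is to simulate it from ordinary frames, as below; real event sensors (such as dynamic vision sensors) fire per-pixel events asynchronously in hardware, and the threshold used here is an arbitrary assumption:

```python
# Sketch: approximate event-camera output by thresholding log-intensity
# changes between consecutive frames. Real event sensors do this per pixel,
# asynchronously, in hardware; the 0.2 threshold is an arbitrary assumption.
import numpy as np

THRESHOLD = 0.2  # log-intensity change needed to fire an event

def frame_to_events(prev: np.ndarray, curr: np.ndarray):
    """Return (y, x, polarity) tuples where the scene changed enough."""
    diff = np.log1p(curr.astype(np.float32)) - np.log1p(prev.astype(np.float32))
    ys, xs = np.nonzero(np.abs(diff) > THRESHOLD)
    return [(y, x, 1 if diff[y, x] > 0 else -1) for y, x in zip(ys, xs)]

# Two synthetic 4x4 "frames": only one pixel brightens noticeably
prev = np.full((4, 4), 50, dtype=np.uint8)
curr = prev.copy()
curr[2, 3] = 120

events = frame_to_events(prev, curr)
print(events)  # [(2, 3, 1)] -- a handful of events instead of a full frame
```

A static scene produces almost no output at all, which is where the acquisition, transmission and processing savings come from.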

Vision will remain the key sensor fueling the IoT revolution. Likewise, edge video intelligence will keep driving the SoC/semiconductor industry down its path of video accelerators built on GPUs, application-specific integrated circuits (ASICs), programmable SoCs for inference, field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), accelerating classical image processing and deep learning while giving developers room for programmability.

This is a battlefield today, and various large established players and startups are aggressively going after this opportunity.

Low-power embedded vision

With the growth of vision sensors and embedded intelligence in millions of battery-powered objects, low-power embedded vision remains one of the prime factors for the industry's growth in the next era, yet it also remains one of the key problems to solve. Building products and systems with embedded vision and intelligence is going to raise privacy and security concerns that need to be handled properly from the design stage.

Despite the challenges, the future of embedded vision in IoT is bright and the market opportunity huge; the companies solving these challenges are going to reap huge rewards.

