In the latter part of the 20th century, the concept of data was relatively new. The need to store and manage data produced the well-known legacy database management systems Ingres and Postgres, both of which were architected by Dr. Michael Stonebraker. Stonebraker received the 2014 Turing Award for his remarkable influence and leadership in database technologies. His understanding of the strengths and limitations of row-based relational database architectures, combined with his vision for data in the 21st century, led him to architect the next generation of DBMS: columnar massively parallel processing.
IoT makes big data look small
What Stonebraker and many others saw coming is now obvious. The volume, velocity and variety of data that surround all of us have redefined the world. Much of the attention, and scrutiny, is on unstructured data such as emails, texts, posts and images. In my opinion, that data will soon be considered small compared to the volume of data coming from IoT.
Generating 7 billion points of data daily
During a recent meeting with one of the world’s largest automobile manufacturers, headquartered in Japan, we discussed the company’s Autonomous Vehicle program. While this might immediately bring driverless cars to mind, the program is much more pragmatic, aiming for results that can have an immediate impact.
The focus is on safety features within each of the manufacturer’s vehicles, covering basic actions such as turning left against oncoming traffic, changing lanes, turning right on red, slowing down or speeding up, and rapid braking. The company’s publicly available Internet-connected cars already have more than 500 sensors capturing speed, location, brake pressure, steering angle, fuel level, tire pressure and temperature, which generate seven billion points of data per day. The Autonomous Driving System test vehicles carry even more sensors, such as cameras and Light Detection and Ranging (LIDAR). LIDAR alone delivers a frame every four milliseconds.
The company runs tests as it continues to add individual safety features, and it does so in multiple markets. Concrete roadways are common in the U.S. and offer less contrast with lane markings than roads in Japan. Weather conditions and traffic laws also differ geographically. Experiments tend to run for two to three months using a dozen or more test-equipped vehicles, which generate more than 30 petabytes of data. This data is then harmonized, sequenced and analyzed. No experiment can be completed until the tested action achieves 100% accuracy.
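To put the figures in this section in perspective, here is a rough back-of-envelope sketch in Python. The inputs marked as assumptions (exactly 12 vehicles, a 90-day experiment, decimal units where 1 PB = 1,000 TB, and sensors running around the clock) are illustrative choices of mine and were not provided by the manufacturer.

```python
# Back-of-envelope rates derived from the figures quoted in this article.
# Values flagged ASSUMPTION are illustrative, not from the manufacturer.

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(points_per_day: float) -> float:
    """Average points per second, assuming data arrives evenly all day."""
    return points_per_day / SECONDS_PER_DAY


def lidar_frames_per_second(frame_interval_ms: float) -> float:
    """Frames per second for a sensor emitting one frame per interval."""
    return 1000.0 / frame_interval_ms


def tb_per_vehicle_per_day(total_pb: float, vehicles: int, days: int) -> float:
    """Average terabytes per vehicle per day (decimal units: 1 PB = 1000 TB)."""
    return total_pb * 1000.0 / (vehicles * days)


if __name__ == "__main__":
    # Seven billion points per day, as quoted for the connected cars.
    print(f"{per_second(7e9):,.0f} points per second on average")

    # One LIDAR frame every four milliseconds.
    print(f"{lidar_frames_per_second(4):,.0f} LIDAR frames per second")

    # ASSUMPTION: exactly 12 vehicles and a 90-day experiment.
    rate = tb_per_vehicle_per_day(total_pb=30, vehicles=12, days=90)
    print(f"{rate:,.1f} TB per test vehicle per day")
```

Under those assumptions, seven billion points per day averages out to roughly 81,000 points per second, the LIDAR stream runs at 250 frames per second, and each test vehicle produces on the order of 28 terabytes per day. Fewer days or more vehicles would shift that last number, so treat it as an order-of-magnitude estimate only.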
A near future of exabytes with drones, connected home appliances and more
Autonomous driving is one of many new use cases in the wide-ranging IoT, all of which will generate petabytes and ultimately exabytes of data. As drones begin to deliver packages, home appliances embed predictive maintenance and insurance companies consider usage-based insurance models built on telematics data, the pressure is on for many data science labs. Subsets of data can no longer be selected, moved and used on separate specialty platforms. Instead, organizations must apply all of their data in end-to-end machine learning workflows, from data preparation to model training, scoring and deployment. Companies that recognize this and move from science projects to production will lead in the next generation of big data. The rest will quickly find themselves labelled as legacy companies that played a role in the past but have very little place in the future.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.