Why IIoT projects require IIoT-specific databases

Crate.io

Industry 4.0 has created a myriad of opportunities, such as predictive machine analytics, computer vision, unmanned trucks and industrial wearables featuring augmented reality, for organizations to become more efficient. These use cases have one thing in common: they all require massive data volumes to be collected, processed, stored and analyzed to power data-driven decision making.

Enabling organizations to extract value from such enormous quantities of data in a scalable, performant and efficient manner is what the promise of IIoT is all about. Given this central role of data, database technology is at the heart of the digital transformation necessary to unlock IIoT’s benefits.

Unfortunately, more than 70% of IoT projects fail due to a lack of requisite skills and the technical challenges of implementing effective production data infrastructures. Industrial organizations now share a common goal of optimizing their processes in real time via the cloud, as well as the common challenge of transitioning their traditional infrastructures to suitable database strategies.

Why IIoT data is a challenge

The scale and shape of IIoT data are very different from those of legacy and web-scale data. This is largely because of the substantial breadth of data sources and endpoints involved. Traditional databases and infrastructure technologies simply weren’t intended to handle the magnitude of machine data at IIoT scale.

To get a better sense of the scale of IIoT, imagine a factory with tens of thousands of sensors, spanning ten thousand different sensor types, actively collecting data. Now imagine an organization running 100 such factories around the world, along with the supply chain for each. The task for an IIoT database implementation is not just to collect these vast volumes of sensor data, but also to deliver efficient performance and enable real-time analytics at the sub-second level.
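To put rough numbers on that scenario, here is a back-of-envelope sketch. Only the factory count comes from the text above; the sensor count per factory, the sampling rate and the payload size are illustrative assumptions, so the result is an order-of-magnitude guide rather than a measurement.

```python
# Back-of-envelope ingest estimate for the factory scenario.
# Only FACTORIES comes from the article; all other numbers are assumptions.

SENSORS_PER_FACTORY = 20_000      # assumption: "tens of thousands" taken as 20,000
FACTORIES = 100                   # from the article
READINGS_PER_SENSOR_PER_SEC = 1   # assumption: one reading per sensor per second
BYTES_PER_READING = 200           # assumption: a small JSON payload
SECONDS_PER_DAY = 86_400


def ingest_estimate(sensors, factories, rate_hz, bytes_per_reading):
    """Return (readings per second, raw bytes per day) for the whole fleet."""
    readings_per_sec = sensors * factories * rate_hz
    bytes_per_day = readings_per_sec * bytes_per_reading * SECONDS_PER_DAY
    return readings_per_sec, bytes_per_day


readings_per_sec, bytes_per_day = ingest_estimate(
    SENSORS_PER_FACTORY, FACTORIES, READINGS_PER_SENSOR_PER_SEC, BYTES_PER_READING
)
print(f"{readings_per_sec:,} readings/s")
print(f"{bytes_per_day / 1e12:.2f} TB/day before compression")
```

Under these assumptions the fleet produces about two million readings per second and tens of terabytes of raw data per day, before counting supply chain data.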

The other issue is the variety of IIoT data. IIoT sensor data is typically stored as a nested JSON document. There’s also relational data, such as article and product information, batch info, topology and firmware, that must be correlated with sensor data and contextualized for it all to make sense.
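As a minimal sketch of what correlating the two kinds of data involves, the example below flattens a nested JSON reading and joins it with a relational table of machine context. The payload fields, table names and values are hypothetical, and SQLite from the Python standard library stands in for whatever database a real deployment would use.

```python
import json
import sqlite3

# Hypothetical sensor payload; all field names and values are illustrative.
payload = json.loads("""
{
  "sensor_id": "press-07",
  "ts": "2020-01-01T12:00:00Z",
  "readings": {"temperature_c": 71.5, "vibration_mm_s": 2.3},
  "device": {"firmware": "1.4.2"}
}
""")


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


row = flatten(payload)

conn = sqlite3.connect(":memory:")
# Relational context: which product and batch each machine is working on.
conn.execute("CREATE TABLE machines (sensor_id TEXT PRIMARY KEY, product TEXT, batch TEXT)")
conn.execute("INSERT INTO machines VALUES ('press-07', 'door-panel', 'B-1042')")
# Sensor readings, stored from the flattened document.
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT, temperature_c REAL, firmware TEXT)")
conn.execute(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    (row["sensor_id"], row["ts"], row["readings.temperature_c"], row["device.firmware"]),
)

# Contextualize the reading by joining it with the relational data.
result = conn.execute(
    """
    SELECT r.ts, r.temperature_c, m.product, m.batch
    FROM readings AS r
    JOIN machines AS m ON m.sensor_id = r.sensor_id
    """
).fetchall()
print(result)
```

The join is what turns a bare temperature value into a statement about a specific product and batch.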

Additionally, IIoT generates full-text time series data with time stamps for tracking processes, geospatial data to coordinate data points of moving equipment, image data used to verify product condition and other multimedia BLOB data. As industrial organizations scale, this tremendous scope of data must be simple to manage. More importantly, organizations must be able to maintain that simplicity using a single database rather than coordinating multiple ones.

Traditional databases aren’t IIoT appropriate

Organizations attempting to apply traditional databases with legacy architectures built for non-IIoT use cases to IIoT transformations quickly discover their shortcomings. More specifically, traditional SQL databases such as Oracle and MySQL are expensive to scale and not prepared for the high data volume and query complexity inherent to IIoT use cases.

Developers often find traditional NoSQL and NewSQL databases such as MongoDB and Apache Cassandra inviting because they’re easy to get started with. However, they ultimately require specialized engineers and complex administration, driving high personnel costs. At the same time, the vast majority of industrial engineering stacks are SQL-connected, making these NoSQL and NewSQL solutions difficult to integrate with and adapt to existing tools. Lastly, these database options aren’t performance-optimized for IIoT workloads.

In addition, time series databases such as InfluxDB and Timescale can come up short because they don’t feature fully distributed architectures. For example, joins, subselects and aggregation queries are not executed in a fully distributed way, which makes it difficult to scale compute power horizontally to match these needs. These databases make it easy to store data and build time series charts, but they aren’t built for running highly concurrent workloads.

IIoT workloads might be called upon to handle thousands of connections per node, for example when running interactive dashboards alongside simultaneous writes to the system, all under heavy load. Due to the massive volume and speed of data in an industrial environment, a database must handle multiple time series queries per second, which is much faster than the top query speed of standard time series databases.

Time series databases also lack the flexibility required to handle dynamic schemas, so organizations must run an additional database on the side to fulfill standard IIoT use cases. Additionally, traditional IoT database architectures don’t support the sheer scale of IIoT workloads, which are typically larger and more complex than other time series workloads.

IIoT deployments require a database built for the IIoT’s specific parameters

IIoT requires unlimited scalability because IIoT solutions can easily reach into the terabytes or even petabytes of data. A database must not only handle that volume, but also meet performance needs on the compute side. Scalability of both storage and compute must be as simple as adding new nodes.

In addition, the database must have a versatile data model able to store all the different types of data that IIoT requires. It also must support a massive and highly concurrent workload, as well as have a dynamic architecture that enables organizations to add columns at runtime without retagging or replaying data.
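To illustrate what adding columns at runtime means in practice, here is a minimal sketch that extends a table whenever an incoming record carries a field it has not seen before. It uses SQLite from the Python standard library purely as a stand-in; the helper names `existing_columns` and `insert_dynamic` are hypothetical, and a database built for dynamic schemas would typically handle this step itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT)")


def existing_columns(conn, table):
    """Return the set of column names currently defined on the table."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def insert_dynamic(conn, table, record):
    """Insert a record, first adding a column for any previously unseen field."""
    known = existing_columns(conn, table)
    for field in record:
        if field not in known:
            # Identifiers are interpolated directly, so this sketch is only
            # safe for trusted field names. New columns are left untyped.
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{field}"')
    columns = ", ".join(f'"{name}"' for name in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        list(record.values()),
    )


# The first record matches the original schema; the second adds a new field.
insert_dynamic(conn, "readings", {"sensor_id": "press-07", "ts": "2020-01-01T12:00:00Z"})
insert_dynamic(
    conn,
    "readings",
    {"sensor_id": "press-07", "ts": "2020-01-01T12:00:01Z", "humidity_pct": 41.0},
)

rows = conn.execute(
    "SELECT sensor_id, ts, humidity_pct FROM readings ORDER BY ts"
).fetchall()
print(rows)
```

Rows written before the new column existed simply read back as empty for that column, so nothing has to be replayed.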

Finally, the database must provide support for hybrid cloud and on-premises edge deployments. Factories need the ability to make critical decisions in real time and enable analytics in situations where there is no reliable internet connectivity or where cloud connectivity isn’t necessary.

IIoT success hinges on efficiency

IIoT deployments must be easy to integrate and operate while delivering efficiency from a total cost of ownership (TCO) perspective. For example, a document database might require eight nodes, plus a separate SQL database alongside it, to run an IIoT use case at an acceptable speed. In contrast, a database intentionally created for IIoT might handle the same use case on its own with just three nodes, offering a transformative increase in efficiency.

Efficiency also means being able to scale the number of end users accessing an IIoT deployment. For example, when organizations see results and opportunities in leveraging interactive dashboards, they might quickly scale up the number of employees using them. That’s a huge increase in compute requirements, yet the scaling must be simple and remain affordable.

Efficiency is also measured in developer productivity, specifically the skill required to operate the database. An efficient IIoT database solution is one that any developer can use out of the box and that is simple to run as a distributed engine. Ideally, the database will run with very little maintenance and oversight, as well as without the need for dedicated DevOps personnel.

When industrial organizations implement databases purpose-built for IIoT, the results are striking: a 70% lower TCO is well within the realm of possibility, as well as 100x faster performance and the potential for multi-petabyte scale.

The database world is evolving and it’s now more common for organizations to leverage specific, targeted and tactical database solutions for specific use cases. For IIoT deployments, utilizing an IIoT-specific database solution is both essential to success and far easier than attempting to use the wrong tool for the task.

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.