
The internet of things, database systems and data distribution, part one

As defined by Wikipedia, the internet of things is “the network of physical devices, vehicles and other items embedded with electronics, software, sensors, actuators and network connectivity which enable these objects to collect and exchange data.” We might take issue with the adequacy of this definition, but it’s a sufficient starting point for discussion. What the definition doesn’t address is why and, for purposes of this article, how objects should collect and exchange data.

There are plenty of reasons for IoT, but I think they can be distilled down to a few high-level objectives:

  1. To collect and monetize information
  2. To drive greater efficiency
  3. To improve the quality of our lives

Search the internet for “IoT use cases” and I think you will find that every search result will fit into one or more of these objectives.

If you’re going to collect data, you need to collect it somewhere. That is where database systems come in. If you’re going to exchange data, you need a means by which to distribute it between data generators and data consumers. That is where data distribution comes in.

Data can be collected in many places in IoT. It can be collected on edge devices like an automobile, a locomotive, a wind turbine or a smart home thermostat. It can also be collected at gateway devices, which are systems connected to multiple edge devices and may perform a myriad of functions including data filtering, aggregation, analytics or other processing, security and more. And, finally, data can be collected in private or public cloud servers for analytics at scale. See Figure 1.

Figure 1: IoT data collection
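To make the gateway role concrete, here is a minimal sketch of the filtering and aggregation step, assuming readings arrive as (device ID, value) pairs. The function name `aggregate_readings` and the range bounds are hypothetical, chosen only for illustration; a real gateway would also handle security, timestamps and transport.

```python
from collections import defaultdict
from statistics import mean


def aggregate_readings(readings, low, high):
    """Drop out-of-range readings, then summarize what remains per device.

    `readings` is an iterable of (device_id, value) pairs. `low` and `high`
    are hypothetical plausibility bounds for the sensor being read.
    """
    per_device = defaultdict(list)
    for device_id, value in readings:
        # Filtering: discard values the sensor could not plausibly report.
        if low <= value <= high:
            per_device[device_id].append(value)

    # Aggregation: forward one small summary per device instead of raw data.
    return {
        device_id: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for device_id, values in per_device.items()
    }


if __name__ == "__main__":
    sample = [("t1", 20.0), ("t1", 22.0), ("t1", 900.0), ("t2", 18.0)]
    print(aggregate_readings(sample, low=-40.0, high=125.0))
```

Sending a summary upstream rather than every raw reading is one way a gateway can reduce the volume of data the cloud has to ingest.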

A database system for deployment on an edge device has requirements that are quite distinct. For one thing, edge devices are often resource-constrained, for example, operating with a relatively slow CPU, limited memory and/or without persistent storage. This means that a database system for such a target should be lean, and when the target lacks persistent storage, the database needs to operate entirely in memory. Being lean is important: Having a long execution path is anathema to a slow CPU. When the target has both limited memory and no persistent media, it puts a premium on both a small code size and highly efficient use of storage space.
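As a rough illustration of what "lean" and "entirely in memory" can mean, the sketch below keeps the most recent readings in a fixed-capacity ring buffer that is allocated once and never grows. The class name `RingStore` and its methods are hypothetical, and Python is used only for readability; an actual edge database would more likely be written in C or a similar language and would offer far more than this.

```python
from array import array


class RingStore:
    """Fixed-capacity, in-memory store for (timestamp, value) readings.

    All storage is allocated up front, so memory use is bounded and known
    in advance. When the store is full, the oldest reading is overwritten.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Typed arrays hold raw 64-bit integers and doubles compactly.
        self._timestamps = array("q", [0] * capacity)
        self._values = array("d", [0.0] * capacity)
        self._next = 0   # slot the next reading will be written to
        self._count = 0  # number of valid readings currently stored

    def append(self, timestamp, value):
        self._timestamps[self._next] = timestamp
        self._values[self._next] = value
        self._next = (self._next + 1) % self._capacity
        self._count = min(self._count + 1, self._capacity)

    def __len__(self):
        return self._count

    def latest(self, n):
        """Return up to n most recent readings, newest first."""
        n = min(n, self._count)
        result = []
        for i in range(1, n + 1):
            idx = (self._next - i) % self._capacity
            result.append((self._timestamps[idx], self._values[idx]))
        return result
```

The design choice worth noting is that both the write path and the memory footprint are constant: an append touches two array slots and performs no allocation, which is the kind of short execution path a slow CPU benefits from.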

Because edge devices often have resource constraints, every component of the technology stack has to be selected with this in mind. Another consideration for database systems is the choice of operating system. There is an abundance of embedded/real-time operating systems that are more appropriate for resource-constrained edge devices than Linux and Windows, which dominate the gateway and server markets. It is also still relatively common (too common, I would posit) for creators of edge devices to embrace the “roll your own” approach and write a rudimentary operating system or a simple round-robin scheduler. A database system meant to run on embedded/real-time and homegrown operating systems cannot make assumptions about the availability of services such as memory management and interprocess communication.

At the other end of the spectrum, a database system in the cloud (again, I make no distinction between public and private clouds) has very different requirements. A database system in an edge device needs to fit within the constraints of the device, but it does not typically need to deal with a large volume of data or with complex analytics. In the cloud, however, data can both be large in volume and arrive quickly. The server database system is receiving data from all the edge devices within that IoT system, either directly or, more likely, through gateways. Depending on the nature of the IoT system, edge devices and gateways can produce high volumes of data in short amounts of time. Therefore, cloud database systems, as well as the overall system architecture, must be designed for fast ingestion of data. Taking it a step further, ingesting data quickly into an empty database is easy. Ingesting data quickly into a database that already holds 10 or 100 terabytes is quite a different thing. Cloud database systems must be able to sustain performance while scaling.
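One common technique for fast ingestion is to write readings in batches, one transaction per batch, rather than one transaction per reading. The sketch below shows the idea using Python's built-in `sqlite3` module purely as a stand-in; SQLite is not a cloud database, and the table layout, the function name `ingest_batches` and the batch size are assumptions made for illustration. It says nothing about sustaining performance at terabyte scale.

```python
import sqlite3

INSERT_SQL = "INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?)"


def ingest_batches(conn, rows, batch_size=1000):
    """Insert (device_id, ts, value) rows in batches; return rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device_id TEXT, ts INTEGER, value REAL)"
    )
    total = 0
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            # `with conn` commits the whole batch as a single transaction.
            with conn:
                conn.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        with conn:
            conn.executemany(INSERT_SQL, batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    readings = ((f"device-{i % 3}", i, i * 0.5) for i in range(2500))
    print(ingest_batches(connection, readings))
```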

There are two aspects of scaling: vertical and horizontal. Vertical scaling is the ability of a database system to handle the growing size of a single physical database. Horizontal scaling is the ability of a database system to spread (distribute) a single logical database across multiple physical databases. Elastic scalability is the ability to increase the number of physical databases (often called shards) in a logical database. The separation of logical and physical databases is paramount: The concept of a logical database isolates client applications from the physical topology. In other words, a client application should not be concerned with whether a logical database is physically implemented as 1, 10 or 20 shards today, or as 30 shards tomorrow once the database has scaled. In summary, a cloud database system should be able to scale vertically (handle the growth in size of a single database or shard) and horizontally (add shards to a logical database to maintain performance at scale).
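The separation of logical and physical databases can be sketched in a few lines. In the hypothetical `LogicalDatabase` class below, clients call only `put` and `get`; which shard holds a key is decided internally by hashing the key. Each shard is modeled as a plain dictionary, which is an assumption for illustration only, and the rebalancing in `add_shard` is deliberately naive.

```python
import zlib


class LogicalDatabase:
    """A single logical key-value database spread across several shards."""

    def __init__(self, shard_count):
        if shard_count <= 0:
            raise ValueError("shard_count must be positive")
        # Each dict stands in for one physical database (shard).
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash().
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

    def shard_count(self):
        return len(self._shards)

    def add_shard(self):
        """Grow the logical database by one shard and redistribute keys."""
        items = [item for shard in self._shards for item in shard.items()]
        self._shards = [{} for _ in range(len(self._shards) + 1)]
        for key, value in items:
            self.put(key, value)


if __name__ == "__main__":
    db = LogicalDatabase(shard_count=2)
    db.put("turbine-17", 42.0)
    db.add_shard()               # physical topology changes...
    print(db.get("turbine-17"))  # ...but the client code does not
```

Because hashing modulo the shard count relocates most keys whenever a shard is added, a production system would likely use a scheme such as consistent hashing to limit data movement. The point of the sketch is only that the client-facing interface stays the same as the number of shards changes.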

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.