
In-memory HTAP computing: Create smart IIoT applications with continuous learning

To achieve maximum ROI, many IIoT use cases require decision models that update in real time. For example, a fleet of package-delivery drones must respond immediately to major, previously unseen changes in weather that demand new flight patterns, or to a sudden, unexpected rise in the failure rate of a motor component. Building a continuous learning system that can learn from such changes and adjust in real time demands unprecedented levels of computing performance.

Gartner has defined an architecture for achieving this: in-process hybrid transactional/analytical processing, or in-process HTAP. For a continuous learning system, in-process HTAP requires a framework capable of continuously updating machine learning or deep learning models as new data enters the system. In-memory computing is the most cost-effective way to power this continuous learning framework.

Consider the following IIoT continuous learning use cases that could be enabled by an affordable continuous learning system:

  • Long-haul trucks — A fleet of long-haul trucks may be affected by major, previously unseen changes in weather, or by changes to the road network: new road openings, construction closures or significant shifts in road conditions. A continuous learning system could incorporate the impact of these changes on current trucking results and produce an updated model that suggests better routes in real time, based on required arrival times, fuel costs, truck availability and more.
  • Mobile payments — 24-hour mobile access and payment systems have increased the potential for payment fraud. Detecting it requires a machine learning model that can incorporate new fraud vectors in real time. A continuous learning system could update this model with the very latest payments and fraud data, detecting emerging fraud strategies and preventing them from spreading.
  • Smart cities — One foundation of the smart city is the ability of self-driving cars to dramatically reduce traffic congestion. To accomplish this, a machine learning or deep learning model must analyze data from multiple sources — including traffic cams, weather stations, police reports, event calendars and more — to provide guidance to self-driving cars. When major changes occur, such as new roads opening, major construction projects launching or significant shifts in traffic patterns, a continuous learning system can quickly incorporate their impact into its model and immediately begin providing better guidance to self-driving vehicles.
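As a miniature illustration of the fraud-detection case above (not any vendor's framework), a continuous learning model can fold each new transaction into its statistics as the transaction arrives, so an emerging pattern is reflected immediately rather than after the next batch job. The class, feature and threshold below are illustrative assumptions:

```python
from dataclasses import dataclass
import math

@dataclass
class OnlineAnomalyDetector:
    """Illustrative sketch: incrementally tracks the running mean and
    variance of one transaction feature (Welford's algorithm) and flags
    outliers in real time, updating the model on every observation."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations
    threshold: float = 3.0   # flag values beyond 3 standard deviations

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def update(self, x: float) -> bool:
        """Score the new observation against the data seen so far,
        then fold it into the model -- learning never pauses."""
        anomalous = self.n > 1 and abs(x - self.mean) > self.threshold * self.std()
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = OnlineAnomalyDetector()
amounts = [20.0, 35.0, 28.0, 22.0, 31.0, 26.0, 24.0, 950.0]
flags = [detector.update(a) for a in amounts]  # only the 950.0 payment is flagged
```

A production fraud model would use many features and a richer algorithm, but the shape is the same: score, then update, on every event.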

In-process HTAP is essential for these continuous learning systems because it eliminates the inherent delay architected into the traditional model. Most organizations today have deployed separate transactional databases (OLTP) and analytical databases (OLAP). An extract, transform, load (ETL) process is used to periodically move the transactional data into the analytical database, introducing a time delay that prevents real-time model training. In-process HTAP combines the OLTP and OLAP capabilities into a single data store, eliminating the ETL delay.
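The architectural difference can be shown with a toy single-store sketch: because transactional writes land in the same in-memory structure the analytical queries read, a write is visible to analytics immediately, with no ETL batch window in between. `HTAPStore` and its methods are hypothetical names for illustration, not a real product's API:

```python
import statistics

class HTAPStore:
    """Toy in-process HTAP store: OLTP-style writes and OLAP-style
    queries operate on the one and only copy of the data, so there is
    no ETL step and no staleness window between the two workloads."""
    def __init__(self):
        self.rows = []  # single in-memory copy shared by both workloads

    def insert(self, row: dict) -> None:
        """Transactional (OLTP-style) write."""
        self.rows.append(row)

    def avg(self, column: str) -> float:
        """Analytical (OLAP-style) aggregate over live data."""
        return statistics.fmean(r[column] for r in self.rows)

store = HTAPStore()
store.insert({"sensor": "motor-7", "temp": 71.0})
store.insert({"sensor": "motor-7", "temp": 75.0})
current = store.avg("temp")  # sees both writes immediately: 73.0
```

In the traditional OLTP-ETL-OLAP design, the `avg` query would instead run against a separate analytical copy that is only as fresh as the last ETL run.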

Of course, the OLTP-ETL-OLAP model was originally established for a good reason. Attempting to analyze live transactional data could seriously impact operational performance. So the question becomes: How do you implement in-process HTAP without potentially impacting performance?

The answer is in-memory computing.

Achieving in-process HTAP with in-memory computing

Several in-memory computing technologies are required to achieve in-process HTAP and continuous learning systems for IIoT use cases:

  • In-memory data grid or in-memory database — An in-memory data grid, deployed on a cluster of servers that can run on-premises, in the cloud or in a hybrid environment, can use the cluster's entire available memory and CPU power for in-memory processing. An in-memory data grid is easily deployed between the data and application layers of existing applications — there's no need to rip and replace the existing database — and the cluster can be scaled out simply by adding nodes. By contrast, an in-memory database stores data in memory and provides RDBMS capabilities, including support for SQL and ACID transactions. However, it requires replacing the entire data layer of existing applications, making it appropriate primarily for new applications or for major re-architecting of existing ones.
  • Streaming analytics — A streaming analytics engine takes advantage of in-memory computing speed to manage the complexity around dataflow and event processing. This is critical to enabling users to query active data without impacting transactional performance. Support for machine learning tools, such as Apache Spark, may also be included.
  • Continuous learning framework — The continuous learning framework is based on machine learning and deep learning libraries that have been optimized for massively parallel processing. These optimized libraries — fully distributed and residing in memory — enable the system to parallel process machine learning or deep learning algorithms against the data residing in memory on each node of the in-memory computing cluster. This enables the machine learning or deep learning model to continuously incorporate new data, even at petabyte scale, without degrading performance.
  • Memory-centric architecture — A memory-centric architecture is vital at the scale of the IIoT use cases described above. It keeps the entire operational data set on disk, with only a user-defined subset maintained in memory, giving organizations the flexibility to choose their own optimal tradeoff between infrastructure costs and application performance. In production environments, a memory-centric architecture also supports fast recovery after a restart: because the data on disk can be accessed immediately, there is no need to wait for all data to be reloaded into memory before processing begins. Memory-centric architectures can be built on a distributed, ACID- and ANSI SQL-99-compliant disk store deployed on spinning disks, solid-state drives, flash, 3D XPoint or other storage-class memory technologies.

Successful IIoT applications across a range of industries will depend on achieving in-process HTAP — that is, real-time learning based on live data — at petabyte scale. Thanks to declining memory costs, mature open source in-memory computing platforms will likely remain the most cost-effective path to this continuous learning capability for the foreseeable future.

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.
