The volume of highly sensitive personal and IP data is growing exponentially with the rapid adoption of the internet of things. In a recent survey of enterprise IT development and architecture professionals by Database Trends and Applications, 44% of respondents report adoption of IoT, ranging from proof-of-concept stage, to use in one or more lines of business, to IoT being “part of our ongoing business strategy.”1
The IoT trend in turn is a major driver of the exploding growth of the Hadoop data lake where most IoT data lands. According to a TDWI report2 on a survey of 252 enterprise respondents worldwide, 53% have deployed a data lake on Hadoop and 24% have deployed on Hadoop in combination with a relational database management system. Top use cases include advanced analytics (data mining, statistics, complex SQL, machine learning), and data exploration and discovery. While the data lake is becoming more common, barriers to adoption include lack of security for Hadoop, lack of governance, and the breach and data privacy compliance risks posed by exposing personal data in analytics.
At first glance, the requirement to protect data privacy might seem to conflict with the goal of enabling big data analytics. Such analytics often involve digging into user behavior, customer transactions and detailed consumer demographics, and processing data in untrusted environments such as Hadoop, all of which could increase the risk of data exposure. Data privacy regulations mandate specific guidelines on the classes of data to be protected, including personal data, protected health information and financial data. IoT sensor data, geolocation codes, vehicle identification numbers (VINs) and IP addresses, along with many other data elements, can qualify as personal data under the General Data Protection Regulation (GDPR) when they can be linked to an identifiable individual.
GDPR: A game changer for usable protected data
The GDPR establishes the most stringent regulations to date to protect EU citizens and residents from privacy and data breaches. Multinational firms around the world, whether they have operations in the EU or not, are realizing that they process EU personal data and that this regulation therefore applies to them. The GDPR recommends pseudonymization and encryption as two mechanisms that can be used to protect personal data. To be usable by the business, however, the chosen protection must support two requirements: 1) the ability to decrypt the data when necessary, and 2) the ability to continue to run business processes on the encrypted data.
Format-preserving encryption (FPE), an innovation pioneered by HPE, protects data while maintaining its structure and context for application usability, and the protection persists with the data wherever it goes. This makes FPE a trustworthy and comprehensive data-centric approach to addressing the risk of inappropriate data exposure to users and applications. FPE protects data independently of the underlying platforms, whereas “system-centric” security controls do not extend or scale outside the IT system that enforces them. As a result, FPE enables analytics in the data lake while maintaining data privacy for compliance with the GDPR.
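To make the idea of preserving format concrete, here is a minimal sketch of a format-preserving cipher: a toy Feistel network over a fixed alphabet, keyed with HMAC-SHA256. It is not the product described in this article and not a standardized mode such as NIST FF1. The function names (toy_fpe_encrypt, toy_fpe_decrypt), the key and the sample VIN are assumptions made purely for illustration, and the construction has not been vetted for production use.

```python
import hashlib
import hmac

DIGITS = "0123456789"
# VIN characters: digits and capital letters except I, O and Q.
VIN_ALPHABET = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"


def _to_int(text: str, alphabet: str) -> int:
    """Read text as a number written in base len(alphabet)."""
    value = 0
    for ch in text:
        # alphabet.index raises ValueError for characters outside the alphabet.
        value = value * len(alphabet) + alphabet.index(ch)
    return value


def _to_text(value: int, width: int, alphabet: str) -> str:
    """Write value in base len(alphabet), padded to exactly `width` characters."""
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, len(alphabet))
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))


def _round_value(key: bytes, round_no: int, half: int, modulus: int) -> int:
    """Keyed pseudorandom round function built from HMAC-SHA256."""
    message = f"{round_no}:{modulus}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _split(text: str, alphabet: str, rounds: int):
    """Validate the input and split it into two numeric halves."""
    if len(text) < 2 or rounds % 2:
        raise ValueError("need at least 2 characters and an even round count")
    u = len(text) // 2
    v = len(text) - u
    return u, v, _to_int(text[:u], alphabet), _to_int(text[u:], alphabet)


def toy_fpe_encrypt(key: bytes, plaintext: str, alphabet: str = DIGITS, rounds: int = 10) -> str:
    """Encrypt so that the output has the same length and alphabet as the input."""
    u, v, a, b = _split(plaintext, alphabet, rounds)
    for i in range(rounds):
        modulus = len(alphabet) ** (u if i % 2 == 0 else v)
        a, b = b, (a + _round_value(key, i, b, modulus)) % modulus
    return _to_text(a, u, alphabet) + _to_text(b, v, alphabet)


def toy_fpe_decrypt(key: bytes, ciphertext: str, alphabet: str = DIGITS, rounds: int = 10) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    u, v, a, b = _split(ciphertext, alphabet, rounds)
    for i in reversed(range(rounds)):
        modulus = len(alphabet) ** (u if i % 2 == 0 else v)
        a, b = (b - _round_value(key, i, a, modulus)) % modulus, a
    return _to_text(a, u, alphabet) + _to_text(b, v, alphabet)


key = b"demo-key-not-for-production"   # hypothetical key
sample_vin = "1HGCM82633A004352"       # sample value for illustration only
protected = toy_fpe_encrypt(key, sample_vin, VIN_ALPHABET)
print(protected)                        # 17 characters drawn from VIN_ALPHABET
print(toy_fpe_decrypt(key, protected, VIN_ALPHABET) == sample_vin)  # True
```

Because the protected value has the same length and character set as the original, it fits into existing columns and validation rules, and the key holder can still recover the original when necessary.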
Case in point: A top automotive manufacturer
To address data privacy compliance for its customers, while enabling safe analytics on IoT-generated data in its Hadoop data lake, a major auto manufacturer is using FPE at a field level to protect in-car sensor data, VINs and geolocation data streaming from customers’ cars. The data is used for multiple purposes, including vehicle quality control. Engineers look at sensor data to identify potential problems in specific components or groups of vehicles, while data scientists run thousands of reports against vehicle data for internal research purposes. The company’s volumes of real-time data are predicted to grow to around 20 petabytes within just a couple of years. Data is protected by FPE prior to ingestion into the data lake (Hadoop and Teradata EDW). With FPE, this leading auto manufacturer is enabling analytics on vast amounts of data in its protected form, thus safely providing broader access for analytics, not only to its data scientists, but also to engineers, developers and other employees as BI objectives dictate.
The benefits of using the field-level encryption technology deployed by this manufacturer include:
- Referential integrity: encrypted data retains characteristics such as length and data type, so applications and systems require no changes to use it;
- The ability to perform almost all analytics on encrypted data without re-identifying it to its original form, which mitigates exposure of personal data and reduces the likelihood of triggering breach notification requirements; and as a result,
- Compliance with multiple data privacy regulations, including the GDPR.
All of this is achieved with a single enterprise-grade, scalable platform to protect sensitive personal and IoT data not only in the Hadoop data lake, but also across other systems and platforms.
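The benefits listed above can be sketched in a few lines: a field is protected before ingestion, and a group-by then runs on the protected values without re-identifying anything. This is only an illustration under stated assumptions. It uses deterministic keyed pseudonymization (HMAC-SHA256 mapped onto the VIN alphabet) as a simple, one-way stand-in for FPE; real FPE would additionally be reversible by the key holder. The record layout, field names, VINs and helper names are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# VIN characters: digits and capital letters except I, O and Q.
VIN_ALPHABET = "0123456789ABCDEFGHJKLMNPRSTUVWXYZ"


def pseudonymize(key: bytes, value: str, alphabet: str = VIN_ALPHABET) -> str:
    """Map value to a same-length string over `alphabet`, deterministically per key.

    One-way stand-in for FPE; intended for short fields such as 17-character VINs.
    """
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    for _ in range(len(value)):
        number, remainder = divmod(number, len(alphabet))
        chars.append(alphabet[remainder])
    return "".join(chars)


def protect_record(key: bytes, record: dict, sensitive_fields=("vin",)) -> dict:
    """Protect only the sensitive fields; all other fields pass through unchanged."""
    return {
        name: pseudonymize(key, value) if name in sensitive_fields else value
        for name, value in record.items()
    }


def faults_per_vehicle(records) -> Counter:
    """Count fault events per vehicle, keyed by whatever is in the 'vin' field."""
    counts = Counter()
    for record in records:
        if record["fault"]:
            counts[record["vin"]] += 1
    return counts


key = b"demo-key-not-for-production"  # hypothetical key
raw_events = [                         # made-up sensor events
    {"vin": "1HGCM82633A004352", "component": "brake", "fault": True},
    {"vin": "1HGCM82633A004352", "component": "brake", "fault": True},
    {"vin": "JH4KA8260MC000000", "component": "battery", "fault": False},
    {"vin": "5YJSA1DG9DFP14705", "component": "battery", "fault": True},
]

# Protection happens before the data reaches the data lake.
lake_events = [protect_record(key, event) for event in raw_events]

# Analysts work only with protected VINs, yet the per-vehicle counts are unchanged.
print(faults_per_vehicle(lake_events).most_common(1))
```

Because the same VIN always maps to the same protected value under a given key, joins and aggregations across tables still line up, which is the referential integrity the first bullet describes.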
The best of both worlds with usable security
The need to comply with data privacy regulations worldwide is driving organizations to adopt FPE to protect customer personal data at the field level. With this data-centric approach, analytics can be performed on the data in its protected form, with context maintained, so that the organization can still extract value in the form of analytic insights. Recent advances in FPE enable enterprises to deploy highly scalable data protection for environments such as the Hadoop data lake, as well as for other vulnerable systems and applications deployed across the cloud. This technology gives an organization a template for rolling out data protection across other applications, platforms and systems, enabling a framework that adapts to rapidly changing hybrid IT environments.
1 “Internet of Things Market Survey” by John O’Brien, CEO Radiant Advisors, with Database Trends and Applications
2 “Data Lakes: Purposes, Practices, Patterns, and Platforms” by Philip Russom, Senior Research Director for Data Management, TDWI, The Data Warehousing Institute
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.