Big data is a bit of a misnomer. Certainly, the volume of information coming from the Web, modern call centers and other data sources can be enormous. But the main benefit of all that data isn't in its size. It's not even in the business insights you can get by analyzing individual data sets in search of interesting patterns and relationships. To get true business intelligence from big data analytics applications, user organizations and BI and analytics vendors alike must focus on integrating and analyzing a broad mix of information -- in short, wide data.
Future business success lies in ensuring that the data in both big data and mainstream enterprise systems can be analyzed in a coherent and coordinated fashion. Numerous vendors are working on one possible means of doing so: products that provide SQL access to Hadoop repositories and NoSQL databases. The direction they're taking, particularly with SQL-on-Hadoop technologies, matters because far more people know SQL than know Hadoop.
Hadoop is a powerful technology for managing large amounts of unstructured data, but it's not so great for quickly running analytics applications, especially ones combining structured and unstructured data. Conversely, SQL has a long and successful history of enabling heterogeneous data sources to be accessed with almost identical calls. And the business analysts who do most of the work to provide analytics to business managers and the CxO suite typically are well versed in using SQL.
In addition, most users want evolutionary advances in technology, not revolutionary ones. That means intelligently incorporating the latest technologies into existing IT ecosystems to gain new business value as quickly and as smoothly as possible. The result: Information from Hadoop clusters, NoSQL systems and other new data sources gets joined with data from relational databases and data warehouses to build a more complete picture of customers, market trends and business operations. For example, customer sentiment data that can be gleaned from social networks and the Web is potentially valuable -- but its full potential won't be realized if it's compartmentalized away from data on customer leads and other marketing information.
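As a rough illustration of that kind of join, the sketch below loads a few semi-structured sentiment records into a SQL table and combines them with lead data in a single query. It is a minimal example under stated assumptions, not a recommended architecture: the table names, columns and sample records are all hypothetical, and an in-memory SQLite database stands in for whatever SQL layer sits over the Hadoop or NoSQL source.

```python
import json
import sqlite3

# Hypothetical semi-structured sentiment records, as they might be exported
# from a Hadoop or NoSQL store (one JSON document per line).
SENTIMENT_JSONL = """\
{"customer_id": 1, "source": "social", "score": 0.8}
{"customer_id": 1, "source": "web", "score": 0.4}
{"customer_id": 2, "source": "social", "score": -0.5}
"""


def build_wide_view(conn):
    """Join relational lead data with sentiment data and return the rows."""
    cur = conn.cursor()

    # Structured side: a hypothetical table of customer leads.
    cur.execute(
        "CREATE TABLE leads (customer_id INTEGER PRIMARY KEY, name TEXT, stage TEXT)"
    )
    cur.executemany(
        "INSERT INTO leads VALUES (?, ?, ?)",
        [(1, "Acme", "qualified"), (2, "Globex", "prospect"), (3, "Initech", "prospect")],
    )

    # Semi-structured side: parse each JSON document and load it into a table.
    cur.execute("CREATE TABLE sentiment (customer_id INTEGER, source TEXT, score REAL)")
    docs = [json.loads(line) for line in SENTIMENT_JSONL.splitlines() if line.strip()]
    cur.executemany(
        "INSERT INTO sentiment VALUES (:customer_id, :source, :score)", docs
    )

    # One SQL query spans both sources; leads without sentiment are kept.
    cur.execute(
        """
        SELECT l.name, l.stage, COUNT(s.score) AS mentions, AVG(s.score) AS avg_score
        FROM leads AS l
        LEFT JOIN sentiment AS s ON s.customer_id = l.customer_id
        GROUP BY l.customer_id
        ORDER BY l.customer_id
        """
    )
    return cur.fetchall()


rows = build_wide_view(sqlite3.connect(":memory:"))
for row in rows:
    print(row)
```

The LEFT JOIN keeps leads that have no sentiment records at all, which is often the point of a wide view: the gaps in coverage are visible alongside the matches.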
Don't ignore what sensors have to say
The Internet of Things (IoT) also needs to be taken into account. Sensors and other tracking devices are proliferating in products and industrial equipment, and they can send the operational data they capture back to corporate systems via the Internet. But many people have a mind-set that the IoT solely provides better command and control of machinery, as in remote sensors monitoring oil pipelines or gathering maintenance-related information from trucks, tractors and other vehicles.
Although such uses are important, even bigger issues are at stake. Looking for trends in massive amounts of sensor data can help users better identify and understand quality control issues, geographical differences in equipment performance and other critical factors for long-term planning. The information generated by the IoT is structured and, over time, likely will dwarf the data collected from the Web. Once again, a narrow focus on unstructured data will lead to organizations missing out on a valuable form of business information.
Architectural flexibility is called for, too. One reason data warehouses have never reached their theoretical potential is that leveraging real-time data is difficult when you have to massage the information and dump it into star schemas. At the same time, there's no need to manage either historical data or geospatial and other slowly changing data with the same urgency as real-time data demands. A smart BI and analytics platform will be able to handle both real-time and longer-latency data appropriately, combining data warehouses and big data systems as needed and using in-memory processing when appropriate.
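One way to picture latency-aware handling is a simple routing rule that places each data set according to how fresh it needs to be. The sketch below is only illustrative: the tier names, the latency thresholds and the sample data sets are assumptions made for the example, not features of any particular product.

```python
from datetime import timedelta

# Hypothetical tiers, ordered from most to least urgent. Each entry is
# (maximum tolerable latency, where data with that requirement is handled).
TIERS = [
    (timedelta(seconds=5), "in_memory"),
    (timedelta(hours=1), "big_data_store"),
]
DEFAULT_TIER = "data_warehouse"  # historical and slowly changing data


def choose_tier(max_latency):
    """Return the first tier whose limit covers the required latency."""
    for limit, tier in TIERS:
        if max_latency <= limit:
            return tier
    return DEFAULT_TIER


# Hypothetical data sets and the latency each one can tolerate.
requirements = {
    "sensor_alerts": timedelta(seconds=1),
    "web_clickstream": timedelta(minutes=15),
    "geospatial_reference": timedelta(days=30),
}
placements = {name: choose_tier(latency) for name, latency in requirements.items()}
print(placements)
```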
Big data flow needs to be governed
The bigger picture also involves how much data flows from source to viewer -- and how it gets there. The theorists who defined enterprise data warehousing shuddered at the growth of departmental and divisional data marts, decrying the lack of a "single version of the truth" and the difficulty of ensuring appropriate data governance. Ahhh, those were the days!
Now mobile devices and self-service BI tools have rapidly changed how widely information is delivered. When data reaches a smartphone, it's hard to control where it goes from there. Are only approved people seeing the information? Where's the audit trail? Effective BI and big data management is a matter not only of collecting and processing information, but also of governing the use of very diverse data sets by very dispersed business users.
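The two questions above, who is approved and where the audit trail is, can be made concrete with a small sketch. It assumes a deliberately simple role-based model; the class, the role names and the sample data are hypothetical, and a real deployment would rely on the governance features of its BI platform rather than hand-rolled code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GovernedDataset:
    """A data set that checks roles and records every read attempt."""

    name: str
    approved_roles: frozenset
    rows: list
    audit_log: list = field(default_factory=list)

    def read(self, user, role, device):
        allowed = role in self.approved_roles
        # Log the attempt before deciding, so denied requests are audited too.
        self.audit_log.append(
            {
                "when": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "role": role,
                "device": device,
                "dataset": self.name,
                "allowed": allowed,
            }
        )
        if not allowed:
            raise PermissionError(f"{user} ({role}) may not read {self.name}")
        return list(self.rows)  # hand back a copy, not the governed list itself


# Hypothetical data set, users and devices.
pipeline = GovernedDataset(
    name="sales_pipeline",
    approved_roles=frozenset({"analyst", "sales_manager"}),
    rows=[{"region": "EMEA", "open_deals": 12}],
)

data = pipeline.read("avery", "analyst", "smartphone")
try:
    pipeline.read("sam", "contractor", "tablet")
    denied = False
except PermissionError:
    denied = True

for entry in pipeline.audit_log:
    print(entry["user"], entry["device"], entry["allowed"])
```

Note that a check like this governs only the point of delivery. Once the rows are on a device, the log shows who received them, not where they went next, which is the harder problem the paragraph above describes.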
Yes, data volume is a technical concern, but the real problem is dealing with wide data -- pulling it together from a variety of sources, processing it and making it available to a large audience of end users for analysis and decision making. To support wide data environments, vendors need to concentrate on these key capabilities:
- Providing access to both structured and unstructured data types, and integration between them
- Enabling different data sets to be managed in different ways, according to latency requirements
- Supporting strong data governance models
The next generation of BI and analytics technologies must address the fact that the breadth and complexity of the data flowing into corporate systems matter more than its volume. The big data era isn't just about bigness. Width matters, too, and BI and analytics managers should make sure the vendors they work with understand that fact.
About the author:
David A. Teich is principal consultant at Teich Communications, a technology consulting and marketing services company. Teich has more than three decades of experience in the business of technology.