Data scientist has been a trending title for analytics professionals over the last couple of years, and it’s no surprise: a recent Glassdoor report found that the data science field offers not only a good work-life balance, but also hefty pay and plenty of opportunity. This has not gone unnoticed within the industry. The annual Open Data Science Conference took place recently and featured a number of engaging panel discussions on where the data science industry is heading.
Data science is all the hype
The distinction between data science, artificial intelligence (AI), machine learning and deep learning has become blurred. One panel observed that AI has quickly become today’s favored buzzword, and that organizations are restructuring their infrastructures to handle the increase in data volume that AI devices and machines will generate.
Yet all the hype sometimes gives way to interesting changes in the market. For instance, companies like Uptake in Chicago tout their ability to provide “disruptive transformation” to help more established businesses better deal with massive amounts of IoT data. This, coupled with high-end data science capabilities, can yield insights that make a very real difference for those companies.
But companies still need to consider whether to outsource the generation of such insights or to develop these capabilities in-house, since no one knows their business and its needs better than they do.
Coping with change
The history of the technology market is filled with examples of how markets cope with change brought on by innovation. As relational databases became the norm and the need for greater speed and flexibility placed higher demands on programmers, the advent of PowerBuilder and other “fourth-generation languages” in effect made more people “programmers” by abstracting some of the difficulty from the process.
This is the progression we are seeing with data scientists, but it in no way lessens the importance of the data scientist. It does, however, point to a number of technology innovations that both lighten the load and lower the barriers to entry so more people can engage in this work.
Tools like Logi Analytics’ “data scientist in a box” are aimed at making the process easier. Moreover, certain accommodations with profound implications, such as using statistical models to generate high-value approximate answers, can help quickly yield insight into increasingly large and complex data. Computing exact answers over data at scale demands ever more time and resources, yet many questions simply do not require an exact answer, and that is where approximation pays off.
For this reason, innovative companies are looking to layer statistical models over their data lakes to provide high-value approximation: an approach that yields fast, actionable answers from data while relieving demands on network capacity.
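As a concrete, entirely hypothetical sketch of one kind of statistical model maintained alongside a data lake, a fixed-size reservoir sample can summarize an unbounded stream of records so that later questions are answered from the sample alone. The stream, sizes and sensor values below are invented for illustration and do not come from any specific product:

```python
# Hypothetical sketch: maintain a fixed-size uniform sample of a stream
# so aggregate questions can later be answered without the raw data.
import random

random.seed(7)  # fixed seed so the sketch is reproducible

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length
    (Algorithm R). Each item ends up in the reservoir with probability k/n."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)  # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# A million simulated streaming sensor readings, summarized by just 1,000.
stream = (random.gauss(50.0, 10.0) for _ in range(1_000_000))
sample = reservoir_sample(stream, 1_000)
approx_mean = sum(sample) / len(sample)
print(f"approximate mean from a 1,000-item reservoir: {approx_mean:.2f}")
```

The point of the design is that the reservoir is built in a single pass at ingest time, so the expensive raw stream never has to be replayed to answer a later aggregate question.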
Balancing volume is key
We have come to think that more data is better, and in some cases that makes sense. But consider a 1,000-room building with eight sensors in each room, each taking a 24 KB reading once a second: that works out to roughly 16.6 TB of messages a day. Simple edge processing that filters out the inconsequential messages from the consequential ones could reduce that daily volume to about 100 GB, less than 1% of the raw total.
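The arithmetic above can be checked in a few lines (assuming decimal units, i.e., 1 TB = 10^9 KB; the 100 GB figure for the filtered volume is taken from the example, not derived):

```python
# Back-of-the-envelope check of the edge-filtering numbers.
sensors = 1_000 * 8            # 1,000 rooms x 8 sensors per room
reading_kb = 24                # one 24 KB reading per second per sensor
seconds_per_day = 24 * 60 * 60

daily_kb = sensors * reading_kb * seconds_per_day
daily_tb = daily_kb / 1e9      # decimal units: 1 TB = 10^9 KB
print(f"raw volume: {daily_tb:.2f} TB/day")   # ~16.59 TB

filtered_gb = 100              # volume remaining after edge filtering
fraction = filtered_gb / (daily_tb * 1_000)   # 1 TB = 1,000 GB
print(f"kept: {fraction:.2%} of the raw volume")  # well under 1%
```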
But not all data can be collapsed so easily. There will be instances with mountains of data from a variety of sources where insight can be gained through proper levels of exploration and correlation.
For instance, a fast-food restaurant owner might want to combine data coming from all different sources — e.g., the IoT-enabled fryer, cooler, lighting system, HVAC system and inventory, as well as city-supplied vehicle traffic data, etc. — to gain insights as to how to optimize their operations under certain conditions.
Utilizing statistical models for high-value approximation can turn a multi-hour query running on 100 or more nodes into a 10-second, single-node approximate query that delivers equivalent insight. This approach is certainly not for every use case, but data scientists in particular will benefit from it more and more over time as it is applied to both machine learning and basic exploratory analytics.
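A minimal sketch of the underlying idea: a uniform sample trades a full scan for a fast approximate answer with an error bound attached. The in-memory list below is an invented stand-in for a distributed table, and the numbers are illustrative only:

```python
# Hypothetical sketch: approximate aggregation by sampling instead of
# scanning every record. All data here is synthetic.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Pretend this list is a huge table of sensor readings in a data lake.
readings = [random.gauss(70.0, 5.0) for _ in range(1_000_000)]

def exact_mean(data):
    """Full scan: touches every record."""
    return sum(data) / len(data)

def approximate_mean(data, sample_size=10_000):
    """Uniform random sample: touches ~1% of the records."""
    sample = random.sample(data, sample_size)
    mean = sum(sample) / sample_size
    # The standard error gives a confidence band around the estimate.
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    stderr = (variance / sample_size) ** 0.5
    return mean, stderr

full = exact_mean(readings)
approx, err = approximate_mean(readings)
print(f"exact={full:.3f}  approx={approx:.3f} +/- {1.96 * err:.3f}")
```

The sample answer lands within a fraction of a percent of the exact one while doing about 1% of the work, which is the trade the approximate-query approach described above is making at cluster scale.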
The tidal wave of IoT, AI and machine-generated data streaming in from smart devices, sensors, monitors and meters (to name a few) is testing the capabilities of traditional database technologies, and big data analytics continues to grow at a tremendous rate as a result. It is no wonder the data scientist remains one of the most in-demand jobs as companies look for highly skilled individuals with an open mind, creative zest and the ability to use different techniques to mine through data.
Emerging technologies will not make everyone a data scientist, but they will help make data scientists much more productive and create opportunities for some of this work to be shared by those now on the sidelines.
All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.