The IoT: Let’s Get Ready to Wrangle!
Posted on Jul 20, 2016 10:55PM
I started by writing about “The Five Things That We Never Talk About When We Talk About Sensors And Sensor Data” as a reality check on some of the more breathless commentaries about the Internet of Things (IoT). Now, on to the third thing on that list.
The data produced by sensors comes in many shapes and sizes – but at its simplest, it typically takes the form of a series of time-stamped measurements (“time-series” data). And not all of these data points are created equal.
For some applications, some of the time, detecting the important data points is relatively straightforward. For example, we may need to understand when the data from a particular device exceeds a certain threshold value.
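As a minimal sketch of this simple case, assuming readings arrive as (timestamp, value) pairs and that only upward crossings matter, threshold detection might look like the Python below. The function name `threshold_crossings`, the sample readings and the 85.0 threshold are hypothetical, chosen purely for illustration.

```python
from datetime import datetime
from typing import List, Tuple

# A reading is a (timestamp, value) pair.
Reading = Tuple[datetime, float]


def threshold_crossings(readings: List[Reading], threshold: float) -> List[datetime]:
    """Return the timestamps at which the value rises above `threshold`.

    Only upward crossings are reported, so a long run of high readings
    produces one alert rather than one per sample.
    """
    crossings: List[datetime] = []
    previously_above = False
    for ts, value in sorted(readings):  # tuples sort by timestamp first
        above = value > threshold
        if above and not previously_above:
            crossings.append(ts)
        previously_above = above
    return crossings


# Hypothetical sample: five-minute readings from one device.
readings = [
    (datetime(2016, 7, 20, 10, 0), 71.0),
    (datetime(2016, 7, 20, 10, 5), 86.5),
    (datetime(2016, 7, 20, 10, 10), 88.0),
    (datetime(2016, 7, 20, 10, 15), 74.2),
    (datetime(2016, 7, 20, 10, 20), 91.3),
]
print(threshold_crossings(readings, threshold=85.0))
```

Here alerts fire at 10:05 and 10:20; the 10:10 reading, although above the threshold, does not trigger a second alert because the device never dropped back below it in between.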
In other scenarios, the “signal” that we care about may be subtler. In one case I remember, the pattern of readings “low – medium – low” turned out to be an important predictor. The significance of that apparently innocuous pattern was not anticipated by anyone involved with the project at its outset.
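One way to look for a shape like “low, medium, low” is to map each reading onto a coarse band, merge consecutive repeats, and then scan for the pattern. The sketch below assumes made-up band boundaries (`low_max`, `medium_max`) and hypothetical helper names; it deliberately ignores how long each band lasts, which a real application might well need to take into account.

```python
from typing import List, Sequence, Tuple


def to_band(value: float, low_max: float, medium_max: float) -> str:
    """Map a raw reading onto a coarse band; the boundaries are hypothetical."""
    if value <= low_max:
        return "low"
    if value <= medium_max:
        return "medium"
    return "high"


def collapse_runs(bands: Sequence[str]) -> List[Tuple[str, int]]:
    """Merge consecutive identical bands, keeping the index where each run starts."""
    runs: List[Tuple[str, int]] = []
    for i, band in enumerate(bands):
        if not runs or runs[-1][0] != band:
            runs.append((band, i))
    return runs


def find_pattern(values: Sequence[float], pattern: Sequence[str],
                 low_max: float, medium_max: float) -> List[int]:
    """Return the indices in `values` where the banded `pattern` begins."""
    if not pattern:
        return []
    runs = collapse_runs([to_band(v, low_max, medium_max) for v in values])
    labels = [band for band, _ in runs]
    n = len(pattern)
    return [runs[i][1] for i in range(len(runs) - n + 1)
            if labels[i:i + n] == list(pattern)]


# Hypothetical readings from a single sensor.
values = [2.0, 2.5, 5.0, 5.5, 1.8, 9.0, 3.0, 6.0, 2.2]
print(find_pattern(values, ["low", "medium", "low"], low_max=3.0, medium_max=7.0))
```

In this sample the pattern starts at indices 0 and 6; the high reading at index 5 breaks the sequence in between. Collapsing runs means that “low, low, medium, low” counts as a match too, which is one design choice among several.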
And so the third thing about sensor data is this: extracting useful signal from noisy time-series data takes a lot of work and, oftentimes, extra data.
Did you know that the first sensor data analytic applications are more than a decade old? Studying them can tell us a lot about both the challenges and the opportunities associated with the next generation of sensor data analytic applications.
Expensive, and often safety-critical, systems like aero engines, trains and electricity distribution networks have long been instrumented. And at Teradata, we’ve used the data produced by those sensors to build preventative maintenance solutions for customers across multiple geographies and in multiple industries.
We’ve helped the U.S. Air Force increase the readiness of its helicopter fleet by between 5 and 8 percent. We’ve helped Union Pacific Railroad cut wagon bearing-related derailments by 75 percent. We help one of the leading manufacturers of paper mills avoid the mother of all paper jams in its mile-long plants. We help keep the lights on by predicting failures in the electricity distribution network before they occur. And we help a European train operator figure out when trains will fail up to 36 hours before they actually do. More on that in a moment.
The interesting thing about these different projects is that, despite their apparent diversity, from 50,000 feet they actually look remarkably similar. Certainly, these projects depend on the availability of relevant sensor data – temperature, pressure, vibration, and lots of other measurements.
But, typically, we first need to pre-process the raw sensor data to identify important changes in state – “events” – which requires time-series analytics.
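A common time-series technique for turning noisy readings into discrete state changes is a threshold with hysteresis: enter an alarm state above one level, and leave it only after dropping below a lower one. The sketch below illustrates that general idea; it is not a description of how the projects above were built, and the state names, thresholds and integer timestamps (seconds) are assumptions.

```python
from typing import Iterable, List, Tuple


def state_change_events(readings: Iterable[Tuple[int, float]],
                        enter_above: float,
                        exit_below: float) -> List[Tuple[int, str]]:
    """Convert (timestamp, value) readings into (timestamp, new_state) events.

    The device enters ALARM only when a reading exceeds `enter_above` and
    returns to NORMAL only when a reading drops below `exit_below`, so noise
    hovering around a single level does not produce a burst of events.
    """
    if exit_below > enter_above:
        raise ValueError("exit_below must not be greater than enter_above")
    state = "NORMAL"
    events: List[Tuple[int, str]] = []
    for ts, value in readings:  # assumes readings are already in time order
        if state == "NORMAL" and value > enter_above:
            state = "ALARM"
            events.append((ts, state))
        elif state == "ALARM" and value < exit_below:
            state = "NORMAL"
            events.append((ts, state))
    return events


# Hypothetical temperature readings, timestamps in seconds.
readings = [(0, 70.0), (10, 81.0), (20, 79.0), (30, 82.0),
            (40, 74.0), (50, 69.0), (60, 83.0)]
print(state_change_events(readings, enter_above=80.0, exit_below=75.0))
```

With a single threshold at 80, the dip to 79 at t=20 would have produced two extra events (a drop at 20 and a rise at 30); the hysteresis band absorbs it and yields three events instead of five.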
Then we need to label the sensor data in order to “train” a supervised predictive model, which often requires that we use text analytics to extract fault and resolution details from engineering reports – and may additionally require us to supplement the sensor data with operations data.
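A heavily simplified sketch of the labelling step follows. It assumes engineering reports are short free-text notes tagged with a unit ID and a time in hours, uses a hypothetical keyword table as a stand-in for real text analytics, and labels a sensor event 1 if a fault is reported for the same unit within a chosen horizon afterwards. The 36-hour horizon in the example simply echoes the train-operator figure mentioned earlier and is otherwise arbitrary.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from words in engineers' notes to fault categories.
# A real project would use proper text analytics rather than a keyword table.
FAULT_KEYWORDS: Dict[str, str] = {
    "bearing": "BEARING",
    "overheating": "THERMAL",
    "brake": "BRAKE",
}


def extract_fault(report_text: str) -> Optional[str]:
    """Return the first fault category mentioned in a free-text report, or None."""
    for word in re.findall(r"[a-z]+", report_text.lower()):
        if word in FAULT_KEYWORDS:
            return FAULT_KEYWORDS[word]
    return None


def label_events(events: List[Tuple[str, float]],
                 reports: List[Tuple[str, float, str]],
                 horizon_hours: float) -> List[Tuple[str, float, int]]:
    """Label each (unit_id, time_hours) event 1 if a fault is reported for the
    same unit within `horizon_hours` after the event, otherwise 0."""
    fault_times: Dict[str, List[float]] = {}
    for unit, t, text in reports:
        if extract_fault(text) is not None:
            fault_times.setdefault(unit, []).append(t)
    labelled = []
    for unit, t in events:
        hit = any(0 <= f - t <= horizon_hours for f in fault_times.get(unit, []))
        labelled.append((unit, t, int(hit)))
    return labelled


# Hypothetical reports and sensor events (times in hours).
reports = [
    ("T101", 100.0, "Axle bearing found overheating, replaced"),
    ("T102", 50.0, "Routine inspection, no issues"),
]
events = [("T101", 70.0), ("T101", 40.0), ("T102", 45.0)]
print(label_events(events, reports, horizon_hours=36.0))
```

Only the T101 event at hour 70 falls within 36 hours of the reported bearing fault, so it is the only one labelled 1.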
When we have labelled event data, understanding the sequence of events that leads to a particular condition (the “path to failure”) is normally important – which requires path analytics.
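For illustration, path analytics can be approximated by collecting the last `k` events that precede each failure and counting how often each sequence recurs across units. The event names, the `FAILURE` marker and `k = 2` in this sketch are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def paths_to_failure(sequences: Dict[str, List[str]], failure: str, k: int) -> Counter:
    """Count the k-event paths that immediately precede each `failure` event.

    `sequences` maps a unit ID to its events in time order. Failures with
    fewer than k earlier events are skipped.
    """
    paths: Counter = Counter()
    for events in sequences.values():
        for i, event in enumerate(events):
            if event == failure and i >= k:
                paths[tuple(events[i - k:i])] += 1
    return paths


# Hypothetical labelled event sequences for three units.
sequences = {
    "T101": ["DOOR_FAULT", "TEMP_HIGH", "VIBRATION", "FAILURE"],
    "T102": ["TEMP_HIGH", "VIBRATION", "FAILURE", "TEMP_HIGH"],
    "T103": ["DOOR_FAULT", "VOLTAGE_DIP", "FAILURE"],
}
print(paths_to_failure(sequences, "FAILURE", k=2).most_common(1))
```

In this toy data, “TEMP_HIGH then VIBRATION” precedes two of the three failures, which is the kind of recurring path an analyst would then investigate.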
And because understanding associations and relationships – between different events and different components – is often vital, we typically also need to apply graph and affinity analytics to understand which variables are likely to be predictive of the target that we are trying to model. And so on.
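One standard affinity measure is lift, which compares how often two events occur together with how often they would co-occur if they were independent. The sketch below assumes events have already been grouped into “baskets” (for example, all events seen on one unit in one day); that grouping and the event names are assumptions. Pairs with lift well above 1 are natural candidates for weighted edges in an event graph.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pair_lift(baskets: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Compute lift for every pair of events that co-occur in at least one basket.

    lift(a, b) = P(a and b) / (P(a) * P(b)). Values above 1 suggest the two
    events appear together more often than independence would predict.
    """
    n = len(baskets)
    single: Counter = Counter()
    pair: Counter = Counter()
    for basket in baskets:
        single.update(basket)
        pair.update(combinations(sorted(basket), 2))  # each unordered pair once
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


# Hypothetical baskets: the set of events seen on one unit in one day.
baskets = [
    {"TEMP_HIGH", "VIBRATION"},
    {"TEMP_HIGH", "VIBRATION", "DOOR_FAULT"},
    {"DOOR_FAULT"},
    {"VOLTAGE_DIP"},
]
for edge, lift in sorted(pair_lift(baskets).items(), key=lambda kv: -kv[1]):
    print(edge, round(lift, 2))
```

Here TEMP_HIGH and VIBRATION have a lift of 2.0, while DOOR_FAULT pairs with each of them at exactly 1.0, i.e. no more often than chance.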
The predictive models that underpin many analytic IoT projects are often based on relatively simple analytic techniques, like logistic regression, decision trees and random forests. But the process of creating a useful analytic dataset (the fuel that we feed those methods) from raw sensor data is often anything but simple.
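To make the “analytic dataset” concrete, here is a minimal sketch that pivots per-window event lists into a count matrix `X` and a label vector `y`, the tabular shape that logistic regression, decision trees and random forests typically expect. The window IDs, event types and labels are invented for the example, and counting events per window is only one of many possible feature choices.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_feature_table(windows: Dict[str, List[str]],
                        labels: Dict[str, int],
                        event_types: Sequence[str]) -> Tuple[List[List[int]], List[int]]:
    """Turn per-window event lists into a count matrix X and a label vector y.

    Each row is one observation window (e.g. one unit over one day); each
    column counts how often one event type occurred in that window.
    """
    X: List[List[int]] = []
    y: List[int] = []
    for window_id in sorted(windows):  # fixed row order for reproducibility
        counts = Counter(windows[window_id])
        X.append([counts[e] for e in event_types])
        y.append(labels[window_id])
    return X, y


# Hypothetical windows, labels (1 = failure followed) and feature columns.
windows = {
    "T101-day1": ["TEMP_HIGH", "TEMP_HIGH", "VIBRATION"],
    "T102-day1": ["DOOR_FAULT"],
}
labels = {"T101-day1": 1, "T102-day1": 0}
event_types = ["DOOR_FAULT", "TEMP_HIGH", "VIBRATION"]
X, y = build_feature_table(windows, labels, event_types)
print(X, y)
```

With scikit-learn installed, `X` and `y` could then be passed to, for example, `sklearn.linear_model.LogisticRegression().fit(X, y)`; a real model would of course need far more rows and features than this toy table.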
This is the second part of a four-part series demystifying how sensor data and the Internet of Things should be managed and interpreted.
Ready to change your perspective on IoT? Take a look at “Changing the Game with IoT” as seen in the Summer 2016 issue of MIT Sloan Management Review.