I’ve been intrigued of late by the challenge of managing the big data needs of autonomous cars. The driverless car has been a tech dream for decades, and now that broadband connectivity, cloud computing and artificial intelligence are increasingly available, I believe we’ll see autonomous cars going mainstream in the near future, provided certain technical and regulatory milestones are reached.
Which brings me to the problem of big data and self-driving cars. Autonomous cars generate a staggering amount of data; Intel CEO Brian Krzanich estimated one car generates 4 terabytes of data in eight hours of operation. Multiple image, radar/lidar, time-of-flight, accelerometer, telemetry and gyroscope sensors generate data streams that must be analyzed to perform the calculations and adjustments required to safely navigate a car. That analysis needs to happen in real time if the car is to keep up with constantly changing driving conditions (other cars or pedestrians moving around the vehicle, changing weather, traffic signs, etc.). These real-time performance requirements mean there’s no time to upload data to a central server, conduct the necessary analytics and then send instructions back to the car for execution. Data that is critical to safely navigating the car must be analyzed locally by the car itself (essentially, the car is an edge device in a cloud network). Not only does the car need to analyze data on its own, it must also learn to pick and choose between different data streams to identify the ones best suited for analysis at any given moment to keep the car driving safely.
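To put that estimate in perspective, a quick back-of-the-envelope calculation converts it into a sustained data rate. This is only a rough sketch: it assumes decimal terabytes (10^12 bytes) and data produced at a uniform rate, neither of which is stated in the original estimate.

```python
# Back-of-the-envelope: sustained data rate implied by 4 TB per 8 hours.
# Assumptions (not part of the original estimate): decimal units, uniform rate.
TERABYTE = 10**12  # bytes


def sustained_rate_bytes_per_sec(total_bytes: float, hours: float) -> float:
    """Average bytes per second over the given period."""
    return total_bytes / (hours * 3600)


rate = sustained_rate_bytes_per_sec(4 * TERABYTE, 8)
print(f"{rate / 1e6:.0f} MB/s, or {rate * 8 / 1e9:.2f} Gbit/s")
# prints "139 MB/s, or 1.11 Gbit/s"
```

Under those assumptions, a single car would produce roughly 139 MB every second, or more than a gigabit per second sustained. That helps illustrate why streaming everything to a central server for real-time analysis is impractical.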
That last requirement, the need to determine what data is required to perform an analysis, is tricky. While predefined filters can help a car’s machine learning routines learn what data to use and when to use it, those filters can’t be updated in real time. Accordingly, an autonomous car will need to run machine learning and analytics engines powerful enough to recognize mission-critical data requiring immediate analysis and action.
We need analytics and machine learning algorithms for autonomous cars that can:
- Identify data in all formats.
- Recognize what data is required for mission-critical operations and perform analysis of that data locally.
- Compress or aggregate non-critical data for uploading to the cloud for future use.
- Schedule uploads of non-critical data from the car to the cloud when less expensive communications are available (for example, when the car is parked overnight at home and can access the owner’s Wi-Fi instead of a metered cellular network).
- Know how to call for legacy data from the cloud so the AI can use it for future analytics.
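The middle three requirements above can be sketched in a few lines of Python. This is a minimal illustration, not a real automotive design: the `Reading` type, the `CRITICAL_SENSORS` set and the `"wifi"` network label are all hypothetical names introduced here, and the fixed sensor list merely stands in for the machine learning engine that would make this decision dynamically in practice.

```python
import zlib
from dataclasses import dataclass

# Hypothetical set of sensors treated as mission-critical. A real system
# would decide this dynamically rather than from a fixed list.
CRITICAL_SENSORS = {"lidar", "radar", "camera"}


@dataclass
class Reading:
    sensor: str
    timestamp: float
    payload: bytes


def triage(readings):
    """Split readings into (critical, deferred) lists."""
    critical = [r for r in readings if r.sensor in CRITICAL_SENSORS]
    deferred = [r for r in readings if r.sensor not in CRITICAL_SENSORS]
    return critical, deferred


def compress_batch(readings):
    """Concatenate and compress non-critical payloads for later upload."""
    return zlib.compress(b"".join(r.payload for r in readings))


def should_upload(network: str, parked: bool) -> bool:
    """Upload only when the car is parked and on an unmetered connection."""
    return parked and network == "wifi"


readings = [
    Reading("lidar", 0.00, b"\x01" * 64),
    Reading("cabin_temp", 0.01, b"21.5C" * 20),
    Reading("radar", 0.02, b"\x02" * 64),
]
critical, deferred = triage(readings)   # critical data is analyzed locally
batch = compress_batch(deferred)        # the rest is held for the cloud
if should_upload("cellular", parked=False):
    print(f"uploading {len(batch)} bytes")
else:
    print(f"holding {len(batch)} bytes until parked on Wi-Fi")
```

The point of the sketch is the separation of concerns: critical readings never wait on the network, while everything else is compressed and held until a cheap connection is available.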
The last bullet is particularly important. An autonomous car manufacturer will be responsible for storing vast amounts of data generated by cars operating around the world, and much of that data will likely have no real value when initially captured. However, that data’s value may be revealed in the future as the manufacturer’s autonomous driving applications evolve and improve. Today’s non-critical data can be useful for future applications, provided the data is properly stored and catalogued so it can be easily found. Without careful cataloguing of data as it’s captured, autonomous car vendors run the risk of creating a “dark data” problem. Dark data is the term used to describe data assets an organization collects but fails to take advantage of, either because it doesn’t know how to use them or because it has forgotten it has them. I believe this will be a particularly significant problem for self-driving cars because of the sheer volume of data they generate. And as more vendors enter the autonomous driving market, the ones that ultimately win out will be those that are best prepared to analyze data at the local level and that have catalogued their databases properly, so future autonomous applications can find the legacy data they need, when they need it.
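As a rough illustration of what cataloguing data as it’s captured could look like, the sketch below records a small metadata row for every uploaded batch so the batch can be found again later. Everything here is an assumption made for illustration: the schema, the function names and the `example://` storage locations are hypothetical, and an in-memory SQLite database simply stands in for whatever big data platform a manufacturer would actually use.

```python
import sqlite3

# Hypothetical catalogue: one metadata row per uploaded batch of sensor data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE batches (
           batch_id    TEXT PRIMARY KEY,
           vehicle_id  TEXT,
           sensor      TEXT,
           captured_on TEXT,  -- ISO 8601 date, so string order is date order
           storage_uri TEXT
       )"""
)


def register_batch(batch_id, vehicle_id, sensor, captured_on, storage_uri):
    """Record what a batch contains and where it is stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO batches VALUES (?, ?, ?, ?, ?)",
            (batch_id, vehicle_id, sensor, captured_on, storage_uri),
        )


def find_batches(sensor, start, end):
    """Return storage locations for one sensor type within a date range."""
    rows = conn.execute(
        "SELECT storage_uri FROM batches "
        "WHERE sensor = ? AND captured_on BETWEEN ? AND ? "
        "ORDER BY captured_on",
        (sensor, start, end),
    )
    return [row[0] for row in rows]


register_batch("b1", "car-001", "cabin_temp", "2018-01-05", "example://bucket/b1")
register_batch("b2", "car-001", "wiper_state", "2018-01-06", "example://bucket/b2")
register_batch("b3", "car-002", "cabin_temp", "2018-02-10", "example://bucket/b3")

print(find_batches("cabin_temp", "2018-01-01", "2018-01-31"))
```

Even a catalogue this simple means a future application can ask for “all cabin temperature data from January” without anyone needing to remember that the data exists, which is exactly the failure mode behind dark data.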
If you are new to HPCC Systems, I encourage you to check out the website to learn more about how HPCC Systems is an ideal platform for big data needs, including those required for autonomous car use cases.