Data management for driverless cars

The complexity of gathering, processing, and storing vast amounts of data for autonomous vehicles mirrors the sophisticated data used in building cars.

September 22, 2015

The freedom from driving oneself has been a cornerstone of luxury from the sedan chair to the limousine. Fusing such luxury with the freedom of the automobile and offering it to a mass market is a dream that requires little elaboration, but one that eluded the labor-saving advancements and ambitions of the 20th century. While driving is almost second nature to human beings and can be taught to any adolescent with relative ease, the computing power and information management challenges of driving are remarkably complex. So far it has been difficult to match the spatial awareness and decision-making capacity afforded by millions of years of evolution. For example, how can a computer be taught the difference between a lamppost and a pedestrian?

Engineers have been working on the problem, and a few truly viable systems are beginning to emerge, most notably the system developed by Google. Google’s cars have driven themselves hundreds of thousands of miles on public roads without great incident or notice. At the same time, automobile manufacturing is entering an era of increased automation and reliance on data from diverse sources in design, simulation, and testing.

As driverless cars and manufacturing become more complex in their integration of diverse data sets, a data management strategy that can serve these needs will become an increasingly integral part of future advancements.

Human element of automation

The auto industry is no stranger to dealing with large and diverse datasets, and there is some confluence between conventional auto design and manufacturing and the systems that will eventually support widespread adoption of self-driving vehicles. Like most manufacturing, automobile production is highly automated, and the industry is accustomed to working with datasets derived from simulation of performance and other parameters such as component testing, design, and assembly coordination. For engineers, managing these datasets can be a major challenge because the approach to storing data is often defined more by the underlying information technology (IT) infrastructure than it is by the needs of engineers tasked with working with the data. For example, if test data on a component diverges from expectations projected in simulations, access to that original simulation data becomes imperative, yet access is often constrained.

Knowledge of where datasets are located often resides in the memories of the engineers themselves. This knowledge inevitably becomes fragmented over time as updates to the IT infrastructure disrupt pathnames, individuals leave the organization, and human memories fail. As the industry begins to address these challenges in manufacturing, it is simultaneously integrating technologies, such as crash avoidance and predictive maintenance, that will continue to collect, manage, and compute sensor data while the vehicle is on the road. Should these datasets need to be collected and analyzed en masse in the future – in an effort to improve safety, maintenance, and design – these challenges will face automakers long after design, manufacture, and testing are complete.

Driverless car sophistication

University of Michigan pushes driverless cars

While Google and a handful of automakers are developing and testing autonomous vehicles in California, Detroit is also making a major push for the promising technology. Major automakers, suppliers, and technology companies have teamed up with the University of Michigan (U-M) to open Mcity, the world’s first controlled environment designed to test the potential of connected and automated vehicle technologies leading the way to mass-market driverless cars.

“We believe that this transformation to connected and automated mobility will be a game changer for safety, for efficiency, for energy, and for accessibility,” says Peter Sweatman, director of the U-M Mobility Transformation Center (MTC). “These technologies truly open the door to 21st century mobility.”

Designed for testing new technologies, Mcity is a 32-acre simulated urban and suburban environment that includes roads with intersections, traffic signs and signals, streetlights, building facades, sidewalks, and construction obstacles.

Mcity allows researchers to simulate the environments where connected and automated vehicles will be most challenged. Even seemingly minor details a vehicle might encounter in urban and suburban settings have been incorporated into Mcity, such as road signs defaced by graffiti and faded lane markings.

The types of technologies that will be tested at the facility include connected technologies – vehicles talking to other vehicles or to the infrastructure, commonly known as V2V or V2I – and various levels of automation all the way to fully autonomous, driverless vehicles. A key MTC goal is to put a shared network of connected, automated vehicles on the road in Ann Arbor, Michigan, by 2021.

MTC is working closely with 15 Leadership Circle member companies, each investing $1 million over three years and engaging in thought leadership. Thirty-three affiliate members are also contributing, investing $150,000 over three years. Current Leadership Circle companies are:

  • Delphi Automotive
  • Denso Corp.
  • Econolite Group Inc.
  • Ford Motor Co.
  • General Motors Co.
  • Honda Motor Co.
  • Iteris Inc.
  • Navistar Inc.
  • Nissan Motor Co.
  • Qualcomm Technologies Inc.
  • Robert Bosch LLC
  • State Farm
  • Toyota Motor Corp.
  • Verizon Communications Inc.
  • Xerox Corp.

The data management challenges of the most advanced conventional automobile manufacturing are mirrored in the design and operation of driverless vehicles, but at a different magnitude. While manufacturing demands management of a range of simulation and test data, the operation of Google’s driverless car integrates several types of sensor data with pre-existing environmental data demanded in real time.

Google’s driverless car collects GPS coordinates and 1.5 million measurements per second from a 64-beam laser, four radars gauge the distance to objects in the car’s immediate vicinity, and velocity and rotation information comes from the wheels and chassis. These streams of real-time data are analyzed in conjunction with high-resolution maps and models of the environment derived from satellite imaging, which themselves require periodic updating.
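
To give a rough sense of scale, the laser figure above can be turned into a storage estimate. The sketch below is only illustrative: the 1.5 million measurements per second comes from the text, but the size of each stored measurement (16 bytes here) is an assumption and not a published figure for Google’s system.

```python
# Rate quoted in the article for the 64-beam laser.
LIDAR_MEASUREMENTS_PER_SECOND = 1_500_000

# ASSUMPTION (not from the article): each measurement is stored as
# four 4-byte values, for example x, y, z, and intensity.
ASSUMED_BYTES_PER_MEASUREMENT = 16

SECONDS_PER_HOUR = 3600


def lidar_bytes_per_hour(bytes_per_measurement: int = ASSUMED_BYTES_PER_MEASUREMENT) -> int:
    """Raw laser data volume for one hour of driving, in bytes."""
    return LIDAR_MEASUREMENTS_PER_SECOND * bytes_per_measurement * SECONDS_PER_HOUR


if __name__ == "__main__":
    gigabytes = lidar_bytes_per_hour() / 1e9
    print(f"Estimated raw laser data: {gigabytes:.1f} GB per hour")
```

Under that assumption a single car would produce tens of gigabytes of laser data per hour before radar, GPS, and wheel data are counted; a different record size scales the estimate proportionally.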

Aggregating these different types of environmental data gives driverless cars a reliable picture of their surroundings, but it also poses a data management challenge. With traditional data management methods, hardware refreshes and manual selection of storage locations constantly shift the pathnames of files, which creates inefficiency in engineering processes. The potential for safety and efficiency problems is even greater once driverless cars move beyond the closely monitored experimental stages of development.

Driverless car networks

Beyond these challenges of data access, driverless cars rely on datasets that are collected and stored in disparate places. Eventual widespread adoption of driverless cars will almost certainly involve inter-vehicle data sharing and communication, which will make ease of data access even more critical.

Certain aspects of driving (distinguishing between that lamppost and pedestrian) are natural to humans, but quite hard for computers. In other areas, computers have a clear advantage. With a network of driverless cars on the road, each producing its own streams of sensor data, the potential exists for central coordination and optimization of road capacity, as well as predictive maintenance. However, with the increase in data sources and data users come additional challenges to access. The cost and complexity of upgrading the hardware on which data is stored in this context would be massive with a conventional approach to data management.

Effective data management

Avoiding these barriers to data access will require use of a data management system that can insulate data users from changes to hardware. These IT upgrades and refreshes are necessary, but need not interfere with data access. A successful approach to managing this data will include virtualization that can separate the hardware on which data is stored from the system that keeps track of where it is stored, so that pathnames remain unchanged and storage can be scaled up as needed. Abstraction offers the possibility of a unified namespace for storing critical and diverse datasets together, which makes aggregation simpler.
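
The separation described above can be sketched in a few lines. This is a minimal illustration, not a description of any particular product: the `Namespace` class, its method names, and the `nas-2012`/`nas-2015` device names are all hypothetical. The point is that engineers only ever use the logical path, while the mapping to physical hardware can change underneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical catalog mapping stable logical paths to physical locations."""

    _locations: dict = field(default_factory=dict)

    def store(self, logical_path: str, physical_location: str) -> None:
        """Record where a dataset physically lives."""
        self._locations[logical_path] = physical_location

    def resolve(self, logical_path: str) -> str:
        """Return the current physical location for a logical path."""
        return self._locations[logical_path]

    def migrate(self, old_prefix: str, new_prefix: str) -> int:
        """Repoint every dataset on retired hardware to its replacement.

        Logical paths are never modified, so users see no change.
        Returns the number of datasets that were repointed.
        """
        moved = 0
        for logical, physical in list(self._locations.items()):
            if physical.startswith(old_prefix):
                self._locations[logical] = new_prefix + physical[len(old_prefix):]
                moved += 1
        return moved


# Usage: a hardware refresh changes the physical location only.
ns = Namespace()
ns.store("/simulation/brake-rotor/run-042", "nas-2012:/vol3/run-042.dat")
ns.store("/test/brake-rotor/bench-007", "nas-2012:/vol1/bench-007.dat")
ns.migrate("nas-2012:", "nas-2015:")
location = ns.resolve("/simulation/brake-rotor/run-042")
```

After the migration, the engineer comparing bench results against the original simulation still asks for `/simulation/brake-rotor/run-042`; only the catalog knows the file now sits on different hardware.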

In the short run, the ability to scale storage capacity will allow such a network to respond nimbly to changes in demand without interrupting the availability of critical files. Scalability is valuable not only in raw capacity, but also across different storage media.

Real-time access will require fast and relatively expensive storage media such as flash storage. With virtualization, those data can be automatically moved to less-expensive media, such as disk, as their value shifts from offering information for real-time decision-making to supporting safety and design analysis months, years, or decades later. Software design, safety, and maintenance improvements are well-served by a system where data on vehicle behavior and safety remain accessible in the long run.
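
One simple way to express such a tiering rule is an age-based policy. The sketch below is an assumption-laden illustration: the 30-day window, the tier names, and the function names are hypothetical choices, and a production system would likely weigh access frequency and cost as well.

```python
from datetime import datetime, timedelta

# ASSUMPTION: data untouched for more than 30 days no longer needs flash.
HOT_WINDOW = timedelta(days=30)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the storage tier for a dataset based on how recently it was used."""
    return "flash" if now - last_accessed <= HOT_WINDOW else "disk"


def plan_moves(files: dict, now: datetime) -> list:
    """Return (path, current_tier, target_tier) for each dataset that should move.

    `files` maps a logical path to a (current_tier, last_accessed) pair.
    """
    moves = []
    for path, (tier, last_accessed) in sorted(files.items()):
        target = choose_tier(last_accessed, now)
        if target != tier:
            moves.append((path, tier, target))
    return moves


# Usage with hypothetical datasets and fixed dates.
now = datetime(2015, 9, 22)
files = {
    "/sensor/car-17/2015-09-20": ("flash", datetime(2015, 9, 21)),
    "/sensor/car-17/2015-03-02": ("flash", datetime(2015, 3, 5)),
    "/sensor/car-17/2014-11-11": ("disk", datetime(2014, 11, 30)),
}
moves = plan_moves(files, now)
```

Combined with a stable namespace, a move from flash to disk changes only where the data sits, not the path analysts use to reach it years later.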

If the vision of a vast network of driverless cars becomes a reality, such a system will be widespread and likely in use for decades. So the way data is stored must be fundamentally separated from the sphere of hardware. In the nearer term, auto manufacturers will find that abstraction and virtualization are indispensable for the design and manufacture process as well.

As self-driving technologies such as crash avoidance become more integrated into conventional vehicles, the potential for making sensor data from vehicles on the road readily available to the design and manufacture process offers significant near-term benefits to a system that may, eventually, become the backbone of driving.


Peaxy Inc.


About the author: Manuel Terranova is president and CEO of data management company Peaxy Inc. He can be reached at