Page 202 - Big Data Analytics for Connected Vehicles and Smart Cities
The Foundation for Large-Scale Proactive Analytics
The data lake is also the basis for further efforts related to large-scale proactive
analytics. While this will also require cultural and organizational change, the
existence of the data lake opens the way for the smart city organization to apply
large-scale analytics that will guide many aspects of planning and delivery for
smart city transportation. This enables the adoption of results-driven actions
and the establishment of scientific approaches to transportation service delivery,
based on observation, understanding of mechanisms, and data.
Steppingstone Toward Automation Through Predictive Analytics and Machine
Learning
There is considerable interest and activity in the concept of an automated vehicle,
and it would seem relevant to also consider how automation can be applied
to back-office processes in the smart city. While it may not be appropriate or
even desirable to leap toward an automated back office overnight, the establishment
of analytics and the development of the ability to make predictions can form
the basis for the path toward automation. The availability of the data in the
data lake can also provide the raw material for machine learning
and deep learning techniques that support the staged development of artificial
intelligence in the smart city back office. It is likely that this will begin with
sophisticated decision support for the humans involved, with full automation a
possibility over the longer term.
Reduce Costs Due to Data Management Duplication and Processing Duplication
Adopting a fragmented approach to data collection, storage, and management
will inevitably lead to duplication. In fact, the cost of duplication may be buried
within the overall cost of operating and maintaining the current data collec-
tion, storage, and management system. The process of creating a data lake is
likely to shine a light on the volume of duplication and provide estimates of
the costs involved. Cost savings are likely to be identified in data collection, as
well as data storage and processing. Based on experience, the average transportation
agency carries considerable redundancy in data collection, with
a considerable amount of ad hoc data collection for project-specific purposes. If
such data is not visible across the organization, then it is likely that other ad
hoc initiatives will collect the same or similar data. In some cases, awareness of
the data is not enough: an inability to access the data in a reasonable time
frame forces project-specific data collection to go ahead even when the duplication is
understood. Cost savings are also likely to be realized with respect to software
licenses. Multiple software licenses may have been procured to support a frag-
mented approach to data storage and management. As the data lake is created,
opportunities may be revealed to save money by consolidating software licenses.
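
As a rough illustration of how making data visible across the organization can expose duplicated collection, the following minimal sketch groups catalog entries by what they measure and where. It is not a method described in the text: the DatasetRecord fields, the department names, and the matching rule (same measure and corridor, different owners) are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One hypothetical catalog entry describing a collected dataset."""
    name: str
    owner: str      # department that collected the data (assumed field)
    measure: str    # what is measured, e.g. "traffic volume" (assumed field)
    corridor: str   # where it was measured (assumed field)


def find_possible_duplicates(records):
    """Group records by (measure, corridor), ignoring case, and keep only
    the groups whose records come from more than one owner."""
    groups = defaultdict(list)
    for record in records:
        key = (record.measure.lower(), record.corridor.lower())
        groups[key].append(record)
    return {
        key: group
        for key, group in groups.items()
        if len({r.owner for r in group}) > 1
    }


# Illustrative catalog entries only; these are not real agency data.
catalog = [
    DatasetRecord("signal_retiming_counts", "Traffic Operations", "traffic volume", "Main St"),
    DatasetRecord("corridor_study_counts", "Planning", "Traffic Volume", "Main St"),
    DatasetRecord("bus_boardings", "Transit", "ridership", "Route 10"),
]

duplicates = find_possible_duplicates(catalog)
for (measure, corridor), group in duplicates.items():
    print(measure, corridor, [r.name for r in group])
```

A real catalog would need a richer notion of "same or similar" data (time period, resolution, collection method); exact matching on two fields is only the simplest starting point.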