Page 197 - Big Data Analytics for Connected Vehicles and Smart Cities
mation. A robust approach to the creation of a data lake would accommodate
such data streaming analysis as well as the analysis of archived or static data.
9.5 How a Data Lake Works
Another way to understand the nature and characteristics of data lakes is to
consider how they operate. Figure 9.2 shows a generic data lake configuration
that can serve as a model for a smart city data lake.
The major elements of the data lake are discussed in the following sections.
Data sources
One of the important aspects of a data lake lies in its ability to ingest data from
multiple sources. Within smart city and transportation contexts, this would
include infrastructure-based sensors such as traffic and passenger counters and
probe data emanating from connected and autonomous vehicles. Data for the
data lake could also be sourced from existing systems and databases such as those
previously deployed for traveler information, traffic management, freight, and
transit management. Data could also take the form of social media feeds such as
Twitter, as well as image and video data from cameras and other image
processing–based sensors. A rich stream of data could also be sourced from
smartphone apps operated by the public or private sector. In developing a roadway transportation
data plan, the U.S. DOT [3] considers the following sources of data:
• Infrastructure data: Roadway geometry, roadway inventory, intersection
characteristics, and the state of system controls.
• Travel data: Vehicle location, presence, and speed within the system;
internal vehicle status; transit vehicle location, speed, and status;
passenger counts; and schedule adherence data. Also freight vehicle
location and positioning, with gross weight or data regarding the type
and time-critical nature of goods carried.
• Climate data: Prevailing weather and pavement surface conditions
collected from roadway weather information systems (RWISs).
• Modal data: This includes border crossing data from U.S. Customs and
Border Protection regarding trucks, trains, containers, buses, personal
vehicles, passengers, and pedestrians.
• Travel behavior data: Travel behavior, changes in travel characteristics
over time, travel behavior related to demographics, and the relationship
of demographics and travel over time.
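
To make the idea of ingesting from multiple sources more concrete, the following is a minimal Python sketch of how incoming records might be tagged with one of the data categories listed above and stored without altering the raw payload. It is an illustration under stated assumptions, not a description of any actual smart city system: the class names (`SourceCategory`, `LakeRecord`, `DataLake`), the source names, and the sample payloads are all hypothetical, and the in-memory dictionary merely stands in for real storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class SourceCategory(Enum):
    """Data categories taken from the bulleted list above."""
    INFRASTRUCTURE = "infrastructure"
    TRAVEL = "travel"
    CLIMATE = "climate"
    MODAL = "modal"
    TRAVEL_BEHAVIOR = "travel_behavior"


@dataclass
class LakeRecord:
    """A raw payload plus the metadata needed to find it again later."""
    category: SourceCategory
    source: str    # hypothetical source name, e.g. "rwis_station"
    payload: Any   # stored as received: dict, string, bytes, ...
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class DataLake:
    """In-memory stand-in for a data lake (an assumption for illustration)."""

    def __init__(self) -> None:
        # One list of records per category.
        self._zones = {category: [] for category in SourceCategory}

    def ingest(self, category: SourceCategory, source: str, payload: Any) -> LakeRecord:
        """Store the payload unchanged, tagged with its category and source."""
        record = LakeRecord(category, source, payload)
        self._zones[category].append(record)
        return record

    def records(self, category: SourceCategory) -> list:
        """Return a copy of the records held for one category."""
        return list(self._zones[category])


# Hypothetical sample data: structured and unstructured payloads side by side.
lake = DataLake()
lake.ingest(SourceCategory.TRAVEL, "probe_vehicle_feed",
            {"vehicle_id": "veh-001", "speed_kph": 52.0})
lake.ingest(SourceCategory.CLIMATE, "rwis_station",
            "pavement=wet;air_temp_c=3.5")
lake.ingest(SourceCategory.TRAVEL, "transit_avl",
            {"route": "10", "on_schedule": True})

travel_records = lake.records(SourceCategory.TRAVEL)
```

In this sketch the payload is deliberately left in whatever form it arrived, so a dictionary from a probe vehicle and a delimited string from a weather station can be held side by side; any structure would be applied later, when the data is read for analysis.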