Page 62 - Big Data Analytics for Connected Vehicles and Smart Cities
• Search;
• Sharing;
• Storage;
• Transfer.
In describing these challenges from a transportation perspective, it is pos-
sible to offend the data analysts and data scientists who will read this book,
since a simplistic view has been adopted. However, no offense is intended. The
focus is explaining the value to transportation rather than developing a techni-
cal description of the subject. The objective is to provide an awareness of the
challenges, to illustrate their nature, and to provide an overview of how they are
addressed in data science.
Complexity Analysis
Complexity analysis is an emerging field in data analysis and data science that
categorizes data according to its complexity. As data sets rapidly increase in
scale and processing becomes automated, multiple systems can be connected to one
another, which increases complexity. Left unmanaged, this can lead to unpredictable
behavior within the system and difficulties in processing the data. A typical
engineering approach would attempt to remove the complexity, but this runs
counter to obtaining maximum value from big data. As discussed earlier in Sec-
tion 3.5, the real value lies in the detail, so complexity cannot be avoided. Tools
and techniques have been developed in the field of complexity analysis that en-
able the understanding of complexity and the development of new approaches
to modeling and controlling complexity in systems.
Capture
Data capture is another big data challenge. While the transportation
community is adept at automatically capturing data from sensors and
other roadside devices, the world of big data requires that multiple data sets be
combined to give us the insights that we're looking for. This means that unless
we expand the resources we invest in data capture, automated solutions
must be considered. Data capture includes the process of bringing the
data back to a central repository and the work required to bring the data into
the repository. In the data world this is referred to as extraction, transformation,
and loading (ETL). If the multiple data sources include data from beyond the
organization, then the data capture process will also include the establishment
of some form of data-sharing agreement.
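
For readers who would like to see the three ETL steps in concrete form, the following is a minimal sketch in Python rather than a description of any particular agency's system. Everything in it is assumed for illustration: the two small CSV feeds (a roadside speed sensor feed and a weather feed standing in for an external source), the column names, the in-memory SQLite database playing the role of the central repository, and the function names extract, transform, load, run_etl, and speeds_with_weather.

```python
import csv
import io
import sqlite3

# Hypothetical raw feeds, invented for this sketch. In practice the first
# would come from roadside devices and the second from an outside
# organization under a data-sharing agreement.
SENSOR_CSV = """sensor_id,timestamp,speed_mph
S1,2024-01-01T08:00:00,55
S1,2024-01-01T08:05:00,
S2,2024-01-01T08:00:00,30
"""

WEATHER_CSV = """timestamp,weather_condition
2024-01-01T08:00:00,rain
2024-01-01T08:05:00,clear
"""

MPH_TO_KMH = 1.609344


def extract(text):
    """Extraction: parse a raw CSV feed into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(sensor_rows):
    """Transformation: drop incomplete readings and convert mph to km/h."""
    cleaned = []
    for row in sensor_rows:
        if not row["speed_mph"]:
            continue  # skip readings with no speed value
        speed_kmh = round(float(row["speed_mph"]) * MPH_TO_KMH, 1)
        cleaned.append((row["sensor_id"], row["timestamp"], speed_kmh))
    return cleaned


def load(conn, speeds, weather_rows):
    """Loading: write both cleaned data sets into the central repository."""
    conn.execute("CREATE TABLE speeds (sensor_id TEXT, ts TEXT, speed_kmh REAL)")
    conn.execute("CREATE TABLE weather (ts TEXT, weather_condition TEXT)")
    conn.executemany("INSERT INTO speeds VALUES (?, ?, ?)", speeds)
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?)",
        [(r["timestamp"], r["weather_condition"]) for r in weather_rows],
    )
    conn.commit()


def run_etl():
    """Run all three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SENSOR_CSV)), extract(WEATHER_CSV))
    return conn


def speeds_with_weather(conn):
    """Combine the two data sets by joining on the shared timestamp."""
    query = (
        "SELECT s.sensor_id, s.ts, s.speed_kmh, w.weather_condition "
        "FROM speeds s JOIN weather w ON w.ts = s.ts "
        "ORDER BY s.sensor_id, s.ts"
    )
    return conn.execute(query).fetchall()
```

Calling speeds_with_weather(run_etl()) returns each valid speed reading paired with the weather recorded at the same timestamp. The point of the sketch is that once both data sets sit in one repository, combining them is a single query, whereas the effort lies in the extraction and transformation steps that precede it.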