Page 53 - Big Data Analytics for Connected Vehicles and Smart Cities



data sets because these are richer sources of insights and understanding. A large
data set allows an enterprise-wide or organization-wide view that can yield more
information than a series of silos or smaller data sets.
Big data can be considered an evolution of data science, with some
aspects that are new and some that are not. For example, most potential
transportation big data applications address safety, efficiency, and enhanced
user experience. These are issues that the transportation profession has been
addressing for a number of years. Aspects that are new include exponential
growth in data sizes and the new availability of both structured and
unstructured data. These combine with rapid acceleration in many dimensions
(volume, velocity, variety, variability, and complexity).
Other new aspects featured by big data include the following:

• Analytics: The ability to conduct graph and path analytics, and analytics
on new, nonrelational data types coupled with existing relational data.

• Tools: New tools that can help to uncover insights from data such as text
in accident reports or patterns in visuals, to quickly find the signal in
the noise.
• Economics: New capabilities with reduced cost mean that data can be
retained. It is not necessary to throw away signal timings, speed, flow,
and occupancy data. By leveraging new techniques, it is possible to apply
the appropriate storage mechanism in terms of cost and performance to
the appropriate data set. This also enables appropriate access to the dif-
ferent data types.

• Architecture: The emergence of a hybrid ecosystem that allows both old
and new tools to work together within a single framework to enable
rapid discovery analytics on new data.
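
As a rough illustration of the path analytics mentioned in the first bullet,
the following Python sketch finds the lowest travel-time route through a small
road network. The intersection names, the travel times, and the function name
fastest_path are hypothetical and chosen only for illustration. Dijkstra's
algorithm is used here as one common technique, not as a method prescribed by
the text.

```python
import heapq

# Hypothetical road network: intersection -> {neighbor: travel time in minutes}.
ROAD_NETWORK = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}


def fastest_path(graph, start, goal):
    """Return (total_time, path) for the lowest-cost route, or (None, []) if unreachable."""
    # Priority queue of (accumulated travel time, current node, path taken so far).
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, minutes in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + minutes, neighbor, path + [neighbor]))
    return None, []


total_minutes, route = fastest_path(ROAD_NETWORK, "A", "D")
print(total_minutes, route)
```

In practice the edge weights might come from retained speed and flow data
rather than fixed values, which is where the relational and nonrelational
sources described above would be combined.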

My first exposure to the term big data in 2011 sparked an interest in how
long the term had been in use. Subsequent research on the origin of the term
uncovered that it can be attributed to one of two people (according to the New
York Times [4]). Anecdotal evidence suggests that it was first introduced by John
Mashey of Silicon Graphics in the mid-1990s. He wanted to use a single term
to describe a range of issues in data storage and data management. The other
possible author of the term is Francis X. Diebold of the University of
Pennsylvania, who first used the term in association with macroeconomics in his
paper Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting,
which was first presented in 2000 and published in 2003. Today the term has
come to represent not just volume of data but also a range of dimensions,
listed as follows:
