Page 62 - Big Data Analytics for Connected Vehicles and Smart Cities

What Is Big Data?


                 • Search;
                 • Sharing;
                 • Storage;
                 • Transfer.


                 In describing these challenges from a transportation perspective, a
            simplified view has been adopted, which may offend the data analysts and
            data scientists who read this book. No offense is intended. The focus is
            on explaining the value to transportation rather than on developing a
            technical description of the subject. The objective is to provide an
            awareness of the challenges, to illustrate their nature, and to provide
            an overview of how they are addressed in data science.


            Complexity Analysis
            Complexity analysis is an emerging field in data analysis and data science
            that categorizes data according to its complexity. As data sets rapidly
            increase in scale and processing becomes automatic, multiple systems can
            be connected, which leads to increasing complexity. Left unmanaged, this
            can lead to unpredictable behavior within the system and difficulties in
            processing the data. A typical engineering approach would attempt to
            remove the complexity, but this runs counter to obtaining maximum value
            from big data. As discussed earlier in Section 3.5, the real value lies in
            the detail, so complexity cannot be avoided. Tools and techniques have
            been developed in the field of complexity analysis that enable the
            understanding of complexity and the development of new approaches to
            modeling and controlling complexity in systems.

            Capture
            Data capture represents another big data challenge. While the
            transportation community is adept at automatically capturing data from
            sensors and other roadside devices, the world of big data requires that
            multiple data sets be combined to yield the insights being sought. This
            means that unless the resources invested in data capture are expanded,
            automated solutions must be considered. Data capture includes the process
            of bringing the data back to a central repository and the work required to
            bring the data into the repository. In the data world this is referred to
            as extraction, transformation, and loading (ETL). If the data sources
            include data from beyond the organization, then the data capture process
            will also include the establishment of some form of data-sharing
            agreement.
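
            The ETL steps described above can be illustrated with a minimal sketch.
            It is not taken from the book: the two sample data sets (roadside sensor
            counts and weather observations), the column names, the table name, and
            the function names are all hypothetical, and an in-memory SQLite database
            stands in for the central repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs standing in for two separate data sources.
SENSOR_CSV = """sensor_id,timestamp,vehicle_count
S1,2024-01-01 08:00,120
S1,2024-01-01 09:00,95
S2,2024-01-01 08:00,
"""
WEATHER_CSV = """timestamp,condition
2024-01-01 08:00,rain
2024-01-01 09:00,clear
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(sensor_rows, weather_rows):
    """Transform: drop incomplete rows, cast types, combine the two data sets."""
    weather_by_time = {r["timestamp"]: r["condition"] for r in weather_rows}
    combined = []
    for row in sensor_rows:
        if not row["vehicle_count"]:
            continue  # skip readings with a missing count
        combined.append((
            row["sensor_id"],
            row["timestamp"],
            int(row["vehicle_count"]),
            weather_by_time.get(row["timestamp"], "unknown"),
        ))
    return combined


def load(rows, conn):
    """Load: write the combined rows into the (stand-in) central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traffic "
        "(sensor_id TEXT, timestamp TEXT, vehicle_count INTEGER, condition TEXT)"
    )
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the repository connection."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a repository
    rows = transform(extract(SENSOR_CSV), extract(WEATHER_CSV))
    load(rows, conn)
    return conn
```

            Calling run_etl() returns a connection whose traffic table holds only the
            complete sensor readings, each paired with the weather condition recorded
            for the same timestamp. In practice each step would be considerably more
            involved, and external sources would also be subject to the data-sharing
            agreements mentioned above.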