Page 197 - Big Data Analytics for Connected Vehicles and Smart Cities


          mation. A robust approach to the creation of a data lake would accommodate
          such data streaming analysis as well as the analysis of archived or static data.



          9.5  How a Data Lake Works

          Another perspective on data lakes that provides good insight into their nature
          and characteristics is to consider how they operate. Figure 9.2 shows a generic
          data lake configuration that can be considered as a model for a smart city data
          lake.
               The major elements of the data lake are discussed in the following sections.


          Data sources
          One of the important aspects of a data lake lies in its ability to ingest data from
          multiple sources. Within smart city and transportation contexts, this would
          include infrastructure-based sensors such as traffic and passenger counters and
          probe data emanating from connected and autonomous vehicles. Data for the
          data lake could also be sourced from existing systems and databases such as those
          previously deployed for traveler information, traffic management, freight, and
          transit management. Data could also take the form of social media feeds such as
          Twitter, as well as image and video data from cameras and other image
          processing–based sensors. A rich stream of data could also be sourced from
          smartphone apps
          operated by the public or private sector. In developing a roadway transportation
          data plan, the U.S. DOT [3] considers the following sources of data:


               • Infrastructure data: Roadway geometry, roadway inventory, intersection
                characteristics, and the state of system controls.
               • Travel data: Vehicle location, presence, and speed within the system;
                internal vehicle status; transit vehicle location, speed, and status;
                passenger counts; and schedule adherence data. Freight vehicle location
                and positioning, with gross weight or data regarding the type and
                time-critical nature of goods carried.
               • Climate data: Prevailing weather and pavement surface conditions col-
                lected from roadway weather information systems (RWISs).
               • Modal data: This includes border crossing data from U.S. Customs and
                Border Protection regarding trucks, trains, containers, buses, personal
                vehicles, passengers, and pedestrians.

               • Travel behavior data: Travel behavior, changes in travel characteristics
                over time, travel behavior related to demographics, and the relationship
                of demographics and travel over time.
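
          The source categories listed above can be illustrated with a short Python
          sketch of how a data lake might tag incoming records by category while
          leaving the payload in its original form. This is a minimal illustration
          only, not a design taken from the U.S. DOT plan: the names SourceCategory,
          RawRecord, and DataLake, the in-memory storage, and the sample feed names
          and values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class SourceCategory(Enum):
    """The five source categories from the list above (names are illustrative)."""
    INFRASTRUCTURE = "infrastructure"
    TRAVEL = "travel"
    CLIMATE = "climate"
    MODAL = "modal"
    TRAVEL_BEHAVIOR = "travel_behavior"


@dataclass
class RawRecord:
    """A hypothetical envelope: metadata about the source plus the untouched payload."""
    category: SourceCategory
    source: str          # e.g. a sensor feed or legacy system name (assumed)
    payload: Any         # stored as received; no schema is imposed at ingest
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class DataLake:
    """In-memory stand-in for a data lake; a real one would use durable storage."""

    def __init__(self) -> None:
        # One list of raw records per source category.
        self._zones = {category: [] for category in SourceCategory}

    def ingest(self, category: SourceCategory, source: str, payload: Any) -> RawRecord:
        """Wrap the payload in an envelope and store it under its category."""
        record = RawRecord(category, source, payload)
        self._zones[category].append(record)
        return record

    def records(self, category: SourceCategory) -> list:
        """Return a copy of the records held for one category."""
        return list(self._zones[category])


# Example usage with made-up feeds: structured probe data and a free-text
# weather reading are both accepted without conversion.
lake = DataLake()
lake.ingest(SourceCategory.TRAVEL, "probe_vehicle_feed",
            {"vehicle_id": "v1", "speed_kph": 52.0})
lake.ingest(SourceCategory.CLIMATE, "rwis_station", "pavement=wet;temp_c=3")
```

          The point of the sketch is that the envelope records where each item
          came from, while interpretation of the payload is deferred until the
          data is analyzed.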