Page 195 - Big Data Analytics for Connected Vehicles and Smart Cities



          the years and brings these tools to bear on each project. Carpenters do not look
          to a larger organization for these tools but expect to bring them as part of the
          experience and knowledge that they have gained over the years. This approach
          breaks down when specialist tools are required, and it may not be feasible to
          own these on a personal basis. It also does not provide the basis for sharing the
          cost of developing and implementing the tools.
               The old English term bodger comes to mind. The term was first used to
          describe skilled carpenters who would set up an impromptu wood shop near a
          forest and, relying on their skills rather than specialist tools, create furniture and
          other wooden objects. These days the term refers to anyone who creates objects
          from a mishmash of found or improvised materials. While the skill required to
          be a bodger is laudable, the whole approach does not lend itself to the time sav-
          ings and quality improvements that can be achieved using specialist tools on a
          shared basis. This is the essence of the conversion of data within an organization
          from silos and cockpits to an enterprise-wide data horizon. This also suggests
          that organizational and cultural change will be required to take full advantage
          of centralized data repositories, or data lakes.
               The benefits of economy of scale, the ability to apply specialist tools to
          data management, and the conversion of data into information all militate in
          favor of the centralization of data. In an ideal situation, centralized data would
          be used in combination with appropriate decentralized data, and the resulting
          information would be distributed in an optimal pattern across the enterprise.
          It would also be possible for anyone within the enterprise to view a catalog
          of available data and to explore the value of enterprise data to their own
          job function. In many situations, this is not the case, and data is kept
          on a computer or server next to a desk, where it is not visible to other members
          of the department or to other departments. The individual or team doesn’t
          expect to rely on a central data repository or look to the enterprise to provide
          for information needs. While this is very efficient from the myopic viewpoint
          of the team or individual, it is inefficient in not supporting an enterprise- or
          organization-wide view of data. We have learned over the years that data is best
          used when it can be shared across multiple job functions. The old adage “put
          data in once, use many times” still holds good. Taking a function-specific view
          of data collection and management also prevents the achievement of cooperation
          and economy of scale. Fragmentation also causes duplication of hardware
          and software resources and poses a challenge for configuration management,
          that is, the need to keep one single version of the truth with respect to
          data for the organization.
               Another issue with this conventional approach to data collection and
          management is that it becomes very difficult to know what data has been col-
          lected by the organization in its entirety. In the fragmented approach, it is high-
          ly likely that data is collected many times and perhaps used once, if at all. As a