Page 195 - Big Data Analytics for Connected Vehicles and Smart Cities
the years and brings these tools to bear on each project. Carpenters do not look
to a larger organization for these tools but expect to bring them as part of the
experience and knowledge that they have gained over the years. This approach
breaks down when specialist tools are required, and it may not be feasible to
own these on a personal basis. It also does not provide the basis for sharing the
cost of developing and implementing the tools.
The old English term bodger comes to mind. The term was first used to
describe skilled carpenters who would set up an impromptu wood shop near a
forest and, relying on their skills rather than specialist tools, create furniture and
other wooden objects. These days the term refers to anyone who creates objects
from a mishmash of found or improvised materials. While the skill required to
be a bodger is laudable, the whole approach does not lend itself to the time sav-
ings and quality improvements that can be achieved using specialist tools on a
shared basis. This is the essence of the conversion of data within an organization
from silos and cockpits to an enterprise-wide data horizon. This also suggests
that organizational and cultural change will be required to take full advantage
of centralized data repositories, or data lakes.
The benefits of economy of scale, the ability to apply specialist tools to
data management, and the conversion of data into information all militate in
favor of the centralization of data. In an ideal situation, centralized data would
be used in combination with appropriate decentralized data, and the resulting
information would be distributed in an optimal pattern across the enterprise.
It would also be possible for anyone within the enterprise to view a catalog
of available data and to be able to explore the value of enterprise data to the
specific job function. In many situations, this is not the case, and data is kept
on a computer or server next to a desk, where it is not visible to other members
of the department or to other departments. The individual or team doesn’t
expect to rely on a central data repository or look to the enterprise to provide
for information needs. While this is very efficient from the myopic viewpoint
of the team or individual, it is inefficient in not supporting an enterprise- or
organization-wide view of data. We have learned over the years that data is best
used when it can be shared across multiple job functions. The old adage “put
data in once, use many times” still holds good. Taking a function-specific view
of data collection and management also prevents the achievement of cooperation
and economy of scale. Fragmentation also causes duplication of hardware
and software resources and makes configuration management more difficult,
because the organization needs to keep a single version of the truth with
respect to its data.
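The idea of a catalog of available data that anyone in the enterprise can view can be illustrated with a minimal sketch. The code below is only an illustration under stated assumptions, not a description of any particular product: the CatalogEntry structure, the find_duplicates function, and the sample data set names, departments, and storage locations are all hypothetical. It shows how even a very simple shared catalog makes it possible to see when the same data is being collected by more than one part of the organization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One data set registered in a hypothetical enterprise data catalog."""
    name: str      # what the data describes
    owner: str     # department or team that collects it
    location: str  # where the data is stored


def find_duplicates(entries):
    """Return data set names that are collected by more than one owner."""
    owners_by_name = defaultdict(set)
    for entry in entries:
        owners_by_name[entry.name].add(entry.owner)
    return {
        name: sorted(owners)
        for name, owners in owners_by_name.items()
        if len(owners) > 1
    }


# Illustrative entries only; names and locations are invented for this sketch.
catalog = [
    CatalogEntry("traffic_counts", "planning", "desk-side server"),
    CatalogEntry("traffic_counts", "operations", "central repository"),
    CatalogEntry("transit_ridership", "transit", "central repository"),
]

duplicates = find_duplicates(catalog)
print(duplicates)  # {'traffic_counts': ['operations', 'planning']}
```

In this sketch the traffic count data appears twice, once on a desk-side server and once in the central repository, which is the kind of duplication that stays hidden when each team keeps its own data out of view.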
Another issue with this conventional approach to data collection and
management is that it becomes very difficult to know what data has been col-
lected by the organization in its entirety. In the fragmented approach, it is high-
ly likely that data is collected many times and perhaps used once, if at all. As a