Page 206 - Big Data Analytics for Connected Vehicles and Smart Cities
just launch the data lake, but to ensure that it does not degrade over time. This
also includes taking the necessary steps to monitor, manage, and ensure the
quality of data, both as it enters the data lake and while it is maintained
within it.
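As an illustration of monitoring quality at the point of entry, the following is a minimal sketch of a validation step that could run before records are written to the data lake. The record layout (`SpeedRecord`), the field names, and the 250 km/h plausibility limit are all assumptions made for this example; they are not taken from the methodology described in this chapter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpeedRecord:
    """Hypothetical roadside sensor reading; field names are illustrative only."""
    sensor_id: str
    timestamp: Optional[str]    # ISO 8601 string, e.g. "2024-01-01T08:00:00"
    speed_kmh: Optional[float]


def quality_issues(record: SpeedRecord, max_speed_kmh: float = 250.0) -> List[str]:
    """Return the problems found in one record; an empty list means it passes."""
    issues = []
    if not record.sensor_id:
        issues.append("missing sensor_id")
    if not record.timestamp:
        issues.append("missing timestamp")
    if record.speed_kmh is None:
        issues.append("missing speed")
    elif not 0.0 <= record.speed_kmh <= max_speed_kmh:
        # Assumed plausibility range; a real deployment would set its own limit.
        issues.append("speed out of range")
    return issues


def quality_report(records: List[SpeedRecord]) -> Dict[str, float]:
    """Summarize how many records pass, so the pass rate can be tracked over time."""
    total = len(records)
    failed = sum(1 for r in records if quality_issues(r))
    pass_rate = (total - failed) / total if total else 1.0
    return {"total": total, "failed": failed, "pass_rate": pass_rate}
```

Running the same report periodically against data already stored in the lake, and not only against incoming batches, is one way to notice degradation over time.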
Lack of Self-Service Capabilities and Long Development Times
Very often in technology-driven data lake implementations, the data lake is
developed as a tool that requires highly specialized operation. This leads to a
situation in which end users have no direct access to the data or the analytics.
The absence of such self-service capabilities can place a heavy workload on
a few members of staff, leading to long development times and slow responses
to end user needs.
Lack of Features to Motivate and Enable Smart City and Transportation Exponents
In a similar vein to the above challenge, an information technology–driven data
lake program can ignore the need to motivate and enable end users in smart city
and transportation contexts. This can lead to a lack of interest on the part of the
end users and a consequent inability to monetize the investment made in the
data lake. Early experience indicates that it is not sufficient to merely share data;
it is also necessary to provide models and illustrations that can motivate the end
users. This includes helping end users to understand the data lake and its
analytics possibilities by communicating model analytics and supporting a
dialogue on the development of custom analytics for each end user’s specific
job function.
9.8 An Approach to Building a Data Lake
Early experience in creating data lakes for multiple organization types in
both the public and private sectors has revealed a few pitfalls to avoid when
developing a data lake. Drawing on the lessons of this early experience, it is
possible to put together a robust approach that minimizes the chances of
encountering these barriers while maximizing the chances of success. To provide
practical advice on the creation of a transportation data lake, a data lake
creation methodology has been identified and is defined in subsequent sections
of this chapter. The approach is based on the experiences of a company called
Think Big [4], and it has evolved as a direct result of experience gained in
working with public- and private-sector clients. The original methodology has
been adapted, based on experience with several transportation clients, to meet
the specific needs of transportation and smart cities. Figure 9.3 presents an
overview of the methodology.