Page 35 - FULL REPORT 30012024
2.3.2 Process in Big Data Analytics
The process of big data analytics involves several stages, which are
described below.
i. Data Acquisition
At this stage, structured, semi-structured, and unstructured data are
gathered from diverse sources such as social media, websites,
sensors, and databases (H. Li, 2021).
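The following Python sketch illustrates how the three data forms can be gathered into a single collection. The inline sample strings are hypothetical stand-ins for real sources (a database export, a web API response, and social media posts); in practice, each source would be read through its own connector or API rather than from a string.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for three kinds of source.
STRUCTURED_CSV = "id,name,age\n1,Alice,34\n2,Bob,29\n"                       # e.g. a database export
SEMI_STRUCTURED_JSON = '[{"id": 3, "name": "Carol", "tags": ["iot"]}]'        # e.g. a web API response
UNSTRUCTURED_TEXT = "Great service today! #happy\nDelivery was late again."  # e.g. social media posts


def acquire(csv_text: str, json_text: str, raw_text: str) -> list[dict]:
    """Gather records from all sources into one list, tagging each with its origin."""
    records: list[dict] = []

    # Structured: fixed columns, parsed row by row (values arrive as strings).
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"source": "csv", "data": row})

    # Semi-structured: self-describing fields that may be nested or vary per record.
    for obj in json.loads(json_text):
        records.append({"source": "json", "data": obj})

    # Unstructured: no schema, so each non-empty line is kept as raw text.
    for line in raw_text.splitlines():
        if line.strip():
            records.append({"source": "text", "data": {"text": line.strip()}})

    return records


if __name__ == "__main__":
    for record in acquire(STRUCTURED_CSV, SEMI_STRUCTURED_JSON, UNSTRUCTURED_TEXT):
        print(record)
```

Tagging each record with its source keeps provenance available for the later storage and management stages.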
ii. Data Storage
Once collected, the data need to be stored in a way that allows easy
access for analysis (H. Li, 2021). Because big data involves such
large volumes, traditional storage solutions may be unable to handle
its scale and complexity. As a result, big data environments
frequently adopt distributed storage systems, such as the Hadoop
Distributed File System (HDFS) and cloud-based storage services, which
store and manage large datasets across a cluster of networked
machines.
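To illustrate how a distributed file system spreads a large file across a cluster, the Python sketch below splits data into fixed-size blocks and stores each block on several nodes. It is a toy model under stated assumptions: the block size, replication factor, node names, and round-robin placement rule are hypothetical simplifications, and no real HDFS API is used. For comparison, HDFS by default uses 128 MB blocks, keeps three replicas, and places them with a rack-aware policy.

```python
# Toy model of block-based distributed storage (not a real HDFS client).
BLOCK_SIZE = 4       # hypothetical: bytes per block (HDFS defaults to 128 MB)
REPLICATION = 3      # copies kept of each block (HDFS default is also 3)
NODES = ["node-1", "node-2", "node-3", "node-4"]  # hypothetical cluster nodes


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a file's contents into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes using round-robin.

    Real systems use smarter (e.g. rack-aware) policies; round-robin only
    keeps this sketch short.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement: dict[int, list[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)  # rotate the first replica to spread load
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_after_failure(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every block still has a replica on at least one live node."""
    return all(any(node != failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"big data analytics", BLOCK_SIZE)
    placement = place_blocks(len(blocks), NODES, REPLICATION)
    for block_id, replicas in placement.items():
        print(block_id, blocks[block_id], replicas)
    print("survives loss of node-1:", readable_after_failure(placement, "node-1"))
```

Replicating each block on several machines is what lets such systems keep serving data when an individual node fails, while splitting files into blocks lets storage and later processing be spread across the whole cluster.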
iii. Data Management
In this phase, the data are managed and organised so that they remain
accurate, complete, and up to date. Data cleaning procedures may be
used to eliminate errors and inconsistencies in the data (Rawat,
2021).
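Typical cleaning steps (removing duplicates, standardising formats, and dealing with inaccurate and missing values) can be sketched in Python using only the standard library. The records, field names, accepted date formats, the 0 to 120 age range, and the choice to fill missing ages with the low median are all hypothetical assumptions made for illustration, not a prescribed method from Rawat (2021).

```python
from datetime import datetime
from statistics import median_low
from typing import Optional

# Hypothetical raw records illustrating common data-quality problems.
RAW = [
    {"id": "1", "city": " london ", "joined": "2023-01-05", "age": "34"},
    {"id": "1", "city": " london ", "joined": "2023-01-05", "age": "34"},  # exact duplicate
    {"id": "2", "city": "PARIS", "joined": "05/02/2023", "age": ""},       # missing age, other date format
    {"id": "3", "city": "new york", "joined": "2023-03-10", "age": "-7"},  # impossible age
    {"id": "4", "city": "berlin", "joined": "2023-04-01", "age": "29"},
]

# Assumed input date formats; "05/02/2023" is read day-first as 5 February.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text: str) -> Optional[str]:
    """Standardise a date string to ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_age(text: str) -> Optional[int]:
    """Convert an age to int, treating empty, non-numeric, or impossible values as missing."""
    try:
        age = int(text)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None


def clean(records: list[dict]) -> list[dict]:
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:  # 1. remove exact duplicate records
            continue
        seen.add(key)
        cleaned.append({
            "id": rec["id"],
            "city": rec["city"].strip().title(),  # 2. standardise text format
            "joined": parse_date(rec["joined"]),  # 2. standardise date format
            "age": parse_age(rec["age"]),         # 3. flag impossible values as missing
        })
    # 4. Handle missing values: fill missing ages with the low median of known ages.
    known = [row["age"] for row in cleaned if row["age"] is not None]
    if known:
        fill = median_low(known)
        for row in cleaned:
            if row["age"] is None:
                row["age"] = fill
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW):
        print(row)
```

`median_low` is used instead of the ordinary median so that the filled value is always one that actually occurs in the data; in practice, the imputation strategy should be chosen to suit the dataset and the analysis.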
Data cleaning techniques play a significant role in keeping data
accurate, complete, and current. Data cleaning involves identifying
and correcting errors and inconsistencies in the dataset. This process
includes removing duplicate records, handling missing values,
correcting inaccurate data, and standardizing formats and
representations. By applying data