Page 35 - FULL REPORT 30012024
     2.3.2  Process in Big Data Analytics
                                The process of big data analytics involves several stages, which are
                                described below.
                                i.     Data Acquisition
                                       At this stage, data in structured, semi-structured, and unstructured
                                       forms is gathered from several sources (H. Li, 2021). Typical
                                       sources include social media, websites, sensors, and databases.
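The acquisition stage can be illustrated with a minimal Python sketch that gathers records from one structured, one semi-structured, and one unstructured source and tags each record with its form. The sources are hypothetical in-memory strings standing in for a database export, a web or social-media feed, and free-text logs. The function names (`acquire_structured`, `acquire_semi_structured`, `acquire_unstructured`, `acquire_all`) are illustrative only; a real pipeline would read from live connectors instead.

```python
import csv
import io
import json

# Hypothetical in-memory stand-ins for real sources: a database export (CSV),
# a web/social-media API response (JSON), and free-form text such as logs.
CSV_EXPORT = "id,sensor,reading\n1,temp-01,21.5\n2,temp-02,22.0\n"
JSON_FEED = '[{"user": "alice", "post": "Loving the new park!"}]'
TEXT_LOG = "Sensor temp-03 offline at 10:42\nSensor temp-03 back online\n"


def acquire_structured(text):
    """Structured data: fixed columns, parsed row by row."""
    return [{"kind": "structured", "data": row}
            for row in csv.DictReader(io.StringIO(text))]


def acquire_semi_structured(text):
    """Semi-structured data: self-describing keys, schema may vary per record."""
    return [{"kind": "semi-structured", "data": obj}
            for obj in json.loads(text)]


def acquire_unstructured(text):
    """Unstructured data: kept as raw, non-empty lines for later processing."""
    return [{"kind": "unstructured", "data": line}
            for line in text.splitlines() if line.strip()]


def acquire_all():
    """Gather every source into one list of records tagged by data form."""
    records = []
    records += acquire_structured(CSV_EXPORT)
    records += acquire_semi_structured(JSON_FEED)
    records += acquire_unstructured(TEXT_LOG)
    return records


if __name__ == "__main__":
    for record in acquire_all():
        print(record["kind"], record["data"])
```

Keeping each record's original form alongside its content defers parsing decisions for unstructured data to later stages; this is one common design choice rather than a requirement.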
                                ii.    Data Storage
                                       Once collected, the data must be stored in a way that allows it to
                                       be accessed easily for analysis (H. Li, 2021). Because of the sheer
                                       volume of data involved, traditional storage solutions may be
                                       unable to handle the scale and complexity of big data. As a result,
                                       big data environments frequently adopt distributed storage systems,
                                       such as the Hadoop Distributed File System (HDFS) and cloud-based
                                       storage services, which store and manage large datasets across a
                                       cluster of networked machines.
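The core idea behind distributed storage systems such as HDFS (splitting a large file into fixed-size blocks and keeping several copies of each block on different machines) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the node names, the tiny 4-byte block size, and the round-robin placement are hypothetical. HDFS itself uses much larger blocks (128 MB by default), a default replication factor of 3, and a rack-aware placement policy rather than round-robin.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file contents into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin,
    NOT the rack-aware policy real HDFS uses)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so load is spread evenly.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)]
                               for r in range(replication)]
    return placement


def write_file(data: bytes, nodes: list[str], block_size: int = 4,
               replication: int = 3):
    """Split the data and copy every block to each node it is placed on."""
    blocks = split_into_blocks(data, block_size)
    placement = place_blocks(len(blocks), nodes, replication)
    storage = {node: {} for node in nodes}  # node -> {block_id: bytes}
    for block_id, block in enumerate(blocks):
        for node in placement[block_id]:
            storage[node][block_id] = block
    return placement, storage


def read_file(placement, storage, failed=frozenset()) -> bytes:
    """Reassemble the file, reading each block from any surviving replica."""
    parts = []
    for block_id in sorted(placement):
        replicas = [n for n in placement[block_id] if n not in failed]
        if not replicas:
            raise OSError(f"block {block_id} lost: all replicas are on failed nodes")
        parts.append(storage[replicas[0]][block_id])
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    data = b"big data needs distributed storage"
    placement, storage = write_file(data, nodes)
    # One node fails, but every block still has two other replicas.
    print(read_file(placement, storage, failed={"node-b"}) == data)
```

Because every block lives on three of the four nodes, the file can still be read after any single node fails; losing all three nodes that hold a given block makes that block, and therefore the file, unreadable.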
                                iii.   Data Management
                                       In this phase, the data is managed and organised so that it is
                                       accurate, complete, and up to date. To this end, data cleaning
                                       procedures may be used to eliminate errors and discrepancies in
                                       the data (Rawat, 2021).
                                       Data cleaning techniques play a significant role in achieving this
                                       objective. Data cleaning involves identifying and rectifying errors
                                       and inconsistencies in the dataset. This process includes removing
                                       duplicate records, handling missing values, correcting inaccurate
                                       data, and standardizing formats and representations. By applying data