Page 35 - FULL REPORT 30012024

2.3.2  Process in Big Data Analytics




                                The process of big data analytics involves several stages, which are described
                                below.


                                i.     Data Acquisition


                                       At this stage, structured, semi-structured, and unstructured data are
                                       gathered from diverse sources such as social media, websites,
                                       sensors, and databases (H. Li, 2021).
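As a rough illustration of this stage, the Python sketch below gathers records from three kinds of source into a single collection: a CSV export (structured), JSON lines (semi-structured), and free text (unstructured). The sample data and function names are hypothetical. A real pipeline would read from files, APIs, or message queues rather than from in-memory strings.

```python
import csv
import io
import json

# Hypothetical in-memory samples standing in for real sources,
# e.g. a database export, a web API, and scraped social-media posts.
CSV_EXPORT = "user_id,country\n1,MY\n2,SG\n"
JSON_LINES = '{"user_id": 3, "tags": ["sensor"]}\n{"user_id": 4}\n'
FREE_TEXT = "Great service today!\nDelivery was late again.\n"


def acquire_structured(text):
    """Structured data: fixed columns, parsed row by row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": "csv", "payload": dict(row)}


def acquire_semi_structured(text):
    """Semi-structured data: self-describing records whose fields may vary."""
    for line in text.splitlines():
        if line.strip():
            yield {"source": "json", "payload": json.loads(line)}


def acquire_unstructured(text):
    """Unstructured data: no schema, so the raw text is kept for later processing."""
    for line in text.splitlines():
        if line.strip():
            yield {"source": "text", "payload": {"raw": line}}


def acquire_all():
    """Gather records from every source into one list for the storage stage."""
    records = []
    records.extend(acquire_structured(CSV_EXPORT))
    records.extend(acquire_semi_structured(JSON_LINES))
    records.extend(acquire_unstructured(FREE_TEXT))
    return records


if __name__ == "__main__":
    for record in acquire_all():
        print(record)
```

Each record is tagged with its source, so later stages can treat the three forms of data differently while still storing them together.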


                                ii.    Data Storage


                                       Once collected, the data must be stored in a way that allows easy
                                       access for analysis (H. Li, 2021). Because of the sheer volume of
                                       data involved, traditional storage solutions may not be able to handle
                                       the scale and complexity of big data. Big data environments therefore
                                       frequently adopt distributed storage systems, such as the Hadoop
                                       Distributed File System (HDFS) and cloud-based storage services,
                                       which store and manage large datasets across a cluster of networked
                                       machines.
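To make distributed storage more concrete, the following Python sketch simulates in memory how a file might be split into fixed-size blocks, with each block copied to several nodes of a cluster so that the data survives the loss of a node. It is a simplified model inspired by HDFS, not the HDFS API. The Cluster class, the tiny block size, and the round-robin placement are assumptions made for illustration. HDFS itself defaults to 128 MB blocks and three replicas, and it uses a rack-aware placement policy.

```python
from dataclasses import dataclass, field

# Illustrative settings only: a tiny block size keeps the example readable.
BLOCK_SIZE = 4   # bytes per block (hypothetical)
REPLICATION = 3  # number of copies kept of every block


@dataclass
class Cluster:
    """In-memory stand-in for a cluster of storage nodes (hypothetical)."""

    nodes: list
    storage: dict = field(init=False)  # node name -> {block_id: bytes}

    def __post_init__(self):
        if REPLICATION > len(self.nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.storage = {node: {} for node in self.nodes}

    def put(self, name, data):
        """Split data into blocks and store each block on REPLICATION distinct nodes.

        Returns a block map: block_id -> list of nodes holding a copy.
        """
        block_map = {}
        for index, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
            block_id = f"{name}#{index}"
            # Simple round-robin placement; HDFS uses a rack-aware policy instead.
            targets = [self.nodes[(index + k) % len(self.nodes)] for k in range(REPLICATION)]
            for node in targets:
                self.storage[node][block_id] = data[offset:offset + BLOCK_SIZE]
            block_map[block_id] = targets
        return block_map

    def get(self, block_map, failed=()):
        """Reassemble a file, reading each block from any replica on a non-failed node."""
        parts = []
        for block_id, holders in block_map.items():
            live = [node for node in holders if node not in failed]
            if not live:
                raise OSError(f"all replicas of {block_id} are unavailable")
            parts.append(self.storage[live[0]][block_id])
        return b"".join(parts)
```

For example, storing the 15-byte value b"sensor readings" on a four-node cluster produces four blocks, each held by three different nodes. The file can still be read back if any single node is treated as failed.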



                                iii.   Data Management


                                       In this phase, the data must be managed and organised so that it is
                                       accurate, complete, and up to date. To achieve this, data cleaning
                                       procedures may be used to eliminate errors and discrepancies in the
                                       data (Rawat, 2021).
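The following is a minimal sketch of the cleaning procedures this phase relies on: standardizing formats, removing duplicate records, and handling missing or implausible values. It is written in Python using only the standard library. The records, the accepted date formats, the valid age range, and the median-imputation rule are all hypothetical choices made for illustration. They are not taken from the cited sources.

```python
import statistics
from datetime import datetime

# Hypothetical raw records showing typical quality problems: inconsistent
# formats, a duplicate, missing values, and an implausible age.
RAW = [
    {"id": "1", "country": " my ", "joined": "2021-03-05", "age": "34"},
    {"id": "2", "country": "SG", "joined": "05/03/2021", "age": ""},
    {"id": "1", "country": "MY", "joined": "2021-03-05", "age": "34"},
    {"id": "3", "country": None, "joined": "2021-04-10", "age": "29"},
    {"id": "4", "country": "sg", "joined": "2021-05-01", "age": "340"},
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed accepted input formats
VALID_AGES = range(0, 121)  # assumed plausible range; other values are treated as errors


def standardize_date(value):
    """Convert any accepted date format to ISO 8601 (YYYY-MM-DD), else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except (TypeError, ValueError):
            continue
    return None


def parse_age(value):
    """Return the age as an int, or None if it is missing or implausible."""
    try:
        age = int(value)
    except (TypeError, ValueError):
        return None
    return age if age in VALID_AGES else None


def clean(records):
    # 1. Standardize formats; inaccurate values become None (missing).
    normalized = [
        {
            "id": r["id"].strip(),
            "country": (r["country"] or "").strip().upper() or None,
            "joined": standardize_date(r["joined"]),
            "age": parse_age(r["age"]),
        }
        for r in records
    ]

    # 2. Remove exact duplicates (after step 1, so " my " and "MY" match).
    seen, unique = set(), []
    for r in normalized:
        key = tuple(r.values())
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Handle missing values: impute age with the low median of known ages
    #    and mark unknown countries explicitly.
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_age = statistics.median_low(known_ages) if known_ages else None
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_age
        if r["country"] is None:
            r["country"] = "UNKNOWN"
    return unique
```

Here the duplicate of record 1 is detected only after its country code has been standardized, which is why standardization runs first. Median imputation is only one way to handle missing values. Dropping incomplete records or flagging them for manual review are common alternatives.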


                                       Data cleaning techniques play a significant role in achieving this
                                       objective. Data cleaning involves identifying and rectifying errors and
                                       inconsistencies in the dataset. This process includes removing
                                       duplicate records, handling missing values, correcting inaccurate
                                       data, and standardizing formats and representations. By applying data