Page 108 - FULL REPORT 30012024

Figure 4.26 Code snippet for the data cleaning.


Column names were standardised and extraneous information was removed. The
datasets were then merged on their shared columns, providing a coherent and
integrated view of the data. Missing values in the 'Total Death' measure were
filled by interpolation within each country. Rows that still had incomplete
key measurements were then systematically removed to preserve the integrity
of the data. In the final stage, the aggregated dataset was merged with
statistics on daily smoking prevalence obtained from OurWorldInData.org,
which were used without separate cleaning. These steps produced a cohesive,
cleaned dataset, saved as "merged_data.csv", ready to be loaded into PowerBI
for visualization. Figure 4.27 shows the merged dataset opened in Microsoft
Excel.
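
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using pandas, not the code shown in Figure 4.26. The toy input tables, every column name other than 'Total Death', the choice of join keys ("Country", "Year"), the inner and left join types, and the decision to interpolate only gaps that lie between two known values are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy inputs; the real sources and column names are not given in the text.
deaths = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Total Death": [100.0, None, 300.0, 50.0, None, None],
})
population = pd.DataFrame({
    "Country ": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Population": [1000, 1010, 1020, 500, None, 510],
    "Notes": ["x"] * 6,  # extraneous column, dropped below
})
smoking = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [2000, 2002, 2000],
    "Daily Smoking Prevalence": [20.0, 19.0, 25.0],
})


def standardise_columns(df):
    """Strip whitespace and title-case column names so they match across tables."""
    return df.rename(columns=lambda c: c.strip().title())


def clean_and_merge(deaths, population, smoking):
    keys = ["Country", "Year"]  # assumed shared columns

    # 1. Standardise column names and drop extraneous information.
    deaths = standardise_columns(deaths)
    population = standardise_columns(population).drop(columns=["Notes"])

    # 2. Combine the datasets on their shared columns.
    merged = deaths.merge(population, on=keys, how="inner").sort_values(keys)

    # 3. Interpolate 'Total Death' within each country.
    #    limit_area="inside" fills only gaps between two known values (an assumption).
    merged["Total Death"] = merged.groupby("Country")["Total Death"].transform(
        lambda s: s.interpolate(limit_area="inside")
    )

    # 4. Remove rows that still lack a key measurement.
    merged = merged.dropna(subset=["Total Death", "Population"])

    # 5. Attach the smoking prevalence data as-is, without separate cleaning.
    merged = merged.merge(smoking, on=keys, how="left")
    return merged.reset_index(drop=True)


if __name__ == "__main__":
    result = clean_and_merge(deaths, population, smoking)
    result.to_csv("merged_data.csv", index=False)  # file later loaded into PowerBI
```

Interpolating per country, rather than over the whole column, prevents a value from one country being blended into the gap of another. Rows whose gaps cannot be interpolated are dropped in step 4.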















                                                    Figure 4.27 The merged dataset.