Page 109 - FULL REPORT 30012024

ii. Malaysia Stroke Mortality Dataset


The data cleaning procedure for the Malaysia stroke mortality dataset, acquired from the eStatistik database, required a full restructuring because the original data were disorganised and inappropriately formatted. Variables and records did not follow a defined pattern suitable for analysis. To remedy this, the dataset was thoroughly rearranged in Microsoft Excel. Figure 4.28 shows the raw dataset before the cleaning procedure.

Figure 4.28 Pre-Cleaned Malaysia Stroke Mortality Dataset Structure.

Columns were assigned to each relevant statistic, including year, number of stroke cases, and fatality rates, and rows were arranged so that each one corresponded to a single unique record, creating a consistent grid of data points. The data values were entered manually into the newly organised Excel sheets; this was an essential step to verify that each value was placed correctly within the structure of the dataset.
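The rearrangement and placement check described above were performed manually in Excel. For readers who want a scripted equivalent, the following is a minimal Python sketch under stated assumptions, not the method used in this report. It assumes, hypothetically, that the raw export lists each statistic as a row and each year as a column; the row labels, the `FIELDS` mapping and the `restructure` and `verify` helpers are illustrative names, and the numbers in the example input are made up.

```python
import csv
import io

# Hypothetical mapping from raw row labels to (standardised header, value type).
# These labels are illustrative only, not taken from the eStatistik export.
FIELDS = {
    "stroke cases": ("stroke_cases", int),
    "fatality rate": ("fatality_rate", float),
}


def restructure(raw_csv: str) -> list[dict]:
    """Pivot a wide table (statistics as rows, years as columns) into one record per year."""
    rows = list(csv.reader(io.StringIO(raw_csv)))
    years = [int(cell) for cell in rows[0][1:]]  # first row: label cell, then the years
    records = {year: {"year": year} for year in years}
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank spacer rows found in untidy exports
        header, cast = FIELDS[row[0].strip().lower()]
        for year, value in zip(years, row[1:]):
            records[year][header] = cast(value)
    return [records[year] for year in years]


def verify(records: list[dict]) -> None:
    """Raise ValueError if a year appears twice or a record is missing a column."""
    expected = {"year"} | {header for header, _ in FIELDS.values()}
    years = [record["year"] for record in records]
    if len(years) != len(set(years)):
        raise ValueError("duplicate year in dataset")
    for record in records:
        missing = expected - set(record)
        if missing:
            raise ValueError(f"{record['year']}: missing {sorted(missing)}")


if __name__ == "__main__":
    # Made-up example input in the assumed wide layout, including a blank spacer row.
    raw = "Statistic,2019,2020\nStroke cases,100,120\n\nFatality rate,10.5,11.0\n"
    records = restructure(raw)
    verify(records)  # each year appears once with all three columns
    print(records)
```

Separating `restructure` from `verify` mirrors the two steps in the text: building a one-column-per-statistic, one-row-per-record grid, then checking that every value landed in its place. A row cut short in the raw file leaves a record without one of its columns, which `verify` reports instead of passing silently.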


The cleaning focused primarily on reformatting and rearranging the dataset. This included standardising the headers for clarity and uniformity, which is crucial for any subsequent analysis. The cell formatting was