Page 109 - FULL REPORT 30012024
ii. Malaysia Stroke Mortality Dataset
The Malaysia stroke mortality dataset, acquired from the eStatistik
database, required a full restructure because the original file was
disorganised and inappropriately formatted. Its variables and records
did not follow a defined pattern suitable for analysis. To remedy
this, the dataset was rearranged in Microsoft Excel. Figure 4.28
displays the raw dataset before the cleaning procedure.
Figure 4.28 Pre-Cleaned Malaysia Stroke Mortality Dataset Structure.
Columns were designated to represent each pertinent statistic,
including year, number of stroke cases, and fatality rates. The rows
were arranged so that each row held one unique record, creating a
consistent grid of data points. The data were entered manually into
the newly organised Excel sheets, an essential step to ensure that
each value was placed accurately within the structure of the dataset.
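
The restructuring in this study was carried out manually in Microsoft Excel. For readers who prefer a scripted equivalent, the following is a minimal sketch of the same idea in Python, using only the standard library. The raw layout (one row per statistic, one column per year), the column names in COLUMN_NAMES, and all numeric values are assumptions made for illustration; they are not taken from the eStatistik file.

```python
import csv
import io

# Hypothetical raw layout: one row per statistic, one column per year.
# The values are placeholders for illustration, not figures from the dataset.
RAW = """Statistic,2019,2020
Number of stroke cases,100,120
Fatality rate,10.5,11.0
"""

# Assumed mapping from raw row labels to standardised column names.
COLUMN_NAMES = {
    "number of stroke cases": "stroke_cases",
    "fatality rate": "fatality_rate",
}


def restructure(raw_text):
    """Return one record per year, with one column per statistic."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    header, body = rows[0], rows[1:]
    years = [int(cell) for cell in header[1:]]
    # Start one record per year, then fill in each statistic.
    records = {year: {"year": year} for year in years}
    for row in body:
        column = COLUMN_NAMES[row[0].strip().lower()]
        for year, cell in zip(years, row[1:]):
            records[year][column] = float(cell)
    return [records[year] for year in sorted(records)]


tidy = restructure(RAW)
for record in tidy:
    print(record)
```

Each resulting record corresponds to one row of the reorganised sheet, with the year and each statistic held in its own column.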
The cleaning primarily involved reformatting and rearranging the
dataset. The process included standardising the headers for clarity
and uniformity, which is crucial for any subsequent analysis. The
cell formatting was