Page 108 - FULL REPORT 30012024

Figure 4.26 Code snippet for the data cleaning.

Column names were standardised and extraneous information was removed.
The datasets were then merged on their shared columns, giving a single,
integrated view of the data. Missing values in the 'Total Death' measure
were filled by interpolation within each country. Rows that still lacked
key measurements were then dropped to preserve the integrity of the
data. Finally, the aggregated dataset was merged with statistics on
daily smoking prevalence obtained from OurWorldInData.org, which were
used without separate cleaning. The result was a single cleaned dataset,
saved as "merged_data.csv", ready for the next step: loading it into
Power BI for visualisation. Figure 4.27 shows the merged dataset opened
in Microsoft Excel.
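
The cleaning steps described above can be illustrated with a minimal sketch. It assumes the pandas library, and every column name other than 'Total Death' (here "Country", "Year", "GDP" and "Smoking Prevalence"), the sample values, and the helper names standardise_columns and clean_and_merge are hypothetical stand-ins. It is not the code shown in Figure 4.26. Interpolation is restricted to gaps inside each country's series, so a trailing missing value is left empty and removed in the following step.

```python
import pandas as pd


def standardise_columns(df):
    """Return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out


def clean_and_merge(deaths, indicators, smoking):
    """Sketch of the pipeline: standardise, merge, interpolate, drop, merge."""
    deaths = standardise_columns(deaths)
    indicators = standardise_columns(indicators)
    smoking = standardise_columns(smoking)

    keys = ["country", "year"]  # assumed shared columns
    merged = deaths.merge(indicators, on=keys, how="outer")
    merged = merged.sort_values(keys).reset_index(drop=True)

    # Fill interior gaps in 'Total Death' separately for each country.
    merged["total_death"] = merged.groupby("country")["total_death"].transform(
        lambda s: s.interpolate(method="linear", limit_area="inside")
    )

    # Drop rows that still lack a key measurement ("gdp" is an assumed example).
    merged = merged.dropna(subset=["total_death", "gdp"])

    # Attach the smoking prevalence figures as they are, without extra cleaning.
    merged = merged.merge(smoking, on=keys, how="left")
    return merged.reset_index(drop=True)


# Hypothetical sample data, for illustration only.
deaths = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001],
    "Total Death": [100.0, None, 300.0, 50.0, None],
})
indicators = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001],
    "GDP": [1.0, 2.0, 3.0, 4.0, 5.0],
})
smoking = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "Year": [2000, 2001, 2002, 2000, 2001],
    "Smoking Prevalence": [20.0, 19.5, 19.0, 30.0, 29.0],
})

result = clean_and_merge(deaths, indicators, smoking)
# To store the output under the file name used in the text:
# result.to_csv("merged_data.csv", index=False)
```

In this sample, the missing 2001 value for country A lies between two known values and is interpolated, while the trailing missing value for country B is not filled and that row is dropped. Whether trailing gaps should instead be carried forward is a design choice the text does not specify.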

Figure 4.27 The merged dataset.