Page 111 - FULL REPORT 30012024
4.3.1.2 ETL Process
According to Yulianto (2019), an essential phase in data warehousing is the
ETL process. ETL stands for Extract, Transform, and Load: data is extracted
from several sources, transformed to fit the needs of the data warehouse, and
then loaded into the warehouse. In this project, the ETL process is performed
only for the stroke mortality dataset.
i. Extract
In the extract step, data is extracted from the source CSV file,
merged_dataset.csv. A stroke database was created, containing a
collection named 'strokedashboard'. After the CSV file is chosen for
import, MongoDB Compass displays its import interface. This interface
is used to set the data type of each field, which keeps the data
consistent throughout the import process. Figure 4.30 shows the data
type assigned to each column: the 'Country' field is designated as
'String', the 'Year' field as 'Int32', and the numerical fields, such as
total deaths and stroke death rates, as 'Double'. Assigning precise data
types is essential for preserving the integrity of the data throughout
the ETL process.
Figure 4.30 Data Import Interface in MongoDB Compass.
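The type assignment described above was carried out through the MongoDB Compass interface. As an illustration only, the same mapping could be sketched in Python using the standard library. Apart from 'Country' and 'Year', the column names below are assumptions, the sample row is a made-up placeholder rather than data from merged_dataset.csv, and the helper names are hypothetical.

```python
import csv
import io

# Only 'Country' and 'Year' are named in the text; every other column is
# assumed to be numeric and is treated as a double.
STRING_FIELDS = {"Country"}
INT_FIELDS = {"Year"}


def cast_row(row):
    """Cast one CSV row (all strings) to the types chosen at import time."""
    typed = {}
    for field, value in row.items():
        if field in STRING_FIELDS:
            typed[field] = value.strip()
        elif field in INT_FIELDS:
            typed[field] = int(value)
        else:
            # Blank numeric cells become None instead of raising an error.
            typed[field] = float(value) if value.strip() else None
    return typed


def read_typed_rows(stream):
    """Read CSV text from a file-like object and return typed documents."""
    return [cast_row(row) for row in csv.DictReader(stream)]


# Placeholder input with hypothetical column names and invented values.
sample = io.StringIO(
    "Country,Year,Total Deaths,Stroke Death Rate\n"
    "Exampleland,2000,100,12.5\n"
)
documents = read_typed_rows(sample)
```

In practice, `read_typed_rows` would be given the opened merged_dataset.csv file, and the resulting list of dictionaries could then be written to the 'strokedashboard' collection, for example with PyMongo's `insert_many`. That final step is not part of the workflow described here, which relies on Compass for the import.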