Page 122 - FULL REPORT 30012024
4.3.2.2 Model Training
Training the predictive model is an organised sequence of steps that begins with
loading cleaned_dataset.csv. The dataset is then reduced to a preliminary
selection of factors considered relevant for stroke prediction, such as
demographic characteristics, health history, and lifestyle choices. This choice
is based on prior domain knowledge indicating the importance of these variables
in stroke incidence, as shown in Figure 4.44.
Figure 4.44 Data processing for the stroke prediction dataset
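The loading and selection step described above can be sketched in Python with the standard csv module. This is a minimal illustration, not the report's own implementation: the column names in FEATURES and TARGET are hypothetical, chosen only to reflect the demographic, health-history and lifestyle factors mentioned in the text, and would need to match the actual headers of cleaned_dataset.csv.

```python
import csv
from typing import Iterable

# Hypothetical column names (an assumption); the real headers of
# cleaned_dataset.csv may differ and should be substituted here.
FEATURES = ("gender", "age", "hypertension", "heart_disease",
            "avg_glucose_level", "bmi", "smoking_status")
TARGET = "stroke"


def load_and_select(lines: Iterable[str], features=FEATURES, target=TARGET):
    """Read CSV text and keep only the selected feature columns plus the target.

    Returns (X, y): X is a list of dicts holding the selected features as
    strings, y is a list of integer class labels taken from the target column.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    # Fail early if any assumed column is absent from the file.
    missing = [c for c in (*features, target) if c not in header]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    X, y = [], []
    for row in reader:
        X.append({c: row[c] for c in features})  # drop all unselected columns
        y.append(int(row[target]))
    return X, y
```

In use, the argument would simply be the opened file, for example open("cleaned_dataset.csv", newline=""), since csv.DictReader accepts any iterable of text lines. Keeping the selected columns in one tuple makes the domain-driven choice of factors explicit and easy to revise.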
After feature selection, feature engineering is used to categorise the age and
BMI variables into distinct groups. This transformation, shown in Figure 4.45,
is based on recognised medical and demographic criteria that link risk levels
to age groups and BMI categories. The categorisation serves two purposes: it
simplifies the model's learning process by converting continuous variables into
categorical ones, and it captures non-linear effects, since stroke risk need not
change uniformly across the age and BMI ranges.
Figure 4.45 Categorization of numerical variables in Power BI
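Figure 4.45 indicates that the categorisation was carried out in Power BI. The same kind of binning can be sketched in Python as an illustrative alternative, not as the report's method. The BMI bands follow the WHO adult classification (underweight below 18.5, normal 18.5 to 24.9, overweight 25 to 29.9, obese 30 and above). The age cut-offs in AGE_EDGES and the label names are assumptions, because the text does not state the groups actually used.

```python
from bisect import bisect_right

# WHO adult BMI classification: <18.5, 18.5-24.9, 25-29.9, >=30.
BMI_EDGES = [18.5, 25.0, 30.0]
BMI_LABELS = ["underweight", "normal", "overweight", "obese"]

# Illustrative age bands (an assumption; the report does not list its cut-offs).
AGE_EDGES = [18, 45, 65]
AGE_LABELS = ["child", "adult", "middle_aged", "senior"]


def categorise(value: float, edges, labels) -> str:
    """Map a continuous value to a band label.

    Values below edges[0] get labels[0]; a value equal to an edge falls
    into the higher band, so each band is the half-open interval [lo, hi).
    """
    assert len(labels) == len(edges) + 1, "need one more label than edges"
    return labels[bisect_right(edges, value)]


def add_categories(row: dict) -> dict:
    """Return a copy of a record with hypothetical age_group and bmi_category fields."""
    out = dict(row)
    out["age_group"] = categorise(float(row["age"]), AGE_EDGES, AGE_LABELS)
    out["bmi_category"] = categorise(float(row["bmi"]), BMI_EDGES, BMI_LABELS)
    return out
```

Keeping the original age and bmi fields alongside the new categories lets the continuous and categorical versions be compared during model training. Because the cut-offs live in plain lists, the bands can be adjusted to whatever clinical criteria the analysis adopts.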