Page 122 - FULL REPORT 30012024

4.3.2.2 Model Training



Training the predictive model follows an organised series of steps, beginning
with loading cleaned_dataset.csv. The dataset then undergoes a preliminary
selection of the factors considered relevant for stroke prediction, such as
demographic characteristics, health history, and lifestyle choices. This
selection draws on prior domain knowledge indicating that these variables
matter for stroke incidence; the processing step is shown in Figure 4.44.










                                            Figure 4.44 Data processing for the stroke prediction dataset
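
The loading and feature selection step described above can be sketched in Python as follows. This is only an illustrative sketch, not the report's actual implementation: the column names in SELECTED_FEATURES and TARGET are assumptions (the report names only the broad groups of factors), and a small in-memory sample with made-up rows stands in for cleaned_dataset.csv.

```python
import csv
import io

# Hypothetical column names: the report mentions demographic, health-history
# and lifestyle factors but does not list the exact columns.
SELECTED_FEATURES = [
    "gender", "age", "hypertension", "heart_disease", "smoking_status", "bmi",
]
TARGET = "stroke"  # assumed name of the label column


def load_selected(file_obj, features=SELECTED_FEATURES, target=TARGET):
    """Read CSV rows and keep only the chosen feature columns plus the target."""
    reader = csv.DictReader(file_obj)
    keep = list(features) + [target]
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    return [{c: row[c] for c in keep} for row in reader]


# Tiny in-memory stand-in for cleaned_dataset.csv (made-up rows, illustration only).
sample_csv = io.StringIO(
    "id,gender,age,hypertension,heart_disease,smoking_status,bmi,stroke\n"
    "1,Male,67,0,1,formerly smoked,36.6,1\n"
    "2,Female,45,0,0,never smoked,22.1,0\n"
)
rows = load_selected(sample_csv)
```

With the real file, the same function could be called as `with open("cleaned_dataset.csv", newline="") as f: rows = load_selected(f)`. Columns that are not selected, such as the assumed `id` column, are dropped.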


After feature selection, feature engineering is used to group the age and BMI
variables into distinct categories. This transformation, shown in Figure 4.45,
is based on recognised medical and demographic criteria that link risk levels
to age groups and BMI categories. The categorisation serves two purposes: it
simplifies the model's learning process by converting continuous variables to
categorical ones, and it captures non-linear effects that may occur across
age and BMI ranges.

























                                               Figure 4.45 Categorization of numerical variables in Power BI
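
The report performs this categorisation in Power BI; the sketch below shows the same idea in Python purely for illustration. The BMI cut-offs follow the standard WHO adult classes (below 18.5, 18.5 to under 25, 25 to under 30, 30 and above). The age bands and all label names are hypothetical, since the report does not state the exact boundaries it used.

```python
from bisect import bisect_right

# WHO adult BMI classes: <18.5, 18.5 to <25, 25 to <30, >=30.
BMI_EDGES = [18.5, 25.0, 30.0]
BMI_LABELS = ["Underweight", "Normal", "Overweight", "Obese"]

# Hypothetical age bands; the report does not give its exact boundaries.
AGE_EDGES = [18, 40, 60]
AGE_LABELS = ["Child", "Young adult", "Middle-aged", "Senior"]


def categorise(value, edges, labels):
    """Map a number to the label of the left-closed bin it falls in.

    With edges [e0, e1, e2] the bins are (-inf, e0), [e0, e1), [e1, e2), [e2, inf).
    """
    return labels[bisect_right(edges, value)]


def add_categories(record):
    """Return a copy of a record with age_group and bmi_category added."""
    out = dict(record)
    out["age_group"] = categorise(float(record["age"]), AGE_EDGES, AGE_LABELS)
    out["bmi_category"] = categorise(float(record["bmi"]), BMI_EDGES, BMI_LABELS)
    return out


# Made-up example record, for illustration only.
example = add_categories({"age": "67", "bmi": "36.6"})
```

Because each bin gets its own category, a model can assign a different risk level to each age or BMI band instead of assuming that risk changes linearly with the raw value.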
