Page 123 - FULL REPORT 30012024

Categorical encoding is then used to convert non-numeric variables into a format that machine learning algorithms can process. Label encoding was chosen because it efficiently maps each category to an integer value, which is required by algorithms that accept only numerical input. The procedure is applied to variables such as 'gender', 'smoking_status', and 'Residence_type', among others, so that the categorical information is retained in numerical form.
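The label-encoding step can be illustrated with a minimal sketch. It assumes the common convention of sorting the distinct categories and numbering them from 0 (the convention scikit-learn's LabelEncoder also follows). The sample values and the helper names `fit_label_encoder` and `encode_column` are hypothetical and are not taken from the report's dataset or code.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode_column(values, mapping):
    """Replace every category in a column with its integer code."""
    return [mapping[value] for value in values]


# Hypothetical sample rows; only the column names come from the text.
records = {
    "gender": ["Male", "Female", "Female", "Other"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked", "never smoked"],
    "Residence_type": ["Urban", "Rural", "Urban", "Urban"],
}

# Fit one mapping per categorical column, then apply it.
encoders = {col: fit_label_encoder(vals) for col, vals in records.items()}
encoded = {col: encode_column(vals, encoders[col]) for col, vals in records.items()}

print(encoders["gender"])   # {'Female': 0, 'Male': 1, 'Other': 2}
print(encoded["gender"])    # [1, 0, 0, 2]
```

Keeping the fitted mapping for each column makes it possible to apply the same codes to unseen data and to translate codes back into category names. The integer codes imply an ordering that the categories may not have, which tree-based models generally tolerate better than linear ones.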


The Synthetic Minority Oversampling Technique (SMOTE) is used to resolve the class imbalance in the dataset, which could otherwise lead to a biased model. As shown in Figure 4.46, it synthesises additional samples of the minority class to give a balanced representation of the classes. The rationale for using SMOTE is its ability to reduce the skew in the class distribution, thereby improving the model's generalisability.







                                                   Figure 4.46 The code to employ SMOTE.
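To make the interpolation idea behind SMOTE concrete, the following is a minimal standard-library sketch. It is not the code shown in Figure 4.46. The function name `smote_oversample`, its parameters, and the sample points are all assumptions made for illustration. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of the base point within the minority class only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature data: six majority points, three minority points.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # both classes now have 6 samples
```

Because every synthetic point lies on a line segment between two existing minority samples, the new data stays inside the region the minority class already occupies. Oversampling is usually applied to the training split only, so that the test set keeps the original class distribution.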



Based on the literature review in the preceding chapter, the RandomForestClassifier was selected for its robustness and effectiveness in handling both categorical and numerical data, as well as for its capacity to limit overfitting and provide feature importance ratings. It is trained on the balanced dataset. The performance of the model is evaluated using conventional measures, namely accuracy, precision, recall, and the F1 score, which together give a comprehensive assessment of its predictive capabilities. Table 4.6 presents these metrics and the assessment findings.
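The four measures named above can be computed directly from the true and predicted labels. The sketch below assumes a binary task with the positive class encoded as 1. The label lists are invented for illustration and are unrelated to the results in Table 4.6, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: in the report these would be the held-out targets
# and the predictions of the trained classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, accuracy alone can look high even when the minority class is rarely detected, which is why precision, recall, and the F1 score are reported alongside it.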









