Page 123 - FULL REPORT 30012024
Categorical encoding is then used to convert non-numeric variables into a
format that machine learning algorithms can process. Label encoding was chosen
because it maps each category to an integer efficiently, which suits
algorithms that require numerical input. It is applied to variables such as
'gender', 'smoking_status', and 'Residence_type', among others, so that the
information carried by the categorical data is preserved in numerical form.
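The encoding step can be illustrated with a minimal sketch, assuming scikit-learn's `LabelEncoder` is the encoder in use (the text does not name the implementation). The column names come from the text; the category values and the `columns` and `encoders` names are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows: column names are from the text, values are assumed.
columns = {
    "gender": ["Male", "Female", "Female", "Male"],
    "smoking_status": ["never smoked", "smokes", "formerly smoked", "never smoked"],
    "Residence_type": ["Urban", "Rural", "Urban", "Urban"],
}

encoders = {}  # one fitted encoder per column, kept so codes can be mapped back
encoded = {}
for name, values in columns.items():
    encoder = LabelEncoder()
    # Categories are sorted alphabetically and numbered from 0.
    encoded[name] = encoder.fit_transform(values).tolist()
    encoders[name] = encoder

print(encoded["gender"])                  # [1, 0, 0, 1]
print(list(encoders["gender"].classes_))  # ['Female', 'Male']
```

The same `fit_transform` call accepts a pandas column. The integer codes carry an arbitrary alphabetical order, which tree-based models generally tolerate.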
The Synthetic Minority Oversampling Technique (SMOTE) is used to resolve the
class imbalance in the dataset, which could otherwise lead to a biased model.
As shown in Figure 4.46, it synthesises additional samples of the minority
class to give a balanced representation of the classes. The argument for
using SMOTE is its ability to reduce the skew in the class distribution, which
is expected to improve the model's generalisability.
Figure 4.46 The code to employ SMOTE.
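The code in Figure 4.46 is not reproduced in this text, so the following is only an illustrative NumPy sketch of the interpolation idea behind SMOTE, not the report's code. The function `smote_sketch`, its parameters, and the toy arrays are hypothetical. In practice a library implementation such as `SMOTE` from imbalanced-learn, with its `fit_resample(X, y)` method, would normally be used.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                      # starting samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                               # position on the segment
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])

# Toy data: 10 majority rows (class 0) and 4 minority rows (class 1).
X_majority = np.zeros((10, 2))
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])

synthetic = smote_sketch(X_minority, n_new=len(X_majority) - len(X_minority), k=3)
X_balanced = np.vstack([X_majority, X_minority, synthetic])
y_balanced = np.array([0] * len(X_majority) + [1] * (len(X_minority) + len(synthetic)))
```

Each synthetic row lies on the line segment between a minority sample and one of its nearest minority neighbours, so the new points stay inside the region the minority class already occupies.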
Based on the literature review in the preceding chapter, the
RandomForestClassifier was selected for its resilience and efficacy in dealing
with both categorical and numerical data, and for its capacity to minimise
overfitting and provide feature importance scores. It is trained on the
balanced dataset. The performance of the model is evaluated using conventional
measures: accuracy, precision, recall, and the F1 score. Together, these
measures provide a full evaluation of its predictive capabilities. Table 4.6
presents the metrics and assessment findings.
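The training and evaluation step can be sketched as follows, assuming scikit-learn. The data is a synthetic stand-in generated with `make_classification`, and the split ratio and hyperparameters are assumptions rather than the report's settings, so the printed values bear no relation to Table 4.6.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the report's dataset is not reproduced here.
X, y = make_classification(n_samples=400, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Assumed hyperparameters; the report does not state the ones it used.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The four measures named in the text, computed on the held-out split.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Per-feature importance scores; they sum to 1 across all features.
importances = model.feature_importances_
```

When oversampling is part of the pipeline, it is usually applied to the training split only, so that the test metrics reflect the original class distribution.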





