Page 127 - Data Science Algorithms in a Week

Texture Descriptors for The Generic Pattern Classification Problem   111

                                                   Table 1: Tested datasets

                                   DATASET             Short name  N° patterns  N° features
                                   breast                breast        699          9
                                   heart                  heart        303          13
                                   pima                   pima         768          8
                                   sonar                  sonar        208          60
                                   ionosphere             iono         351          34
                                   liver                  liver        345          7
                                   haberman               hab          306          3
                                   vote                   vote         435          16
                                   australian             aust         690          14
                                   transfusion            trans        748          5
                                   wdbc                   wdbc         569          31
                                   breast cancer image    bCI          584         100
                                   pap test               pap          917         100
                                   tornado                torn        18951         24
                                   german credit          gCr         1000          20

                          The testing protocol used in the experiments is 5-fold cross-validation (CV), except for the
                       Tornado dataset, which is already divided into separate training and testing sets. All
                       features in these datasets were linearly normalized to the range [0, 1], using only the
                       training data to find the normalization parameters; this was performed before
                       feeding the features into an SVM. The performance indicator used is the area under the ROC
                       curve (AUC).
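The normalization step can be illustrated with a minimal sketch, assuming plain Python lists and contiguous, unshuffled folds (the text does not say how the folds were drawn). The helper names `fit_min_max`, `apply_min_max`, and `k_fold_indices` are hypothetical, and the toy data are illustrative only. The point it shows is that the minimum and maximum of each feature come from the training folds alone and are then reused on the held-out fold.

```python
def fit_min_max(train_rows):
    """Return per-feature (min, max) computed from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, params):
    """Linearly rescale each feature using the training-set parameters."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant feature -> 0
            for value, (lo, hi) in zip(row, params)
        ])
    return scaled


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy data: 10 patterns, 2 features (values are illustrative only).
data = [[float(i), float(10 * i % 7)] for i in range(10)]

for test_idx in k_fold_indices(len(data), k=5):
    held_out = set(test_idx)
    train = [row for i, row in enumerate(data) if i not in held_out]
    test = [data[i] for i in test_idx]
    params = fit_min_max(train)                 # training data only
    train_scaled = apply_min_max(train, params)  # always within [0, 1]
    test_scaled = apply_min_max(test, params)    # may fall outside [0, 1]
```

Because the parameters are estimated without the held-out fold, a test value can land slightly below 0 or above 1; this is expected and avoids leaking test information into the model.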
                          In the following experiments, we optimized the SVM for each dataset, testing both linear
                       and radial basis function (RBF) kernels.
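For reference, the two kernels can be written out as a short sketch. The kernel definitions are the standard ones; the candidate grid below (`C_VALUES`, `GAMMA_VALUES`) is purely hypothetical, since the text does not state which hyperparameter values were searched.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical search grid (assumed values, not taken from the text).
C_VALUES = [0.1, 1.0, 10.0]
GAMMA_VALUES = [0.01, 0.1, 1.0]

# Each candidate is (kernel name, C, gamma); gamma is unused for the linear kernel.
candidates = [("linear", c, None) for c in C_VALUES]
candidates += [("rbf", c, g) for c in C_VALUES for g in GAMMA_VALUES]
```

Under this kind of setup, each candidate would be trained on the training folds and the configuration with the best validation AUC kept for that dataset.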
                          The first experiment evaluates the five methods for reshaping a linear
                       feature vector into a matrix, as described in section 2. Table 2 reports the
                       performance of each reshaping approach coupled with each matrix descriptor.
                          Examining the results in Table 2, it is clear that Tr performs rather poorly;
                       moreover, RS, when coupled with LPQ or CLBP, has numerical problems on those datasets
                       where few features are available (thereby resulting in small matrices). The best reshaping
                       method is FFT, and the best tested descriptor is HOG.
                          The second experiment evaluates the fusion of different reshaping
                       methods and different descriptors, with the goal of proposing an ensemble that works well across all
                       tested datasets. The first four columns of Table 3 show the fusion of reshaping methods
                       (except Tr, due to its low performance) for each descriptor (labelled Dx, specifically
                       DLPQ, DCLBP, DHoG, and DWave). The last four columns report the fusion of methods
                       obtained by fixing the descriptor and varying the reshaping procedures (labelled Rx,
                       specifically RTr, RCW, RRS, RDCT, and RFFT).
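The fusion rule is not given in this passage, so the following is only a sketch under an assumption: classifier scores are combined by the sum rule (here, the mean of the scores), and the fused scores are then evaluated with AUC. The function names `sum_rule` and `auc` and the toy scores are hypothetical and are not results from the experiments.

```python
def sum_rule(score_lists):
    """Fuse classifiers by averaging their scores pattern by pattern.

    Assumption: the sum rule is used here for illustration only.
    """
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores from two hypothetical classifiers on four patterns.
labels = [1, 1, 0, 0]
scores_a = [0.9, 0.4, 0.5, 0.1]
scores_b = [0.3, 0.8, 0.2, 0.6]

fused = sum_rule([scores_a, scores_b])
auc_a = auc(scores_a, labels)
auc_b = auc(scores_b, labels)
auc_fused = auc(fused, labels)
```

On this toy data each classifier alone misranks one positive/negative pair, while the fused scores rank every positive above every negative, which is the kind of complementary behaviour an ensemble aims to exploit. In practice, scores from different classifiers are often normalized to a common range before they are summed.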