Page 127 - Data Science Algorithms in a Week
Texture Descriptors for The Generic Pattern Classification Problem 111
Table 1: Tested datasets
Dataset               Short name   N° patterns   N° features
breast                breast               699             9
heart                 heart                303            13
pima                  pima                 768             8
sonar                 sonar                208            60
ionosphere            iono                 351            34
liver                 liver                345             7
haberman              hab                  306             3
vote                  vote                 435            16
australian            aust                 690            14
transfusion           trans                748             5
wdbc                  wdbc                 569            31
breast cancer image   bCI                  584           100
pap test              pap                  917           100
tornado               torn               18951            24
german credit         gCr                 1000            20
The testing protocol used in the experiments is 5-fold cross-validation (CV), except for the
Tornado dataset, which is already divided into separate training and testing sets. All
features in these datasets were linearly normalized to the range [0, 1] before being fed
into an SVM; the normalization parameters were computed from the training data only. The
performance indicator is the area under the ROC curve (AUC).
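The evaluation protocol (5-fold CV, min-max normalization fitted on the training folds only, AUC as the figure of merit) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the fold split is a simple seeded shuffle without stratification, the helper names (kfold_indices, fit_minmax, apply_minmax, auc) are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve when ties count 1/2.

```python
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) with a fixed seed and return k (train, test) index pairs (no stratification)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def fit_minmax(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on the training rows only."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, lo, hi) -> List[List[float]]:
    """Map each feature linearly to [0, 1] using the training min/max.
    Test values outside the training range land outside [0, 1]; constant features map to 0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive outscores random negative), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage on toy data: scale a test row with statistics taken from two training rows.
lo, hi = fit_minmax([[1.0, 10.0], [3.0, 30.0]])
print(apply_minmax([[2.0, 20.0]], lo, hi))  # [[0.5, 0.5]]
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
```

Fitting the scaler on the training folds and merely applying it to the held-out fold keeps test-set statistics out of the normalization, which is what "using only the training data" guards against.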
In the following experiments, we optimized the SVM separately for each dataset, testing
both linear and radial basis function (RBF) kernels.
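Testing both kernels amounts to a small model-selection loop per dataset. The sketch below writes the two kernel functions explicitly, k(x, z) = x·z and k(x, z) = exp(-γ‖x - z‖²), and picks the configuration with the best validation score. The grid values, the dictionary layout, and the evaluate callback (which would train an SVM and return, for instance, a mean validation AUC) are hypothetical; this section does not say which C or γ values were searched.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence

Config = Dict[str, object]


def linear_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """k(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def candidate_configs(Cs: Sequence[float], gammas: Sequence[float]) -> List[Config]:
    """Hypothetical grid: the linear kernel is tuned over C only, the RBF kernel over (C, gamma)."""
    configs: List[Config] = [{"kernel": "linear", "C": C} for C in Cs]
    configs += [{"kernel": "rbf", "C": C, "gamma": g} for C, g in product(Cs, gammas)]
    return configs


def select_config(configs: List[Config], evaluate: Callable[[Config], float]) -> Config:
    """Return the configuration with the highest evaluate() score (e.g. mean validation AUC)."""
    return max(configs, key=evaluate)
```

Keeping the linear kernel in the grid lets the search fall back to a simpler model on datasets where the RBF kernel brings no gain.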
The first experiment evaluates the five methods, described in Section 2, for reshaping a
linear feature vector into a matrix. Table 2 reports the performance of each reshaping
approach coupled with each matrix descriptor (also detailed in Section 2).
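As a concrete reference point, the simplest conceivable reshaping is a row-major fill of a square matrix. The sketch below is a hypothetical baseline, not necessarily the Tr method defined in Section 2: it places the n features in the smallest s×s matrix with s = ⌈√n⌉ and pads the unused cells with zeros. The other four methods are defined in Section 2 and are not reproduced here.

```python
import math
from typing import List, Sequence


def reshape_to_square(v: Sequence[float], pad: float = 0.0) -> List[List[float]]:
    """Row-major reshape of a 1-D feature vector into the smallest s x s matrix,
    s = ceil(sqrt(len(v))), filling unused cells with `pad` (hypothetical baseline)."""
    n = len(v)
    s = math.isqrt(n - 1) + 1 if n > 0 else 0  # integer ceil(sqrt(n)) for n >= 1
    cells = list(v) + [pad] * (s * s - n)
    return [cells[r * s:(r + 1) * s] for r in range(s)]
```

For example, reshape_to_square([1, 2, 3, 4, 5]) returns [[1, 2, 3], [4, 5, 0.0], [0.0, 0.0, 0.0]]; the resulting matrix can then be handed to any texture descriptor that accepts a 2-D array.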
The results in Table 2 show that Tr performs rather poorly; moreover, RS, when coupled
with LPQ or CLBP, suffers from numerical problems on the datasets with few features
(which yield small matrices). The best reshaping method is FFT, and the best descriptor
tested is HOG.
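The remark that few features lead to small matrices can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that n features are packed into the smallest square matrix (side ⌈√n⌉) and that a local descriptor needs a full 3×3 neighbourhood (radius 1) around each pixel; neither assumption is taken from Section 2, and the function names are hypothetical. The feature counts come from Table 1.

```python
import math

# Feature counts copied from Table 1 (low-dimensional datasets, plus sonar for contrast).
FEATURES = {"hab": 3, "trans": 5, "liver": 7, "pima": 8, "breast": 9, "sonar": 60}


def square_side(n_features: int) -> int:
    """Side of the smallest square matrix holding n_features values, i.e. ceil(sqrt(n))."""
    return math.isqrt(n_features - 1) + 1


def interior_pixels(side: int, radius: int = 1) -> int:
    """Pixels whose whole (2*radius + 1) x (2*radius + 1) neighbourhood lies inside the matrix."""
    return max(side - 2 * radius, 0) ** 2


for name, n in FEATURES.items():
    s = square_side(n)
    print(f"{name}: {n} features -> {s}x{s} matrix, {interior_pixels(s)} interior pixel(s)")
```

Under these assumptions haberman (3 features) yields a 2×2 matrix with no interior pixel, and the datasets with 5 to 9 features yield 3×3 matrices with a single one, whereas the 60 sonar features give an 8×8 matrix with 36. This does not establish the cause of the numerical problems reported for RS; it only shows how little spatial structure a local descriptor has to work with on such data.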
The second experiment evaluates fusions of different reshaping methods and different
descriptors, with the aim of proposing an ensemble that works well across all tested
datasets. The first four columns of Table 3 report, for each descriptor, the fusion of the
reshaping methods (excluding Tr, due to its low performance); these are labelled Dx,
specifically DLPQ, DCLBP, DHoG, and DWave. The remaining columns report the fusions
obtained by fixing the reshaping procedure and varying the descriptors; these are labelled
Rx, specifically RTr, RCW, RRS, RDCT, and RFFT.
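The Dx/Rx naming can be read as two ways of slicing a grid of (reshaping, descriptor) classifiers. The sketch below assumes that each classifier outputs one SVM score per test pattern and that scores are combined with the sum rule after min-max normalization; the fusion rule is an assumption, since this passage does not state it, and the names minmax, sum_rule, fuse_D, and fuse_R are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (reshaping method, descriptor) -> one SVM score per test pattern
Scores = Dict[Tuple[str, str], List[float]]


def minmax(s: Sequence[float]) -> List[float]:
    """Rescale one classifier's scores to [0, 1]; a constant score vector maps to zeros."""
    lo, hi = min(s), max(s)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in s]


def sum_rule(score_lists: List[Sequence[float]]) -> List[float]:
    """Sum rule (assumed): add the normalized scores of all members, pattern by pattern."""
    normed = [minmax(s) for s in score_lists]
    return [sum(col) for col in zip(*normed)]


def fuse_D(scores: Scores, descriptor: str, reshapings: Sequence[str]) -> List[float]:
    """Dx: fix the descriptor x and fuse over the given reshaping methods."""
    return sum_rule([scores[(r, descriptor)] for r in reshapings])


def fuse_R(scores: Scores, reshaping: str, descriptors: Sequence[str]) -> List[float]:
    """Rx: fix the reshaping method x and fuse over the given descriptors."""
    return sum_rule([scores[(reshaping, d)] for d in descriptors])
```

With a complete score grid, DHoG would correspond to fuse_D(scores, "HoG", ["CW", "RS", "DCT", "FFT"]) (Tr excluded), and RFFT to fuse_R(scores, "FFT", ["LPQ", "CLBP", "HoG", "Wave"]); the fused scores can then be evaluated with the same AUC as the individual classifiers.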