Page 124 - Data Science Algorithms in a Week

108             Loris Nanni, Sheryl Brahnam and Alessandra Lumini

machine learning (Loris Nanni et al., 2012). In (Z. Wang & Chen, 2008; Z. Wang et al., 2008) classifiers were developed for handling two-dimensional patterns, and in (Loris Nanni et al., 2012) it was shown that a continuous wavelet can be used to transform a vector into a matrix; once in matrix form, it can then be described using standard texture descriptors (the best performance obtained in (Loris Nanni et al., 2012) used a variant of local phase quantization based on a ternary coding).
The advantage of extracting features from a vector that has been reshaped into a matrix is the ability to investigate the correlation among sets of features in a given neighborhood; this is different from coupling feature selection and classification. To maximize performance, it was important to test several different texture descriptors and different neighborhood sizes. The resulting feature vectors were then fed into an SVM.
The following five methods for reshaping a linear feature vector into a matrix were tested in this paper. Letting q ∈ ℜ^s be the input vector, M ∈ ℜ^(d1×d2) the output matrix (where d1 and d2 depend on the method), and a a random permutation of the indices [1..s], the five methods are:

1. Triplet (Tr): in this approach d1 = d2 = 255. First, the original feature vector q is normalized to [0, 255] and stored in n. Second, the output matrix M ∈ ℜ^(255×255) is initialized to 0. Third, for j = 1..100000 a random permutation a_j is generated and M is updated according to the following formula: M(n(a_j(1)), n(a_j(2))) = M(n(a_j(1)), n(a_j(2))) + q(a_j(3));
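The Triplet accumulation step described above can be sketched as follows. This is an illustrative reading of the formula, not the authors' code: we use 0-based indices (so normalized values fall in [0, 254]), and the function name and parameters are ours.

```python
import numpy as np

def triplet_reshape(q, n_iter=100000, seed=0):
    """Triplet (Tr) reshaping: accumulate feature values into a 255x255 matrix.

    Sketch of the procedure in the text, with 0-based indexing; needs s >= 3.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    # Normalize the feature vector to integer bin indices (stored in n).
    span = q.max() - q.min()
    n = np.floor(254.0 * (q - q.min()) / (span if span > 0 else 1.0)).astype(int)
    M = np.zeros((255, 255))  # output matrix, initialized to 0
    for _ in range(n_iter):
        a = rng.permutation(len(q))     # random permutation a_j of the indices
        M[n[a[0]], n[a[1]]] += q[a[2]]  # M(n(a(1)), n(a(2))) += q(a(3))
    return M
```

Each iteration picks three random feature indices: the first two select a cell of M (via the normalized values) and the third supplies the value accumulated there.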
2. Continuous wavelet (CW) (Loris Nanni et al., 2012): in this approach d1 = 100, d2 = s. This method applies the Meyer continuous wavelet to the s-dimensional feature vector q and builds M by extracting the wavelet power spectrum, considering 100 different decomposition scales;
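The CW idea can be sketched as below. Note this is not the authors' implementation: the chapter uses the Meyer continuous wavelet, but to keep the example self-contained we substitute the Mexican-hat (Ricker) wavelet, computed by direct inner products with dilated, translated copies of the mother wavelet.

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet (stand-in for the Meyer wavelet)."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)

def cw_reshape(q, n_scales=100):
    """Continuous-wavelet (CW) reshaping: an n_scales x s power-spectrum matrix."""
    q = np.asarray(q, dtype=float)
    u = np.arange(len(q))
    M = np.empty((n_scales, len(q)))
    for i, scale in enumerate(range(1, n_scales + 1)):
        # Coefficients at this scale: inner products of the signal with
        # the wavelet translated to each position and dilated by `scale`.
        W = mexican_hat((u[None, :] - u[:, None]) / scale) / np.sqrt(scale)
        M[i] = (W @ q) ** 2  # row i of the wavelet power spectrum
    return M
```

With the default 100 scales, a vector of length s becomes a 100 × s matrix, matching d1 = 100, d2 = s.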
3. Random reshaping (RS): in this approach d1 = d2 = s^0.5 and M is a random rearrangement of the original vector into a square matrix: each entry of M is an element of q(a);
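RS is the simplest of the five methods and can be sketched in a few lines (function name is ours; the method implicitly assumes s is a perfect square):

```python
import numpy as np

def random_reshape(q, seed=0):
    """Random reshaping (RS): permute q and fold it into a sqrt(s) x sqrt(s) matrix."""
    q = np.asarray(q, dtype=float)
    d = int(round(np.sqrt(len(q))))
    assert d * d == len(q), "RS assumes s is a perfect square"
    a = np.random.default_rng(seed).permutation(len(q))  # random permutation a
    return q[a].reshape(d, d)  # each entry of M is an element of q(a)
```

The permutation destroys the original ordering, so any local structure found later by a texture descriptor comes from the random neighborhood, not from the original feature order.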
4. DCT: in this approach the resulting matrix M has dimensions d1 = d2 = s, and each entry M(i, j) = dct(q(a_ij(2..6))), where dct() is the discrete cosine transform, a_ij is a random permutation (different for each entry of the matrix), and the indices 2..6 indicate that the number of considered features varies between two and six. We use the DCT in this method because it is considered the de facto image transformation in most visual systems. Like other transforms, the DCT attempts to decorrelate the input data. The 1-dimensional DCT is obtained as the product of the input vector and the orthogonal matrix whose rows are the DCT basis vectors (which are orthogonal and normalized). The first transform coefficient (referred to as the DC coefficient) is the average value of the input vector, while the others are called the AC coefficients. After several tests, we obtained the best performance using the first DCT coefficient;
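A minimal sketch of this construction, under our reading of the formula (keeping only the first DCT coefficient per entry, as the text reports worked best; the function name and seed handling are ours):

```python
import numpy as np
from scipy.fft import dct

def dct_reshape(q, seed=0):
    """DCT reshaping: an s x s matrix where M(i, j) is the first DCT
    coefficient of a small random subset of 2..6 features."""
    q = np.asarray(q, dtype=float)
    s = len(q)
    rng = np.random.default_rng(seed)
    M = np.empty((s, s))
    for i in range(s):
        for j in range(s):
            k = int(rng.integers(2, 7))    # between two and six features
            a_ij = rng.permutation(s)[:k]  # random subset, different per entry
            # First (DC) coefficient of the orthonormal DCT-II, proportional
            # to the mean of the selected features.
            M[i, j] = dct(q[a_ij], norm="ortho")[0]
    return M
```

Since the DC coefficient is proportional to the mean of the selected features, each entry of M summarizes a small random neighborhood of the original feature vector.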