
ST-025
             New Classification Algorithm for High Dimensional Data Based on Robust SIMPLS


                                         Habshah Midi 1,a) and Abdullah Rasyid 2,b)

                                   1 Department of Mathematics and Statistics, Faculty of Science
                                   Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
                                               2 Institute for Mathematical Research
                                   Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia

                                         a)  Corresponding author: habshah@upm.edu.my
                                                 b)  habshahmidi@gmail.com


               Abstract. The ordinary least squares (OLS) method is often used to estimate the parameters of a linear
               regression model because of tradition and ease of computation. However, for high dimensional data
               where p > n, OLS fails to produce a unique estimate because the matrix X’X becomes singular. The
               statistically inspired modification of partial least squares (SIMPLS) has been put forward to rectify
               this problem. SIMPLS extracts uncorrelated components such that the components of the response and
               predictor variables have maximum covariance. Nonetheless, it is now evident that SIMPLS estimates
               are imprecise, with inflated standard errors, when outliers are present in a data set. In this paper,
               a robust SIMPLS is proposed that integrates a new weight function to reduce the effect of vertical
               outliers and high leverage points (HLPs), that is, outlying observations in the X-space. A new
               diagnostic plot is also established based on the proposed robust SIMPLS. The proposed diagnostic plot
               is very successful in classifying observations into regular observations, vertical outliers, good
               HLPs and bad HLPs.
               Keywords: high dimensional data; high leverage points; multicollinearity; robust method; robust
               SIMPLS
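
               The two computational points made in the abstract can be illustrated with a minimal sketch. It
               is not the authors' implementation and it does not include the proposed weight function, which
               the abstract does not specify. The simulated data, the sizes n and p, and all variable names
               are assumptions made for illustration. The sketch shows that X’X is rank deficient when p > n,
               and that, for a single response, the first SIMPLS weight vector is proportional to X’y, the
               unit-norm direction whose score has maximum covariance with y.

```python
import numpy as np

# Simulated data (assumption for illustration only): p > n
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + 0.1 * rng.standard_normal(n)

# Centre predictors and response
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# X'X is p x p, but its rank cannot exceed n - 1 after centring,
# so it is singular and the OLS normal equations have no unique solution.
XtX = Xc.T @ Xc
rank = np.linalg.matrix_rank(XtX)
print(f"X'X is {XtX.shape[0]} x {XtX.shape[1]} with rank {rank}")

# First SIMPLS component for a univariate response:
# the unit-norm weight vector maximising the covariance between
# the score Xc @ r and yc is r = X'y / ||X'y||.
s = Xc.T @ yc
r = s / np.linalg.norm(s)
t = Xc @ r  # first score vector

# Regress yc on the single score to get a one-component fit
q = (t @ yc) / (t @ t)
y_hat = y.mean() + q * t
print("correlation between fitted and observed y:",
      np.corrcoef(y_hat, y)[0, 1])
```

               Only the first component is shown. Further SIMPLS components are obtained by deflating X’y so
               that each new score is uncorrelated with the earlier ones, a step omitted here for brevity.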





















