Page 91 - programme book
ST-025
New Classification Algorithm for High-Dimensional Data Based on Robust SIMPLS
Habshah Midi 1,a) and Abdullah Rasyid 2,b)
1 Department of Mathematics and Statistics, Faculty of Science
Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
2 Institute for Mathematical Research
Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
a) Corresponding author: habshah@upm.edu.my
b) habshahmidi@gmail.com
Abstract. The ordinary least squares (OLS) method is often used to estimate the parameters of a linear regression model because of tradition and ease of computation. However, for high-dimensional data, where the number of predictors p exceeds the number of observations n, OLS fails to produce an estimate because X’X becomes singular. The statistically inspired modification of partial least squares (SIMPLS) has been put forward to rectify this problem. SIMPLS extracts uncorrelated components in such a way that the components of the response and predictor variables have maximum covariance. Nonetheless, SIMPLS estimates become imprecise, with inflated standard errors, when outliers are present in a data set. In this paper, a robust SIMPLS is proposed that integrates a new weight function to reduce the effect of vertical outliers and high leverage points (HLPs). A new diagnostic plot based on the proposed robust SIMPLS is also established. The proposed diagnostic plot successfully classifies observations as regular observations, vertical outliers, and good and bad HLPs (outlying observations in the X-space).
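The following is a minimal sketch of classical (non-robust) SIMPLS for a single response, following the standard algorithm of de Jong (1993), written in Python with NumPy. It illustrates two points from the abstract: X’X has rank at most n and is therefore singular when p > n, and SIMPLS instead builds mutually uncorrelated (here orthonormal) score components from the covariance between X and y. The function name `simpls_pls1`, the simulated data, and the choice of three components are illustrative assumptions; the new weight function of the proposed robust SIMPLS is not specified in this abstract and is not implemented here.

```python
import numpy as np


def simpls_pls1(X, y, n_components):
    """Classical (non-robust) SIMPLS for one response (hypothetical helper).

    Returns the coefficients b for centred X, the column means of X,
    the mean of y, and the orthonormal score matrix T (n x n_components).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X0, y0 = X - x_mean, y - y_mean            # centre predictors and response

    n, p = X0.shape
    R = np.zeros((p, n_components))            # weights, so that T = X0 @ R
    T = np.zeros((n, n_components))            # orthonormal scores
    V = np.zeros((p, n_components))            # orthonormal basis of the x-loadings
    q = np.zeros(n_components)                 # y-loadings
    s = X0.T @ y0                              # cross-product (covariance) vector X0'y0

    for a in range(n_components):
        r = s.copy()                           # one response: the dominant direction of s is s itself
        t = X0 @ r
        norm_t = np.linalg.norm(t)
        t /= norm_t                            # unit-length score
        r /= norm_t                            # keep t = X0 @ r
        p_load = X0.T @ t                      # x-loading of this component
        q[a] = y0 @ t                          # y-loading: covariance of y with the score
        # Gram-Schmidt step: orthogonalise the loading against earlier ones.
        v = p_load - V[:, :a] @ (V[:, :a].T @ p_load)
        v /= np.linalg.norm(v)
        # Deflate s so the next score is uncorrelated with all earlier scores.
        s = s - v * (v @ s)
        R[:, a], T[:, a], V[:, a] = r, t, v

    b = R @ q                                  # regression coefficients for centred X
    return b, x_mean, y_mean, T


rng = np.random.default_rng(0)
n, p = 20, 100                                 # high-dimensional setting: p > n
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# OLS breaks down: X'X is p x p but has rank at most n < p.
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "p:", p)

b, x_mean, y_mean, T = simpls_pls1(X, y, n_components=3)
y_hat = (X - x_mean) @ b + y_mean              # predictions on the original scale
print("scores orthonormal:", np.allclose(T.T @ T, np.eye(3)))
print("in-sample RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

As a sanity check on this sketch, when n > p and all p components are extracted, the SIMPLS coefficients coincide with the OLS coefficients on the centred data, because the scores then span the whole column space of X.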
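The four groups named in the abstract correspond to the usual reading of a regression diagnostic plot that puts a leverage measure (for example, a distance in the X or score space) on one axis and a standardized residual on the other. The sketch below encodes only that generic quadrant rule in Python with NumPy. The function name `classify_observations`, its inputs, and the cutoffs (2.5 for residuals, a caller-supplied distance cutoff such as the square root of a chi-square quantile) are assumptions for illustration, not the weights or cutoffs of the proposed robust method.

```python
import numpy as np


def classify_observations(score_dist, std_resid, sd_cut, res_cut=2.5):
    """Generic four-way diagnostic classification (hypothetical helper).

    score_dist: distance of each observation in the X (score) space
    std_resid:  standardized residual of each observation
    sd_cut, res_cut: cutoffs supplied by the caller (assumed values)
    """
    score_dist = np.asarray(score_dist, dtype=float)
    std_resid = np.asarray(std_resid, dtype=float)
    high_lev = score_dist > sd_cut               # outlying in the X-space
    large_res = np.abs(std_resid) > res_cut      # outlying in the response direction
    labels = np.full(score_dist.shape, "regular", dtype=object)
    labels[~high_lev & large_res] = "vertical outlier"
    labels[high_lev & ~large_res] = "good HLP"   # far in X-space but follows the fitted model
    labels[high_lev & large_res] = "bad HLP"     # far in X-space and badly fitted
    return labels


sd = [1.0, 1.2, 4.0, 5.0]
res = [0.3, -3.1, 0.5, 4.2]
print(classify_observations(sd, res, sd_cut=2.72).tolist())
# ['regular', 'vertical outlier', 'good HLP', 'bad HLP']
```

In a robust version, the distances and residuals would come from a robust fit, so that outliers cannot mask themselves by pulling the fit towards them.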
Keywords: high-dimensional data; high leverage points; multicollinearity; robust method; robust SIMPLS