- Black Sea Journal of Engineering and Science
- Volume:7 Issue:6
The Effect of Regularized Regression and Tree-Based Missing Data Imputation Methods on Classification Performance in High Dimensional Data
Authors : Buğra Varol, İmran Kurt Omurlu, Mevlüt Türe
Pages : 1263-1269
DOI : 10.34248/bsengineering.1531546
Publication Date : 2024-11-15
Article Type : Research Paper
Abstract : Missing data is an important problem in the analysis and classification of high dimensional data. The aim of this study is to compare the effects of four different missing data imputation methods on classification performance in high dimensional data. The imputation methods were evaluated in a simulation repeated 1000 times, using data sets with a binary dependent variable, p=500 independent variables with mixed correlations among them, and n=150 units. Missing data structures were created at different missing rates, and a separate data set was obtained for each imputation method by filling in the missing values. Regularized regression methods such as least absolute shrinkage and selection operator (lasso) and elastic net regression were used for imputation, as well as tree-based methods such as support vector machine and classification and regression trees. At the end of the simulation, classification scores for each method were obtained with a gradient boosting machine, and imputation performance was evaluated by the distance of these scores from the reference. The simulation shows that regularized regression methods outperform tree-based methods in classifying high dimensional data sets. In addition, increasing the amount of missing values reduced the classification performance of the methods in high dimensional data.
Keywords : Gradient boosting machine, High dimensional data, Imputation, Classification, Simulation
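The simulation design summarized in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' code, and several details are assumptions made only for illustration: the data generator `simulate`, the missing-completely-at-random mechanism in `add_missing`, the single-pass lasso imputer `lasso_impute`, the penalty `alpha=0.1`, the use of cross-validated accuracy as the classification score, and a reduced p=40 with a single run (instead of p=500 and 1000 repetitions) to keep the runtime short.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score


def simulate(n=150, p=40, seed=0):
    """Hypothetical generator: correlated predictors and a binary outcome."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n, 1))  # common factor that correlates the columns
    X = 0.6 * shared + 0.8 * rng.normal(size=(n, p))
    # Assumption: only the first five predictors drive the outcome.
    y = (X[:, :5].sum(axis=1) + rng.normal(size=n) > 0).astype(int)
    return X, y


def add_missing(X, rate, seed=0):
    """Set each entry to NaN independently with probability `rate` (MCAR assumed)."""
    rng = np.random.default_rng(seed)
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < rate] = np.nan
    return X_miss


def lasso_impute(X_miss, alpha=0.1):
    """Single-pass regression imputation with a lasso model per column.

    Other columns are mean-filled first so they can serve as predictors.
    Assumes every column has at least one observed value.
    """
    col_means = np.nanmean(X_miss, axis=0)
    filled = np.where(np.isnan(X_miss), col_means, X_miss)
    out = filled.copy()
    for j in range(X_miss.shape[1]):
        miss = np.isnan(X_miss[:, j])
        if not miss.any() or miss.all():
            continue  # nothing to impute, or nothing to learn from
        others = np.delete(filled, j, axis=1)
        model = Lasso(alpha=alpha, max_iter=10000)
        model.fit(others[~miss], X_miss[~miss, j])
        out[miss, j] = model.predict(others[miss])
    return out


def gbm_accuracy(X, y, seed=0):
    """Mean 3-fold cross-validated accuracy of a gradient boosting classifier."""
    clf = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    return cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()


X, y = simulate()
reference = gbm_accuracy(X, y)  # score on the complete data

gaps = {}
for rate in (0.1, 0.3):
    X_imp = lasso_impute(add_missing(X, rate))
    # Distance of the imputed-data score from the complete-data reference.
    gaps[rate] = abs(gbm_accuracy(X_imp, y) - reference)
    print(f"missing rate {rate:.0%}: distance from reference = {gaps[rate]:.3f}")
```

To compare methods in the same spirit, `Lasso` could be swapped for `ElasticNet`, `SVR`, or `DecisionTreeRegressor` from scikit-learn, and the loop repeated over many seeds before averaging the distances.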