- Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi
- Volume:23 Issue:6
Effect of Imputation Methods in the Classifier Performance
Authors : Pinar CİHAN, Oya KALIPSIZ, Erhan GÖKÇE
Pages : 1225-1236
Publication Date : 2019-12-01
Article Type : Research Paper
Abstract : Missing values in a data set are a serious problem for almost all traditional and modern statistical methods, since most of these methods were developed under the assumption that the data set is complete. However, real-world data sets are rarely complete, and missing data are encountered in veterinary field studies as frequently as in other fields. Imputing missing data matters in veterinary field studies, where data mining is only beginning to be applied, and so does the choice of imputation method. In many studies, observations with a missing value in any variable are simply removed, or the missing values are filled in by traditional methods. Although alternative approaches that avoid discarding such observations have become widely available in recent years, they are rarely used. This study examines the mean, median, nearest neighbors, MICE and missForest methods for imputing simulated missing data, created by randomly removing values from the original veterinary dataset at rates from 5% to 25% in steps of 5%. The most accurate methods are then used to impute the original dataset, in order to observe their influence on classifier performance and to determine the optimal imputation method for that dataset.
Keywords : missing value, multiple imputation, classification, naive bayes, decision tree, machine learning, veterinary
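The simulation protocol described in the abstract (remove values at random at 5% to 25% in steps of 5%, impute, then compare against the known originals) can be illustrated with a minimal sketch. This is not the authors' code. It covers only the mean and median methods, it uses a synthetic numeric column in place of the veterinary dataset, and it assumes RMSE over the removed entries as the accuracy measure, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random
import statistics


def remove_at_random(values, fraction, rng):
    """Return a copy of `values` with `fraction` of the entries set to None."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute(values, strategy):
    """Fill None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def rmse_on_missing(original, masked, imputed):
    """RMSE over the removed entries only (assumed metric, not from the paper)."""
    errors = [
        (o - i) ** 2
        for o, m, i in zip(original, masked, imputed)
        if m is None
    ]
    return math.sqrt(sum(errors) / len(errors))


rng = random.Random(0)
# Synthetic stand-in for one numeric variable of the dataset (assumption).
original = [rng.gauss(50.0, 10.0) for _ in range(200)]

results = {}
for percent in range(5, 30, 5):  # 5, 10, 15, 20, 25
    masked = remove_at_random(original, percent / 100, rng)
    for strategy in ("mean", "median"):
        imputed = impute(masked, strategy)
        results[(percent, strategy)] = rmse_on_missing(original, masked, imputed)

for (percent, strategy), score in sorted(results.items()):
    print(f"{percent:>2}% missing, {strategy:<6} imputation: RMSE = {score:.3f}")
```

Because the removed values are known, each method can be scored directly. The nearest neighbors, MICE and missForest methods would slot into the same loop as additional strategies.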