2019 : ENHANCING THE PERFORMANCE OF SMOTE ALGORITHM BY USING ATTRIBUTE WEIGHTING SCHEME AND NEW SELECTIVE SAMPLING METHOD FOR IMBALANCED DATA SET

Dr.Eng. Chastine Fatichah S.Kom, M.Kom


Abstract

SMOTE is a well-known algorithm for balancing training data by adding synthetic samples to the minority class. One of its stages finds the k nearest neighbors (kNN) of each minority sample, using Euclidean distance, as the basis for creating synthetic data. When a small number of attributes have higher correlation values than the others, a Euclidean kNN search that ignores this correlation may not find representative neighbors. This paper introduces AWH-SMOTE (Attribute Weighted and kNN Hub on SMOTE), which enhances SMOTE in two ways: attribute weighting improves neighbor and noise identification, and occurrence counts in the kNN hub improve the selective sampling method. The Wojna and Information Gain methods are used for attribute weighting. A sample with a small number of occurrences in the kNN hub receives more synthetic data, so that minority data in dangerous regions are better represented. Nine public datasets from the KEEL repository are used to evaluate AWH-SMOTE. The evaluation shows that AWH-SMOTE achieves better minority precision and minority F-measure than other oversampling algorithms in both pruned and unpruned conditions. As the attribute weighting method in AWH-SMOTE, Information Gain achieves the best minority recall, minority precision, and minority F-measure in the unpruned condition when compared with other weighting methods.
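
The two ideas in the abstract, an attribute-weighted neighbor search and sampling driven by kNN hub occurrences, can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the weighted Euclidean form, and the allocation rule (a share proportional to 1 / (1 + occurrences)) are all assumptions made for illustration, and the attribute weights are passed in rather than computed with the Wojna or Information Gain methods. Noise identification is omitted.

```python
import numpy as np

def weighted_knn(X, weights, k):
    """Indices of the k nearest neighbors of each row under a weighted
    Euclidean distance (assumed form: sqrt(sum_a w_a * (x_a - y_a)^2))."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((weights * diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def hub_occurrences(neighbors, n):
    """How often each point appears in the kNN lists of the other points."""
    return np.bincount(neighbors.ravel(), minlength=n)

def allocate(occurrences, n_synthetic):
    """Hypothetical allocation rule: fewer occurrences, more synthetic samples."""
    score = 1.0 / (1.0 + occurrences)
    raw = score / score.sum() * n_synthetic
    counts = np.floor(raw).astype(int)
    # Give the leftover samples to the largest fractional parts.
    remainder = n_synthetic - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts

def oversample(X_min, weights, k, n_synthetic, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    neighbors = weighted_knn(X_min, weights, k)
    occ = hub_occurrences(neighbors, len(X_min))
    counts = allocate(occ, n_synthetic)
    synthetic = []
    for i, c in enumerate(counts):
        for _ in range(c):
            j = rng.choice(neighbors[i])  # pick one of the k neighbors
            gap = rng.random()            # interpolation factor in [0, 1)
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1]), occ, counts

# Toy minority class: four clustered points and one isolated point.
X_min = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
weights = np.array([0.7, 0.3])  # hypothetical attribute weights
synthetic, occ, counts = oversample(X_min, weights, k=2, n_synthetic=10)
```

In this toy data the isolated point never appears in any other point's neighbor list, so under the assumed rule it receives the largest share of synthetic samples.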