A comprehensive survey of imbalanced learning methods for bankruptcy prediction


Tuong Le

Source title: IET Communications, 16(5), 2021 (ISI)

In practical datasets used for supervised learning, an uneven distribution of samples across classes is known as the class imbalance problem, and it can degrade the performance of standard classifiers. The problem arises in areas such as medical diagnosis, spam filtering, and fraud detection. Bankruptcy prediction is of particular research interest given the current economic upheaval. A bankruptcy prediction dataset contains two classes, bankrupt and normal companies, so the task can be addressed with binary classification methods. Several advanced models for handling class imbalance in bankruptcy prediction have been developed to improve predictive performance on a Korean bankruptcy (KB) dataset. To give an overview of imbalanced learning methods for bankruptcy prediction, this study first reviews several state-of-the-art approaches: an oversampling-based framework, a cost-sensitive method (the CBoost algorithm), a combination of resampling techniques and a cost-sensitive framework, and an ensemble-based model (the XGBS algorithm). We then conduct empirical experiments to evaluate the surveyed methods on two performance metrics: the area under the receiver operating characteristic (ROC) curve and the geometric mean. The results show that the ensemble-based model outperforms the other methods for bankruptcy prediction on the KB dataset.
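The two evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not define the geometric mean, so the code assumes the definition commonly used in imbalanced learning: the square root of sensitivity times specificity. The function names, the label encoding (1 = bankrupt, 0 = normal), and the toy data are hypothetical and are not taken from the paper or the KB dataset.

```python
import math


def geometric_mean(y_true, y_pred):
    """G-mean, assumed here to be sqrt(sensitivity * specificity).

    Labels are assumed to be 1 (bankrupt, minority) and 0 (normal, majority).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 2 bankrupt and 8 normal companies.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # A classifier that always predicts "normal" is 80% accurate here,
    # yet it never detects a bankruptcy.
    always_normal = [0] * len(y_true)
    print("G-mean, always-normal classifier:",
          geometric_mean(y_true, always_normal))
    # Hypothetical scores from some model (higher = more likely bankrupt).
    scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.5, 0.05, 0.15, 0.6]
    print("AUC:", roc_auc(y_true, scores))
```

The always-normal classifier shows why accuracy is a poor guide on imbalanced data: it is correct on 8 of 10 samples, but its sensitivity is zero, so its G-mean is zero as well. Both metrics reward performance on the minority class, which is the class of interest in bankruptcy prediction.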