
Improved Sample Type Identification for Multi-Class Imbalanced Classification with Real-World Applications

Jiawen Kong, Wojtek Kowalczyk, Kees Jonkers, Stefan Menzel, Thomas Bäck, "Improved Sample Type Identification for Multi-Class Imbalanced Classification with Real-World Applications", 18th International Conference on Data Science (ICDATA22), 2022.


Motivated by studies of the nature of imbalanced data, researchers have proposed distinguishing different types of samples in the minority class: safe samples, borderline samples, rare samples, and outliers. The idea was first proposed and evaluated on binary imbalanced classification problems and later extended to multi-class scenarios. However, simply carrying the binary identification rule over to multi-class scenarios causes several problems, for example a higher percentage of unsafe samples in minority classes and the false identification of outliers. In this paper, we first demonstrate the drawbacks of extending the idea from binary to multi-class scenarios. We then propose a new identification rule for multi-class scenarios. In our experiments, we oversample different types of samples before performing classification; oversampling is a data-level approach to handling imbalance in datasets. Experimental results on benchmark datasets indicate that the proposed rule decreases the probability of false identification and improves classification performance on the minority class(es) by 7.4% on average. In addition, we apply the proposed rule to surface inspection data from the steel industry and confirm its effectiveness and potential usefulness in real-world applications.
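
The abstract does not spell out an identification rule, so the following is only a minimal sketch of one neighbourhood-based rule commonly used in the binary setting, not the rule proposed in the paper. It assumes k = 5 nearest neighbours, Euclidean distance, and the thresholds 4-5 same-class neighbours for safe, 2-3 for borderline, 1 for rare, and 0 for outlier; the function name `sample_types` and the toy data are hypothetical.

```python
import math


def sample_types(X, y, k=5):
    """Label each sample by how many of its k nearest neighbours share its class.

    Assumed thresholds (chosen for k=5, not taken from the paper):
    4-5 same-class neighbours -> safe, 2-3 -> borderline,
    1 -> rare, 0 -> outlier.
    """
    labels = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance to every other sample (the metric is an assumption).
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        # Count neighbours among the k closest that carry the same class label.
        same = sum(1 for _, j in dists[:k] if y[j] == yi)
        if same >= 4:
            labels.append("safe")
        elif same >= 2:
            labels.append("borderline")
        elif same == 1:
            labels.append("rare")
        else:
            labels.append("outlier")
    return labels


# Toy data: a compact minority cluster (class 1), a majority cluster (class 0),
# and one minority sample placed inside the majority cluster.
X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (10.5, 10.5), (10.5, 10.0),
    (10.2, 10.2),
]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

types = sample_types(X, y)
for point, label, kind in zip(X, y, types):
    if label == 1:
        print(point, kind)
```

In a sketch like this, the resulting labels could be used to decide which minority samples to oversample before training a classifier. With more than two classes, the neighbours of a minority sample may belong to several different classes, which is the situation in which the abstract reports that a direct extension of the binary rule becomes unreliable.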
