Class Imbalance in Classification


Class imbalance in classification refers to scenarios where the number of samples in different classes is significantly uneven. Addressing this issue is crucial to avoid biased models and inaccurate predictions. Various methods such as oversampling, undersampling, and rescaling can be employed to handle class imbalance effectively and enhance the performance of classification models.

Keywords: Class Imbalance, Classification, Oversampling, Undersampling, Rescaling




Presentation Transcript


  1. Class imbalance in Classification 2017/03/28

  2. Class imbalance problem Assumption: positive and negative samples have similar numbers. What if we have 998 negative samples and only 2 positive samples? This is common in the real world. A classifier that always returns negative achieves 99.8% accuracy on the training set, but is that what we want? How do we solve it? (See the numeric sketch below.)
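A minimal numeric sketch of this point, assuming NumPy and using the slide's 998/2 example: the constant "always negative" classifier reaches 99.8% accuracy while catching no positives at all.

```python
import numpy as np

# The slide's example: 998 negative samples (label 0), 2 positive (label 1).
y_true = np.array([0] * 998 + [1] * 2)

# A degenerate classifier that always predicts negative.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
positive_recall = (y_pred[y_true == 1] == 1).mean()

print(f"accuracy: {accuracy:.1%}")                     # 99.8%
print(f"recall on positives: {positive_recall:.1%}")   # 0.0%
```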

  3. Class imbalance-rescaling Taking linear classification as an example: y = w^T x + b. Compare y with a threshold (0.5): if y > 0.5, predict positive; if y < 0.5, predict negative. Equivalently, predict positive if the odds y/(1-y) > 1. Assumption: we have similar numbers of positive and negative samples. With m+ positive and m- negative training samples, predict positive instead when y/(1-y) > m+/m-.

  4. Class imbalance-rescaling We let y'/(1-y') = (y/(1-y)) × (m-/m+) and judge the class by y'. Not practical in reality: this assumes we can predict the true sample distribution from the training set. (A sketch of the equivalent threshold-moving rule follows below.)
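A small sketch of this threshold-moving rule (the function name predict_rescaled is illustrative, not from the slides): comparing the odds y/(1-y) against m+/m- is equivalent to comparing y against m+/(m+ + m-), so the decision threshold slides from 0.5 toward the minority rate.

```python
import numpy as np

def predict_rescaled(scores, m_pos, m_neg):
    """Predict positive when y/(1-y) > m_pos/m_neg,
    i.e. when y > m_pos / (m_pos + m_neg)."""
    threshold = m_pos / (m_pos + m_neg)
    return (scores > threshold).astype(int)

scores = np.array([0.010, 0.002, 0.300, 0.001])   # classifier outputs y
print(predict_rescaled(scores, m_pos=2, m_neg=998))
# threshold = 2/1000 = 0.002 -> [1 0 1 0]
```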

  5. Methods to solve class imbalance
  - Oversampling: add minority-class samples first and then learn, e.g. SMOTE.
  - Undersampling: remove majority-class samples first and then learn, e.g. ENN (Edited Nearest Neighbor) and EasyEnsemble.
  - Threshold-moving: use rescaling.

  6. Oversampling-SMOTE SMOTE, a state-of-the-art resampling approach, stands for Synthetic Minority Oversampling Technique. For each minority sample: find its k nearest minority neighbors; randomly select j of these neighbors; randomly generate synthetic samples along the lines joining the minority sample and its j selected neighbors (j depends on the amount of oversampling desired). A sketch of this generation step follows below.
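A compact sketch of SMOTE's generation step, assuming NumPy; a production implementation would be e.g. imblearn.over_sampling.SMOTE.

```python
import numpy as np

def smote(X_min, k=5, n_synthetic=100, seed=None):
    """Interpolate randomly chosen minority samples with one of their
    k nearest minority neighbors to create synthetic samples."""
    rng = np.random.default_rng(seed)
    # Pairwise distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # For each sample, indices of its k nearest neighbors (self excluded).
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))      # pick a random minority sample
        j = rng.choice(nn[i])             # pick one of its k nearest neighbors
        lam = rng.random()                # random point on the joining line
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.random.default_rng(0).normal(size=(20, 2))   # toy minority data
print(smote(X_min, k=5, n_synthetic=10, seed=1).shape)  # (10, 2)
```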

  7. Oversampling-SMOTE (figure)

  8. Oversampling-SMOTE (figure)

  9. Overgeneralization (figure)

  10. Oversampling-SMOTE Overgeneralization: SMOTE's procedure is inherently dangerous, since it blindly generalizes the minority area without regard to the majority class. This is particularly problematic for highly skewed class distributions, where the minority class is very sparse with respect to the majority class, resulting in a greater chance of class mixture. Lack of flexibility: the number of synthetic samples generated by SMOTE is fixed in advance, allowing no flexibility in the re-balancing rate.

  11. Oversampling-Borderline SMOTE1 (figure)

  12. Oversampling-Borderline SMOTE2 Similar to Borderline SMOTE1. For each p in DANGER (the set of borderline minority samples): find its k nearest neighbors in S (the minority class) and in L (the majority class); generate samples with SMOTE, selecting a rate α from the S neighbors and a rate 1-α from the L neighbors. A sketch of how DANGER can be identified follows below.
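The slides give no code for finding DANGER; this is a hedged sketch of the criterion from the Borderline-SMOTE paper (a minority sample is borderline when at least half, but not all, of its m nearest neighbors in the full training set are majority samples). Function and variable names here are my own.

```python
import numpy as np

def danger_set(X, y, minority_label=1, m=5):
    """Return the borderline (DANGER) minority samples: those whose
    m nearest neighbors are at least half, but not all, majority."""
    danger = []
    for p in X[y == minority_label]:
        d = np.linalg.norm(X - p, axis=1)
        nn = np.argsort(d)[1:m + 1]        # m nearest neighbors, self excluded
        n_maj = np.sum(y[nn] != minority_label)
        if m / 2 <= n_maj < m:             # mostly, but not entirely, majority
            danger.append(p)
    return np.array(danger)
```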

  13. Undersampling-Edited Nearest Neighbor Delete those majority samples most of whose k nearest neighbors are minority samples. Repeated Edited Nearest Neighbor applies this rule repeatedly until no more samples are removed. A sketch follows below.
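A short sketch of the ENN rule above, assuming NumPy (imblearn's EditedNearestNeighbours is a tested alternative):

```python
import numpy as np

def enn_undersample(X, y, majority_label=0, k=3):
    """Drop each majority sample whose k nearest neighbors
    are mostly minority samples."""
    keep = np.ones(len(X), dtype=bool)
    for i in np.where(y == majority_label)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbors, self excluded
        if np.mean(y[nn] != majority_label) > 0.5:
            keep[i] = False                # neighborhood is mostly minority
    return X[keep], y[keep]
```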

  14. Undersampling-EasyEnsemble Extract some majority-class samples together with the minority-class samples to form an AdaBoost classifier; each classifier undersamples the original data; decide the result by summing up these weak classifiers' outputs. For i = 1, ..., N: (a) randomly sample a subset L_i from the majority class L with |L_i| = |S|, where S is the minority class; (b) train an AdaBoost classifier F_i on L_i ∪ S. The final prediction combines the N classifiers.
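A minimal EasyEnsemble sketch following the pseudocode above, assuming scikit-learn; as a simplification, the N ensembles are combined here by averaging their votes rather than summing their decision functions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def easy_ensemble(X_min, X_maj, n_subsets=10, seed=None):
    """Train one AdaBoost classifier F_i per balanced subset L_i ∪ S."""
    rng = np.random.default_rng(seed)
    y = np.concatenate([np.ones(len(X_min)), np.zeros(len(X_min))])
    classifiers = []
    for _ in range(n_subsets):
        # Undersample the majority class so that |L_i| = |S|.
        idx = rng.choice(len(X_maj), size=len(X_min), replace=False)
        X_i = np.concatenate([X_min, X_maj[idx]])
        classifiers.append(AdaBoostClassifier().fit(X_i, y))
    return classifiers

def ensemble_predict(classifiers, X):
    """Majority vote over the N AdaBoost classifiers."""
    votes = np.mean([clf.predict(X) for clf in classifiers], axis=0)
    return (votes >= 0.5).astype(int)
```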
