Addressing Class Imbalance Problem in Semantic Segmentation using Binary Focal Loss
Image segmentation is a foundational technique in computer vision with wide-ranging applications, especially in medical imaging. This study explores the use of binary focal loss to tackle class imbalance issues in semantic segmentation, focusing on the CANDID-PTX dataset and employing the U-Net architecture.
Presentation Transcript
Addressing Class Imbalance Problem in Semantic Segmentation using Binary Focal Loss
Rushikesh Chopade, Aditya Stanam, University of Iowa, & Shrikant Pawar
Department of Geology and Geophysics, Indian Institute of Technology, Kharagpur, Kharagpur, West Bengal 721302, India
University of Iowa, Iowa City, IA 52242-5000, USA
Department of Computer Science and Biology, Claflin University, Orangeburg, SC 29115, USA
Introduction Image segmentation is a foundational technique in computer vision with wide-ranging applications, including a critical role in medical imaging for object identification, automatic labeling, and disease diagnosis. Advances in deep learning have significantly improved the accuracy and efficiency of image segmentation, making it an increasingly valuable tool across many domains. Class-imbalanced datasets are a frequent problem when training segmentation networks. Class imbalance occurs when some classes (semantic categories) in an image have significantly more instances (pixels) than others. In semantic segmentation, this often happens because certain object categories are more prevalent in the real world or in the dataset, while others are rarer. When a deep learning model is trained on such data, the imbalance can cause several problems; most notably, the loss is dominated by the majority class, so the model can become biased toward predicting it and perform poorly on the rare class. In this article, we experimented with the class-weighting parameters of binary focal loss to address class imbalance in semantic segmentation. Using the CANDID-PTX dataset and a U-Net architecture consisting of a downsampling (encoder) and an upsampling (decoder) network, we compared binary focal loss values across different settings of the alpha and gamma coefficients. We found that adjusting the class weights in the loss function could notably help resolve class imbalance problems.
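The binary focal loss whose coefficients are tuned in this study can be sketched in NumPy. This is a minimal sketch assuming the widely used formulation of Lin et al., FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of a pixel's true class and α_t equals α for foreground (fracture) pixels and 1 - α for background pixels. The function name `binary_focal_loss` and its default values are illustrative assumptions, not the authors' code.

```python
import numpy as np


def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over all pixels of a mask.

    Assumes the common formulation FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    with alpha_t = alpha for foreground pixels and 1 - alpha for background pixels.
    The defaults (0.25, 2.0) are the values popularized by Lin et al.; the study
    reports alpha = 0.01 and gamma = 0.1 as giving the most gradual loss decay.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions so log() never receives exactly 0 or 1.
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    # p_t: predicted probability of each pixel's true class.
    p_t = np.where(y_true == 1, p, 1.0 - p)
    # alpha_t: class weight, alpha for foreground and 1 - alpha for background.
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma down-weights pixels that are already well classified.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(loss.mean())


# Example: a 512 x 512 mask whose foreground is a small 10 x 40 patch.
mask = np.zeros((512, 512))
mask[200:210, 300:340] = 1
pred = np.full(mask.shape, 0.1)  # constant, uninformative prediction
print("foreground fraction:", mask.mean())
print("focal loss:", binary_focal_loss(mask, pred, alpha=0.01, gamma=0.1))
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy; raising γ shrinks the contribution of pixels that are already classified confidently, which in a heavily imbalanced mask are typically mostly background pixels.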
Methods Dataset Chest radiographs in Digital Imaging and Communications in Medicine (DICOM) format from the CANDID-PTX dataset are used in this study. The dataset contains a total of 19,237 images, of which the 335 images containing acute rib fractures were used here. An acute rib fracture was defined as any rib with cortical disruption visible on a chest radiograph and no evidence of healing, such as callus formation. These 335 images carry a total of 973 annotations provided by different radiologists. When the same image was annotated by several radiologists, each annotation was treated as a separate image to enlarge the dataset.
Methods Preprocessing All images have a resolution of 1024 × 1024 pixels with three channels (RGB). The masks for the 335 acute rib fracture images are provided in run-length encoding (RLE) notation. Run-length encoding is a simple, compact representation of a binary 2D image that stores runs of consecutive mask pixels instead of every individual pixel. The 1024 × 1024 image is represented as a one-dimensional array with the rows appended one after another. The RLE provided with the dataset is a string of comma-separated numeric values: the first number is the index of the pixel at which the mask begins, and the numbers that follow give the lengths of the successive mask and background runs. Masks were derived from the RLE strings for all 973 annotations (replication code link provided in the supplementary section). Because the maximum pixel intensity differs from image to image, which can cause problems during training, the pixel intensities of every image were normalized to the range [0, 1]. The masks decoded from RLE are 2D and were converted to 3D to be compatible with the U-Net architecture. Finally, both the original image and the mask were downsampled to 512 × 512 before being fed to the U-Net.
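The RLE decoding and preprocessing steps described above can be sketched as follows. This is a minimal sketch under stated assumptions: the RLE layout follows the description in the text (a 0-based starting pixel index followed by alternating mask-run and background-run lengths over the row-major flattened image), normalization divides each image by its own maximum, the 2D-to-3D conversion adds a trailing channel axis, and downsampling uses block averaging for the image and strided subsampling for the mask. The helper names `rle_to_mask` and `preprocess` are hypothetical, and the supplementary replication code may differ on any of these points.

```python
import numpy as np


def rle_to_mask(rle, height=1024, width=1024):
    """Decode a comma-separated RLE string into a 2D binary mask.

    Assumed layout (from the text; verify against the dataset): the first value
    is the 0-based flat index where the mask begins, and the following values
    alternate mask-run length, background-run length, mask-run length, ...
    over the image flattened row by row.
    """
    values = [int(v) for v in rle.split(",") if v.strip()]
    flat = np.zeros(height * width, dtype=np.uint8)
    if not values:
        return flat.reshape(height, width)  # empty string: no mask
    pos = values[0]
    for i, length in enumerate(values[1:]):
        if i % 2 == 0:  # even offsets after the start are mask runs
            flat[pos:pos + length] = 1
        pos += length  # background runs only advance the cursor
    # Rows were appended one after another, so a row-major reshape restores 2D.
    return flat.reshape(height, width)


def preprocess(image, mask, out_size=512):
    """Normalize the image, add a channel axis to the mask, and downsample both.

    Assumptions: `image` is an (H, W, C) array with non-negative intensities,
    H == W, and H is a multiple of out_size; the image is downsampled by block
    averaging and the mask by keeping every f-th pixel.
    """
    image = image.astype(np.float32)
    peak = image.max()
    if peak > 0:
        image = image / peak  # per-image scaling into [0, 1]
    # 2D mask -> 3D (H, W, 1) to match a channels-last network output.
    mask = mask.astype(np.float32)[..., np.newaxis]
    h, w, c = image.shape
    f = h // out_size  # e.g. 1024 // 512 == 2
    # Average each f x f block of the image; subsampling keeps the mask binary.
    image = image.reshape(h // f, f, w // f, f, c).mean(axis=(1, 3))
    mask = mask[::f, ::f]
    return image, mask
```

For a 1024 × 1024 × 3 radiograph, `preprocess(image, rle_to_mask(rle))` returns a 512 × 512 × 3 image and a 512 × 512 × 1 mask. Strided subsampling is used for the mask so that it stays strictly binary; averaging it would produce fractional values at fracture boundaries.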
Results The loss decayed most gradually (the optimal behavior) when alpha and gamma were set to 0.01 and 0.1, respectively. The training loss decreased continuously from a maximum of 0.63 to 0.05, whereas the validation loss was initially very low (0.18) and subsequently increased to 5.
Discussion and Conclusion Adjusting the class weights in the loss function of a semantic segmentation algorithm could help resolve class imbalance problems. Negative loss values can be observed when the class weight assigned to the positive class is not set properly. Tuning the hyperparameters alpha and gamma of the binary focal loss can significantly help address the class imbalance problem. In summary, addressing class imbalance is crucial for improving the performance of semantic segmentation models. Tuning the class weights and focusing parameter of the loss in this way can help mitigate the effects of class imbalance and lead to more accurate and balanced segmentation results, especially for datasets with unequal class distributions.
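One way negative loss values can arise is shown in the worked example below. It assumes the common focal-loss weighting in which α_t = α for positive (foreground) pixels and 1 - α for negative pixels; the slides do not state how the class weights were parameterized, so this is an illustration rather than the authors' diagnosis. Because (1 - p_t)^γ ≥ 0 and -log p_t ≥ 0, each pixel's term has the sign of α_t, so any α outside [0, 1] makes one class contribute negative terms. The pixel counts, prediction, and α = 4 below are hypothetical values chosen only to show the effect.

```latex
\mathrm{FL}(p_t) = -\alpha_t\,(1 - p_t)^{\gamma}\log p_t,
\qquad
\alpha_t = \begin{cases} \alpha, & y = 1,\\ 1 - \alpha, & y = 0. \end{cases}

% Worked example: 100 pixels (1 foreground, 99 background),
% constant prediction p = 0.3, gamma = 2, alpha = 4 (so 1 - alpha = -3).
\begin{aligned}
\text{foreground term} &= -4 \cdot 0.7^{2} \cdot \ln 0.3 \approx 2.360,\\
\text{background term} &= 3 \cdot 0.3^{2} \cdot \ln 0.7 \approx -0.0963,\\
\text{mean loss} &= \tfrac{1}{100}\bigl(2.360 + 99 \cdot (-0.0963)\bigr) \approx -0.072.
\end{aligned}
```

With α restricted to [0, 1], every term is non-negative, so a negative mean loss is a signal that the class weights are mis-specified.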
Funding Source This work was primarily supported by the National Science Foundation EPSCoR Program under NSF Award # OIA-2242812.