Utilizing Financial Data for Predicting Company Bankruptcy - TBC Study

The Taiwanese Bankruptcy Court aims to identify early signs of potential bankruptcy in companies using financial data to offer timely assistance and prevent insolvency. Steps include regression analysis, model selection, data preparation, and handling imbalanced datasets to enhance predictive accuracy.

Uploaded by lorrai on Feb 23, 2025


Presentation Transcript

PREDICTING COMPANY BANKRUPTCY
BIT 5534 - SPRING 2023
GROUP 3: DONNA LEKANG, BRIAN LEMASTER, CEM ROTH

BUSINESS PROBLEM

The Taiwanese Bankruptcy Court (TBC) wants to use company financial data to identify the factors that correlate with companies going bankrupt. An increasing number of companies seek aid from the TBC too late, when they are already insolvent. The TBC needs a way to identify, at an earlier stage, companies that may be prone to bankruptcy, based on previously collected, publicly disclosed financial data. The TBC could then offer alternatives, such as debt relief programs, to help these companies avoid bankruptcy.

OUR ANALYSIS PROCESS

Project Definition
- Defined business problem and objectives

Data Collection
- Obtained original dataset: Company Bankruptcy Prediction | Kaggle

Data Preparation
- Verified there were no missing values
- Removed outliers
- Transformed highly skewed variables to improve normality & performance
- Improved data balance

Model Development
- Performed Stepwise Regression to reduce dimensionality
- Performed Logistic Regression with key predictors
- Ran Decision Tree model
- Ran Neural Network model
- Ran Random Forest (cutoff of 0.13)
- Ran Random Forest with Oversampling (cutoff of 0.5)
- Ran multiple models using the JMP Imbalanced Data Add-On

Model Evaluation & Selection
- Compared the results of our models
- Selected the best-performing model

Results Presentation
- Presented results & recommendations

DATA PREPARATION

We used Kaggle's Company Bankruptcy Dataset, which contains data from the Taiwan Economic Journal for the years 1999-2009.

Our initial dataset contained 6,819 records with 95 predictor variables: 220 cases resulting in bankruptcy and 6,599 cases resulting in no bankruptcy.

After performing Stepwise Regression, we reduced the number of predictor variables from 95 to 12, not including our target variable.

After improving the balance of our dataset while meeting our 1,500-case minimum, our dataset contained 1,500 records: 220 resulting in bankruptcy and 1,280 resulting in no bankruptcy.

After removing outliers, our dataset contained 195 (13%) cases resulting in bankruptcy and 1,305 (87%) resulting in no bankruptcy. The dataset remains highly imbalanced, which must be taken into account when running models and interpreting performance statistics.

After viewing the summary statistics, distributions and correlations of our predictor variables, we performed natural log transformations on two of our variables, significantly reducing their skewness.
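The effect of a natural log transformation on skewness can be illustrated with a small sketch. The values below are hypothetical (they are not taken from the study's dataset), and the skewness function is the plain population formula, which may differ slightly from the statistic JMP reports.

```python
import math

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical right-skewed ratio values (NOT from the study's dataset).
raw = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.15, 0.40, 1.20, 5.00]

# Natural log transform; only valid for strictly positive values.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after ln: {skew_log:.2f}")
```

The long right tail is pulled in by the transform, so the skewness of the logged values is much closer to zero than that of the raw values.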

UNDERSTANDING THE DATASET

| Attribute Name | Variable Type | Definition | Formula (if applicable) |
| --- | --- | --- | --- |
| *Bankrupt_Flag | binary | Target/Response binary variable; value of 1 indicates bankruptcy, value of 0 indicates no bankruptcy. | |
| ROA | continuous | Return on Total Assets, calculated before interest and depreciation after tax. | |
| Persistent EPS 4Q | continuous | Persistent Earned Per Share (EPS) in the last four seasons; calculated as earnings per share minus net income. | EPS - Net Income |
| Total Debt/Total Net Worth | continuous | A ratio of total liability over total equity; calculated as total liability divided by equity ratio. | Total Liability / Equity Ratio |
| Debt Ratio | continuous | A ratio of liability over total assets (%); calculated as liability divided by total assets. | Liability / Total Assets |
| Inv_AR_NV | continuous | Inventory and Accounts Receivable over Net Value; calculated by adding Inventory to Accounts Receivable and dividing the sum by Equity. | (Inventory + Accounts Receivables) / Equity |
| Total Asset Turnover | continuous | Total asset turnover | |
| Fixed Assets Turnover Frequency | continuous | Fixed assets turnover frequency | |
| Cash/Total Assets | continuous | Cash/(Total Assets) proportion | Cash / Total Assets |
| Current Liabilities/Liability | continuous | Current Liabilities/Liability proportion | Current Liabilities / Liability |
| Cash Turnover Rate | continuous | Cash turnover rate (cash to sales) | |
| Cash Flow to Liability | continuous | Cash Flow to Liability proportion | |
| Net Income to Total Assets | continuous | Net Income to Total Assets proportion | |

*For our analysis, Bankrupt_Flag is our target variable, where a value of 1 indicates the company went bankrupt and a value of 0 indicates the company did not go bankrupt.
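Three of the formulas listed for this slide can be written out as a short sketch. The function name and the balance-sheet figures are hypothetical, and the Debt Ratio is returned as a fraction rather than a percentage.

```python
def ratio_features(liability, total_assets, inventory, receivables, equity, cash):
    """Compute three of the ratio predictors from raw balance-sheet amounts."""
    return {
        "Debt Ratio": liability / total_assets,
        "Inv_AR_NV": (inventory + receivables) / equity,
        "Cash/Total Assets": cash / total_assets,
    }

# Hypothetical company figures, all in the same currency unit.
company = ratio_features(
    liability=600.0,
    total_assets=1000.0,
    inventory=120.0,
    receivables=80.0,
    equity=400.0,
    cash=50.0,
)
print(company)
```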

DATA ANALYSIS: LOGISTIC REGRESSION MODEL

Results

| Metric | Value |
| --- | --- |
| Accuracy | 91.5% |
| Sensitivity | 0.544 |
| Specificity | 0.970 |
| Precision | 0.731 |
| Misclassification | 0.085 |
| ROC AUC | 0.936 |
| F1-Score | 0.624 |
| R-Square (Generalized) | 0.559 |

Significant Variables/Features (with p-values):
- Cash / Total Assets (0.00007)
- Total Asset Turnover (0.00222)
- Persistent EPS 4Q (0.00445)
- Cash Turnover Rate (0.03193)
- Current Liabilities / Liability (0.07047)
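As a reminder of how the metrics on these slides relate to one another, here is a minimal sketch that derives them from confusion-matrix counts. The counts are hypothetical, since the slides do not give the confusion matrices. The last line only checks that the reported F1-Score of the logistic regression model follows from its reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the slide metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # share of bankrupt companies that were caught
    specificity = tn / (tn + fp)   # share of healthy companies correctly cleared
    precision = tp / (tp + fp)     # share of bankruptcy flags that were correct
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fp=20, tn=430, fn=10)

# F1 recomputed from the reported precision (0.731) and sensitivity (0.544).
reported_f1 = 2 * 0.731 * 0.544 / (0.731 + 0.544)
print(round(reported_f1, 3))
```

On imbalanced data, accuracy is dominated by the majority class, which is why sensitivity, precision and F1 are reported alongside it.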

DATA ANALYSIS: DECISION TREE

Results

The Split History chart shows the optimal number of splits at the highest validation R-Square value, which occurred at 8 splits.

| Metric | Value (Training) | Value (Validation) |
| --- | --- | --- |
| Accuracy | 91.8% | 92.1% |
| Sensitivity | 0.715 | 0.757 |
| Specificity | 0.948 | 0.947 |
| Precision | 0.669 | 0.7 |
| Misclassification | 0.082 | 0.079 |
| ROC AUC | 0.912 | 0.881 |
| F1-Score | 0.691 | 0.727 |
| R-Square (Generalized) | 0.541 | 0.518 |

DATA ANALYSIS: *RANDOM FOREST MODELS

Random Forest w/Standard Partition
- Training Set: 825 cases (55%)
- Validation Set: 225 cases (15%)
- Test Set: 450 cases (30%)

Random Forest w/Oversampling
- Training Set: 194 cases (21%)
- Validation Set: 523 cases (55%)
- Test Set: 230 cases (24%)

| Measure | Random Forest w/Standard Partition (Cutoff = 0.13) | Random Forest w/Oversampling (Success Rate = 0.13 and Cutoff = 0.5) |
| --- | --- | --- |
| Accuracy | 84% | 76% |
| Sensitivity | 0.825 | 0.733 |
| Specificity | 0.842 | 0.765 |
| Precision | 0.431 | 0.319 |
| Misclassification | 0.160 | 0.239 |
| AUC | 0.876 | 0.865 |
| F1-Score | 0.566 | 0.444 |

Important Features:
1. ROA (0.53)
2. Total Debt/Total Net Worth (0.32)
3. Total Asset Turnover (0.30)
4. Persistent EPS 4Q (0.28)
5. Cash/Total Assets (0.26)

*Random Forest Models were run using Excel with Analytic Solver software.
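The role of the classification cutoff can be sketched as follows. Lowering the cutoff from 0.5 towards the bankruptcy base rate (about 0.13 here) flags more companies, which tends to raise sensitivity at the cost of specificity. The probabilities and labels are hypothetical, and treating a probability exactly equal to the cutoff as "bankrupt" is an assumption, not something the slides specify.

```python
def classify(probabilities, cutoff):
    """Flag a company as bankrupt (1) when its predicted probability meets the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]

def sensitivity_specificity(labels, predictions):
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted bankruptcy probabilities and true labels.
probs = [0.05, 0.10, 0.20, 0.15, 0.30, 0.55, 0.80]
labels = [0, 0, 0, 1, 1, 0, 1]

sens_default, spec_default = sensitivity_specificity(labels, classify(probs, 0.5))
sens_low, spec_low = sensitivity_specificity(labels, classify(probs, 0.13))
print(sens_default, spec_default)
print(sens_low, spec_low)
```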

DATA ANALYSIS: *BOOTSTRAP FOREST MODELS

Bootstrap Forest w/SMOTE Sampling
- Training Set: 1,050 cases
- Test Set: 450 cases

Bootstrap Forest w/SMOTE & Tomek Sampling
- Training Set: 1,050 cases
- Test Set: 450 cases

| Measure | Bootstrap Forest w/SMOTE Sampling | Bootstrap Forest w/SMOTE plus Tomek Sampling |
| --- | --- | --- |
| Accuracy | 87% | 88% |
| Sensitivity | 0.831 | 0.759 |
| Specificity | 0.880 | 0.898 |
| Precision | 0.510 | 0.524 |
| Misclassification | 0.127 | 0.120 |
| AUC | 0.930 | 0.916 |
| F1-Score | 0.632 | 0.620 |

*Bootstrap Forest Models were run using JMP's Imbalanced Classification Add-In software.
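The core idea behind SMOTE is to create synthetic minority-class cases by interpolating between a minority case and one of its nearest minority neighbours. The sketch below illustrates that idea only. It is not the implementation used by the JMP add-in, it does not cover Tomek link removal, and the function name, the parameter `k` and the data points are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical bankrupt-company points in a two-feature space.
minority = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.95), (0.30, 0.70)]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority cases, the new cases stay inside the region the minority class already occupies.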

DATA ANALYSIS: NEURAL NETWORK

Neural Network Model, Random Holdback
- Holdback Proportion: 0.33
- Training set: 1000 cases, Validation set: 500 cases

| Metric | Value (Training) | Value (Validation) |
| --- | --- | --- |
| Accuracy | 92.6% | 90.8% |
| Sensitivity | 0.708 | 0.646 |
| Specificity | 0.959 | 0.947 |
| Precision | 0.719 | 0.646 |
| Misclassification | 0.074 | 0.092 |
| ROC AUC | 0.955 | 0.912 |
| F1-Score | 0.713 | 0.646 |
| R-Square (Generalized) | 0.639 | 0.479 |

[Figure: Neural Network Structure Diagram]
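A random holdback split of this kind can be sketched as below. The function is a hypothetical illustration, not JMP's procedure. Since 0.33 x 1,500 would give 495 rows, the reported 1000/500 split suggests the proportion was one third, shown rounded; the sketch assumes that.

```python
import random

def random_holdback(n_rows, holdback, seed=0):
    """Shuffle row indices and hold back a fixed share of them for validation."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_validation = round(n_rows * holdback)
    return indices[n_validation:], indices[:n_validation]

# Assumes a holdback proportion of one third on the 1,500-row dataset.
train_idx, valid_idx = random_holdback(1500, 1 / 3)
print(len(train_idx), len(valid_idx))
```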

*MODEL COMPARISON

| Model | Accuracy | Sensitivity | Specificity | Precision | Misclassification | AUC (ROC Curve) | F1 Score | RSquare (Gen.) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Logistic Regression (Best Overall) | 91.5% | 0.544 | 0.970 | 0.731 | 0.085 | 0.936 | 0.624 | 0.559 |
| Decision Tree | 92.1% | 0.757 | 0.947 | 0.7 | 0.079 | 0.881 | 0.727 | 0.518 |
| Random Forest w/Std Partition | 84.0% | 0.825 | 0.842 | 0.431 | 0.160 | 0.876 | 0.566 | -- |
| Random Forest w/Oversampling | 76.1% | 0.733 | 0.765 | 0.319 | 0.239 | 0.865 | 0.444 | -- |
| Bootstrap Forest (SMOTE) | 87.3% | 0.831 | 0.880 | 0.510 | 0.127 | 0.930 | 0.632 | 0.380 |
| Bootstrap Forest (SMOTE plus Tomek) | 88.0% | 0.759 | 0.898 | 0.524 | 0.120 | 0.916 | 0.620 | 0.364 |
| Neural Networks | 90.8% | 0.646 | 0.947 | 0.646 | 0.092 | 0.912 | 0.646 | 0.479 |

*Model Comparison shows each modeling technique's respective validation or test model, if applicable.
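One simple way to read a comparison like this is to find which model leads on each measure. The sketch below does that for six of the reported measures, using the values from the comparison table. The variable names are illustrative, and "leading on a measure" is only one possible selection rule, not necessarily the one the authors applied.

```python
METRICS = ("accuracy", "sensitivity", "specificity", "precision", "auc", "f1")

# Values copied from the model comparison table (accuracy in percent).
rows = {
    "Logistic Regression": (91.5, 0.544, 0.970, 0.731, 0.936, 0.624),
    "Decision Tree": (92.1, 0.757, 0.947, 0.700, 0.881, 0.727),
    "Random Forest w/Std Partition": (84.0, 0.825, 0.842, 0.431, 0.876, 0.566),
    "Random Forest w/Oversampling": (76.1, 0.733, 0.765, 0.319, 0.865, 0.444),
    "Bootstrap Forest (SMOTE)": (87.3, 0.831, 0.880, 0.510, 0.930, 0.632),
    "Bootstrap Forest (SMOTE plus Tomek)": (88.0, 0.759, 0.898, 0.524, 0.916, 0.620),
    "Neural Networks": (90.8, 0.646, 0.947, 0.646, 0.912, 0.646),
}

# For each measure, the model with the highest value.
leaders = {
    metric: max(rows, key=lambda name: rows[name][i])
    for i, metric in enumerate(METRICS)
}

for metric, model in leaders.items():
    print(f"{metric}: {model}")
```

On these figures, Logistic Regression leads on specificity, precision and AUC, Decision Tree on accuracy and F1, and Bootstrap Forest (SMOTE) on sensitivity.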

FINDINGS AND RECOMMENDATIONS

Our team ran several models in search of the best overall model for the Taiwanese Bankruptcy Court (TBC): Logistic Regression, Decision Tree, Random Forest (with Standard Partition and with Oversampling), Bootstrap Forest (with SMOTE and with SMOTE plus Tomek) and Neural Network. We also computed descriptive statistics on the dataset to examine bankruptcy factors.

We recommend that the TBC deploy our best model, the Logistic Regression model. This model had the strongest metrics overall, including the highest R-Square (Generalized) value of all of our models at 0.559, an Accuracy of 91.5%, a Misclassification Rate of 0.085 and an F1-Score of 0.624.

We found that the following variables were most strongly associated with companies declaring bankruptcy (in order of importance): ROA, Total Debt/Total Net Worth, Total Asset Turnover, Persistent EPS 4Q and Cash/Total Assets. We recommend that the TBC track those variables in new data going forward to identify at-risk companies.

If the TBC is able to acquire this data on current companies of various sizes in Taiwan, it will be able to identify which companies are in jeopardy of declaring bankruptcy and reach out to help them before they do. Keeping these companies in business could benefit Taiwan and its economy as a whole.
