
Enhancing Accounting Fraud Detection Models Using Clustering Techniques at Rutgers University-Newark
Explore how Rutgers University-Newark is enhancing accounting fraud detection models through clustering techniques applied to financial ratios. The research focuses on identifying financial reporting peer firms, utilizing machine learning algorithms and clustering analysis to group similar firms based on key financial ratios. The study also highlights the success of clustering in finding firms with similar financial ratios and the potential for improving accounting fraud detection models.
Presentation Transcript
Redoing ratio analysis with clustering techniques
Kexing Ding, Xuan Peng, Miklos Vasarhelyi, Yunsen Wang
Rutgers University-Newark

Peer selection with clustering
We apply machine learning techniques to financial ratios to identify financial reporting peer firms.
Financial reporting peer firms: firms that have similar financial reporting quality.
Observation: SIC codes are commonly used to select peers, but they are:
- Obsolete
- Infrequently updated
- Focused on the production process

Financial reporting quality models
Beneish (1997, 1999): financial reporting quality is associated with eight ratios.
The ratios capture either accounting distortion or the preconditions that encourage managers to misreport.
The model has been widely used in academia and in practice ever since.
The eight ratios:
- Sales in receivables: Receivables / Sales
- Gross margin: (Sales - Cost of Goods Sold) / Sales
- Asset quality: 1 - (Current Assets + PPE) / Total Assets
- Sales growth: Sales_t / Sales_{t-1}
- Depreciation rate: Depreciation / Net PPE
- SGA rate: Selling, general, and administrative expenses / Sales
- Leverage: Total Debt / Total Assets
- Accruals: Total Accruals / Total Assets
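
A minimal sketch of how the eight ratios could be computed from raw statement items. The field names are hypothetical (they are not taken from the presentation or from any data vendor), and the formulas simply follow the definitions on the slide.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Raw statement items for one firm-year (hypothetical field names)."""
    receivables: float
    sales: float
    sales_prev: float       # sales in year t-1
    cogs: float             # cost of goods sold
    current_assets: float
    ppe: float              # property, plant and equipment
    total_assets: float
    depreciation: float
    net_ppe: float
    sga: float              # selling, general and administrative expenses
    total_debt: float
    total_accruals: float


def eight_ratios(f: FirmYear) -> dict:
    """Return the eight ratios, using the definitions listed on the slide."""
    return {
        "sales_in_receivables": f.receivables / f.sales,
        "gross_margin": (f.sales - f.cogs) / f.sales,
        "asset_quality": 1.0 - (f.current_assets + f.ppe) / f.total_assets,
        "sales_growth": f.sales / f.sales_prev,
        "depreciation_rate": f.depreciation / f.net_ppe,
        "sga_rate": f.sga / f.sales,
        "leverage": f.total_debt / f.total_assets,
        "accruals": f.total_accruals / f.total_assets,
    }
```

Each firm-year then becomes one eight-element vector, which is the input the clustering step works on.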

Clustering method
Clustering analysis is a data mining methodology that groups a set of objects so that objects in the same cluster are more similar to each other than to those in other clusters.
The K-medians algorithm operates on a set $X$ of $n$ points. It chooses $k$ centers $\{c_1, c_2, \ldots, c_k\}$ from $X$ and forms $k$ clusters $\{C_1, C_2, \ldots, C_k\}$ such that the sum of the distances from each point $x_j$ to the center $c_{i(j)}$ of its cluster is minimized.
An observation with eight financial ratios can be viewed as a point $x_j$ in an eight-dimensional space.
Every year, the clustering procedure assigns each point to one of 300 clusters.
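
The slide does not give implementation details, so the following is only a sketch of the common K-medians formulation: Manhattan (L1) distance for assignment and the coordinate-wise median as each cluster's center. Under that formulation the centers start as members of X but need not stay members after the first update. The function name and the `init` parameter are assumptions made for illustration.

```python
import numpy as np


def k_medians(X, k, n_iter=100, seed=0, init=None):
    """Cluster the rows of X into k groups (L1 distance, median centers)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        # Start from k distinct observations drawn at random from X.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # L1 distance from every point to every center, shape (n, k).
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = np.median(members, axis=0)
    return labels, centers
```

In the setting described on the slide, X would hold one row of eight ratios per firm for a given year and k would be 300. Ratios with very different scales may need standardizing first, which the slide does not discuss.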

Results (1)
Clustering successfully finds firms with similar financial ratios:
- Low within-group deviation (lower than SIC 3-digit and NAICS 4-digit for every ratio except depreciation rate)
- High inter-group deviation (higher than SIC 3-digit and NAICS 4-digit for all eight ratios)

Panel A: Within-group deviation
Ratio                    SIC 3-digit NAICS 4-digit    Clustering
Sales in receivables           0.830         0.865         0.541
Gross margin                   3.382         3.183         0.974
Asset quality                  0.700         0.742         0.552
Sales ratio                    0.611         0.617         0.259
Depreciation rate              0.560         0.543         0.647
SG&A rate                      1.326         1.352         0.974
Leverage                       0.834         0.873         0.443
Accruals                      13.803        15.877         7.502

Panel B: Inter-group deviation
Ratio                    SIC 3-digit NAICS 4-digit    Clustering
Sales in receivables           3.321         3.253         5.596
Gross margin                  15.529        17.818        20.478
Asset quality                  0.727         0.692         0.889
Sales ratio                    0.688         0.681         3.962
Depreciation rate              0.542         0.568       949.000
SG&A rate                      4.796         4.603        11.244
Leverage                       3.155         2.690        10.930
Accruals                      13.989         9.305        20.687
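
The presentation does not define how the two deviation measures are computed. As one plausible reading, and purely as an assumption, the sketch below takes within-group deviation to be the average standard deviation of a ratio inside each group, and inter-group deviation to be the standard deviation of the group means.

```python
import numpy as np


def group_deviations(values, groups):
    """Return (within, inter) deviation of one ratio under one grouping.

    Assumed definitions (not taken from the slides):
      within = mean over groups of the standard deviation inside the group
      inter  = standard deviation of the group means
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    stds, means = [], []
    for g in np.unique(groups):
        member = values[groups == g]
        stds.append(member.std())
        means.append(member.mean())
    return float(np.mean(stds)), float(np.std(means))
```

A grouping that places similar firms together should give a small first number and a large second number, which is the pattern the slide reports for clustering.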
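
Results (2)
Enhance the accounting fraud detection model

Dechow et al. (2011) F-score model:
\[
\text{Fraud}_{i,t} = \beta_0 + \beta_1\,\text{RSST\_accruals}_{i,t} + \beta_2\,\text{Change in receivables}_{i,t} + \beta_3\,\text{Change in inventory}_{i,t} + \beta_4\,\text{Soft assets}_{i,t} + \beta_5\,\text{Change in cash sales}_{i,t} + \beta_6\,\text{Change in ROA}_{i,t} + \beta_7\,\text{Issue}_{i,t} + \varepsilon
\]
\[
\text{Predicted probability} = \frac{e^{\text{Predicted value}}}{1 + e^{\text{Predicted value}}}
\]
F-score = predicted probability / unconditional expectation of accounting fraud.
The higher the F-score, the more suspicious the firm.

Revised model (adds the firm's deviation from its financial reporting peer firms):
\[
\text{Fraud}_{i,t} = \beta_0 + \beta_1\,\text{RSST\_accruals}_{i,t} + \beta_2\,\text{Change in receivables}_{i,t} + \beta_3\,\text{Change in inventory}_{i,t} + \beta_4\,\text{Soft assets}_{i,t} + \beta_5\,\text{Change in cash sales}_{i,t} + \beta_6\,\text{Change in ROA}_{i,t} + \beta_7\,\text{Issue}_{i,t} + \beta_8\,\text{Peer deviation}_{i,t} + \varepsilon
\]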
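
A minimal sketch of the F-score arithmetic described on this slide: a linear predicted value is passed through the logistic function and then divided by the unconditional probability of fraud. The function names and dictionary-based interface are illustrative assumptions, and no coefficient values are supplied here because the slides do not report them.

```python
import math


def logistic(v):
    """Numerically stable e^v / (1 + e^v)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def f_score(features, coefficients, intercept, unconditional_prob):
    """F-score = predicted probability / unconditional probability of fraud.

    features and coefficients are dicts keyed by variable name; the caller
    supplies estimated coefficients (none are hard-coded in this sketch).
    """
    predicted_value = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return logistic(predicted_value) / unconditional_prob
```

Adding the peer-deviation term of the revised model would only mean including one more key in both dictionaries.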

Rank the firms based on F-score: distribution of F-score

                               Model 1                    Model 2a
                           N   % Total   Min. F        N   % Total   Min. F
Quintile 1  Fraud         20     6.47%     0.17       17     5.50%     0.09
            Non-fraud  19516    20.04%     0.04    19519    20.05%     0.03
Quintile 2  Fraud         27     8.74%     0.65       30     9.71%     0.41
            Non-fraud  19509    20.04%     0.64    19506    20.03%     0.40
Quintile 3  Fraud         48    15.53%     0.81       41    13.27%     0.63
            Non-fraud  19488    20.01%     0.81    19495    20.02%     0.60
Quintile 4  Fraud        102    33.01%     0.99       66    21.36%     0.95
            Non-fraud  19434    19.96%     0.98    19470    20.00%     0.92
Quintile 5  Fraud        112    36.25%     1.28      155    50.16%     1.44
            Non-fraud  19424    19.95%     1.27    19381    19.90%     1.42

Model 1: Dechow et al. (2011) model. Model 2a: revised model.

Incorporating firms' deviation from financial reporting peer firms enhances the ability of the F-score to detect fraud firms: the top quintile contains 50.16% of fraud observations under Model 2a versus 36.25% under Model 1.
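
A quintile breakdown of this kind can be sketched as follows. This is an illustrative assumption about the procedure, not the authors' code: observations are sorted by F-score, cut into five equal-sized groups, and the share of all fraud observations falling in each group is reported. Unlike the table, the sketch reports one minimum F-score per quintile rather than separate minima for fraud and non-fraud observations.

```python
def fraud_share_by_quintile(f_scores, is_fraud):
    """Summarise fraud counts per F-score quintile (needs >= 5 observations)."""
    order = sorted(range(len(f_scores)), key=lambda i: f_scores[i])
    n = len(order)
    total_fraud = sum(is_fraud)
    rows = []
    for q in range(5):
        # Indices of the observations in quintile q + 1 (lowest scores first).
        idx = order[q * n // 5:(q + 1) * n // 5]
        n_fraud = sum(is_fraud[i] for i in idx)
        rows.append({
            "quintile": q + 1,
            "n_fraud": n_fraud,
            "n_non_fraud": len(idx) - n_fraud,
            "pct_of_fraud": n_fraud / total_fraud if total_fraud else 0.0,
            "min_f": min(f_scores[i] for i in idx),
        })
    return rows
```

A model that ranks well concentrates fraud observations in quintile 5, so comparing `pct_of_fraud` for the top quintile across models mirrors the comparison made on the slide.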

We rank the firms based on F-score: distribution of F-score

                              Model 2a                    Model 2b                    Model 2c
                           N   % Total   Min. F        N   % Total   Min. F        N   % Total   Min. F
Quintile 1  Fraud         17     5.50%     0.09       23     7.44%     0.14       22     7.12%     0.15
            Non-fraud  19519    20.05%     0.03    19513    19.98%     0.01    19514    19.98%     0.02
Quintile 2  Fraud         30     9.71%     0.41       24     7.77%     0.62       24     7.77%     0.63
            Non-fraud  19506    20.03%     0.40    19512    19.98%     0.61    19512    19.98%     0.62
Quintile 3  Fraud         41    13.27%     0.63       52    16.83%     0.78       52    16.83%     0.79
            Non-fraud  19495    20.02%     0.60    19484    19.95%     0.77    19484    19.95%     0.78
Quintile 4  Fraud         66    21.36%     0.95      103    33.33%     0.99      102    33.01%     1.00
            Non-fraud  19470    20.00%     0.92    19433    19.89%     0.98    19434    19.90%     0.99
Quintile 5  Fraud        155    50.16%     1.44      107    34.63%     1.26      109    35.28%     1.26
            Non-fraud  19381    19.90%     1.42    19429    19.89%     1.25    19427    19.89%     1.26

Model 2a: revised model with clustering-based peer deviation. Models 2b and 2c: revised models with SIC 3-digit and NAICS 4-digit peer deviation.

The ability of the F-score derived from the SIC 3-digit (NAICS 4-digit) classification to detect fraud is similar to that of the original Dechow et al. (2011) model.

Type I error falls from 40.8% to 34.7%.
Type II error falls from 32.4% to 28.8%.
Type I error is the percentage of non-fraud observations that the model incorrectly predicts as fraud.
Type II error is the percentage of fraud observations that the model incorrectly predicts as non-fraud.
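
The two error rates defined above can be computed as in the sketch below. The slides do not state which F-score cutoff was used to label an observation as predicted fraud, so the `cutoff` default of 1.0 is an assumption made only for illustration.

```python
def predict_fraud(f_scores, cutoff=1.0):
    """Flag observations whose F-score exceeds the (assumed) cutoff."""
    return [f > cutoff for f in f_scores]


def error_rates(is_fraud, predicted_fraud):
    """Return (type_1, type_2) error rates.

    type_1: share of non-fraud observations predicted as fraud
    type_2: share of fraud observations predicted as non-fraud
    Both classes must be present, otherwise a rate is undefined.
    """
    pairs = list(zip(is_fraud, predicted_fraud))
    false_pos = sum(1 for y, p in pairs if not y and p)
    false_neg = sum(1 for y, p in pairs if y and not p)
    n_non_fraud = sum(1 for y, _ in pairs if not y)
    n_fraud = sum(1 for y, _ in pairs if y)
    return false_pos / n_non_fraud, false_neg / n_fraud
```

Raising the cutoff trades Type I errors for Type II errors, so comparisons between models are only meaningful at the same cutoff.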

Thank you!