
NASCAR Winston Cup Races Analysis: Beta Regression for Ford Prize Money Proportion
"Explore the methodology of beta regression for analyzing the proportion of prize money won by Ford cars in NASCAR Winston Cup races from 1992 to 2000. The model uses race-level predictor variables and a logit link function for the mean of the proportion."
Beta Regression: Proportion of Prize Money for Ford in NASCAR Winston Cup Races, 1992-2000
Methodology: S.L.P. Ferrari and F. Cribari-Neto (2004). "Beta Regression for Modelling Rates and Proportions," Journal of Applied Statistics, Vol. 31, No. 7, pp. 799-815.
Data: L. Winner (2006). "NASCAR Winston Cup Race Results for 1975-2003," Journal of Statistics Education, Vol. 14, No. 3, www.amstat.org/publications/jse/v14n3/datasets.winner.html
Data Description
Units: 267 Winston Cup races for years 1992-2000
Response: Proportion of prize money won by Ford cars
Predictor variables:
    Proportion of all cars in the race that are Fords
    Track LENGTH (miles)
    Track turn BANK (degrees)
    Number of LAPS
    Year dummy variables (Year1993-Year2000)
Distribution: Beta (scaled for responses between 0 and 1)
Link function: Logit
\[ \log\left(\frac{\mu}{1-\mu}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \]
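As a small illustration of the logit link, the sketch below maps a proportion to the linear-predictor scale and back using base R's qlogis() and plogis(). The helper names logit and inv_logit are introduced here for illustration only; the value 0.479358 is the FPrzp reported for the first race in this transcript.

```r
# Logit link g(mu) = log(mu / (1 - mu)) and its inverse (base R only).
logit     <- function(mu)  qlogis(mu)    # proportion -> real line
inv_logit <- function(eta) plogis(eta)   # real line  -> proportion in (0, 1)

y1      <- 0.479358        # FPrzp for race 1 (from the transcript)
eta1    <- logit(y1)       # negative, because y1 < 0.5
mu_back <- inv_logit(eta1) # recovers y1

print(c(y = y1, logit_y = eta1, back = mu_back))
```

Because the inverse link always returns a value strictly between 0 and 1, fitted means on the original scale cannot leave the range of a proportion.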
[Figure: logit(y) vs Proportion of Ford Cars in Race. x-axis: Proportion of Fords (0.25 to 0.65); y-axis: logit(y) (-1.25 to 1.25)]
[Figure: logit(y) versus Track Length. x-axis: Track Length (0 to 3); y-axis: logit(y) (-1.25 to 1.25)]
[Figure: Time Series of logit(y). x-axis: race number (1 to 265); y-axis: logit(y) (-1.25 to 1.25)]
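Beta Distribution Likelihood Function
Density, with shape parameters $a > 0$, $b > 0$:
\[ f(y \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, y^{a-1}(1-y)^{b-1}, \qquad 0 < y < 1 \]
\[ \int_0^1 f(y \mid a, b)\,dy = 1 \;\Rightarrow\; \int_0^1 y^{a-1}(1-y)^{b-1}\,dy = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \]
Mean:
\[ E(Y) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^1 y^{a}(1-y)^{b-1}\,dy = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+1)\Gamma(b)}{\Gamma(a+b+1)} = \frac{a}{a+b} = \mu \]
Second moment and variance:
\[ E(Y^2) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \cdot \frac{\Gamma(a+2)\Gamma(b)}{\Gamma(a+b+2)} = \frac{a(a+1)}{(a+b)(a+b+1)} \]
\[ V(Y) = E(Y^2) - \left[E(Y)\right]^2 = \frac{ab}{(a+b)^2(a+b+1)} \]
Re-parameterization in terms of the mean $\mu$ and precision $\phi$:
\[ \phi = a + b \;\Rightarrow\; a = \mu\phi, \quad b = (1-\mu)\phi, \quad V(Y) = \frac{\mu(1-\mu)}{1+\phi} \]
Likelihood function for observation $i$:
\[ L_i = \frac{\Gamma(\phi)}{\Gamma(\mu_i\phi)\,\Gamma((1-\mu_i)\phi)}\, y_i^{\mu_i\phi-1}(1-y_i)^{(1-\mu_i)\phi-1} \]
Link function:
\[ g(\mu_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} = \mathbf{x}_i'\boldsymbol{\beta} \]
Logit link:
\[ g(\mu_i) = \ln\left(\frac{\mu_i}{1-\mu_i}\right) = \mathbf{x}_i'\boldsymbol{\beta} \;\Rightarrow\; \mu_i = \frac{e^{\mathbf{x}_i'\boldsymbol{\beta}}}{1+e^{\mathbf{x}_i'\boldsymbol{\beta}}}, \qquad 1-\mu_i = \frac{1}{1+e^{\mathbf{x}_i'\boldsymbol{\beta}}} \]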
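The mean and variance of the beta distribution under the mean/precision parameterization (shape parameters mu*phi and (1-mu)*phi, so that the mean is mu and the variance is mu(1-mu)/(1+phi)) can be checked numerically. The following is a minimal sketch using base R's dbeta() and integrate(); the values mu = 0.48 and phi = 100 are arbitrary illustrative choices, not estimates from the data.

```r
# Numerical check of the beta moments under the (mu, phi) parameterization.
mu  <- 0.48                 # illustrative mean (assumption)
phi <- 100                  # illustrative precision (assumption)
a <- mu * phi               # first shape parameter
b <- (1 - mu) * phi         # second shape parameter

dens  <- function(y) dbeta(y, a, b)
total <- integrate(dens, 0, 1, rel.tol = 1e-10)$value
m1    <- integrate(function(y) y   * dens(y), 0, 1, rel.tol = 1e-10)$value
m2    <- integrate(function(y) y^2 * dens(y), 0, 1, rel.tol = 1e-10)$value
v     <- m2 - m1^2

print(c(total = total, mean = m1, variance = v,
        formula_variance = mu * (1 - mu) / (1 + phi)))
```

A larger phi gives a smaller variance for the same mean, which is why phi is read as a precision parameter.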
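Beta Distribution Logit Link
Linear predictor and its derivative:
\[ \mathbf{x}_i'\boldsymbol{\beta} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}, \qquad \mathbf{x}_i' = \begin{bmatrix} 1 & X_{i1} & \cdots & X_{ip} \end{bmatrix}, \qquad \boldsymbol{\beta}' = \begin{bmatrix} \beta_0 & \beta_1 & \cdots & \beta_p \end{bmatrix} \]
\[ \frac{\partial\, \mathbf{x}_i'\boldsymbol{\beta}}{\partial \boldsymbol{\beta}} = \mathbf{x}_i \]
Logit link and its derivative:
\[ g(\mu_i) = \log\left(\frac{\mu_i}{1-\mu_i}\right) = \mathbf{x}_i'\boldsymbol{\beta}, \qquad g'(\mu_i) = \frac{1}{\mu_i} + \frac{1}{1-\mu_i} = \frac{1}{\mu_i(1-\mu_i)} = \frac{\left(1+e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^2}{e^{\mathbf{x}_i'\boldsymbol{\beta}}} \]
Likelihood for observation $i$:
\[ L_i = \frac{\Gamma(\phi)}{\Gamma(\mu_i\phi)\,\Gamma((1-\mu_i)\phi)}\, y_i^{\mu_i\phi-1}(1-y_i)^{(1-\mu_i)\phi-1}, \qquad \mu_i = \frac{e^{\mathbf{x}_i'\boldsymbol{\beta}}}{1+e^{\mathbf{x}_i'\boldsymbol{\beta}}} \]
log-Likelihood (Logit Link):
\[ l_i = \ln L_i = \log\Gamma(\phi) - \log\Gamma(\mu_i\phi) - \log\Gamma((1-\mu_i)\phi) + (\mu_i\phi - 1)\log y_i + ((1-\mu_i)\phi - 1)\log(1-y_i) \]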
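The log-likelihood under the logit link can be written directly with lgamma(). The sketch below compares that hand-written form with R's built-in beta log-density; the function name beta_loglik, the six-row design matrix, the responses, and the parameter values are all illustrative assumptions, not the NASCAR data or fitted values.

```r
# Beta regression log-likelihood with logit link (mean/precision form).
beta_loglik <- function(beta, phi, y, X) {
  mu <- plogis(drop(X %*% beta))            # inverse logit of x_i' beta
  sum(lgamma(phi) - lgamma(mu * phi) - lgamma((1 - mu) * phi) +
      (mu * phi - 1) * log(y) + ((1 - mu) * phi - 1) * log1p(-y))
}

# Toy inputs (assumptions for illustration only)
X    <- cbind(1, c(0.31, 0.33, 0.35, 0.38, 0.42, 0.47))
y    <- c(0.43, 0.48, 0.42, 0.44, 0.55, 0.63)
beta <- c(-1, 2)
phi  <- 40

ll_manual <- beta_loglik(beta, phi, y, X)

# Same quantity from the built-in density with shapes mu*phi and (1-mu)*phi
mu         <- plogis(drop(X %*% beta))
ll_builtin <- sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))

print(c(manual = ll_manual, builtin = ll_builtin))
```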
Beta Distribution Logit Link
Digamma function:
\[ \psi(z) = \frac{d \log\Gamma(z)}{dz} \]
log-Likelihood (Logit Link):
\[ l_i = \ln L_i = \log\Gamma(\phi) - \log\Gamma(\mu_i\phi) - \log\Gamma((1-\mu_i)\phi) + (\mu_i\phi - 1)\log y_i + ((1-\mu_i)\phi - 1)\log(1-y_i) \]
Derivative with respect to $\mu_i$:
\[ \frac{\partial l_i}{\partial \mu_i} = \phi\left[\log\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi)\right] \]
\[ E\left\{\frac{\partial l_i}{\partial \mu_i}\right\} = 0 \;\Rightarrow\; E\left\{\log\left(\frac{Y_i}{1-Y_i}\right)\right\} = \psi(\mu_i\phi) - \psi((1-\mu_i)\phi) \qquad \text{(useful in the Fisher Scoring Algorithm)} \]
Derivative with respect to $\boldsymbol{\beta}$ (chain rule, with $\partial\mu_i/\partial\boldsymbol{\beta} = \mathbf{x}_i / g'(\mu_i)$):
\[ \frac{\partial l_i}{\partial \boldsymbol{\beta}} = \phi\, \frac{e^{\mathbf{x}_i'\boldsymbol{\beta}}}{\left(1+e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^2} \left[\log\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi)\right] \mathbf{x}_i \]
Score for $\boldsymbol{\beta}$:
\[ \mathbf{g}_{\beta} = \frac{\partial l}{\partial \boldsymbol{\beta}} = \sum_{i=1}^n \frac{\partial l_i}{\partial \boldsymbol{\beta}} \]
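One way to gain confidence in the score expressions is to compare them with central finite differences of the log-likelihood. The sketch below does this for both the regression coefficients and the precision parameter; the phi-score used here, sum of mu_i(y*_i - mu*_i) + log(1 - y_i) - psi((1 - mu_i)phi) + psi(phi), follows from differentiating the log-likelihood with respect to phi. The function names and toy inputs are assumptions for illustration only.

```r
# Log-likelihood as a function of theta = (beta, phi), via dbeta().
loglik <- function(theta, y, X) {
  k <- ncol(X); beta <- theta[1:k]; phi <- theta[k + 1]
  mu <- plogis(drop(X %*% beta))
  sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

# Analytic score vector (g_beta, g_phi) under the logit link.
score <- function(theta, y, X) {
  k <- ncol(X); beta <- theta[1:k]; phi <- theta[k + 1]
  mu     <- plogis(drop(X %*% beta))
  ystar  <- qlogis(y)                                    # log(y / (1 - y))
  mustar <- digamma(mu * phi) - digamma((1 - mu) * phi)  # E{ystar}
  g_beta <- phi * drop(crossprod(X, mu * (1 - mu) * (ystar - mustar)))
  g_phi  <- sum(mu * (ystar - mustar) + log1p(-y) -
                digamma((1 - mu) * phi) + digamma(phi))
  c(g_beta, g_phi)
}

# Toy inputs (assumptions for illustration only)
X     <- cbind(1, c(0.31, 0.33, 0.35, 0.38, 0.42, 0.47))
y     <- c(0.43, 0.48, 0.42, 0.44, 0.55, 0.63)
theta <- c(-1, 2, 40)

# Central finite differences, one coordinate at a time
h <- 1e-5
numeric_score <- sapply(seq_along(theta), function(j) {
  e <- replace(numeric(length(theta)), j, h)
  (loglik(theta + e, y, X) - loglik(theta - e, y, X)) / (2 * h)
})
analytic_score <- score(theta, y, X)

print(rbind(analytic = analytic_score, numeric = numeric_score))
```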
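Beta Distribution Logit Link
Trigamma function:
\[ \psi'(z) = \frac{d \psi(z)}{dz} \]
Second derivative with respect to $\boldsymbol{\beta}$:
\[ \frac{\partial^2 l_i}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} = \left\{ \phi\, \frac{e^{\mathbf{x}_i'\boldsymbol{\beta}}\left(1-e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)}{\left(1+e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^3} \left[\log\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi)\right] - \phi^2\, \frac{e^{2\mathbf{x}_i'\boldsymbol{\beta}}}{\left(1+e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^4} \left[\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right] \right\} \mathbf{x}_i\mathbf{x}_i' \]
Taking expectations, the first term vanishes:
\[ E\left\{-\frac{\partial^2 l_i}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'}\right\} = \phi^2\, \frac{e^{2\mathbf{x}_i'\boldsymbol{\beta}}}{\left(1+e^{\mathbf{x}_i'\boldsymbol{\beta}}\right)^4} \left[\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right] \mathbf{x}_i\mathbf{x}_i' = \frac{\phi^2\left[\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right]}{\left[g'(\mu_i)\right]^2}\, \mathbf{x}_i\mathbf{x}_i' \]
Summing over observations:
\[ E\left\{-\frac{\partial^2 l}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'}\right\} = \sum_{i=1}^n \frac{\phi^2\left[\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right]}{\left[g'(\mu_i)\right]^2}\, \mathbf{x}_i\mathbf{x}_i' = \mathbf{X}'\mathbf{W}\mathbf{X} \]
\[ W_{ij} = \begin{cases} \dfrac{\phi^2\left[\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right]}{\left[g'(\mu_i)\right]^2} & i = j \\[2ex] 0 & i \ne j \end{cases} \qquad i, j = 1, \ldots, n \]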
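Beta Distribution Logit Link
Derivative with respect to $\phi$:
\[ \frac{\partial l_i}{\partial \phi} = \mu_i\left[\log\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi)\right] + \log(1-y_i) - \psi((1-\mu_i)\phi) + \psi(\phi), \qquad g_{\phi} = \sum_{i=1}^n \frac{\partial l_i}{\partial \phi} \]
Mixed second derivative:
\[ \frac{\partial^2 l_i}{\partial \boldsymbol{\beta}\,\partial \phi} = \left\{ \left[\log\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi)\right] - \phi\left[\psi'(\mu_i\phi)\mu_i - \psi'((1-\mu_i)\phi)(1-\mu_i)\right] \right\} \mu_i(1-\mu_i)\, \mathbf{x}_i \]
\[ E\left\{-\frac{\partial^2 l_i}{\partial \boldsymbol{\beta}\,\partial \phi}\right\} = \phi\left[\psi'(\mu_i\phi)\mu_i - \psi'((1-\mu_i)\phi)(1-\mu_i)\right] \mu_i(1-\mu_i)\, \mathbf{x}_i = c_i\, \frac{1}{g'(\mu_i)}\, \mathbf{x}_i \]
\[ c_i = \phi\left[\psi'(\mu_i\phi)\mu_i - \psi'((1-\mu_i)\phi)(1-\mu_i)\right], \qquad T_{ij} = \begin{cases} 1/g'(\mu_i) & i = j \\ 0 & i \ne j \end{cases} \qquad\Rightarrow\qquad E\left\{-\frac{\partial^2 l}{\partial \boldsymbol{\beta}\,\partial \phi}\right\} = \mathbf{X}'\mathbf{T}\mathbf{c} \]
Second derivative with respect to $\phi$:
\[ \frac{\partial^2 l_i}{\partial \phi^2} = -\left[\psi'(\mu_i\phi)\mu_i^2 + \psi'((1-\mu_i)\phi)(1-\mu_i)^2 - \psi'(\phi)\right] \]
\[ D_{ij} = \begin{cases} \psi'(\mu_i\phi)\mu_i^2 + \psi'((1-\mu_i)\phi)(1-\mu_i)^2 - \psi'(\phi) & i = j \\ 0 & i \ne j \end{cases} \qquad\Rightarrow\qquad E\left\{-\frac{\partial^2 l}{\partial \phi^2}\right\} = \mathrm{trace}(\mathbf{D}) \]
Expected information for $\boldsymbol{\theta} = (\boldsymbol{\beta}', \phi)'$, with $\mathbf{G}_{\theta} = \partial^2 l / \partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}'$ and score $\mathbf{g}_{\theta} = (\mathbf{g}_{\beta}', g_{\phi})'$:
\[ E\left\{-\mathbf{G}_{\theta}\right\} = \begin{bmatrix} \mathbf{X}'\mathbf{W}\mathbf{X} & \mathbf{X}'\mathbf{T}\mathbf{c} \\ \mathbf{c}'\mathbf{T}\mathbf{X} & \mathrm{trace}(\mathbf{D}) \end{bmatrix} \]
Fisher Scoring Algorithm:
\[ \tilde{\boldsymbol{\theta}}^{New} = \tilde{\boldsymbol{\theta}}^{Old} + \left[ E\left\{-\mathbf{G}_{\theta}\right\} \right]^{-1} \mathbf{g}_{\theta} \;\Big|_{\boldsymbol{\theta} = \tilde{\boldsymbol{\theta}}^{Old}} \]
At convergence: $\hat{\boldsymbol{\theta}}_{MLE} = \tilde{\boldsymbol{\theta}}^{New} = \tilde{\boldsymbol{\theta}}^{Old}$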
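Variance-Covariance Matrix & Starting Values
\[ \hat{V}\left(\hat{\boldsymbol{\theta}}\right) = \left[ E\left\{-\mathbf{G}_{\theta}\right\} \right]^{-1} \Big|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} \]
Starting Values:
1) Fit a linear regression of $h(y_i) = \log\left(\dfrac{y_i}{1-y_i}\right)$ on $X_1, \ldots, X_p$ and obtain $\tilde{\boldsymbol{\beta}}^{Old}$.
2) \[ V(Y_i) = \frac{\mu_i(1-\mu_i)}{1+\phi} \;\Rightarrow\; \phi = \frac{\mu_i(1-\mu_i)}{V(Y_i)} - 1 \]
3) Delta method: \[ h(Y_i) \approx h(\mu_i) + (Y_i - \mu_i)\,h'(\mu_i) \;\Rightarrow\; V\{h(Y_i)\} \approx V(Y_i)\left[h'(\mu_i)\right]^2 \;\Rightarrow\; V(Y_i) \approx \frac{V\{h(Y_i)\}}{\left[h'(\mu_i)\right]^2} \]
4) Predicted values for step 1) are obtained, with residuals $\hat{\mathbf{e}}$: \[ \hat{V}\{h(Y_i)\} = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n - p'} \] where $p'$ is the number of regression coefficients, including the intercept.
5) \[ \hat{\mu}_i = \frac{e^{\mathbf{x}_i'\tilde{\boldsymbol{\beta}}^{Old}}}{1+e^{\mathbf{x}_i'\tilde{\boldsymbol{\beta}}^{Old}}}, \qquad h'(\hat{\mu}_i) = \frac{1}{\hat{\mu}_i(1-\hat{\mu}_i)} \]
6) \[ \tilde{\phi}^{Old} = \frac{1}{n}\sum_{i=1}^n \frac{\hat{\mu}_i(1-\hat{\mu}_i)}{\hat{V}(Y_i)} - 1, \qquad \hat{V}(Y_i) = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n - p'} \left[\hat{\mu}_i(1-\hat{\mu}_i)\right]^2 \]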
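The starting-value recipe and the Fisher scoring update can be sketched together in base R. This is a minimal illustration on simulated data, not the NASCAR data set: the function name fisher_step, the simulated design, the true parameter values, the convergence tolerance, and the iteration cap are all assumptions made for the example. The information matrix is assembled from X'WX, X'Tc, and trace(D), with the phi^2 factor included in W.

```r
set.seed(42)

# Simulated data (assumption: one predictor plus intercept, logit link)
n         <- 200
X         <- cbind(Intercept = 1, x1 = runif(n))
beta_true <- c(-0.5, 1.0)
phi_true  <- 50
mu_true   <- plogis(drop(X %*% beta_true))
y         <- rbeta(n, mu_true * phi_true, (1 - mu_true) * phi_true)

# One Fisher scoring update; returns the new theta and E{-G} at the old theta
fisher_step <- function(theta, y, X) {
  k <- ncol(X); beta <- theta[1:k]; phi <- theta[k + 1]
  mu     <- plogis(drop(X %*% beta))
  t_i    <- mu * (1 - mu)                           # 1 / g'(mu) for the logit
  ystar  <- qlogis(y)
  mustar <- digamma(mu * phi) - digamma((1 - mu) * phi)
  p1 <- trigamma(mu * phi); p2 <- trigamma((1 - mu) * phi)
  # Score vector (g_beta, g_phi)
  g <- c(phi * crossprod(X, t_i * (ystar - mustar)),
         sum(mu * (ystar - mustar) + log1p(-y) -
             digamma((1 - mu) * phi) + digamma(phi)))
  # Expected information blocks
  w   <- phi^2 * (p1 + p2) * t_i^2                  # diagonal of W
  cc  <- phi * (p1 * mu - p2 * (1 - mu))            # vector c
  d   <- p1 * mu^2 + p2 * (1 - mu)^2 - trigamma(phi)  # diagonal of D
  Kbb <- crossprod(X, w * X)                        # X'WX
  Kbp <- crossprod(X, t_i * cc)                     # X'Tc
  K   <- rbind(cbind(Kbb, Kbp), c(Kbp, sum(d)))
  list(theta = theta + solve(K, g), info = K)
}

# Starting values: OLS on logit(y), then the delta-method estimate of phi
ols    <- lm.fit(X, qlogis(y))
beta0  <- ols$coefficients
sigma2 <- sum(ols$residuals^2) / (n - ncol(X))      # e'e / (n - p')
mu0    <- plogis(drop(X %*% beta0))
phi0   <- mean(1 / (sigma2 * mu0 * (1 - mu0))) - 1

# Iterate until the squared change in theta is negligible
theta <- c(beta0, phi = phi0)
for (iter in 1:50) {
  theta_new <- fisher_step(theta, y, X)$theta
  delta     <- sum((theta_new - theta)^2)
  theta     <- theta_new
  if (delta < 1e-12) break
}

# Standard errors from the inverse expected information at convergence
se <- sqrt(diag(solve(fisher_step(theta, y, X)$info)))
print(cbind(estimate = theta, se = se))
```

The stopping rule uses the squared change in the parameter vector, which matches the pattern of the delta values reported with the iteration history for the NASCAR fit.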
First 6 Races & Preliminary OLS Regression

First 6 races (Year93 through Year100 dummies are all 0 for these races):
Race  FPrzp     Intercept  FDrvp     TrkLng  Bank  Laps
1     0.479358  1          0.333333  2.5     31    200
2     0.417734  1          0.35      1.017   23    492
3     0.627149  1          0.342857  0.75    14    400
4     0.427101  1          0.309524  1.54    24    328
5     0.442694  1          0.333333  1.366   24    367
6     0.425168  1          0.375     0.533   36    500

X'X (columns Intercept through Laps) and X'Y*, where Y* = log(FPrzp/(1-FPrzp)). The Year-by-Year block of X'X is diagonal, with the race counts in the Intercept column on its diagonal.
Row        Intercept  FDrvp        TrkLng      Bank       Laps        X'Y*
Intercept  267        128.34549    389.002     5493       87577       -29.7349
FDrvp      128.34549  62.772994    186.67959   2644.3213  42199.767   -11.9358
TrkLng     389.002    186.67959    704.3114    8227.602   108118.37   -45.1689
Bank       5493       2644.3213    8227.602    133312     1816343.5   -674.032
Laps       87577      42199.767    108118.37   1816343.5  32217689    -9883.03
Year93     28         15.09928     39.092      586.5      9840        -1.25432
Year94     29         15.974951    41.592      595.5      9942        4.878356
Year95     29         15.895728    41.592      595.5      9699        0.521718
Year96     29         15.32624     41.592      595.5      9326        -0.97256
Year97     30         13.282392    44.9        610        9658        -6.02386
Year98     31         13.27907     46.4        630        9730        -7.64708
Year99     32         14.767442    47.9        649        9930        -6.05889
Year100    32         14.232558    47.9        649        9965        -7.50756

INV(X'X), columns Intercept through Laps (the matrix is symmetric):
Row        Intercept     FDrvp         TrkLng        Bank          Laps
Intercept  1.105979      -1.63231      -0.1265998    0.001269493   -0.000808033
FDrvp      -1.63231      4.3745291     -0.0110773    -0.000830552  -9.51016E-05
TrkLng     -0.1265998    -0.0110773    0.0423678     -0.000632194  0.000241477
Bank       0.001269493   -0.000830552  -0.000632194  5.9132E-05    -3.78788E-06
Laps       -0.000808033  -9.51016E-05  0.000241477   -3.78788E-06  1.66908E-06
Year93     0.208386      -0.6604147    0.0018022     0.000153588   1.49632E-05
Year94     0.225714      -0.7118992    0.0021345     0.000179066   1.96701E-05
Year95     0.214484      -0.7007457    0.0041276     0.000145057   3.3396E-05
Year96     0.172036      -0.616064     0.007016      8.00271E-05   5.29962E-05
Year97     0.040517      -0.2404138    0.0032089     6.15076E-05   2.84179E-05
Year98     0.010542      -0.1782418    0.004985      1.9721E-05    4.04394E-05
Year99     0.061803      -0.3235218    0.0061809     3.62621E-05   4.93482E-05
Year100    0.035403      -0.250297     0.0057316     2.65223E-05   4.5933E-05

INV(X'X), Year-by-Year block:
Row      Year93    Year94    Year95    Year96    Year97    Year98    Year99    Year100
Year93   0.172472  0.144542  0.142863  0.130087  0.073373  0.063992  0.085928  0.074873
Year94   0.144542  0.187445  0.151182  0.137455  0.076309  0.066226  0.089885  0.077964
Year95   0.142863  0.151182  0.18403   0.136216  0.075891  0.066078  0.089415  0.077665
Year96   0.130087  0.137455  0.136216  0.159282  0.071535  0.063098  0.083696  0.073341
Year97   0.073373  0.076309  0.075891  0.071535  0.084056  0.047493  0.05556   0.05151
Year98   0.063992  0.066226  0.066078  0.063098  0.047493  0.077512  0.051302  0.048279
Year99   0.085928  0.089885  0.089415  0.083696  0.05556   0.051302  0.093446  0.056734
Year100  0.074873  0.077964  0.077665  0.073341  0.05151   0.048279  0.056734  0.08375

Beta.OLS and phi.hat:
Intercept  -0.73544
FDrvp      2.562906
TrkLng     -0.11191
Bank       -0.00196
Laps       -0.00077
Year93     -0.22386
Year94     -0.04374
Year95     -0.19341
Year96     -0.2045
Year97     -0.14514
Year98     -0.16035
Year99     -0.19072
Year100    -0.19231
phi.hat    95.54076
Iterative Results for Parameter Estimates
Parameter  Theta.0     Theta.1     Theta.2     Theta.3     Theta.4     SE{Theta}  z          Pr(>|z|)
Intercept  -0.735444   -0.731284   -0.731246   -0.731246   -0.731246   0.207423   -3.525391  0.000423
FDrvp      2.562906    2.544194    2.544099    2.544099    2.544099    0.412854   6.162223   0.000000
TrkLng     -0.111911   -0.110613   -0.110614   -0.110614   -0.110614   0.040533   -2.728974  0.006353
Bank       -0.001962   -0.001962   -0.001961   -0.001961   -0.001961   0.001516   -1.293706  0.195767
Laps       -0.000769   -0.000760   -0.000760   -0.000760   -0.000760   0.000254   -2.988006  0.002808
Year93     -0.223865   -0.222567   -0.222553   -0.222553   -0.222553   0.081781   -2.721333  0.006502
Year94     -0.043741   -0.044158   -0.044146   -0.044146   -0.044146   0.085254   -0.517820  0.604584
Year95     -0.193411   -0.192415   -0.192401   -0.192401   -0.192401   0.084458   -2.278070  0.022722
Year96     -0.204499   -0.203192   -0.203180   -0.203180   -0.203180   0.078572   -2.585924  0.009712
Year97     -0.145140   -0.144177   -0.144168   -0.144168   -0.144168   0.057200   -2.520435  0.011721
Year98     -0.160355   -0.158511   -0.158514   -0.158514   -0.158514   0.055034   -2.880290  0.003973
Year99     -0.190719   -0.189241   -0.189233   -0.189233   -0.189233   0.060279   -3.139288  0.001694
Year100    -0.192309   -0.190480   -0.190476   -0.190476   -0.190476   0.057151   -3.332844  0.000860
phi        95.565889   102.345838  102.873907  102.876659  102.876659  8.861139   11.609868  0.000000
delta                  45.9681     0.2788564   7.57474E-06 5.4161E-15
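The z statistics and two-sided p-values in the results are Wald quantities: each estimate divided by its standard error, referred to the standard normal distribution. The short sketch below reproduces them for three coefficients, using the converged estimates and standard errors reported in the transcript; small differences in the last digits come from rounding of the printed inputs.

```r
# Wald z statistics and two-sided p-values from reported estimates and SEs.
est <- c(Intercept = -0.731246, FDrvp = 2.544099, TrkLng = -0.110614)
se  <- c(Intercept =  0.207423, FDrvp = 0.412854, TrkLng =  0.040533)

z <- est / se                 # Wald statistic
p <- 2 * pnorm(-abs(z))       # two-sided normal p-value

print(round(cbind(estimate = est, se = se, z = z, p = p), 6))
```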
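Diagnostic Measures
Pseudo-R^2: 0.390614873
\[ R^2 = \text{squared correlation between } g(y_i) = \ln\left(\frac{y_i}{1-y_i}\right) \text{ and } \hat{\eta}_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}} \]
Deviance Residuals:
\[ r_{Di} = \mathrm{sgn}(y_i - \hat{\mu}_i) \sqrt{2\left[ l_i(y_i, \hat{\phi}; y_i) - l_i(\hat{\mu}_i, \hat{\phi}; y_i) \right]} \]
where:
\[ l_i(\mu_i, \phi; y_i) = \ln\Gamma(\phi) - \ln\Gamma(\mu_i\phi) - \ln\Gamma((1-\mu_i)\phi) + (\mu_i\phi - 1)\ln y_i + ((1-\mu_i)\phi - 1)\ln(1-y_i) \]
\[ l_i(y_i, \hat{\phi}; y_i) - l_i(\hat{\mu}_i, \hat{\phi}; y_i) = -\ln\Gamma(y_i\hat{\phi}) - \ln\Gamma((1-y_i)\hat{\phi}) + \ln\Gamma(\hat{\mu}_i\hat{\phi}) + \ln\Gamma((1-\hat{\mu}_i)\hat{\phi}) + \hat{\phi}(y_i - \hat{\mu}_i)\ln\left(\frac{y_i}{1-y_i}\right) \]
Pearson Residuals:
\[ r_{Pi} = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{V}(Y_i)}}, \qquad \hat{V}(Y_i) = \frac{\hat{\mu}_i(1-\hat{\mu}_i)}{1+\hat{\phi}} \]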
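Pearson and deviance residuals can be computed from the fitted means and the estimated precision alone. The sketch below is a base R illustration in which the deviance residual compares the log-likelihood contribution evaluated at mu = y with the one evaluated at the fitted mean; taking abs() before the square root is a guard added here, because that difference can be slightly negative when the fitted mean is very close to the observation. The y values are the first three FPrzp values from the transcript and phi_hat is the converged phi estimate, but the fitted means mu_hat are hypothetical.

```r
# Log-likelihood contribution of one observation as a function of (mu, phi)
ll_i <- function(mu, phi, y) {
  lgamma(phi) - lgamma(mu * phi) - lgamma((1 - mu) * phi) +
    (mu * phi - 1) * log(y) + ((1 - mu) * phi - 1) * log1p(-y)
}

y       <- c(0.479358, 0.417734, 0.627149)  # FPrzp, races 1-3 (from transcript)
mu_hat  <- c(0.45, 0.47, 0.50)              # hypothetical fitted means
phi_hat <- 102.876659                       # converged phi from the transcript

# Pearson residuals: raw residual scaled by the model standard deviation
r_pearson <- (y - mu_hat) / sqrt(mu_hat * (1 - mu_hat) / (1 + phi_hat))

# Deviance residuals: signed root of twice the log-likelihood difference
ll_diff    <- ll_i(y, phi_hat, y) - ll_i(mu_hat, phi_hat, y)
r_deviance <- sign(y - mu_hat) * sqrt(2 * abs(ll_diff))

print(round(cbind(y, mu_hat, r_pearson, r_deviance), 4))
```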
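Influence Measures
Hat Matrix:
\[ \mathbf{H} = \mathbf{W}^{1/2}\mathbf{X}\left(\mathbf{X}'\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{W}^{1/2} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nn} \end{bmatrix} \]
Cook's D:
\[ C_i = \frac{h_{ii}\, r_{Pi}^2}{p'\left(1-h_{ii}\right)^2}, \qquad r_{Pi} = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{V}(Y_i)}}, \qquad \hat{V}(Y_i) = \frac{\hat{\mu}_i(1-\hat{\mu}_i)}{1+\hat{\phi}} \]
Generalized Leverage (following Ferrari and Cribari-Neto, 2004):
\[ GL(\boldsymbol{\beta}, \phi) = \mathbf{A}\mathbf{M} + \frac{1}{\gamma\phi}\, \mathbf{A}\mathbf{f}\left(\mathbf{f}'\mathbf{A}\mathbf{M} - \mathbf{b}'\right), \qquad \mathbf{A} = \mathbf{T}\mathbf{X}\left(\mathbf{X}'\mathbf{Q}\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{T}, \qquad \gamma = \mathrm{trace}(\mathbf{D}) - \frac{1}{\phi}\,\mathbf{f}'\mathbf{A}\mathbf{f} \]
\[ \mathbf{Q} = \mathrm{diag}\left\{ \left[ \phi\left(\psi'(\mu_i\phi) + \psi'((1-\mu_i)\phi)\right) + \left( \ln\left(\frac{y_i}{1-y_i}\right) - \psi(\mu_i\phi) + \psi((1-\mu_i)\phi) \right) \frac{g''(\mu_i)}{g'(\mu_i)} \right] \frac{1}{\left[g'(\mu_i)\right]^2} \right\} \]
\[ \mathbf{M} = \mathrm{diag}\left\{ \frac{1}{y_i(1-y_i)} \right\}, \qquad \mathbf{f} = \begin{bmatrix} c_1 - \left( \ln\left(\frac{y_1}{1-y_1}\right) - \psi(\mu_1\phi) + \psi((1-\mu_1)\phi) \right) \\ \vdots \\ c_n - \left( \ln\left(\frac{y_n}{1-y_n}\right) - \psi(\mu_n\phi) + \psi((1-\mu_n)\phi) \right) \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} -\dfrac{y_1 - \mu_1}{y_1(1-y_1)} \\ \vdots \\ -\dfrac{y_n - \mu_n}{y_n(1-y_n)} \end{bmatrix} \]
All quantities are evaluated at the maximum likelihood estimates.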
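The hat matrix and Cook's D can be sketched in a few lines of base R once the weights are available. The example below uses a toy design matrix and hypothetical values for the estimates (beta_hat, phi_hat) and responses; none of these are the NASCAR fit. The weights are the diagonal of W with the phi^2 factor included; a constant rescaling of W would leave H unchanged.

```r
# Toy inputs (assumptions for illustration only)
X        <- cbind(1, c(0.31, 0.33, 0.35, 0.38, 0.42, 0.47))
y        <- c(0.43, 0.48, 0.42, 0.44, 0.55, 0.63)
beta_hat <- c(-1, 2)     # hypothetical estimates
phi_hat  <- 40           # hypothetical precision

mu <- plogis(drop(X %*% beta_hat))

# Diagonal of W: phi^2 [psi'(mu phi) + psi'((1 - mu) phi)] / g'(mu)^2
w <- phi_hat^2 *
  (trigamma(mu * phi_hat) + trigamma((1 - mu) * phi_hat)) * (mu * (1 - mu))^2

# H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}
Xw <- sqrt(w) * X                         # scales row i of X by sqrt(w_i)
H  <- Xw %*% solve(crossprod(Xw), t(Xw))
h  <- diag(H)                             # leverages h_ii

# Cook's D from leverages and Pearson residuals
r_pearson <- (y - mu) / sqrt(mu * (1 - mu) / (1 + phi_hat))
p_prime   <- ncol(X)                      # coefficients incl. intercept
cooks_d   <- h * r_pearson^2 / (p_prime * (1 - h)^2)

print(round(cbind(leverage = h, pearson = r_pearson, cooks_d = cooks_d), 4))
```

Since H is a projection matrix, its trace equals the number of regression coefficients, which gives a quick sanity check on the leverages.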
[Figure: Pearson Residuals versus Linear Predictor. x-axis: Linear Predictor (-0.5 to 0.5); y-axis: Pearson Residual (-5.5 to 5.5)]
[Figure: Pearson Residuals versus Race Order. x-axis: race order (1 to 265); y-axis: Pearson Residual (-5.5 to 5.5)]
[Figure: Cook's D. x-axis: race order (1 to 271); y-axis: Cook's D (0 to 0.1)]
[Figure: GL(Beta,phi) versus Predicted Values. x-axis: Predicted (0.35 to 0.65); y-axis: Generalized Leverage (0.03 to 0.1)]
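R Program
### Fisher Scoring Method
# Read the race-level data and make its columns available by name
ford <- read.csv("http://www.stat.ufl.edu/~winner/data/nas_ford_1992_2000a.csv", header=TRUE)
attach(ford); names(ford)
library(betareg)
# Treat year and track as categorical variables
Year <- factor(Year)
Track_id <- factor(Track_id)
# Beta regression (logit link by default) of Ford prize-money proportion
beta.mod1 <- betareg(FPrzp ~ FDrvp + TrkLng + Bank + Laps + Year)
summary(beta.mod1)
# Residuals and influence diagnostics
resid(beta.mod1, type="pearson")
resid(beta.mod1, type="deviance")
cooks.distance(beta.mod1)
gleverage(beta.mod1)
hatvalues(beta.mod1)
# Diagnostic plots in a 2 x 2 grid
par(mfrow=c(2,2))
plot(beta.mod1, which=1:4, type="pearson")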