Generalized Linear Mixed Model Analysis in English Premier League Soccer 2003/2004 Season

An application of a Generalized Linear Mixed Model (GLMM) to goals scored per half in the English Premier League 2003/2004 season. The analysis checks whether goals follow a Poisson distribution, estimates fixed home and second-half effects, and models each team's offensive and defensive strength as random effects. It also examines the correlation between teams' offensive and defensive records and goal-scoring patterns by venue and half.

Presentation Transcript


  1. Generalized Linear Mixed Model English Premier League Soccer 2003/2004 Season

  2. Introduction
     English Premier League Soccer (Football): 20 teams, each playing every other team twice (once at home, once away).
     Games consist of two 45-minute halves, with no overtime.
     Each team is on offense and on defense in 38 games (38 first halves and 38 second halves).
     Response variable: goals scored in a half.
     Potential independent variables:
       Fixed factors: Home dummy, Half2 dummy, Game # (1-38)
       Random factors: offensive team, defensive team
     Distribution of response: Poisson?
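
As a quick check on the counts used throughout (380 games, 1520 team halves), the following Python sketch builds the double round-robin schedule and emits one record per game, half, and scoring side. The 20 team names and the record layout (`off`, `def`, `home`, `half2`) are hypothetical placeholders, not the author's data file format.

```python
from itertools import permutations

# Hypothetical placeholder names; any 20 distinct labels give the same counts.
TEAMS = [f"Team{i:02d}" for i in range(1, 21)]


def build_halves(teams):
    """Return one record per (game, half, scoring side).

    In a double round robin every ordered pair (home, away) is one game,
    so the schedule is simply all 2-permutations of the teams.
    """
    rows = []
    for home, away in permutations(teams, 2):
        for half2 in (0, 1):
            # Home side on offense, away side on defense.
            rows.append({"off": home, "def": away, "home": 1, "half2": half2})
            # Away side on offense, home side on defense.
            rows.append({"off": away, "def": home, "home": 0, "half2": half2})
    return rows


n_games = sum(1 for _ in permutations(TEAMS, 2))
halves = build_halves(TEAMS)
print(n_games, len(halves))  # 380 1520
```

Each team appears 76 times on offense (38 games times 2 halves) and 76 times on defense, matching the 38 first and 38 second halves noted above.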

  3. Preliminary Summary
     Goals scored (Off) and allowed (Def) by team:
       Team               Off Goals  Def Goals
       Arsenal                   73         26
       Aston Villa               48         44
       Blackburn                 51         59
       Charlton                  51         51
       Everton                   45         57
       Leeds United              40         79
       Liverpool                 55         37
       Manchester United         64         35
       Newcastle                 52         40
       Tottenham                 47         57
       Southampton               44         45
       Wolverhampton             38         77
       Birmingham                43         48
       Bolton                    48         56
       Chelsea                   67         30
       Fulham                    52         46
       Leicester City            48         65
       Manchester City           55         54
       Middlesbrough             44         52
       Portsmouth                47         54
     Goals by half:   Half2 = 0: 461    Half2 = 1: 551
     Goals by venue:  Home = 0: 440     Home = 1: 572
     [Figure: Goals by Game Order. Total goals (y-axis) by game order 1-38 (x-axis); Durbin-Watson statistic DW = 2.03335]

  4. Summary of Previous Slide
     Teams vary extensively on offense and defense:
       Offense (goals scored): min = 38, max = 73, mean = 50.6, SD = 8.85
       Defense (goals allowed): min = 26, max = 79, mean = 50.6, SD = 13.75
     Strong negative correlation between goals scored and goals allowed: r = -0.80 (teams that score more tend to concede less)
     Home teams outscore away teams 1.3:1 (572 vs. 440 goals)
     Second halves outscore first halves 1.2:1 (551 vs. 461 goals)
     No evidence of autocorrelation in total goals scored across game order (rounds 1-38): Durbin-Watson statistic = 2.03
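
The offense/defense summaries above can be reproduced from the slide 3 table. This Python sketch assumes "SD" means the sample standard deviation (n - 1 divisor), which matches the reported 8.85 and 13.75; the correlation is computed directly from centered sums.

```python
import math
import statistics

# (goals scored, goals allowed) per team, transcribed from slide 3.
GOALS = {
    "Arsenal": (73, 26), "Aston Villa": (48, 44), "Blackburn": (51, 59),
    "Charlton": (51, 51), "Everton": (45, 57), "Leeds United": (40, 79),
    "Liverpool": (55, 37), "Manchester United": (64, 35), "Newcastle": (52, 40),
    "Tottenham": (47, 57), "Southampton": (44, 45), "Wolverhampton": (38, 77),
    "Birmingham": (43, 48), "Bolton": (48, 56), "Chelsea": (67, 30),
    "Fulham": (52, 46), "Leicester City": (48, 65), "Manchester City": (55, 54),
    "Middlesbrough": (44, 52), "Portsmouth": (47, 54),
}


def pearson_r(x, y):
    """Pearson correlation coefficient computed from centered sums."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


off = [scored for scored, _ in GOALS.values()]
dfn = [allowed for _, allowed in GOALS.values()]

for label, xs in (("Offense", off), ("Defense", dfn)):
    print(label, min(xs), max(xs), statistics.mean(xs), round(statistics.stdev(xs), 2))
print("r =", round(pearson_r(off, dfn), 2))
print("home/away =", round(572 / 440, 2), "half2/half1 =", round(551 / 461, 2))
```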

  5. Marginal Analysis: No Team Effects
     Goals broken down by Home/Half2 (380 games):
       Half type  Mean    Variance  Observed frequencies of 0, 1, 2, 3, 4, 5, 6+ goals
       Home1      0.6921  0.6886    192  127  48  12  1  0  0
       Road1      0.5211  0.5141    223  124  26   6  1  0  0
       Home2      0.8132  0.9122    175  130  56  10  8  1  0
       Road2      0.6368  0.6277    198  133  41   6  1  1  0
     Expected (Poisson) frequencies of 0, 1, 2, 3+ goals:
       Home1      190.20  131.64  45.55  12.61
       Road1      225.68  117.59  30.64   6.09
       Home2      168.51  137.03  55.71  18.75
       Road2      201.00  128.01  40.76  10.23
     Chi-square contributions:
       Half type  0       1       2       3+      Sum     df  CV(.05)  P-value
       Home1      0.0171  0.1633  0.1314  0.0120  0.3238  2   5.991    0.8505
       Road1      0.0318  0.3493  0.7014  0.1350  1.2175  2   5.991    0.5440
       Home2      0.2497  0.3604  0.0015  0.0034  0.6151  2   5.991    0.7353
       Road2      0.0449  0.1946  0.0014  0.4846  0.7256  2   5.991    0.6957
     Correlation matrix (across the 380 games):
                  Home1    Road1    Home2    Road2
       Home1      1.0000  -0.0445   0.0970   0.1184
       Road1     -0.0445   1.0000   0.1079   0.0460
       Home2      0.0970   0.1079   1.0000  -0.0794
       Road2      0.1184   0.0460  -0.0794   1.0000

  6. Summary of Previous Slide
     Means (variances) for the 4 half types:
       Home/1st half: mean = 0.692, variance = 0.689
       Away/1st half: mean = 0.521, variance = 0.514
       Home/2nd half: mean = 0.813, variance = 0.912
       Away/2nd half: mean = 0.637, variance = 0.628
     Means and variances agree closely, as expected for Poisson data (where the mean equals the variance).
     Chi-square statistics for testing the Poisson fit:
       df = (4 categories - 1) - (1 parameter estimated) = 2
       All P-values exceed 0.50 (0.8505, 0.5440, 0.7353, 0.6957)
     Goals scored are consistent with a Poisson distribution.
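
The Poisson goodness-of-fit test on slides 5 and 6 can be sketched in Python as follows. It collapses the counts into 0, 1, 2, 3+ goals and estimates one parameter (the mean), so df = 2; for df = 2 the chi-square upper tail has the closed form $e^{-x/2}$, which avoids needing a statistics library. The variance uses the n - 1 divisor, which reproduces the slide 5 values; the function name is illustrative.

```python
import math

# Observed counts of halves with 0, 1, ..., 5, 6+ goals (slide 5); 380 halves each.
OBSERVED = {
    "Home1": [192, 127, 48, 12, 1, 0, 0],
    "Road1": [223, 124, 26, 6, 1, 0, 0],
    "Home2": [175, 130, 56, 10, 8, 1, 0],
    "Road2": [198, 133, 41, 6, 1, 1, 0],
}


def poisson_gof(freqs):
    """Mean, variance, expected counts, chi-square, df and p-value for one half type."""
    n = sum(freqs)
    # The "6+" bin is treated as exactly 6 goals; it is empty in every row here.
    mean = sum(k * f for k, f in enumerate(freqs)) / n
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / (n - 1)
    # Collapse to 0, 1, 2, 3+ so that every expected count is reasonably large.
    obs = freqs[:3] + [sum(freqs[3:])]
    probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(3)]
    probs.append(1.0 - sum(probs))
    expected = [n * p for p in probs]
    chisq = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    df = (len(obs) - 1) - 1  # one parameter (the mean) estimated from the data
    p_value = math.exp(-chisq / 2.0)  # exact chi-square upper tail when df == 2
    return mean, var, expected, chisq, df, p_value


for name, freqs in OBSERVED.items():
    mean, var, expected, chisq, df, p = poisson_gof(freqs)
    print(f"{name}: mean={mean:.4f} var={var:.4f} X2={chisq:.4f} df={df} p={p:.4f}")
```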

  7. Observed & Expected Counts
     [Figure: observed vs. expected frequencies of 0, 1, 2, 3+ goals, in four panels: Home/1st Half, Away/1st Half, Home/2nd Half, Away/2nd Half]

  8. Generalized Linear Models
     Dependent variable: goals scored
     Distribution: Poisson
     Link function: log
     Independent variables: Home and Half2 dummy variables
     Models:
       Model 1: $\log E(Y) = \beta_0 + \beta_{Home} Home + \beta_{Half2} Half2$
       Model 2: $\log E(Y) = \beta_0 + \beta_{Home} Home + \beta_{Half2} Half2 + \beta_{HomeHalf2} Home \cdot Half2$
     Models fit using generalized linear model software packages
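
As a minimal, library-free sketch (not the author's SAS run), Model 1 can be fit by Fisher scoring, the iteration behind IRLS, which for the canonical log link coincides with Newton-Raphson. Because Model 1 ignores team identities, the 1520 responses can be rebuilt exactly from the slide 5 frequency tables; the helper names below are illustrative.

```python
import math

# (home, half2) -> number of halves with 0, 1, ..., 5, 6+ goals (slide 5).
OBSERVED = {
    (1, 0): [192, 127, 48, 12, 1, 0, 0],
    (0, 0): [223, 124, 26, 6, 1, 0, 0],
    (1, 1): [175, 130, 56, 10, 8, 1, 0],
    (0, 1): [198, 133, 41, 6, 1, 1, 0],
}


def expand(observed):
    """Rebuild the 1520 (x, y) pairs; each design row is (1, Home, Half2)."""
    X, y = [], []
    for (home, half2), freqs in observed.items():
        for goals, count in enumerate(freqs):
            X.extend([1.0, float(home), float(half2)] for _ in range(count))
            y.extend([goals] * count)
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def score_and_info(X, y, beta):
    """Score X'(y - mu) and Fisher information X' diag(mu) X for the log link."""
    p = len(beta)
    mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
    score = [sum(row[j] * (yi - m) for row, yi, m in zip(X, y, mu)) for j in range(p)]
    info = [[sum(row[j] * row[k] * m for row, m in zip(X, mu)) for k in range(p)]
            for j in range(p)]
    return score, info


def fit_poisson_glm(X, y, tol=1e-10, max_iter=50):
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = score_and_info(X, y, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    _, info = score_and_info(X, y, beta)
    p = len(beta)
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
          for j in range(p)]
    return beta, se


X, y = expand(OBSERVED)
beta, se = fit_poisson_glm(X, y)
for name, b, s in zip(("Intercept", "home", "half2"), beta, se):
    print(f"{name:9s} {b:8.4f} {s:7.4f}")
```

For this two-dummy, no-interaction model the estimates also have a closed form: $\hat\beta_{Home} = \log(572/440)$ and $\hat\beta_{Half2} = \log(551/461)$, the log ratios of the venue and half totals from slide 3.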

  9. Parameter Estimates / Model Fit: Model 1
     Distribution: Poisson    Link function: Log    Dependent variable: goals
     Number of observations read: 1520    Number of observations used: 1520
     Criteria for assessing goodness of fit:
       Criterion            DF    Value       Value/DF
       Deviance             1517  1650.4574   1.0880
       Scaled Deviance      1517  1650.4574   1.0880
       Pearson Chi-Square   1517  1549.2570   1.0213
       Scaled Pearson X2    1517  1549.2570   1.0213
       Log Likelihood             -1411.0226
     Algorithm converged.

  10. Parameter Estimates / Model Fit: Model 1
      Analysis of parameter estimates:
        Parameter  DF  Estimate  Std. Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
        Intercept   1  -0.6397   0.0588      -0.7549   -0.5245           118.48      <.0001
        home        1   0.2624   0.0634       0.1381    0.3866            17.12      <.0001
        half2       1   0.1783   0.0631       0.0546    0.3020             7.98      0.0047
        Scale       0   1.0000   0.0000       1.0000    1.0000
      NOTE: The scale parameter was held fixed.
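
The Wald limits and chi-squares in this table follow directly from each estimate and its standard error. A short Python sketch, assuming the usual normal-theory Wald construction (estimate ± 1.96 SE; chi-square = (estimate/SE)², 1 df):

```python
import math

Z975 = 1.959964  # 97.5th percentile of the standard normal distribution

# (estimate, standard error) for Model 1, transcribed from slide 10.
MODEL1 = {
    "Intercept": (-0.6397, 0.0588),
    "home": (0.2624, 0.0634),
    "half2": (0.1783, 0.0631),
}


def wald_summary(est, se):
    """Wald 95% CI, Wald chi-square (1 df), and its p-value."""
    chisq = (est / se) ** 2
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    return est - Z975 * se, est + Z975 * se, chisq, p_value


for name, (est, se) in MODEL1.items():
    lo, hi, chisq, p = wald_summary(est, se)
    print(f"{name:9s} [{lo:7.4f}, {hi:7.4f}]  X2 = {chisq:6.2f}  p = {p:.4f}")
```

Because the inputs are rounded to four decimals, recomputed chi-squares can differ slightly from the printed output (the intercept comes out near 118.4 rather than 118.48).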

  11. Parameter Estimates / Model Fit: Model 2
      Criteria for assessing goodness of fit:
        Criterion            DF    Value       Value/DF
        Deviance             1516  1650.3613   1.0886
        Scaled Deviance      1516  1650.3613   1.0886
        Pearson Chi-Square   1516  1549.7072   1.0222
        Scaled Pearson X2    1516  1549.7072   1.0222
        Log Likelihood             -1410.9745
      Algorithm converged.

  12. Parameter Estimates / Model Fit: Model 2
      Analysis of parameter estimates:
        Parameter   DF  Estimate  Std. Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
        Intercept    1  -0.6519   0.0711      -0.7912   -0.5126           84.15       <.0001
        home         1   0.2839   0.0941       0.0995    0.4683            9.10       0.0026
        half2        1   0.2007   0.0958       0.0129    0.3885            4.39       0.0363
        home*half2   1  -0.0395   0.1274      -0.2891    0.2101            0.10       0.7566
        Scale        0   1.0000   0.0000       1.0000    1.0000
      NOTE: The scale parameter was held fixed.

  13. Testing for Home/Half2 Interaction
      H0: No Home x Half2 interaction ($\beta_{HomeHalf2} = 0$)
      HA: Home x Half2 interaction ($\beta_{HomeHalf2} \neq 0$)
      Test 1: Wald test
        T.S.: $X^2_{obs} = \left(\frac{\hat\beta_{HomeHalf2}}{SE(\hat\beta_{HomeHalf2})}\right)^2 = \left(\frac{-0.0395}{0.1274}\right)^2 = 0.0961$
        $P = P(\chi^2_1 \geq 0.0961) = 0.7566$
      Test 2: Likelihood ratio test
        T.S.: $X^2_{obs} = (-2\log L(H_0)) - (-2\log L(H_A)) = (-2(-1411.0226)) - (-2(-1410.9745)) = 0.0962$
        $P = P(\chi^2_1 \geq 0.0962) = 0.7564$
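
Both interaction tests can be checked in a few lines of Python. The sketch uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ so that only the standard library is needed; the log-likelihoods are the values printed for Model 1 (H0) and Model 2 (HA).

```python
import math


def chi2_1_upper(x):
    """P(chi-square with 1 df > x), using P = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


# Test 1: Wald statistic from the Model 2 estimate and SE (slide 12).
b_hh, se_hh = -0.0395, 0.1274
wald = (b_hh / se_hh) ** 2

# Test 2: likelihood ratio statistic from the two fits (slides 9 and 11).
loglik_h0, loglik_ha = -1411.0226, -1410.9745  # Model 1 (H0), Model 2 (HA)
lrt = (-2.0 * loglik_h0) - (-2.0 * loglik_ha)

print(f"Wald: {wald:.4f}, p = {chi2_1_upper(wald):.4f}")
print(f"LRT:  {lrt:.4f}, p = {chi2_1_upper(lrt):.4f}")
```

The two tests agree closely here, as expected when the estimate is small relative to its standard error.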

  14. Testing for Main Effects for Home & Half2
      Only Wald tests are reported here (both effects are highly significant); tests are based on Model 1 (no-interaction model).
      Home effect: H0: $\beta_{Home} = 0$ vs. HA: $\beta_{Home} \neq 0$
        T.S.: $X^2_{obs} = \left(\frac{0.2624}{0.0634}\right)^2 = 17.13$, $P = P(\chi^2_1 \geq 17.13) < .0001$
      Half2 effect: H0: $\beta_{Half2} = 0$ vs. HA: $\beta_{Half2} \neq 0$
        T.S.: $X^2_{obs} = \left(\frac{0.1783}{0.0631}\right)^2 = 7.98$, $P = P(\chi^2_1 \geq 7.98) = .0047$

  15. Interpreting the GLM
      Model: $E(Y) = e^{\beta_0 + \beta_{Home} Home + \beta_{Half2} Half2}$
        Away/Half1 (Home = 0, Half2 = 0): $E(Y) = e^{\beta_0}$
        Home/Half1 (Home = 1, Half2 = 0): $E(Y) = e^{\beta_0 + \beta_{Home}}$
        Away/Half2 (Home = 0, Half2 = 1): $E(Y) = e^{\beta_0 + \beta_{Half2}}$
        Home/Half2 (Home = 1, Half2 = 1): $E(Y) = e^{\beta_0 + \beta_{Home} + \beta_{Half2}}$
      Estimated means:
        Away/Half1: $e^{\hat\beta_0} = e^{-0.6397} = 0.5275$
        Home/Half1: $e^{\hat\beta_0 + \hat\beta_{Home}} = e^{-0.6397 + 0.2624} = 0.5275(1.300) = 0.686$
        Away/Half2: $e^{\hat\beta_0 + \hat\beta_{Half2}} = e^{-0.6397 + 0.1783} = 0.5275(1.195) = 0.630$
        Home/Half2: $e^{\hat\beta_0 + \hat\beta_{Home} + \hat\beta_{Half2}} = e^{-0.6397 + 0.2624 + 0.1783} = 0.686(1.195) = 0.820$
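
The four fitted means can be computed directly from the Model 1 estimates. This Python sketch simply exponentiates the linear predictor for each Home/Half2 combination and also prints the multiplicative effects $e^{\hat\beta_{Home}}$ and $e^{\hat\beta_{Half2}}$.

```python
import math

B0, B_HOME, B_HALF2 = -0.6397, 0.2624, 0.1783  # Model 1 estimates (slide 10)


def expected_goals(home, half2):
    """Fitted mean goals per half: exp(b0 + b_home*Home + b_half2*Half2)."""
    return math.exp(B0 + B_HOME * home + B_HALF2 * half2)


CELLS = [("Away/Half1", 0, 0), ("Home/Half1", 1, 0),
         ("Away/Half2", 0, 1), ("Home/Half2", 1, 1)]
for label, home, half2 in CELLS:
    print(f"{label}: {expected_goals(home, half2):.4f}")
# Multiplicative effects on the mean scale.
print(f"home x{math.exp(B_HOME):.3f}, second half x{math.exp(B_HALF2):.3f}")
```

The fitted means sit close to the observed cell means on slide 6 (0.521, 0.692, 0.637, 0.813); Model 1 smooths them slightly because it omits the interaction term.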

  16. Incorporating Random (Team) Effects
      Teams clearly vary in offensive and defensive skill (see slide 3).
      Since many factors feed into team ability (players, coaches, chemistry), we treat team offensive and defensive effects as random.
      There will be 20 random offensive effects (one per team) and 20 random defensive effects.

  17. Random Team Effects
      All effects are on the log scale for goals scored.
      Offense effects: $o_i \sim NID(0, \sigma_o^2)$
      Defense effects: $d_i \sim NID(0, \sigma_d^2)$
      The estimation process assumes $COV(o_i, d_i) = 0$, which seems a stretch given the strong negative correlation between goals scored and goals allowed (slide 4); we can still observe the covariance of the estimated random effects.

  18. Mixed Effects Model
      Fixed effects: intercept $\beta_0$, home effect $\beta_{Home}$, 2nd-half effect $\beta_{Half2}$
      Random effects: offense effect $o_k$ for team k, defense effect $d_l$ for team l
      Conditional model (given the random effects):
        $\log E(Y_{ijkl} \mid o_k, d_l) = \beta_0 + \beta_{Home} Home_i + \beta_{Half2} Half2_j + o_k + d_l$
        $i = 1, 2$ (home/away); $j = 1, 2$ (first/second half); $k = 1, \ldots, 20$; $l = 1, \ldots, 20$ ($k \neq l$)
        $Home_i$ = 1 for the home team's half and 0 for the away team's; $Half2_j$ = 1 for the second half and 0 for the first
      $o_k \sim NID(0, \sigma_o^2)$, $d_l \sim NID(0, \sigma_d^2)$, $COV(o_k, d_l) = 0$
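
To make the conditional model concrete, the following Python sketch simulates one season from it: draw independent normal offense and defense effects per team, form the linear predictor for every team half, and draw a Poisson count. The team labels and seed are hypothetical, the parameter values are the Model 5 estimates reported later in the slides, and the Poisson sampler is Knuth's multiplication method (adequate for small means), since the standard library has no Poisson generator.

```python
import math
import random
from itertools import permutations


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_season(teams, beta0, b_home, b_half2, var_off, var_def, seed=0):
    """Simulate goals per team half from the conditional Poisson GLMM."""
    rng = random.Random(seed)
    # Independent normal team effects on the log scale (COV(o, d) = 0 as assumed).
    off = {t: rng.gauss(0.0, math.sqrt(var_off)) for t in teams}
    dfn = {t: rng.gauss(0.0, math.sqrt(var_def)) for t in teams}
    rows = []
    for home_team, away_team in permutations(teams, 2):
        for half2 in (0, 1):
            for scorer, opponent, home in ((home_team, away_team, 1),
                                           (away_team, home_team, 0)):
                eta = beta0 + b_home * home + b_half2 * half2 + off[scorer] + dfn[opponent]
                goals = poisson_draw(math.exp(eta), rng)
                rows.append((scorer, opponent, home, half2, goals))
    return rows


# Model 5 estimates from slide 22; team labels are placeholders.
teams = [f"T{i:02d}" for i in range(1, 21)]
rows = simulate_season(teams, -0.6605, 0.2624, 0.1783, 0.0084, 0.0549, seed=2003)
print(len(rows), sum(r[4] for r in rows))
```

Simulating from fitted values like this is one way to judge whether the variance components are large enough to reproduce the spread of team totals seen on slide 3.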

  19. Model in Matrix Notation: Example
      $Y = \mu + e$, $\quad g(\mu) = \log(\mu) = X\beta + Z\gamma = X\beta + Z_O o + Z_D d$, with $Z = [Z_O \; Z_D]$ and $\gamma = (o', d')'$
      League has 3 teams: A, B, C
      Order of entry of games: A@B, A@C, B@C, B@A, C@A, C@B (A@B: A visits B)
      Order of entry of scores within a game: Home/1st, Away/1st, Home/2nd, Away/2nd
      3 offense effects, 3 defense effects, 24 observations
      $\beta = (\beta_0, \beta_{Home}, \beta_{Half2})'$, $o = (o_A, o_B, o_C)'$, $d = (d_A, d_B, d_C)'$

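  20. Model Based on 3 Teams
      $g(\mu) = X\beta + Z_O o + Z_D d$, with rows of X, $Z_O$ and $Z_D$:
        Row  Game  Score     X (Int Home Half2)  Z_O (A B C)  Z_D (A B C)
         1   A@B   Home/1st  1 1 0               0 1 0        1 0 0
         2   A@B   Away/1st  1 0 0               1 0 0        0 1 0
         3   A@B   Home/2nd  1 1 1               0 1 0        1 0 0
         4   A@B   Away/2nd  1 0 1               1 0 0        0 1 0
         5   A@C   Home/1st  1 1 0               0 0 1        1 0 0
         6   A@C   Away/1st  1 0 0               1 0 0        0 0 1
         7   A@C   Home/2nd  1 1 1               0 0 1        1 0 0
         8   A@C   Away/2nd  1 0 1               1 0 0        0 0 1
         9   B@C   Home/1st  1 1 0               0 0 1        0 1 0
        10   B@C   Away/1st  1 0 0               0 1 0        0 0 1
        11   B@C   Home/2nd  1 1 1               0 0 1        0 1 0
        12   B@C   Away/2nd  1 0 1               0 1 0        0 0 1
        13   B@A   Home/1st  1 1 0               1 0 0        0 1 0
        14   B@A   Away/1st  1 0 0               0 1 0        1 0 0
        15   B@A   Home/2nd  1 1 1               1 0 0        0 1 0
        16   B@A   Away/2nd  1 0 1               0 1 0        1 0 0
        17   C@A   Home/1st  1 1 0               1 0 0        0 0 1
        18   C@A   Away/1st  1 0 0               0 0 1        1 0 0
        19   C@A   Home/2nd  1 1 1               1 0 0        0 0 1
        20   C@A   Away/2nd  1 0 1               0 0 1        1 0 0
        21   C@B   Home/1st  1 1 0               0 1 0        0 0 1
        22   C@B   Away/1st  1 0 0               0 0 1        0 1 0
        23   C@B   Home/2nd  1 1 1               0 1 0        0 0 1
        24   C@B   Away/2nd  1 0 1               0 0 1        0 1 0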
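
The design matrices for the 3-team example follow mechanically from the game list and the within-game entry order. This Python sketch reads "A@B" as "A visits B" (so B is the home team) and builds X, $Z_O$ and $Z_D$ as lists of rows; the helper names are illustrative.

```python
TEAMS = ["A", "B", "C"]
# Slide order; "A@B" is read as "A visits B", so B is the home team.
GAMES = ["A@B", "A@C", "B@C", "B@A", "C@A", "C@B"]


def one_hot(team):
    """Indicator row selecting one team's effect."""
    return [1 if t == team else 0 for t in TEAMS]


def design_matrices(games):
    """Build X (intercept, Home, Half2), Z_O (offense) and Z_D (defense)."""
    X, Z_O, Z_D = [], [], []
    for game in games:
        away, home = game.split("@")
        # Entry order within a game: Home/1st, Away/1st, Home/2nd, Away/2nd.
        for half2 in (0, 1):
            for is_home in (1, 0):
                scorer, opponent = (home, away) if is_home else (away, home)
                X.append([1, is_home, half2])
                Z_O.append(one_hot(scorer))
                Z_D.append(one_hot(opponent))
    return X, Z_O, Z_D


X, Z_O, Z_D = design_matrices(GAMES)
for i, (x, zo, zd) in enumerate(zip(X, Z_O, Z_D), start=1):
    print(f"{i:2d} {GAMES[(i - 1) // 4]}  X={x}  Z_O={zo}  Z_D={zd}")
```

Each column of $Z_O$ and $Z_D$ contains eight ones: every team plays four games and contributes one offensive and one defensive score per half.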

  21. Sequence of Potential Models
      1. No fixed or random effects (common mean)
      2. Fixed home and second-half effects, no random effects
      3. Fixed home and second-half effects, random offense team effects
      4. Fixed home and second-half effects, random defense team effects
      5. Fixed home and second-half effects, random offense and defense team effects

  22. Results: Estimates (P-Values)
        Model  Intercept       Home           Half2          $\sigma_o^2$    $\sigma_d^2$    Residual var.  -2lnL   AIC     BIC
        1      -.407 (.0001)   N/A            N/A            N/A             N/A             1.044          5001.9  5003.9  5009.3
        2      -.6397 (.0001)  .2624 (.0001)  .1783 (.0052)  N/A             N/A             1.0213         4992.3  4994.3  4999.6
        3      -.6413 (.0001)  .2624 (.0001)  .1783 (.0050)  .01004 (.143*)  N/A             1.0099         4985.6  4989.6  4991.6
        4      -.6592 (.0001)  .2624 (.0001)  .1783 (.0040)  N/A             .0588 (.012*)   0.9630         4958.6  4962.6  4964.6
        5      -.6605 (.0001)  .2624 (.0001)  .1783 (.0039)  .0084 (.162*)   .0549 (.012*)   0.9531         4951.9  4957.9  4960.9
      * Based on Z-test, not preferred.
      Likelihood ratio test (Model 4 vs. Model 5): H0: $\sigma_o^2 = 0$ vs. HA: $\sigma_o^2 > 0$
        TS = 4958.6 - 4951.9 = 6.7, $P = 0.5\,P(\chi^2_1 \geq 6.7) = .005$
      Based on AIC and BIC, the model with both offense and defense effects (Model 5) is best.
      No interaction was found between team effects and home or half2.
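
The likelihood ratio test above halves the usual chi-square tail because the null value $\sigma_o^2 = 0$ lies on the boundary of the parameter space. The Python sketch below reproduces that calculation and the AIC ranking from the table. Since the cited method (Wolfinger and O'Connell, 1993) is a pseudo-likelihood approach, these -2lnL comparisons are best read as approximate.

```python
import math

# (-2 log-likelihood, AIC) per model, transcribed from slide 22.
FIT = {
    1: (5001.9, 5003.9),
    2: (4992.3, 4994.3),
    3: (4985.6, 4989.6),
    4: (4958.6, 4962.6),
    5: (4951.9, 4957.9),
}


def boundary_lrt(m2ll_reduced, m2ll_full):
    """LRT for a single variance component tested at the boundary sigma^2 = 0.

    Under H0 the statistic is a 50:50 mixture of a point mass at 0 and a
    chi-square(1), so the p-value is half the chi-square(1) upper tail,
    where P(chi-square(1) > x) = erfc(sqrt(x / 2)).
    """
    ts = m2ll_reduced - m2ll_full
    return ts, 0.5 * math.erfc(math.sqrt(ts / 2.0))


# H0: sigma_o^2 = 0, i.e. Model 4 (defense only) nested in Model 5 (both).
ts, p = boundary_lrt(FIT[4][0], FIT[5][0])
best_by_aic = min(FIT, key=lambda m: FIT[m][1])
print(f"TS = {ts:.1f}, p = {p:.4f}, lowest AIC: Model {best_by_aic}")
```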

  23. Goodness of Fit
      We test whether the Poisson GLMM is an appropriate model by means of the deviance:
        H0: Model fits    HA: Model lacks fit
        Deviance = 1570.7, DF = N - #fixed parameters = 1520 - 3 = 1517
        P-value = $P(\chi^2_{1517} \geq 1570.7) = 0.1646$
      No evidence of lack of fit.*
      * If we use the scaled deviance, we do reject, where scaled deviance = 1570.7/0.9531 = 1647.9.
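
With 1517 degrees of freedom the chi-square tail can be approximated accurately without special functions. This Python sketch uses the Wilson-Hilferty cube-root normal approximation (an assumption of convenience, not necessarily what the original software did) for both the deviance and the scaled deviance.

```python
import math


def chi2_upper_wh(x, df):
    """Approximate P(chi-square(df) > x) via the Wilson-Hilferty cube-root transform.

    (X / df) ** (1/3) is roughly normal with mean 1 - 2/(9 df) and
    variance 2/(9 df); the approximation is very accurate for large df.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


deviance, df, resid_var = 1570.7, 1520 - 3, 0.9531  # slides 22 and 23
p_deviance = chi2_upper_wh(deviance, df)
p_scaled = chi2_upper_wh(deviance / resid_var, df)
print(f"deviance: p = {p_deviance:.4f}; scaled deviance: p = {p_scaled:.4f}")
```

The unscaled p-value matches the 0.1646 above, while dividing by the residual variance pushes the statistic far enough into the tail (p near 0.01) to reject at the 0.05 level, as the footnote states.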

  24. Best Linear Unbiased Predictors (BLUPs)
      Estimated team (random) effects (teams with high defense values allow more goals):
        Team               Off Effect  Def Effect
        Arsenal              0.1284     -0.4016
        Aston Villa         -0.0170     -0.0873
        Birmingham          -0.0469     -0.0262
        Blackburn            0.0049      0.1333
        Bolton              -0.0142      0.0914
        Charlton             0.0030      0.0205
        Chelsea              0.0941     -0.3255
        Everton             -0.0325      0.1046
        Fulham               0.0079     -0.0549
        Leeds United        -0.0582      0.3758
        Leicester City      -0.0120      0.2112
        Liverpool            0.0240     -0.2018
        Manchester City      0.0281      0.0649
        Manchester United    0.0775     -0.2348
        Middlesbrough       -0.0398      0.0335
        Newcastle            0.0065     -0.1516
        Portsmouth          -0.0208      0.0630
        Southampton         -0.0414     -0.0724
        Tottenham           -0.0201      0.1050
        Wolverhampton       -0.0712      0.3529
      Estimated fixed effects:
        Parameter  Estimate
        Intercept  -0.6605
        Home        0.2624
        Half2       0.1783
      For each half $ijkl$, compute $\exp\{-0.6605 + 0.2624\,Home_i + 0.1783\,Half2_j + \hat o_k + \hat d_l\}$ as the BLUP.
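
The prediction formula above can be applied to any matchup. This Python sketch uses a hypothetical subset of the slide 24 effects and computes both sides' expected goals for Arsenal at home to Leeds United in a second half (an illustrative matchup, not one singled out in the slides).

```python
import math

FIXED = {"intercept": -0.6605, "home": 0.2624, "half2": 0.1783}  # slide 24
# (offense BLUP, defense BLUP) on the log scale for a subset of teams (slide 24).
TEAM_EFFECTS = {
    "Arsenal": (0.1284, -0.4016),
    "Chelsea": (0.0941, -0.3255),
    "Leeds United": (-0.0582, 0.3758),
    "Wolverhampton": (-0.0712, 0.3529),
}


def predicted_goals(offense, defense, home, half2):
    """exp{b0 + b_home*Home + b_half2*Half2 + o_k + d_l} for one team half."""
    o_k = TEAM_EFFECTS[offense][0]
    d_l = TEAM_EFFECTS[defense][1]
    eta = FIXED["intercept"] + FIXED["home"] * home + FIXED["half2"] * half2 + o_k + d_l
    return math.exp(eta)


# Arsenal at home to Leeds United, second half: each side's expected goals.
arsenal = predicted_goals("Arsenal", "Leeds United", home=1, half2=1)
leeds = predicted_goals("Leeds United", "Arsenal", home=0, half2=1)
print(round(arsenal, 3), round(leeds, 3))
```

Note that the offense effect comes from the scoring team and the defense effect from its opponent, so the two sides of one half use different pairs of BLUPs.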

  25. Comparison of BLUPs with Actual Scores
      For each team half, we have the actual score and the BLUP.
      Correlation between actual and BLUP = 0.2655
      Concordant pairs of halves (one half scores higher than the other on both actual and BLUP) = 452471
      Discordant pairs of halves = 355617
      Gamma = (452471 - 355617)/(452471 + 355617) = 0.1199
      Evidence of some positive association between actual and predicted scores
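
Goodman-Kruskal gamma counts concordant and discordant pairs while skipping pairs tied on either variable, which matters here because many halves share the same actual score. A brute-force Python sketch (O(n²), fine for 1520 halves), checked on a toy example and then applied to the counts reported above:

```python
def goodman_kruskal_gamma(x, y):
    """Return (gamma, concordant, discordant); pairs tied on x or y are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant), concordant, discordant


# Toy check: the first observation is concordant with both others;
# the last two observations are discordant with each other.
print(goodman_kruskal_gamma([1, 2, 3], [1, 3, 2]))
# The slide's counts plugged into the same formula.
C, D = 452471, 355617
print(round((C - D) / (C + D), 4))
```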

  26. "Distribution" of BLUPs by Actual Goals Scored
      [Figure: normal density curves of the BLUP for halves with 0, 1, 2, and 3+ actual goals; x-axis: BLUP (0 to 1.4), y-axis: normal density]
      Sources:
        Data: SoccerPunter.com
        Methods:
          Littell, Milliken, Stroup, and Wolfinger (1996). SAS System for Mixed Models.
          Wolfinger, R. and M. O'Connell (1993). Generalized Linear Mixed Models: A Pseudo-Likelihood Approach. J. Statist. Comput. Simul., Vol. 48, pp. 233-243.

  27. SAS Code
      data one;
        infile 'engl2003d.dat';
        input hteam $ 1-20 rteam $ 21-40 goals 47-48 half2 56 home 64 round 71-73;
        /* home=1: the goals on this record were scored by the home team */
        if home=1 then do; offteam=hteam; defteam=rteam; end;
        else do; offteam=rteam; defteam=hteam; end;
      run;
      %include 'glmm800.sas';  /* defines the GLIMMIX macro */
      %glimmix(data=one, procopt=method=reml,
        stmts=%str(
          class offteam defteam;
          model goals = home half2 /s;
          random offteam defteam /s;
        ),
        error=poisson, link=log);
      run;
