Ridge Regression for Carbon Emissions and Population Characteristics in China

Explore the impacts of population change on carbon emissions in China from 1978-2008 using ridge regression. The study analyzes variables such as population, urbanization rate, working-age population percentage, household size, and per capita expenditures. Ridge regression is employed to address high correlation among independent variables and produce more accurate estimates. Eigenvalues, VIF, and Ridge Estimator are discussed in this informative analysis.

  • Ridge Regression
  • China
  • Carbon Emissions
  • Population Characteristics
  • Regression Analysis

Presentation Transcript


  1. Ridge Regression: Population Characteristics and Carbon Emissions in China (1978-2008)

     Source: Q. Zhu and X. Peng (2012). The Impacts of Population Change on Carbon Emissions in China During 1978-2008, Environmental Impact Assessment Review, Vol. 36, pp. 1-8

  2. Data Description/Model

     Data years: 1978-2008 (n = 31 years)

     Dependent variable:
       • Carbon Emissions, C (million tons)

     Independent variables:
       • Population, P (10,000s)
       • Urbanization Rate, U (%)
       • Percentage of Population of Working Age, W (%)
       • Average Household Size, H (persons/household)
       • Per Capita Expenditures, E (adjusted to Year = 2000)

     Model:

     $$\ln C_t = \gamma_0 + \gamma_P \ln P_t + \gamma_U \ln U_t + \gamma_W \ln W_t + \gamma_H \ln H_t + \gamma_E \ln E_t + \varepsilon_t$$

     Short-hand notation:

     $$Y_t = \gamma_0 + \gamma_1 X_{t1} + \gamma_2 X_{t2} + \gamma_3 X_{t3} + \gamma_4 X_{t4} + \gamma_5 X_{t5} + \varepsilon_t, \qquad t = 1, \ldots, 31$$

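  3. Correlation Transformation

     $$Y_t^* = \frac{1}{\sqrt{n-1}} \left( \frac{Y_t - \bar{Y}}{s_Y} \right), \qquad s_Y = \sqrt{\frac{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}{n-1}}$$

     $$X_{tj}^* = \frac{1}{\sqrt{n-1}} \left( \frac{X_{tj} - \bar{X}_j}{s_j} \right), \qquad s_j = \sqrt{\frac{\sum_{t=1}^{n} (X_{tj} - \bar{X}_j)^2}{n-1}}, \qquad j = 1, \ldots, 5$$

     $$\mathbf{X}^* = \begin{bmatrix} X_{11}^* & X_{12}^* & \cdots & X_{15}^* \\ X_{21}^* & X_{22}^* & \cdots & X_{25}^* \\ \vdots & \vdots & & \vdots \\ X_{31,1}^* & X_{31,2}^* & \cdots & X_{31,5}^* \end{bmatrix}, \qquad \mathbf{Y}^* = \begin{bmatrix} Y_1^* \\ Y_2^* \\ \vdots \\ Y_{31}^* \end{bmatrix}$$

     $$\mathbf{X}^{*\prime}\mathbf{X}^* = \mathbf{R}_{XX} = \begin{bmatrix} 1 & r_{12} & \cdots & r_{15} \\ r_{21} & 1 & \cdots & r_{25} \\ \vdots & \vdots & & \vdots \\ r_{51} & r_{52} & \cdots & 1 \end{bmatrix}, \qquad \mathbf{X}^{*\prime}\mathbf{Y}^* = \mathbf{R}_{XY} = \begin{bmatrix} r_{1Y} \\ r_{2Y} \\ \vdots \\ r_{5Y} \end{bmatrix}$$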
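The transformation above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the China series; the function name `correlation_transform` and the simulated predictors are assumptions made for the example.

```python
import numpy as np

def correlation_transform(X, y):
    """Center each column, divide by its sample SD, and scale by 1/sqrt(n - 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    scale = np.sqrt(n - 1)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * scale)
    ys = (y - y.mean()) / (y.std(ddof=1) * scale)
    return Xs, ys

# Synthetic data (hypothetical): 31 rows, 3 strongly correlated predictors.
rng = np.random.default_rng(0)
base = rng.normal(size=31)
X = np.column_stack([base + 0.1 * rng.normal(size=31) for _ in range(3)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=31)

Xs, ys = correlation_transform(X, y)
R_XX = Xs.T @ Xs   # predictor correlation matrix
R_XY = Xs.T @ ys   # correlations between each predictor and the response
```

After the transformation, the cross-product matrices are correlation matrices, so no separate call to a correlation routine is needed.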

  4. Data

     $$\mathbf{R}_{XX} = \begin{bmatrix} 1.0000 & 0.9753 & 0.9712 & -0.9873 & 0.9847 \\ 0.9753 & 1.0000 & 0.9801 & -0.9907 & 0.9952 \\ 0.9712 & 0.9801 & 1.0000 & -0.9802 & 0.9773 \\ -0.9873 & -0.9907 & -0.9802 & 1.0000 & -0.9942 \\ 0.9847 & 0.9952 & 0.9773 & -0.9942 & 1.0000 \end{bmatrix}, \qquad \mathbf{R}_{XY} = \begin{bmatrix} 0.952534 \\ 0.974802 \\ 0.960147 \\ -0.97861 \\ 0.982587 \end{bmatrix}$$

     Note that the X-variables are very highly correlated, so $\mathbf{R}_{XX}$ is nearly singular. This causes problems when it is inverted and used to obtain the least squares estimate of $\boldsymbol{\beta}$ and its variance-covariance matrix.

     Eigenvalues of $\mathbf{X}^{*\prime}\mathbf{X}^*$:

     Number       1        2        3        4        5
     Eigenvalue   4.9346   0.0311   0.0249   0.0064   0.0030
     Percent      98.692   0.622    0.498    0.129    0.059
     Cumulative   98.692   99.314   99.812   99.941   100

     $$\mathbf{R}_{XX}^{-1} = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1} = \begin{bmatrix} 50.62 & 36.87 & -9.85 & 32.98 & -44.13 \\ 36.87 & 147.72 & -27.81 & 21.50 & -134.78 \\ -9.85 & -27.81 & 31.40 & 13.30 & 19.91 \\ 32.98 & 21.50 & 13.30 & 122.38 & 54.80 \\ -44.13 & -134.78 & 19.91 & 54.80 & 213.60 \end{bmatrix}$$

     VIF1 = 50.62, VIF2 = 147.72, VIF3 = 31.40, VIF4 = 122.38, VIF5 = 213.60
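As a rough check on the slide, the eigenvalues and VIFs can be recomputed from the printed correlation matrix. The sketch below assumes NumPy and uses the matrix rounded to four decimals, so the recomputed VIFs will only approximately match the printed ones: a nearly singular matrix is sensitive to rounding.

```python
import numpy as np

# Predictor correlation matrix R_XX as printed on the slide (4 decimals).
R_XX = np.array([
    [ 1.0000,  0.9753,  0.9712, -0.9873,  0.9847],
    [ 0.9753,  1.0000,  0.9801, -0.9907,  0.9952],
    [ 0.9712,  0.9801,  1.0000, -0.9802,  0.9773],
    [-0.9873, -0.9907, -0.9802,  1.0000, -0.9942],
    [ 0.9847,  0.9952,  0.9773, -0.9942,  1.0000],
])

# Eigenvalues of a symmetric matrix, sorted largest first.
eigvals = np.linalg.eigvalsh(R_XX)[::-1]
percent = 100 * eigvals / eigvals.sum()

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(R_XX))

print(np.round(eigvals, 4))
print(np.round(percent, 3))
print(np.round(vif, 1))
```

One dominant eigenvalue with four near zero is the signature of the collinearity that motivates ridge regression here.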

  5. Ridge Regression

     • Method of producing a biased estimator of $\boldsymbol{\beta}$ that has a smaller Mean Square Error than OLS
     • Mean Square Error of Estimator = Variance + Bias$^2$
     • The ridge estimator trades off a small amount of bias for a large reduction in variance when the predictor variables are highly correlated
     • Problem: Choosing the shrinkage parameter c
     • We will work with the standardized regression model based on the correlation-transformed variables, then back-transform the regression coefficients to the original scale

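  6. Ridge Estimator (Standardized X, Y)

     OLS:

     $$\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^*, \qquad V\{\hat{\boldsymbol{\beta}}_{OLS}\} = \sigma^2 (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}$$

     Ridge ($c \ge 0$):

     $$\hat{\boldsymbol{\beta}}_{R} = (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^*, \qquad V\{\hat{\boldsymbol{\beta}}_{R}\} = \sigma^2 (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\,\mathbf{X}^{*\prime}\mathbf{X}^*\,(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}$$

     Relation to OLS:

     $$\hat{\boldsymbol{\beta}}_{R} = (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^* = \frac{1}{c}\left(\frac{1}{c}\mathbf{I} + (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\right)^{-1}(\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^* = \left(\mathbf{I} + c(\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\right)^{-1}\hat{\boldsymbol{\beta}}_{OLS}$$

     Note: $(\mathbf{A}^{-1} + \mathbf{B}^{-1})^{-1} = \mathbf{A}(\mathbf{A}+\mathbf{B})^{-1}\mathbf{B} = \mathbf{B}(\mathbf{A}+\mathbf{B})^{-1}\mathbf{A}$

     Total variance of the coefficient estimates:

     $$\sum_{j=1}^{p} V\{\hat{\beta}_{OLS,j}\} = \sigma^2 \sum_{j=1}^{p} \frac{1}{\lambda_j}, \qquad \sum_{j=1}^{p} V\{\hat{\beta}_{R,j}\} = \sigma^2 \sum_{j=1}^{p} \frac{\lambda_j}{(\lambda_j + c)^2}$$

     where $\lambda_j$ is the $j$th eigenvalue of $\mathbf{X}^{*\prime}\mathbf{X}^*$.

     Note the unconventional notation of $\boldsymbol{\beta}$ as the standardized regression coefficient vector.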
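The estimator and the two variance sums can be illustrated numerically. The following is a minimal sketch on synthetic collinear data (not the China series); `ridge_standardized` is a hypothetical helper name, and the shrinkage form is evaluated with an explicit inverse purely for illustration.

```python
import numpy as np

def ridge_standardized(Xs, ys, c):
    """Ridge coefficients for correlation-transformed data: (X*'X* + cI)^{-1} X*'Y*."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + c * np.eye(p), Xs.T @ ys)

# Synthetic, highly collinear data (hypothetical).
rng = np.random.default_rng(1)
n, p, c = 31, 4, 0.2
base = rng.normal(size=n)
X = np.column_stack([base + 0.05 * rng.normal(size=n) for _ in range(p)])
y = X.sum(axis=1) + rng.normal(size=n)

# Correlation transformation.
Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))

R = Xs.T @ Xs
beta_ols = ridge_standardized(Xs, ys, 0.0)
beta_ridge = ridge_standardized(Xs, ys, c)

# Same estimator written as a shrinkage of OLS: (I + c R^{-1})^{-1} beta_OLS.
beta_shrunk = np.linalg.solve(np.eye(p) + c * np.linalg.inv(R), beta_ols)

# Sum of coefficient variances in units of sigma^2, via matrices and via eigenvalues.
lam = np.linalg.eigvalsh(R)
A = np.linalg.inv(R + c * np.eye(p))
total_var_ridge = np.trace(A @ R @ A)        # matches sum(lam / (lam + c)**2)
total_var_ols = np.trace(np.linalg.inv(R))   # matches sum(1 / lam)
```

Because $\lambda_j/(\lambda_j + c)^2 < 1/\lambda_j$ for any $c > 0$, the ridge total variance is always below the OLS total; the gap is largest when some eigenvalues are near zero.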

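  7. China Carbon Emissions Data (c = 0.20)

     OLS (c = 0):

     i              1        2         3         4          5          Sum
     λ_i            4.9346   0.0311    0.0249    0.0064     0.0030
     1/λ_i          0.2027   32.1543   40.1606   156.2500   333.3333   562.1010

     $$(\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1} = \begin{bmatrix} 50.617 & 36.874 & -9.848 & 32.980 & -44.128 \\ 36.874 & 147.722 & -27.808 & 21.499 & -134.777 \\ -9.848 & -27.808 & 31.397 & 13.302 & 19.914 \\ 32.980 & 21.499 & 13.302 & 122.378 & 54.796 \\ -44.128 & -134.777 & 19.914 & 54.796 & 213.603 \end{bmatrix}, \qquad \hat{\boldsymbol{\beta}}_{OLS} = \begin{bmatrix} -0.931 \\ -1.045 \\ 0.207 \\ -0.775 \\ 1.967 \end{bmatrix}$$

     Ridge (c = 0.20):

     i                    1        2        3        4        5        Sum
     λ_i                  4.9346   0.0311   0.0249   0.0064   0.0030
     λ_i/(λ_i+0.20)^2     0.1872   0.5823   0.4923   0.1502   0.0728   1.4848

     $$(\mathbf{X}^{*\prime}\mathbf{X}^* + 0.2\mathbf{I})^{-1} = \begin{bmatrix} 3.617 & -0.746 & -0.798 & 0.957 & -0.906 \\ -0.746 & 3.803 & -0.888 & 0.936 & -1.042 \\ -0.798 & -0.888 & 3.540 & 0.848 & -0.788 \\ 0.957 & 0.936 & 0.848 & 3.891 & 0.971 \\ -0.906 & -1.042 & -0.788 & 0.971 & 3.888 \end{bmatrix}, \qquad \hat{\boldsymbol{\beta}}_{R} = \begin{bmatrix} 0.124 \\ 0.203 \\ 0.168 \\ -0.215 \\ 0.234 \end{bmatrix}$$

     Note the difference:

     $$\frac{1}{\sigma^2}\sum_{j=1}^{5} V\{\hat{\beta}_{OLS,j}\} = 562.1010 \qquad \text{versus} \qquad \frac{1}{\sigma^2}\sum_{j=1}^{5} V\{\hat{\beta}_{R,j}\} = 1.4848$$

     The estimated regression coefficients change by large amounts, and the signs flip for Population and Urbanization Rate.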
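The ridge coefficients and the two variance sums on this slide can be approximately reproduced from the printed correlations and eigenvalues. This sketch assumes NumPy and works from the rounded slide values, so agreement is to about two or three decimals rather than exact.

```python
import numpy as np

# Correlation matrices as printed on the data slide (rounded values).
R_XX = np.array([
    [ 1.0000,  0.9753,  0.9712, -0.9873,  0.9847],
    [ 0.9753,  1.0000,  0.9801, -0.9907,  0.9952],
    [ 0.9712,  0.9801,  1.0000, -0.9802,  0.9773],
    [-0.9873, -0.9907, -0.9802,  1.0000, -0.9942],
    [ 0.9847,  0.9952,  0.9773, -0.9942,  1.0000],
])
R_XY = np.array([0.952534, 0.974802, 0.960147, -0.97861, 0.982587])
c = 0.20

# Standardized ridge coefficients: (R_XX + cI)^{-1} R_XY.
beta_ridge = np.linalg.solve(R_XX + c * np.eye(5), R_XY)

# Variance sums (in units of sigma^2) from the printed eigenvalues.
lam = np.array([4.9346, 0.0311, 0.0249, 0.0064, 0.0030])
sum_var_ols = np.sum(1 / lam)
sum_var_ridge = np.sum(lam / (lam + c) ** 2)

print(np.round(beta_ridge, 3))
print(round(sum_var_ols, 4), round(sum_var_ridge, 4))
```

The OLS coefficients are deliberately not recomputed here: inverting the rounded, nearly singular matrix would amplify the rounding error, which is the same instability the slide is pointing out.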

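  8. Back-Transforming Coefficients to Original Scale

     Letting $\gamma_j$ represent the coefficient in the original (log, in this example) scale:

     $$\gamma_j = \left(\frac{s_Y}{s_j}\right)\beta_j, \quad j = 1, \ldots, p, \qquad \gamma_0 = \bar{Y} - \gamma_1 \bar{X}_1 - \cdots - \gamma_p \bar{X}_p$$

     $$\hat{\gamma}_{R,j} = \left(\frac{s_Y}{s_j}\right)\hat{\beta}_{R,j}, \quad j = 1, \ldots, p, \qquad \hat{\gamma}_{R,0} = \bar{Y} - \hat{\gamma}_{R,1} \bar{X}_1 - \cdots - \hat{\gamma}_{R,p} \bar{X}_p$$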
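A minimal sketch of the back-transformation, assuming NumPy and synthetic data. The helper name `back_transform` is hypothetical. Setting c = 0 gives a convenient sanity check, because the back-transformed coefficients should then agree with an ordinary least squares fit on the original scale.

```python
import numpy as np

def back_transform(beta_std, X, y):
    """Map standardized coefficients to slopes and intercept on the scale of X and y."""
    s_x = X.std(axis=0, ddof=1)
    s_y = y.std(ddof=1)
    slopes = beta_std * s_y / s_x
    intercept = y.mean() - X.mean(axis=0) @ slopes
    return intercept, slopes

# Synthetic data (hypothetical), with predictors on different scales.
rng = np.random.default_rng(2)
n = 31
X = rng.normal(loc=5.0, scale=[1.0, 2.0, 0.5], size=(n, 3))
y = 1.5 + X @ np.array([0.5, -0.3, 2.0]) + 0.1 * rng.normal(size=n)

# Correlation transformation.
Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))

c = 0.0  # with c = 0 the result should match ordinary least squares
beta_std = np.linalg.solve(Xs.T @ Xs + c * np.eye(3), Xs.T @ ys)
intercept, slopes = back_transform(beta_std, X, y)
```

For c > 0 the same function applies unchanged; only `beta_std` differs.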

  9. Choosing the Shrinkage Parameter, c

     • Ridge Trace: Plot of the standardized ridge regression coefficients versus c; observe where they flatten out
     • $C_c$ Statistic: Similar to $C_p$ used in regression model selection
     • PRESS Statistic extended to Ridge Regression: Cross-validation sum of squares for left-out residuals
     • Generalized Cross-Validation: Similar to PRESS, based on prediction
     • Plot of VIFs versus c; observe where they all fall below 10

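  10. $C_c$ Statistic

      $$\mathbf{H}_c = \mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}$$

      $$\mathrm{tr}(\mathbf{H}_c) = \mathrm{tr}\left(\mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}\right) = \mathrm{tr}\left(\mathbf{X}^{*\prime}\mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\right) = \sum_{j=1}^{p} \frac{\lambda_j}{\lambda_j + c}$$

      $$C_c = \frac{SSE_c}{s^2} - n + 2 + 2\,\mathrm{tr}(\mathbf{H}_c) = \frac{SSE_c}{s^2} - n + 2\left(1 + \mathrm{tr}(\mathbf{H}_c)\right)$$

      where:

      $$SSE_c = \left(\mathbf{Y}^* - \mathbf{X}^*\hat{\boldsymbol{\beta}}_R(c)\right)'\left(\mathbf{Y}^* - \mathbf{X}^*\hat{\boldsymbol{\beta}}_R(c)\right), \qquad s^2 = MSE_{OLS} = \frac{\left(\mathbf{Y}^* - \mathbf{X}^*\hat{\boldsymbol{\beta}}_{OLS}\right)'\left(\mathbf{Y}^* - \mathbf{X}^*\hat{\boldsymbol{\beta}}_{OLS}\right)}{n - p - 1}$$

      $1 + \mathrm{tr}(\mathbf{H}_c)$ is the effective number of parameters, replacing $p'$ in $C_p$:

      $$C_p = \frac{SSE_p(\text{Model})}{s^2} + 2p' - n$$

      where $p$ = number of predictors in the current model and $p' = p + 1$.

      Goal: Choose c to minimize $C_c$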
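The statistic is straightforward to evaluate over a grid of c values. The sketch below assumes NumPy, synthetic collinear data, and a hypothetical helper `cc_statistic`; it computes the trace from the eigenvalues of X*'X* rather than forming the hat matrix.

```python
import numpy as np

def cc_statistic(Xs, ys, c):
    """C_c = SSE_c / s^2 - n + 2 (1 + tr(H_c)), with s^2 the OLS mean squared error."""
    n, p = Xs.shape
    R = Xs.T @ Xs
    beta_c = np.linalg.solve(R + c * np.eye(p), Xs.T @ ys)
    beta_ols = np.linalg.solve(R, Xs.T @ ys)
    sse_c = np.sum((ys - Xs @ beta_c) ** 2)
    s2 = np.sum((ys - Xs @ beta_ols) ** 2) / (n - p - 1)
    lam = np.linalg.eigvalsh(R)
    tr_H = np.sum(lam / (lam + c))   # trace of the ridge hat matrix
    return sse_c / s2 - n + 2 * (1 + tr_H)

# Synthetic collinear data (hypothetical), correlation-transformed.
rng = np.random.default_rng(3)
n, p = 31, 5
base = rng.normal(size=n)
X = np.column_stack([base + 0.2 * rng.normal(size=n) for _ in range(p)])
y = X @ np.array([1.0, 0.5, 0.0, -0.5, 1.0]) + rng.normal(size=n)
Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))

grid = np.linspace(0.0, 0.05, 51)
cc_values = np.array([cc_statistic(Xs, ys, c) for c in grid])
best_c = grid[np.argmin(cc_values)]
```

At c = 0 the statistic equals p + 1 by construction, since SSE_0 / s^2 = n - p - 1 and tr(H_0) = p.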

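  11. PRESS Statistic

      $$PR_{Ridge} = \sum_{i=1}^{n} \left( \frac{e_{i,c}}{1 - (1/n) - h_{ii,c}} \right)^2$$

      where:

      $$e_{i,c} = Y_i^* - \mathbf{x}_i^{*\prime}\hat{\boldsymbol{\beta}}_R(c), \qquad \mathbf{x}_i^{*\prime} = \begin{bmatrix} X_{i1}^* & X_{i2}^* & \cdots & X_{ip}^* \end{bmatrix}, \qquad \hat{\boldsymbol{\beta}}_R(c) = (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^*$$

      $$\mathbf{H}_c = \mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime} = \begin{bmatrix} h_{11,c} & h_{12,c} & \cdots & h_{1n,c} \\ h_{21,c} & h_{22,c} & \cdots & h_{2n,c} \\ \vdots & \vdots & & \vdots \\ h_{n1,c} & h_{n2,c} & \cdots & h_{nn,c} \end{bmatrix}$$

      Goal: Choose c to minimize $PR_{Ridge}$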
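A short sketch of the PRESS computation, assuming NumPy and synthetic data; `press_ridge` is a hypothetical helper name. It forms the ridge hat matrix explicitly, which is fine for n = 31 but would be wasteful for large n.

```python
import numpy as np

def press_ridge(Xs, ys, c):
    """Sum of squared e_i / (1 - 1/n - h_ii) using the ridge hat matrix H_c."""
    n, p = Xs.shape
    A = np.linalg.inv(Xs.T @ Xs + c * np.eye(p))
    H = Xs @ A @ Xs.T
    e = ys - H @ ys                  # ridge residuals in standardized units
    return np.sum((e / (1 - 1 / n - np.diag(H))) ** 2)

# Synthetic, moderately collinear data (hypothetical), correlation-transformed.
rng = np.random.default_rng(4)
n, p = 31, 4
base = rng.normal(size=n)
X = np.column_stack([base + 0.3 * rng.normal(size=n) for _ in range(p)])
y = X @ np.array([1.0, -0.5, 0.5, 1.0]) + rng.normal(size=n)
Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))

grid = np.linspace(0.0, 0.1, 101)
press_values = np.array([press_ridge(Xs, ys, c) for c in grid])
best_c = grid[np.argmin(press_values)]
```

The 1/n term accounts for the intercept that the centering has absorbed. At c = 0 the expression reduces to the usual leave-one-out shortcut for OLS with an intercept; for c > 0 it is a shortcut rather than an exact refit of each deleted-case model.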

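  12. Generalized Cross Validation

      $$GCV = \frac{\sum_{i=1}^{n} e_{i,c}^2}{\left(n - \left(1 + \mathrm{tr}(\mathbf{H}_c)\right)\right)^2}$$

      where:

      $$e_{i,c} = Y_i^* - \mathbf{x}_i^{*\prime}\hat{\boldsymbol{\beta}}_R(c), \qquad \mathbf{x}_i^{*\prime} = \begin{bmatrix} X_{i1}^* & X_{i2}^* & \cdots & X_{ip}^* \end{bmatrix}, \qquad \hat{\boldsymbol{\beta}}_R(c) = (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime}\mathbf{Y}^*$$

      $$\mathbf{H}_c = \mathbf{X}^*(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\mathbf{X}^{*\prime} = \begin{bmatrix} h_{11,c} & h_{12,c} & \cdots & h_{1n,c} \\ h_{21,c} & h_{22,c} & \cdots & h_{2n,c} \\ \vdots & \vdots & & \vdots \\ h_{n1,c} & h_{n2,c} & \cdots & h_{nn,c} \end{bmatrix}$$

      Goal: Choose c to minimize GCV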
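GCV needs only the residual sum of squares and the trace of the hat matrix, so it can be sketched without forming any n-by-n matrix. The example assumes NumPy and synthetic data; `gcv_ridge` is a hypothetical helper name.

```python
import numpy as np

def gcv_ridge(Xs, ys, c):
    """GCV = SSE_c / (n - (1 + tr(H_c)))^2 for correlation-transformed data."""
    n, p = Xs.shape
    R = Xs.T @ Xs
    beta_c = np.linalg.solve(R + c * np.eye(p), Xs.T @ ys)
    sse_c = np.sum((ys - Xs @ beta_c) ** 2)
    lam = np.linalg.eigvalsh(R)
    tr_H = np.sum(lam / (lam + c))   # trace of the ridge hat matrix
    return sse_c / (n - (1 + tr_H)) ** 2

# Synthetic, moderately collinear data (hypothetical), correlation-transformed.
rng = np.random.default_rng(5)
n, p = 31, 4
base = rng.normal(size=n)
X = np.column_stack([base + 0.3 * rng.normal(size=n) for _ in range(p)])
y = X @ np.array([1.0, -0.5, 0.5, 1.0]) + rng.normal(size=n)
Xs = (X - X.mean(axis=0)) / (X.std(axis=0, ddof=1) * np.sqrt(n - 1))
ys = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))

grid = np.linspace(0.0, 0.1, 101)
gcv_values = np.array([gcv_ridge(Xs, ys, c) for c in grid])
best_c = grid[np.argmin(gcv_values)]
```

Compared with PRESS, every observation shares the same denominator, so no individual leverages are required.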

  13. $C_c$, PRESS, GCV for China Carbon Data

      c        C_c      PRESS    GCV
      0.0000   6.0000   0.1348   0.00015984
      0.0001   5.9166   0.1344   0.00015931
      0.0002   5.8859   0.1344   0.00015912
      0.0003   5.8991   0.1346   0.00015920
      0.0004   5.9491   0.1352   0.00015951
      0.0005   6.0297   0.1359   0.00016001
      0.0006   6.1359   0.1369   0.00016067
      0.0007   6.2635   0.1379   0.00016145
      0.0008   6.4088   0.1391   0.00016234
      0.0009   6.5689   0.1404   0.00016332
      0.0010   6.7412   0.1417   0.00016436
      0.0011   6.9235   0.1431   0.00016546
      0.0012   7.1139   0.1445   0.00016660
      0.0013   7.3110   0.1459   0.00016778
      0.0014   7.5134   0.1474   0.00016899
      0.0015   7.7198   0.1489   0.00017021
      0.0016   7.9294   0.1504   0.00017145

      All of these methods select very low values for c (each is minimized near c = 0.0001 to 0.0002). The graphical methods tend to choose larger values, based on the stabilization of the regression coefficients and VIFs.

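  14. Variance Inflation Factors

      For OLS:

      $$VIF = \text{diagonal elements of } \mathbf{R}^{-1} = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}$$

      For Ridge Regression, we have:

      $$VIF(c) = \text{diagonal elements of } (\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}\,\mathbf{X}^{*\prime}\mathbf{X}^*\,(\mathbf{X}^{*\prime}\mathbf{X}^* + c\mathbf{I})^{-1}$$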
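The ridge VIFs can be traced over a grid of c to find where they all drop below 10. This is a sketch on a synthetic correlation matrix, assuming NumPy; `ridge_vif`, the grid, and the simulated data are illustrative choices rather than anything taken from the slides.

```python
import numpy as np

def ridge_vif(R, c):
    """Diagonal of (R + cI)^{-1} R (R + cI)^{-1} for a correlation matrix R."""
    A = np.linalg.inv(R + c * np.eye(R.shape[0]))
    return np.diag(A @ R @ A)

# Correlation matrix of synthetic, highly collinear predictors (hypothetical).
rng = np.random.default_rng(6)
base = rng.normal(size=31)
X = np.column_stack([base + 0.1 * rng.normal(size=31) for _ in range(4)])
R = np.corrcoef(X, rowvar=False)

grid = np.linspace(0.0, 0.5, 501)
max_vif = np.array([ridge_vif(R, c).max() for c in grid])

# Smallest grid value of c at which every VIF is below 10.
first_ok = next(c for c in grid if ridge_vif(R, c).max() < 10)
```

At c = 0 the function returns the ordinary VIFs, and each ridge VIF decreases as c grows, which is why a plot of VIFs versus c can be read off for a threshold such as 10.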

  15. Final Model Estimated Regression Coefficients

      • The residual-based measures $C_c$, PRESS, and GCV suggest very small values of c
      • The Ridge Trace suggests a larger value, with coefficients stabilizing above c = 0.15 or so
      • The VIF plot suggests values above c = 0.03, where all VIF values are less than 10
      • The authors used c = 0.20, based on the ridge trace

      Variable       beta-hat
      lnPop           0.5540
      lnUrban         0.3332
      lnWorkforce     1.3212
      lnHholdSz      -0.7835
      lnExpend        0.1645
      constant       -2.0923
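Written out as an equation, the table above corresponds to the following fitted model. The symbols C, P, U, W, H, and E for emissions, population, urbanization rate, working-age percentage, household size, and per capita expenditures are a labeling assumption made here for readability.

$$\widehat{\ln C_t} = -2.0923 + 0.5540 \ln P_t + 0.3332 \ln U_t + 1.3212 \ln W_t - 0.7835 \ln H_t + 0.1645 \ln E_t$$

Because both sides are in logs, each slope can be read as an elasticity: for example, a 1% increase in population is associated with roughly a 0.55% increase in carbon emissions, holding the other variables fixed.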
