Practical Approach to Weighting Sub-Populations in Mortality Research

Develop sub-population weights for life expectancy research using popular software like STATA and Excel. Explore challenges, proposed solutions, and empirical illustrations for estimating weights and analyzing mortality inequality.

Uploaded by jann_213 on Mar 04, 2025

Presentation Transcript

WEIGHTING SUB-POPULATIONS IN MORTALITY/LONGEVITY RESEARCH: A PRACTICAL APPROACH
Adam Szulc
Institute of Statistics and Demography, Warsaw School of Economics
The 5th Polish Stata Users Group Meeting
Warsaw School of Economics, November 27, 2017

THE GOAL: to develop sub-population weights for life expectancy research using popular software (here: STATA and Excel).
MOTIVATION: the population structure assumed in life tables differs from the actual one (e.g. due to migration); hence weighting with the two types of population shares yields different average life expectancies (in the present study the differences range from 0.20 to 0.38 years).
THE IDEA: to construct a set of weights satisfying two conditions:
1. the weighted average of the group-specific life expectancies equals the overall life expectancy derived from life tables;
2. the weights are at a minimum distance (defined formally later) from the actual population shares.
PROBLEM: scarcity of optimisation tools in STATA.

POSSIBLE APPLICATIONS:
1. construction of aggregate life tables (e.g. world life tables);
2. calculation of mortality inequality indicators (between countries, regions, etc.).
EXISTING SOLUTIONS:
1. based on specialised software (e.g. MATLAB);
2. matrix algebra by Anand, Shkolnikov et al. (hereafter A & S).
In both cases a solution to a quadratic programming problem is obtained.
DRAWBACKS:
1. MATLAB's price and availability;
2. time-consuming solutions (especially when they have to be repeated for ages 0 to 110 for both sexes, or to produce trends), and the possibility of obtaining negative weights in the second solution.

PROPOSED SOLUTIONS:
1. Applying STATA (or other popular software) constrained regressions to utilise least-squares minimisation algorithms: once the code is written, the solution is time-efficient and has virtually no restrictions on dataset size; negative weights are still possible (though less likely).
2. Using Excel Solver minimisation algorithms: the solution ensures positive weights, but it is time-consuming and restricted to small and medium datasets (from 50 to 100 sub-groups, depending on the optimisation method).

AN EMPIRICAL ILLUSTRATION:
1. 12 countries selected from the Human Mortality Database (intentionally characterised by large disparities in life expectancy and size): men and women separately;
2. 80 regions of Russia: men and women together.
THE CALCULATIONS:
1. estimation of weights using group-specific life expectancies, population shares and overall life expectancies;
2. ranges, Gini and Theil inequality indices for the whole populations (i.e. the 12 countries together and Russia as a whole);
3. decomposition of Theil indices between country groups for the 12 countries.

1. ESTIMATION OF THE WEIGHTS
1.1 The algorithms for minimising the sum of squared deviations
The weights by which the life expectancies of n population sub-groups at age x ($e_{ix}$) are combined into an average life expectancy ($e_x$) may be written as a system of two equations:

$$\sum_{i=1}^{n} \frac{N_{ix}}{N_x}\, e_{ix} = e_x \qquad (1)$$

$$\sum_{i=1}^{n} \frac{N_{ix}}{N_x} = 1 \qquad (2)$$

where $N_{ix}$ is the number of people at age x in the i-th group (i = 1, 2, ..., n) and $N_x$ is the total number of people at age x. For the present purposes it is not necessary to know both $N_{ix}$ and $N_x$; therefore the weights $N_{ix}/N_x$, being a sufficient solution (also in inequality calculations), are denoted hereafter as $w_{ix}$.

The algorithm proposed in the present study utilises constrained regression, which is included in standard statistical/econometric packages. Hereafter the age subscript x is dropped, as the algorithm is identical for each age group. Let $v_i$ denote the i-th actual population share. The weights $w_i$ are the solution to the following minimisation problem:

$$\min_{w} \sum_{i=1}^{n} (w_i - v_i)^2 \qquad (3)$$

such that

$$\sum_{i=1}^{n} w_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} w_i e_i = e \qquad (4)$$

Alternatively, minimisation of the sum of absolute values may be employed (presented later on).
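For comparison with the regression-based route, problem (3)-(4) can also be solved directly, since it is a quadratic programme with equality constraints only. The minimal sketch below assumes NumPy; the function name `min_distance_weights` is hypothetical. Setting the gradient of the Lagrangian to zero gives $w_i = v_i + \alpha + \beta e_i$, so only a 2 × 2 linear system remains. This is a standard textbook solution, not necessarily the A & S matrix formulation, and nothing in it prevents negative weights.

```python
import numpy as np


def min_distance_weights(v, e, e_total):
    """Minimise sum((w - v)**2) subject to sum(w) == 1 and
    sum(w * e) == e_total (eqns 3-4), via Lagrange multipliers.

    Stationarity of the Lagrangian gives w_i = v_i + alpha + beta * e_i,
    so only the two multipliers alpha and beta have to be found.
    """
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    n = v.size
    # Substitute w = v + alpha + beta * e into the two constraints:
    #   n * alpha      + sum(e)    * beta = 1 - sum(v)
    #   sum(e) * alpha + sum(e**2) * beta = e_total - sum(v * e)
    lhs = np.array([[n, e.sum()],
                    [e.sum(), (e ** 2).sum()]])
    rhs = np.array([1.0 - v.sum(), e_total - v @ e])
    alpha, beta = np.linalg.solve(lhs, rhs)
    # The shift alpha + beta * e_i is additive, so a very small share
    # can be pushed below zero.
    return v + alpha + beta * e
```

Applied to the men's data in Table 1 with the life-table value 74.49, this direct solution gives a negative weight of roughly -0.0012 for Luxembourg, the smallest sub-population, which illustrates the drawback mentioned for existing solutions.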

To take advantage of the minimisation algorithms included in statistical/econometric packages, one should write an estimated weight $w_i$ as a function of the population share, say $f(v_i)$. For a quadratic specification (eqn 5) the weight may be estimated using a linear algorithm:

$$w_i = f(v_i) = a v_i^2 + b v_i + c \qquad (5)$$

Substituting (5) into (3) gives $\sum_{i}(v_i - a v_i^2 - b v_i - c)^2$, so the minimisation problem is equivalent to the constrained least-squares estimation of the parameters a, b and c in a regression of $v_i$ on $v_i^2$, $v_i$ and a constant, under the following constraints:

$$a \sum_{i=1}^{n} e_i v_i^2 + b \sum_{i=1}^{n} e_i v_i + c \sum_{i=1}^{n} e_i = e \qquad (6)$$

$$a \sum_{i=1}^{n} v_i^2 + b \sum_{i=1}^{n} v_i + n c = 1 \qquad (7)$$

Once the parameters are estimated, the weights may be calculated using eqn (5).
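Eqns (5)-(7) define an equality-constrained least-squares problem. The study solves it with STATA's cnsreg; the sketch below is a hypothetical NumPy equivalent (the name `quadratic_weights` is illustrative) that solves the Karush-Kuhn-Tucker system of the constrained regression directly.

```python
import numpy as np


def quadratic_weights(v, e, e_total):
    """Constrained least squares for eqns (5)-(7).

    Minimises sum((v_i - a*v_i**2 - b*v_i - c)**2) over (a, b, c)
    subject to eqn (6) (weighted life expectancy equals e_total)
    and eqn (7) (weights sum to one). Returns the weights and (a, b, c).
    """
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    X = np.column_stack([v ** 2, v, np.ones_like(v)])  # regressors for a, b, c
    R = np.array([
        [e @ v ** 2, e @ v, e.sum()],        # eqn (6)
        [(v ** 2).sum(), v.sum(), v.size],   # eqn (7)
    ])
    r = np.array([e_total, 1.0])
    k, m = X.shape[1], R.shape[0]
    # KKT system: [2X'X  R'] [params ]   [2X'v]
    #             [R     0 ] [lambdas] = [r   ]
    kkt = np.block([[2 * X.T @ X, R.T],
                    [R, np.zeros((m, m))]])
    rhs = np.concatenate([2 * X.T @ v, r])
    a, b, c = np.linalg.solve(kkt, rhs)[:k]
    return a * v ** 2 + b * v + c, (a, b, c)
```

Because the weights are restricted here to a quadratic function of $v_i$, they generally differ from the unrestricted solution of (3)-(4); the restriction is what makes the problem fit a standard regression routine.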

In this study the STATA constrained least squares command (cnsreg) is used. It is also possible to rewrite eqns (5)-(7) so that the constrained regression can be estimated when the only available constraint is a zero intercept. This method is described in detail in the next section, which presents the algorithm based on minimising absolute deviations, an alternative to the least squares method.

1.2 The algorithm for minimising the sum of absolute deviations
The general principles of estimating the weights are identical. The only difference is in the construction of eqn (3), which takes the form

$$\min_{w} \sum_{i=1}^{n} |w_i - v_i| \qquad (8)$$

In econometric theory this type of estimation is known as least absolute deviations (LAD) regression, or Laplace regression (Koenker and Bassett, 1978). In theory there are no grounds for preferring the previous type of optimisation to this one; however, one serious practical limitation exists: in most statistical/econometric packages constrained LAD optimisation is not available. A few (e.g. TSP, Time Series Processor) allow only one type of constraint: a zero intercept (c in eqn 5). In that case one has to rewrite the dependent (say $ww_i$) and independent (say $vv_i$) variables and estimate the equation by means of the LAD algorithm:

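$$ww_i = b\, vv_i$$

where

$$ww_i = v_i - \frac{(n e - p_3)\, v_i^2 + (p_1 - q_1 e)}{n p_1 - q_1 p_3}, \qquad vv_i = v_i - \frac{(n p_2 - p_3 q_2)\, v_i^2 + (p_1 q_2 - q_1 p_2)}{n p_1 - q_1 p_3}$$

and

$$p_1 = \sum_{i=1}^{n} e_i v_i^2, \quad p_2 = \sum_{i=1}^{n} e_i v_i, \quad p_3 = \sum_{i=1}^{n} e_i, \quad q_1 = \sum_{i=1}^{n} v_i^2, \quad q_2 = \sum_{i=1}^{n} v_i.$$

With a and c expressed through b from eqns (6)-(7), $w_i - v_i = b\, vv_i - ww_i$, so minimising (8) is equivalent to estimating this regression model, which has no intercept, by means of LAD.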

Once parameter b is estimated, a and c can be calculated using the equations

$$c = \frac{p_1 (1 - b q_2) - q_1 (e - b p_2)}{n p_1 - q_1 p_3}, \qquad a = \frac{e - b p_2 - c p_3}{p_1}$$

and, finally, eqn (5) is used to calculate the weights. The same algorithm may alternatively be employed for minimising the sum of squares described in the previous section. These algorithms may also be useful when the minimisation routine built into typical packages is unable to provide a solution to equations (5)-(7), which may happen for some datasets.
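The transformation of section 1.2 can be checked with a short sketch (NumPy; all function names are hypothetical). With a single regressor and no intercept, the LAD estimate of b is a weighted median of the ratios $ww_i / vv_i$ with weights $|vv_i|$, so no LAD routine is needed. This shortcut is an assumption of the sketch, not the TSP procedure used in the study.

```python
import numpy as np


def _sums(v, e):
    """p1, p2, p3, q1, q2 as defined in section 1.2."""
    return e @ v ** 2, e @ v, e.sum(), (v ** 2).sum(), v.sum()


def weights_given_b(v, e, e_total, b):
    """Recover c and a from b so that eqns (6)-(7) hold, then apply eqn (5)."""
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    p1, p2, p3, q1, q2 = _sums(v, e)
    n = v.size
    c = (p1 * (1 - b * q2) - q1 * (e_total - b * p2)) / (n * p1 - q1 * p3)
    a = (e_total - b * p2 - c * p3) / p1
    return a * v ** 2 + b * v + c


def lad_b(v, e, e_total):
    """LAD estimate of b in ww_i = b * vv_i (no intercept)."""
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    p1, p2, p3, q1, q2 = _sums(v, e)
    n = v.size
    d = n * p1 - q1 * p3
    ww = v - ((n * e_total - p3) * v ** 2 + (p1 - q1 * e_total)) / d
    vv = v - ((n * p2 - p3 * q2) * v ** 2 + (p1 * q2 - q1 * p2)) / d
    keep = vv != 0                      # rows with vv_i = 0 do not depend on b
    ratio = ww[keep] / vv[keep]
    weight = np.abs(vv[keep])
    order = np.argsort(ratio)
    cum = np.cumsum(weight[order])
    # Weighted median: first sorted ratio whose cumulative weight reaches half
    return ratio[order][np.searchsorted(cum, 0.5 * cum[-1])]
```

Usage: `b = lad_b(v, e, e_total)` followed by `w = weights_given_b(v, e, e_total, b)`. Replacing the weighted median with an ordinary least-squares slope through the origin would give the sum-of-squares variant mentioned in the text.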

1.3 Handling negative solutions in STATA
Neither the algorithms presented in the previous sections nor the A & S method ensure that all weights are positive. Negative weights are likely when sub-populations vary considerably in size and some of them represent very small shares (well below 1%). This problem may be handled in two ways. First, by adding a further constraint to the estimation based on equations (5)-(7). As standard statistical/econometric packages, including STATA, do not allow positive solutions to be imposed, the constraint has to be written indirectly, after changing eqn (5) from quadratic to cubic. This reduces the probability of non-positive solutions; however, they are still likely for some data.

The additional constraint may take the form

$$a v_{\min}^3 + b v_{\min}^2 + d v_{\min} + c = v_{\min} \qquad (9)$$

where $v_{\min}$ stands for the minimum population share and the cubic counterpart of eqn (5) is $w_i = a v_i^3 + b v_i^2 + d v_i + c$. In this way the weight of the smallest group equals its actual share and therefore cannot be negative. If the weight $w_i$ is an increasing function of the population share $v_i$, all solutions are positive. This condition is not necessarily true, however. In some cases $v_{\min}$ might be replaced by the maximum (or any other reliable) value, especially when the estimated weight for the highest population share is greater than the actual one. Nevertheless, none of these conditions protects against negative weights. If they occur, one can use the Excel add-in Solver (downloadable from the producer), which allows non-negative weights to be reached.
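A hypothetical NumPy sketch of the cubic variant: it fits the cubic counterpart of eqn (5) by constrained least squares, using cubic versions of eqns (6)-(7) (an assumption; the text only states eqn 9 explicitly) plus eqn (9) as a third restriction, and solves the KKT system directly instead of calling cnsreg. Only the smallest group's weight is pinned to $v_{\min}$; positivity of the others is not guaranteed.

```python
import numpy as np


def cubic_weights(v, e, e_total):
    """Fit w_i = a*v_i**3 + b*v_i**2 + d*v_i + c by least squares on v_i,
    subject to: weighted life expectancy equals e_total, weights sum to one,
    and eqn (9), which keeps the weight of the smallest group equal to v_min."""
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    v_min = v.min()
    X = np.column_stack([v ** 3, v ** 2, v, np.ones_like(v)])  # a, b, d, c
    R = np.array([
        e @ X,                                   # cubic version of eqn (6)
        X.sum(axis=0),                           # cubic version of eqn (7)
        [v_min ** 3, v_min ** 2, v_min, 1.0],    # eqn (9)
    ])
    r = np.array([e_total, 1.0, v_min])
    k, m = X.shape[1], R.shape[0]
    kkt = np.block([[2 * X.T @ X, R.T],
                    [R, np.zeros((m, m))]])
    rhs = np.concatenate([2 * X.T @ v, r])
    a, b, d, c = np.linalg.solve(kkt, rhs)[:k]
    return a * v ** 3 + b * v ** 2 + d * v + c
```

Swapping `v_min` for `v.max()` in the third restriction gives the alternative anchoring mentioned in the text.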

1.4 Handling negative solutions in Excel Solver
A non-negativity constraint may be added to mathematical programming problems directly. Although such a constraint can only take the form "greater than or equal to zero", strict positivity may be imposed indirectly, at the cost of an additional constraint. Using Excel Solver has two serious limitations:
1. it requires time-consuming matrix manipulations that might be avoided when using the regression-based methods;
2. Solver cannot manage large datasets: the number of sub-populations cannot exceed 200 divided by the number of constraints; as a result, the weights for the 80 Russian regions may be calculated by only one of the methods presented below.
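Solver's nonlinear engine is tied to Excel, so the sketch below swaps in SciPy's SLSQP routine (`scipy.optimize.minimize` with `method="SLSQP"`) to illustrate the same idea: minimise (3) subject to (4), with the bound $w_i \ge 0$ imposed directly. The function name is hypothetical, and the sketch enforces non-negativity only, not strict positivity.

```python
import numpy as np
from scipy.optimize import minimize


def nonneg_min_squares(v, e, e_total):
    """Minimise sum((w - v)**2) s.t. sum(w) == 1, sum(w * e) == e_total, w >= 0."""
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    constraints = [
        {"type": "eq", "fun": lambda w: w.sum() - 1.0,
         "jac": lambda w: np.ones_like(w)},
        {"type": "eq", "fun": lambda w: w @ e - e_total,
         "jac": lambda w: e},
    ]
    result = minimize(
        lambda w: ((w - v) ** 2).sum(),
        x0=v,                                   # start from the actual shares
        jac=lambda w: 2.0 * (w - v),
        method="SLSQP",
        bounds=[(0.0, None)] * v.size,          # non-negativity, as in Solver
        constraints=constraints,
        options={"ftol": 1e-12, "maxiter": 500},
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x
```

With the men's data from Table 1, the bound becomes active for Luxembourg, whose weight is set to zero instead of turning negative.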

Excel Solver is capable of both minimising squares and minimising absolute values:

$$\min_{w} \sum_{i=1}^{n} (w_i - v_i)^2 \qquad (3)$$

$$\min_{w} \sum_{i=1}^{n} |w_i - v_i| \qquad (8)$$

The first may be handled using the built-in nonlinear procedure with the two constraints

$$\sum_{i=1}^{n} w_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} w_i e_i = e \qquad (4)$$

Minimisation of absolute values may be performed using the linear SIMPLEX method with several additional constraints. As $|x| = \max\{x, -x\}$, the absolute deviations can be linearised, and the non-negativity of $w_i$ ensured, by introducing auxiliary variables $u_i$ and adding for each i the constraints

$$u_i \ge w_i - v_i, \qquad u_i \ge v_i - w_i, \qquad w_i \ge 0,$$

while the function minimised is $\sum_{i=1}^{n} u_i$.
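The linearisation above can be sketched with SciPy's `linprog` (HiGHS solver) standing in for Excel's SIMPLEX engine; the decision vector stacks the weights $w_i$ and the auxiliary variables $u_i$. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def nonneg_min_abs(v, e, e_total):
    """Minimise sum(|w - v|) s.t. sum(w) == 1, sum(w * e) == e_total, w >= 0,
    written as a linear programme in x = [w_1..w_n, u_1..u_n]."""
    v = np.asarray(v, dtype=float)
    e = np.asarray(e, dtype=float)
    n = v.size
    eye = np.eye(n)
    cost = np.concatenate([np.zeros(n), np.ones(n)])   # minimise sum(u)
    # w - u <= v   (u >= w - v)   and   -w - u <= -v   (u >= v - w)
    A_ub = np.block([[eye, -eye], [-eye, -eye]])
    b_ub = np.concatenate([v, -v])
    A_eq = np.vstack([np.concatenate([np.ones(n), np.zeros(n)]),  # sum(w) = 1
                      np.concatenate([e, np.zeros(n)])])          # sum(w e) = e
    b_eq = np.array([1.0, e_total])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (2 * n), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:n]
```

The LP has 2n variables and 2n + 2 constraints, which is why this formulation hits spreadsheet solver limits sooner than the least-squares one.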

2. EMPIRICAL ILLUSTRATION
2.1 The data
1. 12 developed countries included in the Human Mortality Database, latest available data (2013 or 2014), men and women separately (hereafter: HMD12);
2. 80 regions of Russia, 2010, men and women together (hereafter: RUSSIA80); source: Human Development Report, 2013.

Table 1. Life expectancy and population shares for 12 countries (last row: mean weighted by population shares, with life expectancy from life tables in parentheses)

Country              Women: life exp.   Women: pop. share   Men: life exp.   Men: pop. share
Czech Republic       81.15              0.01312             75.15            0.01352
Germany              82.86              0.10088             77.99            0.1031
Israel               83.84              0.00988             80.29            0.01035
Japan                86.63              0.15840             80.23            0.160463
Luxembourg           83.43              0.00066             79.37            0.000703
New Zealand          83.42              0.00554             79.8             0.005664
Poland               80.92              0.04876             72.98            0.048824
Russian Federation   76.29              0.18880             65.1             0.173701
Sweden               83.71              0.01174             80.1             0.012477
Switzerland          84.74              0.00998             80.52            0.01039
USA                  81.29              0.39238             76.54            0.405933
Ukraine              76.21              0.05985             66.31            0.054875
Mean                 81.13 (80.75)      -                   74.69 (74.49)    -
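The motivation can be checked against Table 1. The sketch below (NumPy; variable names are illustrative) averages the group life expectancies with the actual population shares and compares the result with the life-table values in parentheses. Shares are renormalised because the published figures are rounded.

```python
import numpy as np

# Table 1 data, rows in table order (Czech Republic ... Ukraine)
le_women = np.array([81.15, 82.86, 83.84, 86.63, 83.43, 83.42,
                     80.92, 76.29, 83.71, 84.74, 81.29, 76.21])
share_women = np.array([0.01312, 0.10088, 0.00988, 0.15840, 0.00066, 0.00554,
                        0.04876, 0.18880, 0.01174, 0.00998, 0.39238, 0.05985])
le_men = np.array([75.15, 77.99, 80.29, 80.23, 79.37, 79.8,
                   72.98, 65.1, 80.1, 80.52, 76.54, 66.31])
share_men = np.array([0.01352, 0.1031, 0.01035, 0.160463, 0.000703, 0.005664,
                      0.048824, 0.173701, 0.012477, 0.01039, 0.405933, 0.054875])
life_table = {"women": 80.75, "men": 74.49}   # values in parentheses


def share_weighted_mean(le, share):
    """Average life expectancy weighted by the actual population shares."""
    return float(le @ share / share.sum())


gap_women = share_weighted_mean(le_women, share_women) - life_table["women"]
gap_men = share_weighted_mean(le_men, share_men) - life_table["men"]
print(f"women: {gap_women:.2f} years, men: {gap_men:.2f} years")
```

This prints gaps of 0.38 years for women and 0.20 years for men, the range quoted in the motivation; the share-weighted means themselves correspond to the "Mean" row.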

2.2 Weight estimates
HMD12, men: all positive for the STATA and Solver procedures.
HMD12, women: all positive for the Solver procedures; negative weights appear for STATA.
RUSSIA80: all positive for the STATA and Solver procedures; minimisation of absolute values not possible due to Solver capacity.
2.3 Inequality measures
Range (maximum minus minimum values): from 10.4 to 18.1 years.
Gini and Theil inequality indices: strong impact of the weighting method.
Theil inequality index decomposition: less significant impact of the weighting method.

Table 2. Life expectancy ranges (in years)

Sample      e_max - e_min     Range   Highest, lowest
Women 12    86.63 - 76.21     10.42   Japan, Ukraine
Men 12      80.52 - 65.10     15.42   Switzerland, Russia
Russia 80   79.08 - 61.00     18.08   Ingushetia, Tuva

Table 3. Gini inequality indices (× 100) under various weightings of sub-populations (percentage of the unweighted index in parentheses)

Weights                        Women 12           Men 12             Russia 80
no weights                     1.9544             3.4823             2.11644
population shares              2.22533 (113.9%)   3.6647 (105.2%)    1.84198 (87.0%)
STATA, min. squares            n.a.               3.88038 (111.4%)   1.80208 (85.1%)
Solver, min. squares           2.23347 (114.3%)   3.7847 (108.7%)    1.70628 (80.6%)
Solver, min. absolute values   2.19255 (112.2%)   3.73571 (107.3%)   n.a.
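A sketch of a weighted Gini index using the standard pairwise-difference definition (NumPy; the function name is hypothetical, and the presentation does not state which formula or STATA command was used). With equal weights and the Table 1 life expectancies it gives 1.9544 for women and 3.4823 for men (× 100), matching the "no weights" row of Table 3; the other rows follow by passing population shares or estimated weights as `w`.

```python
import numpy as np


def weighted_gini(x, w=None):
    """Gini index of values x with non-negative weights w (equal weights if None):
    G = sum_i sum_j w_i w_j |x_i - x_j| / (2 * mean), with weights summing to one."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    mean = w @ x
    diffs = np.abs(x[:, None] - x[None, :])     # all pairwise |x_i - x_j|
    return float(w @ diffs @ w / (2.0 * mean))
```

The O(n²) pairwise matrix is harmless for 12 countries or 80 regions; sorting-based formulas are preferable only for much larger samples.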

Table 4. Theil inequality indices (× 100) under various weightings of sub-populations (percentage of the unweighted index in parentheses)

Weights                        Women 12           Men 12             Russia 80
no weights                     0.0672             0.2337             0.08709
population shares              0.08577 (127.6%)   0.25652 (109.8%)   0.05721 (65.7%)
STATA, min. squares            n.a.               0.28188 (120.6%)   0.05573 (64.0%)
Solver, min. squares           0.08632 (128.5%)   0.26877 (115.0%)   0.05003 (57.4%)
Solver, min. absolute values   0.08379 (124.7%)   0.26421 (113.1%)   n.a.
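A sketch of a weighted Theil index (NumPy; the function name is hypothetical). The presentation does not say which member of the Theil family is reported, so the sketch assumes the Theil T index; it is not claimed to reproduce the figures in Table 4.

```python
import numpy as np


def weighted_theil(x, w=None):
    """Theil T index: sum_i w_i * (x_i / mean) * ln(x_i / mean),
    with weights summing to one (equal weights if None); x must be positive."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()
    ratio = x / (w @ x)
    return float(w @ (ratio * np.log(ratio)))
```

Giving a group weight 2 is equivalent to listing it twice with equal weights, which is why population shares or estimated weights can be passed directly as `w`.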

Table 5. Decomposition of the Theil index into within- and between-group inequality (groups: post-communist countries, Western Europe, non-European countries)

                               Women 12              Men 12
Weights                        within    between     within    between
no weights                     36.1%     63.9%       25.6%     73.5%
population shares              38.6%     61.4%       17.7%     82.3%
STATA, min. squares            n.a.      n.a.        14.7%     85.3%
Solver, min. squares           35.0%     65.0%       17.1%     82.9%
Solver, min. absolute values   35.1%     64.9%       17.2%     82.8%
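A sketch of the additive within/between decomposition of the Theil T index (NumPy; function names are hypothetical, and the Theil T variant is an assumption, as the presentation does not name one). `groups` holds one label per sub-population, e.g. the three country groups in the caption above; the function returns the total index and the within and between shares, which sum to one by construction.

```python
import numpy as np


def weighted_theil(x, w):
    """Theil T index with weights normalised to sum to one."""
    w = w / w.sum()
    ratio = x / (w @ x)
    return float(w @ (ratio * np.log(ratio)))


def theil_decomposition(x, w, groups):
    """Split the weighted Theil T index into within- and between-group parts:
    T = sum_g s_g * T_g + sum_g s_g * ln(mean_g / mean),
    where s_g is group g's share of the weighted total of x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    groups = np.asarray(groups)
    mean = w @ x
    within = between = 0.0
    for g in np.unique(groups):
        in_g = groups == g
        w_g = w[in_g].sum()
        mean_g = w[in_g] @ x[in_g] / w_g
        s_g = w_g * mean_g / mean
        within += s_g * weighted_theil(x[in_g], w[in_g])
        between += s_g * np.log(mean_g / mean)
    total = weighted_theil(x, w)
    return total, within / total, between / total
```

The between part is itself the Theil index of the group means, so both shares are non-negative.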

CONCLUDING REMARKS:
1. To weight or not to weight:
no weighting: in comparisons of longevity/health status between countries (regions);
weighting: when answering the question "how unequal are people?"
2. Weighting matters:
there are no rules for the direction of the impact of weights on inequality measures (it varies between datasets);
the resulting differences between types of weights are less important, though noticeable;
Excel Solver yields more theoretically consistent weights than constrained regression, but is somewhat awkward in multiple applications.

REFERENCES
Anand, S., F. Diderichsen, T. Evans, V. M. Shkolnikov and M. Wirth (2001), Measuring disparities in health: methods and indicators, in: T. Evans, M. Whitehead, F. Diderichsen, A. Bhuiya and M. Wirth (eds.), Challenging Inequities in Health: From Ethics to Action, pp. 48-67, Oxford University Press.
Human Mortality Database, University of California, Berkeley (USA) and Max Planck Institute for Demographic Research (Germany), www.mortality.org.
Koenker, R. W. and G. W. Bassett (1978), Regression quantiles, Econometrica, Vol. 46, pp. 33-50.
Sustainable Development: Rio Challenges, National Human Development Report for the Russian Federation 2013, UNDP, Moscow.
Shkolnikov, V. M., T. Valkonen, A. Begun and E. M. Andreev (2001), Measuring inter-group inequalities in length of life, Genus, Vol. 57, No. 3/4, pp. 33-62.