Statistical Office of Serbia: Calibration Lessons and Techniques
In 2016, the Statistical Office of the Republic of Serbia (SORS) introduced calibration into the estimation system of the Labour Force Survey (LFS), adding conditions that the final weights must satisfy. The sample is a two-stage stratified rotation panel, and the initial weights are adjusted for non-response before calibration. Calibration is carried out in SAS with the CALMAR macro, and the presentation compares this with calibration in R using the ReGenesees package. Estimates are currently produced quarterly, and monthly estimates are planned.
Presentation Transcript
Statistical Office of the Republic of Serbia
Using R package ReGenesees in the phase of calibration: lessons learnt
Marija Karasicevic, statistician methodologist in the Department for sampling methodology, marija.karasicevic@stat.gov.rs
Content: Introduction; Sampling plan; Estimation; Calibration in CALMAR; Calibration in R; Conclusion. www.stat.gov.rs / stat@stat.gov.rs
Introduction: In 2016, the estimation system for the Labour Force Survey (LFS) at the Statistical Office of the Republic of Serbia (SORS) was changed by introducing the calibration method. Calibration is a procedure for calculating factors such that estimates computed with the final weights agree with known totals. The final weights are the sample weights multiplied by the calibration factors. SORS uses SAS, with the CALMAR macro for calibration. The programming language R and its packages are now being introduced.
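In symbols, the definition of calibration given above can be written as follows. The notation is assumed here for illustration and does not appear on the slides: s is the set of respondents, d_k the sample weight of unit k, g_k its calibration factor, w_k its final weight, x_k its vector of calibration variables, and X the vector of known totals.

```latex
w_k = d_k \, g_k ,
\qquad
\sum_{k \in s} w_k \, \mathbf{x}_k = \mathbf{X}
```

Among all sets of factors that satisfy the second equation, calibration methods choose the one that keeps the final weights as close as possible to the sample weights according to a chosen distance function.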
Sampling plan: A two-stage stratified rotation panel sample is selected (2-2-2 rotation scheme). The first-stage units are enumeration areas and the final sampling units are households. Enumeration areas, as primary units, are stratified by type of settlement (urban and other) and by the territory of the administrative districts (NUTS 3 level, 25 areas). Every household and person is in the sample for two consecutive quarters, is out of the sample for the next two quarters, and is then in the sample again for two consecutive quarters.
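The 2-2-2 rotation pattern described above can be sketched in a few lines of Python. This is only an illustration: the consecutive numbering of quarters, the function names, and the assumption that one new rotation group enters the sample every quarter are not taken from the slides.

```python
IN_SAMPLE_OFFSETS = (0, 1, 4, 5)  # in for 2 quarters, out for 2, in for 2


def in_sample(entry_quarter: int, quarter: int) -> bool:
    """True if a rotation group that entered in `entry_quarter` is
    interviewed in `quarter` under the 2-2-2 scheme.

    Quarters are numbered consecutively (0, 1, 2, ...), which is an
    assumption made for this sketch.
    """
    return (quarter - entry_quarter) in IN_SAMPLE_OFFSETS


def active_groups(quarter: int) -> list:
    """Entry quarters of the rotation groups interviewed in `quarter`,
    assuming (hypothetically) that one new group enters every quarter."""
    return [quarter - offset for offset in IN_SAMPLE_OFFSETS]


# Participation pattern of one group over eight quarters.
pattern = "".join("X" if in_sample(0, q) else "." for q in range(8))
print(pattern)            # XX..XX..
print(active_groups(10))  # [10, 9, 6, 5]
```

Under these assumptions, each quarter's sample consists of four rotation groups, two of which were also interviewed in the previous quarter.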
Sampling plan: Since 2015, the LFS in Serbia has been carried out continuously. Each subsample allocated to a quarter is distributed uniformly and at random over the 13 weeks of the quarter. SORS calculates quarterly estimates and plans to introduce monthly estimates.
Estimation: To obtain more precise estimates for the observed population, the initial weights (design weights) are adjusted. The initial weight is the reciprocal of the product of the selection probabilities at each stage, for each stratum. Adjustment for non-response: response rates are calculated at the household level and used to adjust the design weights in each enumeration area. In 2016, the estimation system for the LFS in Serbia was improved by introducing additional conditions that must be met when calculating the final weights.
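A minimal sketch of the two weighting steps described above, in Python. The function names and the numbers are made up for illustration. The adjustment shown here, dividing the design weight by the household response rate of the enumeration area, is a common choice and is assumed rather than confirmed by the slides.

```python
def design_weight(p_stage1: float, p_stage2: float) -> float:
    """Reciprocal of the product of the selection probabilities of the
    two stages (enumeration area, then household within the area)."""
    return 1.0 / (p_stage1 * p_stage2)


def adjust_for_nonresponse(weight: float, n_eligible: int, n_responding: int) -> float:
    """Divide the design weight by the household response rate of the
    enumeration area (assumed form of the adjustment)."""
    if n_responding <= 0:
        raise ValueError("no responding households in the enumeration area")
    response_rate = n_responding / n_eligible
    return weight / response_rate


# Made-up numbers, for illustration only.
d = design_weight(p_stage1=0.25, p_stage2=0.125)              # 32.0
w = adjust_for_nonresponse(d, n_eligible=10, n_responding=8)  # about 40
print(d, w)
```

The result of the second step is what the later slides call the intermediate weight, which is the input to calibration.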
Estimation: For each quarter, estimates have to be in line with the current demographic projections: (1) the population distribution by gender and five-year age group at the district level (NUTS 3), and (2) the distribution of households by number of household members (six groups) at the district level. An additional condition, that the household and every person in it have the same final weight, gives consistent estimates at the household and person levels. The final weights are calculated with a calibration procedure.
Calibration in CALMAR: At SORS, the final weight for persons and households is calculated using the calibration method in the CALMAR macro. Example: the fourth quarter of 2023. The first step is to adjust the design weights for non-response.
Calibration in CALMAR: The first input data set for CALMAR contains the responding sampled households, with the following variables: the intermediate weight (design weight adjusted for non-response), the number of household members, and the number of persons in each age group by gender. The number of household members and the number of persons in the household in each age group by gender are the calibration variables.
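The household-level record described above could be assembled from person records roughly as follows. This is a sketch under stated assumptions: NUMB_MEMBERS is the variable name shown on the slides, while WEIGHT, the MEN_/WOMEN_ column names, and the function names are hypothetical, as is the example household.

```python
from collections import Counter

# 14 age groups as on the slides: 0-14, 15-19, ..., 70-74, 75+.
AGE_GROUPS = ["0-14"] + [f"{a}-{a + 4}" for a in range(15, 75, 5)] + ["75+"]
GENDERS = ("MEN", "WOMEN")


def age_group(age: int) -> str:
    """Map an age in completed years to its age-group label."""
    if age <= 14:
        return "0-14"
    if age >= 75:
        return "75+"
    lower = age - (age % 5)
    return f"{lower}-{lower + 4}"


def household_row(members, intermediate_weight):
    """Build one household record from a list of (gender, age) pairs.

    NUMB_MEMBERS is a class from 1 to 6, where 6 stands for 6+ members.
    The other column names are hypothetical.
    """
    counts = Counter((gender, age_group(age)) for gender, age in members)
    row = {"WEIGHT": intermediate_weight, "NUMB_MEMBERS": min(len(members), 6)}
    for gender in GENDERS:
        for group in AGE_GROUPS:
            row[f"{gender}_{group}"] = counts[(gender, group)]
    return row


row = household_row([("MEN", 42), ("WOMEN", 39), ("WOMEN", 7)], intermediate_weight=250.0)
print(row["NUMB_MEMBERS"], row["MEN_40-44"], row["WOMEN_35-39"], row["WOMEN_0-14"])  # 3 1 1 1
```

Because every member of the household is counted in exactly one of the 28 gender-by-age columns, a single household weight applied to this record contributes consistently to both the person totals and the household totals.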
Calibration in CALMAR: The second input data set contains the names of the calibration variables and the associated marginals (known population totals). The known totals by district are: the number of households with 1, 2, 3, 4, 5 and 6+ members; the number of men aged 14 or less, the number of men aged 15 to 19, and so on; and the number of women aged 14 or less, the number of women aged 15 to 19, and so on.
[Slide table: layout of the second CALMAR input data set. Columns: DISTRICT, VAR, N, MAR1, MAR2, ..., MAR5, MAR6. For each district (0 to 24) there is one row for NUMB_MEMBERS with N = 6, and one row with N = 0 for each of the 14 age groups (0-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75+) for MEN and for WOMEN. The values of the marginals are not shown.]
Calibration in CALMAR: CALMAR is a SAS macro that implements the calibration methods developed by Deville and Särndal (1992). The logit method is used for calibration. The starting interval for the range restriction on the calibration factors is (0.2, 4), and this interval is narrowed through simulations. For districts where convergence is not achieved, the constraints (benchmarks) are revised. In the fourth quarter, convergence could not be achieved for some districts, so some age groups for women had to be merged in those districts. For some districts, household classes by number of members were merged as well.
Calibration in R: ReGenesees (R Evolved Generalized Software for Sampling Estimates and Errors in Surveys) is an R package for design-based and model-assisted analysis of complex sample surveys. The input data set consists of responding individuals. Each person is assigned: indicators of the basic population contingents (employed, unemployed, active/not active, and in/out of the labour force); gender; five-year age group; territory (district); intermediate weight (design weight adjusted for non-response); and calibration variables x1 to x34. Calibration variables x1 to x28 indicate whether the person belongs to a given age group by sex. Calibration variables x29 to x34 hold the reciprocal of the number of persons in the household, so that summing a variable over all members of a household counts that household once.
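The role of the reciprocal variables can be checked with a short Python sketch. The function name, the household identifiers, and the weights are hypothetical. The only thing taken from the slides is the idea that each person carries the reciprocal of the household size in the column of the household's size class.

```python
def household_size_variables(n_members: int, n_classes: int = 6):
    """Person-level values of the six household variables (x29 to x34 on
    the slides): 1 / n_members in the column of the household's size
    class (class 6 stands for 6+ members), 0 elsewhere."""
    size_class = min(n_members, n_classes)
    return [1.0 / n_members if c == size_class else 0.0 for c in range(1, n_classes + 1)]


# Person records as (household id, household size, weight). Members of the
# same household share one weight, as the integration condition requires.
persons = (
    [("A", 2, 150.0)] * 2    # household A: 2 members
    + [("B", 1, 200.0)]      # household B: 1 member
    + [("C", 4, 90.0)] * 4   # household C: 4 members
)

household_totals = [0.0] * 6
for _, n_members, weight in persons:
    for j, value in enumerate(household_size_variables(n_members)):
        household_totals[j] += weight * value

print(household_totals)  # [200.0, 150.0, 0.0, 90.0, 0.0, 0.0]
```

Summing weight times variable over persons reproduces the weighted number of households in each size class, which is why household totals can be imposed in a calibration that runs on person-level data.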
Calibration in R: The domains are districts, and the known totals are supplied in templates. [Slide table: template of known totals for women, with one row per domain (0, 1, 2, ..., 21, 22, 23, 24) and one column per age group, X15 to X28, for ages 0-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74 and 75+.]
Calibration in R: [Slide table: template of known totals for households, with one row per domain (0, 1, 2, ..., 21, 22, 23, 24) and one column per household size class, X29 to X34, for classes 1 to 6.] The ReGenesees package has functions that help examine certain statistics before calibration, e.g. a function that flags cells with a small number of units, which helps in achieving convergence, and a function that suggests an interval for the bounds.
Calibration in R: It has to be specified that the final weights are integrated (the same weight for the household and for each person in the household). In the fourth quarter, as with CALMAR, some age groups or household size classes had to be merged. The results were almost the same: the differences appeared only in the fourth or fifth decimal place.
Conclusion: The programming language R is constantly evolving and offers packages with numerous capabilities. Applying the ReGenesees package to the example described above showed how powerful the calibration method is. Two different approaches were implemented: at the household level in CALMAR and at the person level in ReGenesees. Knowing different software can make some phases of data processing easier. Constant improvement and learning are crucial.
THANK YOU!