
Understanding Research Data Centers for Academic Researchers
Explore the benefits of Research Data Centers (RDCs) for academic researchers, detailing the available data, access procedures, and reasons to invest time in these secure facilities. Learn about the partnerships between RDCs, academic researchers, and the Census Bureau. Discover who can work in an RDC, why Census restricts microdata access, and the differences between restricted and public demographic data.
Presentation Transcript
Proposal Development in Federal Statistical Research Data Centers (RDCs) Bethany S. DeSalvo, PhD Federal Statistical Research Data Centers, Texas Center for Economic Studies United States Census Bureau Any opinions and conclusions expressed herein are those of the author and do not necessarily represent the views of the U.S. Census Bureau.
Outline What are RDCs, who can work in them, and why would I want to invest my time? What data are available? How do I access these data? Questions.
What are Research Data Centers (RDCs)? RDCs provide secure access to restricted data to qualified researchers with approved research projects. RDCs are restricted-access federal facilities, staffed by a Census Bureau employee, which meet all relevant security requirements. RDCs are a partnership between the local institution, the US Census Bureau and other federal statistical agencies.
RDCs as partnerships For academic researchers: RDCs provide access to a huge corpus of restricted data, support cutting-edge research, and attract and retain data-intensive faculty. For the Census Bureau: RDCs extend the pool of expertise on substantive, methodological, and statistical issues.
Who can work in an RDC? Researchers with an approved project, including: faculty and other researchers; graduate students working with advisors; foreign nationals with 3 years in the United States.
Why Is Census Required to Restrict Microdata Access? Title 13 (Census) and Title 26 (IRS) of the U.S.C., together with CIPSEA, protect confidentiality so that: the respondent cannot be identified; only Census employees and temporary staff can access microdata; access must potentially provide legitimate benefits to Census Bureau programs.
Demographic data: Restricted versus Public Compared with public files, restricted data offer: more geographic detail; additional variables; more observations; variables not censored (e.g., income is not top-coded); additional detail within variables.
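To illustrate why uncensored variables matter, here is a minimal sketch that applies a top-code to a handful of made-up incomes and compares the resulting means. The income values and the 150,000 cap are hypothetical, chosen only for illustration; they are not actual Census Bureau top-code values.

```python
from statistics import mean


def top_code(values, cap):
    """Replace every value above `cap` with `cap` (a simple form of censoring)."""
    return [min(v, cap) for v in values]


# Hypothetical incomes and cap, for illustration only (not real Census values).
incomes = [30_000, 45_000, 60_000, 250_000, 900_000]
CAP = 150_000

public = top_code(incomes, CAP)
true_mean = mean(incomes)    # what an uncensored (restricted) file would give
public_mean = mean(public)   # what a top-coded (public) file would give

print(f"uncensored mean: {true_mean:,.0f}")
print(f"top-coded mean:  {public_mean:,.0f}")
```

In this toy example the top-coded mean is far below the uncensored mean, which is the kind of distortion that access to non-censored variables avoids.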
Person Identification Validation System (PVS) PVS assigns 9-digit, unique identifiers called Protected Identification Keys (PIKs) to surveys and decennial data via probabilistic matching techniques. PIKs are used to facilitate record linkage. Once PIKed, data can be linked to any other data processed through PVS. Match keys include: full address, full name, full date of birth, and SSN if available.
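Once two files carry PIKs, linking them reduces to a join on that key. The sketch below shows only that final join step, not the probabilistic matching PVS itself performs. The field names (`pik`, `age`, `wages`), the sample records, and the helper `link_on_pik` are all hypothetical, and the sketch assumes each PIK appears at most once in the second file.

```python
def link_on_pik(survey_rows, admin_rows):
    """Inner-join two lists of dicts on the hypothetical 'pik' field.

    Rows without a PIK (pik is None) are skipped, since they cannot be linked.
    Assumes each PIK appears at most once in `admin_rows`.
    """
    admin_by_pik = {row["pik"]: row for row in admin_rows if row["pik"] is not None}
    linked = []
    for row in survey_rows:
        if row["pik"] is None:
            continue  # no identifier was assigned, so no link is possible
        match = admin_by_pik.get(row["pik"])
        if match is not None:
            linked.append({**row, **match})
    return linked


# Made-up records for illustration only.
survey = [
    {"pik": "000000001", "age": 34},
    {"pik": "000000002", "age": 51},
    {"pik": None, "age": 29},
]
admin = [
    {"pik": "000000001", "wages": 52_000},
    {"pik": "000000003", "wages": 41_000},
]

linked = link_on_pik(survey, admin)
print(linked)
```

Only the record whose PIK appears in both files is returned; records with no PIK or with no counterpart drop out, which is why match rates matter when planning a linked sample.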
Data Available Decennial Censuses 1950-2010: full count long and short form census data (when possible). Household and individual level demographic, socio-economic, program participation, education, household characteristics, etc. Yearly ACS (American Community Survey): 2006-2015 (full), 2000-2005 (small, no GQ), 1996-1999 (limited); 1.5% of US population.
Data Available Current Population Survey Supplements: ASEC (Annual Social and Economic Supplement) or March 1967-2015; Fertility Supplement (1998-2012), Food Security (2001-2012), School enrollment (2004-2014), Tobacco Use (1998-2011), Unbanked (2009-2013), Volunteer (2002-2015), Voter Reg (1998-2012). American Housing Survey: Some years from 1984-2015; ~50,000 households per year. Core questions: Home condition, occupant characteristics, home improvements, housing costs, home values, characteristics of recent movers, etc. Topical questions vary by year.
Data Available Survey of Income and Program Participation: 2-4 year household panels; interviews ~every 4 months; 1984-2014; 14,000 to 52,000 households each wave. Core: labor force, income dynamics, government transfers. Topical modules vary. National Crime Victimization Survey: Yearly 2006-2014; ~90,000 households. Non-fatal and property crimes, reported and unreported; demographic information for respondent; demographic information of perpetrator; experience with the criminal justice system.
Data Available National Longitudinal Mortality Study CPS-ASEC data linked to national death index CPS cohorts 1973-1998 National Longitudinal Survey (NLS) Original cohorts (1966, 1968) Labor market, demographic, and other data collected over 20 years ~5,000 respondents per cohort
Economic Data Advantages Establishment and firm level characteristics Detailed industry and geography Linking Data Consistent identifiers Business register Outside data
Economic Censuses Data Set Census of Auxiliaries (AUX) Census of Construction Industries (CCN) Census of Finance, Insurance, Real Estate (CFI) Census of Manufactures (CMF) Census of Mining (CMI) Census of Retail Trade (CRT) Census of Services (CSR) Census of Transportation, Communications, Utilities (CUT) Census of Wholesale Trade (CWH)
Establishment Surveys Data Set Annual Survey of Manufactures (ASM) Current Industrial Reports (CIR) Manufacturing Energy Consumption Survey (MECS) Medical Expenditure Panel Survey Insurance Component (MEPS-IC) National Employer Survey (NES) Quarterly Survey of Plant Capacity Utilization (QPC) Survey of Manufacturing Technology (SMT) Survey of Plant Capacity Utilization (PCU) Survey of Pollution Abatement Costs and Expenditures (PACE)
Firm Surveys Data Set Annual Capital Expenditures Survey (ACES) Annual Retail Trade Survey (ARTS) Business Expenditures Survey (BES) Business Research & Development and Innovation Survey (BRDIS) Enterprise Summary Report (ESR) Exporter Database (EDB) Quarterly Financial Report (QFR) Service Annual Survey (SAS) Survey of Business Owners (SBO) Survey of Industrial Research and Development (SIRD)
Business Register Data Data Set Compustat-SSEL Bridge (CSB) Form 5500 Bridge File Integrated Longitudinal Business Database (ILBD) Longitudinal Business Database (LBD) Ownership Change Database (OCD) Standard Statistical Establishment List / Business Register (SSEL)
Transactions Data Data Set Commodity Flow Survey (CFS) Foreign Trade Data - Export (EXP) Foreign Trade Data - Import (IMP) Longitudinal Foreign Trade Transactions Data (LFTTD)
Longitudinal Employer-Household Dynamics (LEHD) LEHD data combine administrative data from states' Unemployment Insurance systems with Census Bureau data. 1. Workers: Employer history and quarterly wages, Individual characteristics (sex, age, race), Point in time residence and place of birth 2. Employers: Industry, employment, total payroll, location 3. Linkages between workers and employers 4. Links to other Census data: Virtually any RDC data on businesses; SIPP; CPS March supplement; ACS
Longitudinal Employer-Household Dynamics (LEHD) [Diagram: linkages among the LEHD Employer, Worker, and Jobs files. Labels shown: U2W, LBD, BRB, ECF, EHF, ES202, SSEL, ICF, CPS, SIPP, ACS]
Recovered data Tapes from Unisys mainframe were recovered, providing data back to 1953 on all sectors of the economy. See: Newly Recovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s: Some Early Glimpses. CES Discussion Paper CES-WP-11-29. Recovered demographic data: CPS data back to 1962; Income Surveys Development Program data (old SIPP); others.
Health & Human Services (HHS) Restricted Data: NCHS additional variables more detailed geography continuous/non top-coded variables Some data can be linked to: Mortality files Social Security files Medicare/Medicaid files Air quality files (indirect match by detailed geography)
HHS Restricted Data: NCHS Health Status Surveys National Health and Nutrition Examination Survey (NHANES) National Health Interview Survey (NHIS) National Health Interview Disability Survey National Immunization Survey Longitudinal Study on Aging National Survey of Family Growth National Maternal and Infant Health Survey See the complete list with descriptions at http://www.cdc.gov/rdc/
HHS Restricted Data: AHRQ data Medical Expenditure Panel Survey (MEPS), Household Component collects nationally representative data on demographic characteristics, health conditions, health status, use of medical care services, charges and payments, access to care, satisfaction with care, health insurance coverage, income, and employment. Restricted Variables: Geographic detail; state identifiers Fully specified ICD-9 codes Asset data Imputed NDC for prescription drugs Some medical provider data
Important Web Sites Census Bureau Data: Center for Economic Studies http://www.census.gov/ces/ NCHS Research Data Center http://www.cdc.gov/rdc/ AHRQ https://meps.ahrq.gov/data_stats/onsite_datacenter.jsp
Background Check Off-line paperwork and documentation On-line trainings and certifications Background check Submitted online and followed with interview Residential history Foreign travel Education and employment history References Fingerprinting
Proposal development for projects requesting access to Census Bureau data.
Special Sworn Status SSS is authorized by Title 13 U.S.C. 23 (c) "to assist the Bureau of the Census in performing the work authorized by this title." The Census Bureau may provide SSS to an individual When an individual has expertise or specialized knowledge that can contribute to the accomplishment of Census Bureau projects or activities or engages in a joint project with the Census Bureau; When an individual is employed by an agency/organization performing a service for the Census Bureau under contract or providing information to the Census Bureau for statistical purposes; When Federal law requires an individual to audit, inspect, or investigate Census Bureau activities.
Writing the proposal: perspective The perspective of your proposal is driven toward the predominant purpose, or the Census Bureau benefit. Your audience includes mostly data experts. Your proposal is a request for data showing that your project: has 2 possible benefits to the Census Bureau; is feasible; emphasizes statistical models vs. tabular output; has scientific merit; clearly needs restricted use data; falls within the Census Bureau mandate; indicates an understanding of the appropriate disclosure avoidance protections.
Proposal Package 1. Abstract 2. Proposal Description 3. Benefit to the Census Bureau (Predominant Purpose Statement/PPS)
Description 15-25 pages Sections: Introduction Background / Literature Review Data & Methods Output / Disclosure Risk Papers needs to be thought out thoroughly during proposal process / before data are released Timeline / Project Duration Conclusion
Description Be clear about the importance of using restricted use data. What is your sample? Research question, hypotheses, variables, expected outcome, models, sample information, how data will be linked should be described Describe empirical methodology, including equations Clarify the relationship between your specifications and the data Show you have a feasible plan but leave room for movement.
Output / Disclosure Avoidance Review No output can leave the RDC without review Clear understanding of samples No individual person or business can be identifiable in release Performed by Administrator and the Center for Disclosure Avoidance Review 2-3 weeks (in general) Intermediate output discouraged Descriptive results may be problematic Focus on statistical data for release
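One generic way to anticipate why descriptive tables can be problematic is to check cell counts before requesting release. The sketch below flags table cells with few observations. The threshold `MIN_CELL_SIZE`, the helper `small_cells`, and the field names are hypothetical illustrations; the actual review criteria are set by the Census Bureau and are not described in these slides.

```python
from collections import Counter

# Hypothetical threshold for illustration only; not an official disclosure rule.
MIN_CELL_SIZE = 10


def small_cells(records, keys, min_size=MIN_CELL_SIZE):
    """Count records in each cell defined by `keys` and return the cells
    whose count falls below `min_size`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_size}


# Made-up records: 12 in one state-by-industry cell, 3 in another.
records = (
    [{"state": "TX", "industry": "retail"}] * 12
    + [{"state": "TX", "industry": "mining"}] * 3
)

flagged = small_cells(records, keys=("state", "industry"))
print(flagged)
```

A cell flagged this way would be a candidate for coarsening or dropping from planned output, which is one reason to think through samples and tables during proposal development.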
Timeline List of major milestones When will you complete the matching of datasets, construction of extracts, etc. How do you expect the project to unfold When will you request disclosure Extensions often not granted
Conclusion Upon completion of the project, we will include a report describing how the research project met the Title 13, Chapter 5 requirement. We will also provide all programs, outputs, and findings to the Census Bureau and submit a technical paper to the Working Paper Series.
Benefits to the Census Bureau Predominant Purpose Statement Not a pro forma requirement Legal basis on which researchers are allowed access to restricted use data Must provide 2 benefits
Benefits
1. Evaluating concepts and practices underlying Census Bureau statistical data collection and dissemination practices, including consideration of continued relevance and appropriateness of past Census Bureau procedures to changing economic and social circumstances;
2. Analyzing demographic and social or economic processes that affect Census Bureau programs, especially those that evaluate or hold promise of improving the quality of products issued by the Census Bureau;
3. Developing means of increasing the utility of Census Bureau data for analyzing public programs, public policy, and/or demographic, economic, or social conditions;
4. Conducting or facilitating census and survey data collection, processing or dissemination, including through activities such as administrative support, information technology support, program oversight, or auditing under appropriate legal authority;
5. Understanding and/or improving the quality of data produced through a Title 13, Chapter 5 survey, census, or estimate;
6. Leading to new or improved methodology to collect, measure, or tabulate a Title 13, Chapter 5 survey, census, or estimate;
7. Enhancing the data collected in a Title 13, Chapter 5 survey or census. For example: improving imputations for non-response; developing links across time or entities for data gathered in censuses and surveys authorized by Title 13, Chapter 5;
8. Identifying the limitations of, or improving, the underlying Business Register, Master Address File, and industrial and geographical classification schemes used to collect the data;
9. Identifying shortcomings of current data collection programs and/or documenting new data collection needs;
10. Constructing, verifying, or improving the sampling frame for a census or survey authorized under Title 13, Chapter 5;
11. Preparing estimates of population and characteristics of population as authorized under Title 13, Chapter 5;
12. Developing a methodology for estimating non-response to a census or survey authorized under Title 13, Chapter 5;
13. Developing statistical weights for a survey authorized under Title 13, Chapter 5.
Approval Process Step 1: Approval from RDC Step 2: CES approval Step 3: Sponsoring agency approval Step 4: Background check
Timeframe Census Data: Plan on 9 to 12 months from submission; Title 13 (Census approval only) vs. Title 26 (Census & IRS approval). NCHS/AHRQ Data: Timeline dependent on agency approval process; Census approval NOT required. Special Sworn Status: 2-3+ additional months for your security clearance; this time runs concurrently with sponsoring agency review.
How you can speed up the process: Adhere closely to all practices and procedures before proposal submission. Work closely with the local RDC on proposal development and on any requested revisions or clarifications. Provide the terms of use for any datasets you wish to bring to the lab. Process Special Sworn Status (SSS) paperwork quickly.
The Nuts & Bolts of Doing Research in an RDC Research conducted on site. Computing environment: restricted area with badge access; no internet, phones or personal computers allowed in lab; no paper or output allowed outside of lab. Disclosure Avoidance review required to present results; discussion of specific results allowed only inside RDC (even among co-authors).
Discussion papers, reference papers, data introductions
Business Register: DeSalvo, Bethany, Frank Limehouse, and Shawn D. Klimek. Documenting the Business Register and Related Economic Business Data. US Census Bureau Center for Economic Studies Paper No. CES-WP-16-17 (2016).
Patents and Firms: Graham, Stuart JH, et al. Business Dynamics of Innovating Firms: Linking US Patents with Administrative Data on Workers and Firms. Georgia Tech Scheller College of Business Research Paper 30 (2015). Kerr, William, and Shihe Fu. The Industry R & D Survey: Patent Database Link Project. Center for Economic Studies, US Department of Commerce, Bureau of the Census, 2006. An Algorithmic Links with Probabilities Crosswalk for USPC and CPC Patent Classifications with an Application Towards Industrial Technology Composition.
The Longitudinal Business Database (LBD): Jarmin, Ron S., and Javier Miranda. The longitudinal business database. Available at SSRN 2128793 (2002).
Geography and Demography: Davis, James C., and Brian P. Holly. Regional analysis using Census Bureau microdata at the center for economic studies. International Regional Science Review 29.3 (2006): 278-296.
Longitudinal Employer-Household Dynamics (LEHD): Vilhuber, Lars, and Kevin McKinney. LEHD Infrastructure files in the Census RDC-Overview. No. 14-26. 2014. Goetz, Christopher, et al. The Promise and Potential of Linked Employer-Employee Data for Entrepreneurship Research. No. w21639. National Bureau of Economic Research, 2015.
Annual Survey of Entrepreneurs: Foster, Lucia, and Patrice Norman. The Annual Survey of Entrepreneurs: An Introduction. US Census Bureau Center for Economic Studies Paper No. CES-WP-15-40 (2015).
Thank you. Bethany DeSalvo, PhD Federal Statistical Research Data Center, Texas Center for Economic Studies US Census Bureau Bethany.DeSalvo@census.gov 979-845-5618