Statistical Framework on Privacy and Big Data Analysis in Boston

jsm boston august 8 2014 n.w

Explore the statistical framework on privacy, big data, and the public good discussed at the JSM event in Boston. Key themes include valid inference, differential privacy, and challenges in extracting information from big data. Learn about the importance of data quality, ownership issues, and the need for a privacy measure in the era of big data analysis.

  • Statistical
  • Privacy
  • Big Data
  • Boston
  • JSM


Presentation Transcript


  1. JSM, Boston, August 8, 2014 Privacy, Big Data and The Public Good: Statistical Framework Stefan Bender (IAB)

  2. Water consumption in Berlin during the Final

  3. Content

  4. Key themes: the importance of valid inference and the role of statisticians; a new analytical framework, differential privacy; the inadequacy of current statistical disclosure limitation approaches; possibilities for accessing big data (without harming privacy).

  5. Extracting Information from Big Data (Kreuter/Peng) The challenges of extracting (meaningful) information from big data are similar to those of surveys. There are two main concerns when it comes to extracting information from data: measurement and inference.

  6. Extracting Information from Big Data (Kreuter/Peng) Knowledge of the data-generating process is needed. The Total Survey Error framework is a good starting point, but it needs further development. It is the difference between designed and organic data (Bob Groves) that poses challenges to the extraction of information. Solutions and new challenges: data linkage and information integration.

  7. Access and Linkage (Kreuter/Peng) Access and linkage are essential to understanding data quality and break-downs. They are challenged by different privacy requirements, open issues of ownership, and a lack of trusted third parties. However, they are likely to lead to good data documentation, reproducible research, and transparency.

  8. The Need for a Measure for Privacy (Dwork) Big data mandates a mathematically rigorous theory of privacy, a theory amenable to measuring and minimizing cumulative privacy loss as data are analyzed, re-analyzed, shared, and linked. Nothing is absolutely safe or secure.
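
To make the idea of cumulative privacy loss concrete, here is a minimal sketch of a budget tracker. It assumes basic sequential composition, under which the privacy losses (epsilons) of successive analyses on the same data simply add up. The class name `PrivacyAccountant`, the budget of 1.0, and the per-analysis costs are hypothetical choices for illustration and do not come from the slides.

```python
class PrivacyAccountant:
    """Tracks cumulative privacy loss under basic sequential composition (an assumption)."""

    def __init__(self, budget):
        self.budget = budget   # total epsilon the data steward is willing to spend
        self.spent = 0.0       # epsilon consumed by analyses so far

    def charge(self, epsilon):
        """Record one analysis costing `epsilon`; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.budget + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.budget - self.spent  # remaining budget


accountant = PrivacyAccountant(budget=1.0)  # hypothetical total budget
accountant.charge(0.25)  # first analysis
accountant.charge(0.25)  # re-analysis of the same data
accountant.charge(0.5)   # analysis after linkage with another source
print("cumulative privacy loss:", accountant.spent)
```

Once the budget is spent, any further call to `charge` raises an error, which mirrors the point that every additional analysis, sharing step, or linkage adds to the total loss.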

  9. Differential Privacy (Dwork) A definition of privacy has to take into account that we want to learn useful facts from the data. It does not matter whether you are in the database, because the generalized result affects you either way: this is the idea behind differential privacy. Data usage should be accompanied by publication of the amount of privacy loss, that is, its privacy price. The chosen statistics should be published using differential privacy, together with the privacy losses.
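
One standard way to publish a statistic with differential privacy is the Laplace mechanism; the slides do not name a specific mechanism, so its use here is an assumption. The sketch below answers a counting query (sensitivity 1) by adding Laplace noise with scale 1/epsilon and returns the result together with its privacy price. The function names and the age data are made up for illustration, and Python's `random` module is not suitable for a production release.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return (noisy_count, epsilon): the statistic and its privacy price."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is sensitivity / epsilon = 1 / epsilon.
    noisy_count = true_count + laplace_noise(1.0 / epsilon, rng)
    return noisy_count, epsilon


rng = random.Random(2014)                # fixed seed only to make the sketch repeatable
ages = [23, 35, 41, 29, 62, 57, 38, 44]  # made-up records
noisy, price = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.2f}, privacy price (epsilon): {price}")
```

A smaller epsilon means more noise and a lower privacy price; publishing the pair rather than the count alone follows the recommendation on the slide.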

  10. Releasing Record-level Data (Karr/Reiter) Risky for data subjects and stewards. Data often come from administrative sources and are hence available to others. A large number of variables means everyone is a population unique. Facing the Future 2013
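
The effect of adding variables can be illustrated by counting how many records are unique on a given combination of attributes. This is a toy sketch with invented records and variable names; it counts uniques within the file itself, which only approximates uniqueness in the population.

```python
from collections import Counter


def count_uniques(records, keys):
    """Count records whose combination of values on `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for n in combos.values() if n == 1)


# Invented example records; all variable names are hypothetical.
records = [
    {"sex": "f", "birth_year": 1980, "district": "A", "industry": "retail"},
    {"sex": "f", "birth_year": 1980, "district": "A", "industry": "health"},
    {"sex": "m", "birth_year": 1980, "district": "B", "industry": "retail"},
    {"sex": "m", "birth_year": 1975, "district": "B", "industry": "retail"},
    {"sex": "f", "birth_year": 1975, "district": "B", "industry": "health"},
    {"sex": "m", "birth_year": 1980, "district": "B", "industry": "health"},
]

all_keys = ["sex", "birth_year", "district", "industry"]
for n in range(1, len(all_keys) + 1):
    keys = all_keys[:n]
    print(keys, "->", count_uniques(records, keys), "unique records")
```

In this toy file no record is unique on sex alone, but every record is unique once all four variables are combined.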

  11. Might typical disclosure control methods provide an answer? (Karr/Reiter) Many data stewards alter data before releasing them: they aggregate data, swap records, add noise, and so on. The perturbations are usually kept minor for quality reasons. Typical methods are not likely to be effective: low-intensity perturbations are not protective, and high-intensity perturbations destroy quality.

  12. A Potential Path Forward (Karr/Reiter) An integrated system that includes unrestricted access to highly redacted data (synthetic data), followed by means for approved researchers to access the confidential data via remote access solutions, glued together by verification servers that allow users to assess the quality of their inferences with the redacted data.

  13. We Have the Building Blocks (Karr/Reiter) Synthetic data: the Synthetic Longitudinal Business Database; automated methods based on machine learning. Remote access solutions: the NORC virtual data enclave; virtual machines and protected data networks. Verification servers: not built yet, but we have ideas for quality measures.
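
The slides do not say which quality measures a verification server would report. As an assumption, the sketch below uses confidence-interval overlap, a measure that appears in the statistical disclosure limitation literature: it compares the interval a user obtained from the synthetic data with the interval the server computes on the confidential data and returns a score between 0 and 1. The function name and the example intervals are hypothetical.

```python
def interval_overlap(confidential, synthetic):
    """Average share of each interval covered by their intersection (0 = disjoint, 1 = identical)."""
    lo_c, hi_c = confidential
    lo_s, hi_s = synthetic
    if hi_c <= lo_c or hi_s <= lo_s:
        raise ValueError("each interval must have positive width")
    lo_i, hi_i = max(lo_c, lo_s), min(hi_c, hi_s)
    if hi_i <= lo_i:
        return 0.0  # the intervals do not intersect
    width = hi_i - lo_i
    return 0.5 * (width / (hi_c - lo_c) + width / (hi_s - lo_s))


# Hypothetical 95% intervals for the same regression coefficient.
score = interval_overlap(confidential=(1.0, 3.0), synthetic=(2.0, 4.0))
print("interval overlap:", score)
```

Returning only such a score, rather than the confidential estimate itself, lets users judge whether their synthetic-data inference is trustworthy; any real server would also have to account for the privacy cost of the score.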

  14. Access to Big Data for Research Data access and the combination of data sources are needed (Kreuter/Peng). Ideal scenario: the data are held by a trusted or trustworthy curator; the data remain secret, the responses are published. Cryptography helps to get close to the ideal scenario (Dwork). Walled gardens (Stodden). The New Deal on Data (Greenwood et al.).

  15. My Conclusion Blend big data with survey-based and official data. Use the RDC structure for access to big data or combined data. No more hands-on work with the data. Many topics need discussion: informed consent, non-participation, inference, privacy. Main issues: data protection, access, and trust. We have to be more active in the public discussion, because big data is affecting our daily work!

  16. Forthcoming: Privacy, Big Data, and the Public Good: Frameworks for Engagement. Edited by Julia Lane (American Institutes for Research, Washington DC), Victoria Stodden (Columbia University), Stefan Bender (Institute for Employment Research of the German Federal Employment Agency), and Helen Nissenbaum (New York University).
     Contact: Stefan Bender, stefan.bender@iab.de, http://fdz.iab.de/en.aspx, www.iab.de
     Massive amounts of data on human beings can now be analyzed. Pragmatic purposes abound, including selling goods and services, winning political campaigns, and identifying possible terrorists. Yet big data can also be harnessed to serve the public good: scientists can use big data to do research that improves the lives of human beings, improves government services, and reduces taxpayer costs. In order to achieve this goal, researchers must have access to this data, raising important privacy questions. What are the ethical and legal requirements? What are the rules of engagement? What are the best ways to provide access while also protecting confidentiality? Are there reasonable mechanisms to compensate citizens for privacy loss?
     The goal of this book is to answer some of these questions. The book's authors paint an intellectual landscape that includes legal, economic, and statistical frameworks. The authors also identify new practical approaches that simultaneously maximize the utility of data access while minimizing information risk.
     Contributors: Katherine J. Strandburg; Solon Barocas and Helen Nissenbaum; Alessandro Acquisti; Paul Ohm; Victoria Stodden; Steven E. Koonin and Michael J. Holland; Robert M. Goerge; Peter Elias; Daniel Greenwood, Arkadiusz Stopczynski, Brian Sweatt, Thomas Hardjono, and Alex Pentland; Carl Landwehr; John Wilbanks; Frauke Kreuter and Roger Peng; Alan F. Karr and Jerome P. Reiter; Cynthia Dwork.
     Order today: visit www.cambridge.org/9781107637689 or call 1.800.872.7423. 20% discount promo code: F4LANE. www.dataprivacybook.org
