Artificial data in social science

Artificial data in social science, first proposed in 1993 and often called synthetic data, is made-up data that mimics real data. It can reduce privacy concerns, bolster small datasets, and support developing and training models. Python tools for creating it include the Synthetic Data Vault, while in R the synthpop package synthesises variables with methods such as logistic models and CART. The Synthetic Data Vault study reports no statistically significant difference between results obtained on synthetic and control data.

Uploaded on Feb 17, 2025


Presentation Transcript


  1. Artificial data in social science. Richard Skeggs, rskeggs@essex.ac.uk, @rickskeggs. www.BLGdataresearch.org, @BLGDataResearch

  2. What is Artificial Data? First proposed in 1993. Often referred to as synthetic data. In effect, made-up data that mimics real data.

  3. Why Use Artificial Data? Reduces privacy concerns. Bolsters small datasets for models. Used for developing and training a model. Can be cheaper to obtain.

  4. Creating Artificial Data. Python synthetic data tools: Synthetic Data Vault; Python & NumPy. R synthetic data tools: synthpop.

  5. Synthetic Data Vault. Patki, N., R. Wedge, and K. Veeramachaneni. "The Synthetic Data Vault." In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 399-410, 2016. https://doi.org/10.1109/DSAA.2016.49.

  6. Synthetic Data Vault. 34 data scientists were given datasets. No statistically significant difference was found between results on synthetic and control data.
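One way to check for "no statistically significant difference" between two groups of results is a permutation test on the difference in means. The sketch below is only an illustration of that idea and is not the analysis used in the Synthetic Data Vault study; the function name and the accuracy scores are hypothetical, made up for this example.

```python
import random

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Reassign the pooled scores to the two groups at random.
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical model accuracies, invented for illustration only.
control_scores = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73]
synthetic_scores = [0.70, 0.69, 0.72, 0.71, 0.67, 0.74]

p_value = permutation_p_value(control_scores, synthetic_scores)
print("p-value:", p_value)
```

A large p-value here means the observed gap between the groups is of a size that random relabelling produces often, so the data give no evidence of a difference.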

  7. Synthetic Data in Python. Trumania: creates test data based on a scenario. Synthetic-data-generator: generates random data that follows a uniform distribution.
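Generating random data that follows a uniform distribution can be sketched with the Python standard library alone. This is a minimal illustration of the idea, not the API of Trumania or Synthetic-data-generator; the function name and the column ranges are hypothetical.

```python
import random

def uniform_synthetic_rows(n_rows, columns, seed=None):
    """Return n_rows dicts; each value is drawn uniformly from its column's (low, high) range."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in columns.items()}
        for _ in range(n_rows)
    ]

# Hypothetical column ranges, chosen only for illustration.
columns = {"age": (18.0, 90.0), "income": (0.0, 100000.0)}

rows = uniform_synthetic_rows(5, columns, seed=42)
for row in rows:
    print(row)
```

Each column is drawn independently here, so relationships between variables in any real dataset are not preserved. Model-based tools such as the Synthetic Data Vault and synthpop aim to keep those relationships.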

  8. Synthetic Data in R. The synthpop R package. Uses a logistic model to predict variables based on the real data. Models are built recursively, one variable at a time, with each variable predicted from those before it.
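The idea of predicting one variable from another with a logistic model, then building the synthetic data up variable by variable, can be sketched as follows. This is a minimal sketch under stated assumptions, not synthpop's implementation: the first variable is resampled from the observed values, the second is drawn from a fitted logistic model, and all names and data values are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def synthesise(xs, ys, n, seed=None):
    """Build synthetic (x, y) pairs one variable at a time."""
    rng = random.Random(seed)
    b0, b1 = fit_logistic(xs, ys)
    synthetic = []
    for _ in range(n):
        x = rng.choice(xs)                 # first variable: resample observed values
        p = sigmoid(b0 + b1 * x)           # second variable: predicted from the first
        y = 1 if rng.random() < p else 0   # draw the binary outcome from the model
        synthetic.append((x, y))
    return synthetic

# Hypothetical "real" data: a numeric predictor and a binary outcome.
real_x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
real_y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

synthetic_pairs = synthesise(real_x, real_y, n=20, seed=1)
print(synthetic_pairs[:5])
```

Drawing the outcome from the predicted probability, rather than rounding it, keeps the randomness of the real data in the synthetic records.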

  9. SynthPop. Variables are synthesised using CART (classification and regression trees). The model is created by binary recursive partitioning. Missing values are modelled. A model is applied to each column.
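Binary recursive partitioning and its use for synthesis can be sketched in plain Python. This is a simplified illustration, not synthpop's code: it assumes numeric variables, grows a regression tree by minimising the sum of squared errors, and draws each synthetic value from the observed values in the matching leaf. Missing values are not handled, and all names and data values are hypothetical.

```python
import random

def sse(rows, target):
    """Sum of squared deviations of the target from its mean."""
    ys = [r[target] for r in rows]
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def build_tree(rows, target, features, min_leaf=3):
    """Grow a regression tree by binary recursive partitioning."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in rows if r[f] <= cut]
            right = [r for r in rows if r[f] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left, target) + sse(right, target)
            if best is None or score < best[0]:
                best = (score, f, cut, left, right)
    # Stop when no allowed split reduces the error: keep the leaf's observed values.
    if best is None or best[0] >= sse(rows, target):
        return {"donors": [r[target] for r in rows]}
    _, f, cut, left, right = best
    return {"feature": f, "cut": cut,
            "left": build_tree(left, target, features, min_leaf),
            "right": build_tree(right, target, features, min_leaf)}

def draw(tree, row, rng):
    """Follow the splits to a leaf, then sample one observed value from it."""
    while "donors" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["cut"] else tree["right"]
    return rng.choice(tree["donors"])

def synthesise(rows, first, second, n, seed=None):
    """Synthesise `first` by resampling, then `second` from a tree fitted on `first`."""
    rng = random.Random(seed)
    tree = build_tree(rows, second, [first])
    out = []
    for _ in range(n):
        new = {first: rng.choice(rows)[first]}
        new[second] = draw(tree, new, rng)
        out.append(new)
    return out

# Hypothetical measurements forming two clusters, for illustration only.
real_rows = [
    {"length": 1.3, "width": 0.2}, {"length": 1.4, "width": 0.2},
    {"length": 1.5, "width": 0.3}, {"length": 1.5, "width": 0.4},
    {"length": 1.6, "width": 0.2}, {"length": 1.7, "width": 0.4},
    {"length": 4.0, "width": 1.3}, {"length": 4.4, "width": 1.4},
    {"length": 4.5, "width": 1.5}, {"length": 4.7, "width": 1.4},
    {"length": 4.9, "width": 1.5}, {"length": 5.1, "width": 1.8},
]

synthetic_rows = synthesise(real_rows, "length", "width", n=12, seed=7)
for row in synthetic_rows:
    print(row)
```

Because synthetic values are drawn from leaves, every synthetic width is a value that was actually observed for records with a similar length.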

  10. SynthPop: syn(data, visit.sequence, k)
      Usage:
      data: a data frame or matrix containing the original data.
      visit.sequence: column indices specifying the order of synthesis.
      k: size of the synthetic data (optional).
      method: method used for synthesising, CART by default (optional).
      Example:
          syn.data <- syn(orig.data[c(7:28),], visit.sequence=c(7:28), k=39)

  11. SynthPop. Administrative Data Research Centre Scotland | Chris Dibben | 20 June 2014

  12. SynthPop. "synthpop: Bespoke Creation of Synthetic Data in R." Beata Nowok, Gillian M. Raab, Chris Dibben. Published 2015.

  13. SynthPop
          # Install synthpop if it is missing, then load it.
          if (!require(synthpop)) { install.packages("synthpop") }
          library(synthpop)
          # visit.sequence indexes columns, and iris has 5 columns.
          syn.data <- syn(iris, visit.sequence = c(1:5), k = 150, proper = TRUE)
          # syn.data$syn already holds the synthetic data frame.
          synth.data <- syn.data$syn
          View(synth.data)
          plot(synth.data$Petal.Length, synth.data$Petal.Width)

  14. SynthPop. Figure: Original Data and Synthetic Data.

  15. SynthPop

  16. SynthPop
          syn.data <- syn(iris, visit.sequence = c(1:5), k = 150, proper = TRUE)

  17. SynthPop

  18. SynthPop
          syn.data <- syn(iris, method = "parametric", visit.sequence = c(1:5),
                          k = 150, proper = TRUE,
                          default.method = c("normrank", "logreg", "polyreg", "polr"))

  19. SynthPop

  20. SynthPop
          syn.data <- syn(iris, method = "parametric", visit.sequence = c(1:5),
                          k = 150, proper = TRUE,
                          default.method = c("normrank", "logreg", "polyreg", "polr"),
                          seed = 6000)
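Fixing a seed makes a random synthesis repeatable. The sketch below shows the general principle with Python's standard library; it is not a translation of synthpop's seed handling, and the function name and values are hypothetical.

```python
import random

def resample(values, n, seed):
    """Draw n values with replacement using a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(n)]

# Hypothetical measurements, for illustration only.
observed = [4.7, 5.1, 1.4, 1.5, 4.5, 1.3]

run_a = resample(observed, 10, seed=6000)
run_b = resample(observed, 10, seed=6000)

print(run_a)
print(run_a == run_b)  # True: the same seed reproduces the same draws
```

Recording the seed alongside the synthetic dataset lets others regenerate exactly the same records from the same original data.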

  21. SynthPop

  22. Any Questions? Richard Skeggs, rskeggs@essex.ac.uk, 5 April 2018
