Package: synthpop 1.8-0

synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression trees (CART). Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. For a description of the implemented method see Nowok, Raab and Dibben (2016) <doi:10.18637/jss.v074.i11>.
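
The workflow described above can be sketched roughly as follows. This is a minimal illustration rather than an excerpt from the package documentation: the four SD2011 columns, the seed value, and the object names ods, sds and sds_par are assumptions chosen for the example.

```r
library(synthpop)

# Observed (original) data: a few columns of the bundled SD2011 data set.
# The column selection is an arbitrary choice for this sketch.
ods <- SD2011[, c("sex", "age", "edu", "income")]

# Largely automated synthesis with default settings (method = "cart").
# seed makes the run reproducible; print.flag = FALSE silences progress output.
sds <- syn(ods, seed = 2024, print.flag = FALSE)

# User-chosen methods: "parametric" asks for the default parametric
# method for each variable type instead of CART.
sds_par <- syn(ods, method = "parametric", seed = 2024, print.flag = FALSE)

# With a single synthesis (m = 1) the synthetic data frame is stored in $syn.
head(sds$syn)
```

The synthetic data frame has the same variables as the observed one, so it can be passed to the same exploratory code that would otherwise be run on the original data.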

Authors: Beata Nowok [aut, cre], Gillian M Raab [aut], Chris Dibben [ctb], Joshua Snoke [ctb], Caspar van Lissa [ctb]

synthpop_1.8-0.tar.gz
synthpop_1.8-0.zip (r-4.5), synthpop_1.8-0.zip (r-4.4), synthpop_1.8-0.zip (r-4.3)
synthpop_1.8-0.tgz (r-4.4-any), synthpop_1.8-0.tgz (r-4.3-any)
synthpop_1.8-0.tar.gz (r-4.5-noble), synthpop_1.8-0.tar.gz (r-4.4-noble)
synthpop_1.8-0.tgz (r-4.4-emscripten), synthpop_1.8-0.tgz (r-4.3-emscripten)
synthpop.pdf | synthpop.html
synthpop/json (API)
NEWS

# Install 'synthpop' in R:
install.packages('synthpop', repos = c('https://bnowok.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/bnowok/synthpop/issues

Datasets:
  • SD2011 - Social Diagnosis 2011 - Objective and Subjective Quality of Life in Poland

On CRAN:

7.72 score, 40 stars, 450 scripts, 1.6k downloads, 6 mentions, 59 exports, 65 dependencies

Last updated 2 years ago from: 0d013c5cfd. Checks: OK: 5, NOTE: 2. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Nov 20 2024
R-4.5-win        NOTE    Nov 20 2024
R-4.5-linux      NOTE    Nov 20 2024
R-4.4-win        OK      Nov 20 2024
R-4.4-mac        OK      Nov 20 2024
R-4.3-win        OK      Nov 20 2024
R-4.3-mac        OK      Nov 20 2024

Exports: codebook.syn compare compare.data.frame compare.fit.synds compare.list compare.synds glm.synds lm.synds multi.compare multinom.synds numtocat.syn polr.synds print.compare.fit.synds print.fit.synds print.summary.fit.synds print.summary.synds print.synds read.obs replicated.uniques sdc summary.fit.synds summary.synds syn syn.bag syn.cart syn.catall syn.ctree syn.cubertnorm syn.ipf syn.lognorm syn.logreg syn.nested syn.norm syn.normrank syn.passive syn.pmm syn.polr syn.polyreg syn.ranger syn.rf syn.sample syn.satcat syn.smooth syn.sqrtnorm syn.strata syn.survctree utility.gen utility.gen.data.frame utility.gen.list utility.gen.synds utility.tab utility.tab.data.frame utility.tab.list utility.tab.synds utility.tables utility.tables.data.frame utility.tables.list utility.tables.synds write.syn

Dependencies: broman class classInt cli cmm codetools coin colorspace e1071 fansi farver foreign ggplot2 glue gtable isoband KernSmooth labeling lattice libcoin lifecycle magrittr MASS Matrix matrixStats mgcv mipfp modeltools multcomp munsell mvtnorm nlme nnet numDeriv party pillar pkgconfig plyr polspline proto proxy R6 randomForest ranger RColorBrewer Rcpp RcppEigen rlang rmutil rpart Rsolnp sandwich scales stringi stringr strucchange survival TH.data tibble truncnorm utf8 vctrs viridisLite withr zoo

Inference in synthpop

Rendered from inference.Rnw using utils::Sweave on Nov 20 2024.

Last update: 2021-11-29
Started: 2017-11-21

Using synthpop

Rendered from synthpop.Rnw using utils::Sweave on Nov 20 2024.

Last update: 2021-11-29
Started: 2017-05-24

Utility

Rendered from utility.Rnw using utils::Sweave on Nov 20 2024.

Last update: 2021-11-29
Started: 2021-11-17

Readme and manuals

Help Manual

Help page | Topics
Generating synthetic versions of sensitive microdata for statistical disclosure control | synthpop-package synthpop
Makes a codebook from a data frame | codebook.syn
Comparison of synthesised and observed data | compare
Compare model estimates based on synthesised and observed data | compare.fit.synds print.compare.fit.synds
Compare univariate distributions of synthesised and observed data | compare.data.frame compare.list compare.synds print.compare.synds
Fitting (generalized) linear models to synthetic data | glm.synds lm.synds print.fit.synds
Multivariate comparison of synthesised and observed data | multi.compare
Fitting multinomial models to synthetic data | multinom.synds
Group numeric variables before synthesis | numtocat.syn
Fitting ordered logistic models to synthetic data | polr.synds
Importing original data sets from external files | read.obs
Replications in synthetic data | replicated.uniques
Social Diagnosis 2011 - Objective and Subjective Quality of Life in Poland | SD2011
Tools for statistical disclosure control (sdc) | sdc
Inference from synthetic data | print.summary.fit.synds summary.fit.synds
Synthetic data object summaries | print.summary.synds summary.synds
Generating synthetic data sets | print.synds syn syn.strata
Synthesis with bagging | syn.bag
Synthesis of a group of categorical variables from a saturated model | syn.catall
Synthesis with classification and regression trees (CART) | syn.cart syn.ctree
Synthesis of a group of categorical variables by iterative proportional fitting | syn.ipf
Synthesis by linear regression after transformation of a dependent variable | syn.cubertnorm syn.lognorm syn.sqrtnorm
Synthesis by logistic regression | syn.logreg
Synthesis for a variable nested within another variable. | syn.nested
Synthesis by linear regression | syn.norm
Synthesis by normal linear regression preserving the marginal distribution | syn.normrank
Passive synthesis | syn.passive
Synthesis by predictive mean matching | syn.pmm
Synthesis by ordered polytomous regression | syn.polr
Synthesis by unordered polytomous regression | syn.polyreg
Synthesis with a fast implementation of random forests | syn.ranger
Synthesis with random forest | syn.rf
Synthesis by simple random sampling | syn.sample
Synthesis from a saturated model based on all combinations of the predictor variables. | syn.satcat
syn.smooth | syn.smooth
Synthesis of survival time by classification and regression trees (CART) | syn.survctree
Distributional comparison of synthesised and observed data | print.utility.gen utility.gen utility.gen.data.frame utility.gen.list utility.gen.synds
Tabular utility | print.utility.tab utility.tab utility.tab.data.frame utility.tab.list utility.tab.synds
Tables and plots of utility measures | print.utility.tables utility.tables utility.tables.data.frame utility.tables.list utility.tables.synds
Exporting synthetic data sets to external files | write.syn