Package: synthpop 1.9-3

synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

A tool for producing synthetic versions of microdata containing confidential information so that they are safe to be released to users for exploratory analysis. The key objective of generating synthetic data is to replace sensitive original values with synthetic ones while causing minimal distortion of the statistical information contained in the data set. Most synthesising methods available in the package synthesise from conditional distributions: variables, which can be categorical or continuous, are synthesised one by one using sequential modelling. Replacements are generated by drawing from conditional distributions fitted to the original data using parametric models or classification and regression tree (CART) models. Methods that are not sequential, but synthesise all variables at once, are 'sample', 'ipf', and 'catall'. Data are synthesised via the function syn(), which can be largely automated if default settings are used, or run with methods defined by the user. Optional parameters can be used to influence the disclosure risk and the analytical quality of the synthesised data. The package also includes functions to assess the utility and disclosure risk of the synthetic data compared to the original. These are described in the vignettes "Utility - Assessing, Visualizing and Improving the Utility of Synthetic Data" and "Disclosure - Practical Privacy Metrics for Synthetic Data".
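The workflow described above can be sketched in a few lines of R. This is a minimal illustration, not the package authors' recommended configuration: the choice of variables from the bundled SD2011 data set and the seed value are assumptions made for the example.

```r
library(synthpop)

# Illustrative subset of the bundled SD2011 survey data
# (the variable selection is an assumption made for this sketch).
vars <- c("sex", "age", "edu", "marital", "income")
ods <- SD2011[, vars]

# Largely automated synthesis with default settings: variables are
# synthesised one by one, each conditional on those already synthesised.
# The seed is arbitrary and only makes the run reproducible.
sds <- syn(ods, seed = 2024, print.flag = FALSE)

# The synthetic data frame is stored in the `syn` element of the result.
head(sds$syn)

# User-defined alternative: parametric models instead of the defaults.
sds_par <- syn(ods, method = "parametric", seed = 2024, print.flag = FALSE)

# Compare the distribution of one variable in the original and synthetic data.
compare(sds, ods, vars = "income")
```

With default settings the synthetic data set has the same number of rows and the same columns as the original, so it can be passed to the same analysis code.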

Authors: Beata Nowok [aut, cre], Gillian Raab [aut], Chris Dibben [ctb], Joshua Snoke [ctb], Caspar van Lissa [ctb], Lotte Pater [ctb], Timon Huijser [ctb]

# Install 'synthpop' in R:
install.packages('synthpop', repos = c('https://bnowok.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/bnowok/synthpop/issues

Datasets:
  • SD2011 - Social Diagnosis 2011 - Objective and Subjective Quality of Life in Poland

9.60 score | 56 stars | 1 packages | 689 scripts | 4.8k downloads | 6 mentions | 73 exports | 70 dependencies

Last updated from: 8e51186105. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 208
source / vignettes | OK | 266
linux-release-x86_64 | OK | 206
macos-release-arm64 | OK | 216
macos-oldrel-arm64 | OK | 217
windows-devel | OK | 154
windows-release | OK | 159
windows-oldrel | OK | 177
wasm-release | OK | 126

Exports: codebook.syn, compare, compare.data.frame, compare.fit.synds, compare.list, compare.synds, disclosure, disclosure.data.frame, disclosure.list, disclosure.synds, glm.synds, lm.synds, mergelevels.syn, multi.compare, multi.disclosure, multi.disclosure.data.frame, multi.disclosure.list, multi.disclosure.synds, multinom.synds, numtocat.syn, polr.synds, print.compare.fit.synds, print.disclosure, print.fit.synds, print.multi.disclosure, print.summary.fit.synds, print.summary.synds, print.synds, print.utility.gen, print.utility.tab, print.utility.tables, read.obs, replicated.uniques, sdc, summary.fit.synds, summary.synds, syn, syn.bag, syn.cart, syn.catall, syn.ctree, syn.cubertnorm, syn.ipf, syn.lognorm, syn.logreg, syn.nested, syn.norm, syn.normrank, syn.passive, syn.polr, syn.polyreg, syn.ranger, syn.rf, syn.sample, syn.satcat, syn.smooth, syn.sqrtnorm, syn.strata, syn.survctree, synorig.compare, utility.gen, utility.gen.data.frame, utility.gen.list, utility.gen.synds, utility.tab, utility.tab.data.frame, utility.tab.list, utility.tab.synds, utility.tables, utility.tables.data.frame, utility.tables.list, utility.tables.synds, write.syn

Dependencies: broman, class, classInt, cli, cmm, codetools, coin, cpp11, digest, e1071, farver, forcats, foreign, future, future.apply, ggplot2, globals, glue, gtable, isoband, KernSmooth, labeling, lattice, libcoin, lifecycle, listenv, magrittr, MASS, Matrix, matrixStats, mipfp, modeltools, multcomp, mvtnorm, nnet, numDeriv, parallelly, party, pillar, pkgconfig, plyr, polspline, proto, proxy, R6, randomForest, ranger, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, rlang, rmutil, rpart, Rsolnp, S7, sandwich, scales, stringi, stringr, strucchange, survival, TH.data, tibble, truncnorm, utf8, vctrs, viridisLite, withr, zoo

Disclosure
Introduction | A simple example | Scenario and definitions | Identifying disclosure from 1-way and 2-way relationships | Excluding records | Conclusions | Acknowledgement

Last update: 2026-04-15
Started: 2025-07-12

Inference in synthpop
Introduction | Inference to results from the original data | Inference to population parameters | Comparing fits to the original and synthesised data | Acknowledgement

Last update: 2025-07-12
Started: 2017-11-21

Utility
Introduction | Measures | Models for the propensity score: practical considerations. | Using utility measures to tune the synthesis methods. | Conclusion | Details of utility measures | Evaluation of utility measures | Methods for NULL distribution of utility measures | Equivalence of SPECKS, PO50 and MabsDD

Last update: 2025-07-12
Started: 2021-11-17

Using synthpop
Introduction and background | Overview of method | The synthpop package in practice | Illustrative examples | Concluding remarks

Last update: 2021-11-29
Started: 2017-05-24

Readme and manuals

Help Manual

Help page | Topics
Generating synthetic versions of sensitive microdata for statistical disclosure control | synthpop-package synthpop
Makes a codebook from a data frame | codebook.syn
Comparison of synthesised and observed data | compare
Compare model estimates based on synthesised and observed data | compare.fit.synds print.compare.fit.synds
Compare univariate distributions of synthesised and observed data | compare.data.frame compare.list compare.synds print.compare.synds
Disclosure measures | disclosure disclosure.data.frame disclosure.list disclosure.synds print.disclosure
Fitting (generalized) linear models to synthetic data | glm.synds lm.synds print.fit.synds
Merge levels of factors in a data frame | mergelevels.syn
Multivariate comparison of synthesised and observed data | multi.compare
Disclosure measures for multiple target variables | multi.disclosure multi.disclosure.data.frame multi.disclosure.list multi.disclosure.synds print.multi.disclosure
Fitting multinomial models to synthetic data | multinom.synds
Group numeric variables before synthesis | numtocat.syn
Fitting ordered logistic models to synthetic data | polr.synds
Importing original data sets from external files | read.obs
Replications in synthetic data | print.repuniq.synds replicated.uniques
Social Diagnosis 2011 - Objective and Subjective Quality of Life in Poland | SD2011
Tools for statistical disclosure control (sdc) | sdc
Inference from synthetic data | print.summary.fit.synds summary.fit.synds
Synthetic data object summaries | print.summary.synds summary.synds
Generating synthetic data sets | print.synds syn syn.strata
Synthesis with bagging | syn.bag
Synthesis of a group of categorical variables from a saturated model | syn.catall
Synthesis with classification and regression trees (CART) | syn.cart syn.ctree
Synthesis of a group of categorical variables by iterative proportional fitting | syn.ipf
Synthesis by linear regression after transformation of a dependent variable | syn.cubertnorm syn.lognorm syn.sqrtnorm
Synthesis by logistic regression | syn.logreg
Synthesis for a variable nested within another variable | syn.nested
Synthesis by linear regression | syn.norm
Synthesis by normal linear regression preserving the marginal distribution | syn.normrank
Passive synthesis | syn.passive
Synthesis by ordered polytomous regression | syn.polr
Synthesis by unordered polytomous regression | syn.polyreg
Synthesis with a fast implementation of random forests | syn.ranger
Synthesis with random forest | syn.rf
Synthesis by simple random sampling | syn.sample
Synthesis from a saturated model based on all combinations of the predictor variables | syn.satcat
syn.smooth | syn.smooth
Synthesis of survival time by classification and regression trees (CART) | syn.survctree
Check synthetic and original data if not produced by synthpop | synorig.compare synorig.compare.list
Distributional comparison of synthesised and observed data | print.utility.gen utility.gen utility.gen.data.frame utility.gen.list utility.gen.synds
Tabular utility | print.utility.tab utility.tab utility.tab.data.frame utility.tab.list utility.tab.synds
Tables and plots of utility measures | print.utility.tables utility.tables utility.tables.data.frame utility.tables.list utility.tables.synds
Exporting synthetic data sets to external files | write.syn