synthpop - Generating Synthetic Versions of Sensitive Microdata for
Statistical Disclosure Control
A tool for producing synthetic versions of microdata
containing confidential information so that they are safe to be
released to users for exploratory analysis. The key objective
of generating synthetic data is to replace sensitive original
values with synthetic ones while causing minimal distortion of the
statistical information contained in the data set. Variables,
which can be categorical or continuous, are synthesised
one-by-one using sequential modelling. Replacements are
generated by drawing from conditional distributions fitted to
the original data using parametric or classification and
regression tree (CART) models. Data are synthesised via the function
syn(), which can be largely automated if default settings are
used, or driven by methods defined by the user. Optional parameters
can be used to influence the disclosure risk and the analytical
quality of the synthesised data. For a description of the
implemented method see Nowok, Raab and Dibben (2016)
<doi:10.18637/jss.v074.i11>.
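The sequential scheme described above can be illustrated with a small
sketch. This is NOT synthpop's implementation (synthpop is an R package
and syn() has its own methods and defaults). It is a hypothetical Python
illustration under stated assumptions. The column names "age",
"income" and "group" and the helpers fit_ols() and synthesise() are
invented for this example. The first variable is drawn from its
marginal distribution, a continuous variable is drawn from a linear
model fitted to the original data (ignoring parameter uncertainty), and
a one-split rule on income stands in for a CART leaf when synthesising
the categorical variable.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, statistics.stdev(resid)


def synthesise(original, seed=0):
    """Hypothetical sequential synthesis of 'age', 'income', 'group'."""
    rng = random.Random(seed)
    n = len(original["age"])
    synth = {}

    # Step 1: first variable, resampled from its original marginal.
    synth["age"] = [rng.choice(original["age"]) for _ in range(n)]

    # Step 2: income | age. The model is fitted to the ORIGINAL data,
    # then evaluated at the SYNTHETIC ages with Gaussian noise added.
    a, b, sd = fit_ols(original["age"], original["income"])
    synth["income"] = [a + b * x + rng.gauss(0.0, sd) for x in synth["age"]]

    # Step 3: group | income. A single split at the original median
    # (a crude stand-in for a CART leaf); each synthetic record draws a
    # category from the original records that fall in the same leaf.
    cut = statistics.median(original["income"])
    leaves = {False: [], True: []}
    for inc, g in zip(original["income"], original["group"]):
        leaves[inc > cut].append(g)
    fallback = list(original["group"])  # used if a leaf is empty
    synth["group"] = [
        rng.choice(leaves[inc > cut] or fallback) for inc in synth["income"]
    ]
    return synth
```

Because each variable is generated conditionally on the variables
already synthesised, relationships in the original data (here, income
rising with age) are carried over to the synthetic records without
copying any original row as a whole.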