Have Your Cake & Synthesize It Too
Get Ready for a Game
- Get a blank index card and something to draw with
- Draw your best example of the object listed on the index card in the middle of the table (or in your group)
- Once everyone has created their example collect all the examples and put the topic index card on top
Learning Objectives
- Distinguish between user- and model-based synthesis
- Gain familiarity with some current synthesis methods
- Understand the applications of synthesis methods
But first, some questions about the problems we face...
What are the biggest barriers you face in sharing data with external stakeholders?
How do you currently analyze data for your smallest demographic groups?
Synthetic Data: The Origin Stories
In the beginning, there was imputation...
- Statistical agencies want to share data broadly while protecting privacy
- First references to synthetic data come from Statistical Disclosure Control
- Initially, synthetic data are a form of multiply imputed data
- Brought to you by the makers of Multiple Imputation: Donald Rubin
and then computer scientists came along...
- Computer vision researchers find they need more image data
- Early methods are based on lossy compression techniques
- In 2014, Goodfellow and his colleagues develop Generative Adversarial Networks (GANs)
- Now in addition to images, GANs are synthesizing tabular data
What Is Synthetic Data?
- Data created by a computer
- Can be created by:
  - User instructions/specifications
  - Models fitted to observed data
- It is not synthetic control
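The two routes to synthetic data listed above can be contrasted in a few lines. This is only a minimal sketch: the function names, the age range, and the choice of a normal distribution are illustrative assumptions, not part of the original slides.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def synthesize_from_spec(n, low, high):
    """User-defined: draw ages uniformly from a range the user specifies."""
    return [rng.randint(low, high) for _ in range(n)]


def synthesize_from_model(observed, n):
    """Model-defined: fit a normal distribution to observed data, then sample it."""
    mu, sigma = mean(observed), stdev(observed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical observed ages, used only to fit the toy model
observed_ages = [23, 35, 41, 29, 52, 47, 38, 33]

user_based = synthesize_from_spec(5, low=18, high=65)
model_based = synthesize_from_model(observed_ages, 5)
```

The user-defined version never touches real records, while the model-defined version inherits whatever structure (and whatever disclosure risk) the fitted model carries over from the observed data.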
Model-Defined Approaches
- Most common approach: Classification and Regression Trees (CART)
- First variable is sampled from the observed data; each remaining variable is predicted from a model fitted on the variables before it
- CART is prone to overfitting
- Performance can degrade as the number of variables increases
- Requires reshaping longitudinal data to wide format
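The sequential idea in the bullets above can be sketched with a small hand-rolled regression tree. This is a toy illustration under stated assumptions (numeric columns only, a fixed column order, hypothetical function names and demo data), not the implementation used by any particular synthesis package.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, min_leaf=5):
    """Grow a small regression tree; each leaf keeps its observed y values as donors."""
    best = None
    if len(y) >= 2 * min_leaf:
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                left = [i for i, row in enumerate(X) if row[j] <= t]
                right = [i for i, row in enumerate(X) if row[j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    if best is None or best[0] >= sse(y):
        return {"donors": list(y)}  # no useful split: make a leaf
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }


def draw_from_leaf(tree, row, rng):
    """Route a row to its leaf and draw one observed donor value at random."""
    while "donors" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return rng.choice(tree["donors"])


def synthesize(data, n, min_leaf=5, seed=0):
    """Sequentially synthesize n rows; columns are processed in their given order."""
    rng = random.Random(seed)
    # First variable: sampled with replacement from the observed values
    synth = [[rng.choice(data)[0]] for _ in range(n)]
    # Each later variable: tree fitted on observed data, predictors are earlier columns
    for k in range(1, len(data[0])):
        tree = grow_tree([row[:k] for row in data], [row[k] for row in data], min_leaf)
        for srow in synth:
            srow.append(draw_from_leaf(tree, srow, rng))
    return synth


# Hypothetical demo data: age and an income that loosely depends on age
demo_rng = random.Random(1)
observed = []
for _ in range(200):
    age = demo_rng.randint(20, 65)
    observed.append([age, 1000 * age + demo_rng.gauss(0, 5000)])

synthetic = synthesize(observed, n=100)
```

In this sketch every synthetic value is an observed value drawn from a leaf, so very small leaves can reproduce real records closely. The `min_leaf` parameter is one assumed knob for trading utility against that risk.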
Considerations
- Privacy Protection vs Data Utility
- Cross-Sectional vs Longitudinal Data
- Time Sensitivity/Computational Resources
- Validation Server
Who is using synthetic data?
Want to play a game?
The phases of the game
- Draw phase
- Evaluate phase
- Telephone phase
How GANs work
You've Got Questions, I've (Hopefully) Got the Answers