We know that the conditions in which people are born, grow up, live and work can have an impact on their health. We also know that these social determinants of health, such as income, good-quality housing, education, and work, are not distributed equally in society, which leads to health inequalities. Preventing ill-health connected to social and economic conditions should be a priority for all policy sectors affecting the economy, welfare, housing, education, and employment.
The SIPHER Consortium is a major project, funded by the UK Prevention Research Partnership, that aims to tackle some of these public health policy challenges. SIPHER brings together university researchers, local, regional and government policymakers, and groups working in public health. The consortium is developing new insights into the causes and consequences of poor health, and building evaluation and data tools for researchers and policymakers.
What is synthetic data?
The interactions between social, economic and health conditions are complex, and studying them can involve data that are disclosive, meaning they could reveal information about identifiable individuals. This can make it difficult or even impossible for researchers to access these data. To make it easier to study the dynamic relationships between social determinants and health outcomes, SIPHER has created a synthetic population based on Understanding Society and UK Census data.
Built using spatial microsimulation, the SIPHER Synthetic Population dataset captures attributes available in the UK Census, such as age, sex and ethnicity, plus additional information captured in Understanding Society. The dataset reflects the characteristics seen in the survey data, but with the distribution, scale and geographical coverage of the UK Census. Because it is resolved down to small areas, the dataset provides a “digital twin” of the adult population in England, Scotland, and Wales. It’s important to note that the dataset reflects “synthetic” people, not “real” ones.
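To give a feel for what spatial microsimulation does, here is a minimal sketch in Python. It uses iterative proportional fitting, one common reweighting technique, which is an assumption made for illustration and not necessarily the method SIPHER used. The respondents, area counts, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical survey respondents (illustrative only, not real data).
survey = [
    {"id": 1, "age": "16-44", "sex": "F"},
    {"id": 2, "age": "16-44", "sex": "M"},
    {"id": 3, "age": "45+", "sex": "F"},
    {"id": 4, "age": "45+", "sex": "M"},
]

# Hypothetical census counts for a single small area.
area_targets = {
    "age": {"16-44": 60.0, "45+": 40.0},
    "sex": {"F": 55.0, "M": 45.0},
}


def ipf_weights(respondents, targets, iterations=50):
    """Reweight respondents so their weighted totals match the area counts."""
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, counts in targets.items():
            # Current weighted total for each category of this variable.
            totals = defaultdict(float)
            for person, w in zip(respondents, weights):
                totals[person[variable]] += w
            # Scale each weight so the category totals hit the census counts.
            weights = [
                w * counts[person[variable]] / totals[person[variable]]
                for person, w in zip(respondents, weights)
            ]
    return weights


def weighted_total(respondents, weights, variable, category):
    """Sum the weights of respondents falling in one category."""
    return sum(
        w for person, w in zip(respondents, weights)
        if person[variable] == category
    )


weights = ipf_weights(survey, area_targets)
```

Each weight says how many synthetic people in the area a survey respondent "stands in" for. Repeating this for every small area yields a population with survey-level detail and census-level coverage.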
What can synthetic data be used for?
The SIPHER Synthetic Population can help researchers and policymakers to swiftly fill important data gaps. In particular, the dataset can be used in simulation models when testing the likely effects of different policy options. Here, the dataset can reveal how policy interventions might affect population subgroups and areas across Great Britain differently.
SIPHER has also developed an interactive dashboard. This tool allows exploration of an aggregated version of the synthetic population without any coding or data preparation. Its ‘click and explore’ format enables comparison of areas of interest, creation of bespoke detailed area profiles, development of customised data visualisations, and downloading of the aggregate data used.
While the Synthetic Population draws directly on data provided by “real” Understanding Society survey respondents, it only ever reflects synthetic (“not real”) individuals. It is also important to keep in mind that the Synthetic Population is the outcome of a statistical creation process (i.e., spatial microsimulation). All results obtained from this dataset, even basic descriptive statistics, should therefore be treated and understood as “model output”. Hence, the Synthetic Population should not be seen as a replacement for “real” data in the standard statistical analyses (e.g., regression analysis) that are typically performed with the Understanding Society survey. However, the dataset is a valuable source of data for understanding the “status quo” and modelling “what if” scenarios (e.g., static or dynamic microsimulations), as well as for “exploratory” analyses (e.g., when no other data are available). It can be used for both area-level and individual-level studies. Results obtained from the Synthetic Population can also feed into a variety of external applications, for example by informing the parameters of Agent-Based Models developed in Python or NetLogo (SIPHER Synthetic Population for Individuals in Great Britain 2019-2021 user guide, page 21).
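As an illustration of a “what if” scenario, the following Python sketch runs a toy static microsimulation. Everything in it is hypothetical: the field names, the income values, the policy (a flat top-up below a threshold), and the function names are assumptions for illustration and do not reflect the actual structure of the dataset.

```python
from collections import defaultdict

# Hypothetical synthetic individuals (illustrative values, not from the dataset).
synthetic_people = [
    {"area": "A", "monthly_income": 900.0},
    {"area": "A", "monthly_income": 2400.0},
    {"area": "B", "monthly_income": 1100.0},
    {"area": "B", "monthly_income": 1000.0},
]


def apply_income_top_up(people, threshold, top_up):
    """Return a new population with a flat top-up for incomes below the threshold."""
    return [
        {**p, "monthly_income": p["monthly_income"] + top_up}
        if p["monthly_income"] < threshold
        else dict(p)
        for p in people
    ]


def mean_income_by_area(people):
    """Average monthly income of the synthetic individuals in each area."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p in people:
        sums[p["area"]] += p["monthly_income"]
        counts[p["area"]] += 1
    return {area: sums[area] / counts[area] for area in sums}


# Compare the "status quo" with the hypothetical policy scenario.
baseline = mean_income_by_area(synthetic_people)
scenario = mean_income_by_area(
    apply_income_top_up(synthetic_people, threshold=1200.0, top_up=100.0)
)
change_by_area = {area: scenario[area] - baseline[area] for area in baseline}
```

Comparing `change_by_area` across areas shows how the same policy can affect places differently. As with any result from synthetic data, the numbers are model output, not observations.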
Accessing the SIPHER Synthetic Population
Researchers can access the dataset via the UK Data Service curated collection: SIPHER Synthetic Population for Individuals in Great Britain, 2019-2021 (UK Data Service Curated Collection, SN9277).
You can explore the Dashboard and learn more about the dataset on the SIPHER website.



