Representing the population

Understanding Society can be used in different ways, to represent several different populations.

You can represent either the cross-sectional population (those currently resident in the country) in any year since 1991, or the longitudinal population over a series of years (those continuously resident in the country throughout the period). The data files and the weight you need depend on the population you wish to represent.

An important point to note is that from 1991 to 2000 the Study covered only Great Britain (England, Scotland and Wales). It was extended to Northern Ireland in 2001. Consequently, you can represent:

the cross-sectional population of Great Britain in any year since 1991
the longitudinal population of Great Britain over any period of years since 1991
the cross-sectional population of the United Kingdom in any year since 2001
the longitudinal population of the United Kingdom over any period of years since 2001.

However, a much larger sample is available from 2009-10 onwards, when data collection began for the main Understanding Society samples (the General Population Sample and the Ethnic Minority Boost Sample). Longitudinal analysis starting at this point can therefore be particularly valuable for the study of small subgroups or rare events.

Because of the sampling methods used, some recent immigrants are excluded from several of the possible reference populations. The only populations with no such undercoverage are Great Britain in 1991, Wales and Scotland in 1999, Northern Ireland in 2000, and the UK in 2009-10, 2014-15 and 2021-22. The exclusions are as follows:

In England, data collected between 1992 and 2008 exclude households consisting entirely of immigrants since 1991.
In Wales and Scotland, data collected between 1992 and 1998 exclude households consisting entirely of immigrants since 1991, and data collected between 2000 and 2008 exclude households consisting entirely of immigrants since 1999.
In all countries of the UK, data collected between 2010-11 and 2013-14 exclude households consisting entirely of immigrants since 2009-10, data collected between 2015-16 and 2021-22 exclude households consisting entirely of immigrants since 2014-15, and data collected since 2021-22 exclude households consisting entirely of immigrants since 2021-22.
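The geographic coverage rules above (Great Britain from 1991, United Kingdom from 2001) can be sketched as a small lookup. This is only an illustration: the function name and its return strings are hypothetical and are not part of any Understanding Society tooling, and the sketch does not model the immigrant undercoverage described above.

```python
# Illustrative helper only; names are hypothetical, not official tooling.
GB_START = 1991  # the Study covered Great Britain only from 1991 to 2000
UK_START = 2001  # Northern Ireland was added in 2001


def representable_populations(start_year, end_year=None):
    """List the populations that can be represented for a year or a period.

    A single year gives cross-sectional populations; a period of more than
    one year gives longitudinal populations.
    """
    if end_year is None:
        end_year = start_year
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")

    kind = "cross-sectional" if start_year == end_year else "longitudinal"
    populations = []
    if start_year >= GB_START:
        populations.append(f"{kind} population of Great Britain")
    if start_year >= UK_START:
        populations.append(f"{kind} population of the United Kingdom")
    return populations


print(representable_populations(1995))
print(representable_populations(2001, 2010))
```

A period that starts before 2001 returns only the Great Britain population, because Northern Ireland was not covered at the start of the period.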

Representing a subpopulation

You can represent any subpopulation of any of the populations described above, provided it is defined by substantive variables. If you use appropriate analysis methods for the relevant population, but restrict your analysis to members of the subpopulation, your results will be representative of the subpopulation.

Examples of subpopulations that you can represent:

Residents of Northern Ireland
Females in full-time employment
Babies born in the last 12 months
Conservative voters in the 2017 election
Males aged between 17 and 29 who hold a driving licence
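As a minimal sketch of restricting a weighted analysis to a subpopulation, the example below computes a weighted mean for females in full-time employment. The records, the field names (sex, fulltime, hours, weight) and the values are invented for illustration and do not correspond to actual Understanding Society variables. It yields a point estimate only and does not compute standard errors.

```python
def weighted_mean(records, value_key, weight_key, in_subpop=lambda r: True):
    """Weighted mean of value_key over the records selected by in_subpop."""
    numerator = 0.0
    denominator = 0.0
    for record in records:
        weight = record[weight_key]
        # Only cases with a positive weight in the subpopulation contribute.
        if weight > 0 and in_subpop(record):
            numerator += weight * record[value_key]
            denominator += weight
    if denominator == 0:
        raise ValueError("no weighted cases in the subpopulation")
    return numerator / denominator


# Invented example data; field names are hypothetical.
records = [
    {"sex": "female", "fulltime": True, "hours": 40, "weight": 1.5},
    {"sex": "female", "fulltime": True, "hours": 36, "weight": 0.5},
    {"sex": "male", "fulltime": True, "hours": 45, "weight": 1.0},
    {"sex": "female", "fulltime": False, "hours": 20, "weight": 2.0},
]

mean_hours = weighted_mean(
    records,
    value_key="hours",
    weight_key="weight",
    in_subpop=lambda r: r["sex"] == "female" and r["fulltime"],
)
print(mean_hours)
```

The same whole-sample weight is used throughout; only the selection of cases changes.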

Are sample sizes adequate to represent ethnic minorities or immigrants?

In 2009-10 (Wave 1 of Understanding Society) data were collected for the first time from an ethnic minority boost sample, which was designed to provide substantially boosted sample sizes for the following subgroups: Indian, Pakistani, Bangladeshi, Afro Caribbean and Black African. From 2014-15 (Wave 6) we further boosted the five ethnic minority groups listed above and also added a boost of immigrants (i.e. persons born outside the UK). If you are interested in immigrants from outside the five ethnic groups listed above, you may want to start your analysis from Wave 6.

These subgroups are also asked additional questions, referred to as the “extra 5 minutes” questionnaire. The additional questions are also asked of a small random subsample of the general population sample, which can be used to compare findings for ethnic minority groups with those for the total population. For this comparison, use the weight w_ind5mus_aa.

How to represent a population or subpopulation

To produce population (or subpopulation) estimates you need to apply weights and estimate standard errors in a way that takes the complex survey design into account. The Analysis guidance for weights, PSU, Strata page shows how to do this.
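To illustrate what taking the design into account involves, here is a rough sketch of a weighted mean with a design-based standard error. It assumes Taylor linearisation with PSUs treated as sampled with replacement within strata, which is a common approximation in survey software but is not confirmed here as the method the guidance page uses. The tuple layout (value, weight, PSU, stratum) and the data are hypothetical. In practice you would use established survey procedures as described in the guidance.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(rows):
    """rows: iterable of (y, weight, psu, stratum) tuples.

    Returns (weighted mean, linearised standard error), treating PSUs as
    sampled with replacement within each stratum.
    """
    rows = list(rows)
    total_weight = sum(w for _, w, _, _ in rows)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    mean = sum(w * y for y, w, _, _ in rows) / total_weight

    # Sum the linearised scores w * (y - mean) / total_weight within each PSU.
    psu_scores = defaultdict(lambda: defaultdict(float))
    for y, w, psu, stratum in rows:
        psu_scores[stratum][psu] += w * (y - mean) / total_weight

    variance = 0.0
    for stratum, scores in psu_scores.items():
        n_psu = len(scores)
        if n_psu < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        centre = sum(scores.values()) / n_psu
        # Between-PSU variability of the score totals within the stratum.
        variance += n_psu / (n_psu - 1) * sum(
            (s - centre) ** 2 for s in scores.values()
        )
    return mean, sqrt(variance)


# Invented data: one stratum containing two PSUs of two people each.
rows = [
    (1.0, 1.0, "psu_a", "stratum_1"),
    (3.0, 1.0, "psu_a", "stratum_1"),
    (5.0, 1.0, "psu_b", "stratum_1"),
    (7.0, 1.0, "psu_b", "stratum_1"),
]
mean, se = weighted_mean_with_se(rows)
print(mean, se)
```

Ignoring the clustering and treating the four cases as independent would give a smaller standard error for these data, which is why the design information matters.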

Analysing a subset of a population – what weights should I use?

It depends on whether the subset is defined by personal characteristics (e.g. sociodemographic characteristics or geography) or by the combination of instruments that sample members completed.

If the subset is defined by personal characteristics, it is appropriate to use the provided weight, even though this was derived for the whole sample. Though not tailored specifically to your analysis sample, the provided weights should make not only the total sample representative of the total population but also any subset of the sample representative of the equivalent subset of the population. For example, sample members resident in Wales will represent the population of Wales; female sample members born between 1951 and 2000 will represent all women in the population born between 1951 and 2000, and so on.

If your analysis uses an unusual combination of instruments for which we do not provide a weight, you have three options:

Use the weight provided for the (smallest) hierarchically-superior (larger) sample
Use the weight provided for the (largest) hierarchically-inferior (smaller) sample
Derive your own weight, tailored to your analysis sample (take a look at our Open Essex (MoodleX) course Creating tailored weights for UKHLS).

The first two options are both sub-optimal, in different ways, but are simple to implement and the sub-optimality may be minimal. With the first option, the weights will be correcting for a different nonresponse process to the one relevant to your analysis sample.

With the second option, weights will not be defined for all potential members of your analysis sample, but the weights will correct for the relevant nonresponse process. For example, suppose the largest hierarchically-inferior sample for which weights are provided is the set of people who gave a full interview, including the self-completion component, at all of Waves 1 to 5, with weight e_indscus_lw. If this weight were defined for only, say, 15,613 of the 17,977 members of your potential analysis sample (86.8%), using it would cause a (very slight) loss of precision but would not introduce additional bias.
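The coverage figure quoted above is simple arithmetic, sketched below. The function name is hypothetical; the counts are the illustrative ones from the text.

```python
def weight_coverage(n_with_weight, n_analysis_sample):
    """Share of the potential analysis sample that has a defined weight."""
    if n_analysis_sample <= 0:
        raise ValueError("analysis sample size must be positive")
    if not 0 <= n_with_weight <= n_analysis_sample:
        raise ValueError("weighted cases must lie between 0 and the sample size")
    return n_with_weight / n_analysis_sample


n_weighted = 15_613   # cases with a defined weight (figure from the text)
n_potential = 17_977  # potential analysis sample (figure from the text)

share = weight_coverage(n_weighted, n_potential)
print(f"{share:.1%} of the analysis sample has a defined weight")
print(f"{n_potential - n_weighted} cases would be dropped")
```

Checking this share before choosing between the first two options gives a quick sense of how much precision the second option would cost.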
