Primary sampling unit and stratification variables
As the sample design involves stratification and clustering, these design features affect standard errors and should therefore be taken into account in analysis. Appropriate variables are provided to allow the analyst to do this. Here we describe the stratification and clustering variables in the main Understanding Society data files. See Fumagalli, Knies et al. (2017) for a description of these variables in the harmonised BHPS. General advice on using this information appropriately applies to both the UKHLS and harmonised BHPS data.
The variable indicating the primary sampling unit is psu. It is available in the cross-wave files, xwavedat and xwaveid. As the PSU is determined at the time of sampling, the value of this variable does not change over time. To make it easier to use, however, the variable is also included in the wave-specific data files, where it is named w_psu, with “w_” standing for the wave prefix. Similarly, as stratification occurs at the sampling stage, the variable representing stratification, strata, does not change over time; it is available in the cross-wave files, xwavedat and xwaveid, and in the wave-specific data files with a wave prefix, as w_strata.
Description of variables
These tables provide details of the range of values for variables w_psu and w_strata.
What happens if I don’t correct for clustering?
Taking sample clustering into account is simple to do in most standard statistical software for most kinds of estimation. If you do not do this, your point estimates are not affected, but the associated standard errors will tend to be underestimated, sometimes considerably so. This results in biased hypothesis tests and overfitting of models. An exception is the distribution of gender, where clustering has sometimes been shown to narrow confidence intervals, although this effect is usually observed where households are the highest level of clustering.
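The effect of ignoring clustering can be illustrated with a small sketch. The Python code below is not part of the Understanding Society documentation. It uses made-up values and made-up PSU identifiers, ignores weights and stratification, and compares a standard error that assumes independent observations with one that treats each PSU as a single independent unit. The function names are hypothetical. For real analyses, survey procedures in standard statistical software are the better choice.

```python
import math
from collections import defaultdict


def naive_se_mean(y):
    """Standard error of the mean assuming independent observations."""
    n = len(y)
    mean = sum(y) / n
    sample_var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(sample_var / n)


def clustered_se_mean(y, psu):
    """Standard error of the mean treating each PSU as one independent unit."""
    n = len(y)
    mean = sum(y) / n
    # Sum the deviations from the overall mean within each PSU.
    psu_totals = defaultdict(float)
    for v, c in zip(y, psu):
        psu_totals[c] += v - mean
    n_psu = len(psu_totals)
    # The PSU totals sum to zero, so their squares measure between-PSU spread.
    between = sum(t ** 2 for t in psu_totals.values())
    return math.sqrt(n_psu / (n_psu - 1) * between) / n


# Made-up data: three PSUs whose members give identical answers.
y = [1, 1, 1, 5, 5, 5, 9, 9, 9]
psu = [101, 101, 101, 102, 102, 102, 103, 103, 103]

print("naive SE:    ", round(naive_se_mean(y), 3))
print("clustered SE:", round(clustered_se_mean(y, psu), 3))
```

In this deliberately extreme example, where everyone in a PSU gives the same answer, the clustered standard error is twice the naive one. The point estimate, the mean, is the same in both cases.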
What happens if I don’t correct for stratified sampling?
Taking the stratified nature of the sample design into account is simple to do in most standard statistical software for most kinds of estimation. If you do not do this, your point estimates are not affected, but the associated standard errors will tend to be slightly overestimated. This makes your analysis slightly conservative, which is often acceptable.
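As a rough illustration of how the stratum and PSU identifiers enter a variance calculation, the following Python sketch computes a linearization standard error for a weighted mean. It is an assumption-laden toy, not the procedure used by any particular package. The data, the weights and the function name design_se_mean are invented for the example, no finite population correction is applied, and strata containing a single PSU are rejected rather than handled. In practice the same information would be declared to survey software, for example through Stata's svyset command or the svydesign function in the R survey package.

```python
import math
from collections import defaultdict


def design_se_mean(y, weights, strata, psu):
    """Linearization SE of a weighted mean for a stratified, clustered sample."""
    total_w = sum(weights)
    mean = sum(w * v for w, v in zip(weights, y)) / total_w
    # Linearized contribution of each observation, summed by stratum and PSU.
    totals = defaultdict(lambda: defaultdict(float))
    for v, w, h, c in zip(y, weights, strata, psu):
        totals[h][c] += w * (v - mean) / total_w
    variance = 0.0
    for h, psu_totals in totals.items():
        n_h = len(psu_totals)
        if n_h < 2:
            # Assumption of this sketch: single-PSU strata are not supported.
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        centre = sum(psu_totals.values()) / n_h
        spread = sum((t - centre) ** 2 for t in psu_totals.values())
        # Between-PSU variation is measured within each stratum, then summed.
        variance += n_h / (n_h - 1) * spread
    return mean, math.sqrt(variance)


# Made-up data: two strata with two PSUs each and equal weights.
y = [1, 2, 2, 3, 9, 10, 10, 11]
weights = [1.0] * len(y)
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [11, 11, 12, 12, 21, 21, 22, 22]

mean, se_design = design_se_mean(y, weights, strata, psu)
# Ignoring stratification amounts to placing every PSU in a single stratum.
_, se_no_strata = design_se_mean(y, weights, [0] * len(y), psu)

print("mean:", mean)
print("SE with strata:   ", round(se_design, 3))
print("SE ignoring strata:", round(se_no_strata, 3))
```

The estimated mean is identical in both calls; only the standard error changes. The toy strata differ far more than real strata usually do, so the gap between the two standard errors here is much larger than the slight overestimation described above.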



