Selecting the correct weight for your analysis

The UKHLS dataset is designed to be used with weights (see the Why use weights page).

Separate sets of weights are provided for:

  • The combined GPS and EMBS (from Wave 1)
  • The former BHPS sample (from Wave 2)
  • The combined GPS, EMBS and BHPS (from Wave 2)
  • The combined GPS, EMBS, BHPS and IEMBS (from Wave 6)
  • The combined GPS, EMBS, BHPS, IEMBS and GPS2 (from Wave 14)

The available sets of weights are not identical for these five analysis bases, reflecting differences in data collection, so weights should be selected carefully for any proposed analysis. Given the complexity and multi-purpose nature of the Understanding Society design, we provide multiple sets of weights to meet the different needs of users. The correct weight for your analysis depends on the survey instrument that is the source of the data (e.g. household grid, household questionnaire, individual questionnaire), the analysis level (household or individual), and the combination of waves involved.

Each weight except for design weights has been scaled to have a mean of one amongst cases eligible to receive the weight.
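The mean-of-one property can be illustrated with a minimal sketch. It assumes that ineligible cases are marked as None, and scale_to_mean_one is a hypothetical helper. It shows what the scaling means, not the procedure used to produce the released weights.

```python
def scale_to_mean_one(weights):
    """Rescale weights so that eligible cases have a mean of one.

    Assumption: ineligible cases are represented as None and are
    passed through unchanged.
    """
    eligible = [w for w in weights if w is not None]
    if not eligible:
        raise ValueError("no eligible cases to scale")
    mean = sum(eligible) / len(eligible)
    if mean <= 0:
        raise ValueError("mean of eligible weights must be positive")
    return [None if w is None else w / mean for w in weights]


# Illustrative values only, not real survey weights.
raw = [2.0, None, 6.0]
scaled = scale_to_mean_one(raw)
```

After scaling, a weight above one marks a case that counts for more than average among eligible cases, and a weight below one marks a case that counts for less.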

To get started, watch our short video about selecting weights in Understanding Society (note that this video does not mention the weights for the latest sample added, GPS2).

The naming conventions for weights are intended to help users identify the correct weight. The name of each weight reflects the wave for which the weight is calculated, the level of analysis, the data source, the sample, and the nature of the weight (design weight, cross-sectional analysis weight or longitudinal analysis weight). There are a number of weights, reflecting the complex structure of the data, and the help within this section gives guidance on which to use. All weight names follow the same structure, w_xxxyyzz_aa; the rules are described below.

Table: Naming convention for Understanding Society weights w_XxxYyZz_aa

w_: Wave letter (a to o)

  • a_, b_, c_, d_, e_, f_, g_

Xxx: Who are you studying? (household or individual)

  • hhd: household
  • psn: persons 0+
  • ind: persons 16+
  • yth: persons 10-15

Yy: Which questions(naire)? (instrument)

  • en: enumeration
  • in: interview
  • px: interview or proxy
  • 5m: “extra 5 minutes”
  • sc: self-completion

Zz: Which sample/timeline? (samples cover different waves)

  • us: GPS & EMB (W1>)
  • bh: BHPS (W2>)
  • ub: GPS, EMB & BHPS (W2-W5)
  • ui: GPS, EMB, BHPS & IEMB (W6>)
  • g2: BHPS, GPS, EMB, IEMB and GPS2 (W14>)
  • 91: BHPS original sample (91> excl. N.I.)
  • 01: BHPS original sample + boosts

_aa: Analysing one wave or across waves? (cross-sectional/longitudinal)

  • _xw: cross-sectional analysis weight
  • _lw: longitudinal weight
  • _xd: cross-sectional design weight
  • _li: longitudinal inclusion weight
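The naming convention can be sketched as a small parser that splits a weight name into its five parts. This is an illustrative sketch only: parse_weight_name and the lookup dictionaries are hypothetical names, the codes are copied from the table above, and not every combination of codes necessarily exists as a variable in the data.

```python
import re

# Codes copied from the naming convention table.
LEVELS = {"hhd": "household", "psn": "persons 0+",
          "ind": "persons 16+", "yth": "persons 10-15"}
INSTRUMENTS = {"en": "enumeration", "in": "interview",
               "px": "interview or proxy", "5m": "extra 5 minutes",
               "sc": "self-completion"}
SAMPLES = {"us": "GPS & EMB", "bh": "BHPS", "ub": "GPS, EMB & BHPS",
           "ui": "GPS, EMB, BHPS & IEMB",
           "g2": "BHPS, GPS, EMB, IEMB and GPS2",
           "91": "BHPS original sample",
           "01": "BHPS original sample + boosts"}
KINDS = {"xw": "cross-sectional analysis weight",
         "lw": "longitudinal weight",
         "xd": "cross-sectional design weight",
         "li": "longitudinal inclusion weight"}

# Structure w_xxxyyzz_aa: wave letter, level, instrument, sample, kind.
PATTERN = re.compile(
    r"([a-o])_([a-z]{3})([a-z0-9]{2})([a-z0-9]{2})_([a-z]{2})")


def parse_weight_name(name):
    """Split a weight name into its components, or raise ValueError."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow w_xxxyyzz_aa")
    letter, level, instrument, sample, kind = match.groups()
    try:
        return {
            "wave": ord(letter) - ord("a") + 1,  # a = Wave 1, b = Wave 2, ...
            "level": LEVELS[level],
            "instrument": INSTRUMENTS[instrument],
            "sample": SAMPLES[sample],
            "kind": KINDS[kind],
        }
    except KeyError as err:
        raise ValueError(f"unknown code {err} in {name!r}") from None


parsed = parse_weight_name("a_indscus_xw")
```

Here parsed describes a Wave 1, persons 16+, self-completion, GPS & EMB, cross-sectional analysis weight.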

Cross-sectional or longitudinal analysis

If your analysis uses data from only one wave, select the “xw” (cross-sectional) version of the weight. This weight is defined for all sample members who responded to the relevant questionnaire at that wave. If your analysis uses data from more than one wave, select an appropriate “lw” (longitudinal) version of the weight.

Example

If your analysis only uses data from Wave 4, select the “xw” (cross-sectional) version of the weight (note that all Wave 4 variables begin with d_). If your analysis uses data from multiple waves, select an appropriate “lw” (longitudinal) version of the weight from the last wave. For example, if you are looking at Waves 4 to 9, use the appropriate longitudinal weight from Wave 9 (note that Wave 9 variables begin with i_). The longitudinal and cross-sectional weights page explains the difference between the weights and when to use them.
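The rule in this example can be written as a short sketch. The function names wave_prefix and prefix_and_type are hypothetical, and the sketch assumes waves are numbered 1 to 15 and mapped to the letters a to o, in line with the d_ and i_ prefixes mentioned above.

```python
def wave_prefix(wave):
    """Return the variable prefix for a wave number (1 -> 'a_', 4 -> 'd_')."""
    if not 1 <= wave <= 15:
        raise ValueError("this sketch assumes waves 1 to 15 (a to o)")
    return chr(ord("a") + wave - 1) + "_"


def prefix_and_type(waves):
    """Pick the wave prefix and weight type for the waves in an analysis.

    One wave: cross-sectional ('xw') weight from that wave.
    Several waves: longitudinal ('lw') weight from the last wave.
    """
    distinct = sorted(set(waves))
    if not distinct:
        raise ValueError("at least one wave is required")
    kind = "xw" if len(distinct) == 1 else "lw"
    return wave_prefix(distinct[-1]), kind


single = prefix_and_type([4])           # analysis of Wave 4 only
panel = prefix_and_type(range(4, 10))   # analysis of Waves 4 to 9
```

The sketch only chooses the wave letter and the xw/lw suffix. The level, instrument and sample parts of the name still have to be chosen to match the analysis.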

Hierarchy of analysis levels

For individual level analysis you may want to combine information from different questionnaires. In this situation please select the weight suitable for the lowest level according to the hierarchy table below:

Level of analysis, data source and weight name component (_xxxyy):

  • Level 5: Household grid and/or household questionnaire (_psnen)
  • Level 4: Adult proxy and main interview (_indpx)
  • Level 3: Adult main interview only (no proxy) (_indin)
  • Level 2: Adult self-completion interview (_indsc)
  • Level 2: Extra five minutes interview (_ind5m)
  • Level 2: Youth questionnaire (_ythsc)

Example:

If you are analysing cross-sectional data from Wave 1 and using questions from both the proxy/full interview and the self-completion questionnaire, then the correct weight is a_indscus_xw: the self-completion questionnaire is at level 2 in the table, which is lower than the proxy/full interview at level 4.

The weight variable a_indscus_xw is designed for Wave 1 (a_) participants aged 16+ (ind) from the general population and ethnic minority samples (us) who answered the self-completion questionnaire (sc), for analysis within one wave (xw).
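The lowest-level rule can be sketched as a lookup. The level numbers are copied from the hierarchy table, and lowest_level_code is a hypothetical helper. The guidance above does not say what to do when two different sources share the lowest level, so this sketch raises an error in that case rather than guessing.

```python
# Level numbers as listed in the hierarchy table.
HIERARCHY = {"psnen": 5, "indpx": 4, "indin": 3,
             "indsc": 2, "ind5m": 2, "ythsc": 2}


def lowest_level_code(codes):
    """Return the _xxxyy code with the lowest level among the sources used."""
    codes = list(codes)
    if not codes:
        raise ValueError("at least one data source is required")
    unknown = [c for c in codes if c not in HIERARCHY]
    if unknown:
        raise ValueError(f"unknown data source codes: {unknown}")
    lowest = min(HIERARCHY[c] for c in codes)
    candidates = sorted({c for c in codes if HIERARCHY[c] == lowest})
    if len(candidates) > 1:
        # Assumption: ties between different sources are left to the analyst.
        raise ValueError(f"several sources share the lowest level: {candidates}")
    return candidates[0]


# Wave 1 cross-sectional analysis mixing proxy/main interview and
# self-completion questions, as in the example above.
code = lowest_level_code(["indpx", "indsc"])
weight = f"a_{code}us_xw"
```

For the Wave 1 example this produces a_indscus_xw.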

Note: for cross-sectional analysis from Wave 15 onwards, and for longitudinal analysis starting at Wave 14 or later, there is no longer a distinction between self-completion, proxy and main adult interviews. For all such analyses, use the main adult interview level.

After you have decided the population you want to generalise your results to and the questionnaire(s) you want to use, refer to these tables to decide the appropriate weight for your analyses.

Alternatively, the naming convention table above summarises how weight names are built and makes it easy to choose the correct weight.

For advanced users who want to model nonresponse in their own way, we provide design weights and inclusion weights which adjust the sample for unequal selection probabilities but not for nonresponse. Note that adjusting for the first wave nonresponse is different from adjusting for attrition and requires variables which have values for both responding households and never responding households.
