Derived variables are variables that are computed from one or more other variables. Some are computed during the interview to control the routing within the questionnaire and appear in the context of the relevant module. Others are computed post-field for the purpose of analysis and are positioned last in the data files so they can be easily identified (derived income variables are discussed in the Derived Income variables section).
Derived variables serve different purposes: some flag whether or not a certain characteristic is true for a study member, some count, and some point to other people. For example, w_jbft_dv flags whether or not a respondent has a full-time job, w_nemp_dv counts the number of employed people in the household, and w_mnpid points to the cross-wave person identifier of the respondent’s biological mother.
A data file may contain alternative versions of a derived variable. This applies, for example, to variables that point to other people in the household: both w_hgbiom and w_mnpno hold the person number of the respondent’s biological mother in the household. While w_hgbiom has been computed based on information collected during the interview, w_mnpno has been computed post-field after the information collected in the household grid has undergone extensive data cleaning.
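As an illustration of how a person-number pointer might be used, the following minimal Python sketch looks up a respondent's biological mother among the members of the same household. The helper find_mother, the sample records, and the variable names w_hidp (household identifier) and w_pno (person number) are assumptions made for this example, as is the convention that a non-positive pointer value means no mother is present in the household.

```python
def find_mother(person, members, pointer="w_mnpno"):
    """Return the record of the person's mother in the same household, or None.

    Hypothetical helper: assumes each record is a dict holding a household
    identifier (w_hidp), a person number (w_pno) and the pointer variable.
    """
    mother_pno = person.get(pointer)
    # Assumption: non-positive values mean "no mother in the household".
    if mother_pno is None or mother_pno <= 0:
        return None
    for member in members:
        same_household = member["w_hidp"] == person["w_hidp"]
        if same_household and member["w_pno"] == mother_pno:
            return member
    return None


# Made-up records for illustration only.
rows = [
    {"pidp": 101, "w_hidp": 1, "w_pno": 1, "w_mnpno": -8},
    {"pidp": 102, "w_hidp": 1, "w_pno": 2, "w_mnpno": 1},
    {"pidp": 201, "w_hidp": 2, "w_pno": 1, "w_mnpno": -8},
]

mother = find_mother(rows[1], rows)
```

Passing pointer="w_hgbiom" would apply the same lookup to the version computed during the interview, provided that variable is present in the records.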
Variables that are produced post-field are clearly marked in the data by suffixes: UKHLS weights are shown by the suffixes “_lw” or “_xw”; most derived variables are shown by the suffix “_dv”; and pointers to other members of the household typically end in “pno” or “pid”. All variables ending in “pid” contain the UKHLS person identifier pidp, not the original BHPS person identifier.
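The suffix conventions above could be applied programmatically, for example to sort a list of variable names into broad groups. The sketch below is a heuristic only: the function classify_variable and the weight name w_example_lw are hypothetical, and because the conventions hold for "most" or "typically" rather than all variables, some names will be misclassified.

```python
def classify_variable(name):
    """Guess the kind of a post-field variable from its suffix (heuristic)."""
    lowered = name.lower()
    if lowered.endswith(("_lw", "_xw")):
        return "weight"
    if lowered.endswith("_dv"):
        return "derived"
    if lowered.endswith(("pno", "pid")):
        return "pointer"
    # No recognised suffix, e.g. variables computed during the interview.
    return "other"


names = ["w_jbft_dv", "w_nemp_dv", "w_mnpid", "w_mnpno", "w_hgbiom", "w_example_lw"]
kinds = {name: classify_variable(name) for name in names}
```

Here w_hgbiom falls into "other" because it carries no post-field suffix, even though it also points to another household member.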
Information collected using dependent interviewing is merged with the respective information collected using independent interviewing (e.g., when a respondent did not provide the information in the previous interview, or when they are new to the Study) and stored in the data file under the variable name used for the latter. See, for example, marital status (w_mstatsam).
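Conceptually, this kind of merge resembles taking the first valid answer from two routed questions and storing it under a single name. The sketch below illustrates that idea only and is not the Study's actual procedure: the variable names w_example and w_example_di, the function names, and the rule that negative codes denote missing answers are all assumptions.

```python
def is_valid(value):
    # Assumption: None and negative codes mark missing or inapplicable answers.
    return value is not None and value >= 0


def combine_routes(independent_answer, dependent_answer):
    """Return the value to store under the independent-interviewing name."""
    if is_valid(independent_answer):
        return independent_answer
    if is_valid(dependent_answer):
        return dependent_answer
    # Neither route produced a valid answer: keep the original missing code.
    return independent_answer


# Hypothetical record: the respondent was routed through dependent interviewing.
record = {"w_example": -8, "w_example_di": 2}
record["w_example"] = combine_routes(record["w_example"], record["w_example_di"])
```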
We use look-up files between SOC 2010 and other SOC versions (i.e., 2000, 2020) to derive variables corresponding to each version. Users may apply for the Special Licence version of Understanding Society to obtain non-condensed versions of these codes.
Tips for analysts:
Information about how a derived variable is produced is shown in the Derived Variable Note field of the variable. The Variable Search provides descriptive statistics for each variable and, in the Origin field, lists the variables used in the computation of the derived variable. For variables that were computed during the interview, additional information is available in the questionnaires.
Analysts can also search for the description of the derived variables under the Index Term “Derived variables” on the website.
Index of Multiple Deprivation (IMD) variables
Variables containing quintiles of the Index of Multiple Deprivation (IMD) ranking for each household’s LSOA 2011 area (or equivalent for Scotland and Northern Ireland) are provided in the indall data file for each wave of Understanding Society, from Wave 12 onwards, as follows:
- England – w_imd2015qe_dv (IMD 2015 version) and w_imd2019qe_dv (IMD 2019 version)
- Northern Ireland – w_imd2017qni_dv (IMD 2017 version)
- Scotland – w_imd2016qs_dv (IMD 2016 version) and w_imd2020qs_dv (IMD 2020 version)
- Wales – w_imd2014qw_dv (IMD 2014 version) and w_imd2019qw_dv (IMD 2019 version)
These have all been derived from the latest official data released by each country’s statistical body. Researchers who need the full indexes or the individual indices can obtain them by applying for the Special Licence dataset SN 7248 (LSOA2011) or SN 6670 (LSOA2001), as appropriate. This dataset can then be linked to the required IMD dataset, which each country publishes as open access data.
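Because each country has its own IMD quintile variables, analysts may find it convenient to keep the names in a small lookup. The Python sketch below records the variables listed above and picks the most recent IMD version for a given country. The function latest_imd_variable is hypothetical, and its wave_prefix parameter assumes that the leading w_ in the names is a placeholder for a wave-specific prefix; the "l_" prefix in the example is likewise an assumption.

```python
# Quintile variable names as listed above, keyed by country.
IMD_QUINTILE_VARS = {
    "England": ["w_imd2015qe_dv", "w_imd2019qe_dv"],
    "Northern Ireland": ["w_imd2017qni_dv"],
    "Scotland": ["w_imd2016qs_dv", "w_imd2020qs_dv"],
    "Wales": ["w_imd2014qw_dv", "w_imd2019qw_dv"],
}


def imd_year(name):
    """Extract the four-digit IMD version year that follows 'w_imd'."""
    start = len("w_imd")
    return int(name[start:start + 4])


def latest_imd_variable(country, wave_prefix="w_"):
    """Return the most recent IMD quintile variable name for a country."""
    latest = max(IMD_QUINTILE_VARS[country], key=imd_year)
    # Swap the generic 'w_' placeholder for the requested prefix (assumption).
    return wave_prefix + latest[len("w_"):]


scotland_var = latest_imd_variable("Scotland")
ni_var = latest_imd_variable("Northern Ireland", wave_prefix="l_")
```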



