Getting Started Guide
Understanding Society is a rich and valuable data resource that provides high quality, longitudinal information on topics such as education, employment, health, income, family and ethnicity. Understanding Society data helps researchers understand the short, middle and long term effects of social and economic change.
If you are new to using longitudinal data, the CLOSER Learning Hub has information and resources to help you explore longitudinal studies.
This Getting Started Guide can also be downloaded as a pdf.
Exploring Understanding Society
Contents of the Study
Understanding Society is longitudinal and long-term. Because data are collected repeatedly from the same individuals over many years, changes in people's lives can be understood and causality can be identified better than with cross-sectional survey data. Data collection for the British Household Panel Survey, which is now part of Understanding Society, started in 1991, whilst Understanding Society data collection started in 2009/10.
The Study has a household focus. Data are collected from every adult and child aged 10 or older in each household, so inter-relations between family and household members can be investigated. The large sample size permits analysis of small subgroups and analysis at regional and country level.
Target population and samples
Understanding Society has four core samples:
- a sample of around 26,000 households, representative of the residential population living in private households in the UK in 2009/10
- an Ethnic Minority Boost (EMB) sample for England and Wales designed to provide around 1,000 adult respondents from Indian, Pakistani, Bangladeshi, Caribbean and African groups
- an Immigrant and Ethnic Minority Boost (IEMB) sample, added at Wave 6 to include new immigrants and refresh the core ethnic minority groups
- households from the British Household Panel Survey (BHPS), which was a representative probability sample of the residential population living in private households in Britain in 1991, with subsequent boost samples for Scotland, Wales and Northern Ireland. You can find out more about the BHPS here.
In the data, information from all samples in a single Wave is presented in the same files. When analysing data from Understanding Society, it is important to consider that not all cases had the same probability of selection into the Study originally and that all samples (except the Northern Ireland sample) used a clustered and stratified design. We attempt to maintain individuals from the first Wave as part of the sample as long as they live in the UK. We also interview other individuals joining their household, as long as they live with the original sample member.
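To illustrate why unequal selection probabilities matter, the following is a minimal Python sketch that compares an unweighted and a weighted mean. The incomes and weights are made-up toy values, not Study data, and the function name is only illustrative. The sketch covers point estimates only; the clustered and stratified design also affects standard errors, which need survey-aware software.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values using the given weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy data: three respondents with different weights.
incomes = [1000.0, 2000.0, 3000.0]
weights = [0.5, 1.0, 2.5]  # made-up values, not real Study weights

unweighted = sum(incomes) / len(incomes)   # treats every case equally
weighted = weighted_mean(incomes, weights)  # gives more influence to heavily weighted cases
```

Here the third respondent carries the largest weight, so the weighted mean is pulled towards that respondent's income.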
Accessing the data
Using the data
The data are delivered in a file containing a compressed folder named UKDA-6614-stata, which you need to extract using software such as 7-ZIP. We also provide our data in SPSS or TAB format. When you open the compressed folder, click on Extract in the task bar to start the Extraction Wizard. You are then prompted to choose a location for the extracted files. We recommend you keep all files in a designated folder. Because our data are updated over time, it is advisable to include the release version in the folder name.
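If you prefer to script the extraction, the following is a minimal Python sketch using only the standard library. It assumes the download is an ordinary zip archive; the function name and the release label are hypothetical and should be replaced with whatever identifies your release.

```python
import zipfile
from pathlib import Path


def extract_release(zip_path, base_dir, release_label):
    """Extract zip_path into base_dir/<archive name>_<release_label>.

    release_label is a hypothetical, user-chosen string that records
    which release of the data the folder holds.
    """
    zip_path = Path(zip_path)
    target = Path(base_dir) / f"{zip_path.stem}_{release_label}"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

A call such as extract_release("downloads/archive.zip", "data", "release_label") would place the extracted files in data/archive_release_label, so that files from different releases are never mixed.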
Once you have extracted the file you will find two files and two folders. The file read6614.htm links to a website that provides basic information about this release. The file 6614_file_information provides a list of all files in this release. Detailed study documentation, such as user guides, questionnaires and fieldwork materials are in the sub-folder mrdoc/pdf.
The Understanding Society data are in the sub-folder stata\stata11_se, which contains the data files in Wave-specific sub-folders. For example, bhps_w1 contains all data files from the harmonised BHPS files from its first Wave, and us_w1 contains all data files from Understanding Society Wave 1. The respective cross-wave files are in bhps_wx and us_wx.
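The folder layout described above can be sketched as a small Python helper. The function name and the release_root argument are illustrative assumptions; only the sub-folder names come from the description above.

```python
from pathlib import Path


def wave_folder(release_root, wave=None, bhps=False):
    """Return the sub-folder holding one Wave's files.

    wave=None selects the cross-wave folder (us_wx or bhps_wx).
    """
    study = "bhps" if bhps else "us"
    suffix = "wx" if wave is None else f"w{wave}"
    return Path(release_root) / "stata" / "stata11_se" / f"{study}_{suffix}"


us_wave_1 = wave_folder("release", 1)           # ends in us_w1
bhps_wave_1 = wave_folder("release", 1, True)   # ends in bhps_w1
us_cross_wave = wave_folder("release")          # ends in us_wx
```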
Data for different Waves are presented in separate files. File names begin with a prefix designating the Wave of data collection (“a_” for the first Wave, “b_” for the second Wave; we use “w_” to denote Waves in general). Waves collected before 2009 additionally have a “b” in front of the Wave prefix (i.e., “bw_”). A small number of files do not have Wave prefixes – they store information across all Waves.
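The prefix rule can be expressed as a short Python sketch. The function name is an assumption made for illustration; the mapping itself follows the convention described above (one letter per Wave, with an extra leading "b" for Waves collected before 2009).

```python
import string


def wave_prefix(wave, bhps=False):
    """Return the file and variable prefix for a Wave number (1 = first Wave).

    bhps=True gives the prefix for Waves collected before 2009,
    which carry an additional leading "b".
    """
    if not 1 <= wave <= 26:
        raise ValueError("wave must be between 1 and 26")
    letter = string.ascii_lowercase[wave - 1]
    return f"b{letter}_" if bhps else f"{letter}_"


first_wave = wave_prefix(1)                  # "a_"
second_wave = wave_prefix(2)                 # "b_"
first_bhps_wave = wave_prefix(1, bhps=True)  # "ba_"
```

Joining the prefix to a root filename then gives the name of the file for that Wave.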
Data collected from different sources (e.g. the household interview, the adult interview, the youth interview) are stored in separate files. Table 1 lists the key data files that contain substantive information collected in interviews with responding households and individuals. The root filename is fixed over time.
Table 1: Key data files for analysing data for responding households and individuals
|Substantive data from responding households|
|Substantive data for responding adults (16+) including proxies and telephone interviews from individual questionnaires, including self-completion|
|Substantive data from the youth questionnaire (UKHLS: age 10-15, all Waves; harmonised BHPS: age 11-15, Waves 4-18 only)|
|xwavedat||Includes stable characteristics of individuals, including those reported when first entering the study|
When you open a file, it will contain variables that identify the units of observation such as households and individuals.
Households are identified by w_hidp, a Wave specific variable with a different prefix for each Wave. It can be used to link information about a household from different records within a Wave, but cannot be used to link information across Waves.
Individuals are identified by the personal identifier (pidp), which is consistent in all Waves and can be used to link information about a person from different records belonging to one Wave, or to link information from different Waves. Additionally, individuals are identified by w_pno – the person number within the household in a single Wave. The combination of w_hidp and w_pno is unique for each individual.
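The following Python sketch shows the linking logic on made-up records. The identifier values are invented and the function name is hypothetical; only the variable names pidp, w_hidp and w_pno (here with the prefixes a_ and b_) follow the text above.

```python
# Hypothetical records from the first two Waves.
wave_a = [
    {"pidp": 101, "a_hidp": 9001, "a_pno": 1},
    {"pidp": 102, "a_hidp": 9001, "a_pno": 2},
]
wave_b = [
    {"pidp": 101, "b_hidp": 7003, "b_pno": 1},
    {"pidp": 103, "b_hidp": 7003, "b_pno": 2},
]


def link_on_pidp(left, right):
    """Keep people present in both lists and combine their records."""
    right_by_pidp = {row["pidp"]: row for row in right}
    return [
        {**row, **right_by_pidp[row["pidp"]]}
        for row in left
        if row["pidp"] in right_by_pidp
    ]


linked = link_on_pidp(wave_a, wave_b)

# Within a Wave, household identifier plus person number is unique per person.
keys_a = {(row["a_hidp"], row["a_pno"]) for row in wave_a}
unique_within_wave = len(keys_a) == len(wave_a)
```

Only person 101 appears in both Waves, so the linked list holds one combined record. Note that a_hidp and b_hidp differ for the same person, which is why the household identifier cannot be used to link across Waves.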
Variable names are built from the Wave prefix and the variable root name. The variable root name does not change over time as long as the underlying question does not change substantially. The prefix designates the Wave of data collection (i.e. bw_ for Waves collected before 2009 or w_ for Waves collected from 2009 onwards) and is not used for information that will, by definition, never change (such as the unique cross-wave person identifier pidp and the survey design variables psu and strata) or stable characteristics in the data file xwavedat.
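When pooling several Waves, it can help to strip the Wave prefix so that the same root name is used throughout. The following Python sketch shows one way to do this; the function names and the example record are illustrative assumptions.

```python
def root_name(variable, prefix):
    """Strip the Wave prefix if present; unprefixed names are returned unchanged."""
    if variable.startswith(prefix):
        return variable[len(prefix):]
    return variable


def to_root_names(record, prefix):
    """Rename all keys of one record to their root names."""
    return {root_name(name, prefix): value for name, value in record.items()}


# Hypothetical record from the first Wave (prefix "a_").
record = {"pidp": 101, "a_hidp": 9001, "a_pno": 1, "psu": 12, "strata": 3}
renamed = to_root_names(record, "a_")
```

Variables without a prefix, such as pidp, psu and strata, pass through unchanged, while a_hidp and a_pno become hidp and pno.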
The variable root name corresponds to the name used for questions in the questionnaire, so it is easy to look up the questions that underpin the data. The Universe field in the questionnaire provides users with information about who gets asked a specific question.
Table 2: Missing value codes
|-21||No data from the UKHLS|
|-20||No data from BHPS Waves 1-18|
|-11||Only available for the IEMBS|
|-10||Not available for the IEMBS|
|-9||Missing by error or implausible|
|-8||Not applicable to the person or because of routing|
|-7||Proxy respondent. The question was not asked of proxy respondent or derived variable cannot be computed for proxy respondents|
|-1||Respondent does not know the answer|
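Before analysis, the codes in Table 2 usually need to be treated as missing and not as valid values. The following is a minimal Python sketch on made-up values; the function names are assumptions. It treats only the codes listed in Table 2 as missing, so extend the dictionary if your data contain other negative codes.

```python
# Codes and labels copied from Table 2.
MISSING_CODES = {
    -21: "No data from the UKHLS",
    -20: "No data from BHPS Waves 1-18",
    -11: "Only available for the IEMBS",
    -10: "Not available for the IEMBS",
    -9: "Missing by error or implausible",
    -8: "Not applicable to the person or because of routing",
    -7: "Proxy respondent",
    -1: "Respondent does not know the answer",
}


def drop_missing(values):
    """Replace every Table 2 code with None and keep all other values."""
    return [None if value in MISSING_CODES else value for value in values]


def mean_of_valid(values):
    """Return the mean of the non-missing values, or None if there are none."""
    valid = [value for value in drop_missing(values) if value is not None]
    return sum(valid) / len(valid) if valid else None


raw = [3, -8, 5, -1, 4]          # hypothetical responses
cleaned = drop_missing(raw)      # [3, None, 5, None, 4]
average = mean_of_valid(raw)     # 4.0
```

Leaving the codes in place would pull the mean of the raw values down to 0.6, which shows why the recode matters.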
The variable view is the best place to find out more about variables that are produced post-field. In the data, these additional variables are positioned at the bottom of the data files. We recommend looking for derived variables (search for *_dv in your Stata file).
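Outside Stata, the same *_dv search can be sketched in Python with the standard library. The variable names in the list are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical list of variable names from one data file.
variable_names = ["pidp", "a_hidp", "a_example_dv", "a_other", "a_second_dv"]

# Keep names matching the *_dv pattern (case-sensitive on every platform).
derived = [name for name in variable_names if fnmatchcase(name, "*_dv")]
```

The result keeps a_example_dv and a_second_dv and drops the other names.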
Stata commands for basic data tasks
A list of Stata commands for basic data management and analysis tasks can be found here. Understanding Society currently provides syntax for Stata.