All of Understanding Society's key features can be found here
About the study
It is a household longitudinal survey of UK residents living in private households. The survey started in 2009.
A household survey is one in which the population of interest is households (and their members), so a sample of households is chosen and information is collected about the household members.
A longitudinal survey is a survey where the same set of people are interviewed at regular intervals and asked the same questions. This helps in understanding the dynamic processes of their lives and measuring change.
The regular intervals at which a longitudinal survey collects information from its sample members are referred to as waves (some surveys refer to these as sweeps).
The data for each wave is collected over a period of 24 months, also referred to as the fieldwork period. The fieldwork period for the first wave was 2009 to 2010. But as the interviews had to be at one-year intervals, the second wave covered the period 2010 to 2011. Generally, those who were interviewed in 2009 were next interviewed in 2010, 2011, and so on, and similarly those who were interviewed in 2010 were next interviewed in 2011, 2012, and so on. Read the survey timeline
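The overlapping fieldwork periods can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the pattern stated for Waves 1 and 2 (each wave starts one year after the previous one and runs over two calendar years) continues for later waves.

```python
from typing import Tuple


def fieldwork_years(wave: int) -> Tuple[int, int]:
    """Return the (start, end) calendar years of a wave's fieldwork period.

    Assumes Wave 1 covers 2009-2010 and each later wave starts one year
    after the previous one, as described for Waves 1 and 2.
    """
    if wave < 1:
        raise ValueError("wave numbers start at 1")
    start = 2009 + (wave - 1)
    return start, start + 1


for wave in (1, 2, 3):
    print(wave, fieldwork_years(wave))
```

Consecutive waves therefore share one calendar year of fieldwork, which is why two waves can be in the field at the same time.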
The data is available one year after the end of the collection period. So, Wave 1 data was available in November 2011, Wave 2 data in November 2012, and so on.
At each sampled address, up to three dwelling units are selected, and at each dwelling unit up to three households are randomly selected into the sample. In most cases, an address has one dwelling unit and one household in that dwelling unit. All members of the households chosen in Wave 1 are referred to as Original Sample Members (OSMs). The children of OSM mothers also become OSMs. OSMs form the core sample, that is, they represent the population of interest, so they are followed wherever they go as long as they live in the UK. Anyone who joins the household of one or more OSMs is called a Temporary Sample Member (TSM) and is interviewed only as long as they live with at least one OSM. This is because TSMs are not part of the core sample that represents the population of interest; they are interviewed to provide the household context for the OSMs. Exception: if a TSM is the father of an OSM child, then their status changes to Permanent Sample Member (PSM) and, like the OSMs, they are followed wherever they go as long as it is within the UK. PSMs are followed permanently because they provide information about the family background of an OSM child. If we did not do this, then if the OSM mother and TSM father separated, we would not have information about the father of the OSM child.
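The following rules can be summarised as a small decision function. This is a simplified sketch rather than the study's actual fieldwork logic: the function and argument names are hypothetical, and it ignores special cases such as moves into institutions.

```python
def is_followed(status: str, lives_in_uk: bool, lives_with_osm: bool) -> bool:
    """Return True if a sample member is still eligible for interview.

    status is "OSM", "PSM" or "TSM". The argument names are illustrative.
    """
    if not lives_in_uk:
        return False  # sample members are only followed within the UK
    if status in ("OSM", "PSM"):
        return True  # followed wherever they go within the UK
    if status == "TSM":
        return lives_with_osm  # only while living with at least one OSM
    raise ValueError(f"unknown sample status: {status!r}")


print(is_followed("TSM", lives_in_uk=True, lives_with_osm=False))
```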
We use the standard ONS definition: "one person living alone or a group of people who either share living accommodation OR share one meal a day and who have the address as their only or main residence".
A Dwelling Unit (DU) is a living space with its own front door – this can be either a street door or a door within a house or block of flats. Usually there is only one dwelling unit at an address.
Students who have left the household are followed and treated as a "split-off" household. If they are living in institutional accommodation (e.g., Halls of Residence), they are considered to be the only member of the household. If they are living in private residence (e.g., renting a flat), then the usual rules apply of who is considered part of the household.
The sample is chosen from private households and so institutions are not part of the core sample. However, if sampled individuals move into institutions after the first wave, they are interviewed if it is possible to do so. The only exception to this is prisons, where we do not seek an interview.
Information is collected about different aspects of the lives of the sample members. The main topic areas cover income, education, labour market activities, family background, partnerships, fertility behaviour, health and wellbeing, attitudes and values, identity, ethnicity and religion. For a good overview of the different topics see the Long Term Content Plan. You can also search for variables using the Online Documentation search facility.
Providing information is entirely voluntary. Interviewees can stop at any point and can also ask for their data to be destroyed after it is collected. First, interviewers collect information from a responsible adult in the household about the household members: their names, date of birth or age, sex, marital status, employment status and relationship to each other. This is called the ENUMERATION GRID. Second, generally the owner or renter (or if more than one, the eldest of them) is asked about the household: household expenditure, mortgage, number of bedrooms, and so on. This is the HOUSEHOLD QUESTIONNAIRE. Then all those aged 16 and over, referred to as adults for the purposes of the survey, are asked questions about different aspects of their lives. This is the INDIVIDUAL QUESTIONNAIRE. This questionnaire also includes questions for parents and guardians about children aged 0-9, such as parenting styles and birthweight. All 10-15 year olds are asked a number of questions about aspects of their school, friends, eating and drinking habits, happiness and wellbeing. This is the YOUTH QUESTIONNAIRE. The interviewers also provide some information about the condition of the property, the cooperativeness of the interviewees, any difficulties interviewees had in answering the questions, etc.
Parents and guardians answer some questions about their children such as parenting styles, birthweight, etc. All 10-15 year olds are asked a smaller number of questions about aspects of their school, friends, eating and drinking habits, happiness and wellbeing. This is the youth questionnaire.
The collection of information is subcontracted to a fieldwork agency. Until 2014 (Waves 1-5) it was NatCen Social Research; for Waves 6-8 it was Kantar Public. For the current Waves 9-11 contract, the fieldwork is conducted by Kantar Public, with part of the interviewing being done by NatCen. There are different ways of collecting data. (1) Until recently the most common method was face-to-face, that is, interviewers went to the homes of interviewees and asked them questions in person. Some sensitive questions were answered by the interviewees directly on paper or computer and the interviewer did not see their answers. (2) Around 500 households are interviewed via telephone. (3) Since Wave 7, an increasing proportion of the sample completes the interview directly on the web, that is, no interviewers are directly involved in asking the questions.
To meet the different objectives of the study, in the first Wave, a random sample of 26000 households was chosen from the UK (referred to as the General Population Sample) AND a sample of 4800 households where at least one household member belonged to an ethnic minority group was chosen (referred to as the Ethnic Minority Boost Sample). Then in 2010, the BHPS sample was incorporated into Understanding Society and was interviewed as part of the second Wave. In 2015, as part of the sixth Wave, a new sample was added. This was the Immigrant and Ethnic Minority Boost Sample, which included 2500 households where there was at least one person who was born outside the UK and/or belonged to an ethnic minority group.
Yes, the BHPS in Wave 2 and the Immigrant and Ethnic Minority Boost Sample in Wave 6.
Biomarkers such as grip strength, lung function and blood samples were collected from a sub-sample of the survey sample. This is referred to as the biomarker sample.
Five minutes of question time was set aside for questions of interest to ethnicity and migration research. These questions were only asked of a sub-sample of the survey sample. This is referred to as the extra five minutes sample.
The BHPS is the British Household Panel Survey, which started in 1991 and continued until 2008. It started with a Great Britain sample of around 5500 households. In 1999, Scottish and Welsh boost samples of around 1500 households were added and then in 2001 the Northern Ireland boost sample was added. There are similarities between the BHPS and Understanding Society in terms of sample design, questionnaire content and data structure, but there are some differences. For example, the BHPS did not include an ethnic minority boost sample. There are also some differences in questionnaire content. See the relevant FAQ.
No. As a longitudinal study it is important that we interview the same people over a long period of time. We cannot add new people to the sample who are not connected to the households already in the Study.
The first step is to look at the frequency distribution of the variable responses. For example, for the question "how satisfied are you with your life overall?" asked in Wave 1, 5588 individuals said they were completely satisfied, 17453 said they were mostly satisfied, and so on, and 1033 said they were completely dissatisfied. Read more
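A frequency distribution of this kind can be produced with a few lines of Python. The sketch below uses a handful of invented toy responses purely for illustration (they are not survey data), and the function name is hypothetical.

```python
from collections import Counter

# Toy responses, invented for illustration only (not real survey data).
responses = [
    "mostly satisfied",
    "completely satisfied",
    "mostly satisfied",
    "completely dissatisfied",
    "mostly satisfied",
]


def frequency_table(values):
    """Map each response to (count, percentage of all responses)."""
    counts = Counter(values)
    total = len(values)
    return {
        label: (n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    }


for label, (n, pct) in frequency_table(responses).items():
    print(f"{label}: {n} ({pct}%)")
```

With real data, the missing-value codes would normally be tabulated separately or excluded before computing percentages.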
Understanding Society data can be accessed via the UK Data Service (UKDS). Researchers will need to be registered with the UKDS before requesting and downloading data. You will need to decide which level of the main data you need - End User Licence (EUL), Special Licence or Secure Data access.
- Read an outline of the different licence types available and the conditions of their access.
- Read the study's full Data Access Strategy
The EUL version of the main survey dataset (Study Number 6614) includes all of the data that should be required by the majority of researchers and is the simplest and quickest to access from the UK Data Service (UKDS). It does not, however, contain: dates of birth more detailed than the year of birth; detailed country of birth; detailed occupation or industry codes; or geographical identifiers less than Government Office Region (GOR). In addition, the income and pay variables are top-coded. Read the UKDS catalogue details and the Understanding Society documentation.
Government Office Regions (GOR) and countries are available as part of the main datasets (Study Numbers 6614 and 6931). The most detailed geographical indicators, grid references, are supplied as part of the Secure Access version (Study Number 6676). The geographical indicators between GOR and grid references are available as Special Licence datasets. A maximum of three Special Licences are usually allowed per application and combinations of certain 2001 and 2011 census type geographical identifiers are not allowed due to disclosure risks. The Understanding Society geography Special Licence datasets are only applicable to the Understanding Society waves. Users wishing to use Special Licence geographies with the harmonised BHPS waves will need to apply for the relevant BHPS geography Special Licence dataset(s).
It is free to use for non-commercial purposes.
The data is available in Stata (*.dta), SPSS (*.sav) and tab-delimited formats.
Read the terms and conditions of access to End User Licence data
Read the terms and conditions of access to Special User Licence data
The Special Licence / Secure Access application process is two-stage. Applications are made to the UK Data Service (UKDS) and they perform a series of checks on security and other aspects of the application. The UKDS will inform applicants of any delays in their processing. Secure Access applications are likely to take longer than Special Licence ones due to the additional processing required. The second stage is the approval of the application by the Understanding Society Scientific Leadership Team (SLT) under delegated authority from the data owners. The SLT is required to process applications within 10 working days of receiving them although the time-frame is often considerably shorter. Applications with insufficient or unclear details on the form are likely to require clarification being sought from the applicants via the UKDS and this will prolong the application process. It should also be noted that users of Secure Access data will require special training - the UKDS will advise on this matter.
A limit of three is normally applied to the number of Special Licences granted per application. In addition, combinations of certain 2001 and 2011 census type geographical identifiers are normally prohibited due to potential disclosure risks. Read the access restrictions. Secure Access data is only available in a controlled environment. It is possible to import Special Licence data, or indeed datasets from elsewhere, into that environment, but a special application will need to be made to the UKDS in such circumstances.
Data that a researcher has been granted access to on a research project may not be used on another research project. A separate application will need to be submitted to the UK Data Service (UKDS). If clarification is required before submission then please get in touch with the UKDS for advice via their standard contact mechanisms.
Only those persons named in the data access application will be able to use the data for the specific project. Each applicant on a project must submit an application form. Supervisors requiring access to the data must also apply in the same way.
Project team members from different institutions can apply for access to data as part of the same project and to work on the same data. Special consideration for access to some Special Licence datasets may be given for applicants from non-UK institutions.
No person may access Understanding Society data without authorisation. Please contact the UK Data Service (UKDS) for advice in the first instance.
Data that a researcher has been granted access to on a research project may not be used on another research project. A separate application will need to be submitted to the UK Data Service (UKDS). If clarification is required before submission then please get in touch with the UKDS via their standard contact mechanisms.
All released Understanding Society data should be accessed from the UK Data Service (UKDS).
The survey data files that are downloaded from the UK Data Service (UKDS) are supplied in a single ZIP file which is a compressed format that results in faster download times. Read full details on ZIP files.
Yes. When you download the data from the UK Data Service (UKDS) you will be asked to check a box to say you will be using it for teaching purposes. You will then be asked to download a simple form with instructions. Basically it will require you to provide some simple details about you and the data and get you and everyone in your class to sign the form.
Please contact the UK Data Service (UKDS) Helpdesk for all download and accessibility issues.
A large number of questions are asked every year, while others are asked every few years. The waves in which a variable was collected are listed below its description in the online documentation. For example, current economic activity status is asked every year. In the online documentation for this variable, the waves in which the question has been asked are listed under "Wave Occurance". For an overview of the question module frequency see the long term content plan.
If the questions have the same name then they are comparable across time.
Participation in the survey is completely voluntary and respondents can skip any question that they do not want to answer. As part of the survey we also collect respondents' consent to link to various administrative data sources such as information held by DWP, HMRC, DVLA and FCA. Read more
Read the BHPS - Harmonised User Guide. This guide accompanies the first edition of the Understanding Society – harmonised BHPS. It focuses on the harmonisation process, not the differences in scope, fieldwork practices, questionnaire design and content of the two studies.
Data and research
All data files that include information collected in Wave 1 start with the prefix A_; similarly, all files pertaining to Wave 2 start with the prefix B_. The same types of information are available in files with the same names across all waves; only the prefix changes. So, all information collected in the adult interviews is put in INDRESP files: the Wave 1 adult interview information is in A_INDRESP, the Wave 2 adult interview information is in B_INDRESP. Since November 2018, 18 waves of BHPS data files which have been harmonised (to a large extent) have also been released along with 7 waves of Understanding Society data. These data files have a similar structure, but their names start with the prefixes BA_, BB_, ... up to BR_. For details about the other files see the study’s documentation and information on harmonisation of the BHPS.
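The file naming convention lends itself to a small helper that builds a file name from a wave number. This is a sketch under stated assumptions: the function names are hypothetical, names are written in upper case as in this FAQ (the case and file extension on disk may differ), and the Understanding Society prefixes are assumed to continue alphabetically beyond the waves mentioned here.

```python
import string

LETTERS = string.ascii_uppercase


def wave_prefix(wave: int, bhps: bool = False) -> str:
    """Return the file and variable prefix for a wave, e.g. 'A_' or 'BA_'."""
    if bhps:
        # Harmonised BHPS waves 1-18 use the prefixes BA_ to BR_.
        if not 1 <= wave <= 18:
            raise ValueError("harmonised BHPS waves run from 1 to 18")
        return "B" + LETTERS[wave - 1] + "_"
    if not 1 <= wave <= len(LETTERS):
        raise ValueError("wave number out of range")
    return LETTERS[wave - 1] + "_"


def file_name(wave: int, file_type: str, bhps: bool = False) -> str:
    """Build a data file name such as 'A_INDRESP'."""
    return wave_prefix(wave, bhps) + file_type.upper()


print(file_name(1, "indresp"), file_name(2, "indresp"))
print(file_name(18, "indresp", bhps=True))
```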
These files have names starting with the letter "x" to indicate that these are the cross-wave files, that is, they include data collected in different waves. For example, XWAVEDAT includes time-fixed information such as date of birth, country of birth, and parents' occupation when the person was 14 years old. These types of information are only collected once, generally the first time a person is interviewed. Although most people were interviewed for the first time in Wave 1, others who joined the households of the core sample members (TSMs) after Wave 1 were asked in the wave they joined. So, this file puts the data collected in different waves together in one file.
The naming convention for variables is the same as for files, that is, the same variable across waves has the same root name but a different wave prefix. So, the variable for age in Wave 1 is A_DVAGE, in Wave 2 is B_DVAGE, and so on.
If a variable does not change across waves it will not have a wave prefix. One example is the individual cross-wave identifier PIDP.
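When stacking waves into a long-format file, it is common to separate the wave prefix from the root name. The sketch below is a heuristic, not an official tool: the function name is hypothetical, and it assumes that any name beginning with a single letter (or B plus a letter from A to R) followed by an underscore carries a wave prefix.

```python
import re
from typing import Optional, Tuple

# One letter (Understanding Society) or B + A-R (harmonised BHPS), then "_".
_PREFIX = re.compile(r"^(B[A-R]|[A-Z])_(.+)$")


def split_variable(name: str) -> Tuple[Optional[str], str]:
    """Split 'A_DVAGE' into ('A', 'DVAGE'); unprefixed names get None."""
    match = _PREFIX.match(name.upper())
    if match is None:
        return None, name.upper()
    return match.group(1), match.group(2)


for name in ("A_DVAGE", "B_DVAGE", "BA_DVAGE", "PIDP"):
    print(name, split_variable(name))
```

Variables such as PIDP come back with no prefix, which matches the rule that variables fixed across waves are not prefixed.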
A_HHORIG, B_HHORIG, C_HHORIG,… in wave specific files AND HHORIG in the cross-wave files will show which sample a sample member belongs to.
If the variables have the same root name then they are comparable across waves.
In some data files each row represents a unique household. These are household level files, e.g., A_HHRESP. In other data files, each row represents a unique individual. These are individual level files, e.g., A_INDRESP.
These are derived variables, that is, the Understanding Society data team has produced these variables from other variables to make things easier for users. Sometimes these are imputed values of the raw variables (e.g., A_PAYGU_DV), sometimes they combine different bits of information (e.g., A_HIQUAL_DV), and sometimes they have been checked for consistency using different bits of information available for that person (e.g., A_AGE_DV). Note that the relationship pointers are another type of derived variable. These do not have a _DV suffix.
This is the unique cross-wave person identifier. That is, within or across waves, this variable uniquely identifies each person ever associated with the survey.
These are the wave specific household identifiers, that is, within any wave these values uniquely identify each household. Every person in the same household in each wave will have the same household identifier.
These are person numbers assigned to all household members in a responding household. Together with the household identifiers, they uniquely identify an individual member within a wave. They can change across waves and the ordering has no significance. So, one person could have the value A_PNO = 1 in Wave 1 and B_PNO = 7 in Wave 2.
Use the household identifier: A_HIDP in Wave 1, B_HIDP in Wave 2, and so on.
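As a minimal sketch of a within-wave link, household-level information can be attached to each individual by matching on the wave's household identifier. The function name and the toy rows are invented for illustration, and A_HHSIZE is used here only as an example of a household-level variable.

```python
def attach_household(individuals, households, key="A_HIDP"):
    """Add household-level fields to each individual row with the same key."""
    by_hidp = {hh[key]: hh for hh in households}
    # Individual fields win if a name appears in both rows.
    return [{**by_hidp.get(person[key], {}), **person} for person in individuals]


# Toy rows, invented for illustration only.
individuals = [
    {"PIDP": 1, "A_HIDP": 100, "A_DVAGE": 30},
    {"PIDP": 2, "A_HIDP": 100, "A_DVAGE": 32},
    {"PIDP": 3, "A_HIDP": 200, "A_DVAGE": 58},
]
households = [
    {"A_HIDP": 100, "A_HHSIZE": 2},
    {"A_HIDP": 200, "A_HHSIZE": 1},
]

for row in attach_household(individuals, households):
    print(row)
```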
Use the individual cross-wave identifier PIDP.
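Linking the same person across waves can be sketched as a join on PIDP. The example below keeps only people present in both waves, which is one possible design choice among several. The function name and the toy rows are invented for illustration.

```python
def link_waves(first_wave, second_wave, key="PIDP"):
    """Return one combined row per person who appears in both waves."""
    by_pidp = {row[key]: row for row in second_wave}
    linked = []
    for row in first_wave:
        match = by_pidp.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


# Toy rows, invented for illustration only.
wave1 = [{"PIDP": 1, "A_DVAGE": 30}, {"PIDP": 2, "A_DVAGE": 45}]
wave2 = [{"PIDP": 1, "B_DVAGE": 31}, {"PIDP": 3, "B_DVAGE": 20}]

print(link_waves(wave1, wave2))
```

Because the wave prefixes keep variable names distinct, the combined row can hold both waves' values side by side without any renaming.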
You cannot do that. There is no concept of a longitudinal household because over time people move in together as well as live separately. As a result a cross-wave household identifier cannot be provided.
When individuals live in the same household as their spouse or partner, the PIDP of their spouse or partner (W_PPID, where W_ stands for the wave prefix) is provided along with the other information for the person. Use this to link information about spouses and partners.
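A partner link of this kind can be sketched as a lookup of each person's partner identifier among the PIDPs in the same wave. The function name and toy rows are invented for illustration, and the sketch assumes that people without a co-resident partner carry a negative missing-value code in the partner identifier.

```python
def partner_pairs(rows, wave="A"):
    """Return (pidp, partner_pidp) pairs, each couple listed once."""
    ppid = f"{wave}_PPID"
    by_pidp = {row["PIDP"]: row for row in rows}
    pairs = []
    for row in rows:
        partner = by_pidp.get(row.get(ppid))
        # Keep each couple once by only emitting the lower PIDP first.
        if partner is not None and row["PIDP"] < partner["PIDP"]:
            pairs.append((row["PIDP"], partner["PIDP"]))
    return pairs


# Toy rows, invented for illustration only; -8 marks "no partner" here.
people = [
    {"PIDP": 10, "A_PPID": 20},
    {"PIDP": 20, "A_PPID": 10},
    {"PIDP": 30, "A_PPID": -8},
]

print(partner_pairs(people))
```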
The negative values -21, -20, -11, -10, -9, -8, -7, -2 and -1 have been reserved for different types of missing data. Any other negative value represents an actual value of the variable. -1 means "don't know", -2 means "refusal", -7 means "not asked because it was a proxy interview", -8 means "not asked because the respondent was not eligible for the question" and -9 means "other reasons for missing data". ONLY IN THE XWAVEDAT FILE: -20 means "no data from BHPS" and -21 means "no data from Understanding Society". ONLY IN WAVE 6 data files: -11 means "variable not available for non-IEMB samples" and -10 means "variable not available for IEMB sample".
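Before analysis, the reserved codes are usually converted to a missing value so that they do not distort means or counts. The sketch below is one way to do this in plain Python. The names are hypothetical and the short labels paraphrase the descriptions above.

```python
# Reserved missing-value codes, paraphrased from the list above.
MISSING_CODES = {
    -1: "don't know",
    -2: "refusal",
    -7: "proxy interview",
    -8: "not eligible for question",
    -9: "other missing",
    -10: "not available for IEMB sample (Wave 6)",
    -11: "not available for non-IEMB samples (Wave 6)",
    -20: "no data from BHPS (cross-wave file)",
    -21: "no data from Understanding Society (cross-wave file)",
}


def clean(value):
    """Return None for a reserved missing code, otherwise the value itself."""
    return None if value in MISSING_CODES else value


raw = [34, -1, 52, -8, -3]
print([clean(v) for v in raw])
```

Only the reserved codes are treated as missing, so other negative numbers, such as the -3 in the toy list, are kept as actual values.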