Some of the known issues relate to problems in implementing the experiments.
Cross-wave issues
In all waves, the benefit income data has not been edited for outliers.
Wave 1
In Waves 1 and 2, we asked participants for consent to link administrative records to survey data. We will not be linking the administrative records because some of the consent forms have been lost. In this data release we are including Wave 1 consent variables. Wave 2 consent variables were restructured to improve clarity.
Wave 2
In Wave 2, a variable for w_ivtrans (translator used) was not collected. However, there is a related variable available in Waves 1-4, w_ivaffct22 (in what way was the respondent influenced: Other helped in translation, reading showcards, and other survey tasks).
Wave 3
In Wave 3, the Showcard experiment required interviewers to use showcards with some participants but not others. There are doubts about whether interviewers correctly followed the instructions about which sample members should be shown showcards. This could introduce errors, and there is no way to check whether or not a respondent actually saw the showcards.
In Wave 3, some respondents were incorrectly asked the experimental IP2 satisfaction questions in addition to the IP3 questions in the satisfaction experiment. This happened for respondents with values 7, 8, 9, or 10 on the IP2 treatment indicator b_ff_lifesatw2. Their responses to the IP3 questions are potentially affected by having answered similar questions earlier in the interview. The questions that should not have been asked are the c_lfsat variables ending in _g to _j; the c_lfsat variables ending in _a to _f are correct.
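For analysts who want to guard against this conditioning effect, a minimal sketch of the relevant filtering is below. The record layout (a dict per respondent) is purely illustrative; only the variable names and treatment values come from the description above.

```python
# Values of the IP2 treatment indicator b_ff_lifesatw2 for which the IP2
# satisfaction questions were asked in error (from the documentation above).
CONTAMINATED_TREATMENTS = {7, 8, 9, 10}

def possibly_primed(record: dict) -> bool:
    """True when the respondent was incorrectly asked the IP2 satisfaction
    questions and so may have been primed before the IP3 items."""
    return record.get("b_ff_lifesatw2") in CONTAMINATED_TREATMENTS

def erroneous_items(record: dict) -> list:
    """Names of the c_lfsat items (suffixes _g to _j) that should not have
    been asked of a primed respondent; empty for unaffected cases."""
    if not possibly_primed(record):
        return []
    return [f"c_lfsat_{s}" for s in "ghij"]
```

An analyst might use `possibly_primed` to add a sensitivity flag, or `erroneous_items` to drop the erroneously asked items from a case.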
Variable c_conddateh, which records the strategies used to recall the date a health condition began, is documented as a “check all that apply” variable but was implemented as “select one”. Similarly, variable c_pldateh, which records the strategies used to recall the date of a move, is documented as a “check all that apply” variable but was also implemented as “select one”.
Variables related to nssec for the current and last job in Wave 4 (previously not included) are in the current release. These include the 3-, 5- and 8-category classifications.
A variable for highest qualification is not released because there has been a change in the response categories for educational and vocational qualifications.
In the Wave 3 Annual Events questions about employment, there are inconsistencies. For the first job, the question on the type of employment (nxtjbes) is less detailed than the corresponding question in the loop for any additional jobs (nextjob). nxtjbes only asks whether the respondent was employed or self-employed, whereas nextjob asks whether they were doing a different job for the same employer, working for a different employer, or working as self-employed. This only affects the first job reported in the Annual Events.
There are inconsistencies in variable names and variable labels in the employment histories between IP2 and IP3/IP4, because of changes in the way the histories are collected. From Wave 3 the loop through jobs starts at the second employment spell, whereas in IP2 the loop begins at the first spell. As a result, the variable names are slightly inconsistent between IP2 and IP3/IP4. At IP3, the variable nxtst is supposedly equivalent to the variable nextstat1 at IP2, i.e. the first employment spell. However, nextstat1 appears to have been incorrectly labelled as this first spell: it is in fact the second employment spell, the one after nxtst.
Wave 5
Errors in the Wave 5 questionnaire
The grid, household questionnaire and individual questionnaires were all programmed as separate web instruments, whereas the CAPI questionnaire was programmed as one combined instrument. In previous waves, the feed-forward data sat within the household grid, and any text fills or routing in the household or individual questionnaires were programmed via a reference to the household grid data. In IP5, because the web instruments were programmed separately, the feed-forward data needed to be copied into these instruments, so that it could be referenced within the household or individual instrument. Each feed-forward variable was copied individually (using code). There were mistakes in the code copying feed-forward data into the household and individual questionnaires. For subsequent waves, the whole feed-forward is copied as a block, to ensure that all feed-forward variables are copied correctly.
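The difference between the two copying approaches can be sketched as follows. This is an illustrative contrast in Python, not the survey scripting language actually used, and the field values are invented:

```python
# Feed-forward data held in the household grid (values are illustrative).
grid = {"e_ff_rentwc": 2, "e_ff_metersw5": 1, "e_ff_diw5": 3}

# Per-variable copy: any field left off the explicit list is silently
# blank downstream. This is the kind of omission that occurred in IP5.
individual = {"e_ff_rentwc": grid["e_ff_rentwc"]}  # two fields missed

# Block copy (the approach adopted for subsequent waves): the whole
# feed-forward record is carried over in one step, so nothing is missed.
individual_blockcopy = dict(grid)
```

The block copy removes the opportunity for a field to be forgotten, at the cost of copying fields an instrument may not need.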
Feed-forward variables determine which experimental questions are asked in an interview, so the copying errors corrupted some of the experiments. This section describes their effects.
Household questionnaire. At the household level, three feed-forward variables (e_ff_rentwc, e_ff_metersw5 and e_ff_diw5) were improperly copied. The related variables about gas or electric meter readings were not asked and are not released in the data.
Additionally, the e_ff_diw5 variable did not have the correctly assigned experimental values. This meant that the dependent interviewing (DI) experimental variables in the household questionnaire were confounded, in that some DI questions were asked, but not the ones that should have been according to the experimental design. There were four sets of questions affected by this confounding: hsrooms/hsbeds (number of bedrooms and other rooms at the address); hsownd (tenure); xpmg (monthly mortgage payments) and rent/rentwc (amount and frequency of rent). Some variables were combined to facilitate analysis; others were not released (see summary below).
The affected variables in the household questionnaire were:
**Summary of household variables affected by errors**

| Variable | Impact |
|---|---|
| E_FF_METERSW5 | Blank due to programming error |
| E_FF_DIW5 | Incorrect values due to programming error |
| E_HSROOMCHK | Combined version released |
| E_HSOWNDCHK | Combined version released |
| XPMG_A | Asked, but wrong experimental version, not released |
| XPMG_B | Asked, but wrong experimental version, not released |
| XPMG_C | Asked, but wrong experimental version, not released |
| XPMG_D | Asked, but wrong experimental version, not released |
| FF_RENTWC | Blank due to programming error |
| RENTCHK_A | Asked, but wrong experimental version, not released |
| RENTCHK_B | Asked, but wrong experimental version, not released |
| RENTCHK_C | Asked, but wrong experimental version, not released |
| RENTCHK_D | Asked, but wrong experimental version, not released |
| GASUSE | Not asked due to programming error in FF_MetersW5 |
| GASUSE_CAWI | Not released |
| GASMETER | Asked, but wrong experimental version, not released |
| GASEST | Asked, but wrong experimental version, not released |
| ELECUSE | Asked, but wrong experimental version, not released |
| ELECMETER | Asked, but wrong experimental version, not released |
| ELECEST | Asked, but wrong experimental version, not released |
Errors in the Wave 5 individual questionnaire
An error in the code copying three feed-forward variables in the employment modules of the individual questionnaire meant that they were blank: ff_jbmngr, ff_jbsize and ff_jbterm1. This affected multiple variables, which were not released (see the summary below).
Due to an error in the code, none of the e_ff_bentype01 to e_ff_bentype37 variables were copied into the individual questionnaire. This affected the nfh01 to nfh37 variables about benefit income. It only affected respondents who did not mention a benefit that they had reported receiving the previous year: such respondents did not receive the additional prompt question reminding them of last year’s answer. We estimate that around three-quarters of respondents were not eligible to be asked any additional prompt questions in the first place; of those who were eligible, around 70 per cent only missed out on one such question, 20 per cent missed out on two, and 10 per cent missed out on three or more.
The e_ff_casiw5 variable was not copied into the individual questionnaire at the start of fieldwork. This variable controls the mode of the self-completion questionnaire. The problem was resolved part way through the fieldwork period (after June 11). We created a variable, e_scflagip5 (on e_indresp_ip), to show the mode-of-completion status for the self-completion questionnaire in Wave 5. As a result of the error, around 50 per cent of those eligible to receive the questions in face-to-face CASI mode were not asked the experimental questions (313 people, based on unedited data). Note that this does not confound the experiment (no respondents were asked questions in the wrong mode), but the reduced numbers do reduce its power to detect mode differences.
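An analyst restricting the mode experiment to respondents who actually received the CASI questions might filter on e_scflagip5. The sketch below is hedged: the code value and record layout are assumptions for illustration only, and the actual codes of e_scflagip5 should be taken from the data documentation.

```python
# Assumed code meaning 'CASI questions administered'; check the actual
# value labels of e_scflagip5 before using this in analysis.
ASKED_CASI = 1

def casi_sample(indresp: list) -> list:
    """Respondents whose self-completion was actually administered in
    face-to-face CASI mode, per the (assumed) flag value."""
    return [r for r in indresp if r.get("e_scflagip5") == ASKED_CASI]

# Illustrative records, not real data.
respondents = [{"pid": 1, "e_scflagip5": 1}, {"pid": 2, "e_scflagip5": 0}]
```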
The affected variables in the Wave 5 individual questionnaire were:
**Summary of individual level variables affected by errors in feed-forward variables**

| Variable | Impact |
|---|---|
| FF_JBMNGR | Blank due to programming error |
| JBMNGRCHK | Not asked because FF_JBMNGR was blank |
| FF_JBSIZE | Blank due to programming error |
| JBSIZECHK_A to JBSIZECHK_D | Not asked, not released |
| FF_JBTERM1 | Blank due to programming error |
| JBTERM1_A to JBTERM1_D | Not asked, not released |
| FF_BENTYPE01 to FF_BENTYPE37 | Blank due to programming error |
| NFH01 to NFH37 | Not asked because FF_BENTYPE01 to FF_BENTYPE37 were blank |
| FF_CAWIW5 | Not released |
| SF12 Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to a programming error that skipped part of the self-completion questions |
| GHQ Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to a programming error that skipped part of the self-completion questions |
| Parental Relationships Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to a programming error that skipped part of the self-completion questions |
| Alcohol Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to a programming error that skipped part of the self-completion questions |
| Personality Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to a programming error that skipped part of the self-completion questions |
Wave 6
In Wave 6, four households in the £10 incentive treatment group became aware of the £30 treatment. To compensate, they were offered an extra £20. These households are identified by the variable f_incentcomp on the record f_hhsamp.
Wave 7
In Wave 7, there are a few households with missing values for the experimental treatment allocations in the hhsamp file. The initial IP7 sample used to generate the experimental allocation variables was based on the latest IP6 data delivery available at that time. Later data deliveries included some additional households. Most of these extra households were untraced, with no addresses to issue to the field. For the few that did have an address, we generated randomisations for the experimental variables separately. For households with missing address information, the experimental variables remain missing.
Wave 11
In Wave 11, the variables related to height and weight have been removed from all waves in the w_youth_ip record, due to measurement problems with these variables.
The Wave 11 individual interview question timings file (k_indint_timings.csv) contained two errors, which have been corrected in the version released with IP14. First, when the timings file was created, respondents sharing the same Serial ID (a household-level fieldwork identifier) had their values overwritten with values from another interview with the same Serial ID (e.g. if both answered ConsentQ3, both interviews ended up with the same value for ConsentQ3 when the values should have differed). If the respondents were routed to different versions of a question, e.g. ConsentQ3 and ConsentQ4, one respondent would have timings for both questions. This has been corrected by combining the Serial ID with other identifiers to uniquely identify individual cases; as a result, the updated timings data has changed across all the timings variables and the derived summary variables. Second, some observations for questions in modules that the respondent was not routed into contained the value “12/30/1899 0:00:00” instead of being blank. This has also been corrected.
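The two corrections can be sketched together as follows. This is an illustrative Python reconstruction, not the actual processing code: the field names and record layout are assumptions, while the composite-key idea and the sentinel value come from the description above.

```python
# Sentinel found in timings cells for modules the respondent was not
# routed into; the fix replaces it with a blank (here, drops the cell).
SENTINEL = "12/30/1899 0:00:00"

def merge_timings(rows: list) -> dict:
    """Collect timings per unique interview, keyed on the household
    Serial ID plus a within-household person identifier, so that two
    interviews in the same household no longer overwrite each other."""
    merged = {}
    for row in rows:
        key = (row["serial_id"], row["person_id"])  # composite key
        clean = {q: t for q, t in row["timings"].items() if t != SENTINEL}
        merged.setdefault(key, {}).update(clean)
    return merged

# Two interviews sharing a Serial ID, routed to different questions.
rows = [
    {"serial_id": 101, "person_id": 1,
     "timings": {"ConsentQ3": 12.0, "UnroutedQ": SENTINEL}},
    {"serial_id": 101, "person_id": 2, "timings": {"ConsentQ4": 9.5}},
]
merged = merge_timings(rows)
```

With the composite key, each interview keeps only the timings for the questions it was actually routed to.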
Wave 13
In Wave 13, there was an error in the sample file. The variable ff_eventtrigw12 was erroneously set to missing for all sample members. As a result the question “eventdebrief” that should have been asked for all sample members invited to the event-triggered data collection during 2020/2021 (ff_eventtrigw12=1) was not asked of anyone. In addition, in the introduction to the Annual Events History (“calintro”), the text fill “Please tell us about all changes, even if you have already reported them in the monthly questions about life events that we have been trialling. The reason for asking you again is that in this interview we are interested in different aspects of any changes you have experienced.”, which should have been shown to all respondents invited to the event-triggered data collection, was not displayed to anyone.
Wave 14
In Wave 14, the variable father (“Fathered children since last interview”) is not populated for about 300 cases where it should be. This error occurred because an age filter was left active from a prior question, so respondents aged over 64 were not asked.
Wave 15
In Wave 15, there is a household that participated in the wave but had no allocations for the experimental conditions (ff_ variables in the hhsamp file). This was a late re-joiner household: it was lost at Wave 13 and re-joined the panel for Wave 15, but only after the allocations had been made.
Wave 16
In Wave 16, there are three households that participated in the wave but had no allocations for the experimental conditions (ff_ variables in the hhsamp file). These were late re-joiner households: they were lost previously and re-joined the panel for Wave 16, but only after the allocations had been made.
In Wave 16, no respondents were routed into the proxy questionnaire module that asked about respondents who had moved into a care home (module “carehomeproxy” in the IP16 questionnaire). The corresponding variables were therefore dropped from the file p_indresp_ip. Similarly, in the household grid, no respondents were routed into the questions about household members who were reported as having moved into a care home at the previous wave. The variables chomestill – chmrespidp were therefore dropped from the file p_indall_ip.
Wave 17
Lost household responses
A scripting error related to face-to-face fieldwork led to some web data being unintentionally overwritten. This occurred where the household grid had been completed online, but individual interviews were incomplete or missing. If such cases were later accessed by a face-to-face interviewer, the data already provided via the web could be lost, as it was overwritten by (blank) face-to-face data.
The issue was not apparent during fieldwork and only became clear during data processing, when inconsistencies were found in cases thought to have complete household data. This problem was particularly relevant for households that were not fully completed online (for example, where individual interviews remained outstanding). Fully complete households were automatically ‘locked’ by the script, preventing overwriting, but this was not done for partially completed households as access to the household record was necessary for interviewers to complete follow-up work face-to-face.
This issue affected 156 households (5.3% of the issued sample). All were marked in the data returned by the fieldwork agency with a ‘data lost’ outcome.
These cases are identified in the file q_hhsamp_ip via their value of the household level outcome variable, q_ivfho. Most (147 cases) have q_ivfho == 17 ‘individual interviews only (no grid)’; the remaining 9 cases have been allocated q_ivfho == 39 ‘lost on laptop’ as they have neither a household grid nor an adult interview.
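Selecting the affected cases can be sketched as below. The outcome codes come from the description above; the record structure is illustrative only.

```python
# Outcome codes on q_ivfho identifying the lost-data cases (from the text).
LOST_GRID_ONLY = 17   # 'individual interviews only (no grid)'
LOST_ON_LAPTOP = 39   # 'lost on laptop'

def lost_cases(hhsamp: list) -> list:
    """Households in q_hhsamp_ip whose web data was overwritten."""
    return [hh for hh in hhsamp
            if hh.get("q_ivfho") in (LOST_GRID_ONLY, LOST_ON_LAPTOP)]

# Illustrative records, not real data.
sample = [{"hidp": 1, "q_ivfho": 17},
          {"hidp": 2, "q_ivfho": 1},
          {"hidp": 3, "q_ivfho": 39}]
```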
We have been able to partially recreate data for some of these cases using weekly data flows received during fieldwork. To facilitate prompt delivery of sensors to households consenting to have in-home sensors (see 9.49 Indoor residential environment: consent for in-home sensor) and prompt data collection from smart meters (see 9.50 Domestic energy use: consent for smart meter data linkage), the fieldwork agency sent us weekly data on the outcomes of the relevant consent questions. As we retained these weekly data files, we have been able to reassemble sparse records providing the consent responses for the households whose main household data was lost.
Consent question envsenscons2 asked in CATI mode
At Wave 17, a consent question was inadvertently asked of some telephone respondents. We generally do not ask questions seeking consent for data linkage or to complete additional tasks in the CATI (telephone) mode.
The IP17 household questionnaire included two variants of a consent question to place environmental sensors in people’s homes. Households were pre-allocated via variable ff_esensinfow17 to receive either version 1 (envsenscons1) or version 2 (envsenscons2) of the question.
Households responding in the CATI mode and allocated ff_esensinfow17=1 were (correctly) not asked envsenscons1.
However, households responding in the CATI mode and allocated ff_esensinfow17=2 were asked envsenscons2 when the question should have been skipped based on the mode of response.
Inconsistency between mode of allocation variable and mode of completion
At Wave 17, different allocation variables were computed within the questionnaire script (i.e., at the point the respondent was completing the questionnaire) depending on whether the respondent was completing online or face-to-face. Face-to-face respondents were allocated a value of congrpftf while web respondents were allocated a value of congrpweb. (For further details, see 9.46 Consent decision process.)
There are 12 respondents who have values for congrpweb, indicating they completed online, but also have a mode of completion (indmode) value describing them as having completed face-to-face. These cases are believed to be respondents who switched mode at some point during their completion of the survey. They would have completed part of the survey online (receiving an allocation of congrpweb); having not completed the full survey, they were then contacted by an interviewer and finished the survey face-to-face. The apparent data inconsistency therefore arises because the two variables are indicators of different things: the presence of a value for congrpweb indicates the respondent completed the survey online at least up to the point where that variable was allocated, while the face-to-face value in indmode indicates they finished the survey in the face-to-face mode.
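The consistency check described above can be sketched as follows. A respondent with a congrpweb allocation but a face-to-face indmode is a mode switcher rather than a data error. The indmode code and record layout used here are assumptions for illustration.

```python
# Assumed indmode code for face-to-face completion; check the actual
# value labels of indmode before using this in analysis.
FACE_TO_FACE = 1

def mode_switchers(indresp: list) -> list:
    """Respondents allocated congrpweb (started online) who finished
    face-to-face, per the (assumed) indmode code."""
    return [r for r in indresp
            if r.get("congrpweb") is not None
            and r.get("indmode") == FACE_TO_FACE]

# Illustrative records, not real data.
respondents = [
    {"pid": 1, "congrpweb": 2, "indmode": FACE_TO_FACE},  # mode switcher
    {"pid": 2, "congrpweb": 3, "indmode": 0},             # web throughout
    {"pid": 3, "congrpweb": None, "indmode": FACE_TO_FACE},
]
```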



