
Missing values

For various reasons survey responses may not have a valid code or value.

The missing value codes assigned to these data are described below. All missing value codes are negative and are never used as valid responses. We recommend that users carefully read the questionnaires and compare missing value distributions across waves before using the substantive information contained in them.

| Value | Description |
| --- | --- |
| -1 | “Don’t know” – When the respondent does not know the answer. |
| -2 | “Refused” – When the respondent refuses to answer a question. |
| -7 | “Proxy” – Sometimes when a person cannot participate in the interview, someone else in the household (generally their spouse or partner or adult children) answers questions on their behalf, that is, by proxy. The proxy questionnaire is much shorter and asks only for factual information. So, if a question was not included in the proxy questionnaire and the person gave a proxy interview, this variable will be missing for them. In such cases the variable will have a value of -7. |
| -8 | “Valid skip” – This information is missing because the person was never asked this question as they were not eligible for it. E.g., someone who is not in paid employment is not asked questions about their pay. |
| -9 | “Missing by error or implausible”. |
| -10 | “Not available for the IEMBS” – Some questions were only asked in the W6 questionnaire for non-IEMBS and so individuals in IEMBS will have missing information for these variables. |
| -11 | “Only available for the IEMBS” – Some questions were only asked in the W6 questionnaire for IEMBS and so individuals in non-IEMBS samples will have missing information for these variables. |
| -20 | “No data from the BHPS W1-18” – This code is only used for variables in the xwavedat file which is harmonised across BHPS and UKHLS. If a variable was only asked in the UKHLS then there will be no data from BHPS W1-18 and hence it will be missing for those not interviewed during UKHLS. |
| -21 | “No data from the UKHLS” – This code is only used for variables in the xwavedat file which is harmonised across BHPS and UKHLS. If a variable was only asked in the BHPS then there will be no data from UKHLS and hence it will be missing for those not interviewed during BHPS W1-18. |
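As an illustration of how these codes might be handled before analysis, the sketch below recodes them to `None` and tabulates the reasons for missingness by wave, in line with the recommendation to compare missing value distributions across waves. It is a minimal example using only the Python standard library. The function names, the wave labels and the response values are hypothetical and are not taken from the survey files.

```python
from collections import Counter

# Missing value codes and labels, copied from the table above.
MISSING_CODES = {
    -1: "Don't know",
    -2: "Refused",
    -7: "Proxy",
    -8: "Valid skip",
    -9: "Missing by error or implausible",
    -10: "Not available for the IEMBS",
    -11: "Only available for the IEMBS",
    -20: "No data from the BHPS W1-18",
    -21: "No data from the UKHLS",
}


def recode_missing(values):
    """Replace dedicated missing value codes with None; keep everything else."""
    return [None if v in MISSING_CODES else v for v in values]


def tabulate_missing(values):
    """Count how often each missing value code occurs, keyed by its label."""
    return Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)


# Hypothetical responses to one question in two waves (not real survey data).
responses_by_wave = {
    "wave_a": [3, -8, 1, -1, -8, 2, -9],
    "wave_b": [2, 2, -2, -7, 1, 3, -8],
}

for wave, responses in responses_by_wave.items():
    print(wave, recode_missing(responses), dict(tabulate_missing(responses)))
```

Keeping the reason for missingness in a separate tabulation, rather than discarding it when recoding, makes it easier to spot a question whose routing or eligibility changed between waves.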

Note that the default missing value code for post-field derived variables tends to be “missing or wild”. This also applies to most variables on the xwavedat file. Missing value codes on the youth self-completion questionnaire also tend to be less accurate because the instrument was administered as a paper-and-pencil questionnaire, so it is not clear whether a respondent refused to answer, did not know the answer or simply missed the question. Respondents may also not have followed the question routing correctly.

Tips for analysts:

Income variables can be negative because self-employed respondents can report losses. Missing values are always assigned one of the dedicated negative codes listed above, and these codes never represent an actual reported income. A value of 0 means the income was 0, and any negative value other than a dedicated missing value code is a genuine negative amount. All individuals with negative net labour income are self-employed. Note: users can identify self-employed respondents using the variable w_jbsemp.
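The distinction between dedicated missing value codes and genuine negative incomes can be sketched as follows. This is a hedged illustration rather than an official recipe. The function name `classify_income` and the example amounts are hypothetical, and the set of codes simply repeats the values from the table above.

```python
# Dedicated missing value codes from the table above.
MISSING_INCOME_CODES = {-1, -2, -7, -8, -9, -10, -11, -20, -21}


def classify_income(value):
    """Label an income value as missing, a reported loss, zero or positive."""
    if value in MISSING_INCOME_CODES:
        return "missing"   # dedicated code, never a real income amount
    if value < 0:
        return "loss"      # genuine negative income, e.g. a self-employment loss
    if value == 0:
        return "zero"
    return "positive"


# Hypothetical net labour income values (not real survey data).
incomes = [1520.0, -9, 0, -350.25, -8]
print([classify_income(v) for v in incomes])
```

A blanket rule such as "treat every negative value as missing" would discard the reported losses, so the check against the dedicated codes comes first.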
