How do interviews become data?
Dr Jon Burton, John Payne
How do interviews with participants become the kind of data our researchers need?
Understanding Society wouldn’t be possible without our participants, who answer our questions each year about work, income, family life, health and more. But how do survey answers from thousands of individual people, chronicling tens of thousands of life decisions, opinions and thoughts become data for researchers to use? Well, we’ve put together a quick guide…
Whether our participants have a face-to-face interview or take the survey online, their answers are mostly gathered as numbers. So, when we ask if they agree or disagree with a statement, for example, their answer will be a number between 1 for ‘strongly agree’ and 5 for ‘strongly disagree’.
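For readers who like to see this in code, here is a minimal sketch of how an answer might be turned into a number. Only the codes 1 and 5 come from the description above; the labels for 2, 3 and 4, and the names `AGREEMENT_CODES` and `encode_answer`, are assumptions made for illustration, not the Study's actual code frame.

```python
# Hypothetical code frame: only codes 1 and 5 are taken from the text;
# the labels for 2, 3 and 4 are assumed for illustration.
AGREEMENT_CODES = {
    1: "strongly agree",
    2: "agree",
    3: "neither agree nor disagree",
    4: "disagree",
    5: "strongly disagree",
}

# Reverse lookup: from the answer label to the number that gets stored.
LABEL_TO_CODE = {label: code for code, label in AGREEMENT_CODES.items()}


def encode_answer(label):
    """Return the numeric code for an answer label, ignoring case and spacing."""
    return LABEL_TO_CODE[label.strip().lower()]


answers = ["Strongly agree", "Disagree", "Strongly disagree"]
stored = [encode_answer(a) for a in answers]
print(stored)  # [1, 4, 5]
```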
During a face-to-face interview, all the information is stored on the interviewer’s encrypted laptop until the end of the day, when the interviewer goes home and syncs the laptop with their fieldwork agency’s secure server. (We have two fieldwork agencies: Kantar, a research consultancy, and NatCen Social Research.) If the participant is doing the survey online, it’s more direct: every screen they complete has a ‘next’ button at the bottom, and when they click it, the data from that screen are saved straight to the secure server.
People aged 10-15 fill in a paper questionnaire, which goes back to the fieldwork agency, where someone goes through each questionnaire and enters their answers.
The best way to envisage all these data is as a table or spreadsheet, with each row being a person and each column a question, filled in with numbers.
We get the data through a secure, encrypted, online portal, to which very few people have access. One of the first things that happens is that all the information which can identify the participants is stripped out – names, addresses, postcodes, and answers to open-ended questions such as “What has happened to you in the last 12 months?”, where they might give people’s names. These are securely stored in a separate database.
(A small team has access to that information, so we can write to participants, and tell the fieldwork agency if someone contacts us needing to change the date or time of their interview.)
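The separation described above can be sketched roughly as follows. The field names (`name`, `address`, `postcode`, `open_text`, `person_id`) and the function `split_record` are hypothetical, and the real process will be more involved; the point is only that identifying details and research answers end up in different places, linked by an identifier.

```python
# Assumed field names for the details that could identify a participant.
IDENTIFYING_FIELDS = {"name", "address", "postcode", "open_text"}


def split_record(raw):
    """Split one raw record into a research part and a contact part.

    Both parts keep the (hypothetical) person_id so that the small team
    with access to contact details can still link them when needed.
    """
    research = {k: v for k, v in raw.items() if k not in IDENTIFYING_FIELDS}
    contact = {k: v for k, v in raw.items() if k in IDENTIFYING_FIELDS}
    contact["person_id"] = raw["person_id"]
    return research, contact


raw = {
    "person_id": 101,
    "name": "A. Example",
    "address": "1 Sample Street",
    "postcode": "ZZ1 1ZZ",
    "open_text": "My sister Sam moved in.",
    "agree_q1": 2,
}
research, contact = split_record(raw)
print(research)  # {'person_id': 101, 'agree_q1': 2}
```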
Before anyone can use the data for research, there’s a huge amount of work to do. The fieldwork agency, for example, checks that, if the coding says someone has given an interview, we actually have an interview for that person. In a household panel survey like Understanding Society, we can see how complicated people’s lives can be. People can move in and out of households – so if someone was in the study as a child, and is now an adult moving back into their parents’ house, the fieldwork agency will establish who they are. They can then identify that person as a ‘rejoiner’, and connect the information we’ve got now with what we’ve collected in the past – rather than treating them as someone new, and asking them a lot of questions we already know the answers to.
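The first check mentioned above, that everyone coded as interviewed really has an interview, might look something like this. The outcome label `"interviewed"` and the function `missing_interviews` are assumptions for the sake of the example, not the agencies' actual codes or tools.

```python
def missing_interviews(outcome_codes, interview_data):
    """Return ids coded as interviewed for whom no interview data exist."""
    return sorted(
        pid
        for pid, outcome in outcome_codes.items()
        if outcome == "interviewed" and pid not in interview_data
    )


# Hypothetical fieldwork outcomes and the interviews actually received.
outcome_codes = {101: "interviewed", 102: "refused", 103: "interviewed"}
interview_data = {101: {"agree_q1": 2}}

print(missing_interviews(outcome_codes, interview_data))  # [103]
```

Any id that comes back from a check like this would need following up before the data go any further.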
At Understanding Society, we check that people have been ‘filtered’ into answering the questions that are relevant for them. In the old days of paper questionnaires, you might have a question with a yes or no answer, and then: if yes, go to question X, and if no, go to question Y. On a computer-assisted survey like ours, we have coding built into the survey to filter people into answering the correct questions.
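As an illustration of a filter check, suppose a question about job title should only be asked of people who say they have a job. The question names `has_job` and `job_title` and the function `routing_errors` are invented for this sketch; the real survey has far more routes than this.

```python
def routing_errors(rows):
    """List records that break the assumed has_job -> job_title filter."""
    errors = []
    for row in rows:
        answered = row.get("job_title") is not None
        if row["has_job"] == "no" and answered:
            # Someone was asked a question they should have skipped.
            errors.append((row["person_id"], "job_title answered but has_job is no"))
        elif row["has_job"] == "yes" and not answered:
            # Someone skipped a question they should have been asked.
            errors.append((row["person_id"], "job_title missing but has_job is yes"))
    return errors


rows = [
    {"person_id": 101, "has_job": "yes", "job_title": "nurse"},
    {"person_id": 102, "has_job": "no", "job_title": None},
    {"person_id": 103, "has_job": "no", "job_title": "driver"},
    {"person_id": 104, "has_job": "yes", "job_title": None},
]
errors = routing_errors(rows)
for person_id, problem in errors:
    print(person_id, problem)
```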
Turning the data round
One of our immediate tasks is to get some of the data ready to send back to the fieldwork agency. Each household gets interviewed once a year, so in the first quarter (January-March) we send the fieldwork agency the names and addresses of one batch of people in the Study. They will interview them until August or September, and we get the data back in September-October. We turn it round and get it back to the agency in November or December, so they can start interviewing the same households again in the first quarter of the next year.
We give the fieldwork agencies the information they need to speed up the interview and make sure people are asked the right questions – names, addresses, dates of birth, how many children they’ve got, and so on. So, rather than asking people lots of questions each year about their job – which might not have changed – we remind them what they told us last year, and ask if they are still doing that job. This means we only have to ask detailed questions about the job if it is new. It reduces the burden on our participants.
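The job example above can be sketched as a small decision: if we already hold last year's job and the participant confirms it is unchanged, no detailed questions are needed. The question wording and the names `DETAILED_JOB_QUESTIONS` and `job_questions` are hypothetical.

```python
# Assumed wording for the detailed questions; not the Study's real text.
DETAILED_JOB_QUESTIONS = [
    "What is your job title?",
    "What does your employer do?",
    "How many hours do you usually work?",
]


def job_questions(last_year_job, same_job_confirmed):
    """Return the detailed job questions that still need to be asked.

    last_year_job is the answer carried over from the previous interview
    (or None if there is none); same_job_confirmed is the participant's
    reply when reminded of it.
    """
    if last_year_job is not None and same_job_confirmed:
        return []  # nothing has changed, so skip the detail
    return list(DETAILED_JOB_QUESTIONS)


print(len(job_questions("teacher", True)))   # 0
print(len(job_questions("teacher", False)))  # 3
print(len(job_questions(None, False)))       # 3
```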
Structuring the data
Before the data are deposited with the UK Data Service, we create a consistent structure. The Study is longitudinal, so it has to have the same structure every year, so researchers can compare different aspects of people’s lives across time.
When a researcher wants to look at people with a specific level of education, they don’t want to go through every year to find all the qualifications someone has, so we have a ‘derived variable’: highest educational qualification. Essentially, it’s a way of tagging the data so a researcher can look for everyone with secondary education, or everyone with a degree, for example. Someone’s ethnic background is also a variable, so we ask that question the first time we interview someone and tag their record, and researchers can look at specific groups in society if they want to.
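A derived variable like this can be thought of as a summary of everything a person has told us over the years. Here is a minimal sketch, assuming just three qualification levels and a made-up ranking (`QUALIFICATION_RANK`); the Study's real categories are more detailed.

```python
# Assumed ranking of qualification levels, lowest to highest.
QUALIFICATION_RANK = {"none": 0, "secondary": 1, "degree": 2}


def highest_qualification(reported):
    """Return the highest-ranked qualification a person has ever reported."""
    return max(reported, key=QUALIFICATION_RANK.__getitem__, default="none")


# Qualifications reported by one hypothetical person across several years.
print(highest_qualification(["secondary", "degree", "secondary"]))  # degree
print(highest_qualification([]))                                    # none
```

With the tag stored on each person's record, a researcher can select everyone with a degree without reading through every year of answers.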
Imputation and weighting
Another area of our work is imputation, where we work out information we haven’t collected directly. We don’t collect household income, for example, but we ask every adult in the house what their income is, and use that to impute household income.
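In its simplest form, the household income example is just a sum of what each adult reports. The sketch below assumes every adult has given an answer and uses invented field names (`household_id`, `income`); handling adults who did not answer is a harder problem that it does not attempt.

```python
from collections import defaultdict


def household_income(adults):
    """Add up the reported income of every adult in each household."""
    totals = defaultdict(float)
    for adult in adults:
        totals[adult["household_id"]] += adult["income"]
    return dict(totals)


# Hypothetical monthly incomes for three adults in two households.
adults = [
    {"household_id": "H1", "income": 1800.0},
    {"household_id": "H1", "income": 1200.0},
    {"household_id": "H2", "income": 2500.0},
]
print(household_income(adults))  # {'H1': 3000.0, 'H2': 2500.0}
```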
Weighting helps us to make sure the sample of households in the Study is representative of the UK population as a whole. A hypothetical example: if we know the population is about 50% men and 50% women, but men are slightly less likely to take part, we might end up with a sample that is 48% men and 52% women. So, we give each man slightly more ‘weight’, so that he counts for slightly more, and the ‘weighted sample’ is back to 50% men and 50% women.
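The arithmetic behind that hypothetical example can be worked through in a few lines. The sample size of 1,000 is made up, and this sketch also scales each woman down slightly so that the weighted total stays the same, which is one simple way of doing it rather than a description of the Study's actual weighting.

```python
# Target shares in the population, and a made-up sample of 1,000 people.
population_share = {"men": 0.50, "women": 0.50}
sample_counts = {"men": 480, "women": 520}

n = sum(sample_counts.values())

# Weight = share in the population / share in the sample.
weights = {
    group: population_share[group] / (sample_counts[group] / n)
    for group in sample_counts
}

# Each person counts for their weight instead of counting as exactly 1.
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
weighted_total = sum(weighted_counts.values())
weighted_share = {g: weighted_counts[g] / weighted_total for g in sample_counts}

print({g: round(w, 3) for g, w in weights.items()})
print({g: round(s, 2) for g, s in weighted_share.items()})
```

Each man ends up with a weight a little above 1 (0.50 divided by 0.48) and each woman a little below 1 (0.50 divided by 0.52), which brings the weighted shares back to 50% each.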
Data deposit – and research
The final stage before the dataset is deposited with the UK Data Service is to decide which variables researchers can see, because there are different security levels. As long as a researcher has an account with the UK Data Service, they can download data with an end user licence. Data which are more disclosive – that is, containing more detailed information – would go under a special licence. Researchers have to apply and explain why they want to use them – they might need slightly more geographical detail, for example. Finally, there are secure data, which they can only access in the UK Data Archive’s Secure Lab and the equivalent at ONS. There’s very strict vetting for that level.
Over to the researchers
Then, the data analysis begins – which is the stuff of textbooks and degree courses, rather than blog posts, but you can read about the results of it here, or on our Twitter feed.
Jonathan Burton is the Understanding Society Associate Director, Surveys, and is responsible for the management of the survey.
John Payne is Associate Director, Data, at Understanding Society, and is responsible for Understanding Society’s data management systems.