Certain respondents in a survey wave are assigned a weight of zero, effectively excluding them from cross-sectional (single-wave) analysis. This typically happens for respondents who do not meet specific criteria for representativeness in that wave.
Attrition occurs when cases are lost from a sample over time. There are a variety of reasons for this in longitudinal research, such as: unwillingness of subjects to continue to participate, difficulties in tracing original respondents for follow-up (due to change of address) and nonavailability (due to serious illness or death).
The inverse of the probability of selection for each unit in the sample. It accounts for the sampling design and ensures that each sampled unit represents the correct number of individuals in the population.
Broad sense: in social science often used to denote any measurement derived from the human body which might relate to health, including grip strength, waist circumference, lung function, etc. Narrow sense (as defined by the National Institutes of Health): “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention”.
Diastolic blood pressure (dBP) is the pressure in the blood vessels when the heart rests between beats, while systolic blood pressure (sBP) is the pressure in the blood vessels when the heart beats. ‘Blood pressure’ is reported as the sBP ‘over’ dBP (for example, 120 over 80). Both numbers are measured using either a manual or a digital blood pressure monitor (sphygmomanometer) and measured in millimetres of mercury (mmHg).
C-reactive protein (CRP) is an acute phase protein primarily produced in the liver which exhibits an elevated response in the body to injury (tissue damage, infectious and non-infectious diseases and trauma) and is otherwise scarcely present in the blood. CRP plays a variety of key roles in the immune system as a marker of inflammation. Concentration of CRP rises rapidly within a few hours after acute tissue injury or inflammation and falls rapidly afterwards. CRP levels are generally below 3 mg/L in healthy persons but can rise above 500 mg/L following an acute-phase stimulus.
Computer-Assisted Personal Interviewing (CAPI) refers to survey data collection by an in-person interviewer (i.e. face-to-face interviewing) who uses a computer to administer the questionnaire to the respondent and captures the answers onto the computer.
Computer-Assisted Self-Interviewing (CASI) is the respondent using a computer to complete the survey questionnaire without an interviewer administering it to the respondent.
Computer-Assisted Telephone Interviewing (CATI) is the interviewer using a computer to conduct the interview with the respondent by telephone.
Computer-Assisted Web Interviewing (CAWI) is a questionnaire that the respondent completes on a website, accessed through a link provided to them.
Cholesterol is found in body tissues and blood plasma. It is mostly produced by liver cells and serves as a precursor for the steroid hormones (such as testosterone and oestrogen) and vitamin D. Cholesterol is carried in the blood in lipoprotein particles, a complex of lipid and protein. The lipoproteins are classified according to their density and the most common are very-low-density lipoproteins (VLDL), low-density lipoproteins (LDL) and high-density lipoproteins (HDL). Total cholesterol is the combined amount of LDL and HDL cholesterol in blood.
A range of values, calculated from sample data, that is expected to contain the true value of a population parameter with a specified level of confidence. For example, suppose the average yearly income in a sample of 1000 individuals is £30,000. The true population value can be higher or lower. Based on the sample size and variability across individuals, we compute a 95% confidence interval of £28,000 to £32,000. This means that if we repeated the sampling process many times and computed an interval each time, about 95% of those intervals would contain the true population average.
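As a rough illustration of how such an interval can be computed, the Python sketch below uses a normal approximation for the mean of a simple random sample. The income figures and the function name mean_confidence_interval are invented for this example, and the calculation ignores weights, clustering and stratification, which a real survey analysis would need to take into account.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to a 95% interval. This assumes a simple random
    sample and ignores any complex survey design.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se

# Hypothetical yearly incomes (in pounds), not survey data.
incomes = [24000, 31000, 28500, 35000, 30000, 27500, 33000, 31000]
mean, lower, upper = mean_confidence_interval(incomes)
print(f"mean={mean:.0f}, 95% CI=({lower:.0f}, {upper:.0f})")
```

With so few cases a t-distribution multiplier would normally replace 1.96; the fixed value is kept here only to keep the sketch short.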
Coverage error is a bias in a statistic that occurs when the statistic is based on a sample drawn from the population of interest with some population elements being excluded, i.e., when the sampling frame is different from the target population or the population of interest.
Cross-sectional weights are statistical adjustment factors applied to survey data to ensure that the sample accurately represents the target population at a specific point in time. They correct for unequal selection probabilities, nonresponse bias and other sampling imperfections, allowing for unbiased population estimates.
Explains the variable names and values in the datafile.
The inverse of the probability of selection of each unit in the sample (the base weight), accounting for factors like stratification, clustering, or oversampling to better reflect the intended sample design.
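The inverse-probability idea can be sketched in a few lines of Python. The stratum names and selection probabilities below are hypothetical, chosen only to show how an oversampled group ends up with a smaller weight; they do not describe any actual sample design.

```python
def design_weight(selection_probability):
    """Design weight = 1 / probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability

# Hypothetical design: units in one stratum are four times as likely
# to be selected as units in the other.
selection_probabilities = {
    "general_stratum": 1 / 2000,
    "oversampled_stratum": 1 / 500,
}
design_weights = {
    name: design_weight(p) for name, p in selection_probabilities.items()
}
for name, weight in design_weights.items():
    print(f"{name}: each sampled unit represents about {weight:.0f} units")
```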
Dehydroepiandrosterone (DHEA) and its sulfate form (DHEA-S) are the most common steroid hormones in the body and have been identified as markers of stress. DHEA and DHEA-S are produced by the adrenal gland in response to the release of adrenocorticotropic hormone (ACTH).
Some groups within a sample (e.g. particular age groups or levels of educational attainment) may be more likely to respond to the survey than others, and this can lead to bias in the results.
Deoxyribonucleic acid. The chemical in our cells which comprises our genome.
EUL refers to End User Licence, which is a type of licence for accessing the data. The data are available to registered users via the UK Data Service.
Enumeration weights adjust for undercoverage or unequal probabilities of selection that arise during the household enumeration process. The household enumeration grid identifies all household members – this determines who is eligible for an interview – enumeration weights account for differences in the likelihood of households and individuals being listed.
The study of mechanisms that affect gene expression by altering the DNA in a way that does not change its code. This term is often used to indicate ‘epigenomics’. Encompasses several mechanisms, one of which is DNA methylation.
When an interview is conducted in person by an interviewer, usually at the respondent's home.
Ferritin is a protein which stores iron and prevents the potential toxic side-effects iron can have on other proteins, DNA and lipids. It is a commonly used biomarker for iron deficiency/anaemia, where there are low levels of red blood cells or haemoglobin in the blood (iron is a key part of red blood cells), limiting the ability to carry oxygen to the body’s tissues. Serum ferritin has also been identified as a marker of inflammation, although its causal role is less clear.
Fibrinogen is a soluble glycoprotein and an important component of blood clotting. Primarily produced by liver cells, it is converted into the insoluble protein fibrin during the clotting process. Because fibrinogen is a major plasma protein, a small elevation in fibrinogen level will have a significant impact on plasma viscosity, which can increase thrombotic risk (when blood clots block blood vessels). Fibrinogen is involved in several physiological processes including inflammation and atherogenesis (atherosclerotic plaque formation leading to heart disease) and other diseases.
Fieldwork encompasses the tasks that are undertaken to collect data for a survey.
Is an organisation which carries out the data collection for the survey.
Fieldwork documents include all materials used in the process of collecting the data.
It is the period during which interviews are conducted and data are collected for a particular survey.
A section of a DNA molecule which the cell can read and translate to produce a protein.
The study of how the DNA code relates to traits (genetically determined characteristics) and health conditions. This term often indicates approaches that focus on a narrow set of genes or markers, but is often used more broadly to include ‘genomics’.
A study in which a phenotype is tested for association with a large number of genetic markers (especially single nucleotide variants (SNVs)) across the genome, using linear or logistic regression. See the biological data glossary for the full definition of terms.
Glycated haemoglobin (HbA1c) is a common biomarker of diabetes and prediabetes, similar to fasting glucose. Unlike fasting glucose, HbA1c represents a measure of blood glucose level over the previous 2–3 months and is measured as the percent of haemoglobin (protein in red blood cells) that is glycated (has sugar attached) using a technique called HPLC (High-Performance Liquid Chromatography) that is standard in most laboratories/hospitals. HbA1c can also be reported in mmol/mol, although this is less common. Values of 6.5% and above indicate diabetes, with 6–6.4% indicating prediabetes.
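The cut-offs quoted for HbA1c can be expressed as a small Python helper. This is only a sketch for recoding a variable in analysis, not a diagnostic tool; the function name classify_hba1c and the label for values below 6% are assumptions made for this example.

```python
def classify_hba1c(percent):
    """Classify an HbA1c value (in %) using the cut-offs quoted above.

    6.5% and above -> "diabetes"; 6.0% up to (but not including) 6.5%
    -> "prediabetes"; anything lower gets a label assumed for this sketch.
    """
    if percent >= 6.5:
        return "diabetes"
    if percent >= 6.0:
        return "prediabetes"
    return "below prediabetes range"

for value in (5.4, 6.2, 6.5):
    print(value, classify_hba1c(value))
```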
Grip strength is a measure of overall muscle strength/function, which can predict disability, morbidity and mortality in later life. Normal values are relative to age and gender, with values typically starting to decrease from the age of 60. Participants grip a dynamometer three times with each hand to measure maximum strength (measured in kilograms). Age and sex should be routinely adjusted for with some studies also adjusting for height.
Broadly speaking, the statistical assessment of empirically observed sample data against a theoretical model that asserts what would be found under particular specified conditions.
An incentive, such as a gift card or online voucher, is offered to some respondents to increase response rates.
The inverse probability of sample members being selected through different samples and continuing to be enumerated up to wave X. Inclusion weights adjust for household response rate at the first wave of each sample (e.g. BHPS original, BHPS Scotland…GPS Wave 1, EMB) and attrition between the recruitment wave and the wave where the longitudinal inclusion weight is computed.
refers to sample members who are invited to take part during the fieldwork period.
Longitudinal weights account for sample attrition and aim to maintain the sample representativeness over multiple waves of data collection. These weights adjust for differences in response probabilities across waves, ensuring that the panel remains as representative as possible of the target population despite dropouts.
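One common way to adjust for attrition is to divide each continuing respondent's earlier weight by their probability of responding at the later wave. The Python sketch below assumes those response probabilities are already known; in practice they would be estimated from a response model. The function name, person identifiers and all numbers are hypothetical.

```python
def attrition_adjusted_weights(prior_weights, response_probabilities, responded):
    """Scale up prior-wave weights for respondents; drop-outs get zero.

    prior_weights:          dict of person id -> weight at the earlier wave
    response_probabilities: dict of person id -> assumed probability of
                            responding at the later wave
    responded:              dict of person id -> True/False
    """
    adjusted = {}
    for pid, weight in prior_weights.items():
        if responded[pid]:
            # Respondents also stand in for similar people who dropped out.
            adjusted[pid] = weight / response_probabilities[pid]
        else:
            adjusted[pid] = 0.0
    return adjusted

# Hypothetical panel of four people, two of them in a low-response group.
prior = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0}
probabilities = {"a": 0.5, "b": 0.5, "c": 1.0, "d": 1.0}
responded = {"a": True, "b": False, "c": True, "d": True}

longitudinal = attrition_adjusted_weights(prior, probabilities, responded)
print(longitudinal)
```

In this toy example the weighted total is the same before and after attrition, because respondent "a" takes on the weight of the similar non-respondent "b".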
The chemical addition of a ‘methyl group’ to a cytosine, having an effect on the expression of a gene. See the biological data glossary for the full definition of the term.
Survey researchers use the term mode to refer to the way in which data are collected in the survey (such as self-completion web interviews, face-to-face interviews or telephone interviews).
Is the influence that using different modes during data collection can have on survey responses. Methodology researchers look at the impact of mode on data obtained from surveys.
Survey researchers use the term mode transition to refer to the way respondents' data collection changes from one mode to another (such as face-to-face interviews changing to web interviews).
Mathematical or statistical representations of relationships between variables used to analyse data, make predictions, or test hypotheses.
A mover refers to a respondent who has moved address since they were last interviewed for the survey.
Non-contact refers to an interviewer making contact with a household but the selected respondent or respondents are unavailable to complete the interview.
Non-respondents are those who have been invited to complete an interview but do not respond.
In surveys, nonresponse occurs when selected individuals or households do not participate or fail to provide answers to some or all questions.
The suffix (sometimes used as a word) that denotes an approach wherein a broad, comprehensive set of molecules of a certain class are assayed (measured) or analysed simultaneously.
A survey delivered over the internet.
A member of a panel study, in which data are collected from a sample using successive waves of data collection with the aim of studying change over time.
The sum of an individual's alleles which may contribute to a given phenotype, usually weighted by GWAS effect size. While the associations of individual SNPs identified in GWAS are typically very small, when combined they can statistically explain a considerable proportion of variation in phenotypes. This combination of SNPs is referred to as a polygenic score (also referred to as a polygenic index or polygenic risk score). Polygenic scores provide an estimate of the summed effect of all SNPs that have been identified to associate with a phenotype. As such, they provide noisy but reliable proxies for genetic predisposition.
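The weighted sum described here can be sketched in Python as follows. The SNP identifiers, effect sizes and allele counts are invented for illustration; real scores are built from many more variants and usually involve further steps (such as handling correlated SNPs) that are not shown.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of effect-allele counts (0, 1 or 2 per SNP) weighted by effect size.

    SNPs without an effect size are skipped.
    """
    return sum(
        effect_sizes[snp] * count
        for snp, count in allele_counts.items()
        if snp in effect_sizes
    )

# Hypothetical GWAS effect sizes and one person's effect-allele counts.
effect_sizes = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.05}
allele_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(allele_counts, effect_sizes)
print(f"polygenic score: {score:.3f}")
```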
The population is the group that is being represented, from which a sample is then drawn.
In sample surveys, primary sampling unit (PSU) arises in samples in which population elements are grouped into aggregates and the aggregates become units in sample selection. The aggregates are, due to their intended usage, called “sampling units.” Primary sampling unit refers to sampling units that are selected in the first (primary) stage of a multi-stage sample ultimately aimed at selecting individual elements.
An extremely diverse class of biological molecule, each protein is composed of amino acids and is encoded by a gene. Proteins carry out every process in our bodies.
The study of a broad range of proteins. The proteome is the collection of proteins produced or modified by a cell or tissue. Proteomics methods identify and quantify proteins as well as protein interactions in cells and tissue.
If someone is not able to participate due to illness or they are busy, interviewers can ask someone else in the household (e.g., spouse or adult child) to complete a proxy questionnaire on their behalf. This is a much shorter questionnaire including only factual questions.
refers to the letter received by a respondent who has not responded to the initial interview and is then reassigned to another interviewer or invited to take part in another mode of data collection.
The degree to which a survey respondent perceives taking part in a survey as difficult, time consuming or emotionally stressful. Contributing factors include interview length, frequency, complexity and invasive questions.
Refers to a change in the way a respondent answers the survey questions due to an outside stimulus, e.g. a sudden change in the mode from face-to-face to online interviewing during Covid-19.
A response outcome is the answer to a question asked in the survey interview.
Indicates how many interviews were achieved as a proportion of those eligible for the survey.
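As a simple worked example, the proportion can be computed as below. The counts are made up, and the sketch does not address how eligibility is determined, which varies between surveys.

```python
def response_rate(achieved_interviews, eligible_cases):
    """Achieved interviews as a proportion of eligible cases."""
    if eligible_cases <= 0:
        raise ValueError("eligible_cases must be positive")
    return achieved_interviews / eligible_cases

# Hypothetical fieldwork outcome: 750 interviews from 1000 eligible cases.
rate = response_rate(750, 1000)
print(f"response rate: {rate:.1%}")
```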
To protect or restrict something so that it can only be used for a particular purpose, e.g. a certain portion of the sample is interviewed only in a certain mode.
A sample design is the framework used for the selection of a survey sample. For example, if researchers are interested in obtaining information through a survey for a population, or universe of interest, they must define a sampling frame that represents the population of interest, from which a sample is to be drawn.
Sampling error addresses how much, on average, the sample estimates of a study characteristic or variable, such as years of education, differ from sample to sample. Sampling error is essential in describing research results, how much they vary, and the statistical level of confidence that can be placed in them. Sampling error is also critical in tests of classic statistical significance.
Likelihood of a particular unit (individual or household) to be selected as part of a sample in a study.
A measure of the accuracy of a sample estimate. The standard error is the standard deviation of the sampling distribution of a statistic.
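The idea that the standard error is the standard deviation of a sampling distribution can be checked by simulation. The Python sketch below draws many samples from an artificial population, takes the standard deviation of the resulting sample means, and compares it with the usual formula (population standard deviation divided by the square root of the sample size). The population, sample size and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Artificial population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
n = 100            # size of each sample
repetitions = 2_000

# Sampling distribution of the mean: one mean per repeated sample.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(repetitions)
]

# Standard error estimated empirically and from the textbook formula.
empirical_se = statistics.stdev(sample_means)
analytic_se = statistics.pstdev(population) / n ** 0.5

print(f"empirical SE: {empirical_se:.3f}")
print(f"analytic SE:  {analytic_se:.3f}")
```

The two figures should be close but not identical, since the empirical value is itself based on a finite number of repeated samples.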
Stratified sampling separates the population into subgroups that are called “strata” and then selects random samples from each subgroup. Dividing the sampling effort in this fashion creates some extra work and extra cost. However, under some conditions, the estimates drawn from stratified samples have much lower sampling errors than estimates from simple random samples of the same size. This allows sampling error goals to be met with smaller sample sizes than are needed in simple random sampling and consequently lowers the total cost of research.
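A minimal Python sketch of stratified selection is given below, assuming the same sampling fraction in every stratum (proportional allocation). The strata, the sampling frame and the function name stratified_sample are hypothetical; real designs often use different fractions per stratum.

```python
import random

def stratified_sample(units_by_stratum, sampling_fraction, seed=0):
    """Draw a simple random sample within each stratum.

    units_by_stratum:  dict of stratum name -> list of unit identifiers
    sampling_fraction: share of each stratum to select (same for all strata)
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, units in units_by_stratum.items():
        # At least one unit per stratum, never more than the stratum holds.
        k = min(len(units), max(1, round(len(units) * sampling_fraction)))
        sample[stratum] = rng.sample(units, k)
    return sample

# Hypothetical frame: 800 urban and 200 rural units.
frame = {"urban": list(range(0, 800)), "rural": list(range(800, 1000))}
selected = stratified_sample(frame, 0.1)
print({stratum: len(units) for stratum, units in selected.items()})
```

Because the sample is drawn separately within each stratum, every stratum is guaranteed to be represented, which is what removes the between-strata component of sampling error.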
Design or method that does not reach the best possible statistical efficiency, representativeness or precision given the available information.
Interviews conducted by telephone near the end of the fieldwork period in an attempt to complete interviews originally assigned to another mode of collection, e.g. CAPI.
Triglycerides are a combination of three fatty acids and glycerol (a carbohydrate molecule) which we get from our diet and through production in the liver. Triglycerides fuel our bodies and store excess calories in fat cells that can be released in between meals to keep energy levels stable. If we consume more calories than we need on a regular basis these triglycerides will continue to be stored and accumulate as body fat (they are the most common type of body fat).
Not adjusted to account for differences in selection probability, nonresponse, or population representation.
Portion of the total variance in an estimate that can be attributed to a specific source of variability in the sampling process.
Each set of annual interviews conducted as part of the Understanding Society survey is referred to as a wave.
refers to the order in which the interviewing mode is conducted during the fieldwork period, e.g. web interviews (CAWI) are conducted in the first few weeks and reissued to another mode if no contact is made.
Adjustment factors applied to survey data to ensure that estimates derived from the sample are representative of the target population. They correct for unequal selection probabilities, nonresponse, and coverage issues.
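The effect of applying weights can be seen by comparing a weighted and an unweighted estimate. In the Python sketch below the values and weights are invented: respondents with the characteristic of interest are assumed to be over-represented in the sample, so they carry smaller weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical indicator (1 = has the characteristic) and survey weights.
values = [1, 0, 1, 0]
weights = [100, 300, 100, 300]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Here the unweighted proportion overstates the characteristic because the over-represented group counts as much as everyone else; the weighted estimate corrects for that.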