We need a long-term evaluation culture

...and longitudinal data can help

Policymakers have ball-park mechanisms at their disposal to figure out what is ‘going on out there’, but the sure-fire way to be certain whether a policy is achieving its intended outcomes, and for whom, and to learn lessons, is to evaluate. What are the challenges of assessing social, economic and public health interventions – and how can longitudinal data help?

According to the National Audit Office, only 8% of major government projects are robustly evaluated, while 64% are not evaluated at all. So, it is not always clear how billions of pounds spent on key policies and programmes are making a difference to the lives of citizens. Arguably, in an era of populist politics, learning lessons plays an even more vital role in the durability, coherence and legitimacy of policies in driving change.

The dynamic nature of policymaking – the perpetual drive for new ideas, and the challenges of measuring impact – means that evaluations are under-utilised as a way of working, strategic learning, driving change and celebrating success. Civil servants argue that the focus on delivery, ‘fire-fighting’ or bureaucracy also makes it harder to focus on outcomes for citizens. So, policies risk being transactional rather than transformative. The NAO investigation identified both demand-side and supply-side challenges in government, but there are also challenges of data.

In the field of social, economic and environmental development, policy proposals can rarely be tested, and those that can be tested often rely on randomised controlled trials. However, observational data also have an important role to play, including in measuring spill-over effects and unintended consequences. Here we are primarily discussing interventions that affect thousands or millions of lives, and it’s not as simple as whether something works or not. A range of factors and processes – such as social and cultural norms, class and economic systems, the way different stakeholders respond to new policies, and how something is implemented locally – can affect impact. In some cases, results may not be apparent for years, and what we are looking for is a slowing down or acceleration of a change.

The What Works Network, changes to HM Treasury’s Green Book and Magenta Book, guidance for civil servants, and the establishment of the Evaluation Task Force (plus a £15 million evaluation accelerator fund) are some of the initiatives being taken to scale up ‘testing, learning and adapting’, and to build better evidence for decision making. To further stimulate the use of evaluations in improving policies, Understanding Society has started a project on how long-term panel data could help – including a call for policy evaluation fellows.

Evaluations are more likely to be feasible, and the results used, if they are designed into the policy at the stage of conception. According to the UK Government’s Department for Levelling Up, Housing and Communities, “the advantage of considering evaluation evidence from the outset is that it increases the likelihood of generating timely and helpful information to assist in delivering the department’s objectives”.

Longitudinal surveys like Understanding Society, or repeated data, can provide a valuable long-term resource for evaluations and have much to contribute. This is because data is collected from the same set of individuals over time, and across the entire year, allowing comparisons before and after a (large-scale) intervention or between groups (where one group is not affected by a policy). The Study collects data from across the UK, enabling the evaluation of policies that are devolved, or gradually rolled out across the country.

Robust evaluations can be difficult, especially in a dynamic environment, with many ‘moving parts’ to people’s lives. With its multi-topic content covering education, employment, health and wellbeing, income and deprivation, family life and civic engagement, Understanding Society offers greater scope to factor in and control for confounders.

Longitudinal studies enable various evaluation techniques, with varying degrees of robustness: single-group pre-and-post analysis; interrupted time series analysis; difference-in-differences analysis, where changes in two groups are compared; propensity score matching; developing a synthetic control group as the counterfactual; and regression discontinuity design. Longitudinal studies can also be a cost-effective option – providing data which might otherwise be expensive for the evaluation process to collect.
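To illustrate the logic of the simplest of these designs, a difference-in-differences estimate can be computed from four group means: the average outcome for the treated and control groups, before and after the intervention. The sketch below uses made-up numbers, not Understanding Society data, purely to show the arithmetic.

```python
# Hypothetical difference-in-differences sketch with made-up outcome means.
# Keys are (group, period); values are mean outcomes, e.g. average weekly income.
means = {
    ("treated", "pre"): 100.0,
    ("treated", "post"): 104.0,
    ("control", "pre"): 100.0,
    ("control", "post"): 106.0,
}

# Change over time within each group.
treated_change = means[("treated", "post")] - means[("treated", "pre")]
control_change = means[("control", "post")] - means[("control", "pre")]

# The estimate nets out the background trend, under the assumption that both
# groups would have followed parallel trends in the absence of the policy.
did_estimate = treated_change - control_change
print(did_estimate)
```

In this toy example the treated group gained 4 units while the control group gained 6, so the estimated policy effect is −2: the treated group fell behind the trend it would otherwise have followed. In practice the same comparison would be made within a regression framework, with controls for confounders.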

In April 2013, the government introduced a ‘bedroom tax’ (officially labelled the under-occupancy penalty) to reduce housing benefit expenditure. An evaluation by Sanchez-Vidal et al (2020) examined the impacts of the policy on its target group across a range of outcomes, using a difference-in-differences method that compared the observed behaviour of treated families with the outcomes of a suitable control group. In a nutshell, the researchers found that the treated group experienced losses to housing benefits and overall income, but the policy was unsuccessful in encouraging residential moves – though tenants who did move downsized.

Evidence based mechanisms of change

Lack of demand stems from a variety of factors according to the NAO report, ranging from a lack of political interest or cultural buy-in to evaluation evidence not being well understood by the policy profession. The amount of public spending on evaluations is a fraction of one per cent, so increasing departmental investment in R&D to the national average in the long term, combined with ensuring all civil servants understand basic evaluation methods, is vital to drive up experiments, according to David Halpern, the ‘What Works’ National Adviser.

In the short to medium term, the growing use of theories of change by many organisations, including government departments, provides an opportunity. Simply defined, a theory of change is a model of how (in principle) a policy, or an organisation, expects delivery to translate into desired outcomes. There are many varieties of theories of change used in different contexts, but they are widely used as evaluative frameworks. Such modelling is useful for both process and impact evaluation.

Given the widespread application of the concept, Geoff Mulgan argues that much more discussion is needed about what constitutes good or bad theories of change. If departments set out the mechanisms of change based on research and evaluative evidence, and publish these (at least in outline) alongside each major new policy or programme, that could create a more reflective culture.

Linking-up data

The inclusion of a counterfactual improves causal inference in approaches based on panel data, but the selection of a suitable counterfactual or control area can be problematic. Even with large panel surveys like Understanding Society, identifying the intervention group can be a constraint. It is usually not feasible to ask extensive questions to identify whether respondents have been beneficiaries of a range of policies or services.

Linking survey and administrative data can help with the ‘identification problem’, where information on take-up or use of a service is captured in administrative data sources. For example, evaluating the impact of free school meals on child development, health and educational outcomes is feasible as a result of linking Understanding Society data with the National Pupil Database for England.

A study by Rabe et al (2021) sought to evaluate how parents change their behaviour in response to Ofsted assessments of school quality. Using linked household and school administrative data, the researchers observed some households being interviewed prior to their school being inspected (the control group), and some being interviewed post-inspection (the treated group). Parents who received good news about school quality significantly decreased the time they invested in their children, relative to parents who only later received such good news. Overall progress on linking large-scale survey and administrative data, however, remains slow.

Interventions and systems thinking

The pandemic required mobilisation across the whole of government. This way of working is becoming more critical as we face cross-cutting challenges such as the climate crisis, deep-seated inequality, public health, an ageing society and the impact of technology.

The issue of policy silos is a very long-standing one, and the shift towards Outcome Delivery Plans and shared outcomes in government may gradually help to break down silos. According to the Institute for Government, 26% of the outcomes from the 2020 Spending Review are shared between national departments – although it also notes that these are often seen as ‘nice to have’, and dropped if resources are constrained. Where powers or responsibilities are devolved, Scotland, Wales and Northern Ireland will no doubt have their own mechanisms for mobilising effort across departments.

Do such shared outcomes open up the possibility of taking a systems approach to evaluation – or vice versa? Social policies designed to tackle so-called ‘wicked problems’ are particularly difficult to evaluate: social complexity is characterised by long causal chains and high risks of unintended consequences.

One interesting example of a collaboration between academia, government partners and practitioner organisations, which takes a systems approach to the appraisal and evaluation of (health) policy, is the SIPHER programme. The programme is not seeking to answer ex-ante and ex-post questions about policy, but to better understand how many events or interventions contribute to changes in different parts of a system.

The programme has created a synthetic baseline population using the granularity of Census data and the richness of Understanding Society. The use of synthetic controls as a tool for evaluating public health interventions is generally under-utilised. The SIPHER programme initially focuses on three areas of policy evaluation: health, wellbeing and inequality. It will provide policymakers and practitioners with a dynamic resource base both to appraise policy and to consider evidence of the effects of interventions. This will help inform suitable next moves across a range of policies – such as those that help spread economic benefits and opportunities, improve the availability, quality and affordability of housing, and promote and maintain good mental health and wellbeing.

A culture of reflective policymaking is vitally important to drive change in people’s lives. Longitudinal data, combined with greater capabilities in robust evaluation design across academia and policymakers, represents a real opportunity to improve policies.

This is an expanded version of a post which first appeared on the LSE Impact Blog

Authors

Raj Patel

Raj Patel is Associate Director, Policy and Partnerships, at Understanding Society
