Evaluating the performance of personal, social, health-related, biomarker and genetic data for predicting an individual’s future health using machine learning
The field of Epidemiology (and health-related social science) has devoted considerable effort to understanding the determinants of health. Population-level analyses of differences in averages between groups (e.g. lung cancer rates among smokers versus non-smokers) are a common line of enquiry using data from social surveys. However, Epidemiology has long distinguished between the causes of disease at the population level and at the individual level. So far, our ability to translate population-level discoveries into discriminating between cases and non-cases of disease at the individual level has been limited, despite the increasing availability of data.
The fellowship drew on recent advances and successes in computer science, using machine learning approaches to explore whether such methods can revolutionise how we build predictive models of health from social survey data. Using data from Understanding Society, the fellowship compared the relative contribution of personal, social, health, biomarker and genetic types of data as predictors of an individual’s future health status (i.e. one, two and four years after baseline) using a deep learning approach. Each ‘data type’ was first evaluated on its own using cross-validation to identify its individual predictive power, and then in different combinations to explore how much data is needed to predict health outcomes effectively (if it can be done at all). The project aimed to make two main contributions to social science research: (1) an evaluation of different data types and their relative contributions as ‘predictors’ of health status; (2) an exploration of the potential of deep learning to improve predictive models of ill health.
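The evaluation design described above (scoring each data type alone, then in combination, by cross-validation) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the project: each data type is a numeric feature matrix (a "block"), performance is measured as mean held-out ROC AUC over k folds, and a least-squares linear risk score stands in for the deep learning model the fellowship actually used. The helper names (`evaluate_combinations`, `cv_auc`, etc.), block names and synthetic data are hypothetical illustrations, not Understanding Society variables or the project's code.

```python
import itertools

import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined if a fold has only one class
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


def fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model (NOT deep learning): least-squares linear risk score."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def cv_auc(X, y, k=5, seed=0):
    """Mean held-out AUC over k folds for one feature matrix."""
    folds = kfold_indices(len(y), k, seed)
    aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        aucs.append(roc_auc(y[test_idx], preds))
    return float(np.nanmean(aucs))


def evaluate_combinations(blocks, y, k=5):
    """Score every non-empty combination of data-type blocks."""
    names = sorted(blocks)
    results = {}
    for r in range(1, len(names) + 1):
        for combo in itertools.combinations(names, r):
            X = np.hstack([blocks[name] for name in combo])
            results[combo] = cv_auc(X, y, k)
    return results


# Hypothetical synthetic data: three "data types" for 400 people.
rng = np.random.default_rng(42)
n = 400
blocks = {
    "personal": rng.normal(size=(n, 3)),
    "social": rng.normal(size=(n, 2)),
    "biomarker": rng.normal(size=(n, 4)),
}
# The synthetic binary outcome depends only on the biomarker block.
latent = blocks["biomarker"][:, 0] + 0.5 * blocks["biomarker"][:, 1]
y = (latent + rng.normal(scale=0.5, size=n) > 0).astype(int)

results = evaluate_combinations(blocks, y)
for combo, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(" + ".join(combo), round(score, 3))
```

Reusing the same seed gives every combination identical folds, so differences in AUC reflect the data types rather than the random split. The number of combinations grows as 2^n - 1, so the five data types named above would give 31 models to fit and compare.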
You can find out more about Mark’s work on his profile page.



