Incorporating survey-derived information in human heritability estimates
Shaila Musharoff (Cornell, Computational Biology)
In the era of massive, biobank-scale human datasets (i.e., hundreds of thousands to millions of individuals), trait modeling requires new approaches. In addition to genetic data, these datasets contain data from participant-completed surveys that were designed to capture trait-relevant environmental information. However, survey data is typically noisy and has substantial missingness, which makes integrating it with genetic data to model human traits challenging. Environmental factors also vary between populations, further complicating cross-population comparisons. Here, we consider a key trait-modeling task: heritability estimation from population samples. We first apply dimensionality reduction techniques to the health- and lifestyle-related survey data from the All of Us Research Program. We then include the resulting survey summaries in heritability models of biomarkers and anthropometric measurements, several of which are used to diagnose common diseases. We find that including survey summaries as covariates reduces heritability estimates to an extent that varies by population and by trait, indicating a context-specific role of survey data in trait modeling. We further find that these survey summaries are themselves heritable, indicating that they overlap with genetic information and that environmental factors can be heritable.
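
The two-step workflow described above (summarize the surveys, then adjust a heritability model for those summaries) can be illustrated with a minimal sketch. The abstract does not say which dimensionality reduction technique or heritability estimator is used, so everything here is an assumption chosen for brevity: PCA after naive mean imputation stands in for the survey summarization, and Haseman-Elston regression on a genetic relationship matrix stands in for the heritability model. The function names `survey_pcs`, `residualize`, and `he_heritability` are hypothetical.

```python
import numpy as np


def survey_pcs(X, k):
    """Top-k principal component scores of a survey matrix.

    X is (individuals x survey items), with NaN marking missing answers.
    Assumption: missing answers are filled with the item mean, which is
    only a placeholder for a real missing-data strategy.
    """
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(np.isnan(X), col_means, X)
    sd = X_filled.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant items
    Z = (X_filled - X_filled.mean(axis=0)) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]


def residualize(y, C):
    """Residuals of y after least-squares regression on covariates C plus an intercept."""
    design = np.column_stack([np.ones(len(y)), C])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def he_heritability(y, G, covariates=None):
    """Haseman-Elston regression estimate of SNP heritability.

    G is (individuals x variants) genotype dosages. If covariates are
    given, the phenotype is residualized on them first. Simplification:
    the covariates are not also projected out of the relationship matrix.
    """
    if covariates is not None:
        y = residualize(y, covariates)
    y = (y - y.mean()) / y.std()
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (G - G.mean(axis=0)) / sd
    K = Z @ Z.T / Z.shape[1]  # genetic relationship matrix
    upper = np.triu_indices(len(y), k=1)  # distinct pairs of individuals
    k_off = K[upper]
    y_prod = np.outer(y, y)[upper]
    # Slope of phenotype products regressed on relatedness, no intercept.
    return float(k_off @ y_prod / (k_off @ k_off))
```

Comparing `he_heritability(y, G)` with `he_heritability(y, G, covariates=survey_pcs(X, k))` for the same phenotype mirrors the comparison described in the abstract, and passing a single column of `survey_pcs(X, k)` as `y` corresponds to asking whether a survey summary is itself heritable. Estimates from this sketch are noisy at small sample sizes and can fall outside the interval from 0 to 1.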