Idea: Genetics, COVID-19 and Crowdsourcing GWAS

There are numerous reasons why the severity of COVID-19 differs from person to person, with age being the most obvious one. Given the nature of the virus, specifically how it utilizes ACE2 receptors to invade cells, genetics should be one of the major covariates that can predict COVID-19 severity.

The goal of this (very ambitious) project is to build a fully private data platform that will be used to discover genetic factors that affect COVID-19 severity. Once such genetic markers are identified, the tool will be used to identify people who are at higher risk. These findings would have an immense and immediate benefit to the public.

How does this work? We start by (i) building the necessary private data analysis techniques (using either multi-party computation or some variant of homomorphic encryption) and (ii) identifying a range of SNPs that may be relevant to COVID-19. We then recruit COVID-19 patients who have had genetic testing done via services such as 23andMe or Ancestry, and ask them to use our platform to upload their (homomorphically encrypted) genetic data, COVID-19 severity and other relevant factors (e.g. age, preexisting conditions, smoking frequency). Finally, we run the world’s largest crowdsourced GWAS.
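To make the idea concrete, here is a minimal toy sketch of the "private aggregation, then association test" flow for a single SNP. It is only an illustration under several assumptions: it uses additive secret sharing (a simple multi-party computation building block) rather than homomorphic encryption, it assumes the servers do not collude, and it runs a plain allelic chi-square test with no adjustment for age, ancestry or other covariates. All names (`share`, `aggregate`, `chi_square_2x2`, `PRIME`) are made up for this example and do not come from any existing platform.

```python
import secrets

# Modulus for additive sharing (an illustrative choice, not a recommendation).
PRIME = 2**61 - 1


def share(value, n_servers=3):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def aggregate(participants, n_servers=3):
    """Sum allele counts per severity group without revealing individuals.

    participants: list of (alt_allele_count in {0, 1, 2}, severe: bool).
    Returns [alt_severe, ref_severe, alt_mild, ref_mild].
    """
    # Each server only ever sees random-looking shares and running totals.
    server_totals = [[0] * 4 for _ in range(n_servers)]
    for alt, severe in participants:
        ref = 2 - alt
        # All four cells are shared (zeros included), so a single server
        # cannot tell which severity group a participant belongs to.
        cells = [alt, ref, 0, 0] if severe else [0, 0, alt, ref]
        for c, value in enumerate(cells):
            for s, sh in enumerate(share(value, n_servers)):
                server_totals[s][c] = (server_totals[s][c] + sh) % PRIME
    # Only the aggregate table is reconstructed, never individual genotypes.
    return [
        reconstruct([server_totals[s][c] for s in range(n_servers)])
        for c in range(4)
    ]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Four hypothetical participants: (alt allele count, severe case?)
    toy = [(2, True), (1, True), (0, False), (1, False)]
    table = aggregate(toy)
    print(table, chi_square_2x2(*table))
```

A real study would repeat this for every candidate SNP, correct for multiple testing, and use a regression model that includes the covariates mentioned above; the sketch only shows that aggregate counts can be computed without any single party seeing an individual's genotype.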

There is much more to discuss (how to achieve private data analysis, how this would be marketed, how to convince people that this is fully private, the legal issues, etc.), but I’ll stop here and ask for your first impressions and comments (constructive or destructive, both are welcome).


Possibly of interest:

Cao, Y., Li, L., Feng, Z. et al. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations. Cell Discov 6, 11 (2020).

Final paragraph:

In summary, we systematically analyzed coding-region variants in ACE2 and the eQTL variants, which may affect the expression of ACE2 using the GTEx database to compare the genomic characteristics of ACE2 among different populations. Our findings indicated that no direct evidence was identified genetically supporting the existence of coronavirus S-protein binding-resistant ACE2 mutants in different populations (Fig. [1a]). The data of variant distribution and AFs [allele frequencies] may contribute to the further investigations of ACE2, including its roles in acute lung injury and lung function (ref. 12). The East Asian populations have much higher AFs in the eQTL variants associated with higher ACE2 expression in tissues (Fig. [1c]), which may suggest different susceptibility or response to 2019-nCoV/SARS-CoV-2 from different populations under the similar conditions.

Thanks for the citation! Here is some other relevant work for future reference; I’m skipping the dates, as all are from 2020. The first three papers have fairly comprehensive lists of related pathways and genes; the others study the effects of ACE2 expression levels but are limited in scope (they either have small sample sizes or use publicly available data sets, which might not be the best way to measure ACE2 expression; the Ziegler paper expands on this):


FYI, a relevant international initiative has been started here:


Fantastic, thank you very much!

I posted in the Data Source thread, but if you are not following it, you may be interested in our GWAS summary statistics site:

We are updating it with new analyses run by my group (or other groups) and improving the annotations. ANNOVAR, CADD, eQTLs and GWAS results (EBI and GRASP) are coming soon.