RP19 - Uncertainty-aware, haplotype-based genomic variant effect prediction

We have developed a new statistical approach for determining HLA haplotypes while quantifying the associated uncertainty and, in particular, accounting for the heterogeneity of tumor samples. In the second cohort, we consider haplotypes again, but this time at the level of their impact on the encoded proteins.

Predicting the effects of genomic variants is a crucial step in the analysis of genomic data, particularly in precision oncology and in precision medicine in general. So far, this has been done by considering each variant in isolation: predicting its effect on a regulatory element or a protein (e.g. early termination of the protein or, less severely, a changed amino acid), estimating the severity of this effect, and annotating the variant with public knowledge (such as allele frequencies, pathogenicity, and disease associations). Various tools have been developed for this task, for example VEP [1] and SnpEff [2].

However, considering each variant in isolation can lead to incomplete or, in the extreme case, even wrong information. For example, a deletion near the beginning of a protein-coding sequence that shifts the reading frame invalidates any downstream predictions of amino acid changes. If, on the other hand, this deletion is followed by an insertion that restores the reading frame, the impact on the protein might be less severe, or at least completely different. Another example is the combination of multiple amino acid changing variants that only together generate an effect severe enough to impact a certain binding domain of a protein. Finally, a severe variant inside a promoter or enhancer only plays a role for the targeted genes on the same haplotype or subclone, which needs to be taken into account when interpreting the interplay with potential additional variants in the targeted gene.
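The frameshift example above can be illustrated with a minimal Python sketch. The `Variant` class, the 0-based coordinates, and the toy coding sequence are hypothetical and chosen only for illustration; they are not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based offset in the coding sequence (illustrative convention)
    ref: str  # reference allele
    alt: str  # alternative allele


def apply_variants(cds: str, variants: list[Variant]) -> str:
    """Apply non-overlapping variants from one haplotype to a coding sequence."""
    out = []
    cursor = 0
    for v in sorted(variants, key=lambda v: v.pos):
        if cds[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference allele mismatch at {v.pos}")
        out.append(cds[cursor:v.pos])  # unchanged sequence before the variant
        out.append(v.alt)              # alternative allele replaces the reference
        cursor = v.pos + len(v.ref)
    out.append(cds[cursor:])
    return "".join(out)


def net_frame_shift(variants: list[Variant]) -> int:
    """Net length change modulo 3; 0 means the reading frame is preserved."""
    return sum(len(v.alt) - len(v.ref) for v in variants) % 3


# Toy coding sequence: ATG GCC AAA GGG TTT TAA
cds = "ATGGCCAAAGGGTTTTAA"
deletion = Variant(pos=3, ref="GC", alt="G")    # deletes one base
insertion = Variant(pos=9, ref="G", alt="GA")   # inserts one base

alone = net_frame_shift([deletion])
together = net_frame_shift([deletion, insertion])
haplotype_cds = apply_variants(cds, [deletion, insertion])
```

Evaluated alone, the deletion leaves a non-zero frame shift, so every downstream codon would be misread. Evaluated together on the same haplotype, the two variants cancel out and only the codons between them change.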

To address these challenges, we will develop a completely new approach for variant effect prediction. Instead of considering each variant individually, we will develop a graph structure (impact graph) that represents individual variants as nodes and haplotypes as paths through these nodes. This way, the impact of a variant can be annotated and interpreted in the context of the surrounding haplotype. Uncertainty in haplotype assignment is captured by alternative paths. Moreover, we will leverage the Bayesian model of Varlociraptor [4] to annotate both edges and nodes with posterior probabilities for haplotype and variant calls. We will further develop an interactive visual representation of the impact graph that allows users to comprehensively assess the impact, any common annotations, and the uncertainty of the presented information. The visualization will be implemented as a JavaScript library. We plan to integrate it into Snakemake [3] data analysis reports as well as SHIP and the patient dashboard, so that it can be used directly in both research and at the point of care. For the latter, we aim to use it to support decision making within molecular tumor boards. We will evaluate this within an existing research project of the West German Genome Center (WTZ), in which we compare the results of commercial and certified genomic variant prediction software with an open source data analysis pipeline based on our previously published technologies Varlociraptor [4] and Snakemake [3].
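As a rough illustration of the impact graph idea, the following Python sketch stores variants as nodes and haplotype links as edges, each with a probability, and enumerates alternative haplotype paths. The class name `ImpactGraph`, its methods, and all probabilities are hypothetical. Scoring a path by the product of its edge probabilities is a simplifying independence assumption made for this sketch and is not the Varlociraptor model.

```python
from dataclasses import dataclass, field


@dataclass
class ImpactGraph:
    # node name -> posterior probability that the variant is present
    node_prob: dict[str, float] = field(default_factory=dict)
    # node name -> list of (successor, probability of lying on the same haplotype)
    edges: dict[str, list[tuple[str, float]]] = field(default_factory=dict)

    def add_variant(self, name: str, prob: float) -> None:
        self.node_prob[name] = prob

    def add_edge(self, src: str, dst: str, prob: float) -> None:
        self.edges.setdefault(src, []).append((dst, prob))

    def paths(self, start: str, end: str):
        """Yield (path, score) for every path from start to end.

        Assumes the graph is acyclic. The score is the product of edge
        probabilities (illustrative independence assumption).
        """
        stack = [([start], 1.0)]
        while stack:
            path, score = stack.pop()
            if path[-1] == end:
                yield path, score
                continue
            for nxt, p in self.edges.get(path[-1], []):
                stack.append((path + [nxt], score * p))


graph = ImpactGraph()
graph.add_variant("frameshift_del", 0.98)
graph.add_variant("restoring_ins", 0.95)
graph.add_variant("synonymous_snv", 0.90)
graph.add_variant("missense_snv", 0.99)

# Two alternative haplotype assignments following the deletion
graph.add_edge("frameshift_del", "restoring_ins", 0.75)
graph.add_edge("frameshift_del", "synonymous_snv", 0.25)
graph.add_edge("restoring_ins", "missense_snv", 1.0)
graph.add_edge("synonymous_snv", "missense_snv", 1.0)

ranked = sorted(
    graph.paths("frameshift_del", "missense_snv"),
    key=lambda item: item[1],
    reverse=True,
)
```

Here `ranked` lists the candidate haplotypes from most to least supported, so a downstream annotation step could evaluate the missense variant once per path rather than once in isolation.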

Overall, we expect our new approach to bring about a paradigm shift in variant interpretation that enables faster and better-informed decisions in precision medicine.

[1] William McLaren et al. “The Ensembl Variant Effect Predictor”. In: Genome Biology 17.1 (June 6, 2016), p. 122. doi: 10.1186/s13059-016-0974-4.

[2] Pablo Cingolani et al. “A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff”. In: Fly 6.2 (Apr. 1, 2012), pp. 80–92. doi: 10.4161/fly.19695.

[3] F. Mölder et al. “Sustainable data analysis with Snakemake”. In: F1000Research 10 (2021), p. 33. doi: 10.12688/f1000research.29032.1.

[4] Johannes Köster et al. “Varlociraptor: enhancing sensitivity and controlling false discovery rate in somatic indel discovery”. In: Genome Biology 21.1 (Apr. 28, 2020), p. 98. doi: 10.1186/s13059-020-01993-6.