A difficult task: Phylogenetic analysis of SARS-CoV-2 data

13 August 2020

Computer scientists from HITS and KIT review the challenges that phylogenetic methods face in analyzing the evolution of the coronavirus: a large number of sequences and a low number of mutations.

Many studies covering aspects of SARS-CoV-2 data analysis are being published daily, including a regularly updated phylogeny. In a preprint, researchers from the Computational Molecular Evolution group (CME) at the Heidelberg Institute for Theoretical Studies (HITS) and the Institute for Theoretical Informatics at the Karlsruhe Institute of Technology (KIT) have analyzed all virus sequences that were available at the beginning of May. They found it difficult to infer a reliable phylogeny from these data because of the large number of sequences combined with the low number of mutations. “The data contain an extremely weak signal”, says CME group leader Alexandros Stamatakis.

They also found that the inferred phylogeny cannot be reliably rooted, either by using the bat and pangolin virus sequences as outgroups or by applying novel computational methods to the human virus phylogeny. Even an automatic classification of the current sequences was not possible, because the sequences are too closely related.

The researchers conclude that results of phylogenetic analysis on SARS-CoV-2 data should be considered and interpreted with extreme caution. “Do not draw conclusions from just a single tree”, says Alexandros Stamatakis.

The paper has been published in “Molecular Biology and Evolution” on 15 December 2020:

Phylogenetic analysis of SARS-CoV-2 data is difficult. Benoit Morel, Pierre Barbera, Lucas Czech, Ben Bettisworth, Lukas Hübner, Sarah Lutteropp, Dora Serdari, Evangelia-Georgia Kostaki, Ioannis Mamais, Alexey M Kozlov, Pavlos Pavlidis, Dimitrios Paraskevis, Alexandros Stamatakis: https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msaa314/6030946

About HITS

HITS, the Heidelberg Institute for Theoretical Studies, was established in 2010 by physicist and SAP co-founder Klaus Tschira (1940-2015) and the Klaus Tschira Foundation as a private, non-profit research institute. HITS conducts basic research in the natural, mathematical, and computer sciences. Major research directions include complex simulations across scales, making sense of data, and enabling science via computational research. Application areas range from molecular biology to astrophysics. An essential characteristic of the Institute is interdisciplinarity, implemented in numerous cross-group and cross-disciplinary projects. The base funding of HITS is provided by the Klaus Tschira Foundation.
