A difficult task: Phylogenetic analysis of SARS-CoV-2 data

13 August 2020

Computer scientists from HITS and KIT review the challenges that phylogenetic methods face in analyzing the evolution of the coronavirus: a large number of sequences combined with a low number of mutations.

Studies covering various aspects of SARS-CoV-2 data analysis are being published on a daily basis, including a regularly updated phylogeny. In a preprint paper, researchers from the Computational Molecular Evolution group (CME) at the Heidelberg Institute for Theoretical Studies (HITS) and the Institute for Theoretical Informatics at the Karlsruhe Institute of Technology (KIT) have analyzed all virus sequences that were available at the beginning of May. They found it difficult to infer a reliable phylogeny from these data because of the large number of sequences in conjunction with the low number of mutations. “The data contain an extremely weak signal”, says CME group leader Alexandros Stamatakis.

They also found that it is not possible to reliably root the inferred phylogeny, either by using the bat and pangolin outgroups or by applying novel computational methods to the human virus phylogeny. Even an automatic classification of the current sequences was not possible, as the sequences are too closely related.

The researchers conclude that results of phylogenetic analysis on SARS-CoV-2 data should be considered and interpreted with extreme caution. “Do not draw conclusions from just a single tree”, says Alexandros Stamatakis.

The paper is available on the bioRxiv preprint server:

Phylogenetic analysis of SARS-CoV-2 data is difficult. Benoit Morel, Pierre Barbera, Lucas Czech, Ben Bettisworth, Lukas Hübner, Sarah Lutteropp, Dora Serdari, Evangelia-Georgia Kostaki, Ioannis Mamais, Alexey M Kozlov, Pavlos Pavlidis, Dimitrios Paraskevis, Alexandros Stamatakis: https://www.biorxiv.org/content/10.1101/2020.08.05.239046v1

About HITS

The Heidelberg Institute for Theoretical Studies (HITS) was established in 2010 by the physicist and SAP co-founder Klaus Tschira (1940-2015) and the Klaus Tschira Foundation as a private, non-profit research institute. HITS conducts basic research in the natural sciences, mathematics and computer science, with a focus on the processing, structuring, and analysis of large amounts of complex data and the development of computational methods and software. The research fields range from molecular biology to astrophysics. The shareholders of HITS are the HITS-Stiftung, which is a subsidiary of the Klaus Tschira Foundation, Heidelberg University and the Karlsruhe Institute of Technology (KIT). HITS also cooperates with other universities and research institutes and with industrial partners. The base funding of HITS is provided by the HITS-Stiftung with funds received from the Klaus Tschira Foundation. The primary external funding agencies are the Federal Ministry of Education and Research (BMBF), the German Research Foundation (DFG), and the European Union.
