PDP2018-04

Global and local genome sequence analysis to understand fungal virulence

Host laboratory and collaborators

Abstract

We study the interaction between C. elegans and its natural pathogen Drechmeria coniospora, using a Swedish isolate of the fungus. After 20 years of serial passage in C. elegans, this strain has become more virulent. A separate Danish strain is even more virulent. A superficial comparison of their genomes indicates high local sequence similarity but, surprisingly, extensive genome rearrangement (Lebrigand et al., PLoS Genetics, 2016). We plan to use ONT nanopore sequencing to establish the extent of this unexpected genome plasticity across several Drechmeria strains and its link with virulence. ONT sequencing, however, systematically miscalls homopolymer sequences, which makes local comparison problematic. New algorithms are needed to correct these errors and so allow ONT-only comparative genomics.

Keywords

Comparative genomics, virulence, nanopore, C. elegans, evolution, base-calling, algorithms

Objectives

- Conduct comparative global sequence analysis of different Drechmeria coniospora isolates

- Improve base-calling algorithms to overcome systematic errors in homopolymeric sequences

- Conduct comparative sequence analysis of the different isolates at the local scale to identify (i) gene loss and gain, (ii) functionally relevant polymorphisms between strains, and (iii) candidate virulence genes and mechanisms.

Proposed approach (experimental / theoretical / computational)

Genomic DNA prepared by the Ewbank lab will be sequenced at the EMBL Genecore facility. For global genome analyses, assemblies generated at EMBL with Canu will be compared by the post-doc using state-of-the-art methods, in collaboration with Guillaume Blanc. As even the best basecallers still make systematic errors (https://zenodo.org/record/1188469), the post-doc will, in parallel and in collaboration with Laurent Tichit, develop new methods to improve accuracy. These will be based on anti-consensus, or outlier, approaches that seek raw-read support for statistically probable homopolymer sequences. Gene sets will be predicted from these fully polished genomes and evaluated using BUSCO. Comparative in silico analyses will then reveal gene losses and gains. Candidate virulence mechanisms will be validated experimentally in vivo by researchers in the Ewbank lab, using CRISPR-based genome engineering.
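As a rough illustration of what "seeking raw-read support for statistically probable homopolymer sequences" could look like at a single locus, the Python sketch below compares the majority-vote homopolymer length with a maximum-likelihood estimate. This is a simple likelihood-based stand-in, not the anti-consensus method the project will develop. It assumes that read segments spanning one homopolymer locus have already been extracted, and it uses a hypothetical, hand-set error model (`ERROR_MODEL`) biased towards under-calling; all function names and probability values are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import groupby

# Hypothetical error model: P(observed length - true length). Values are
# illustrative only, chosen to mimic a bias towards under-calling
# homopolymers; a real model would be estimated for a given basecaller.
ERROR_MODEL = {-2: 0.10, -1: 0.30, 0: 0.55, 1: 0.05}
FLOOR = 1e-6  # probability assigned to offsets outside the model


def run_lengths(seq):
    """Run-length encode a sequence, e.g. 'AAACG' -> [('A', 3), ('C', 1), ('G', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def observed_length(segment, base):
    """Longest run of `base` in a read segment (assumed to be the target run)."""
    return max((n for b, n in run_lengths(segment) if b == base), default=0)


def log_likelihood(true_len, observed):
    """Sum over reads of log P(observed length | true_len) under ERROR_MODEL."""
    return sum(math.log(ERROR_MODEL.get(k - true_len, FLOOR)) for k in observed)


def assess_locus(segments, base, max_len=30):
    """Compare the majority-vote length with the maximum-likelihood length.

    Returns (consensus, estimate, flagged); flagged is True when the raw reads,
    under the assumed error model, support a different length than the vote.
    """
    observed = [observed_length(s, base) for s in segments]
    consensus = Counter(observed).most_common(1)[0][0]
    estimate = max(range(1, max_len + 1), key=lambda n: log_likelihood(n, observed))
    return consensus, estimate, estimate != consensus


if __name__ == "__main__":
    # Seven simulated reads over one A-homopolymer locus (made-up data).
    lengths = [7, 7, 7, 8, 6, 7, 8]
    reads = ["GT" + "A" * n + "CC" for n in lengths]
    print(assess_locus(reads, "A"))  # -> (7, 8, True)
```

In this toy example the majority of reads report seven bases, but because the assumed model makes one-base deletions far more likely than insertions, a true length of eight explains the reads better, so the locus is flagged. In practice the error model would need to be estimated per basecaller, for example from reads over a known reference.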

Interdisciplinarity 

The Ewbank lab is a classic “wet” lab, applying genetics, biochemistry, cell biology and functional genomics to the study of host-pathogen interactions. This project requires the recruitment of a bioinformatician able to develop novel methods to exploit long-read sequence data. The project will involve close collaboration with a major European sequencing core facility and with the lab of Guillaume Blanc, an expert in genome evolution. The programming part of the project will be supervised by Laurent Tichit (Mathématiques et Algorithmique pour la Biologie des Systèmes), who has extensive experience in optimizing pattern-matching algorithms.
