PDP2019-05

Developing graph-based approaches to assemble -omics layers and model regulation of imprinted genes in Prader-Willi syndrome

Supervisors

Benoit Ballester / TAGC / benoit.ballester@inserm.fr

Francoise Muscatelli / INMED / francoise.muscatelli@inserm.fr

Abstract

The regulatory mechanisms governing gene activity are thought to apply similarly to the maternal and paternal alleles of a given autosomal gene. However, in mammals, genomic imprinting silences one allele depending on its parental origin. An estimated 1 to 2% of genes are imprinted, and these genes are susceptible targets in numerous human pathologies.
How imprinted chromosomal domains are organised and regulated, and whether their structural organisation differs between the maternal and paternal genomes, remain poorly understood.
Using Prader-Willi syndrome (PWS), a classical pathology involving imprinted genes, as a model system, our objective is to identify the critical elements involved in the regulation of all imprinted genomic regions and to understand how these elements interact.
To find these elements, whether across a large genomic region or at the level of specific genes, we need to (i) integrate the different layers of biological data at our disposal (genomic, transcriptomic, proteomic and epigenetic) into graph representations and (ii) identify transversal graph similarities across layers that are specific to those regions.
To this end, we propose to apply community detection algorithms to the resulting graphs. This project, at the interface of biology and computer science, investigates the structure, regulation and function of imprinted genomic regions and proposes a novel graph-based method (i) to detect the critical elements governing a large imprinted genomic region, (ii) to identify potential critical regions involved in human pathologies and (iii) to understand the evolution of genomic imprinting in mammals.

Keywords

Prader-Willi Syndrome, graph comparison, community search, multi-omics.

Objectives

The objective is to develop computational methods that build graphs representing a genomic region, in order to understand the critical elements involved in the regulation of genomic imprinting. Using this graph-based representation and a transversal graph search, we aim to reveal information across layers of -omics data that will extend our knowledge of the regulation of specific imprinted regions and of imprinted regions in general.

Proposed approach

This project is structured around three steps: gathering -omics data, implementing a multi-layered graph and searching for communities across transversal graph layers. The project would proceed as follows:

i) Gathering and generating as much high-quality -omics data in mouse as possible, including genomic, transcriptomic, proteomic, chromatin spatial organisation and epigenetic information, to feed our graphs.
ii) Implementing a graph that represents all -omics layers for a given genomic region, such as the PWS region. We would link the different layers of biological data to their genomic region by building a star graph: the center vertex would represent the genomic region, while the other vertices would represent the biological data layers. How the edges are weighted remains to be determined. We plan to develop this method using the PWS region as a model and then extend it to all regions subject to parental genomic imprinting.
iii) Finally, building a larger graph by creating edges between the genomic region vertices (the centers of the star graphs built in step ii). These edges would be derived either from transversal similarity patterns between star graphs or from direct biological evidence (for example, Hi-C contacts between two genomic regions would link the two corresponding star graphs). On the resulting large graph, we would apply graph partitioning methods and transversal similarity detection algorithms to highlight the specificities of, and similarities between, the Prader-Willi syndrome region and other regions subject to imprinting.
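Steps ii and iii can be illustrated with a small Python sketch. This is a hypothetical illustration under stated assumptions, not the project's method. The region names, per-layer scores, the Hi-C-like contact weight, the similarity metric (1 minus the mean absolute difference of layer scores) and the 0.5 threshold are all placeholders, because the proposal leaves edge weighting to be determined. For the partitioning step, the sketch swaps in a simple deterministic variant of weighted label propagation; Louvain or Leiden would be common alternatives. Graphs are stored as plain adjacency dictionaries so the sketch has no dependencies.

```python
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL per-layer scores for each genomic region. The proposal leaves
# the edge-weighting scheme open, so these numbers are placeholders only.
REGION_LAYERS = {
    "PWS": {"rna": 0.9, "H3K9me3": 0.8, "methylation": 0.7},
    "Kcnq1": {"rna": 0.85, "H3K9me3": 0.75, "methylation": 0.8},
    "Control": {"rna": 0.1, "H3K9me3": 0.05, "methylation": 0.1},
}

# HYPOTHETICAL Hi-C-like contact weights between region (center) vertices.
CONTACTS = {("PWS", "Kcnq1"): 1.0}


def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge, summing weights if it already exists."""
    graph[u][v] = graph[u].get(v, 0.0) + weight
    graph[v][u] = graph[v].get(u, 0.0) + weight


def build_star(graph, region, layers):
    """Step ii: the center is the region; each leaf is one -omics layer."""
    graph.setdefault(region, {})
    for layer, score in layers.items():
        add_edge(graph, region, f"{region}:{layer}", score)


def profile_similarity(a, b):
    """Assumed metric: 1 - mean absolute difference over shared layers."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return 0.0
    return 1.0 - sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def build_region_graph(region_layers, contacts, threshold=0.5):
    """Step iii: join star graphs through their center vertices."""
    graph = defaultdict(dict)
    for region, layers in region_layers.items():
        build_star(graph, region, layers)
    # Edges from transversal similarity between star-graph profiles...
    for r1, r2 in combinations(sorted(region_layers), 2):
        sim = profile_similarity(region_layers[r1], region_layers[r2])
        if sim >= threshold:
            add_edge(graph, r1, r2, sim)
    # ...and from direct biological evidence such as Hi-C contacts.
    for (r1, r2), weight in contacts.items():
        add_edge(graph, r1, r2, weight)
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation (asynchronous updates).

    Each node adopts the label carrying the largest total edge weight among
    its neighbours; ties go to the smallest label so runs are reproducible.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    # Return communities as sorted lists, ordered by their first member.
    return sorted((sorted(c) for c in communities.values()), key=lambda c: c[0])


if __name__ == "__main__":
    graph = build_region_graph(REGION_LAYERS, CONTACTS)
    for community in label_propagation(graph):
        print(community)
```

With these placeholder values, the PWS and Kcnq1 star graphs, which are linked by both a similarity edge and a contact edge, end up in one community, and the low-scoring Control star graph ends up in another. Real use would require a justified weighting scheme and a partitioning algorithm chosen for the data.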

Interdisciplinarity

The integration of multi-omics data with genomic knowledge is a major challenge for predicting clinical outcomes and understanding the genetics of pathologies. Using various -omics data in mouse, with Prader-Willi syndrome as a model, we propose to develop a new graph-based framework that integrates multi-omics data with genomic imprinting knowledge to elucidate the interplay between different -omics levels. This research program is highly interdisciplinary, at the interface of computer science and biology. It involves joint efforts between a team at TAGC with strong expertise in large-scale regulatory genomics analyses and a team at INMED specialized in neurodevelopmental disorders related to genomic imprinting, in particular PWS. Multi-omics data are accumulating for many pathologies, yet integrating these layers of information remains a major challenge. This collaborative project proposes an innovative graph-based approach to investigate the regulation of imprinted genomic regions and to reveal and extract relevant biological information.

Expected profile

Candidate skills (essentials):

PhD in bioinformatics or computer science.

Demonstrated experience in -omics data analysis.

Experience with graph models.

Ability to work in a multidisciplinary research environment.

Communication skills in English.

Desirable skills:

Knowledge of genomic organisation, gene regulation and epigenetic regulation.

Good understanding of the experiments used to generate raw biological data.
