PDP2022-01

Combined deep learning and synthetic-based approaches to unravel the genetic determinants of enhancer versus promoter activity of Epromoters

Host laboratory and collaborators

 

Salvatore Spicuglia / TAGC / salvatore.spicuglia@inserm.fr

Badih Ghattas / I2M / badih.ghattas@univ-amu.fr

Aitor González / TAGC / aitor.gonzalez@univ-amu.fr

Abstract

Gene transcription is regulated by proximal (promoters) and distal (enhancers) regulatory elements. However, this strict dichotomy is now being challenged, and a major question in the field is how to define the genetic determinants of these different regulatory activities. The Spicuglia team previously identified Epromoters as cis-regulatory elements with both enhancer and promoter (E/P) activities and is currently using high-throughput approaches to evaluate both activities in thousands of wild-type and mutant DNA sequences. In this project, we will build a sequence-based deep learning model of Epromoters to unravel the genetic determinants of enhancer versus promoter activity. The model will be challenged and refined through iterative exchanges between model predictions, experimental validation and the synthetic generation of Epromoters. S. Spicuglia will lead the overall project and supervise the experimental work that generates the model input data and validates the predictions. B. Ghattas will supervise the design and validation of the deep learning models that predict E/P activities. A. González will supervise data processing, integration and analysis.

Keywords

Cis-regulatory elements, genetic variants, machine learning, synthetic biology

Objectives

1) To process NGS data from the dual reporter assay. 2) To create a deep learning model that predicts E/P activities from DNA sequence. 3) To design de novo synthetic regulatory sequences with defined E/P activities. 4) To experimentally evaluate the E/P activities of the synthetic sequences. 5) To analyse the model predictions in order to infer the sequence logic of E/P activities and to assess the impact of natural genetic variants.

Expected profile

The postdoctoral candidate should have a PhD in bioinformatics or a related field, with a solid background in computer science, statistics and/or mathematics. The candidate should be interested in “omics” data analysis, genomics and gene regulation. Proven experience in handling NGS data and/or deep learning, and in collaborating with experimental biologists, would be an advantage.

Is this project the continuation of an existing project or an entirely new one?

This is a new CENTURI project. The TAGC has the necessary funding and human resources to cover the experimental part of the project (ANR, H2020 ITN).

2 to 5 references related to the project

• Bernardo P. de Almeida, Franziska Reiter, Michaela Pagani, Alexander Stark. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of enhancers. bioRxiv (2021). doi:10.1101/2021.10.05.463203
• W Kopp, R Monti, A Tamburrini, U Ohler, A Akalin. Deep learning for genomics using Janggu. Nat Commun. 2020 Jul 13;11(1):3488. doi: 10.1038/s41467-020-17155-y.
• Andersson, R., Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat Rev Genet 21, 71–87 (2020). https://doi.org/10.1038/s41576-019-0173-8
• Medina-Rivera A, Santiago-Algarra D, Puthier D, Spicuglia S. (2018) Widespread enhancer activity from core promoters. Trends Biochem Sci. 43(6):452-468.
• Core, L., Martins, A., Danko, C. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46, 1311–1320 (2014). https://doi.org/10.1038/ng.3142

3 main publications from each PI over the last 5 years

Salvatore Spicuglia

• Santiago-Algarra D, Souaid C, Singh H, Dao LTM, Hussain S, Medina-Rivera A, Ramirez-Navarro L, Castro-Mondragon JA, Sadouni N, Charbonnier G, Spicuglia S. (2021) Epromoters function as a hub to recruit key transcription factors required for the inflammatory response. Nat Commun. Nov 18;12(1):6660. doi: 10.1038/s41467-021-26861-0.
• Belhocine M, Simonin M, Abad Flores JD, Cieslak A, Manosalva I, Pradel L, Smith C, Mathieu EL, Charbonnier G, Martens JHA, Stunnenberg HG, Maqbool MA, Mikulasova A, Russell LJ, Rico D, Puthier D, Ferrier P, Asnafi V, Spicuglia S. (2021). Dynamic of broad H3K4me3 domains uncover an epigenetic switch between cell identity and cancer-related genes. Genome Research. Jun 23. Advance online publication. doi:10.1101/gr.266924.120
• Dao LTM, Galindo-Albarrán AO, Castro-Mondragon JA, Andrieu-Soler C, Medina-Rivera A, Souaid C, Charbonnier G, Griffon A, Vanhille L, Stephen T, Alomairi J, Martin D, Torres M, Fernandez N, Soler E, van Helden J, Puthier D, Spicuglia S (2017). Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat Genet. 49(7):1073-1081. PMID: 28581502.

 

Badih Ghattas

• Fournel J, Bartoli A, Bendahan D, Guye M, Bernard M, Rauseo E, Khanji MY, Petersen SE, Jacquier A, Ghattas B. Medical image segmentation automatic quality control: A multi-dimensional approach. Med Image Anal. 2021 Dec;74:102213. doi: 10.1016/j.media.2021.102213. Epub 2021 Aug 12.
• Bartoli A, Fournel J, Bentatou Z, Habib G, Lalande A, Bernard M, Boussel L, Pontana F, Dacher JN, Ghattas B, Jacquier A. Deep Learning-based Automated Segmentation of Left Ventricular Trabeculations and Myocardium on Cardiac MR Images: A Feasibility Study. Radiol Artif Intell. 2020 Nov 25;3(1):e200021. doi:10.1148/ryai.2020200021.
• Jaotombo F, Pauly V, Auquier P, Orleans V, Boucekine M, Fond G, Ghattas B, Boyer L. Machine-learning prediction of unplanned 30-day rehospitalization using the French hospital medico-administrative database. Medicine (Baltimore). 2020 Dec 4;99(49):e22361. doi: 10.1097/MD.0000000000022361.

 

Aitor González

• Anthony Baptista, Aitor Gonzalez, Anaïs Baudot. Universal Multilayer Network Exploration by Random Walk with Restart. 2021. arXiv:2107.04565v1
• Aitor Gonzalez*, Vincent Dubut, Emmanuel Corse, Reda Mekdad, Thomas Dechatre, et al. VTAM: A robust pipeline for validating metabarcoding data using internal controls. 2021. bioRxiv. doi:10.1101/2020.11.06.371187.
• Aitor Gonzalez*, Marie Artufel, Pascal Rihet. TAGOOS: genome-wide supervised learning of non-coding loci associated to complex phenotypes. Nucleic Acids Research, doi:10.1093/nar/gkz320.
*Corresponding author