# Statistics and Data Science major

Master 1 courses

# 1st Semester

## Shared courses

**Organisation:** TD (18h)

**Lecturers:** Julien Lefèvre

**Evaluation:** projects and oral presentation

The goal of this course is to introduce students to multidisciplinary research topics through seminars and visits to research laboratories and private sector companies.

The teaching unit presents the main professions involved in biological modelling, in two ways. First, seminars will be given by academic and industrial speakers from outside Aix-Marseille University. Second, students will be immersed in Centuri laboratories, where they will discover multidisciplinary research topics. To conclude the unit, students will be asked to present a specific problem related to data processing and modelling. The scientific content must also be set within a reflection on the underlying professional issues, whether in the academic or private sector.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Brigitte Mossé, Elisabeth Remy

**Evaluation:** projects and final written exam

This teaching unit (UE) presents finite Boolean dynamical systems, mathematical tools that are increasingly used to model biological regulatory networks.

Prerequisites in logic, set theory, graph theory and finite Boolean dynamical systems are provided, as well as definitions of the different possible associated dynamics, the corresponding regulatory graphs and logical formulae. The role of feedback circuits in the dynamics is especially emphasised.

Applications of these tools to the modelling of biological networks are presented, mainly in the context of diseases. The practical lessons (TP) use GINsim, free software dedicated to the logical modelling of regulatory networks.
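As an illustration of the kind of object studied here, the following is a minimal sketch in plain Python (not GINsim) of a finite Boolean dynamical system under synchronous updating. The three-gene network and its rules are hypothetical and chosen only because they form a negative feedback circuit; the function names are likewise illustrative.

```python
from itertools import product

# Hypothetical 3-gene network, invented for illustration only.
# Each rule maps the current state (a, b, c) to the next value of one gene.
RULES = (
    lambda a, b, c: not c,  # A is repressed by C
    lambda a, b, c: a,      # B is activated by A
    lambda a, b, c: b,      # C is activated by B
)


def step(state):
    """Synchronous update: every gene is updated at the same time."""
    return tuple(int(rule(*state)) for rule in RULES)


def attractor(state):
    """Iterate from `state` until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


def all_attractors():
    """Collect the distinct attractors reached from every initial state."""
    found = set()
    for init in product((0, 1), repeat=len(RULES)):
        cycle = attractor(init)
        # Rotate each cycle to start at its smallest state so duplicates merge.
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found
```

With these assumed rules, `all_attractors()` finds two cyclic attractors and no fixed point, which is consistent with the oscillatory behaviour expected from an isolated negative feedback circuit.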

**Organisation:** Lectures (18h)

**Lecturers:** Laurence Röder, François Muscatelli

**Evaluation:** final written exam

This unit is the first part of the Fundamentals of biology course. In this first part, we will give a general presentation of the role of macromolecules in cell function, particularly in the regulation of gene expression, epigenetic inheritance and speciation as a source of biological diversity during evolution.

**Organisation:** Lectures (14h), TP (16h)

**Lecturers:** Laurent Pézard, Michael Kopp, Andreas Zanzoni

**Evaluation:** projects and final written exam

Computational biology introduces the biological concepts needed to model complex systems. Students learn to implement modelling approaches (differential, logical, stochastic or deterministic equations) to develop mathematical models of a biological system, to analyse mathematical models and biological data in order to understand complex systems, and to assess the adequacy between a biological question and the model used to address it.

The course is divided into three sections:

- Computational neuroscience: dynamic models of neuron function, dynamic behavioural simulation, biological aspects, computational complexity, analytical aspects
- Bioinformatics: alignment, molecular phylogeny, prediction and modelling of structural aspects of proteins, cis-regulation
- Evolution and population dynamics.

**Organisation:** Lectures (12h), TD (12h), TP (12h)

**Lecturers:** Julien Lefèvre

**Evaluation:** continuous exams

This course reviews the general basics of programming, implemented here in Python.

It also provides the essential elements of modern programming practice in Python: use of an IDE, version control, use of existing modules, and good coding practice. Part of the course is devoted to classic algorithms for sorting and manipulating common data structures, as well as algorithms dedicated to bioinformatics.

## Mathematics courses

**Organisation:** Lectures (9h), TD (9h)

**Lecturers:** Olivier Guès

**Evaluation:** continuous monitoring and final written exam

The purpose of this course is to deepen some notions of functional analysis that can be encountered in the study of mathematical models of biological problems. For instance:

- Banach spaces. The notion of a norm on a vector space. Convergent sequences and the notion of a Banach space. Fixed-point theorem.
- Examples in finite dimension. Completeness. Notion of compactness and the link with closed and bounded subsets. Illustrations in biology: optimisation problems. Behaviour of recursive sequences. Fixed points of discrete dynamical systems. Stability/instability of a fixed point.
- Examples in infinite dimension. Examples of classical function spaces: spaces of continuous functions on a compact set, $C(K,\mathbb{R}^{n})$, $H^{1}([0,1])$, spaces of periodic functions. Continuous linear operators.
- Examples of applications of the fixed-point theorem. Illustrations in biology.
- Study of functional equations and integral equations. Examples from biology.
- Study of density and approximation properties (Dirichlet theorem, Gibbs phenomenon, theorems of Fejér and Weierstrass).

**Organisation:** Lectures (9h), TD (9h)

**Lecturers:** Christophe Gomez

**Evaluation:** continuous monitoring and final written exam

The purpose of this course is to revise and deepen the fundamental concepts in probability. More precisely, the following concepts will be discussed:

- Dependence, distribution (law), expectation, conditional density
- Generating functions
- Markovian models
- Branching, Poisson, birth and death processes.

**Organisation:** Lectures (12h), TD (12h), TP (12h)

**Lecturers:** Florence Hubert

**Evaluation:** projects and final written exam

The purpose of this course is twofold. The first part of the course concerns linear algebra tools that are useful in the study of biological systems, for instance:

- Iterative methods, rate of convergence of vector sequences
- Matrix reduction, power method and Perron-Frobenius theorem
- Linear regression, analysis of variance, Principal Component Analysis, Singular Value Decomposition

The second part of the course will be devoted to revision and deepening of the notion and properties of differential equations and systems of differential equations which underlie the main continuous models used in biology (dynamics of populations or cells, biochemical processes, etc.). We will address both qualitative (existence, global existence, equilibria, stability of the equilibria, long-time behavior) and quantitative (positivity, parameter dependency) properties of the considered models.

In parallel with this theoretical study, numerical approximation will be studied and implemented during the computer sessions. Practicals will use specialised Python libraries such as scipy.integrate to visualise trajectories and system behaviours.
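As a rough idea of what such a practical might look like, the sketch below integrates a logistic growth model with `scipy.integrate.solve_ivp` and compares the numerical trajectory with the known closed-form solution. The model, the parameter values and the variable names are assumptions made for illustration; they are not taken from the course material.

```python
import numpy as np
from scipy.integrate import solve_ivp


def logistic(t, n, r, k):
    """Right-hand side of the logistic equation dN/dt = r N (1 - N/K)."""
    return r * n * (1.0 - n / k)


# Illustrative values: growth rate, carrying capacity, initial population.
r, k, n0 = 0.8, 100.0, 5.0
t_eval = np.linspace(0.0, 20.0, 201)

sol = solve_ivp(
    logistic, (0.0, 20.0), [n0],
    args=(r, k), t_eval=t_eval, rtol=1e-8, atol=1e-10,
)

# Closed-form solution of the logistic equation, used as a reference.
exact = k / (1.0 + (k / n0 - 1.0) * np.exp(-r * t_eval))
max_error = float(np.max(np.abs(sol.y[0] - exact)))
```

The arrays `sol.t` and `sol.y[0]` can then be passed to a plotting library to visualise the trajectory, which approaches the stable equilibrium `k` for large times.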

# 2nd Semester

## Shared courses

**Organisation:** Lectures (12h)

**Lecturers:** Bianca Habermann, Elisabeth Remy, Albert Michelot

**Evaluation:** project and continuous monitoring

Scientific seminars constitute a good way to broaden your scientific horizon. In this regard, MSc students will frequently attend CENTURI seminars. At the end of the semester, students will be asked to write a summary of two seminars they have attended.

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** François Muscatelli, Laurence Röder

**Evaluation:** final written exam

This course is a continuation of Fundamentals of biology 1. It will show how molecular mechanisms provide the information required for the development and function of tissues and organisms. Instruction is structured around four areas:

- Organismal development
- Immune system
- Nervous system
- Intergenerational transmission of traits

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Annie Broglio

**Evaluation:** projects and final written exam

This course gives students a practical approach to the analysis of biological data with R, based on the concepts acquired in the course “Probabilities and statistics for modelling 1”. The associated mathematical foundations will be developed in the course “Advanced statistics”. The following notions will be investigated:

- Sampling and estimation (moments, robust estimators, confidence intervals)
- Fitting
- Additional distributions
- Hypothesis testing (mean comparison, goodness of fit, …)

The course will be based on the analysis of biological datasets with the R programming language.

**Lecturers:** Florence Hubert, Laurence Röder

**Evaluation:** project

Following the module [PROJ1], students will do a short internship in a laboratory. They will have to propose a modelling or data processing problem at the math-info-bio interface, and will be asked to synthesize their results in a dissertation and an oral presentation.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Jean-Marc Freiermuth

**Evaluation:** projects and final written exam

This course will tackle advanced notions in statistics such as:

- Statistical inference (fundamental concepts, estimators, confidence intervals and tests, quadratic error, bias and variance)
- Likelihood (Fisher information, likelihood ratio test)
- Exponential family
- Convergence
- Multivariate Gaussian distributions

Prerequisites for CMB-B:

- [STAT1] Probabilities and statistics for modelling 1
- [STAT2] Statistics for biology

The R software will be used in the practicals.

**Organisation:** Lectures (9h), TD (9h)

**Lecturers:** Victor Chepoy

**Evaluation:** to be announced

This specialization course deepens students' knowledge of graph problems and their solutions: counting and enumeration (number of spanning trees, number of matchings of a planar graph); flow algorithms; problems of connectivity, matching, assignment and transport.

## Mathematics courses

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Julien Olivier

**Evaluation:** project and final written exam

**Prerequisites:**

- Continuous dynamical systems, linear algebra and modelling
- Functional analysis

This course is an introduction to Fourier and Hilbertian analysis with applications in biology.

- Fourier series
  - Review of the classical theorems: Dirichlet, Fejér, Plancherel, Parseval
  - Regularity and decay of Fourier coefficients
  - Fast Fourier Transform
  - Application to the representation of biological signals, for example in neuroscience
- Fourier transform
  - Review of the main results
  - First examples of applications to solving diffusion equations on an open domain arising in biology
- Hilbert spaces
  - Hilbert bases, examples
  - Galerkin approximation
  - Application to solving a diffusion equation on a bounded domain arising in biology

**OPTION 1: Markov Chains and martingales**

**Organisation:** Lectures (15h), TD (15h)

**Lecturers:** Glenn Merlet

**Evaluation:** continuous monitoring and final written exam

**Prerequisites:**

- Probability

In this course, we introduce the notion of a Markov chain, the classification of its states (recurrent, transient, positive, null, periodic), and its stationary distribution. We then introduce the notions of martingale, submartingale and supermartingale. The precise outline is the following:

- Definition of a Markov chain
- Classification of the states of a Markov chain
- Stationary law and ergodicity of a Markov chain
- Definition of a martingale, a submartingale, and a supermartingale
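To make the notion of a stationary law concrete, here is a minimal sketch in plain Python that approximates it by repeatedly applying the transition matrix to an initial distribution. The three-state matrix `P` is hypothetical, and the iteration is only expected to converge for an irreducible, aperiodic chain such as this one.

```python
# Hypothetical 3-state transition matrix (each row sums to 1).
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]


def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Iterate dist <- dist * P from the uniform law until it stops changing."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(max_iter):
        new = step(dist, P)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    raise RuntimeError("no convergence; the chain may be periodic or reducible")


pi = stationary(P)
```

For this assumed matrix the result is close to (0.25, 0.5, 0.25), which can be checked by hand from the balance equations, and applying `step` to `pi` leaves it essentially unchanged.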

The question of numerical approximation will also be developed through finite difference schemes, around the concepts of convergence, order, stability and consistency, with practical sessions (TP) illustrating both the qualitative properties of the equations and the concepts related to numerical approximation.

**OPTION 2: Decision making statistics**

**Organisation:** Lectures (15h), TD (15h)

**Lecturers:** Mohammed Bouhatar

**Evaluation:** continuous monitoring and final written exam

**Prerequisites:**

- Advanced statistics

In this course, we introduce some basic notions in probability and statistics related to statistical estimation. More precisely, we will study:

- Hypothesis tests
- Bayesian Inference
- Statistical decision
- Binary logistic regression

**OPTION 3: Time series**

**Organisation:** Lectures (10h), TD (6h), TP (8h)

**Lecturers:** Mohammed Bouhatar

**Evaluation:** continuous monitoring and final written exam

This course introduces long-term forecasting methods based on the decomposition of a time series (seasonal averages method, moving averages method, etc.). We will then introduce short-term forecasting methods based on exponential smoothing and ARMA models. The following notions will be investigated:

- Decomposition into trend and seasonal component
- Exponential smoothing (single, double and Holt-Winters)
- Second-order stationary model (ARMA)
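As a small illustration of the simplest of these methods, the sketch below implements single exponential smoothing in plain Python. The data are made up, the function name is illustrative, and initialising the level with the first observation is only one common convention among several.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first level is set to the first observation (an assumed convention).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    levels = [float(series[0])]
    for x in series[1:]:
        levels.append(alpha * x + (1.0 - alpha) * levels[-1])
    return levels


data = [10.0, 12.0, 11.0, 13.0]  # made-up observations
levels = simple_exponential_smoothing(data, alpha=0.5)
forecast = levels[-1]  # one-step-ahead forecast is the last smoothed level
```

A smaller `alpha` gives more weight to the past and a smoother curve, while `alpha = 1` simply reproduces the observations.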

**OPTION 4: Discrete time signals, deterministic and probabilistic models**

**Organisation:** Lectures (15h), TP (15h)

**Lecturers:** Sandrine Anthoine, Caroline Chaux

**Evaluation:** continuous monitoring and final written exam

This course is an introduction to the notion of a digital signal and to the classical tools for analysing such signals (1D and 2D). The notions of discrete Fourier transform, digital filtering and random signal analysis will be studied and illustrated with concrete examples in signal and image processing. The outline of the course is the following:

- Deterministic digital signals in finite and infinite dimensions: series, filtering, convolution, discrete Fourier transform, subsampling
- Hilbert bases in discrete signal spaces, discrete wavelets, approximation
- Random digital signals: generalities, stationarity, notion of spectrum, filtering; Karhunen-Loève basis
- Application to some signal processing problems (e.g. Wiener filtering, noise reduction, optimal detection)
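The link between filtering, convolution and the discrete Fourier transform listed above can be sketched in a few lines of plain Python. The signal and the two-point averaging filter are made-up examples, and the naive O(n²) transform is written out for readability rather than efficiency.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a finite sequence."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def idft(X):
    """Inverse discrete Fourier transform (with the 1/n normalisation)."""
    n = len(X)
    return [
        sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]


def circular_convolution(x, h):
    """Circular convolution of two sequences of equal length."""
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]   # made-up signal
kernel = [0.5, 0.5, 0.0, 0.0]   # two-point moving average filter

direct = circular_convolution(signal, kernel)
product_spectrum = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_dft = [v.real for v in idft(product_spectrum)]
```

Both `direct` and `via_dft` hold the same filtered signal up to rounding error, which illustrates that circular convolution in time corresponds to pointwise multiplication of the discrete Fourier transforms.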