# Statistics and Data Science major

Master 1 courses

# 1st Semester

## Shared courses

**Organisation:** TD (18h)

**Lecturers:** Julien Lefèvre

**Evaluation:** projects and oral presentation

This teaching unit presents the main professions involved in biological modelling, in two ways. First, seminars will be given by speakers from outside Aix-Marseille University, from both academia and industry. Second, students will be immersed in CENTURI laboratories, where they will discover multidisciplinary research topics. To conclude the unit, students will be asked to present a specific problem related to data processing and modelling. The scientific aspect must also be integrated into a reflection on the underlying professional issues, whether in the academic or private sector.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Sylvain Sené, Elisabeth Remy

**Evaluation:** projects and final written exam

This course is an introduction to the basics of finite dynamical systems and automata networks (definitions of local functions, global function/relation, automata, interaction graph, transition graph) as well as their main static and dynamic properties. Part of the lectures will also focus on the parallel update mode.
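As an illustration of the vocabulary above, here is a minimal sketch of a Boolean automata network updated in parallel. The three local functions are hypothetical and chosen only for the example; they are not taken from the course material.

```python
from itertools import product

# Hypothetical 3-automaton Boolean network (illustrative local functions only).
LOCAL_FUNCTIONS = (
    lambda x: x[2],            # f_0 copies automaton 2
    lambda x: x[0] and x[2],   # f_1 = x_0 AND x_2
    lambda x: not x[1],        # f_2 = NOT x_1
)


def parallel_step(state):
    """Global function: every local function reads the same current state."""
    return tuple(int(bool(f(state))) for f in LOCAL_FUNCTIONS)


def transition_graph():
    """Map each of the 2^n configurations to its successor (parallel update)."""
    n = len(LOCAL_FUNCTIONS)
    return {s: parallel_step(s) for s in product((0, 1), repeat=n)}


def fixed_points(graph):
    """Configurations that are their own successor."""
    return [s for s, t in graph.items() if s == t]
```

Because the update is deterministic, each configuration has exactly one outgoing arc in the transition graph, so every trajectory ends in a fixed point or a cycle.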

The lectures should provide students with the skills to implement modelling approaches (differential, logical, stochastic or deterministic equations) to develop mathematical models of a biological system; to analyze mathematical models and biological data in order to understand complex systems; to evaluate the adequacy between a biological question, available data and mathematical formalisms; and to interpret and validate a study.

**Organisation:** Lectures (18h)

**Lecturers:** Thomas Lecuit, Jacques van Helden, Michael Kopp, Laurence Röder, François Muscatelli

**Evaluation:** final written exam

This course is divided into two parts taught during the 1st and 2nd semesters of CMB. The first part of this module is a presentation of the evolutionary theories that have founded modern biology (from Lamarck to Darwin), and a synthesis of the discoveries that have led to current concepts of molecular and cellular biology: the role of macromolecules in cell function (information transfer between DNA, RNA, proteins, regulation, etc.), heredity and cellular adaptation.

Example topics covered during the lectures:

- Information, evolution causes for living organisms
- Cellular information
- Epigenetics: phenomena, information, adaptation, mechanisms

**Organisation:** Lectures (14h), TP (16h)

**Lecturers:** Laurent Pézard

**Evaluation:** projects and final written exam

Computational biology will introduce the biological concepts necessary to model complex systems, implement modelling approaches (differential, logical, stochastic or deterministic equations) to develop mathematical models of a biological system, analyze mathematical models and biological data to understand complex systems, and assess the adequacy between a biological question, available data and mathematical formalisms.

The course is divided into two sections:

- computational neuroscience: dynamic models of neuron function (dynamic behavioural simulation, biological aspects, computational complexity, analytical aspects)
- bioinformatics: alignment, molecular phylogeny, prediction and modelling of structural aspects of proteins, cis-regulation

**Organisation:** Lectures (12h), TD (12h), TP (12h)

**Lecturers:** Sylvain Sené

**Evaluation:** continuous assessment

This course will teach students how to link biological problems, available data and mathematical formalisms by tackling the important notions and concepts used in algorithms and programming. The course is divided into 3 sections:

- Unix: file system and basic shell commands, text utilities, redirections, pipe

- Programming language (Python): basic principles of imperative programming, control flow, basic data structures, local and external modules, functions

- Algorithms: arrays, sorting, lists, stacks, queues

## Mathematics courses

**Organisation:** Lectures (9h), TD (9h)

**Lecturers:** Olivier Guès

**Evaluation:** continuous monitoring and final written exam

The purpose of this course is to deepen some notions of functional analysis that can be encountered in the study of mathematical models of biological problems, for instance:

- Banach spaces. *The notion of norm on a vector space. Convergent sequences and the notion of Banach space. Fixed point theorem.*

- Examples in finite dimension. *Completeness. Notion of compactness and the link with closed and bounded subsets. Illustrations in biology: search for optima. Behavior of recurrent sequences. Fixed points of discrete dynamical systems. Stability/instability of a fixed point.*

- Examples in infinite dimension. Examples of classical functional spaces. *Spaces of continuous functions on a compact set, C(K, R^n), H^1([0,1]), spaces of periodic functions. Continuous linear operators.*

- *Examples of applications of the fixed point theorem. Illustrations in biology.*

- *Study of functional equations, integral equations. Examples from biology.*

- *Study of density and approximation properties (Dirichlet theorem, Gibbs phenomenon, theorems of Fejér and Weierstrass).*

**Organisation:** Lectures (9h), TD (9h)

**Lecturers:** Christophe Gomez

**Evaluation:** continuous monitoring and final written exam

The purpose of this course is to revise and deepen the fundamental concepts in probability. More precisely, the following concepts will be discussed:

- Dependence, distribution, expectation, conditional density

- Generating functions

- Markovian models

- Branching, Poisson, birth and death processes.

Lectures will provide a summary of basic concepts, which will be applied in practical courses.

**Organisation:** Lectures (12h), TD (12h), TP (12h)

**Lecturers:** Florence Hubert

**Evaluation:** projects and final written exam

This course has two purposes. The first is to teach linear algebra tools that can be useful in the study of biological systems, for instance:

- Iterative methods, rate of convergence of vector sequences
- Matrix reduction, power method and Perron-Frobenius theorem
- Linear regression, analysis of variance, Principal Component Analysis, Singular Value Decomposition
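To make the power method concrete, the following is a minimal pure-Python sketch, assuming a small square matrix with a single dominant eigenvalue. The function name, default parameters and example matrix are illustrative assumptions, not course material.

```python
import math


def mat_vec(matrix, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def power_method(matrix, iterations=500, tol=1e-13):
    """Estimate the dominant eigenvalue and a unit eigenvector by repeated
    multiplication and normalisation (assumes one dominant eigenvalue)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v.Av (v has unit norm) as the eigenvalue estimate.
        new_eigenvalue = sum(x * y for x, y in zip(v, mat_vec(matrix, v)))
        if abs(new_eigenvalue - eigenvalue) < tol:
            eigenvalue = new_eigenvalue
            break
        eigenvalue = new_eigenvalue
    return eigenvalue, v


# Illustrative positive matrix; its eigenvalues are (5 +/- sqrt(5)) / 2.
EXAMPLE_MATRIX = [[2.0, 1.0], [1.0, 3.0]]
```

The rate of convergence is governed by the ratio of the second largest to the largest eigenvalue in modulus. For a matrix with positive entries, the Perron-Frobenius theorem guarantees that the dominant eigenvalue is real, simple and associated with a positive eigenvector, which is what makes the iteration well behaved here.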

The second purpose of the course is to review and further study the notion and properties of differential equations and systems of differential equations which underlie the main continuous models used in biology (dynamics of populations or cells, biochemical processes, etc.). We will address both qualitative (existence, global existence, equilibria, stability of the equilibria, long-time behavior) and quantitative (positivity, parameter dependency) properties of the considered models.

In parallel with this theoretical study, numerical approximation will be studied and implemented during the computer sessions. Practicals will use specialised Python libraries such as scipy.integrate to visualise trajectories and system behaviour.
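As a rough idea of what such a numerical approximation looks like, here is a sketch that integrates a logistic growth model with a hand-written explicit Euler scheme and compares it with the closed-form solution. The model, the parameter values `r` and `k`, and the function names are assumptions made for the example. In the practicals a library solver such as `scipy.integrate.solve_ivp` would typically replace the hand-written loop.

```python
import math


def logistic_rhs(t, n, r=1.0, k=10.0):
    """Right-hand side of dn/dt = r * n * (1 - n / k) (illustrative parameters)."""
    return r * n * (1.0 - n / k)


def euler(rhs, n0, t_end, steps):
    """Explicit Euler scheme: n_{i+1} = n_i + dt * rhs(t_i, n_i)."""
    dt = t_end / steps
    t, n = 0.0, n0
    trajectory = [(t, n)]
    for _ in range(steps):
        n += dt * rhs(t, n)
        t += dt
        trajectory.append((t, n))
    return trajectory


def logistic_exact(t, n0, r=1.0, k=10.0):
    """Closed-form solution of the logistic equation, used as a reference."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def final_error(steps, n0=1.0, t_end=5.0):
    """Absolute error of the Euler approximation at the final time."""
    _, n_end = euler(logistic_rhs, n0, t_end, steps)[-1]
    return abs(n_end - logistic_exact(t_end, n0))
```

Calling `final_error` with increasing values of `steps` shows the error shrinking as the time step decreases, which is the first-order convergence expected of the explicit Euler scheme.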

# 2nd Semester

## Shared courses

**Organisation:** Lectures (12h)

**Lecturers:** Bianca Habermann, Laurent Tichit

**Evaluation:** project and continuous monitoring

Scientific seminars constitute a good way to broaden your scientific horizon. In this regard, MSc students will frequently attend CENTURI seminars. At the end of the semester, students will be asked to write a summary of two seminars they have attended.

The students will learn to work in an interdisciplinary group, to deepen a subject and to communicate on it.

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** François Muscatelli, Thomas Lecuit, Dominique Payet, Guillaume Voisinne, Valery Matarazzo, Julie Koenig

**Evaluation:** final written exam

The second part of this module will show how these molecular mechanisms underlie the development and functioning of tissues and organisms. It will be structured around four areas: intergenerational transmission of traits; organism development; immune system and nervous system.

Example topics covered during the lectures:

- Information and organization: intergenerational transmission (cells, organisms)
- Organisms’ development
- Information and organization of the immune system
- Information and organization of the nervous system

**Organisation:** Lectures (12h), TD (12h), TP (12h)

**Lecturers:** Jacques van Helden, Victor Chepoi, Kolja Knauer

**Evaluation:** projects and final written exam

*This course is divided into two parts.*

**Statistics for biology**

Statistics for biology aims to give students a practical approach to the analysis of biological data with R, based on the concepts acquired in the course “Probabilities and statistics for modelling 1”. The associated mathematical foundations will be developed in the course “Advanced statistics”.

- Sampling and estimation (moments, robust estimators, confidence intervals)
- Fitting
- Some additional distributions
- Hypothesis testing (mean comparison, goodness of fit, …)

**Graph theory and algorithms 1**

This introductory course focuses on graphs as mathematical objects and some of their uses in applications to biological networks. After introducing different classes of graphs and their properties, the following points will be developed:

- Planar graphs, graphs on a surface, Euler characteristic

- Interval graphs, perfect graphs

**Organisation:** Project

**Evaluation:** project

After the courses Professional perspectives for biological systems modelling and Fundamentals of biology 1, students will choose a scientific article at the interface of several disciplines, on which they will work in groups. In a written report and an oral presentation, they will have to explain the biological context and the related basic concepts, explain the methods used to interpret the biological data, and synthesize the results obtained in the article. Through this project, students will learn to work in an interdisciplinary group, to explore a subject in depth and to communicate about it.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Jean-Marc Freiermuth

**Evaluation:** projects and final written exam

This course will tackle advanced notions in statistics such as:

- Statistical inference (fundamental concepts, estimators, confidence intervals and tests, quadratic error, bias and variance)

- Likelihood (Fisher information, likelihood ratio test)

- Exponential family

- Convergence

- Multivariate Gaussian distributions

The R software will be used in the practicals.

## Mathematics courses

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Julien Olivier

**Evaluation:** project and final written exam

**Prerequisites:**

- Continuous dynamical systems, linear algebra and modelling
- Functional analysis

This course is an introduction to Fourier and Hilbertian analysis with applications in biology.

- Fourier series
  - Reminder of the classical theorems: Dirichlet, Fejér, Plancherel, Parseval
  - Regularity and decay of Fourier coefficients
  - Fast Fourier Transform
  - Application to the representation of biological signals, for example in neuroscience

- Fourier transform
  - Reminders of the main results
  - First examples of applications to solving diffusion equations on an open domain arising in biology

- Hilbert spaces
  - Hilbertian bases, examples
  - Galerkin approximation
  - Application to solving a diffusion equation on a bounded domain arising in biology

**OPTION 1: Markov chains and martingales**

**Organisation:** Lectures (15h), TD (15h)

**Lecturers:** Glenn Merlet

**Evaluation:** continuous monitoring and final written exam

**Prerequisites:**

- Probability

In this course, we introduce the notion of a Markov chain, the classification of its states (recurrent, transient, positive, null, periodic), and its stationary distribution. We then introduce the notions of martingale, submartingale and supermartingale. The outline is as follows:

- Definition of a Markov chain
- Classification of the states of a Markov chain
- Stationary distribution and ergodicity of a Markov chain
- Definition of a martingale, a submartingale, and a supermartingale

Numerical approximation will also be covered through finite difference schemes, around the concepts of convergence, order, stability and consistency, with practical sessions (TP) illustrating both the qualitative properties of the equations and the concepts related to numerical approximation.

**OPTION 2: Decision making statistics**

**Organisation:** Lectures (15h), TD (15h)

**Lecturers:** Mohammed Bouhatar

**Evaluation:** continuous monitoring and final written exam

**Prerequisites:**

- Advanced statistics

In this course, we introduce some basic notions in probability and statistics related to statistical estimation. More precisely, we will study:

- Hypothesis tests
- Bayesian Inference
- Statistical decision
- Binary logistic regression

**OPTION 3: Time series**

**Organisation:** Lectures (10h), TD (6h), TP (8h)

**Lecturers:** Mohammed Bouhatar

**Evaluation:** continuous monitoring and final written exam

This course introduces some long-term forecasting methods based on the decomposition of a time series (seasonal averages method, moving averages method, ...). We will then introduce short-term forecasting methods based on exponential smoothing and ARMA models. The following notions will be investigated:

- Decomposition into trend and seasonal component
- Exponential smoothing (single, double and Holt-Winters)
- Second-order stationary model (ARMA)
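To give a flavour of the simplest of these methods, here is a minimal sketch of single exponential smoothing, assuming the level is initialised with the first observation (one common convention among several). The function name and the choice of `alpha` are illustrative.

```python
def simple_exponential_smoothing(series, alpha):
    """Single exponential smoothing:
    level_t = alpha * x_t + (1 - alpha) * level_{t-1},
    with the level initialised to the first observation (an assumption)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    smoothed = [level]
    for x in series[1:]:
        level = alpha * x + (1.0 - alpha) * level
        smoothed.append(level)
    return smoothed


def one_step_forecast(series, alpha):
    """The short-term forecast is the last smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]
```

A value of `alpha` close to 1 follows the most recent observations closely, while a small `alpha` gives more weight to the past. Double smoothing and Holt-Winters extend the same recursion with trend and seasonal components.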

**OPTION 4: Discrete time signals, deterministic and probabilistic models**

**Organisation:** Lectures (15h), TP (15h)

**Lecturers:** Sandrine Anthoine, Caroline Chaux

**Evaluation:** continuous monitoring and final written exam

This course is an introduction to the notion of digital signal and to the classical tools for its analysis (1D and 2D). The notions of discrete Fourier transform, digital filtering and random signal analysis will be studied and illustrated with concrete examples from signal and image processing. The outline of the course is as follows:

- Deterministic digital signals in finite and infinite dimensions: series, filtering, convolution, discrete Fourier transform, subsampling
- Hilbert bases in discrete signal spaces, discrete wavelets, approximation
- Random digital signals: generalities, stationarity, notion of spectrum, filtering; Karhunen-Loève basis
- Application to some signal processing problems (e.g. Wiener filtering, noise reduction, optimal detection)