# Statistics and Data Science major

Master 1 courses

# 1st Semester

## Shared courses

**Organisation:** Lectures (12h)

**Lecturers:** Florence Hubert, Laurence Röder

**Evaluation:** project and continuous monitoring

Following the [PRO2] seminars, students will attend all the Centuri seminars of the semester. For two of them, they will be asked to broaden their knowledge of the subject and present an oral and a written synthesis.

**Organisation:** Lectures (6h), TD (6h), TD (6h)

**Lecturers:** Thomas Lecuit, Guillaume Voisine, Claudio Riviera

**Evaluation:** project and continuous monitoring

Choose between 3 courses:

- Developmental Biology
- Neurosciences
- Immunology

Developmental Biology:

Fundamentals of morphogenesis: molecular, cellular and biophysical basis of tissue forms in animals and plants. The mechanical and biochemical basis of morphogenesis will be addressed to understand the origin of cell and tissue organization. This course is an introduction to the modeling of the emergence of spatial and temporal patterns in morphogenesis and to the description and understanding of Turing instabilities.

Neurosciences:

This course covers the basic principles of neuroscience. The lectures place special emphasis on the mechanisms of synapse formation and plasticity, functional network maturation, the pathophysiology of the brain, and the role of glia-neuron interactions in network dynamics. They also introduce computational methods for the analysis and modelling of neurobiological data. During the TD, students will work on a project in which the knowledge acquired in mathematical and computational biology is used to solve specific problems in neuroscience.

Immunology:

The course will focus on different aspects of immunology as approached by physicists. In particular, the following topics will be studied: the discrimination of antigens by T cells, communication between cells through cytokines, and finally differentiation.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Florence Hubert, Laurence Röder

**Evaluation:** project

At the end of the [PRO1] and [BIO1] courses, students will choose a scientific article at the interface of several disciplines and work on it in groups. In a written report and an oral presentation, they will have to explain the biological context and the related basic concepts, explain the methods used to interpret the biological data, and summarize the results obtained in the article.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Pierre Pudlo

**Evaluation:** continuous exam and projects

This course is an introduction to inferential statistics. It will be illustrated with biological examples. The course consists of three parts:

- Multiple tests
- Classification
- Time series analysis

## Mathematics courses

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Guillaumette Chapuisat

**Evaluation:** final exam and projects

We will study optimization tools in finite and infinite dimension (optimal control theory), as well as their numerical implementation in Python. We will apply these tools to a chemotherapy optimization problem for an *in vitro* model of heterogeneous tumor growth: we will look for the best way to administer a chemotherapy so as to optimize its effect on a cancer cell culture that mixes chemotherapy-sensitive and chemotherapy-resistant cells. We will base our work on experiments conducted at the Faculty of Pharmacy of Timone.

### Signal and big data, compressed sensing

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Sébastien Darses, Frédéric Richard

**Evaluation:** continuous exam

The aim of this course is to introduce and exploit dimension reduction techniques based on random projections, with a focus on (but not limited to) large-scale signal acquisition applications. After a brief review of the problems and constraints raised by large data sets and of classical approaches to dimension reduction, the course will focus on random projection techniques. By the end, students must master the notions of signal sampling (classical regular sampling), methods for identifying parsimonious (sparse) representations from random projections, and the corresponding theoretical guarantees (null space property, restricted isometry property). They will also need to know how to apply these notions in other contexts, for example clustering. The outline of the course is the following:

- Difficulties due to high dimension
- Dimension reduction by random projections
- Acquisition of signals
- Applications to other contexts (CUR decomposition, regression, nearest-neighbor methods, ...)
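As a rough illustration of dimension reduction by random projections, the sketch below projects a few high-dimensional points with a Gaussian random matrix and checks that a pairwise distance is approximately preserved. It is not taken from the course material: the function names, the dimensions (500 reduced to 200) and the seeds are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random


def random_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a Gaussian random matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # Entries are N(0, 1/k), so squared norms are preserved in expectation.
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
              for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix]
            for p in points]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical data: 5 random points in dimension 500.
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(5)]

projected = random_projection(points, k=200)
ratio = distance(projected[0], projected[1]) / distance(points[0], points[1])
print(f"distance ratio after projection: {ratio:.3f}")
```

The ratio should be close to 1; how close depends on the target dimension `k`, which is the trade-off that guarantees of the Johnson-Lindenstrauss type make precise.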

### Big Data

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Pierre Pudlo

**Evaluation:** continuous exam

The purpose of this course is to present some fundamental data-processing algorithms associated with so-called "big data", based on resampling or random permutations of the data. By the end of the course, students must understand and know how to implement the bootstrap procedure, including in complicated settings: regression, survival analysis, and inference after model selection. They will also learn how to implement a multiple testing procedure and how to use permutation methods in this setting. The outline of the course will thus be:

- Introduction
- Bootstrap
- Selection of variables / models
- Survival analysis
- Multiple tests
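To give an idea of the bootstrap procedure in its simplest setting, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is only an illustration under stated assumptions: the function name `bootstrap_ci`, the simulated Gaussian sample and the number of replicates are hypothetical choices, not the course's own material.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical sample: 100 draws from a Gaussian with mean 10 and sd 2.
data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(100)]

low, high = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The same resampling loop works for other statistics by passing a different `stat` function, for example `statistics.median`. The more complicated settings listed above (regression, survival analysis) need resampling schemes adapted to the model.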

### Parsimonious representations of signals and images

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Frédérique Richard

**Evaluation:** continuous exam

A representation of a data vector is said to be parsimonious (sparse) if one can find a spanning family, or even a basis, in which the vector can be described or approximated by a linear combination of a small number of elements. In this course, we will begin by presenting the classical bases and transforms in which certain types of data are naturally parsimonious. We will show the usefulness of parsimonious decompositions for classical problems in signal processing (denoising, compression, ...). Finally, we will present and study several algorithms for computing such decompositions.

The outline of the course is thus the following:

- Classical bases and frames for parsimonious representation of signals
- Applications to concrete problems: denoising, compression, parsimonious regression
- Algorithms for the parsimonious representation of signals
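As a small illustration of denoising with a parsimonious representation, the sketch below applies soft thresholding to a noisy signal. It assumes, for simplicity, that the signal is already sparse in the canonical basis; in practice the thresholding would be applied to coefficients in a suitable basis (wavelets, Fourier, ...). The signal, noise level and threshold are hypothetical values chosen for the example.

```python
import random


def soft_threshold(coefficients, threshold):
    """Shrink each coefficient toward zero; those below the threshold become 0."""
    shrunk = []
    for c in coefficients:
        if c > threshold:
            shrunk.append(c - threshold)
        elif c < -threshold:
            shrunk.append(c + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


# Hypothetical sparse signal: 3 nonzero entries out of 64.
signal = [0.0] * 64
signal[5], signal[20], signal[41] = 4.0, -3.0, 5.0

# Add small Gaussian noise (standard deviation 0.2) to every entry.
noise_rng = random.Random(0)
noisy = [s + noise_rng.gauss(0.0, 0.2) for s in signal]

denoised = soft_threshold(noisy, threshold=1.0)
support = [i for i, c in enumerate(denoised) if c != 0.0]
print("indices kept after thresholding:", support)
```

Soft thresholding removes the small coefficients, which are dominated by noise, at the cost of a bias on the large ones. This trade-off is one reason to compare it with other sparse decomposition algorithms.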

### Stochastic algorithms

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Christophe Gomez, Erwan Hillion

**Evaluation:** continuous exam

The purpose of this course is to give an introduction to the main stochastic algorithms. A comparison with deterministic methods will be given. The following notions will be investigated:

- Simulation of random variables
- Variance reduction methods
- Stochastic algorithms with decreasing step sizes
- Markov Chain Monte Carlo
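To illustrate the last item, here is a minimal sketch of a random-walk Metropolis-Hastings sampler, one of the simplest Markov Chain Monte Carlo methods. The target (a standard Gaussian), the proposal step size and the function names are assumptions made for the example, not the course's own implementation.

```python
import math
import random
import statistics


def metropolis_hastings(log_density, x0, n_samples, step=2.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, exp(log_ratio)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x


samples = metropolis_hastings(log_standard_normal, x0=0.0, n_samples=20000)
print(f"sample mean: {statistics.mean(samples):.3f}")
print(f"sample variance: {statistics.variance(samples):.3f}")
```

Only the ratio of target densities enters the acceptance step, so the density needs to be known only up to a normalizing constant.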