# Statistics and Data Science major

Master 1 courses

# 1st Semester

## Shared courses

**Organisation:** Lectures (12h)

**Lecturers:** Bianca Habermann, Laurent Tichit

**Evaluation:** project and continuous assessment

Scientific seminars are a good way to broaden your scientific horizons. MSc students will therefore regularly attend CENTURI seminars. At the end of the semester, students will be asked to write a summary of two of the seminars they attended.

Students will learn to work in an interdisciplinary group, to explore a subject in depth and to communicate about it.

**Organisation:** Lectures (10h), TD (8h), TP (12h)

**Lecturers:** Claudio Rivera Baeza

**Evaluation:** final exam, projects and oral presentation

This course combines the study of research articles, tutorial sessions (TD) to assimilate the concepts, and practical work (TP) to apply them to concrete cases.

The aim of this course is to present the analysis of interaction networks in biology. The lectures and tutorials (TD) cover the construction and interpretation of large-scale static networks, such as protein-protein interaction networks, as well as dynamic networks, modelled with various mathematical formalisms that make it possible to simulate processes such as gene regulation.
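Boolean networks are one of the formalisms commonly used to model gene regulation dynamically. The sketch below is only an illustration, not course material: it assumes a hypothetical three-gene circuit (A activates B, B activates C, C represses A), updates all genes synchronously, and iterates until the trajectory falls into an attractor. The names `RULES`, `step` and `find_attractor` are illustrative.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical three-gene circuit: A activates B, B activates C, C represses A.
GENES = ("A", "B", "C")
RULES: Dict[str, Callable[[Dict[str, int]], int]] = {
    "A": lambda s: 1 - s["C"],  # C represses A
    "B": lambda s: s["A"],      # A activates B
    "C": lambda s: s["B"],      # B activates C
}


def step(state: State) -> State:
    """Synchronous update: every gene is recomputed from the previous state."""
    current = dict(zip(GENES, state))
    return tuple(RULES[g](current) for g in GENES)


def find_attractor(start: State) -> List[State]:
    """Iterate until a state repeats, then return the cycle (attractor) reached."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    return trajectory[seen[state]:]


if __name__ == "__main__":
    for start in [(1, 0, 0), (1, 0, 1)]:
        print(start, "->", find_attractor(start))
```

With synchronous updates this toy circuit has two cyclic attractors, of lengths 6 and 2, which together cover all 8 possible states.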

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Evaluation:** project and continuous assessment

Students choose two of the following three optional courses:

- Developmental Biology
- Immunology
- Neurobiology

**Organisation:** Project during the semester

**Evaluation:** project

Internship in a research laboratory that carries out interdisciplinary research.

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Pierre Pudlo

**Evaluation:** continuous assessment and projects

This course focuses on inferential statistics.

The lectures provide a summary of the basic concepts, which are then applied in exercises (TD), practical work on computers (TP) and personal assignments (an R programming project).
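As a reminder of two basic inferential tools, here is a minimal Python sketch (the course itself uses R) of a confidence interval for a mean and a two-sided test of H0: mu = mu0. It assumes a large sample and uses the normal approximation; for small samples a Student t quantile would be used instead, which the Python standard library does not provide. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def mean_confidence_interval(x: List[float], level: float = 0.95) -> Tuple[float, float]:
    """Large-sample confidence interval for the mean (normal approximation)."""
    se = stdev(x) / math.sqrt(len(x))          # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    m = mean(x)
    return m - z * se, m + z * se


def z_test_pvalue(x: List[float], mu0: float) -> float:
    """Two-sided p-value for H0: mu = mu0, with the same normal approximation."""
    se = stdev(x) / math.sqrt(len(x))
    z = (mean(x) - mu0) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(10.0, 2.0) for _ in range(200)]  # simulated data
    print(mean_confidence_interval(sample))
    print(z_test_pvalue(sample, mu0=10.0))
```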

## Mathematics courses

**Organisation:** Lectures (6h), TD (6h), TP (6h)

**Lecturers:** Florence Hubert

**Evaluation:** final exam and projects

The course focuses on optimization with and without constraints, illustrated with regression problems.

The lectures provide a summary of the basic concepts, which are then applied in exercises (TD), practical work on computers (TP) and personal assignments (a Python programming project).
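A minimal sketch of how a regression problem becomes an optimization problem, with and without a constraint: it fits y ≈ a + b·x by gradient descent on the mean squared error, and optionally projects the slope onto the set {b ≥ `slope_min`} after each step (projected gradient descent). The step size, iteration count and the function name `fit_line_gd` are assumptions chosen for this toy example, not values from the course.

```python
from typing import List, Optional, Tuple


def fit_line_gd(
    x: List[float],
    y: List[float],
    lr: float = 0.1,
    iters: int = 5000,
    slope_min: Optional[float] = None,
) -> Tuple[float, float]:
    """Fit y ~ a + b*x by minimising the mean squared error with gradient descent.

    If slope_min is given, each step is followed by a projection onto the
    constraint set {b >= slope_min} (projected gradient descent).
    """
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        residuals = [a + b * xi - yi for xi, yi in zip(x, y)]
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        a -= lr * grad_a
        b -= lr * grad_b
        if slope_min is not None:
            b = max(b, slope_min)  # Euclidean projection onto {b >= slope_min}
    return a, b


if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(fit_line_gd(xs, [1 + 2 * v for v in xs]))            # unconstrained
    print(fit_line_gd(xs, [3 - v for v in xs], slope_min=0.0))  # slope forced >= 0
```

In the constrained call the unconstrained optimum has a negative slope, so the solution sits on the boundary b = 0 and the intercept becomes the mean of y.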

### Signal and big data, compressed sensing

**Organisation:** Lectures (17h), TD (12h)

**Lecturers:** Sébastien Darses

**Evaluation:** continuous assessment

The objective of this course unit is to introduce and exploit dimensionality reduction techniques based on random projections, with a focus on (but not limited to) applications in large-scale signal acquisition. After some reminders and complements on the problems and constraints raised by large data sets, and on classical approaches to dimensionality reduction, the course focuses on random projection techniques. By the end of the course unit, students should master classical (regular) signal sampling, methods for identifying parsimonious representations from random projections, and the corresponding theoretical guarantees (null space property, restricted isometry property). They should also be able to apply these notions in other contexts, for example clustering.
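The basic property behind random projections can be checked numerically: a k × d matrix with i.i.d. N(0, 1/k) entries approximately preserves pairwise distances (the Johnson-Lindenstrauss phenomenon). This sketch assumes the Gaussian construction, which is only one common choice, and arbitrary dimensions d = 500 and k = 200; it illustrates distance preservation only, not sparse recovery, which would require an ℓ1 or greedy solver.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def gaussian_projection(k: int, d: int, rng: random.Random) -> Matrix:
    """k x d matrix with i.i.d. N(0, 1/k) entries, so E[||R x||^2] = ||x||^2."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(r: Matrix, x: Vector) -> Vector:
    """Matrix-vector product R x."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, x)) for row in r]


def dist(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distortion_ratios(points: List[Vector], r: Matrix) -> List[float]:
    """Ratio projected distance / original distance for every pair of points."""
    proj = [project(r, p) for p in points]
    return [
        dist(proj[i], proj[j]) / dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 500, 200  # original and reduced dimensions (arbitrary choices)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
    ratios = distortion_ratios(points, gaussian_projection(k, d, rng))
    print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")
```

For a fixed pair, the squared ratio follows a χ²_k / k distribution, so the distortion shrinks roughly like 1/√k as the target dimension grows.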

### Latent variable models

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Jean-Marc Freyermuth

**Evaluation:** continuous assessment

This course covers some latent variable models from a Bayesian perspective. We first describe the Bayesian paradigm and discuss some "standard" models. We then study mixture models before going into more detail on hierarchical models, which illustrate well the success of Bayesian methods in biostatistics.
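Mixture models are a natural first example of latent variables in a Bayesian setting: each observation carries an unobserved component label. The sketch below is a Gibbs sampler for a deliberately simplified model, assumed here for illustration only: two components with equal weights and unit variances, independent N(0, `tau2`) priors on the means, and alternating draws of the labels and the means from their full conditionals. The function names and default settings are hypothetical.

```python
import math
import random
from typing import List, Tuple


def p_second_component(xi: float, mu0: float, mu1: float) -> float:
    """P(z_i = 1 | x_i, mu) for equal weights and unit variances, computed stably."""
    d = 0.5 * (xi - mu0) ** 2 - 0.5 * (xi - mu1) ** 2  # log-likelihood ratio
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def gibbs_two_gaussians(
    x: List[float], n_iter: int = 2000, burn_in: int = 500, tau2: float = 100.0, seed: int = 0
) -> Tuple[float, float]:
    """Gibbs sampler for a 50/50 mixture of N(mu_0, 1) and N(mu_1, 1).

    Priors: mu_k ~ N(0, tau2) independently; z_i in {0, 1} are the latent labels.
    Returns the posterior-mean estimates of (mu_0, mu_1).
    """
    rng = random.Random(seed)
    mu = [min(x), max(x)]  # spread-out starting values
    totals = [0.0, 0.0]
    for it in range(n_iter):
        # 1) Sample every latent label given the current means.
        groups: Tuple[List[float], List[float]] = ([], [])
        for xi in x:
            z = 1 if rng.random() < p_second_component(xi, mu[0], mu[1]) else 0
            groups[z].append(xi)
        # 2) Sample each mean from its conjugate normal full conditional.
        for k in (0, 1):
            var = 1.0 / (1.0 / tau2 + len(groups[k]))
            mu[k] = rng.gauss(var * sum(groups[k]), math.sqrt(var))
        if it >= burn_in:
            totals[0] += mu[0]
            totals[1] += mu[1]
    kept = n_iter - burn_in
    return totals[0] / kept, totals[1] / kept


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(-3.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
    print(gibbs_two_gaussians(data))
```

Fixing the weights and variances keeps both full conditionals in closed form; a fuller treatment would also sample the weights (for example under a Dirichlet prior) and the variances.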

### Big Data

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Pierre Pudlo

**Evaluation:** continuous assessment

The objective is to present some fundamental algorithms for "big data" processing based on resampling or random permutations of the data. By the end of the course, students should understand the bootstrap procedure and be able to implement it, even in complicated situations: regression, survival analysis, post-model-selection inference. They will also learn to implement a multiple testing procedure and to use permutation methods in this context.
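The three building blocks mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: it shows a percentile bootstrap interval, a two-sided permutation test for a difference in means, and Holm's step-down procedure as one example of a multiple testing correction. The function names, defaults (2000 resamples, 5000 permutations) and seeds are assumptions.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence, Tuple


def bootstrap_percentile_ci(
    x: Sequence[float],
    stat: Callable[[Sequence[float]], float] = fmean,
    n_boot: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample x with replacement n_boot times."""
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(x, k=len(x))) for _ in range(n_boot))
    lo = int((1.0 - level) / 2.0 * n_boot)
    hi = min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))
    return reps[lo], reps[hi]


def permutation_pvalue(
    x: Sequence[float], y: Sequence[float], n_perm: int = 5000, seed: int = 0
) -> float:
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel observations under H0: no group effect
        if abs(fmean(pooled[: len(x)]) - fmean(pooled[len(x):])) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one correction keeps the p-value valid


def holm_reject(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure, controlling the family-wise error rate at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break  # stop at the first hypothesis that cannot be rejected
        reject[i] = True
    return reject


if __name__ == "__main__":
    rng = random.Random(3)
    a = [rng.gauss(0.0, 1.0) for _ in range(30)]  # simulated group A
    b = [rng.gauss(2.0, 1.0) for _ in range(30)]  # simulated group B
    print(bootstrap_percentile_ci(a))
    print(permutation_pvalue(a, b))
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

In a multiple testing setting, each p-value passed to `holm_reject` could itself come from `permutation_pvalue`, one test per variable.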

### Parsimonious representations of signals and images

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Clothilde Mélot

**Evaluation:** continuous assessment

A data vector is said to have a parsimonious representation if there is a generating system (or even a basis) in which the vector can be described or approximated by a linear combination of a small number of its vectors. The course starts by presenting the classical bases and transforms in which certain types of data are naturally parsimonious. It then shows, on classical signal processing problems (denoising, compression, ...), the benefit of using such decompositions. Finally, it presents several algorithms for computing such decompositions and studies their mathematical properties.
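The Haar basis is one of the simplest examples of a basis in which some signals, here piecewise-constant ones, are naturally parsimonious. This sketch, not taken from the course, computes the full orthonormal Haar decomposition, its inverse, and a hard-thresholding step of the kind used for compression or denoising. The function names and the threshold value 0.3 are illustrative assumptions.

```python
import math
import random
from typing import List

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: List[float]) -> List[float]:
    """Full orthonormal Haar decomposition; the length must be a power of two.

    Output layout: [scaled global average, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = list(signal)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = approx + detail
        n = half
    return coeffs


def haar_inverse(coeffs: List[float]) -> List[float]:
    """Invert haar_forward level by level, from coarsest to finest."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged: List[float] = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def hard_threshold(coeffs: List[float], t: float) -> List[float]:
    """Keep only coefficients whose magnitude exceeds t (the others become 0)."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def max_error(s: List[float], ref: List[float]) -> float:
    return max(abs(u - v) for u, v in zip(s, ref))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [4.0] * 8 + [1.0] * 8  # piecewise-constant toy signal
    noisy = [v + rng.gauss(0.0, 0.1) for v in clean]
    denoised = haar_inverse(hard_threshold(haar_forward(noisy), 0.3))
    print(f"max error before: {max_error(noisy, clean):.3f}, "
          f"after thresholding: {max_error(denoised, clean):.3f}")
```

For the clean toy signal, only 2 of the 16 Haar coefficients are nonzero, which is what makes thresholding effective: noise spreads over all coefficients, while the signal lives in a few large ones.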

### Stochastic algorithms

**Organisation:** Lectures (12h), TD (12h)

**Lecturers:** Fabienne Castell

**Evaluation:** continuous assessment

At the end of this course, students will know how to program certain algorithms involving randomness. They will be able to identify the situations in which to use them, as well as their advantages and disadvantages compared to deterministic methods.
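A classic example of an algorithm involving randomness is Robbins-Monro stochastic approximation. The sketch below assumes, for illustration only, the task of estimating a quantile from a stream of random draws. Compared with the deterministic approach (store the whole sample, sort it, read off the quantile), it uses one draw per step and constant memory, at the price of slower convergence and a step-size schedule to choose. The schedule `n ** -0.75` and the function name are assumptions.

```python
import random
from typing import Callable


def robbins_monro_quantile(
    sample: Callable[[random.Random], float],
    alpha: float,
    theta0: float = 0.0,
    n_steps: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate the alpha-quantile q, i.e. the root of P(X <= q) - alpha = 0.

    Only one noisy draw X_n is used per step:
        theta_{n+1} = theta_n - gamma_n * (1{X_n <= theta_n} - alpha),
    with gamma_n = n ** -0.75 (the sum diverges, the sum of squares converges).
    """
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        x = sample(rng)
        indicator = 1.0 if x <= theta else 0.0
        theta -= n ** -0.75 * (indicator - alpha)
    return theta


if __name__ == "__main__":
    uniform = lambda r: r.random()  # draws from U(0, 1), whose alpha-quantile is alpha
    print(robbins_monro_quantile(uniform, alpha=0.5))
    print(robbins_monro_quantile(uniform, alpha=0.9))
```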