Artificial Intelligence for High-Throughput Biomedical Investigations

Description


The production of biological data has reached unprecedented levels. On a single specimen, a single “omic” experiment generates a very large number of variables, and several omics are already used in medical research: genomics (DNA or whole-genome sequencing), epigenomics (methylated-DNA, RNA, chromatin-IP, ATAC sequencing, etc.), metabolomics, lipidomics, proteomics and others. Besides these multiple “omics”, in vivo imaging technologies and high-throughput environmental exposure data are also available. Relating high-throughput data to health status opens new opportunities to understand the molecular mechanisms of diseases or to provide biomarkers for disease prediction, management and prognosis. However, general-purpose AI methodologies need to be adapted to the numerous machine learning issues raised when investigating relationships between biological features and phenotypes: small sample size (with respect to the number of features), confounding variables, very low signal-to-noise ratio, and the need for interpretability (to avoid confusing correlation with causation). In this research chair, we aim to investigate methodological issues that are specific to omics data, in order to implement efficient machine learning solutions in biomedical research programs.

Activities

 

First, we seek to develop new methods to relate high-dimensional genomic features to high-dimensional phenotypic data obtained with magnetic resonance imaging (MRI). Machine learning techniques will be based on multi-constrained models that include a spatial coherence constraint for MRI and a sparseness constraint for genotypes. The chair will leverage data from the national pilot program of Plan France Médecine Génomique 2025 on 1500 individuals with intellectual disability (multimodal health data and Electronic Health Records, EHR). In this program, referred to as DEFIDIAG, (1) AVIESAN will provide whole-genome sequencing data in 2020. (2) A network of clinical geneticists from the French healthcare pathway for rare disorders with multiple congenital anomalies and intellectual disability (www.anddi-rares.org) will provide expert clinical assessment of the 1500 patients. (3) Raw brain MRI will be gathered in a national repository. This unique dataset will make it possible to learn an accurate prioritization of genomic variants through the analysis of phenotypic data from EHR and from raw brain MRI signals.
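
To make the idea of a multi-constrained model more concrete, here is a minimal sketch, not the chair's actual method. It assumes a linear model in which a phenotype score is predicted from genotype features and imaging features, with an L1 (sparseness) penalty on the genotype weights and a squared-difference (spatial coherence) penalty on the imaging weights. For simplicity the voxels are assumed to lie on a 1D chain rather than in a 3D neighbourhood. All names (`fit_sparse_smooth`, `lam_sparse`, `lam_smooth`) and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (shrinks values toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(Xg, Xm, y, wg, wm, lam_sparse, lam_smooth):
    """Least-squares loss + L1 on genotype weights + smoothness on imaging weights."""
    resid = y - Xg @ wg - Xm @ wm
    return (0.5 * resid @ resid / len(y)
            + lam_sparse * np.abs(wg).sum()
            + 0.5 * lam_smooth * np.sum(np.diff(wm) ** 2))

def fit_sparse_smooth(Xg, Xm, y, lam_sparse=0.1, lam_smooth=1.0, n_iter=500):
    """Proximal gradient descent (ISTA) on the objective above."""
    n = len(y)
    pg, pm = Xg.shape[1], Xm.shape[1]
    X = np.hstack([Xg, Xm])
    # First-difference operator on a 1D chain of voxels (a simplifying assumption).
    D = np.diff(np.eye(pm), axis=0)
    lap = D.T @ D
    # Step size from an upper bound on the Lipschitz constant of the smooth part
    # (the eigenvalues of the 1D chain Laplacian are at most 4).
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 4.0 * lam_smooth)
    w = np.zeros(pg + pm)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        grad[pg:] += lam_smooth * (lap @ w[pg:])
        w = w - step * grad
        # Only the genotype block is soft-thresholded.
        w[:pg] = soft_threshold(w[:pg], step * lam_sparse)
    return w[:pg], w[pg:]

# Synthetic example with fewer samples than features.
rng = np.random.default_rng(0)
Xg = rng.standard_normal((30, 50))   # genotype features
Xm = rng.standard_normal((30, 20))   # imaging features
y = rng.standard_normal(30)          # phenotype score
wg, wm = fit_sparse_smooth(Xg, Xm, y)
print("non-zero genotype weights:", int(np.count_nonzero(wg)))
```

The two penalties act on separate blocks of the weight vector, so each data modality receives the constraint that matches its structure. A realistic model would replace the chain with the 3D voxel neighbourhood and tune both penalty strengths by cross-validation.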

Second, the chair will leverage Grenoble's unique position in the field of mass spectrometry (MS) based omics (proteomics, metabolomics, lipidomics): the French national proteomics infrastructure (www.profi-proteomics.fr) and its computational work package are both coordinated by the Grenoble partner lab. In addition, a metabolomics/lipidomics platform is emerging as a joint effort of the IAB lab, the TIMC lab and CHUGA. MS-based sequencing is recent (compared to other omics) and its data processing is still hindered by open questions (non-exhaustive coverage, instrumental sources of data corruption, multiplexed acquisitions, occurrence and nature of missing values, etc.). Among these questions, we will investigate data disentangling methods to improve the capability of multiplexed MS acquisitions, leading to more accurate quantification and deeper coverage.
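
As a toy illustration of what disentangling a multiplexed acquisition can mean, the sketch below treats an observed spectrum as a non-negative linear mixture of known reference fragment patterns and recovers the mixing abundances by non-negative least squares. This is only one possible formulation, assumed here for illustration, and not a description of the methods the chair will develop. The library matrix, the abundances and the helper name `unmix_spectrum` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_spectrum(library, mixed):
    """Estimate non-negative abundances such that library @ abundances ~ mixed.

    library: (n_bins, n_components) reference intensities per m/z bin
    mixed:   (n_bins,) observed multiplexed spectrum
    """
    abundances, residual_norm = nnls(library, mixed)
    return abundances, residual_norm

# Hypothetical library: 3 reference fragment patterns over 6 m/z bins.
library = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 1.0, 0.0],
    [0.0, 0.3, 0.0],
    [0.2, 0.0, 1.0],
    [0.0, 0.0, 0.4],
])

# Noise-free synthetic mixture of the first and third components.
true_abundances = np.array([2.0, 0.0, 1.0])
mixed = library @ true_abundances

abundances, residual_norm = unmix_spectrum(library, mixed)
print(abundances, residual_norm)
```

Real multiplexed spectra are noisy, the reference patterns are rarely known exactly, and the number of co-fragmented species is unknown, which is part of what makes the problem open.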

Finally, outreach and education actions will be conducted: our industrial partners will foster the transfer of machine learning tools into routine medical practice (Genetics department of CHUGA). Transfer of software tools and data science know-how will be organized, so that the most recently emerging omics technologies benefit from state-of-the-art data processing practices. In addition, a national educational program in medical data science for future physicians is already under construction.

Chair events

• 3ième Journée Intégrative de Protéomique et Métabolomique, Lyon, 2020.
• Conférence Midi-Minatec (postponed until the lockdown is lifted)

Scientific publications

Y. Couté, C. Bruley, T. Burger. "Beyond target-decoy competition: stable validation of peptide and protein identifications in mass spectrometry-based discovery proteomics". Analytical Chemistry, accepted manuscript, 2020.