Research axis

Genomic data analyses

Sequence variants

Currently, the genetic component of Rheumatoid Arthritis (RA) is not fully characterized, despite the identification of the HLA-DRB1 gene (specifically the Shared Epitope (SE) alleles) and of more than 100 susceptibility genes.

One possible explanation for this missing heritability is that rare or low-frequency variants also contribute to the underlying genetic risk. Advances in sequencing technologies (Next Generation Sequencing, NGS) now make it possible to test the “rare variant-common disease” hypothesis. Although identifying rare variants in case-control studies is challenging, it can be improved with familial data, where an aggregation of these variants can be observed.

NGS data can also be exploited to explore other causes of “missing heritability” such as Copy Number Variation (CNV) and Gene-Gene (GxG) interaction in a particular biological pathway.

To investigate these different causes of missing heritability, we benefit from familial samples collected through the ECRAF consortium (European Consortium for Rheumatoid Arthritis Families). All our sequencing data have been, or are being, produced by the Centre National de Recherche en Génomique Humaine (CNRGH, Institut François Jacob, CEA, Evry) on an Illumina platform.

Characterization of new rare variants (SNV & indels)

This project began with a search for rare variants (Single Nucleotide Variants (SNVs) or indels) in whole exome data from French RA-multiplex families in which HLA-DRB1 SE alleles segregate (HLA-DRB1+), since clinical heterogeneity has been observed among HLA-DRB1+ cases. After calling and annotation, variants were categorized according to their MAF (Minor Allele Frequency), their predicted effect on proteins (CADD score), and their penetrance and phenocopy values.

Variants with a MAF < 1%, a CADD score ≥ 30, full penetrance and no phenocopy were first analysed with the p-VAAST software, which computes a global score combining a linkage score in our families with an association score obtained by comparing the genetic information of affected family members with that of external controls (of European ancestry and sequenced on a similar Illumina platform). After validation, a nonsense variant introducing a premature stop codon at the beginning of the SUPT20H gene was identified. This gene is involved in the regulation of macroautophagy, which plays a key role in the pathogenesis of RA (DOI: 10.1371/journal.pone.0213387).
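As an illustration of the four filtering criteria quoted above, here is a minimal Python sketch. The record layout, the field names and the way penetrance and phenocopy are derived from carrier status within one family are assumptions made for this example; it is not the lab's pipeline and it does not reproduce the p-VAAST scoring.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one annotated variant in one family."""
    name: str
    maf: float            # minor allele frequency in a reference population
    cadd: float           # CADD score
    carriers: frozenset   # IDs of family members carrying the variant


def passes_filters(variant, affected, unaffected, max_maf=0.01, min_cadd=30.0):
    """Return True if the variant meets the four criteria quoted in the text."""
    rare = variant.maf < max_maf
    damaging = variant.cadd >= min_cadd
    # Full penetrance (assumed meaning): no unaffected relative is a carrier.
    full_penetrance = not (variant.carriers & unaffected)
    # No phenocopy (assumed meaning): every affected relative is a carrier.
    no_phenocopy = affected <= variant.carriers
    return rare and damaging and full_penetrance and no_phenocopy


if __name__ == "__main__":
    affected = frozenset({"II-1", "II-3"})
    unaffected = frozenset({"I-1", "II-2"})
    candidate = Variant("example_variant", 0.001, 35.0, frozenset({"II-1", "II-3"}))
    print(passes_filters(candidate, affected, unaffected))
```

In a real analysis the thresholds would be applied to annotated call sets across all families rather than to hand-built records.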

In addition, the analysis of whole genome sequencing data from French RA-multiplex families in which no HLA-DRB1 SE alleles segregate (HLA-DRB1-) is ongoing, both to identify potential causal variants specific to the non-HLA-DRB1 genetic component and to determine whether part of the genetic component is shared with HLA-DRB1+ families.

Characterization of CNV

The study of CNV (Copy Number Variation) has already been initiated in the lab to evaluate the association of candidate CNV genes in our familial sample (see Copy Number Variants part). In addition, all the NGS data produced will be exploited to identify new CNVs in RA. Before adapting the bioinformatics pipelines developed for SNVs and indels, simulation studies are under way to choose the best algorithm for calling CNVs from whole exome data in our HLA-DRB1+ samples. This work will then be extended to HLA-DRB1- samples. Although whole genome sequencing has the advantage of capturing CNV breakpoints outside exons, the reliability of the different tools will also be evaluated.

The newly characterized CNVs will be validated in additional samples and compared with an updated human CNV map, since the last version dates from 2015. This map was constructed to compile the CNVs detected in healthy individuals of various origins and collected in the Database of Genomic Variants (DGV). An in-house pipeline has been developed to update the 2015 CNV map and to estimate the CNV event frequencies (gain, loss, gain+loss). Our results will also be technically validated by the digital PCR methodology developed in the lab (see Copy Number Variants part).
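To make the notion of CNV event frequencies concrete, the following Python sketch computes gain, loss and gain+loss frequencies for a single region. The input format (one integer copy number per individual, with 2 as the diploid reference) and the reading of "gain+loss" as the frequency of any event are assumptions for this example, not a description of the in-house pipeline.

```python
def cnv_event_frequencies(copy_numbers, reference=2):
    """Frequencies of gains, losses and any event among individuals in one region.

    copy_numbers: list of integer copy-number calls, one per individual
    (hypothetical input format).
    """
    n = len(copy_numbers)
    if n == 0:
        raise ValueError("at least one individual is required")
    gain = sum(cn > reference for cn in copy_numbers) / n
    loss = sum(cn < reference for cn in copy_numbers) / n
    # Assumption: "gain+loss" is the frequency of any copy-number event.
    return {"gain": gain, "loss": loss, "gain+loss": gain + loss}


if __name__ == "__main__":
    print(cnv_event_frequencies([2, 2, 3, 1]))
```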

Pathways and GxG interactions

For genes carrying variants with incomplete penetrance identified by the preceding analyses of NGS data, over-representation analysis (ORA) or gene set enrichment analysis (GSEA) has been performed to identify biological pathways. Although the GSEA approach was developed for expression data, some adaptations have been made for SNP data.
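Over-representation analysis is commonly based on a one-sided hypergeometric test, which asks whether a gene list overlaps a pathway more than expected by chance. The Python sketch below shows that calculation under this assumption; the function and parameter names are illustrative, and the exact test and the multiple-testing correction used in the lab are not stated in the text.

```python
from math import comb


def ora_pvalue(universe_size, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of genes considered in total
    pathway_size:  number of those genes annotated to the pathway
    list_size:     number of genes in the list of interest
    overlap:       number of listed genes that belong to the pathway
    """
    upper = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 20000 genes, a pathway of 100 genes, 50 listed genes, 5 in the pathway.
    print(ora_pvalue(20000, 100, 50, 5))
```

In practice the p-values obtained for all tested pathways would be corrected for multiple testing before interpretation.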

In the characterized biological pathways, GxG interactions are evaluated with multifactor dimensionality reduction approaches adapted for logistic regression. The biological impact of our results could then be evaluated through an interactive molecular map for RA that is currently being developed in the lab through a systems biology approach (see computational systems biology part). This map, which assembles all the pathways implicated in the disease, will make it possible to place genes in their functional context and to test their effects on disease development.

The identification of new susceptibility loci through NGS data would allow a better understanding of the mechanisms of rheumatoid arthritis and would help to characterize new therapeutic targets or biomarkers useful for early diagnosis.

Transcriptome/expression profiling

Differentially expressed genes in RA

Since 2007, we have been characterizing RA biomarkers through transcriptome analysis. We benefited from an RA cohort collected by Dr. P. Hilliquin, with samples prepared for storage and shipment by Dr Quillet and Dr Lemaire (GenHotel – Rheumatology & Biology Departments, CHSF Corbeil-Essonne). We then extracted total RNA from blood samples of RA patients (cases). Through another in-house project, we obtained samples from healthy people (controls) (Dr. L Jacq, GenHotel – Cardiology Department, CHSF Corbeil-Essonne).

Our first goal was to measure the expression of RA candidate genes by quantitative PCR (qPCR) and to study the relation between expression level and polymorphism of the candidate gene. We studied the PRKCH gene (Teixeira et al., 2008a) and the CASP7 gene, which encodes Caspase 7. Caspases are proteases involved in apoptosis mechanisms, whose deregulation could contribute to synoviocyte proliferation or to osteoblast destruction, two physiopathological characteristics of RA. Expression levels of the alpha and beta isoforms of Caspase 7 were significantly decreased in cases compared with controls, the decrease being more significant for the alpha isoform. The alpha/beta expression ratio is therefore significantly decreased in cases compared with controls, suggesting a lower apoptotic activity related to the active alpha form of the caspase in RA (Teixeira et al., 2008b).

To make fuller use of the collected samples, we then set up a collaboration with Dr. R. Olaso (Plateforme de Transcriptome, Centre National de Génotypage, Institut de Génomique, CEA, Evry). We performed whole transcriptome analysis using Illumina microarray technology on peripheral blood mononuclear cells (PBMCs) from RA cases and controls. We identified a remarkably elevated expression of a spectrum of genes involved in Immunity and Defence in PBMCs of RA cases compared to controls. This result is confirmed by GO analysis, suggesting that these genes could be activated systemically in RA (see figure; Teixeira et al., 2009; PhD: https://www.theses.fr/147412374).

Among the genes showing the largest expression changes in this study, we then analysed those located in genomic regions without known copy number variation (CNV). Our goal was to analyse the relation between expression and SNPs and thereby identify expression Quantitative Trait Loci (eQTL) specific to RA. We selected genes for which tagSNPs were suitable for our study. We identified a suggestive association of a SNP located in PGLYRP1 with RA, but this preliminary result was not replicated in an extended sample of families (Fodil et al., 2015; PhD co-supervised with Pr A Boudjema, USTO, Oran, Algeria, http://www.theses.fr/2015EVRY0017). Such studies, focused on the relation between polymorphisms and expression level in complex diseases, are an important field of investigation for determining regulatory regions associated with a specific phenotype.

Differentially expressed genes in pre-RA states

RA can be detected years before the first symptoms of the disease, through the development of systemic autoimmunity. Indeed, autoantibodies such as rheumatoid factor and anti-citrullinated peptide antibodies (ACPAs) precede the clinical disease by a median period of at least 5 years.

Rheumatoid arthritis (RA), once fully developed, is difficult to treat and generally requires lifelong therapy. Treatments in the very early phases of the disease, or ideally before its clinical onset (pre-clinical phases), are potentially curative. Several prevention trials for RA are ongoing and may lead to screening and preventive strategies for RA, much as controlling hypertension and reducing high cholesterol help to reduce the risk of cardiovascular diseases (Finckh A. et al. 2014). However, before preventing RA can become a reality, the precision of diagnosing pre-clinical RA will need to be improved. While the heritability of RA is well established and pre-clinical stages have been identified, it is currently still impossible to provide patients with an individualized estimate of RA risk. Thus the precise diagnosis of pre-clinical RA has become a major scientific question.

Our working hypothesis is that the asymptomatic, pre-clinical phase of RA can be adequately identified by a combination of biological markers and clinical risk factors, and that the characterization of specific regulation profiles and biological abnormalities will lead to the identification of new biomarkers present before the first symptoms appear and predictive of RA onset. Our objective is to establish the mechanisms of disease initiation: to identify gene regulation in individuals who develop the disease within a year (pre-RA samples) and the specific sequence of biological abnormalities leading to the development of disease. We will then be able to characterize biomarkers predictive of RA onset within one year, to be followed up to evaluate their diagnostic value for prognosis or treatment response, and to characterize individuals at very high risk of developing RA.

This project is based on the SCREEN-RA cohort (www.arthritis-checkup.ch), which aims to develop and evaluate a screening strategy for the development of RA in individuals genetically at risk, namely first-degree relatives of patients with autoimmune diseases. The cohort was established by Pr. Axel Finckh (HU Geneva, Switzerland), with the help of a previous SNSF grant (SNSF N° 32003B_120639). Since 2010, over 1300 first-degree relatives of RA patients have enrolled, given informed consent, answered detailed epidemiological questionnaires and provided biological samples (serum, DNA, RNA, stool). The study continues to enrol around 200 new participants per year. Only individuals without clinical evidence of RA are enrolled, and they are followed up yearly to assess incident arthritis or other phases of impending RA. Our project is also based on a similar cohort developed in France by Pr. F Cornélis.

A preliminary study has been performed by array-based transcriptome analysis in collaboration with Dr. R Olaso from CNRGH (Institut François Jacob, CEA/DSV, Evry). Analysis of the results is in progress, in collaboration with Dr. C Dalmasso from the LaMME laboratory (UMR8071 CNRS, Evry University).

Candidate Copy Number Variants

A Copy Number Variant (CNV) is a segment of DNA that is 1 kb or larger and is present at a variable copy number in comparison with a reference genome (Feuk et al., 2006). CNVs are generally stable and can be inherited. Deletions, duplications, segmental duplications, insertions, inversions and translocations represent some of the processes resulting in CNV. Investigating the genetic basis of complex diseases without considering CNVs will miss an important component of heritability (Beckmann et al., 2007; Manolio et al., 2009).

Taking advantage of our familial samples, we decided to investigate the association of candidate CNV genes with RA. First, we worked on methodologies for characterizing copy number. Multiplex standard PCR (mPCR) assays specific for the presence and/or absence of a gene were used. We then performed quantitative PCR (qPCR) using one specific fluorescent probe for the target gene and a second one for a reference gene without known CNV. Finally, we developed a methodology based on droplet digital PCR (ddPCR), a recent technology based on partitioning an initial reaction into about 20,000 micro-reactions in droplets. Each droplet is analysed for the fluorescence of the two probes.

This method gave the highest sensitivity, leading to an absolute quantification of copy numbers. Furthermore, when two copies are identified, it can indicate whether or not the copies are on the same chromosome. This information is essential for identifying mechanisms causing copy number variation, such as non-allelic homologous recombination (Gu et al., 2008). We focused on several genes involved in immunity and oxidative stress pathways. Copy number genotypes were identified through this analysis of trio families (Ben Kilani et al., 2016; Ben Kilani et al, manuscript in progress; PhD: https://www.theses.fr/185469957).
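The absolute quantification by ddPCR mentioned above generally relies on Poisson statistics: if a fraction p of droplets is positive for a probe, the mean number of template copies per droplet is estimated as -ln(1 - p). The Python sketch below applies this standard correction to a target probe and a reference probe. The function names, the droplet counts and the assumption of a two-copy reference gene are illustrative, not the lab's protocol.

```python
from math import log


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -log(1 - positive / total)


def copy_number(target_positive, reference_positive, total, reference_copies=2):
    """Estimated copies of the target gene per genome.

    Assumes the reference gene is present at `reference_copies` per diploid genome.
    """
    reference_lambda = copies_per_droplet(reference_positive, total)
    if reference_lambda == 0:
        raise ValueError("no positive droplets for the reference probe")
    target_lambda = copies_per_droplet(target_positive, total)
    return reference_copies * target_lambda / reference_lambda


if __name__ == "__main__":
    # Hypothetical counts out of about 20,000 droplets.
    print(round(copy_number(target_positive=5500, reference_positive=3900, total=20000), 2))
```

The Poisson correction matters because a droplet containing several template copies still counts as a single positive droplet.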

Through whole genome sequencing analysis, methodologies for identifying RA-specific copy number variants are being developed in the lab (see sequence variants project). The ddPCR methodology will be used for technical validation of the identified CNVs.

Our final goal is to characterize a CNV signature specific to RA, which would complete the genomic factors associated to the genetic risk for this disease. Furthermore, it would be of interest to identify specific CNVs related to pre-clinical phenotypes through prospective cohorts of RA relatives mentioned above.

Collaborations

A Boudjema: USTO (Oran, Algeria)
R Olaso, JF Deleuze: CNRGH (Institut François Jacob, CEA/DSV, Evry)
C Dalmasso: LaMME (UMR8071 CNRS, Evry University)
A Finckh: HU Geneva (Switzerland)
F Cornélis: CHU Auvergne, Clermont-Ferrand Auvergne

Computational systems biology

Interactive knowledge base

Protein-protein interactions are a major driving force behind most biological processes. They play a pivotal role in intra- and extra-cellular functions, and especially in the propagation of signals and cellular regulation. Signal transduction is a fundamental process for the communication of the cell with its environment, comprising several interacting receptors, proteins, enzymes, second messengers and transcription factors. Disruption and dysregulation of these complex molecular and signalling networks can lead to disease. Therefore, the mapping and accurate representation of pathways implicated is a primary but essential step for elucidating the mechanisms underlying disease pathogenesis.

In 2010, Wu et al. published a detailed molecular map of rheumatoid arthritis built with the CellDesigner software. We decided to use this map as a basis and to expand it. The map has been updated with information published after 2010 through exhaustive manual curation, assisted by data mining tools. Only interactions experimentally validated in at least two peer-reviewed scientific publications are kept. Because the initial map was based on high-throughput gene expression data from 28 studies and on interactions inferred from the KEGG database, all nodes and interactions are carefully re-evaluated in an effort to limit false positives. When validation with small-scale experiments is not possible, we keep nodes that appear in at least two different high-throughput studies. Detailed annotation, including PubMed IDs and HUGO names, is also added in the MIRIAM section of the CellDesigner file. For context representation and the overall structure of the map, expert advice has been taken into account, along with an effort to comply with SBGN standards.

The RA map will be web-published in the coming months (a full-length manuscript is under preparation) in the form of an interactive map, using the MINERVA platform [Gawron et al., 2016]. This will allow easy access, navigation and search of all molecular pathways implicated in RA, so that the map serves as an online knowledge base for the disease (figure 3). The user will have access to all the literature used, with detailed annotations for every component and reaction, including PubMed IDs and a list of identifiers such as Uniprot, EntrezGene, Ensembl, HGNC and RefSeq. As the map is constructed using information from various experimental studies, the user will also be able to visualize data of a specific cell origin, highlighting cell-specific sub-networks within the global one. Moreover, the user will be able to locate all known drug targets for RA and the corresponding drugs to date. The detailed view of an element will allow searching for drugs, chemicals and miRNAs targeting that particular element. Additionally, user-provided omics datasets can be displayed as overlays, giving a first estimate of affected pathways and components. Lastly, the map will provide feedback about the molecules from the dataset that are not mapped, allowing a better understanding of the experimental results and further development of the map’s contents. We have used public datasets from proteomic and transcriptomic studies [Dasuri et al., 2004; Heruth et al., 2012; Berlin, Jena and Leipzig datasets from Woetzel et al., 2014] to demonstrate how the map can be used as a template for separate or simultaneous visualization of different experimental results.

The RA map so far includes information derived from more than 150 scientific papers. It has six distinct compartments: extracellular space (with extracellular proteins), plasma membrane (with membrane receptors and ligand proteins), cytoplasm (with proteins, miRNAs, small molecules and the sub-compartments of mitochondrion, Golgi apparatus and endoplasmic reticulum), nucleus (with genes, RNAs and transcription factors), a compartment for secreted molecules and a phenotype compartment including more than ten cellular fates. It comprises more than 400 components and a total of 324 reactions. Each component and reaction in the map is referenced with at least two PubMed IDs, or with database identifiers if inferred from a specific database (Singh et al. 2018, and second manuscript under preparation, 2018).

Topological analysis of the RA map using the Cytoscape software [Shannon et al., 2003] and relevant plugins reveals unconnected or loosely connected parts that reflect our fragmented knowledge of physical and/or genetic interactions, which poses obstacles to the subsequent derivation of a reliable dynamical model. To improve connectivity, we use dedicated PPI databases (through www.imexconsortium.org), pathway databases (e.g. KEGG, SIGNOR or REACTOME) and the commercial software Ingenuity Pathway Analysis (IPA, www.ingenuity.com) to investigate potential co-players of the proteins of interest. For the time being, we do not use simulated or computationally inferred interactions, or interactions inferred from other species (e.g. mice), and restrict our search to experimentally validated data of human origin.

Discrete modeling

Characteristic features of RA include synovial inflammation that can lead to bone erosion and permanent deformity. It is broadly recognized that in RA, synovial inflammation results from complex interactions between haematopoietic and stromal cells. Recent studies have shown that RA synovial fibroblasts play a crucial role in driving the persistent, destructive characteristics of the disease [Juarez et al., 2012]. The second aim of the project is to model the behavior of synovial fibroblasts under different initial conditions specific to RA, in order to see whether we could influence the cellular fate (e.g. by enhancing an apoptotic phenotype) or understand what could lead to a patient’s resistance to a certain drug and how to overcome it (e.g. presence of rescue pathways, complex feedback mechanisms).

In general, pathway representation and modelling can be seen as two separate tasks with different primary objectives. The first is to draw an accurate, comprehensive diagram depicting current biological knowledge while the second is to study the emergent behavior of the system under different conditions. However, a detailed, fully annotated molecular map works as an excellent scaffold for the building of a regulatory graph and the subsequent derivation of the logical model. This process, that involves many iterations, obliges one to look meticulously into the mapped pathways, spotting potentially problematic or ambiguous aspects of the map. Model simulations can also reveal inconsistencies concerning the global behavior, advocating the necessity for further revisions and refinements. Leaning on the RA knowledge base and using the web platform Cell Collective [Helikar et al., 2012], we are currently building a Boolean dynamical model for the study of RA fibroblasts’ activation.

One major goal of this project is to automate the translation of a “graphical” model into a dynamic one, based on the network topology. To do so, we have been developing, in collaboration with the Lifeware team (INRIA, Saclay), a tool (CaSQ: CellDesigner as SBML-qual) capable of translating a CellDesigner file into an SBML-qual file that retains the network layout and the annotations stored in the MIRIAM section, and that also infers preliminary Boolean rules from the network’s topology, making the file “executable”.

The semantics for the inference of the preliminary rules are roughly described as follows:

a target is on if one of the reactions producing it is on
a reaction is on if all reactants are on, all inhibitors are off and one of the catalysts is on [where any modification that is not an inhibition is treated as catalysis]
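The two rules above can be sketched in a few lines of Python. This is an illustration of the stated semantics only, not the CaSQ implementation: the dictionary layout for reactions is hypothetical, and treating a reaction with no catalyst as unconstrained by the catalyst condition is an assumption that the text does not spell out.

```python
def reaction_is_on(reaction, state):
    """A reaction is on if all reactants are on, all inhibitors are off
    and one of the catalysts is on."""
    reactants_on = all(state[s] for s in reaction["reactants"])
    inhibitors_off = not any(state[s] for s in reaction["inhibitors"])
    catalysts = reaction["catalysts"]
    # Assumption: with no catalyst listed, the catalyst condition is ignored.
    catalyst_on = any(state[s] for s in catalysts) if catalysts else True
    return reactants_on and inhibitors_off and catalyst_on


def target_is_on(target, reactions, state):
    """A target is on if one of the reactions producing it is on."""
    return any(
        reaction_is_on(r, state) for r in reactions if target in r["products"]
    )


if __name__ == "__main__":
    # Toy network with hypothetical species names.
    reactions = [
        {"reactants": ["A"], "products": ["B"], "catalysts": ["E"], "inhibitors": ["I"]},
        {"reactants": ["C"], "products": ["B"], "catalysts": [], "inhibitors": []},
    ]
    state = {"A": True, "E": True, "I": False, "C": False}
    print(target_is_on("B", reactions, state))
```

Applying `target_is_on` to every species at each step would give one synchronous update of the Boolean state, which is one possible way of executing such rules.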

To execute the dynamical model, we use the modelling platform Cell Collective. We collaborate with the creators of the platform (Tomas Helikar’s group at the University of Nebraska-Lincoln, USA) and have privileged use of the platform’s resources (for simulations and data analysis, and also for privacy until the model is tuned and published).

To cope with the vast complexity of the model, we are currently focusing on four major modules that concern the contribution of fibroblasts to RA, namely apoptosis, pro-inflammatory cytokines, chemokines, and bone erosion & matrix degradation. We are able to extract these sub-modules from the main graph, translate them into Boolean models and execute them separately in order to understand and tune them. Once these modules are tuned, we will merge them to obtain the global model of RA fibroblasts. A paper on the pipeline and the tool is under preparation.

Lastly, the resulting global Boolean model for RA fibroblasts could be further analysed with the GINsim software [Chaouiya et al., 2012] and could also serve as a template for the derivation of a continuous model using the MaBoSS software [Stoll et al., 2017], allowing the computation of phenotype probabilities and showcasing the interoperability and complementarity of different computational systems biology tools.

This work covers two aspects. The first is biological: it would allow in silico simulations and predictions of the behaviour of fibroblasts in RA, cells of utmost importance in the chronicity of inflammation and in the destruction of bone and cartilage. The second is technical: the automated pipeline developed can be applied to other disease maps. Institut Curie has declared its interest in using the pipeline to translate the Cancer Atlas maps (https://acsn.curie.fr/Temp/ACSN2.html) into executable models.

Collaborations

Marek Ostaszewski, Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, Esch-sur-Alzette, Luxembourg
George Kalliolias, Arthritis & Tissue Degeneration Program, Hospital for Special Surgery, New York, USA, Department of Medicine, Weill Cornell Medical College, New York City, USA
Gilles Chiocchia, Faculty of Health Sciences Simone Veil, INSERM U1173, University of Versailles Saint-Quentin-en-Yvelines, Montigny-le-Bretonneux, France
Robert Olaso, Centre National de Recherche en Génomique Humaine (CNRGH), CEA, Evry, France
Tomas Helikar, Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, USA
Sylvain Soliman, Inria Saclay-Île-de-France - Équipe Lifeware, France
