Search | VHL Regional Portal

1.

Critical assessment of missense variant effect predictors on disease-relevant variant data.

Rastogi, Ruchir; Chung, Ryan; Li, Sindy; Li, Chang; Lee, Kyoungyeul; Woo, Junwoo; Kim, Dong-Wook; Keum, Changwon; Babbi, Giulia; Martelli, Pier Luigi; Savojardo, Castrense; Casadio, Rita; Chennen, Kirsley; Weber, Thomas; Poch, Olivier; Ancien, François; Cia, Gabriel; Pucci, Fabrizio; Raimondi, Daniele; Vranken, Wim; Rooman, Marianne; Marquet, Céline; Olenyi, Tobias; Rost, Burkhard; Andreoletti, Gaia; Kamandula, Akash; Peng, Yisu; Bakolitsa, Constantina; Mort, Matthew; Cooper, David N; Bergquist, Timothy; Pejaver, Vikas; Liu, Xiaoming; Radivojac, Predrag; Brenner, Steven E; Ioannidis, Nilah M.

bioRxiv ; 2024 Jun 08.

Article in English | MEDLINE | ID: mdl-38895200

ABSTRACT

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
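
The high-specificity and high-sensitivity regimes mentioned in the abstract can be illustrated with a small sketch. It is not the assessment code used by the authors. The function name `sensitivity_at_specificity`, the score convention (higher means more likely pathogenic), and the toy data are assumptions made only for illustration.

```python
def sensitivity_at_specificity(scores, labels, min_specificity):
    """Best sensitivity over all thresholds whose specificity is >= min_specificity.

    scores: predictor outputs, higher = more likely pathogenic (assumed convention).
    labels: 1 = pathogenic, 0 = benign.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    best = 0.0
    # A variant is called pathogenic when its score is >= the threshold.
    for threshold in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < threshold for s in neg) / len(neg)
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


# Toy data (hypothetical): four benign and four pathogenic variants.
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
high_spec = sensitivity_at_specificity(toy_scores, toy_labels, 1.0)
low_spec = sensitivity_at_specificity(toy_scores, toy_labels, 0.5)
```

Requiring a specificity of 1.0 corresponds to a strict, high-specificity setting, while relaxing the bound lets the threshold drop and sensitivity rise. Two predictors can rank differently under the two settings, which is the kind of regime dependence the abstract describes.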

2.

IMPatienT: An Integrated Web Application to Digitize, Process and Explore Multimodal PATIENt daTa.

Meyer, Corentin; Romero, Norma Beatriz; Evangelista, Teresinha; Cadot, Brunot; Laporte, Jocelyn; Jeannin-Girardon, Anne; Collet, Pierre; Ayadi, Ali; Chennen, Kirsley; Poch, Olivier.

J Neuromuscul Dis ; 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38701156

ABSTRACT

Medical acts, such as imaging, lead to the production of various medical text reports that describe the relevant findings. This induces multimodality in patient data by combining image data with free text and, consequently, multimodal data have become central to driving research and improving diagnoses. However, the exploitation of patient data is problematic because the ecosystem of analysis tools is fragmented according to the type of data (images, text, genetics), the task (processing, exploration) and the domain of interest (clinical phenotype, histology). To address these challenges, we developed IMPatienT (Integrated digital Multimodal PATIENt daTa), a simple, flexible and open-source web application to digitize, process and explore multimodal patient data. IMPatienT has a modular architecture that allows users to: (i) create a standard vocabulary for a domain, (ii) digitize and process free-text data, (iii) annotate images and perform image segmentation, (iv) generate a visualization dashboard and provide diagnosis decision support. To demonstrate the advantages of IMPatienT, we present a use case on a corpus of 40 simulated muscle biopsy reports of congenital myopathy patients. As IMPatienT provides users with the ability to design their own vocabulary, it can be adapted to any research domain and can be used as a patient registry for exploratory data analysis. A demo instance of the application is available at https://impatient.lbgi.fr/.

3.

Hydrogen peroxide at the poles of Ganymede.

Trumbo, Samantha K; Brown, Michael E; Bockelée-Morvan, Dominique; de Pater, Imke; Fouchet, Thierry; Wong, Michael H; Cazaux, Stéphanie; Fletcher, Leigh N; de Kleer, Katherine; Lellouch, Emmanuel; Mura, Alessandro; Poch, Olivier; Quirico, Eric; Rodriguez-Ovalle, Pablo; Showalter, Mark R; Tiscareno, Matthew S; Tosi, Federico.

Sci Adv ; 9(29): eadg3724, 2023 Jul 21.

Article in English | MEDLINE | ID: mdl-37478185

ABSTRACT

Ganymede is the only satellite in the solar system known to have an intrinsic magnetic field. Interactions between this field and the Jovian magnetosphere are expected to funnel most of the associated impinging charged particles, which radiolytically alter surface chemistry across the Jupiter system, to Ganymede's polar regions. Using observations obtained with JWST as part of the Early Release Science program exploring the Jupiter system, we report the discovery of hydrogen peroxide, a radiolysis product of water ice, specifically constrained to the high latitudes. This detection directly implies radiolytic modification of the polar caps by precipitation of Jovian charged particles along partially open field lines within Ganymede's magnetosphere. Stark contrasts between the spatial distribution of this polar hydrogen peroxide, those of Ganymede's other radiolytic oxidants, and that of hydrogen peroxide on neighboring Europa have important implications for understanding water-ice radiolysis throughout the solar system.

4.

Real or fake? Measuring the impact of protein annotation errors on estimates of domain gain and loss events.

Kress, Arnaud; Poch, Olivier; Lecompte, Odile; Thompson, Julie D.

Front Bioinform ; 3: 1178926, 2023.

Article in English | MEDLINE | ID: mdl-37151482

ABSTRACT

Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the UniProt and InterPro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins.
Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.

5.

CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach.

Mayer, Claudine; Vogt, Arthur; Uslu, Tuba; Scalzitti, Nicolas; Chennen, Kirsley; Poch, Olivier; Thompson, Julie D.

J Fungi (Basel) ; 9(4), 2023 Mar 29.

Article in English | MEDLINE | ID: mdl-37108879

ABSTRACT

In fungi, the most abundant transcription factor (TF) class contains a fungal-specific 'GAL4-like' Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as 'fungal_trans' or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these 'MHD-only' proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that MHD-only TFs are widespread in fungi. Instead, we show that they are exceptional cases, and that the fungal-specific Zn2C6-MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after its best-characterized members: Cep3, whose 3D structure has been determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TFs but will also provide critical guidance for future fungal gene regulatory network analyses.

6.

Spliceator: multi-species splice site prediction using convolutional neural networks.

Scalzitti, Nicolas; Kress, Arnaud; Orhand, Romain; Weber, Thomas; Moulinier, Luc; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Bioinformatics ; 22(1): 561, 2021 Nov 23.

Article in English | MEDLINE | ID: mdl-34814826

ABSTRACT

BACKGROUND: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. RESULTS: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89-92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. CONCLUSIONS: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.

Subject(s)

Algorithms , Neural Networks, Computer , Animals , Genome , Humans

7.

Novel Approach Combining Transcriptional and Evolutionary Signatures to Identify New Multiciliation Genes.

Defosset, Audrey; Merlat, Dorine; Poidevin, Laetitia; Nevers, Yannis; Kress, Arnaud; Poch, Olivier; Lecompte, Odile.

Genes (Basel) ; 12(9), 2021 Sep 21.

Article in English | MEDLINE | ID: mdl-34573434

ABSTRACT

Multiciliogenesis is a complex process that allows the generation of hundreds of motile cilia on the surface of specialized cells, to create fluid flow across epithelial surfaces. Dysfunction of human multiciliated cells is associated with diseases of the brain, airway and reproductive tracts. Despite recent efforts to characterize the transcriptional events responsible for the differentiation of multiciliated cells, a lot of actors remain to be identified. In this work, we capitalize on the ever-growing quantity of high-throughput data to search for new candidate genes involved in multiciliation. After performing a large-scale screening using 10 transcriptomics datasets dedicated to multiciliation, we established a specific evolutionary signature involving Otomorpha fish to use as a criterion to select the most likely targets. Combining both approaches highlighted a list of 114 potential multiciliated candidates. We characterized these genes first by generating protein interaction networks, which showed various clusters of ciliated and multiciliated genes, and then by computing phylogenetic profiles. In the end, we selected 11 poorly characterized genes that seem like particularly promising multiciliated candidates. By combining functional and comparative genomics methods, we developed a novel type of approach to study biological processes and identify new promising candidates linked to that process.

Subject(s)

Cilia/physiology , Fish Proteins/genetics , Fishes , Genomics/methods , Animals , Biological Evolution , Cell Differentiation/genetics , Cilia/genetics , Databases, Genetic , Fish Proteins/metabolism , Gene Expression , Humans , Phylogeny , Transcriptome

8.

A DNA Repair and Cell Cycle Gene Expression Signature in Pediatric High-Grade Gliomas: Prognostic and Therapeutic Value.

Entz-Werlé, Natacha; Poidevin, Laetitia; Nazarov, Petr V; Poch, Olivier; Lhermitte, Benoit; Chenard, Marie Pierre; Burckel, Hélène; Guérin, Eric; Fuchs, Quentin; Castel, David; Noel, Georges; Choulier, Laurence; Dontenwill, Monique; Van Dyck, Eric.

Cancers (Basel) ; 13(9), 2021 May 07.

Article in English | MEDLINE | ID: mdl-34067180

ABSTRACT

BACKGROUND: Pediatric high-grade gliomas (pHGGs) are the leading cause of mortality in pediatric neuro-oncology, displaying frequent resistance to standard therapies. Profiling DNA repair and cell cycle gene expression has recently been proposed as a strategy to classify adult glioblastomas. To improve our understanding of the DNA damage response pathways that operate in pHGGs and the vulnerabilities that these pathways might expose, we sought to identify and characterize a specific DNA repair and cell-cycle gene expression signature of pHGGs. METHODS: Transcriptomic analyses were performed to identify a DNA repair and cell-cycle gene expression signature able to discriminate pHGGs (n = 6) from low-grade gliomas (n = 10). This signature was compared to related, previously established signatures. We used the pHGG signature to explore existing transcriptomic datasets of DIPGs and supratentorial pHGGs. Finally, we examined the expression of key proteins of the pHGG signature in 21 pHGG diagnostic samples and nine paired relapses. Functional inhibition of one DNA repair factor was carried out in four patient-derived H3.3 K27M mutant cell lines. RESULTS: We identified a 28-gene expression signature of DNA repair and cell cycle that clustered pHGG cohorts, in particular those in supratentorial locations, into two groups. Differential protein expression levels of PARP1 and XRCC1 were associated with TP53 mutations and TOP2A amplification and linked significantly to the more radioresistant pHGGs displaying the worst outcome. Using patient-derived cell lines, we showed that the PARP1/XRCC1 expression balance might be correlated with resistance to PARP1 inhibition. CONCLUSION: We provide evidence that PARP1 overexpression, associated with XRCC1 expression, TP53 mutations, and TOP2A amplification, is a new theranostic and potential therapeutic target.

9.

Potential role of the X circular code in the regulation of gene expression.

Thompson, Julie D; Ripp, Raymond; Mayer, Claudine; Poch, Olivier; Michel, Christian J.

Biosystems ; 203: 104368, 2021 May.

Article in English | MEDLINE | ID: mdl-33567309

ABSTRACT

The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examined recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density ds(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. 
Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
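
The density ds(X) described in the abstract, the number of X motifs per kilobase of a gene sequence s, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The 20 trinucleotides are those usually listed for the X circular code in the circular-code literature and should be checked against the paper before reuse. The minimum motif length `min_codons`, the single scanned `frame`, and the function names are hypothetical choices, since the abstract does not specify them.

```python
# The 20 trinucleotides commonly given for the X circular code (verify before reuse).
X_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def count_x_motifs(seq, frame=0, min_codons=4):
    """Count maximal runs of consecutive X codons with at least min_codons codons.

    frame and min_codons are illustrative parameters, not values from the paper.
    """
    seq = seq.upper()
    motifs = 0
    run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in X_CODE:
            run += 1
        else:
            if run >= min_codons:
                motifs += 1
            run = 0
    if run >= min_codons:  # a run that reaches the end of the sequence
        motifs += 1
    return motifs


def x_motif_density(seq, frame=0, min_codons=4):
    """Number of X motifs per kilobase of sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return 1000.0 * count_x_motifs(seq, frame, min_codons) / len(seq)


# Hypothetical 30-nt example: a 4-codon run, one non-X codon (AAA), a 5-codon run.
example = "GAAGACGAGGAT" + "AAA" + "GTAGTCGTTTACTTC"
n_motifs = count_x_motifs(example)
density = x_motif_density(example)
```

Normalizing by sequence length is what makes the value comparable between genes of different sizes, which is needed before correlating it with expression measurements.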

Subject(s)

Codon/genetics , Gene Expression Regulation/genetics , Nucleotide Motifs/genetics , Genetic Code/genetics , Reading Frames , Ribosomes

10.

Dwarf planet (1) Ceres surface bluing due to high porosity resulting from sublimation.

Schröder, Stefan E; Poch, Olivier; Ferrari, Marco; Angelis, Simone De; Sultana, Robin; Potin, Sandra M; Beck, Pierre; De Sanctis, Maria Cristina; Schmitt, Bernard.

Nat Commun ; 12(1): 274, 2021 Jan 12.

Article in English | MEDLINE | ID: mdl-33436561

ABSTRACT

The Dawn mission found that the dominant colour variation on the surface of dwarf planet Ceres is a change of the visible spectral slope, where fresh impact craters are surrounded by blue (negative spectral-sloped) ejecta. The origin of this colour variation is still a mystery. Here we investigate a scenario in which an impact mixes the phyllosilicates present on the surface of Ceres with the water ice just below. In our experiment, Ceres analogue material is suspended in liquid water to create intimately mixed ice particles, which are sublimated under conditions approximating those on Ceres. The sublimation residue has a highly porous, foam-like structure made of phyllosilicates that scatters light in a similarly blue fashion to the Ceres surface. Our experiment provides a mechanism for the blue colour of fresh craters that can naturally emerge from the Ceres environment.

11.

Proteome-Scale Detection of Differential Conservation Patterns at Protein and Subprotein Levels with BLUR.

Defosset, Audrey; Kress, Arnaud; Nevers, Yannis; Ripp, Raymond; Thompson, Julie D; Poch, Olivier; Lecompte, Odile.

Genome Biol Evol ; 13(1), 2021 Jan 07.

Article in English | MEDLINE | ID: mdl-33211099

ABSTRACT

In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype-phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein-protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.

Subject(s)

Evolution, Molecular , Genomics/methods , Proteins/genetics , Proteome/genetics , Proteome/metabolism , Animals , Armadillo Domain Proteins , Bacteria , Conserved Sequence/genetics , Fungi , Genotype , Humans , Phenotype , Phylogeny , Sequence Alignment , Sequence Analysis , Software

12.

Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes.

Meyer, Corentin; Scalzitti, Nicolas; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Bioinformatics ; 21(1): 513, 2020 Nov 10.

Article in English | MEDLINE | ID: mdl-33172385

ABSTRACT

BACKGROUND: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon-intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. RESULTS: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. CONCLUSIONS: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon-intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction.

Subject(s)

Open Reading Frames/genetics , Primates/metabolism , Proteome , Amino Acid Sequence , Animals , Databases, Protein , Gene Deletion , Humans , Mutagenesis, Insertional , Receptor-Like Protein Tyrosine Phosphatases/chemistry , Receptor-Like Protein Tyrosine Phosphatases/genetics , Receptor-Like Protein Tyrosine Phosphatases/metabolism , Sequence Alignment

13.

MISTIC: A prediction tool to reveal disease-relevant deleterious missense variants.

Chennen, Kirsley; Weber, Thomas; Lornage, Xavière; Kress, Arnaud; Böhm, Johann; Thompson, Julie; Laporte, Jocelyn; Poch, Olivier.

PLoS One ; 15(7): e0236962, 2020.

Article in English | MEDLINE | ID: mdl-32735577

ABSTRACT

The diffusion of next-generation sequencing technologies has revolutionized research and diagnosis in the field of rare Mendelian disorders, notably via whole-exome sequencing (WES). However, one of the main issues hampering achievement of a diagnosis via WES analyses is the extended list of variants of unknown significance (VUS), mostly composed of missense variants. Hence, improved solutions are needed to address the challenges of identifying potentially deleterious variants and ranking them in a prioritized short list. We present MISTIC (MISsense deleTeriousness predICtor), a new prediction tool based on an original combination of two complementary machine learning algorithms using a soft voting system that integrates 113 missense features, ranging from multi-ethnic minor allele frequencies and evolutionary conservation, to physicochemical and biochemical properties of amino acids. Our approach also uses training sets with a wide spectrum of variant profiles, including both high-confidence positive (deleterious) and negative (benign) variants. Compared to recent state-of-the-art prediction tools in various benchmark tests and independent evaluation scenarios, MISTIC exhibits the best and most consistent performance, notably with the highest AUC value (> 0.95). Importantly, MISTIC maintains its high performance in the specific case of discriminating deleterious variants from benign variants that are rare or population-specific. In a clinical context, MISTIC drastically reduces the list of VUS (<30%) and significantly improves the ranking of "causative" deleterious variants. Pre-computed MISTIC scores for all possible human missense variants are available at http://lbgi.fr/mistic.
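
The soft voting idea mentioned in the abstract, combining the class probabilities of two models and ranking variants by the combined score, can be sketched as below. This is a generic illustration and not the MISTIC code. The function names, the equal default weight, and the toy probabilities are assumptions, and the abstract does not state which two algorithms are combined or how they are weighted.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' predicted probabilities of deleteriousness.

    weight_a is a hypothetical parameter; 0.5 gives both models equal say.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same variants")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


def rank_variants(variant_ids, combined_scores):
    """Return variant identifiers sorted from most to least likely deleterious."""
    pairs = sorted(zip(variant_ids, combined_scores), key=lambda p: p[1], reverse=True)
    return [variant for variant, _ in pairs]


# Toy example with made-up probabilities from two hypothetical models.
ids = ["v1", "v2", "v3"]
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.1]
combined = soft_vote(model_a, model_b)
ranking = rank_variants(ids, combined)
```

Averaging probabilities rather than taking a majority of hard labels keeps each model's confidence in the final score, which is what allows the combined output to be used for ranking a list of candidate variants.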

Subject(s)

Exome Sequencing/methods , Genetic Diseases, Inborn , Mutation, Missense , Software , Computational Biology , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Machine Learning

14.

Characterization of accessory genes in coronavirus genomes.

Michel, Christian Jean; Mayer, Claudine; Poch, Olivier; Thompson, Julie Dawn.

Virol J ; 17(1): 131, 2020 Aug 27.

Article in English | MEDLINE | ID: mdl-32854725

ABSTRACT

BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus, a novel member of the coronavirus (CoV) family. CoV genomes code for an ORF1a/ORF1ab polyprotein and four structural proteins widely studied as major drug targets. The genomes also contain a variable number of open reading frames (ORFs) coding for accessory proteins that are not essential for virus replication, but appear to have a role in pathogenesis. The accessory proteins have been less well characterized and are difficult to predict by classical bioinformatics methods. METHODS: We propose a computational tool, GOFIX, to characterize potential ORFs in virus genomes. In particular, ORF coding potential is estimated by searching for enrichment in motifs of the X circular code, which is known to be over-represented in the reading frames of viral genes. RESULTS: We applied GOFIX to study the SARS-CoV-2 and related genomes including SARS-CoV and SARS-like viruses from bat, civet and pangolin hosts, focusing on the accessory proteins. Our analysis provides evidence supporting the presence of overlapping ORFs 7b, 9b and 9c in all the genomes and thus helps to resolve some differences in current genome annotations. In contrast, we predict that ORF3b is not functional in all genomes. Novel putative ORFs were also predicted, including a truncated form of the ORF10 previously identified in SARS-CoV-2 and a little-known ORF overlapping the Spike protein in Civet-CoV and SARS-CoV. CONCLUSIONS: Our findings contribute to characterizing sequence properties of accessory genes of SARS coronaviruses, and especially the newly acquired genes making use of overlapping reading frames.

Subject(s)

Betacoronavirus/genetics , Genome, Viral , Open Reading Frames , Severe acute respiratory syndrome-related coronavirus/genetics , Viral Regulatory and Accessory Proteins/genetics , Animals , Codon , Computational Biology , Evolution, Molecular , Genes, Viral , Humans , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Matrix Proteins/genetics , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Regulatory and Accessory Proteins/chemistry

15.

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms.

Scalzitti, Nicolas; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Genomics ; 21(1): 293, 2020 Apr 09.

Article in English | MEDLINE | ID: mdl-32272892

ABSTRACT

BACKGROUND: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. RESULTS: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically dispersed organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity and protein length. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. CONCLUSIONS: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.

Subject(s)

Computational Biology/methods , Eukaryota/genetics , Molecular Sequence Annotation/methods , Animals , Data Curation , Evolution, Molecular , Humans , Phylogeny

16.

Ammonium salts are a reservoir of nitrogen on a cometary nucleus and possibly on some asteroids.

Poch, Olivier; Istiqomah, Istiqomah; Quirico, Eric; Beck, Pierre; Schmitt, Bernard; Theulé, Patrice; Faure, Alexandre; Hily-Blant, Pierre; Bonal, Lydie; Raponi, Andrea; Ciarniello, Mauro; Rousseau, Batiste; Potin, Sandra; Brissaud, Olivier; Flandinet, Laurène; Filacchione, Gianrico; Pommerol, Antoine; Thomas, Nicolas; Kappel, David; Mennella, Vito; Moroz, Lyuba; Vinogradoff, Vassilissa; Arnold, Gabriele; Erard, Stéphane; Bockelée-Morvan, Dominique; Leyrat, Cédric; Capaccioni, Fabrizio; De Sanctis, Maria Cristina; Longobardo, Andrea; Mancarella, Francesca; Palomba, Ernesto; Tosi, Federico.

Science ; 367(6483), 2020 03 13.

Article in English | MEDLINE | ID: mdl-32165559

ABSTRACT

The measured nitrogen-to-carbon ratio in comets is lower than for the Sun, a discrepancy which could be alleviated if there is an unknown reservoir of nitrogen in comets. The nucleus of comet 67P/Churyumov-Gerasimenko exhibits an unidentified broad spectral reflectance feature around 3.2 micrometers, which is ubiquitous across its surface. On the basis of laboratory experiments, we attribute this absorption band to ammonium salts mixed with dust on the surface. The depth of the band indicates that semivolatile ammonium salts are a substantial reservoir of nitrogen in the comet, potentially dominating over refractory organic matter and more volatile species. Similar absorption features appear in the spectra of some asteroids, implying a compositional link between asteroids, comets, and the parent interstellar cloud.

17.

Novel IQCE variations confirm its role in postaxial polydactyly and cause ciliary defect phenotype in zebrafish.

Estrada-Cuzcano, Alejandro; Etard, Christelle; Delvallée, Clarisse; Stoetzel, Corinne; Schaefer, Elise; Scheidecker, Sophie; Geoffroy, Véronique; Schneider, Aline; Studer, Fouzia; Mattioli, Francesca; Chennen, Kirsley; Sigaudy, Sabine; Plassard, Damien; Poch, Olivier; Piton, Amélie; Strahle, Uwe; Muller, Jean; Dollfus, Hélène.

Hum Mutat ; 41(1): 240-254, 2020 01.

Article in English | MEDLINE | ID: mdl-31549751

ABSTRACT

Polydactyly is one of the most frequent inherited defects of the limbs characterized by supernumerary digits and high-genetic heterogeneity. Among the many genes involved, either in isolated or syndromic forms, eight have been implicated in postaxial polydactyly (PAP). Among those, IQCE has been recently identified in a single consanguineous family. Using whole-exome sequencing in patients with uncharacterized ciliopathies, including PAP, we identified three families with biallelic pathogenic variations in IQCE. Interestingly, the c.895_904del (p.Val301Serfs*8) was found in all families without sharing a common haplotype, suggesting a recurrent mechanism. Moreover, in two families, the systemic phenotype could be explained by additional pathogenic variants in known genes (TULP1, ATP6V1B1). RNA expression analysis on patients' fibroblasts confirms that the dysfunction of IQCE leads to the dysregulation of genes associated with the hedgehog-signaling pathway, and zebrafish experiments demonstrate a full spectrum of phenotypes linked to defective cilia: Body curvature, kidney cysts, left-right asymmetry, misdirected cilia in the pronephric duct, and retinal defects. In conclusion, we identified three additional families confirming IQCE as a nonsyndromic PAP gene. Our data emphasize the importance of taking into account the complete set of variations of each individual, as each clinical presentation could finally be explained by multiple genes.

Subject(s)

Ciliopathies/diagnosis , Ciliopathies/genetics , Fingers/abnormalities , Genetic Predisposition to Disease , Genetic Variation , Intracellular Signaling Peptides and Proteins/genetics , Membrane Proteins/genetics , Phenotype , Polydactyly/diagnosis , Polydactyly/genetics , Toes/abnormalities , Animals , Consanguinity , Fluorescent Antibody Technique , Gene Expression Profiling , Genetic Association Studies/methods , Homozygote , Humans , Immunohistochemistry , Intracellular Signaling Peptides and Proteins/metabolism , Membrane Proteins/metabolism , Pedigree , Signal Transduction , Transcriptome , Exome Sequencing , Zebrafish

18.

Circular code motifs in the ribosome: a missing link in the evolution of translation?

Dila, Gopal; Ripp, Raymond; Mayer, Claudine; Poch, Olivier; Michel, Christian J; Thompson, Julie D.

RNA ; 25(12): 1714-1730, 2019 12.

Article in English | MEDLINE | ID: mdl-31506380

ABSTRACT

The origin of the genetic code remains enigmatic five decades after it was elucidated, although there is growing evidence that the code coevolved progressively with the ribosome. A number of primordial codes were proposed as ancestors of the modern genetic code, including comma-free codes such as the RRY, RNY, or GNC codes (R = G or A, Y = C or T, N = any nucleotide), and the X circular code, an error-correcting code that also allows identification and maintenance of the reading frame. It was demonstrated previously that motifs of the X circular code are significantly enriched in the protein-coding genes of most organisms, from bacteria to eukaryotes. Here, we show that imprints of this code also exist in the ribosomal RNA (rRNA). In a large-scale study involving 133 organisms representative of the three domains of life, we identified 32 universal X motifs that are conserved in the rRNA of >90% of the organisms. Intriguingly, most of the universal X motifs are located in rRNA regions involved in important ribosome functions, notably in the peptidyl transferase center and the decoding center that form the original "proto-ribosome." Building on the existing accretion models for ribosome evolution, we propose that error-correcting circular codes represented an important step in the emergence of the modern genetic code. Thus, circular codes would have allowed the simultaneous coding of amino acids and synchronization of the reading frame in primitive translation systems, prior to the emergence of more sophisticated start codon recognition and translation initiation mechanisms.

Subject(s)

Evolution, Molecular , Genetic Code , Nucleotide Motifs , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism , Models, Biological , Models, Molecular , Molecular Conformation , Nucleic Acid Conformation , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , Ribosomes/chemistry , Structure-Activity Relationship

19.

The Photochemistry on Space Station (PSS) Experiment: Organic Matter under Mars-like Surface UV Radiation Conditions in Low Earth Orbit.

Stalport, Fabien; Rouquette, Laura; Poch, Olivier; Dequaire, Tristan; Chaouche-Mechidal, Naïla; Payart, Shanèle; Szopa, Cyril; Coll, Patrice; Chaput, Didier; Jaber, Maguy; Raulin, François; Cottin, Hervé.

Astrobiology ; 19(8): 1037-1052, 2019 08.

Article in English | MEDLINE | ID: mdl-31314573

ABSTRACT

The search for organic molecules at the surface of Mars is a top priority of the Mars Science Laboratory (NASA) and ExoMars 2020 (ESA) space missions. Their main goal is to search for past and/or present molecular compounds related to a potential prebiotic chemistry and/or a biological activity on the Red Planet. A key step to interpret their data is to characterize the preservation or the evolution of organic matter in the martian environmental conditions. Several laboratory experiments have been developed especially concerning the influence of ultraviolet (UV) radiation. However, the experimental UV sources do not perfectly reproduce the solar UV radiation reaching the surface of Mars. For this reason, the International Space Station (ISS) can be advantageously used to expose the same samples studied in the laboratory to UV radiation representative of martian conditions. Those laboratory simulations can be completed by experiments in low Earth orbit (LEO) outside the ISS. Our study was part of the Photochemistry on the Space Station experiment on board the EXPOSE-R2 facility that was kept outside the ISS from October 2014 to February 2016. Chrysene, adenine, and glycine, pure or deposited on an iron-rich amorphous mineral phase, were exposed to solar UV. The total duration of exposure to UV radiation is estimated to be in the 1250-1420 h range. Each sample was characterized prior to and after the flight by Fourier transform infrared (FTIR) spectroscopy. These measurements showed that all exposed samples were partially degraded. Their quantum efficiencies of photodecomposition were calculated in the 200-250 nm wavelength range. They range from 10⁻⁴ to 10⁻⁶ molecules·photon⁻¹ for pure organic samples and from 10⁻² to 10⁻⁵ molecules·photon⁻¹ for organic samples shielded by the mineral phase. These results highlight that none of the tested organics are stable under LEO solar UV radiation conditions. The presence of an iron-rich mineral phase increases their degradation.

Subject(s)

Earth, Planet , Extraterrestrial Environment , Mars , Organic Chemicals/analysis , Photochemistry , Spacecraft , Ultraviolet Rays , Half-Life , Kinetics , Spectrophotometry, Infrared

20.

Bardet-Biedl syndrome: Antenatal presentation of forty-five fetuses with biallelic pathogenic variants in known Bardet-Biedl syndrome genes.

Mary, Laura; Chennen, Kirsley; Stoetzel, Corinne; Antin, Manuela; Leuvrey, Anne; Nourisson, Elsa; Alanio-Detton, Elisabeth; Antal, Maria C; Attié-Bitach, Tania; Bouvagnet, Patrice; Bouvier, Raymonde; Buenerd, Annie; Clémenson, Alix; Devisme, Louise; Gasser, Bernard; Gilbert-Dussardier, Brigitte; Guimiot, Fabien; Khau Van Kien, Philippe; Leroy, Brigitte; Loget, Philippe; Martinovic, Jelena; Pelluard, Fanny; Perez, Marie-Josée; Petit, Florence; Pinson, Lucile; Rooryck-Thambo, Caroline; Poch, Olivier; Dollfus, Hélène; Schaefer, Elise; Muller, Jean.

Clin Genet ; 95(3): 384-397, 2019 03.

Article in English | MEDLINE | ID: mdl-30614526

ABSTRACT

Bardet-Biedl syndrome (BBS) is an emblematic ciliopathy associated with retinal dystrophy, obesity, postaxial polydactyly, learning disabilities, hypogonadism and renal dysfunction. Before birth, enlarged/cystic kidneys as well as polydactyly are the hallmark signs of BBS to consider in absence of familial history. However, these findings are not specific to BBS, raising the problem of differential diagnoses and prognosis. Molecular diagnosis during pregnancies remains a timely challenge for this heterogeneous disease (22 known genes). We report here the largest cohort of BBS fetuses to better characterize the antenatal presentation. Prenatal ultrasound (US) and/or autopsy data from 74 fetuses with putative BBS diagnosis were collected out of which molecular diagnosis was established in 51 cases, mainly in BBS genes (45 cases) following the classical gene distribution, but also in other ciliopathy genes (6 cases). Based on this, an updated diagnostic decision tree is proposed. No genotype/phenotype correlation could be established but postaxial polydactyly (82%) and renal cysts (78%) were the most prevalent symptoms. However, autopsy revealed polydactyly that was missed by prenatal US in 55% of the cases. Polydactyly must be carefully looked for in pregnancies with apparently isolated renal anomalies in fetuses.

Subject(s)

Bardet-Biedl Syndrome/diagnosis , Genetic Association Studies , Genetic Predisposition to Disease , Phenotype , Alleles , Amino Acid Substitution , Autopsy , Bardet-Biedl Syndrome/genetics , Biopsy , Genotype , Humans , Mutation , Prenatal Diagnosis , Exome Sequencing
