Results 1 - 11 of 11
1.
BMC Bioinformatics ; 22(1): 561, 2021 Nov 23.
Article in English | MEDLINE | ID: mdl-34814826

ABSTRACT

BACKGROUND: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. RESULTS: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89-92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. CONCLUSIONS: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.
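
To illustrate the kind of input a convolutional splice-site predictor typically consumes, the sketch below one-hot encodes a DNA window and lists candidate donor (GT) and acceptor (AG) dinucleotides. It is not Spliceator's code: the function names, the window size, and the one-hot input representation are assumptions made for illustration, since the abstract does not describe the input format.

```python
# Illustrative sketch only; not taken from Spliceator.
# Assumption: sequences are one-hot encoded in A, C, G, T order.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def candidate_sites(seq, dinucleotide):
    """Return the 0-based start position of every occurrence of a dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def window(seq, center, flank):
    """Extract 2*flank+1 bases centered on a position, padding with N at the edges."""
    padded = "N" * flank + seq.upper() + "N" * flank
    return padded[center:center + 2 * flank + 1]


if __name__ == "__main__":
    dna = "ACGGTAAGTCAGGT"  # made-up sequence
    for pos in candidate_sites(dna, "GT"):  # canonical donor dinucleotide
        print(pos, window(dna, pos, 3), len(one_hot(window(dna, pos, 3))))
```

Each encoded window could then be passed to a classifier that scores it as a true or false splice site; that scoring step is the part a trained network would provide.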


Subject(s)
Algorithms , Neural Networks, Computer , Animals , Genome , Humans
2.
J Biol Chem ; 297(2): 100913, 2021 08.
Article in English | MEDLINE | ID: mdl-34175310

ABSTRACT

Trypanosomatid parasites are responsible for various human diseases, such as sleeping sickness, animal trypanosomiasis, or cutaneous and visceral leishmaniases. The few available drugs to fight related parasitic infections are often toxic and present poor efficiency and specificity, and thus, finding new molecular targets is imperative. Aminoacyl-tRNA synthetases (aaRSs) are essential components of the translational machinery as they catalyze the specific attachment of an amino acid onto cognate tRNA(s). In trypanosomatids, one gene encodes both cytosolic- and mitochondrial-targeted aaRSs, with only three exceptions. We identify here a unique specific feature of aaRSs from trypanosomatids, which is that most of them harbor distinct insertion and/or extension sequences. Among the 26 identified aaRSs in the trypanosome Leishmania tarentolae, 14 contain an additional domain or a terminal extension, confirmed in mature mRNAs by direct cDNA nanopore sequencing. Moreover, these RNA-Seq data led us to address the question of aaRS dual localization and to determine splice-site locations and the 5'-UTR lengths for each mature aaRS-encoding mRNA. Altogether, our results provided evidence for at least one specific mechanism responsible for mitochondrial addressing of some L. tarentolae aaRSs. We propose that these newly identified features of trypanosomatid aaRSs could be developed as relevant drug targets to combat the diseases caused by these parasites.


Subject(s)
Amino Acids/metabolism , Amino Acyl-tRNA Synthetases/metabolism , Leishmania/enzymology , Leishmaniasis/pathology , RNA, Transfer/genetics , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/chemistry , Amino Acyl-tRNA Synthetases/genetics , Animals , Cytosol/metabolism , Humans , Leishmania/isolation & purification , Leishmaniasis/enzymology , Leishmaniasis/parasitology , Mitochondria/metabolism , Phylogeny , RNA, Transfer/metabolism , Sequence Homology, Amino Acid
3.
Hum Mutat ; 38(10): 1316-1324, 2017 10.
Article in English | MEDLINE | ID: mdl-28608363

ABSTRACT

Numerous mutations in each of the mitochondrial aminoacyl-tRNA synthetases (aaRSs) have been implicated in human diseases. The mutations are autosomal and recessive and lead mainly to neurological disorders, although with pleiotropic effects. The processes and interactions that drive the etiology of the disorders associated with mitochondrial aaRSs (mt-aaRSs) are far from understood. The complexity of the clinical, genetic, and structural data requires concerted, interdisciplinary efforts to understand the molecular biology of these disorders. Toward this goal, we designed MiSynPat, a comprehensive knowledge base together with an ergonomic Web server designed to organize and access all pertinent information (sequences, multiple sequence alignments, structures, disease descriptions, mutation characteristics, original literature) on the disease-linked human mt-aaRSs. With MiSynPat, a user can also evaluate the impact of a possible mutation on sequence-conservation-structure in order to foster the links between basic and clinical researchers and to facilitate future diagnosis. The proposed integrated view, coupled with research on disease-related mt-aaRSs, will help to reveal new functions for these enzymes and to open new vistas in the molecular biology of the cell. The purpose of MiSynPat, freely available at http://misynpat.org, is to constitute a reference and a converging resource for scientists and clinicians.


Subject(s)
Amino Acyl-tRNA Synthetases/genetics , Databases, Genetic , Mitochondria/enzymology , Mutation/genetics , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/chemistry , Evolution, Molecular , Genetic Diseases, Inborn/genetics , Humans , Mitochondria/genetics , Molecular Structure , Protein Conformation
4.
Biochimie ; 100: 18-26, 2014 May.
Article in English | MEDLINE | ID: mdl-24120687

ABSTRACT

Mammalian mitochondrial aminoacyl-tRNA synthetases are nuclear-encoded enzymes that are essential for mitochondrial protein synthesis. Due to an endosymbiotic origin of the mitochondria, many of them share structural domains with homologous bacterial enzymes of the same specificity. This is also the case for human mitochondrial aspartyl-tRNA synthetase (AspRS) that shares the so-called bacterial insertion domain with bacterial homologs. The function of this domain in the mitochondrial proteins is unclear. Here, we show by bioinformatic analyses that the sequences coding for the bacterial insertion domain are less conserved in opisthokonts and protists than in bacteria and viridiplantae. The divergence suggests a loss of evolutionary pressure on this domain for non-plant mitochondrial AspRSs. This discovery is further connected with the herein described occurrence of alternatively spliced transcripts of the mRNAs coding for some mammalian mitochondrial AspRSs. Interestingly, the spliced transcripts alternately lack one of the four exons that code for the bacterial insertion domain. Although we showed that the human alternative transcript is present in all tested tissues, co-exists with the full-length form, possesses 5'- and 3'-UTRs, a poly-A tail and is bound to polysomes, we were unable to detect the corresponding protein. The relaxed selective pressure combined with the occurrence of alternative splicing, involving a single structural sub-domain, favors the hypothesis of the loss of function of this domain for AspRSs of mitochondrial location. This evolutionary divergence is in line with other characteristics, established for the human mt-AspRS, that indicate a functional relaxation of non-viridiplantae mt-AspRSs when compared to bacterial and plant ones, despite their common ancestry.


Subject(s)
Aspartate-tRNA Ligase/chemistry , Mitochondria/genetics , Mitochondrial Proteins/chemistry , Protein Biosynthesis , RNA, Messenger/chemistry , Alternative Splicing , Alveolata/enzymology , Alveolata/genetics , Amino Acid Sequence , Amoebozoa/enzymology , Amoebozoa/genetics , Animals , Archaea/enzymology , Archaea/genetics , Aspartate-tRNA Ligase/genetics , Aspartate-tRNA Ligase/metabolism , Base Sequence , Evolution, Molecular , Fungi/enzymology , Fungi/genetics , Gene Expression , Humans , Mitochondria/enzymology , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Models, Molecular , Molecular Sequence Data , Mutagenesis, Insertional , Protein Structure, Tertiary , RNA, Messenger/genetics , RNA, Messenger/metabolism , Selection, Genetic , Sequence Alignment , Viridiplantae/enzymology , Viridiplantae/genetics
5.
Nucleic Acids Res ; 40(Web Server issue): W71-5, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22641855

ABSTRACT

A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server allows users to characterize and predict the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.


Subject(s)
Disease/genetics , Mutation, Missense , Polymorphism, Single Nucleotide , Software , Genetic Association Studies , Humans , Internet , Knowledge Bases , Phenotype , Proteins/chemistry , Proteins/genetics
6.
Database (Oxford) ; 2012: bas018, 2012.
Article in English | MEDLINE | ID: mdl-22491796

ABSTRACT

The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats. Database URL: http://decrypthon.igbmc.fr/msv3d.


Subject(s)
Databases, Protein , Mutation, Missense , Proteins/chemistry , Proteins/genetics , Amino Acid Substitution , Database Management Systems , Humans , Internet , Models, Molecular , Polymorphism, Single Nucleotide , Protein Conformation
7.
Hum Mutat ; 31(2): 127-35, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19921752

ABSTRACT

Understanding how genetic alterations affect gene products at the molecular level represents a first step in the elucidation of the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the more long-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to the computational grid data federation facilities provided in the context of the Decrypthon program.


Subject(s)
Databases, Protein , Genetic Diseases, Inborn/genetics , Mutation, Missense/genetics , Software , Humans , Internet , Phenotype , Proteins , User-Computer Interface
8.
Bioinformatics ; 24(2): 276-8, 2008 Jan 15.
Article in English | MEDLINE | ID: mdl-18037684

ABSTRACT

With the establishment of high-throughput (HT) screening methods there is an increasing need for automatic analysis methods. Here we present RReportGenerator, a user-friendly portal for automatic routine analysis using the statistical platform R and Bioconductor. RReportGenerator is designed to analyze data using predefined analysis scenarios via a graphical user interface (GUI). A report in PDF format combining text, figures and tables is automatically generated and results may be exported. To demonstrate suitable analysis tasks we provide direct web access to a collection of analysis scenarios for summarizing data from transfected cell arrays (TCA), segmentation of CGH data, and microarray quality control and normalization. AVAILABILITY: RReportGenerator, a user manual and a collection of analysis scenarios are available under a GNU public license on http://www-bio3d-igbmc.u-strasbg.fr/~wraff


Subject(s)
Algorithms , Computer Graphics , Documentation/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , User-Computer Interface , Data Interpretation, Statistical
9.
BMC Bioinformatics ; 8: 62, 2007 Feb 23.
Article in English | MEDLINE | ID: mdl-17319945

ABSTRACT

BACKGROUND: The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently lead to the identification of hundreds, or even thousands of potential homologues. The huge volume of data renders the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies. RESULTS: An exhaustive analysis on a large scale test set (284 protein families) was performed to compare the efficiency of four different sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and can be divided into two categories: two customisable methods where the user defines either the maximal number or the percentage of sequences to be selected; two automatic methods in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences in BlastP and their associated E-values. The subsequent analyses investigated the influence of the sampling methods on the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds as a function of these criteria. 
CONCLUSION: The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.
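
The two customisable sampling modes described above (a maximal number of sequences, or a percentage of the hits) can be sketched as follows. This is a minimal illustration, assuming that hits are simply ranked by E-value and the best ones kept; the abstract does not say how the published methods choose which sequences to retain, and the function names and the `(identifier, e_value)` tuple layout are hypothetical.

```python
import math

# Illustrative sketch only; the selection rule (keep the lowest E-values)
# is an assumption, not the method evaluated in the paper.


def sample_by_count(hits, max_sequences):
    """Keep at most max_sequences hits, lowest E-value first."""
    ranked = sorted(hits, key=lambda hit: hit[1])
    return ranked[:max_sequences]


def sample_by_percentage(hits, percent):
    """Keep the given percentage of hits (rounded up), lowest E-value first."""
    keep = math.ceil(len(hits) * percent / 100)
    return sample_by_count(hits, keep)


if __name__ == "__main__":
    # Made-up (identifier, E-value) pairs standing in for BlastP output.
    blast_hits = [("seqA", 1e-50), ("seqB", 1e-5), ("seqC", 1e-30), ("seqD", 0.1)]
    print(sample_by_count(blast_hits, 2))
    print(sample_by_percentage(blast_hits, 50))
```

Ranking purely by E-value favours close homologues, so a real sampling strategy would likely also need to preserve diversity across the E-value range, which is the trade-off the study examines.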


Subject(s)
Algorithms , Databases, Protein , Information Storage and Retrieval/methods , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Conserved Sequence , Database Management Systems , Molecular Sequence Data , Sequence Homology, Amino Acid , Structure-Activity Relationship
10.
Acta Crystallogr D Biol Crystallogr ; 59(Pt 12): 2094-103, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14646067

ABSTRACT

Structural refinement of proteins involves the minimization of a target function that combines X-ray data with a set of restraints enforcing stereochemistry and packing. Electrostatic interactions are not ordinarily included in the target function, partly because they cannot be calculated reliably without a description of dielectric screening by solvent in the crystal. With the recent development of accurate implicit solvent models to describe this screening, the question arises as to whether a more detailed target function including electrostatic and solvation terms can yield more accurate structures or somewhat different structures of equivalent accuracy. The Generalized Born (GB) model is one such model that describes the solvent as a dielectric continuum, taking into account its heterogeneous distribution within the crystal. It is used here for X-ray refinements of three protein structures with experimental diffraction data to 2.4, 2.9 and 3.2 Å, respectively. In each case, a higher resolution structure is available for comparison. The new target function includes stereochemical restraints, van der Waals, Coulomb and solvation interactions, along with the usual X-ray pseudo-energy term, which employs the likelihood estimator of Pannu and Read. Multiple simulated-annealing refinements were performed in torsion-angle space with a conventional target function and the new GB target function, yielding ensembles of refined structures. The new target function yields structures of similar accuracy, as measured by the free R factor, map/model correlations and deviations from the high-resolution structures. About 10% of side-chain conformations differ between the two sets of refinements, in the sense that the two ensembles of conformations do not completely overlap. Over 75% of the differences correspond to surface side chains. For one of the proteins, the GB set has a greater dispersion, indicating that for this case the conventional target function overestimates the true precision. As GB parameterization continues to improve, we expect that this approach will become increasingly useful.
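
The composition of the new target function can be written schematically as a sum of the terms named in the abstract. The symbols and the relative weight $w_{\text{X-ray}}$ are assumptions introduced here for illustration; the abstract does not give the functional form or any weighting.

```latex
E_{\text{target}}
  = E_{\text{stereo}}
  + E_{\text{vdW}}
  + E_{\text{Coulomb}}
  + E_{\text{solv}}^{\text{GB}}
  + w_{\text{X-ray}}\, E_{\text{X-ray}}
```

Here $E_{\text{stereo}}$ stands for the stereochemical restraints, $E_{\text{solv}}^{\text{GB}}$ for the Generalized Born solvation term, and $E_{\text{X-ray}}$ for the likelihood-based X-ray pseudo-energy. Under this reading, a conventional target function would omit the Coulomb and solvation terms.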


Subject(s)
Crystallography, X-Ray/methods , Proteins/chemistry , Aspartate-tRNA Ligase/chemistry , Computer Simulation , Histocompatibility Antigens Class I/chemistry , Hydroxymethyl and Formyl Transferases/chemistry , Models, Chemical , Models, Molecular , Protein Conformation , Solvents/chemistry , Static Electricity
11.
Nucleic Acids Res ; 31(13): 3829-32, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12824430

ABSTRACT

PipeAlign is a protein family analysis tool integrating a five-step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a high-quality, validated MACS (multiple alignment of complete sequences) in which sequences are clustered into potential functional subgroups. For the more experienced user, the PipeAlign server also provides numerous options to run only a part of the analysis, with the possibility to modify the default parameters of each software module. For example, the user can choose to enter an existing multiple sequence alignment for refinement, validation and subsequent clustering of the sequences. The aim is to provide an interactive workbench for the validation, integration and presentation of a protein family, not only at the sequence level, but also at the structural and functional levels. PipeAlign is available at http://igbmc.u-strasbg.fr/PipeAlign/.


Subject(s)
Proteins/classification , Sequence Analysis, Protein/methods , Software , Internet , Proteins/chemistry , Quality Control , Sequence Alignment , Software/standards , User-Computer Interface