1.
Comput Biol Med ; 172: 108288, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38503094

ABSTRACT

Data sharing among different institutions represents one of the major challenges in developing distributed machine learning approaches, especially when data is sensitive, such as in medical applications. Federated learning is a possible solution, but requires fast communications and flawless security. Here, we propose SYNDSURV (SYNthetic Distributed SURVival), an alternative approach that simplifies the current state-of-the-art paradigm by allowing different centres to generate local simulated instances from real data and then gather them into a centralised hub, where an Artificial Intelligence (AI) model can learn in a standard way. The main advantage of this procedure is that it is model-agnostic, therefore prediction models can be directly applied in distributed applications without requiring particular adaptations as the current federated approaches do. To show the validity of our approach for medical applications, we tested it on a survival analysis task, offering a viable alternative to train AI models on distributed data. While federated learning has been mainly optimised for gradient-based approaches so far, our framework works with any predictive method, proving to be a comparable way of performing distributed learning without being too demanding towards each participating institute in terms of infrastructural requirements.


Subject(s)
Artificial Intelligence , Machine Learning , Survival Analysis
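
The workflow in the abstract above (each centre generates simulated instances locally, a central hub pools them and trains a model in the standard way) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how centres generate synthetic data or which survival model the hub uses, so the jittered bootstrap in `synthesize` and the Kaplan-Meier estimator in `kaplan_meier` are hypothetical stand-ins, and the toy records are invented.

```python
import random


def synthesize(records, n, rng, jitter=0.05):
    """Draw n synthetic (time, event) records from one centre's real data.

    Assumption: a jittered bootstrap stands in for whatever generator a
    centre would really use; it offers no privacy guarantee by itself.
    """
    synthetic = []
    for _ in range(n):
        time, event = rng.choice(records)
        noisy_time = max(0.0, time * (1.0 + rng.uniform(-jitter, jitter)))
        synthetic.append((noisy_time, event))
    return synthetic


def kaplan_meier(records):
    """Return [(time, survival probability)] at each distinct event time."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, event in records if time == t and event)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored records both leave the risk set
    return curve


rng = random.Random(0)
# Hypothetical local data: (follow-up time, event observed?)
centre_a = [(5.0, True), (8.0, False), (12.0, True), (20.0, True)]
centre_b = [(3.0, True), (9.0, True), (15.0, False), (30.0, True)]

# Only synthetic records leave each centre; the hub pools them and fits a model.
pooled = synthesize(centre_a, 50, rng) + synthesize(centre_b, 50, rng)
curve = kaplan_meier(pooled)
```

Because the hub only ever sees an ordinary pooled table, `kaplan_meier` could be swapped for any other predictive method without changing what the centres do, which is the model-agnostic property the abstract emphasises.
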
2.
Genes (Basel) ; 14(12)2023 12 17.
Article in English | MEDLINE | ID: mdl-38137050

ABSTRACT

Missense variation in genomes can affect protein structural stability and, in turn, cell physiology. Predicting the impact of those variations is relevant, and the best-performing computational tools exploit protein structure information. However, most current protein sequence variants have no experimentally resolved structure, although comparative or ab initio tools can provide a model. Here, we evaluate the impact of model structures, compared to experimental structures, on the predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representation are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated from comparative modeling, even on single-residue substitutions.


Subject(s)
Computational Biology , Point Mutation , Computational Biology/methods , Proteins/metabolism , Protein Stability , Amino Acid Sequence
3.
Nucleic Acids Res ; 47(W1): W136-W141, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114899

ABSTRACT

As the amount of genomic variation data increases, tools that are able to score the functional impact of single nucleotide variants become more and more necessary. While there are several prediction servers available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between Pathogenic and Benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. It is able to classify and score the impact of variants in both coding and non-coding regions based on sequence features within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, 0.77 Matthews correlation coefficient and 0.91 Area Under the ROC Curve.


Subject(s)
Genome/genetics , Genomics , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Animals , Dogs , Genetic Variation , Genome-Wide Association Study , Genotype , Internet
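
The three figures quoted in the abstract above (overall accuracy, Matthews correlation coefficient, and area under the ROC curve) are standard binary-classification metrics. The sketch below shows how each is usually computed; the function names and the toy inputs are hypothetical and are not taken from Fido-SNP or its validation set.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when a margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of samples, which is fine for a small illustration.
    """
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented): label 1 = Pathogenic, 0 = Benign.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
```

Unlike accuracy, the Matthews coefficient uses all four confusion-matrix cells symmetrically, so it is less easily inflated when one class dominates the data.
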
4.
BMC Bioinformatics ; 14 Suppl 3: S4, 2013.
Article in English | MEDLINE | ID: mdl-23514411

ABSTRACT

BACKGROUND: In the genomic era a key issue is protein annotation, namely how to endow protein sequences, upon translation from the corresponding genes, with structural and functional features. Routinely this operation is done electronically by deriving and integrating information from previous knowledge. The reference database for protein sequences is UniProtKB, divided into two sections: UniProtKB/TrEMBL, which is automatically annotated and not reviewed, and UniProtKB/Swiss-Prot, which is manually annotated and reviewed. The annotation process is essentially based on sequence similarity search. The question therefore arises as to what extent annotation based on transfer by inheritance is valuable, and specifically whether it is possible to statistically validate inherited features when little homology exists between the target sequence and its template(s). RESULTS: In this paper we address the problem of annotating protein sequences in a statistically validated manner, taking UniProtKB as the reference annotation resource. The test case is the set of 48,298 proteins recently released by the Critical Assessment of Function Annotations (CAFA) organization. We show that, after validation, we can transfer Gene Ontology (GO) terms of the three main categories and Pfam domains to about 68% and 72% of the sequences, respectively. This is possible after alignment of the CAFA sequences against BAR+, our annotation resource that discriminates between statistically validated and not statistically validated annotation. By comparing with a direct UniProtKB annotation, we find that besides validating annotation of some 78% of the CAFA set, we assign new and statistically validated annotation to 14.8% of the sequences and find new structural templates for about 25% of the chains, half of which share less than 30% sequence identity with the corresponding template(s).
CONCLUSION: Inheritance of annotation by transfer generally requires a careful selection of the identity value between the target and the template in order to transfer structural and/or functional features. Here we prove that even remote homologs can be safely endowed with structural templates and GO and/or Pfam terms, provided that annotation is done within clusters collecting cluster-related protein sequences and where a statistical validation of the shared structural and functional features is possible.


Subject(s)
Molecular Sequence Annotation , Sequence Analysis, Protein , Cluster Analysis , Data Interpretation, Statistical , Databases, Protein , Genomics , Proteins/genetics , Proteins/physiology , Vocabulary, Controlled
5.
Nucleic Acids Res ; 39(Web Server issue): W197-202, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21622657

ABSTRACT

We introduce BAR-PLUS (BAR(+)), a web server for functional and structural annotation of protein sequences. BAR(+) is based on a large-scale genome cross comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters. In this version, the method takes advantage of a large-scale pairwise sequence comparison of 13,495,736 protein chains, also including 988 complete proteomes. Available sequence annotation is derived from UniProtKB, GO, Pfam and PDB. When PDB templates are present within a cluster (with or without their SCOP classification), profile Hidden Markov Models (HMMs) are computed on the basis of sequence to structure alignment and are cluster-associated (Cluster-HMM). From these, a library of 10,858 HMMs is made available for aligning even distantly related sequences for structural modelling. The server also provides pairwise query sequence-structural target alignments computed from the corresponding Cluster-HMM. BAR(+) in its present version allows three main categories of annotation: PDB [with or without SCOP (*)] and GO and/or Pfam; PDB (*) without GO and/or Pfam; GO and/or Pfam without PDB (*) and no annotation. Each category can further comprise clusters where GO and Pfam functional annotations are or are not statistically significant. BAR(+) is available at http://bar.biocomp.unibo.it/bar2.0.


Subject(s)
Molecular Sequence Annotation , Protein Conformation , Sequence Analysis, Protein , Software , Cluster Analysis , Internet , Proteins/physiology , Sequence Alignment
6.
Nucleic Acids Res ; 39(Database issue): D80-5, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21051348

ABSTRACT

Alternative splicing is emerging as a major mechanism for the expansion of the transcriptome and proteome diversity, particularly in human and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256,939 protein variants from 17,191 multi-exon genes have been extensively annotated through state-of-the-art machine learning tools providing information on the protein type (globular and transmembrane), localization, presence of PFAM domains, signal peptides, GPI-anchor propeptides, transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE-tags and polyA signal and/or polyA sites, marking transcription initiation and termination sites, respectively. The retrieval can be carried out at gene, transcript, exon, protein or splice site level, allowing the selection of data sets fulfilling one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.


Subject(s)
Alternative Splicing , Databases, Genetic , Proteins/chemistry , Proteins/genetics , Exons , Genetic Variation , Humans , Protein Isoforms/chemistry , Protein Isoforms/genetics , RNA, Messenger/chemistry , Sequence Analysis, Protein , User-Computer Interface
7.
BMC Bioinformatics ; 9 Suppl 2: S6, 2008 Mar 26.
Article in English | MEDLINE | ID: mdl-18387208

ABSTRACT

BACKGROUND: A basic question of protein structural studies is to what extent mutations affect the stability. This question may be addressed starting from sequence and/or from structure. In proteomics and genomics studies prediction of protein stability free energy change (ΔΔG) upon single point mutation may also help the annotation process. The experimental ΔΔG values are affected by uncertainty as measured by standard deviations. Most of the ΔΔG values are nearly zero (about 32% of the ΔΔG data set ranges from -0.5 to 0.5 kcal/mol) and both the value and sign of ΔΔG may be either positive or negative for the same mutation, blurring the relationship between mutations and expected ΔΔG value. In order to overcome this problem we describe a new predictor that discriminates between 3 mutation classes: destabilizing mutations (ΔΔG<-1.0 kcal/mol), stabilizing mutations (ΔΔG>1.0 kcal/mol) and neutral mutations (-1.0

Subject(s)
Models, Chemical , Models, Molecular , Mutagenesis, Site-Directed/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Amino Acid Substitution , Computer Simulation , Drug Stability , Molecular Sequence Data , Structure-Activity Relationship
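
The three mutation classes named in the abstract above can be written as a simple thresholding rule. This is a minimal sketch, not the predictor itself: the function name `ddg_class` is hypothetical, and because the abstract text breaks off at the definition of the neutral class, treating it as the closed interval from -1.0 to 1.0 kcal/mol is an assumption.

```python
def ddg_class(ddg_kcal_per_mol, threshold=1.0):
    """Map a stability change (kcal/mol) to one of three mutation classes.

    Sign convention as in the abstract: negative values destabilize.
    Assumption: values within [-threshold, threshold] count as neutral.
    """
    if ddg_kcal_per_mol < -threshold:
        return "destabilizing"
    if ddg_kcal_per_mol > threshold:
        return "stabilizing"
    return "neutral"


# Hypothetical measurements, for illustration only.
for value in (-2.3, -0.4, 0.0, 1.7):
    print(value, ddg_class(value))
```

Grouping values into classes in this way sidesteps the problem raised in the abstract, where small experimental values near zero can change sign between measurements of the same mutation.
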
8.
Proc Natl Acad Sci U S A ; 104(13): 5495-500, 2007 Mar 27.
Article in English | MEDLINE | ID: mdl-17372197

ABSTRACT

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.


Subject(s)
Alternative Splicing , RNA Precursors , Databases, Genetic , Gene Expression Regulation , Genome, Human , Humans , Internet , Models, Molecular , Protein Conformation , Protein Isoforms , Protein Sorting Signals , Protein Structure, Tertiary , Proteins/chemistry , RNA Splicing
9.
Brief Bioinform ; 8(2): 78-87, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17003074

ABSTRACT

The detection of remote homolog pairs of proteins using computational methods is a pivotal problem in structural bioinformatics, aiming to compute protein folds on the basis of information in the database of known structures. In the last 25 years, several methods have been developed to tackle this problem, based on different approaches including sequence-sequence alignments and/or structure comparison. In this article, we will briefly discuss When, Why, Where and How (WWWH) to perform remote homology search, reviewing some of the most widely adopted computational approaches. The specific aim is to highlight the basic criteria implemented by different research groups and to comment on the state of the art as well as on still-open questions.


Subject(s)
Algorithms , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Conserved Sequence , Molecular Sequence Data , Review Literature as Topic
10.
Nucleic Acids Res ; 34(Web Server issue): W169-72, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16844984

ABSTRACT

The annotation efforts of the BIOSAPIENS European Network of Excellence have generated several distributed annotation systems (DAS) with the aim of integrating Bioinformatics resources and annotating metazoan genomes (http://www.biosapiens.info). In this context, the PONGO DAS server (http://pongo.biocomp.unibo.it) provides annotation on a predictive basis for the all-alpha membrane proteins in the human genome, not only through DAS queries, but also directly using a simple web interface. In order to produce a more comprehensive analysis of the sequence at hand, this annotation is carried out with four selected and high scoring predictors: TMHMM2.0, MEMSAT, PRODIV and ENSEMBLE1.0. The stored and pre-computed predictions for the human proteins can be searched and displayed in a graphical view. However, the web service allows the prediction of the topology of any kind of putative membrane proteins, regardless of the organism and more importantly with the same sequence profile for a given sequence when required. Here we present a new web server that incorporates the state-of-the-art topology predictors in a single framework, so that users can interactively compare and evaluate four predictions simultaneously for a given sequence. Together with the predicted topology, the server also displays a signal peptide prediction determined with SPEP. The PONGO web server is available at http://pongo.biocomp.unibo.it/pongo.


Subject(s)
Membrane Proteins/chemistry , Software , Humans , Internet , Protein Conformation , Sequence Analysis, Protein
11.
Nucleic Acids Res ; 33(Web Server issue): W198-201, 2005 Jul 01.
Article in English | MEDLINE | ID: mdl-15980454

ABSTRACT

TRAMPLE (http://gpcr.biocomp.unibo.it/biodec/) is a web application server dedicated to the detection and the annotation of transmembrane protein sequences. TRAMPLE includes different state-of-the-art algorithms for the prediction of signal peptides, transmembrane segments (both beta-strands and alpha-helices), secondary structure and fast fold recognition. TRAMPLE also includes a complete content management system to manage the results of the predictions. Each user of the server has his/her own workplace, where the data can be stored, organized, accessed and annotated with documents through a simple web-based interface. In this manner, TRAMPLE significantly improves usability with respect to other more traditional web servers.


Subject(s)
Membrane Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Computer Security , Database Management Systems , Internet , Protein Folding , Protein Sorting Signals , Protein Structure, Secondary
12.
J Phys Chem B ; 109(8): 3586-93, 2005 Mar 03.
Article in English | MEDLINE | ID: mdl-16851397

ABSTRACT

Boltzmann-like distributions appear in many properties and energy-related quantities of proteins. A few examples are hydrophobicity, various types of side-chain/side-chain interactions, proline isomerization, hydrogen bonds, internal cavities, interactions at the level of specific atom types, and the propensity of the phi/'phi' ratio. Here, we conjecture that the Boltzmann hypothesis also holds for the intra-residue energy distribution. We confirm the conjecture by calculating the energies of 41,672 residues of the structures of highly resolved proteins, where at least 12 out of 20 naturally occurring amino acids follow Boltzmann's law. We further examine the entire set of all residue energies and find that the convolution of the individual distributions gives a Poisson function, which is followed by approximately 50% of individual proteins' structures.


Subject(s)
Chemistry, Physical/methods , Proteins/chemistry , Amino Acids/chemistry , Hydrogen Bonding , Models, Chemical , Models, Statistical , Poisson Distribution , Probability , Proline/chemistry , Protein Conformation
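
For reference, a Boltzmann distribution assigns each energy level a probability proportional to exp(-E / kT). The sketch below computes those probabilities for a list of energies. It is a generic illustration, not the authors' analysis: the function name, the choice of kcal/mol units, the default temperature, and the example energies are all assumptions.

```python
import math

# Gas constant in kcal/(mol*K), i.e. the Boltzmann constant per mole.
K_B = 0.0019872041


def boltzmann_probabilities(energies, temperature=298.15):
    """Return the Boltzmann probability of each energy level (kcal/mol)."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical energy levels, for illustration only.
probs = boltzmann_probabilities([0.0, 0.5, 1.0, 2.0])
print(probs)
```

Testing whether observed energies follow such a law typically amounts to checking that the logarithm of the observed frequencies falls on a straight line when plotted against energy.
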
13.
Proteins ; 54(2): 351-60, 2004 Feb 01.
Article in English | MEDLINE | ID: mdl-14696197

ABSTRACT

Detection of homologous proteins with low sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profiles. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program.


Subject(s)
Computational Biology/methods , Entropy , Proteins/chemistry , Sequence Alignment/methods , Sequence Homology, Amino Acid , Algorithms , Conserved Sequence , Databases, Protein , Models, Molecular , Protein Folding , Sensitivity and Specificity , Software
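
The Shannon entropy mentioned in the abstract above measures how variable a profile column is: 0 bits for a fully conserved position, rising to log2(20) bits when all 20 residues are equally likely. The sketch below computes it per column and averaged over a profile. The abstract does not say how the entropy is turned into a filter, so the function names and the toy profile here are assumptions for illustration only.

```python
import math


def column_entropy(frequencies):
    """Shannon entropy (bits) of one profile column of residue frequencies."""
    return -sum(p * math.log2(p) for p in frequencies if p > 0)


def mean_profile_entropy(profile):
    """Average column entropy over a whole sequence profile."""
    return sum(column_entropy(column) for column in profile) / len(profile)


# Toy profile over a 4-letter alphabet (invented): one conserved column,
# one maximally variable column.
profile = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
print(mean_profile_entropy(profile))
```

A filter in this spirit would keep or discard candidate alignments according to a cut-off on such an entropy value; any specific cut-off would have to come from the paper itself.
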
14.
Braz. j. microbiol ; 34(1): 61-65, Jan.-Apr. 2003. ilus, tab, graf
Article in English | LILACS | ID: lil-344567

ABSTRACT

The purpose of this study was to assess the myceliation rate, mycelial vigor and "estimated biomass" of Lentinula edodes (Berk.) Pegler, grown on a sugarcane bagasse substrate enriched with rice bran and sugarcane molasses for spawn production. The proportions of rice bran used were 0, 10, 15, 20, 25, 30 and 40 percent (dry weight/dry weight of bagasse) and the sugarcane molasses concentrations tested were 0, 10, 20, 30, 40, 50 and 60 g/kg (dry weight/dry weight of bagasse plus rice bran). The myceliation rate was decreased by the addition of the higher quantities of rice bran. The 25 and 30 percent rice bran proportions induced the highest stimulation of mycelial vigor. The addition of sugarcane molasses did not change myceliation rate or mycelial vigor. The "estimated biomass" values were similar when intermediate rice bran proportions were used and for all sugarcane molasses concentrations. Based on the response surface obtained for the "estimated biomass" data, higher values were obtained with substrates containing 20 to 25 percent rice bran combined with 10 to 30 g sugarcane molasses, although the latter supplement was not considered to stimulate L. edodes growth.


Subject(s)
In Vitro Techniques , Molasses/analysis , Mycelium/growth & development , Mycelium/isolation & purification , Shiitake Mushrooms , Saccharum/growth & development , Saccharum/microbiology , Biomass , Methods
15.
Braz. j. microbiol ; 34(1): 66-71, Jan.-Apr. 2003. graf
Article in English | LILACS | ID: lil-344568

ABSTRACT

This investigation was performed to evaluate the biological efficiency (BE), mean mushroom weight (MMW), mean number of mushrooms (MNM) and mushroom quality of Shiitake [Lentinula edodes (Berk.) Pegler] when grown on a sterilized substrate composed of sugarcane bagasse enriched with rice bran and sugarcane molasses. The proportions of rice bran were 0, 15, 20, 25 and 30 percent (dry weight/dry weight of bagasse); and the concentrations of sugarcane molasses were 0, 30 and 60 g/kg (dry weight/dry weight of bagasse plus rice bran). Four flushes were obtained during the production cycle, providing 3 accumulated productions which were used for production analysis. The substrate supplemented with 25 and 30 percent rice bran yielded the highest BE (98.42 and 99.84 percent, respectively, about 230 days after spawning) and MNM, and initially produced a lower MMW than the substrates supplemented with 15 and 20 percent rice bran. Any amount of rice bran added to the sugarcane bagasse improved mushroom quality, with the best production of marketable mushrooms obtained by the addition of 15 percent rice bran. The largest amount of sugarcane molasses (60 g/kg) increased BE (90.3 and 23.6 percent, on first and second accumulated productions, respectively) and MNM, and no quantity affected mushroom quality.


Subject(s)
In Vitro Techniques , Molasses/analysis , Shiitake Mushrooms , Saccharum/growth & development , Methods , Substrates for Biological Treatment
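
The abstract above reports biological efficiency (BE) and mean mushroom weight (MMW) without defining them. The sketch below uses the definition of BE that is common in mushroom cultivation studies, fresh mushroom weight as a percentage of dry substrate weight. Whether this study used exactly that formula is an assumption, and the function names and input figures are hypothetical.

```python
def biological_efficiency(fresh_mushroom_g, dry_substrate_g):
    """BE (%) = fresh mushroom weight / dry substrate weight x 100.

    Assumption: this is the commonly used definition, not one quoted
    from the abstract.
    """
    return 100.0 * fresh_mushroom_g / dry_substrate_g


def mean_mushroom_weight(fresh_mushroom_g, mushroom_count):
    """MMW = total fresh weight divided by the number of mushrooms."""
    return fresh_mushroom_g / mushroom_count


# Invented example: 492 g of fresh mushrooms (40 fruiting bodies)
# harvested from 500 g of dry substrate.
print(biological_efficiency(492.0, 500.0))
print(mean_mushroom_weight(492.0, 40))
```
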
16.
Bioinformatics ; 19(4): 500-5, 2003 Mar 01.
Article in English | MEDLINE | ID: mdl-12611805

ABSTRACT

MOTIVATION: A problem in predicting the topography of transmembrane proteins is the optimal localization of the transmembrane segments along the protein sequences, provided that each residue is associated with a propensity of being or not being included in the transmembrane protein region. From previous work it is known that post-processing of propensity signals with suitable algorithms can greatly improve the quality and the accuracy of the predictions. In this paper we describe a general dynamic programming-like algorithm (MaxSubSeq, Maximal SubSequence) specifically designed to optimize the number and length of segments with constrained length in a given protein sequence. Previous application of our algorithm has proved its effectiveness in the optimization task of both neural network and hidden Markov model outputs, and in this paper we present the detailed description of MaxSubSeq. RESULTS: We describe the application of MaxSubSeq to the location of both helical and beta strand transmembrane segments, optimizing the outputs derived with different predictive algorithms. For all-alpha transmembrane proteins we use both the standard Kyte-Doolittle (KD) hydropathy scale and the TMHMM predictor (http://www.cbs.dtu.dk/). Using a set of 188 well characterized membrane proteins, MaxSubSeq nearly doubles the correct location of transmembrane segments as compared to the standard KD hydrophobicity plot, reaching 51% accuracy. If MaxSubSeq is used to optimize the TMHMM method, the accuracy increases from 68 to 72%. When used to regularize the prediction of beta transmembrane strands, obtained using both neural network and HMM-based predictors, MaxSubSeq increases the accuracy per protein up to 72 and 73%, respectively. AVAILABILITY: The program is available upon request to the authors, or it is accessible through our web server (http://gpcr.biocomp.unibo.it/predictors/).


Subject(s)
Algorithms , Membrane Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Animals , Calcium-Transporting ATPases/chemistry , Cattle , Electron Transport Complex IV/chemistry , Membrane Proteins/genetics , Molecular Conformation , Molecular Sequence Data , Porins/chemistry , Quality Control , Rabbits , Reproducibility of Results , Salmonella typhimurium/chemistry , Sarcoplasmic Reticulum/chemistry , Sensitivity and Specificity , Sequence Homology, Amino Acid
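
The kind of optimisation described in the abstract above (choosing the number and placement of length-constrained segments from per-residue propensities) can be illustrated with a small dynamic program. This sketch is not the published MaxSubSeq algorithm, whose details the abstract does not give. It assumes that each residue has a signed score (positive favours membrane), that segments must be between `min_len` and `max_len` residues long, and that consecutive segments are separated by at least `min_gap` residues; the function name and toy scores are hypothetical.

```python
def best_segments(scores, min_len, max_len, min_gap=1):
    """Pick non-overlapping segments maximising the summed residue score.

    Returns (total score, [(start, end)]) with half-open residue ranges.
    """
    n = len(scores)
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)

    best = [0.0] * (n + 1)   # best[i]: optimum using residues [0, i)
    choice = [0] * (n + 1)   # length of the segment ending at i, 0 if none
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option 1: residue i-1 lies outside any segment
        for length in range(min_len, min(max_len, i) + 1):
            start = i - length
            # option 2: a segment covers [start, i), preceded by a gap
            total = prefix[i] - prefix[start] + best[max(0, start - min_gap)]
            if total > best[i]:
                best[i] = total
                choice[i] = length

    # Trace back the chosen segments from the end of the sequence.
    segments = []
    i = n
    while i > 0:
        if choice[i]:
            start = i - choice[i]
            segments.append((start, i))
            i = max(0, start - min_gap)
        else:
            i -= 1
    segments.reverse()
    return best[n], segments


# Toy propensity signal (invented): two hydrophobic stretches.
signal = [-1, 2, 2, 2, -1, -1, 3, 3, -1]
print(best_segments(signal, min_len=2, max_len=3))
```

With raw hydropathy values, such as Kyte-Doolittle scores, a baseline would normally be subtracted first so that non-membrane residues score negative; otherwise the optimum would simply cover as much of the sequence as the length limits allow.
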
17.
J Org Chem ; 64(22): 8122-8126, 1999 Oct 29.
Article in English | MEDLINE | ID: mdl-11674725

ABSTRACT

The triethylamine-catalyzed addition reactions of benzenethiol to 2-cyclopenten-1-one and its 2- and 3-methyl derivatives have been found to be appreciably reversible in chloroform solution. Rates and equilibria have been carefully measured at 25 degrees C in order to assess the negative influence on addition exerted by methyl groups substituted on the carbon-carbon double bond. 2-Methyl-2-cyclopenten-1-one has been found to react with benzenethiol under kinetic control to give the cis adduct as the sole detectable product in a highly stereoselective anti addition process. However, on prolonged reaction times the system slowly evolved toward a new state of equilibrium in which the more stable trans adduct, derived from a syn addition mode, was the predominant isomer.
