Results 1 - 20 of 79
1.
Food Chem Toxicol ; 187: 114638, 2024 May.
Article in English | MEDLINE | ID: mdl-38582341

ABSTRACT

With a society increasingly demanding alternative protein food sources, new strategies for evaluating protein safety issues, such as allergenic potential, are needed. Large-scale and systemic studies on allergenic proteins are hindered by the limited and non-harmonized clinical information available for these substances in dedicated databases. A key piece of missing information is the symptomatology of the allergens, especially when expressed in standard vocabularies, which would allow these data to be connected with other biomedical resources for studies related to human health. In this work, we have generated the first resource with a comprehensive annotation of allergens' symptomatology, using a text-mining approach that extracts significant co-mentions between these entities from the scientific literature (PubMed, ∼36 million abstracts). The method identifies statistically significant co-mentions between the textual descriptions of the two types of entities in the literature as an indication of a relationship. A total of 1,180 clinical signs extracted from the Human Phenotype Ontology and the Medical Subject Headings terms of PubMed, together with other allergen-specific symptoms, were linked to 1,036 unique allergens annotated in two main allergen-related public databases via 14,009 relationships. This novel resource, publicly available through an interactive web interface, could serve as a starting point for future manually curated compilations of allergen symptomatology.


Subject(s)
Allergens , Data Mining , Humans , Data Mining/methods , Databases, Factual , Proteins/metabolism
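
The abstract above does not state which statistical test is used to call a co-mention significant. A minimal sketch, assuming a one-sided hypergeometric test on abstract counts (the function name `comention_pvalue` and all counts are hypothetical, chosen only for illustration), could look like this:

```python
from math import comb

def comention_pvalue(n_total, n_a, n_b, n_both):
    """One-sided hypergeometric p-value for a co-mention.

    Assumption (not taken from the abstract): abstracts mentioning term A
    are treated as a random draw of size n_a from n_total abstracts, of
    which n_b mention term B. Returns P(overlap >= n_both).
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_b, k) * comb(n_total - n_b, n_a - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_a)

# Toy corpus of 10 abstracts: an allergen in 3, a symptom in 3, both in 3.
p_value = comention_pvalue(n_total=10, n_a=3, n_b=3, n_both=3)
```

Exact integer arithmetic keeps the toy example transparent but would be slow at the scale of tens of millions of abstracts; a library routine such as `scipy.stats.hypergeom.sf` would be the more practical choice there.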
2.
Database (Oxford) ; 2024, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38564426

ABSTRACT

The CoMentG resource contains millions of relationships between terms of biomedical interest obtained from the scientific literature. At the core of the system is a methodology for detecting significant co-mentions of concepts in the entire PubMed corpus. That method was applied to nine sets of terms covering the most important classes of biomedical concepts: diseases, symptoms/clinical signs, molecular functions, biological processes, cellular compartments, anatomic parts, cell types, bacteria and chemical compounds. We obtained more than 7 million relationships between more than 74 000 terms, and many types of relationships were not available in any other resource. As the terms were obtained from widely used resources and ontologies, the relationships are given using the standard identifiers provided by them and hence can be linked to other data. A web interface allows users to browse these associations, searching for relationships for a set of terms of interest provided as input, such as between a disease and its associated symptoms, underlying molecular processes or affected tissues. The results are presented in an interactive interface where the user can explore the reported relationships in different ways and follow links to other resources. Database URL: https://csbg.cnb.csic.es/CoMentG/.


Subject(s)
Publications , PubMed , Databases, Factual
3.
Microorganisms ; 11(6), 2023 May 31.
Article in English | MEDLINE | ID: mdl-37374967

ABSTRACT

Considering the ban on the use of antibiotics as growth stimulators in the livestock industry, the use of microbiota modulators appears to be an alternative solution to improve animal performance. This review aims to describe the effect of different families of modulators on the gastrointestinal microbiota of poultry, pigs and ruminants and their consequences on host physiology. To this end, 65, 32 and 4 controlled trials or systematic reviews were selected from PubMed for poultry, pigs and ruminants, respectively. Microorganisms and their derivatives were the most studied modulator family in poultry, while in pigs, the micronutrient family was the most investigated. With only four controlled trials selected for ruminants, it was difficult to draw conclusions about the modulators of interest for this species. For some modulators, most studies showed a beneficial effect on both the phenotype and the microbiota. This was the case for probiotics and plants in poultry and minerals and probiotics in pigs. These modulators seem to be a good way to improve animal performance.

4.
Nucleic Acids Res ; 51(W1): W305-W309, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37178003

ABSTRACT

MBROLE (Metabolites Biological Role) facilitates the biological interpretation of metabolomics experiments. It performs enrichment analysis of a set of chemical compounds through statistical analysis of annotations from several databases. The original MBROLE server was released in 2011 and, since then, different groups worldwide have used it to analyze metabolomics experiments from a variety of organisms. Here we present the latest version of the system, MBROLE3, accessible at http://csbg.cnb.csic.es/mbrole3. This new version contains updated annotations from previously included databases as well as a wide variety of new functional annotations, such as additional pathway databases and Gene Ontology terms. Of special relevance is the inclusion of a new category of annotations, 'indirect annotations', extracted from the scientific literature and from curated chemical-protein associations. The latter allows the analysis of enriched annotations of the proteins known to interact with the set of chemical compounds of interest. Results are provided in the form of interactive tables, formatted data to download, and graphical plots.


Subject(s)
Metabolomics , Proteins , Software , Databases, Factual , Gene Ontology , Metabolomics/methods
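
The enrichment analysis mentioned in the abstract above can be illustrated with a generic over-representation test. This is a minimal sketch, assuming a hypergeometric test followed by Benjamini-Hochberg correction; the abstract does not specify the statistics MBROLE3 uses, and the compound identifiers, annotation names and function names below are hypothetical.

```python
from math import comb

def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` items from `population`,
    of which `annotated` carry the annotation."""
    upper = min(sample, annotated)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enrich(query, background, annotations):
    """Return {term: (hits, p, q)} for each annotation term."""
    background = set(background)
    query = set(query) & background
    terms = sorted(annotations)
    hits = [len(query & annotations[t] & background) for t in terms]
    pvals = [
        hypergeom_tail(len(background), len(annotations[t] & background),
                       len(query), h)
        for t, h in zip(terms, hits)
    ]
    qvals = benjamini_hochberg(pvals)
    return {t: (h, p, q) for t, h, p, q in zip(terms, hits, pvals, qvals)}

# Toy data: 10 background compounds, two invented annotation sets.
background = [f"c{i}" for i in range(1, 11)]
annotations = {
    "pathway_A": {"c1", "c2", "c3"},
    "pathway_B": {"c4", "c5", "c6", "c7", "c8"},
}
result = enrich({"c1", "c2", "c3"}, background, annotations)
```

The choice of background set matters as much as the test itself: restricting it to compounds that could plausibly have been measured changes every p-value.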
5.
Genes (Basel) ; 14(4), 2023 04 19.
Article in English | MEDLINE | ID: mdl-37107700

ABSTRACT

Scientific knowledge is being accumulated in the biomedical literature at an unprecedented pace. The most widely used database with biomedicine-related article abstracts, PubMed, currently contains more than 36 million entries. Users performing searches in this database for a subject of interest face thousands of entries (articles) that are difficult to process manually. In this work, we present an interactive tool for automatically digesting large sets of PubMed articles: PMIDigest (PubMed IDs digester). The system allows for classification/sorting of articles according to different criteria, including the type of article and different citation-related figures. It also calculates the distribution of MeSH (medical subject headings) terms for categories of interest, providing a picture of the themes addressed in the set. These MeSH terms are highlighted in the article abstracts in different colors depending on the category. An interactive representation of the interarticle citation network is also presented in order to easily locate article "clusters" related to particular subjects, as well as their corresponding "hub" articles. In addition to PubMed articles, the system can also process a set of Scopus or Web of Science entries. In summary, with this system, the user can have a "bird's eye view" of a large set of articles and their main thematic tendencies and obtain additional information not evident in a plain list of abstracts.


Subject(s)
Bibliometrics , Humans , PubMed , Databases, Factual
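
Two of the summaries described in the abstract above, the MeSH term distribution and the "hub" articles of the citation network, can be sketched in a few lines. This is an illustration only, not PMIDigest's implementation: the record layout, the article IDs and the use of in-set citation counts as the hub criterion are all assumptions.

```python
from collections import Counter

# Hypothetical records: IDs, MeSH terms and citation links are invented.
articles = {
    "art1": {"mesh": ["Data Mining", "PubMed"], "cites": ["art2", "art3"]},
    "art2": {"mesh": ["Data Mining", "Allergens"], "cites": ["art3"]},
    "art3": {"mesh": ["PubMed"], "cites": []},
    "art4": {"mesh": ["Data Mining"], "cites": ["art3", "ext9"]},
}

def mesh_distribution(records):
    """Count in how many articles each MeSH term appears."""
    counts = Counter()
    for rec in records.values():
        counts.update(set(rec["mesh"]))  # count a term once per article
    return counts

def citation_in_degree(records):
    """Count citations received from other articles inside the set."""
    degree = Counter({art_id: 0 for art_id in records})
    for rec in records.values():
        for cited in set(rec["cites"]):
            if cited in records:  # ignore citations leaving the set
                degree[cited] += 1
    return degree

mesh_counts = mesh_distribution(articles)
in_degree = citation_in_degree(articles)
hub = max(in_degree, key=in_degree.get)  # most-cited article within the set
```

Counting only citations between articles in the set keeps the hub measure specific to the query, rather than rewarding articles that are simply highly cited overall.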
6.
J Mol Biol ; 434(11): 167568, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35662459

ABSTRACT

The mining of the massive amounts of biomedical information is hindered by the still scarce representation of these data using formal vocabularies and ontologies, which is necessary for cross-linking conceptual entities between different resources and, in general, representing the information in a computer-tractable way. Basic things such as retrieving a comprehensive list of associations between complex diseases and their reported symptoms or underlying biological processes, given in terms of formal identifiers, are not trivial and, in many cases, these have to be generated by manual curation or inferred/predicted from indirect evidence. In this work, using a text-mining approach based on detecting significant co-mentions in the scientific literature, we generated a resource with millions of relationships between thousands of terms representing diseases, symptoms, biological processes, molecular functions and cellular compartments, all given in terms of formal identifiers of these terms in the main resources dealing with them. We show some examples that highlight the differences between these relationships and those that are available in other resources. These relationships can be queried and inspected in an interactive web interface freely available at: https://sysbiol.cnb.csic.es/CoMent.


Subject(s)
Computational Biology , Data Mining
7.
Genes (Basel) ; 13(6), 2022 06 17.
Article in English | MEDLINE | ID: mdl-35741843

ABSTRACT

Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete "diseases"; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. In contrast, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.


Subject(s)
Phenotype , Humans
8.
Adv Protein Chem Struct Biol ; 130: 39-57, 2022.
Article in English | MEDLINE | ID: mdl-35534114

ABSTRACT

There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.


Subject(s)
Computational Biology , Proteins , Algorithms , Amino Acid Sequence , Biotechnology , Databases, Protein , Machine Learning , Proteins/chemistry
9.
Bioengineering (Basel) ; 8(12), 2021 Dec 03.
Article in English | MEDLINE | ID: mdl-34940354

ABSTRACT

Specificity Determining Positions (SDPs) are protein sites responsible for functional specificity within a family of homologous proteins. These positions are extracted from a family's multiple sequence alignment and complement the fully conserved positions as predictors of functional sites. SDP analysis is now routinely used for locating these specificity-related sites in families of proteins of biomedical or biotechnological interest with the aim of mutating them to switch specificities or design new ones. There are many different approaches for detecting these positions in multiple sequence alignments. Nevertheless, existing methods report the potential SDPs but do not provide any clue about the physicochemical basis behind the functional specificity, which has to be inferred a posteriori by manually inspecting these positions in the alignment. In this work, a new methodology is presented that, concomitantly with the detection of the SDPs, automatically provides information on the amino-acid physicochemical properties most related to the change in specificity. This new method is applied to two different multiple sequence alignments of homologs of the well-studied RasH protein representing different cases of functional specificity, and the results are discussed in detail.

10.
BMC Bioinformatics ; 22(1): 320, 2021 Jun 12.
Article in English | MEDLINE | ID: mdl-34118870

ABSTRACT

BACKGROUND: Assignment of chemical compounds to biological pathways is a crucial step to understand the relationship between the chemical repertory of an organism and its biology. Protein sequence profiles are very successful in capturing the main structural and functional features of a protein family, and can be used to assign new members to it based on matching of their sequences against these profiles. In this work, we extend this idea to chemical compounds, constructing a profile-inspired model for a set of related metabolites (those in the same biological pathway), based on a fragment-based vectorial representation of their chemical structures. RESULTS: We use this representation to predict the biological pathway of a chemical compound with good overall accuracy (AUC 0.74-0.90 depending on the database tested), and analyze some factors that affect performance. The approach, which is compared with equivalent methods, can in addition detect those molecular fragments characteristic of a pathway. CONCLUSIONS: The method is available as a graphical interactive web server at http://csbg.cnb.csic.es/iFragMent.


Subject(s)
Proteins , Software , Amino Acid Sequence , Databases, Factual , Internet
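
The profile idea in the abstract above can be made concrete with a toy example. This is a rough sketch under stated assumptions, not the iFragMent algorithm: compounds are represented as sets of fragment labels, a pathway profile is the fraction of member compounds containing each fragment, and a query is scored by cosine similarity. The fragment labels, pathway names and function names are invented.

```python
from math import sqrt

def build_profile(members):
    """members: list of fragment sets -> {fragment: frequency in pathway}."""
    counts = {}
    for fragments in members:
        for frag in fragments:
            counts[frag] = counts.get(frag, 0) + 1
    return {frag: c / len(members) for frag, c in counts.items()}

def profile_score(compound, profile):
    """Cosine similarity between a binary fragment vector and a profile."""
    dot = sum(profile.get(frag, 0.0) for frag in compound)
    norm_compound = sqrt(len(compound))
    norm_profile = sqrt(sum(w * w for w in profile.values()))
    if norm_compound == 0 or norm_profile == 0:
        return 0.0
    return dot / (norm_compound * norm_profile)

def predict_pathway(compound, profiles):
    """Return the name of the best-scoring pathway profile."""
    return max(profiles, key=lambda name: profile_score(compound, profiles[name]))

# Invented fragment sets for two hypothetical pathways.
profiles = {
    "pathway_A": build_profile([{"C=O", "O-H", "C-C"}, {"C=O", "O-H"}]),
    "pathway_B": build_profile([{"N-H", "C-N"}, {"N-H", "C-N", "C-C"}]),
}
query = {"C=O", "O-H"}
best = predict_pathway(query, profiles)
```

Fragments with a high frequency in one profile and a low frequency in the others are natural candidates for the pathway-characteristic fragments the abstract mentions.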
11.
Hum Genet ; 140(3): 457-475, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32778951

ABSTRACT

Copy number variation (CNV) related disorders tend to show complex phenotypic profiles that do not match known diseases. This makes it difficult to ascertain their underlying molecular basis. A potential solution is to compare the affected genomic regions for multiple patients that share a pathological phenotype, looking for commonalities. Here, we present a novel approach to associate phenotypes with functional systems, in terms of GO categories and KEGG and Reactome pathways, based on patient data. The approach uses genomic and phenomic data from the same patients, finding shared genomic regions between patients with similar phenotypes. These regions are mapped to genes to find associated functional systems. We applied the approach to analyse patients in the DECIPHER database with de novo CNVs, finding functional systems associated with most phenotypes, often due to mutations affecting related genes in the same genomic region. Manual inspection of the ten top-scoring phenotypes found multiple functional system (FunSys) connections supported by previous studies for seven of them. The workflow also produces reports focussed on the genes and FunSys connected to the different phenotypes, alongside patient-specific reports, which give details of the associated genes and FunSys for each individual in the cohort. These can be run in "confidential" mode, preserving patient confidentiality. The workflow presented here can be used to associate phenotypes with functional systems using data at the level of a whole cohort of patients, identifying important connections that could not be found when considering them individually. The full workflow is available for download, enabling it to be run on any patient cohort for which phenotypic and CNV data are available.


Subject(s)
DNA Copy Number Variations , Genetic Predisposition to Disease , Genotype , Phenotype , Cohort Studies , Databases, Genetic , Humans
12.
Bioinformatics ; 37(8): 1076-1082, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135068

ABSTRACT

MOTIVATION: Predicting the residues controlling a protein's interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. RESULTS: In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect this kind of residue in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology in a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent 'unsupervised' method that does not use interactome information. AVAILABILITY AND IMPLEMENTATION: http://csbg.cnb.csic.es/pazos/Xdet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins , Software , Humans , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
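
The idea of matching amino-acid changes at an alignment position to similarities in interaction context can be sketched as follows. This is an illustrative simplification, not the published method: residue similarity is reduced to identity, interaction-context similarity to the Jaccard index of partner sets, and the two are compared with a Pearson correlation. The protein names, sequences and partner sets are invented.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity of two partner sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def position_scores(alignment, partners):
    """Score each column by how well residue identity between protein
    pairs tracks the similarity of their interaction partners."""
    names = sorted(alignment)
    pairs = list(combinations(names, 2))
    context = [jaccard(partners[a], partners[b]) for a, b in pairs]
    length = len(alignment[names[0]])
    scores = []
    for col in range(length):
        same = [1.0 if alignment[a][col] == alignment[b][col] else 0.0
                for a, b in pairs]
        scores.append(pearson(same, context))
    return scores

# Invented toy family: P1/P2 share partners, P3/P4 share other partners.
alignment = {"P1": "AKG", "P2": "AKS", "P3": "ARG", "P4": "ARS"}
partners = {"P1": {"X", "Y"}, "P2": {"X", "Y"},
            "P3": {"Z", "W"}, "P4": {"Z", "W"}}
scores = position_scores(alignment, partners)
```

In this toy family the middle column (K in P1/P2, R in P3/P4) follows the partner grouping exactly, the first column is fully conserved and therefore uninformative, and the last column varies independently of the partners.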
13.
Biol Methods Protoc ; 5(1): bpaa025, 2020.
Article in English | MEDLINE | ID: mdl-33376807

ABSTRACT

The environmental fate of many functional molecules that are produced on a large scale as precursors or as additives to specialty goods (plastics, fibers, construction materials, etc.), let alone those synthesized by the pharmaceutical industry, is generally unknown. Assessing their environmental fate is crucial when taking decisions on the manufacturing, handling, usage, and release of these substances, as is the evaluation of their toxicity in humans and other higher organisms. While these data are often hard to come by, the experimental data already available on the biodegradability and toxicity of many unusual compounds (including genuinely xenobiotic molecules) make it possible to develop machine learning systems to predict these features. As such, we have created a predictor of the "risk" associated with the use and release of any chemical. This new system merges computational methods to predict biodegradability with others that assess biological toxicity. The combined platform, named BiodegPred (https://sysbiol.cnb.csic.es/BiodegPred/), provides an informed prognosis of the chance a given molecule can eventually be catabolized in the biosphere, as well as of its eventual toxicity, all available through a simple web interface. While the platform described does not give much information about specific degradation kinetics or particular biodegradation pathways, BiodegPred has been instrumental in anticipating the probable behavior of a large number of new molecules (e.g. antiviral compounds) for which no biodegradation data previously existed.

14.
PLoS Genet ; 16(10): e1009054, 2020 10.
Article in English | MEDLINE | ID: mdl-33001999

ABSTRACT

Genetic and molecular analysis of rare disease is made difficult by the small numbers of affected patients. Phenotypic comorbidity analysis can help rectify this by combining information from individuals with similar phenotypes and looking for overlap in terms of shared genes and underlying functional systems. However, few studies have combined comorbidity analysis with genomic data. We present a computational approach that connects patient phenotypes based on phenotypic co-occurrence and uses genomic information related to the patient mutations to assign genes to the phenotypes, which are used to detect enriched functional systems. These phenotypes are clustered using network analysis to obtain functionally coherent phenotype clusters. We applied the approach to the DECIPHER database, containing phenotypic and genomic information for thousands of patients with heterogeneous rare disorders and copy number variants. Validity was demonstrated through overlap with known diseases, co-mention within the biomedical literature, semantic similarity measures, and patient cluster membership. These connected pairs formed multiple phenotype clusters, showing functional coherence, and mapped to genes and systems involved in similar pathological processes. Examples include claudin genes from the 22q11 genomic region associated with a cluster of phenotypes related to DiGeorge syndrome and genes related to the GO term anterior/posterior pattern specification associated with abnormal development. The clusters generated can help with the diagnosis of rare diseases, by suggesting additional phenotypes for a given patient and potential underlying functional systems. Other tools to find causal genes based on phenotype were also investigated. The approach has been implemented as a workflow, named PhenCo, which can be adapted to any set of patients for which phenomic and genomic data are available. Full details of the analysis, including the clusters formed, their constituent functional systems and underlying genes, are given. Code to implement the workflow is available from GitHub.


Subject(s)
Comorbidity , Genetic Predisposition to Disease , Genomics , Rare Diseases/genetics , DNA Copy Number Variations/genetics , Databases, Genetic , Genetic Association Studies , Genome, Human/genetics , Genotype , Humans , Mutation/genetics , Phenotype , Rare Diseases/diagnosis , Rare Diseases/pathology
15.
Bioengineering (Basel) ; 6(3), 2019 Jul 25.
Article in English | MEDLINE | ID: mdl-31349743

ABSTRACT

Computational tools are essential in the process of designing a CRISPR/Cas experiment for the targeted modification of an organism's genome. Among other functionalities, these tools facilitate the design of a guide-RNA (gRNA) for a given nuclease that maximizes its binding to the intended genomic site, while avoiding binding to undesired sites with similar sequences in the genome of the organism of interest (off-targets). Due to the popularity of this methodology and the rapid pace at which it evolves and changes, new computational tools show up constantly. This rapid turnover, together with the intrinsic high death-rate of bioinformatics tools, means that many of the published tools become unavailable at some point. Consequently, the traditional ways to inform the community about the landscape of available tools, i.e., reviews in the scientific literature, are not adequate for this fast-moving field. To overcome these limitations, we have developed "WeReview: CRISPR Tools," a live, on-line, user-updatable repository of computational tools to assist researchers in designing CRISPR/Cas experiments. On its web site, users can find an updated comprehensive list of tools and search for those fulfilling their specific needs, as well as propose modifications to the data associated with the tools or the incorporation of new ones.

16.
Bioinformatics ; 35(18): 3482-3483, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30844057

ABSTRACT

MOTIVATION: The results of some experimental and computational techniques are given in terms of large sets of organisms, especially prokaryotic. While their distinctive features can provide useful data regarding specific phenomena, there are no automated tools for extracting them. RESULTS: We present here the Bacterial Feature Finder web server, a tool to automatically interrogate sets of prokaryotic organisms provided by the user to evaluate their specific biological features. At the core of the system is a searchable database of qualitative and quantitative features compiled for more than 23 000 prokaryotic organisms. Both the input set of organisms and the background set used to calculate the enriched features can be directly provided by the user, or they can be obtained by searching the database. The results are presented via an interactive graphical interface, with links to external resources. AVAILABILITY AND IMPLEMENTATION: The web server is freely available at http://csbg.cnb.csic.es/BaFF. It has been tested in the main web browsers and does not require any special plug-ins or additional software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Internet , Software , Computational Biology , Databases, Factual , Prokaryotic Cells
17.
Brief Bioinform ; 20(4): 1329-1336, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29351590

ABSTRACT

Daily work in molecular biology presently depends on a large number of computational tools. An in-depth, large-scale study of that 'ecosystem' of Web tools, its characteristics, interconnectivity, patterns of usage/citation, temporal evolution and rate of decay is crucial for understanding the forces that shape it and for informing initiatives aimed at its funding, long-term maintenance and improvement. In particular, the long-term maintenance of these tools is compromised because of their specific development model. Hundreds of published studies become irreproducible de facto, as the software tools used to conduct them become unavailable. In this study, we present a large-scale survey of >5400 publications describing Web servers within the two main bibliographic resources for disseminating new software developments in molecular biology. For all these servers, we studied their citation patterns, the subjects they address, their citation networks and the temporal evolution of these factors. We also analysed how these factors affect the availability of these servers (whether they are alive). Our results show that this ecosystem of tools is highly interconnected and adapts to the 'trendy' subjects at any given moment. The servers present characteristic temporal patterns of citation/usage, and there is a worrying rate of server 'death', which is influenced by factors such as the server popularity and the institution that hosts it. These results can inform initiatives aimed at the long-term maintenance of these resources.


Subject(s)
Molecular Biology/statistics & numerical data , Software , Computational Biology/methods , Computational Biology/trends , Internet , Molecular Biology/trends , Periodicals as Topic/statistics & numerical data , Software/trends
18.
Biol Methods Protoc ; 4(1): bpz012, 2019.
Article in English | MEDLINE | ID: mdl-32395629

ABSTRACT

Due to the large interdependence between the molecular components of living systems, many phenomena, including those related to pathologies, cannot be explained in terms of a single gene or a small number of genes. Molecular networks, representing different types of relationships between molecular entities, embody these large sets of interdependences in a framework that allows them to be mined from a systemic point of view to obtain information. These networks, often generated from high-throughput omics datasets, are used to study the complex phenomena of human pathologies from a systemic point of view. Complementing the reductionist approach of molecular biology, based on the detailed study of a small number of genes, systemic approaches to human diseases consider that these are better reflected in large and intricate networks of relationships between genes. These networks, and not the single genes, provide both better markers for diagnosing diseases and targets for treating them. Network approaches are being used to gain insight into the molecular basis of complex diseases and interpret the large datasets associated with them, such as genomic variants. Network formalism is also suitable for integrating large, heterogeneous and multilevel datasets associated with diseases from the molecular level to organismal and epidemiological scales. Many of these approaches are available to nonexpert users through standard software packages.

19.
BMC Genomics ; 19(1): 847, 2018 Nov 28.
Article in English | MEDLINE | ID: mdl-30486775

ABSTRACT

BACKGROUND: Epigenetic phenomena are crucial for explaining the phenotypic plasticity seen in the cells of different tissues, developmental stages and diseases, all holding the same DNA sequence. As technology now allows epigenetic information to be retrieved in a genome-wide fashion, massive epigenomic datasets are being accumulated in public repositories. New approaches are required to mine those data to extract useful knowledge. We present here an automatic approach for detecting genomic regions with epigenetic variation patterns across samples related to a grouping of these samples, as a way of detecting regions functionally associated with the phenomenon behind the classification. RESULTS: We show that the regions automatically detected by the method in the whole human genome associated with three different classifications of a set of epigenomes (cancer vs. healthy, brain vs. other organs, and fetal vs. adult tissues) are enriched in genes associated with these processes. CONCLUSIONS: The method is fully automatic and can exhaustively scan the whole human genome at any resolution using large collections of epigenomes as input, although it also produces good results with small datasets. Consequently, it will be valuable for obtaining functional information from the incoming epigenomic information as it continues to accumulate.


Subject(s)
Computational Biology/methods , Epigenesis, Genetic , Genome, Human , Automation , Brain/metabolism , Databases, Genetic , Fetus/metabolism , Humans , Neoplasms/genetics
20.
BMC Bioinformatics ; 19(1): 67, 2018 02 27.
Article in English | MEDLINE | ID: mdl-29482506

ABSTRACT

BACKGROUND: The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as the only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS: In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS: These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.


Subject(s)
Amino Acids/chemistry , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Binding Sites , Conserved Sequence , Molecular Sequence Annotation , Sequence Alignment , Time Factors
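
The notion of fully conserved alignment positions mentioned in the abstract above can be illustrated with a per-column Shannon entropy. This is a generic sketch, not one of the five methods evaluated in the paper; the toy alignment, the gap handling and the function names are assumptions.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of the residues in a column, ignoring gaps.
    Returns None for all-gap columns."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conserved_positions(alignment, max_entropy=0.0):
    """Return 0-based indices of columns with entropy <= max_entropy."""
    positions = []
    for index, column in enumerate(zip(*alignment)):
        entropy = column_entropy(column)
        if entropy is not None and entropy <= max_entropy:
            positions.append(index)
    return positions

# Invented toy alignment of four equally long sequences.
toy_alignment = ["AKG", "AKS", "ARG", "ARS"]
fully_conserved = conserved_positions(toy_alignment)
```

Rerunning such a function on alignments built from successively larger sequence sets is one simple way to see how a conservation call changes as databases grow.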