Results 1 - 13 of 13
1.
Sci Data ; 8(1): 91, 2021 03 25.
Article in English | MEDLINE | ID: mdl-33767203

ABSTRACT

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.


Subject(s)
Data Mining/methods , Organic Chemicals/classification , Pharmaceutical Preparations/classification , Software , Terminology as Topic , PubMed
2.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32743661

ABSTRACT

Glial cells are phenotypically heterogeneous non-neuronal components of the central and peripheral nervous systems. These cells are endowed with diverse functions and molecular machineries to detect and regulate neuronal or their own activities by various secreted mediators, such as proteinaceous factors. In particular, glia-secreted proteins form a basis of a complex network of glia-neuron or glia-glia interactions in health and diseases. In recent years, the analysis and profiling of glial secretomes have raised new expectations for the diagnosis and treatment of neurological disorders due to the vital role of glia in numerous physiological or pathological processes of the nervous system. However, there is no online database of glia-secreted proteins available to facilitate glial research. Here, we developed a user-friendly 'Gliome' database (available at www.gliome.org), a web-based tool to access and analyze glia-secreted proteins. The database provides a vast collection of information on 3293 proteins that are released from glia of multiple species and have been reported to have differential functions under diverse experimental conditions. It contains a web-based interface with the following four key features regarding glia-secreted proteins: (i) fundamental information, such as signal peptide, SecretomeP value, functions and Gene Ontology category; (ii) differential expression patterns under distinct experimental conditions; (iii) disease association; and (iv) interacting proteins. In conclusion, the Gliome database is a comprehensive web-based tool to access and analyze glia-secretome data obtained from diverse experimental settings, whereby it may facilitate the integration of bioinformatics into glial research.


Subject(s)
Databases, Protein , Neuroglia/metabolism , Proteins , Animals , Humans , Internet , Proteins/analysis , Proteins/chemistry , Proteins/metabolism , Software
3.
Nucleic Acids Res ; 48(W1): W5-W11, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32383756

ABSTRACT

Manually annotated data is key to developing text-mining and information-extraction algorithms. However, human annotation requires considerable time, effort and expertise. Given the rapid growth of biomedical literature, it is paramount to build tools that facilitate speed and maintain expert quality. While existing text annotation tools may provide user-friendly interfaces to domain experts, limited support is available for figure display, project management, and multi-user team annotation. In response, we developed TeamTat (https://www.teamtat.org), a web-based annotation tool (local setup available), equipped to manage team annotation projects engagingly and efficiently. TeamTat is a novel tool for managing multi-user, multi-label document annotation, reflecting the entire production life cycle. Project managers can specify annotation schema for entities and relations and select annotator(s) and distribute documents anonymously to prevent bias. Document input format can be plain text, PDF or BioC (uploaded locally or automatically retrieved from PubMed/PMC), and output format is BioC with inline annotations. TeamTat displays figures from the full text for the annotator's convenience. Multiple users can work on the same document independently in their workspaces, and the team manager can track task completion. TeamTat provides corpus quality assessment via inter-annotator agreement statistics, and a user-friendly interface convenient for annotation review and inter-annotator disagreement resolution to improve corpus quality.


Subject(s)
Data Mining/methods , Software , Cooperative Behavior
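The TeamTat abstract mentions inter-annotator agreement statistics without naming the measure. A minimal sketch, assuming exact-match span F1 (a common choice for entity annotation, and symmetric in the two annotators); the (start, end, label) tuples and the function name span_agreement are hypothetical, not TeamTat's API:

```python
from typing import Iterable, Tuple

# A span is (start offset, end offset, entity label); hypothetical representation.
Span = Tuple[int, int, str]


def span_agreement(a: Iterable[Span], b: Iterable[Span]) -> float:
    """Exact-match F1 between two annotators' span sets (symmetric)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    matched = len(set_a & set_b)
    # F1 = 2 * |A ∩ B| / (|A| + |B|), which does not depend on who is "reference"
    return 2 * matched / (len(set_a) + len(set_b))


annotator_1 = [(0, 7, "Chemical"), (15, 22, "Gene"), (30, 38, "Disease")]
annotator_2 = [(0, 7, "Chemical"), (15, 22, "Chemical"), (30, 38, "Disease")]
print(f"{span_agreement(annotator_1, annotator_2):.3f}")  # 0.667
```

A label disagreement on the same span (Gene vs. Chemical) counts as a miss here; a lenient variant could compare spans only, which is one reason agreement figures are reported together with the matching rule.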
4.
Nucleic Acids Res ; 46(W1): W523-W529, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29788413

ABSTRACT

Recently, advanced text-mining techniques have been shown to speed up manual data curation by providing human annotators with automated pre-annotations generated by rules or machine learning models. Due to the limited training data available, however, current annotation systems focus primarily on common concept types such as genes or diseases. To support annotating a wide variety of biological concepts with or without pre-existing training data, we developed ezTag, a web-based annotation tool that allows curators to perform annotation and provide training data with humans in the loop. ezTag supports both abstracts in PubMed and full-text articles in PubMed Central. It also provides lexicon-based concept tagging as well as state-of-the-art pre-trained taggers such as TaggerOne, GNormPlus and tmVar. ezTag is freely available at http://eztag.bioqrator.org.


Subject(s)
Data Curation/methods , Data Mining/methods , Simulation Training/methods , User-Computer Interface , Humans , Internet , PubMed
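ezTag offers lexicon-based concept tagging alongside its trained taggers. The abstract does not describe the matching rule; the sketch below assumes greedy, case-insensitive longest match over word tokens. The function lexicon_tag and the three-entry lexicon are hypothetical illustrations:

```python
import re
from typing import Dict, List, Tuple


def lexicon_tag(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Greedy longest-match tagging of lexicon terms on token boundaries.

    Returns (start, end, matched text, concept type) with character offsets.
    """
    # Token spans (start, end) for runs of word characters and hyphens.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[\w-]+", text)]
    lowered = {term.lower(): ctype for term, ctype in lexicon.items()}
    max_len = max((len(term.split()) for term in lowered), default=0)
    results = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            phrase = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if phrase in lowered:
                match = (start, end, text[start:end], lowered[phrase], n)
                break
        if match:
            results.append(match[:4])
            i += match[4]  # skip past all tokens covered by the match
        else:
            i += 1
    return results


lexicon = {"aspirin": "Chemical", "breast cancer": "Disease", "cancer": "Disease"}
print(lexicon_tag("Aspirin was tested in breast cancer cohorts.", lexicon))
# [(0, 7, 'Aspirin', 'Chemical'), (22, 35, 'breast cancer', 'Disease')]
```

Preferring the longest match tags "breast cancer" as one Disease mention instead of tagging "cancer" alone; a real lexicon tagger would also need normalization (plurals, abbreviations, spelling variants) that this sketch omits.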
5.
Article in English | MEDLINE | ID: mdl-27589962

ABSTRACT

BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams worldwide participated and contributed either new methods or improvements to existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/.


Subject(s)
Data Curation/methods , Data Mining/methods , Electronic Data Processing/methods , Information Dissemination/methods
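The track merged all submitted runs with a machine learning classifier whose features the abstract does not describe. As a deliberately simpler stand-in (majority voting, not the track's method), the sketch below shows how runs that share one annotation layout can be combined; the (offset, length, type) tuples and the function majority_vote are hypothetical:

```python
from collections import Counter
from typing import List, Set, Tuple

Annotation = Tuple[int, int, str]  # (offset, length, type); hypothetical layout


def majority_vote(runs: List[Set[Annotation]], min_votes: int) -> Set[Annotation]:
    """Keep annotations proposed by at least `min_votes` of the runs."""
    # Each run is a set, so an annotation is counted at most once per run.
    votes = Counter(ann for run in runs for ann in run)
    return {ann for ann, count in votes.items() if count >= min_votes}


run_a = {(10, 4, "Gene"), (40, 6, "Gene")}
run_b = {(10, 4, "Gene"), (55, 3, "Organism")}
run_c = {(10, 4, "Gene"), (40, 6, "Gene"), (70, 5, "Gene")}
merged = majority_vote([run_a, run_b, run_c], min_votes=2)
print(sorted(merged))  # [(10, 4, 'Gene'), (40, 6, 'Gene')]
```

A learned merger can weight runs by their past accuracy and use features beyond vote counts; voting only works at all because every team's output uses the same offsets and types, which is the interoperability a shared format such as BioC provides.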
6.
Article in English | MEDLINE | ID: mdl-27515823

ABSTRACT

BioC is an XML-based format designed to provide interoperability for text mining tools and manual curation results. A challenge of BioC as a standard format is to align annotations from multiple systems. Ideally, this should not be a major problem if users follow guidelines given by BioC key files. Nevertheless, misalignment between text and annotations occurs quite often because different systems tend to use different software development environments, e.g. ASCII vs. Unicode. We first implemented the BioC Viewer to assist BioGRID curators as a part of the BioCreative V BioC track (Collaborative Biocurator Assistant Task). For the BioC track, the BioC Viewer helped curate protein-protein interaction and genetic interaction pairs appearing in full-text articles. Here, we describe the BioC Viewer itself as well as improvements made to the BioC Viewer since the BioCreative V Workshop to address the misalignment issue of BioC annotations. When uploaded BioC files come from the same full-text article, the Viewer offers a BioC merge process. If there is a mismatch between an annotated offset and text, the BioC Viewer adjusts the offset to correctly align with the text. The BioC Viewer has a user-friendly interface, where most operations can be performed within a few mouse clicks. The feedback from BioGRID curators has been positive for the web interface, particularly for its usability and learnability. Database URL: http://viewer.bioqrator.org.


Subject(s)
Data Curation/methods , Data Mining/methods , Internet , User-Computer Interface
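The abstract says only that the BioC Viewer adjusts a mismatched offset so that it aligns with the text; it does not give the rule. One plausible rule, assumed here, keeps a correct offset and otherwise snaps to the nearest occurrence of the annotation's text. The function realign_offset and the sample sentence are hypothetical. The example reproduces the ASCII vs. Unicode case by computing an offset in UTF-8 bytes, which drifts by one after the two-byte "é":

```python
from typing import Optional


def realign_offset(passage: str, ann_text: str, offset: int) -> Optional[int]:
    """Return a character offset at which ann_text really occurs in passage.

    Assumed rule (hypothetical): keep the offset if it already matches,
    otherwise snap to the occurrence of ann_text nearest to it.
    Returns None when ann_text is empty or does not occur in passage.
    """
    if not ann_text:
        return None
    if offset >= 0 and passage[offset:offset + len(ann_text)] == ann_text:
        return offset
    best = None
    start = passage.find(ann_text)
    while start != -1:
        if best is None or abs(start - offset) < abs(best - offset):
            best = start
        start = passage.find(ann_text, start + 1)
    return best


passage = "Café extract: GeneA binds ProteinB."
# Offset counted in UTF-8 bytes (as a byte-oriented tool might report it).
byte_offset = passage.encode("utf-8").find("ProteinB".encode("utf-8"))
print(byte_offset, realign_offset(passage, "ProteinB", byte_offset))  # 27 26
```

Snapping to the nearest occurrence matters when a term appears several times in a passage; when the cause is known to be byte counting, decoding the byte prefix (`len(passage.encode("utf-8")[:byte_offset].decode("utf-8"))`) converts the offset directly.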
7.
Article in English | MEDLINE | ID: mdl-25052701

ABSTRACT

The time-consuming nature of manual curation and the rapid growth of biomedical literature severely limit the number of articles that database curators can scrutinize and annotate. Hence, semi-automatic tools can be a valid support to increase annotation throughput. Although a handful of curation assistant tools are already available, to date, little has been done to formally evaluate their benefit to biocuration. Moreover, most curation tools are designed for specific problems, so it is not easy to apply one annotation tool to multiple tasks. BioQRator is a publicly available web-based tool for annotating biomedical literature. It was designed to support general tasks, i.e. any task annotating entities and relationships. In the BioCreative IV edition, BioQRator was tailored for protein-protein interaction (PPI) annotation by migrating information from PIE the search. The results obtained from six curators showed that the precision on the top 10 documents doubled with PIE the search compared with PubMed search results. It was also observed that the annotation time for a full PPI annotation task decreased for a beginner-intermediate level annotator. This finding is encouraging because text-mining techniques were not directly involved in the full annotation task and BioQRator can be easily integrated with any text-mining resources. Database URL: http://www.bioqrator.org/.


Subject(s)
Data Curation/methods , Data Mining/methods , Internet , Protein Interaction Mapping/methods , Software , Humans
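"Precision on the top 10 documents" is precision at k = 10: the fraction of the first ten ranked results that are relevant. A minimal sketch of the metric; the document IDs and relevance judgments below are made up for illustration (not the study's data) and simply show what a doubling from 0.3 to 0.6 looks like:

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_ids[:k]
    # Divide by k (not len(top)), so short result lists are penalized.
    return sum(1 for doc in top if doc in relevant) / k


relevant = {"doc2", "doc5", "doc9", "doc11", "doc12", "doc13"}
search_a = [f"doc{i}" for i in range(1, 11)]  # doc1 ... doc10
search_b = ["doc11", "doc2", "doc12", "doc5", "doc1",
            "doc13", "doc9", "doc3", "doc4", "doc6"]
print(precision_at_k(search_a, relevant), precision_at_k(search_b, relevant))  # 0.3 0.6
```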
8.
Article in English | MEDLINE | ID: mdl-24961236

ABSTRACT

As part of a communitywide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by a DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ and Java, BioC provides libraries of code for reading and writing BioC text documents and annotations. We extend BioC to Perl, Python, Go and Ruby. We used SWIG to extend the C++ implementation for Perl and one Python implementation. A second Python implementation and the Ruby implementation use native data structures and libraries. BioC is also implemented in the Google language Go. BioC modules are functional in all of these languages, which can facilitate text mining tasks. BioC implementations are freely available through the BioC site: http://bioc.sourceforge.net. Database URL: http://bioc.sourceforge.net/


Subject(s)
Data Mining/methods , Programming Languages , Algorithms , Databases as Topic , User-Computer Interface
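The BioC libraries listed above are the supported way to read and write BioC. Purely to show the document layout, the sketch below reads a toy collection with Python's standard xml.etree.ElementTree instead of any BioC library. Element names follow the BioC DTD (collection, document, passage, annotation, location); the sample content and the function read_annotations are hypothetical, and the sketch assumes, as BioC key files conventionally specify, that location offsets count from the document start:

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Toy BioC collection with made-up content.
SAMPLE = """<collection>
  <source>example</source>
  <date>20240101</date>
  <key>example.key</key>
  <document>
    <id>doc-1</id>
    <passage>
      <infon key="type">abstract</infon>
      <offset>100</offset>
      <text>GeneA interacts with GeneB.</text>
      <annotation id="T1">
        <infon key="type">Gene</infon>
        <location offset="100" length="5"/>
        <text>GeneA</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Gene</infon>
        <location offset="121" length="5"/>
        <text>GeneB</text>
      </annotation>
    </passage>
  </document>
</collection>"""

Row = Tuple[Optional[str], Optional[str], Optional[str], int, int, bool]


def read_annotations(xml_text: str) -> List[Row]:
    """List (doc id, annotation id, type, offset, length, aligned) per annotation.

    Assumes passage-level text and one contiguous <location> per annotation.
    """
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for doc in root.findall("document"):
        doc_id = doc.findtext("id")
        for passage in doc.findall("passage"):
            p_offset = int(passage.findtext("offset", default="0"))
            p_text = passage.findtext("text", default="")
            for ann in passage.findall("annotation"):
                ann_type = next(
                    (inf.text for inf in ann.findall("infon") if inf.get("key") == "type"),
                    None,
                )
                loc = ann.find("location")
                if loc is None:
                    continue
                off, length = int(loc.get("offset")), int(loc.get("length"))
                # Document-level offset minus passage offset gives the
                # position inside this passage's text.
                local = off - p_offset
                aligned = p_text[local:local + length] == ann.findtext("text")
                rows.append((doc_id, ann.get("id"), ann_type, off, length, aligned))
    return rows


for row in read_annotations(SAMPLE):
    print(row)
```

The aligned flag is the kind of text/offset consistency check that the BioC Viewer entry above describes as a frequent source of trouble when files come from different systems.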
9.
Pflugers Arch ; 466(2): 173-82, 2014 Feb.
Article in English | MEDLINE | ID: mdl-23677537

ABSTRACT

Transient receptor potential (TRP) channels are a large family of non-selective cation channels that mediate numerous physiological and pathophysiological processes; however, the underlying molecular mechanisms remain largely unknown. With data generated on an unprecedented scale, network-based approaches have been revolutionizing the way in which we understand biology and disease, discover disease genes, and develop therapeutic strategies. These developments create opportunities to bring TRP channel research into data-intensive science. In this review, we provide an introduction to network-based approaches in biomedical science, describe the current state of TRP channel network biology, and discuss the future direction of TRP channel research. A network perspective will facilitate the discovery of latent roles and underlying mechanisms of TRP channels in biology and disease.


Subject(s)
Protein Interaction Maps , Transient Receptor Potential Channels/physiology , Databases, Protein , Humans , Protein Multimerization
10.
PLoS One ; 7(10): e47165, 2012.
Article in English | MEDLINE | ID: mdl-23071747

ABSTRACT

Transient receptor potential (TRP) channels are a family of Ca(2+)-permeable cation channels that play a crucial role in biological and disease processes. To advance TRP channel research, we previously created the TRIP (TRansient receptor potential channel-Interacting Protein) Database, a manually curated database that compiles scattered information on TRP channel protein-protein interactions (PPIs). However, the database needed improvement in information accessibility and data utilization. Here, we present the TRIP Database 2.0 (http://www.trpchannel.org) in which many helpful, user-friendly web interfaces have been developed to facilitate knowledge acquisition and inspire new approaches to studying TRP channel functions: 1) the PPI information found in the supplementary data of referenced articles was curated; 2) the PPI summary matrix enables users to intuitively grasp overall PPI information; 3) the search capability has been expanded to retrieve information from 'PubMed' and 'PIE the search' (a specialized search engine for PPI-related articles); and 4) the PPI data are available as SIF files for network visualization and analysis using 'Cytoscape'. Therefore, our TRIP Database 2.0 is an information hub that works toward advancing data-driven TRP channel research.


Subject(s)
Databases, Protein , Protein Interaction Maps , Transient Receptor Potential Channels/metabolism , Computational Biology , Information Dissemination , Internet , Software , Transient Receptor Potential Channels/physiology , User-Computer Interface
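TRIP 2.0 exports PPI data as SIF (Simple Interaction Format) files for Cytoscape. In SIF, each line names a source node, an interaction type, and one or more target nodes. A minimal sketch of writing and reading that layout; the protein names are placeholders (not TRIP data), "pp" is used as the conventional protein-protein edge type, and the helpers write_sif and read_sif are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source node, interaction type, target node)


def write_sif(edges: Iterable[Edge]) -> str:
    """Serialize edges as tab-delimited SIF lines: source, type, target."""
    return "".join(f"{src}\t{itype}\t{tgt}\n" for src, itype, tgt in edges)


def read_sif(text: str) -> List[Edge]:
    """Parse SIF text; one line may list several targets for one source."""
    edges: List[Edge] = []
    for line in text.splitlines():
        # Tabs are the delimiter when present, otherwise any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # blank line or a lone node with no interactions
        src, itype, *targets = fields
        edges.extend((src, itype, tgt) for tgt in targets)
    return edges


pairs = [("ChannelX", "pp", "ProteinA"), ("ChannelX", "pp", "ProteinB"),
         ("ProteinA", "pp", "ProteinC")]
sif_text = write_sif(pairs)
print(read_sif(sif_text) == pairs)  # True
# Node degree: a first network-level summary of the interaction data.
degree = Counter(node for src, _, tgt in read_sif(sif_text) for node in (src, tgt))
print(degree.most_common(2))
```

Writing tab-delimited lines keeps node names that contain spaces intact when the file is loaded into a network tool.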
11.
Nucleic Acids Res ; 40(Database issue): D331-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135292

ABSTRACT

The Death Domain (DD) superfamily, which is one of the largest classes of protein interaction modules, plays a pivotal role in apoptosis, inflammation, necrosis and immune cell signaling pathways. Because aberrant or inappropriate DD superfamily-mediated signaling events are associated with various human diseases, such as cancers, neurodegenerative diseases and immunological disorders, the studies in these fields are of great biological and clinical importance. To facilitate the understanding of the molecular mechanisms by which the DD superfamily is associated with biological and disease processes, we have developed the DD database (http://www.deathdomain.org), a manually curated database that aims to offer comprehensive information on protein-protein interactions (PPIs) of the DD superfamily. The DD database was created by manually curating 295 peer-reviewed studies that were published in the literature; the current version documents 175 PPI pairs among the 99 DD superfamily proteins. The DD database provides a detailed summary of the DD superfamily proteins and their PPI data. Users can find in-depth information that is specified in the literature on relevant analytical methods, experimental resources and domain structures. Our database provides a definitive and valuable tool that assists researchers in understanding the signaling network that is mediated by the DD superfamily.


Subject(s)
Databases, Protein , Death Domain Receptor Signaling Adaptor Proteins/chemistry , Death Domain Receptor Signaling Adaptor Proteins/metabolism , Protein Interaction Mapping , Sequence Analysis, Protein , User-Computer Interface
12.
Bioinformatics ; 28(4): 597-8, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22199390

ABSTRACT

MOTIVATION: Finding protein-protein interaction (PPI) information in the literature is a challenging but important task. However, keyword search in PubMed® is often time-consuming because it requires a series of actions that refine keywords and browse search results until the goal is reached. Due to the rapid growth of biomedical literature, it has become more difficult for biologists and curators to locate PPI information quickly. Therefore, a tool for prioritizing PPI-informative articles can be a useful assistant for finding this PPI-relevant information. RESULTS: PIE (Protein Interaction information Extraction) the search is a web service implementing a competition-winning approach utilizing word and syntactic analyses by machine learning techniques. For easy user access, PIE the search provides a PubMed-like search environment, but the output is the list of articles prioritized by PPI confidence scores. By obtaining PPI-related articles at high rank, researchers can more easily find the up-to-date PPI information, which cannot be found in manually curated PPI databases. AVAILABILITY: http://www.ncbi.nlm.nih.gov/IRET/PIE/.


Subject(s)
Artificial Intelligence , Proteins/metabolism , Protein Interaction Mapping , PubMed
13.
Nucleic Acids Res ; 39(Database issue): D356-61, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20851834

ABSTRACT

Transient receptor potential (TRP) channels are a superfamily of Ca(2+)-permeable cation channels that translate cellular stimuli into electrochemical signals. Aberrant activity of TRP channels has been implicated in a variety of human diseases, such as neurological disorders, cardiovascular disease and cancer. To facilitate the understanding of the molecular network by which TRP channels are associated with biological and disease processes, we have developed the TRIP (TRansient receptor potential channel-Interacting Protein) Database (http://www.trpchannel.org), a manually curated database that aims to offer comprehensive information on protein-protein interactions (PPIs) of mammalian TRP channels. The TRIP Database was created by systematically curating 277 peer-reviewed articles; the current version documents 490 PPI pairs, 28 TRP channels and 297 cellular proteins. The TRIP Database provides a detailed summary of PPI data that fit into four categories: screening, validation, characterization and functional consequence. Users can find in-depth information specified in the literature on relevant analytical methods and experimental resources, such as gene constructs and cell/tissue types. The TRIP Database has user-friendly web interfaces with helpful features, including a search engine, an interaction map and a function for cross-referencing useful external databases. Our TRIP Database will provide a valuable tool to assist in understanding the molecular regulatory network of TRP channels.


Subject(s)
Databases, Protein , Transient Receptor Potential Channels/metabolism , Animals , Humans , Mammals/metabolism , Protein Interaction Mapping , User-Computer Interface