Results 1-14 of 14
1.
BMC Bioinformatics ; 17: 43, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26792120

ABSTRACT

BACKGROUND: Here we introduce the Protein Sequence Annotation Tool (PSAT), a web-based sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. RESULTS: In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well-annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. CONCLUSIONS: PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA format. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.


Subject(s)
Genome, Bacterial , Herbaspirillum/genetics , High-Throughput Nucleotide Sequencing/methods , Internet , Molecular Sequence Annotation/methods , Software , Computational Biology/methods , Computers , Water Microbiology
2.
J Autoimmun ; 50: 77-82, 2014 May.
Article in English | MEDLINE | ID: mdl-24387802

ABSTRACT

Previous cross-sectional analyses demonstrated that CD8(+) and CD4(+) T-cell reactivity to islet-specific antigens was more prevalent in T1D subjects than in healthy donors (HD). Here, we examined T1D-associated epitope-specific CD4(+) T-cell cytokine production and autoreactive CD8(+) T-cell frequency on a monthly basis for one year in 10 HD, 33 subjects with T1D, and 15 subjects with T2D. Autoreactive CD4(+) T-cells from both T1D and T2D subjects produced more IFN-γ when stimulated than cells from HD. In contrast, higher frequencies of islet antigen-specific CD8(+) T-cells were detected only in T1D. These observations support the hypothesis that general beta-cell stress drives autoreactive CD4(+) T-cell activity while islet over-expression of MHC class I commonly seen in T1D mediates amplification of CD8(+) T-cells and more rapid beta-cell loss. In conclusion, CD4(+) T-cell autoreactivity appears to be present in both T1D and T2D while autoreactive CD8(+) T-cells are unique to T1D. Thus, autoreactive CD8(+) cells may serve as a more T1D-specific biomarker.


Subject(s)
Autoantigens/immunology , CD4-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/immunology , Diabetes Mellitus, Type 1/immunology , Diabetes Mellitus, Type 2/immunology , Islets of Langerhans/immunology , Adult , Aged , CD4-Positive T-Lymphocytes/pathology , CD8-Positive T-Lymphocytes/pathology , Case-Control Studies , Cytotoxicity, Immunologic , Diabetes Mellitus, Type 1/pathology , Diabetes Mellitus, Type 2/pathology , Enzyme-Linked Immunospot Assay , Female , Humans , Interferon-gamma/biosynthesis , Islets of Langerhans/pathology , Longitudinal Studies , Male , Middle Aged
3.
BMC Bioinformatics ; 13: 321, 2012 Dec 02.
Article in English | MEDLINE | ID: mdl-23198735

ABSTRACT

BACKGROUND: Methods of weakening and attenuating pathogens' abilities to infect and propagate in a host, thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence by relying on latent information computationally gathered across biological repositories, applicable to both generic and specific virulence categories. RESULTS: A lightweight method for data integration is used, which links information regarding a protein via a path-based query graph. A weighting method is then applied to the query graphs so that they can serve as input to various statistical classification methods for discrimination, and the combined use of data integration and learning methods is tested on the problems of both generalized and specific virulence function prediction. CONCLUSIONS: This approach improves coverage of functional data over a protein. Moreover, although it depends largely on noisy and potentially non-curated data from public sources, we find that it outperforms other techniques for identifying general virulence factors, as well as baseline remote homology detection methods for specific virulence categories.


Subject(s)
Proteins/classification , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/statistics & numerical data , Virulence Factors/classification , Data Interpretation, Statistical , Databases, Protein , Proteins/chemistry , Virulence , Virulence Factors/chemistry
4.
BMC Res Notes ; 5: 96, 2012 Feb 14.
Article in English | MEDLINE | ID: mdl-22333139

ABSTRACT

BACKGROUND: Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to lose efficacy. The recent discovery of the metallo-β-lactamase (MBL) gene blaNDM-1, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins. RESULTS: Using both sequence and structural analyses, we examine residues and structural features specific to various pathogen-borne MBL types. This analysis identifies a linker region within MBL-like folds that may act as a discriminating structural feature between these proteins, and specifically among resistance-associated acquirable MBLs. Recently released crystal structures of the newly emerged NDM-1 protein were aligned against related MBL structures using a variety of global and local structural alignment methods, and the overall fold conformation was examined for structural conservation. Conservation appears to be present in most areas of the protein, yet is strikingly absent within the linker region, making NDM-1 unique with respect to a linker-based classification scheme. Variability analysis of the NDM-1 crystal structure highlights unique residues in key regions and identifies several characteristics shared with other transferable MBLs. CONCLUSIONS: A discriminating linker region identified in MBL proteins is highlighted and examined in the context of NDM-1 and primarily three other MBL types: IMP-1, VIM-2 and ccrA. The presence of an unusual linker region variant and uncommon amino acid composition at specific structurally important sites may help to explain the unusually broad kinetic profile of NDM-1, and may aid in directing research attention to areas of this protein, and possibly other MBLs, that may be targeted for inactivation or attenuation of enzymatic activity.

5.
J Biomed Semantics ; 2 Suppl 3: S2, 2011.
Article in English | MEDLINE | ID: mdl-21992591

ABSTRACT

BACKGROUND: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task. METHODS: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events. RESULTS: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists. CONCLUSIONS: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as-is upon request from the first author.

6.
J Am Med Inform Assoc ; 17(5): 514-8, 2010.
Article in English | MEDLINE | ID: mdl-20819854

ABSTRACT

The Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records focused on the identification of medications, their dosages, modes (routes) of administration, frequencies, durations, and reasons for administration in discharge summaries. This challenge is referred to as the medication challenge. For the medication challenge, i2b2 released detailed annotation guidelines along with a set of annotated discharge summaries. Twenty teams representing 23 organizations and nine countries participated in the medication challenge. The teams produced rule-based, machine learning, and hybrid systems targeted to the task. Although rule-based systems dominated the top 10, the best performing system was a hybrid. Of all medication-related fields, durations and reasons were the most difficult for all systems to detect. While medications themselves were identified with better than 0.75 F-measure by all of the top 10 systems, the best F-measures for durations and reasons were 0.525 and 0.459, respectively. State-of-the-art natural language processing systems go a long way toward extracting medication names, dosages, modes, and frequencies. However, they are limited in recognizing duration and reason fields and would benefit from future research.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Pharmaceutical Preparations , Computers, Hybrid , Humans , Patient Dropouts
7.
J Am Med Inform Assoc ; 17(5): 519-23, 2010.
Article in English | MEDLINE | ID: mdl-20819855

ABSTRACT

OBJECTIVE: Within the context of the Third i2b2 Workshop on Natural Language Processing Challenges for Clinical Records, the authors (also referred to as 'the i2b2 medication challenge team' or 'the i2b2 team' for short) organized a community annotation experiment. DESIGN: For this experiment, the authors released annotation guidelines and a small set of annotated discharge summaries. They asked the participants of the Third i2b2 Workshop to annotate 10 discharge summaries per person; each discharge summary was annotated by two annotators from two different teams, and a third annotator from a third team resolved disagreements. MEASUREMENTS: In order to evaluate the reliability of the annotations thus produced, the authors measured community inter-annotator agreement and compared it with the inter-annotator agreement of expert annotators when both the community and the expert annotators generated ground truth based on pooled system outputs. For this purpose, the pool consisted of the three most densely populated automatic annotations of each record. The authors also compared the community inter-annotator agreement with expert inter-annotator agreement when the experts annotated raw records without using the pool. Finally, they measured the quality of the community ground truth by comparing it with the expert ground truth. RESULTS AND CONCLUSIONS: The authors found that the community annotators achieved inter-annotator agreement comparable to that of expert annotators, regardless of whether the experts annotated from the pool. Furthermore, the ground truth generated by the community obtained F-measures above 0.90 against the ground truth of the experts, indicating the value of the community as a source of high-quality ground truth even on intricate and domain-specific annotation tasks.


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Pharmaceutical Preparations , Humans , Patient Discharge
8.
J Biomed Inform ; 43(6): 873-82, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20643225

ABSTRACT

Though there have been many advances in providing access to linked and integrated biomedical data across repositories, developing methods that allow users to specify ambiguous and exploratory queries over disparate sources remains a challenge for extracting well-curated or diversely supported biological information. In the following work, we discuss the concepts of data coverage and evidence in the context of integrated sources. We address diverse information retrieval via a simple framework for representing coverage and evidence that operates in parallel with an arbitrary schema, and a language in which queries on the schema and framework may be executed. We show that this approach is capable of answering questions that require varying levels of evidence or triangulation, and demonstrate that appropriately formed queries can significantly improve precision when retrieving well-supported biomedical data.


Subject(s)
Databases, Factual , Information Storage and Retrieval/methods , Biomedical Research , Internet , Semantics
9.
J Biomed Inform ; 43(3): 407-18, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20015478

ABSTRACT

Genome-wide association studies (GWAS) are an important approach to understanding the genetic mechanisms behind human diseases. Single nucleotide polymorphisms (SNPs) are the predominant markers used in genome-wide association studies, and the ability to predict which SNPs are likely to be functional is important for both a priori and a posteriori analyses of GWA studies. This article describes the design, implementation and evaluation of a family of systems for identifying SNPs that may cause a change in phenotypic outcomes. The methods described in this article characterize the feasibility of combining logical and probabilistic inference with federated data integration for both point and regional SNP annotation and analysis. Evaluations of the methods demonstrate the overall strong predictive value of logical inference, and of logical with probabilistic inference, applied to the domain of SNP annotation.


Subject(s)
Models, Statistical , Polymorphism, Single Nucleotide , Databases, Genetic , Genome-Wide Association Study/methods , Logic
10.
AMIA Annu Symp Proc ; : 889, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999001

ABSTRACT

In the following work, we test a generalized approach to integrating, transforming and learning from data drawn from disparate sources for the classification of bacterial proteins involved in pathogenesis. We rely on the implicit inter-linkages between biological databases to draw relevant records, and leverage statistical learning methods to infer classifications from abundant, albeit noisy, data. Results suggest that different types of public biological information vary in their effectiveness for predictive data mining.


Subject(s)
Artificial Intelligence , Bacterial Proteins/classification , Bacterial Toxins/classification , Databases, Protein , Pattern Recognition, Automated/methods , Terminology as Topic , Virulence Factors/classification , Algorithms , Information Storage and Retrieval/methods , Natural Language Processing
11.
Pac Symp Biocomput ; : 343-54, 2007.
Article in English | MEDLINE | ID: mdl-17990504

ABSTRACT

Scientists working on genomics projects are often faced with the difficult task of sifting through large amounts of biological information dispersed across various online data sources that are relevant to their area or organism of research. Gene annotation, the process of identifying the functional role of a possible gene, has in particular become increasingly time-consuming and laborious as more genomes are sequenced and the number of candidate genes continues to grow at a near-exponential pace; genes are left unannotated or, worse, incorrectly annotated. Many groups have attempted to address the annotation backlog through automated annotation systems that are geared toward specific organisms, and which may thus lack the flexibility and scalability needed to annotate other genomes. In this paper, we present a method and framework that attempt to address problems inherent in manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine with the aim of elucidating functional annotations. The framework and heuristics developed are not specific to any particular genome. We validated the method with a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than "gold standard" annotations approximately 80% of the time.


Subject(s)
Computational Biology , Databases, Genetic , Genomics/statistics & numerical data , Computer Systems , Data Interpretation, Statistical , Expert Systems , Software
12.
Science ; 309(5733): 404-9, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020724

ABSTRACT

A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that (along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions) have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.


Subject(s)
Genome, Protozoan , Leishmania major/genetics , Proteome , Protozoan Proteins/genetics , Trypanosoma brucei brucei/genetics , Trypanosoma cruzi/genetics , Animals , Biological Evolution , Chromosomes/genetics , Evolution, Molecular , Gene Transfer, Horizontal , Genes, Protozoan , Genomics , Leishmania major/chemistry , Leishmania major/metabolism , Molecular Sequence Data , Multigene Family , Mutation , Phylogeny , Plastids/genetics , Protozoan Proteins/chemistry , Protozoan Proteins/physiology , Recombination, Genetic , Retroelements , Species Specificity , Symbiosis , Synteny , Telomere/genetics , Trypanosoma brucei brucei/chemistry , Trypanosoma brucei brucei/metabolism , Trypanosoma cruzi/chemistry , Trypanosoma cruzi/metabolism
13.
Science ; 309(5733): 409-15, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020725

ABSTRACT

Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.


Subject(s)
Genome, Protozoan , Protozoan Proteins/genetics , Sequence Analysis, DNA , Trypanosoma cruzi/genetics , Animals , Chagas Disease/drug therapy , Chagas Disease/parasitology , DNA Repair , DNA Replication , DNA, Mitochondrial/genetics , DNA, Protozoan/genetics , Genes, Protozoan , Humans , Meiosis , Membrane Proteins/chemistry , Membrane Proteins/genetics , Membrane Proteins/physiology , Multigene Family , Protozoan Proteins/chemistry , Protozoan Proteins/physiology , Recombination, Genetic , Repetitive Sequences, Nucleic Acid , Retroelements , Signal Transduction , Telomere/genetics , Trypanocidal Agents/pharmacology , Trypanocidal Agents/therapeutic use , Trypanosoma cruzi/chemistry , Trypanosoma cruzi/physiology
14.
Science ; 309(5733): 436-42, 2005 Jul 15.
Article in English | MEDLINE | ID: mdl-16020728

ABSTRACT

Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.


Subject(s)
Genome, Protozoan , Leishmania major/genetics , Sequence Analysis, DNA , Animals , Chromatin/genetics , Chromatin/metabolism , Gene Expression Regulation , Genes, Protozoan , Genes, rRNA , Glycoconjugates/biosynthesis , Glycoconjugates/metabolism , Leishmania major/chemistry , Leishmania major/metabolism , Leishmaniasis, Cutaneous/parasitology , Lipid Metabolism , Membrane Proteins/biosynthesis , Membrane Proteins/chemistry , Membrane Proteins/genetics , Membrane Proteins/metabolism , Molecular Sequence Data , Multigene Family , Protein Biosynthesis , Protein Processing, Post-Translational , Protozoan Proteins/biosynthesis , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA Processing, Post-Transcriptional , RNA Splicing , RNA, Protozoan/genetics , RNA, Protozoan/metabolism , Transcription, Genetic