1.
PLoS Comput Biol ; 14(2): e1005962, 2018 02.
Article in English | MEDLINE | ID: mdl-29447159

ABSTRACT

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms text mining of abstracts only.


Subjects
Abstracting and Indexing , Data Mining/methods , Information Storage and Retrieval , MEDLINE , Area Under Curve , Computational Biology/methods , False Positive Reactions , Genes , Periodicals as Topic , Proteins/genetics , ROC Curve , Software , Terminology as Topic
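The abstract does not describe how associations are derived once entities are recognized. A common baseline, shown here only as a minimal sketch, is dictionary-based tagging followed by counting entity pairs that co-occur within the same sentence; it is not presented as the named entity recognition system used in the study. The dictionaries, identifiers, and function names below are hypothetical.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical dictionaries mapping surface names to canonical identifiers.
PROTEINS = {"insulin": "INS", "leptin": "LEP"}
DISEASES = {"diabetes": "diabetes", "diabetes mellitus": "diabetes", "obesity": "obesity"}


def tag(sentence, dictionary):
    """Return the set of canonical IDs whose names occur as whole words."""
    lowered = sentence.lower()
    found = set()
    for name, ident in dictionary.items():
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            found.add(ident)
    return found


def cooccurrence_counts(text):
    """Count protein-disease pairs that co-occur within the same sentence."""
    counts = Counter()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        proteins = tag(sentence, PROTEINS)
        diseases = tag(sentence, DISEASES)
        counts.update(product(sorted(proteins), sorted(diseases)))
    return counts
```

For example, on the text "Insulin resistance precedes diabetes. Leptin levels rise in obesity. Insulin is unrelated here." the sketch counts one (INS, diabetes) pair and one (LEP, obesity) pair; the third sentence contributes nothing because it names no disease. Ranking pairs by such counts and scoring them against a gold standard would be a separate step, not shown here.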
2.
J Proteome Res ; 9(11): 5715-26, 2010 Nov 05.
Article in English | MEDLINE | ID: mdl-20831161

ABSTRACT

Legume pods serve important functions during seed development and are themselves sources of food and feed. Compared to seeds, the metabolism and development of pods are not well defined. The present characterization of pods from the model legume Lotus japonicus, together with the detailed analyses of the pod and seed proteomes in five developmental stages, paves the way for comparative pathway analysis and provides new metabolic information. Proteins were analyzed by two-dimensional gel electrophoresis and tandem mass spectrometry. These analyses led to the identification of 604 pod proteins and 965 seed proteins, including 263 proteins distinguishing the pod. The complete data set is publicly available at http://www.cbs.dtu.dk/cgi-bin/lotus/db.cgi , where spots in a reference map are linked to experimental data, such as matched peptides, quantification values, and gene accessions. Identified pod proteins represented enzymes from 85 different metabolic pathways, as well as storage globulins and a late embryogenesis abundant protein. In contrast to seed maturation, pod maturation was associated with decreasing total protein content, especially of proteins involved in protein biosynthesis and photosynthesis. Proteins detected only in pods included three enzymes participating in the urea cycle and four in nitrogen and amino group metabolism, highlighting the importance of nitrogen metabolism during pod development. Additionally, five legume seed proteins previously unassigned in the glutamate metabolism pathway were identified.


Subjects
Fruit/chemistry , Lotus/chemistry , Plant Proteins/analysis , Proteome/analysis , Seeds/chemistry , Fabaceae , Fruit/growth & development , Lotus/growth & development , Metabolic Networks and Pathways , Seeds/growth & development , Tandem Mass Spectrometry
3.
Plant Physiol ; 149(3): 1325-40, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19129418

ABSTRACT

We have characterized the development of seeds in the model legume Lotus japonicus. Like soybean (Glycine max) and pea (Pisum sativum), Lotus develops straight seed pods, and each pod contains approximately 20 seeds that reach maturity within 40 days. Histological sections show the three characteristic developmental phases of legume seeds and the presence of embryo, endosperm, and seed coat in desiccated seeds. Furthermore, protein, oil, starch, phytic acid, and ash contents were determined, indicating that the composition of mature Lotus seeds is more similar to that of soybean than to that of pea. In a first attempt to determine the seed proteome, both a two-dimensional polyacrylamide gel electrophoresis approach and a gel-based liquid chromatography-mass spectrometry approach were used. Globulins were analyzed by two-dimensional polyacrylamide gel electrophoresis, and five legumins, LLP1 to LLP5, and two convicilins, LCP1 and LCP2, were identified by matrix-assisted laser desorption ionization quadrupole/time-of-flight mass spectrometry. For two distinct developmental phases, seed filling and desiccation, a gel-based liquid chromatography-mass spectrometry approach was used, and 665 and 181 unique proteins corresponding to gene accession numbers were identified for the two phases, respectively. All of the proteome data, including the experimental data and mass spectrometry spectra peaks, were collected in a database that is available to the scientific community via a Web interface (http://www.cbs.dtu.dk/cgi-bin/lotus/db.cgi). This database establishes the basis for relating physiology, biochemistry, and regulation of seed development in Lotus. Together with a new Web interface (http://bioinfoserver.rsbs.anu.edu.au/utils/PathExpress4legumes/) collecting all protein identifications for Lotus, Medicago, and soybean seed proteomes, this database is a valuable resource for comparative seed proteomics and pathway analysis within and beyond the legume family.


Subjects
Lotus/embryology , Lotus/metabolism , Models, Biological , Proteome/metabolism , Seeds/growth & development , Seeds/metabolism , Biomass , Chromatography, Liquid , Databases, Protein , Electrophoresis, Gel, Two-Dimensional , Fatty Acids/analysis , Globulins/genetics , Globulins/metabolism , Internet , Seed Storage Proteins/metabolism , Seeds/cytology , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Starch/metabolism , Water
4.
Stand Genomic Sci ; 1(2): 204-15, 2009 Sep 25.
Article in English | MEDLINE | ID: mdl-21304658

ABSTRACT

We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses, such as mapping alignments of homologous genes to other genomes, mapping short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides without changing the size of the plot. Its ability to zoom disproportionately provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color settings, and data ranges. Custom numerical data can be added to the plot, allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.

5.
Nucleic Acids Res ; 35(9): 3100-8, 2007.
Article in English | MEDLINE | ID: mdl-17452365

ABSTRACT

The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.


Subjects
Genes, rRNA , Software , Computational Biology/methods , Genome, Bacterial , Genomics/methods , Markov Chains
6.
Nucleic Acids Res ; 34(Web Server issue): W84-8, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845115

ABSTRACT

FeatureMap3D is a web-based tool that maps protein features onto 3D structures. The user provides sequences annotated with any feature of interest, such as post-translational modifications, protease cleavage sites or exonic structure, and FeatureMap3D then searches the Protein Data Bank (PDB) for structures of homologous proteins. The results are displayed both as an annotated sequence alignment, showing the user-provided annotations and the sequence conservation between the query and target sequences, and as a publication-quality image of the 3D protein structure with the selected features and sequence conservation highlighted. The results are also returned in a readily parsable text format and as a PyMol (http://pymol.sourceforge.net/) script file, which allows the user to easily modify the protein structure image to suit a specific purpose. FeatureMap3D can also be used without sequence annotation to evaluate the quality of the alignment of the input sequences to the most homologous structures in the PDB, through the conservation-colored 3D structure visualization tool. FeatureMap3D is available at: http://www.cbs.dtu.dk/services/FeatureMap3D/.


Subjects
Databases, Protein , Protein Conformation , Sequence Homology, Amino Acid , Software , Structural Homology, Protein , Amino Acid Sequence , Amino Acids/chemistry , Computer Graphics , Conserved Sequence , Exons , Internet , Models, Molecular , Proteins/chemistry , Sequence Alignment
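Mapping a user-provided feature onto a homologous structure requires translating each annotated query position into the corresponding residue of the aligned target sequence. The sketch below assumes a pairwise alignment given as two gapped strings and 1-based sequential numbering; it is not FeatureMap3D's own implementation, real PDB residue numbers may differ from sequential positions, and the function names are hypothetical.

```python
def map_positions(query_aln, target_aln, query_positions):
    """Map 1-based query residue positions to 1-based target positions.

    query_aln and target_aln are the two rows of a pairwise alignment,
    with '-' marking gaps. Query residues aligned to a gap map to None.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    q_pos = t_pos = 0
    for q_char, t_char in zip(query_aln, target_aln):
        # Advance each residue counter only on non-gap characters.
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            mapping[q_pos] = t_pos if t_char != "-" else None
    return {p: mapping.get(p) for p in query_positions}


def percent_identity(query_aln, target_aln):
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(q, t) for q, t in zip(query_aln, target_aln) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)
```

For example, `map_positions("MKT-LV", "MRTALV", [2, 4])` returns `{2: 2, 4: 5}`, because the target insertion shifts the later query residues by one, and `percent_identity` on the same rows gives `80.0` (four identical residues out of five gap-free columns).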
7.
Environ Microbiol ; 8(2): 353-61, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16423021

ABSTRACT

To predict origins of replication in prokaryotic chromosomes, we analyse the leading and lagging strands of 200 chromosomes for differences in oligomer composition and show that these differences correlate strongly with taxonomic grouping, lifestyle and molecular details of the replication process. While all bacteria have a preference for Gs over Cs on the leading strand, we discover that the direction of the A/T skew is determined by the polymerase-alpha subunit that replicates the leading strand. The strength of the strand bias varies greatly both between phyla and between environments and appears to correlate with growth rate. Finally, we observe much greater diversity of skew among archaea than among bacteria. We have developed a program that accurately locates the origins of replication by measuring, for all oligonucleotides up to 8 bp in length, the differences between the leading and lagging strands. The program and results for all publicly available genomes are available from http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/origin.


Subjects
Archaea/genetics , Bacteria/genetics , Chromosomes, Archaeal/genetics , Chromosomes, Bacterial/genetics , Replication Origin , DNA Replication/genetics , DNA, Circular/genetics , Phylogeny
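The authors' program compares the leading and lagging strands across all oligonucleotides up to 8 bp. As a much simpler stand-in that relies only on the G-over-C preference on the leading strand noted in the abstract, the cumulative GC skew, (G - C)/(G + C) summed window by window along the chromosome, tends to reach its minimum near the origin of replication. The sketch below implements that simpler technique, not the published method; the window size and function names are assumptions.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative (G - C) / (G + C) skew sampled at every window boundary.

    values[k] is the cumulative skew at position k * window (values[0] == 0).
    """
    seq = seq.upper()
    values = [0.0]
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows with no G or C contribute no skew
            total += (g - c) / (g + c)
        values.append(total)
    return values


def predict_origin(seq, window=1000):
    """Return the window boundary where the cumulative skew is lowest."""
    values = cumulative_gc_skew(seq, window)
    best = min(range(len(values)), key=values.__getitem__)
    return min(best * window, len(seq))
```

On a toy sequence such as `"C" * 3000 + "G" * 3000`, the cumulative skew falls through the C-rich half and rises through the G-rich half, so `predict_origin` returns `3000`, the switch point. On real chromosomes the curve is noisier, which is one reason the published method uses many oligonucleotides rather than G and C alone.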
8.
BMC Genomics ; 6: 70, 2005 May 10.
Article in English | MEDLINE | ID: mdl-15885146

ABSTRACT

BACKGROUND: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. RESULTS: We have generated approximately 3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment, and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig than in human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to that of human. CONCLUSION: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way to define specific regions for analysis and resequencing.


Subjects
Genome , Genomics/methods , Sequence Analysis, DNA/methods , Animals , Computational Biology/methods , Evolution, Molecular , Exons , Genome, Human , Humans , Mice , Phylogeny , RNA, Messenger/metabolism , Repetitive Sequences, Nucleic Acid , Species Specificity , Swine
10.
Comput Chem ; 26(5): 531-41, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12144181

ABSTRACT

We examined more than 700 DNA sequences (full-length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions are likely to adopt structures different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we measured the fraction of each genome that contains purine (or pyrimidine) tracts of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purines ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected to contain 1.0% of purine tracts and also 1.0% of pyr/pur tracts. In the vast majority of cases, there are more purine tracts than would be expected from a random sequence, with an average of 3.5%, significantly larger than the expectation value. The fraction of the chromosomes containing pyr/pur tracts was slightly less than expected, with an average of 0.8%. One of the most surprising findings is a clear difference between prokaryotes and eukaryotes in the length distributions of the regions studied. Whereas short-range correlations can explain the length distributions in prokaryotes, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts that cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation.


Subjects
Chromosomes/genetics , Databases, Genetic , Purines/analysis , Animals , Base Sequence , Bias , Chromosomes, Archaeal/genetics , Chromosomes, Bacterial/genetics , Eukaryota/genetics , Eukaryotic Cells , Genome , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Plasmids/genetics , Pyrimidines/analysis
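The tract fractions described in this abstract can be computed by scanning runs of purine (R) and pyrimidine (Y) classes. The minimal sketch below assumes that A and G count as purines and C and T as pyrimidines, that any other symbol (such as N) breaks a tract, and that alternating tracts are found by XOR-ing each class with the position parity, which turns a strictly alternating stretch into a constant run. The same transformation explains why both random-sequence expectations coincide: in an i.i.d. sequence with equal base frequencies, the run containing a given position has length k with probability k/2^(k+1), so the probability of lying in a run of 10 or more is 11/1024, about 1.07%, consistent with the roughly 1.0% stated in the abstract. Function names are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def _classes(seq):
    """1 for purine, 0 for pyrimidine, None for any other symbol (e.g. N)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append(1)
        elif base in PYRIMIDINES:
            out.append(0)
        else:
            out.append(None)
    return out


def tract_fraction(seq, min_len=10, alternating=False):
    """Fraction of positions lying in tracts of at least min_len bases.

    alternating=False: homopurine or homopyrimidine runs.
    alternating=True: runs of strictly alternating purine/pyrimidine.
    """
    cls = _classes(seq)
    if alternating:
        # XOR with position parity maps an alternating stretch to a constant run.
        cls = [None if c is None else c ^ (i % 2) for i, c in enumerate(cls)]
    covered = 0
    run_len = 0
    prev = None
    for c in cls + [None]:  # trailing None flushes the last run
        if c is not None and c == prev:
            run_len += 1
        else:
            if run_len >= min_len:
                covered += run_len
            run_len = 0 if c is None else 1
        prev = c
    return covered / len(seq) if seq else 0.0


def expected_random_fraction(min_len=10):
    """P(run containing a position has length >= min_len) for equal base frequencies."""
    return 1.0 - sum(k / 2 ** (k + 1) for k in range(1, min_len))
```

Running `tract_fraction` on a long randomly generated sequence should give values close to `expected_random_fraction()`, for both the homopurine/homopyrimidine and the alternating case.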