Results 1 - 4 of 4
1.
Methods Mol Biol ; 2744: 403-441, 2024.
Article in English | MEDLINE | ID: mdl-38683334

ABSTRACT

BOLD, the Barcode of Life Data System, supports the acquisition, storage, validation, analysis, and publication of DNA barcodes, activities that require the integration of molecular, morphological, and distributional data. Its pivotal role in curating the reference library of DNA barcodes, together with its data management and analysis capabilities, makes it a central resource for biodiversity science. It enables rapid, accurate identification of specimens and also reveals patterns of genetic diversity and evolutionary relationships among taxa.

Launched in 2005, BOLD has become an increasingly powerful tool for advancing the understanding of planetary biodiversity. It currently hosts 17 million specimen records and 14 million barcodes, covering more than a million species from every continent and ocean. The platform's long-term goal is to provide a consistent, accurate system for identifying all species of eukaryotes.

BOLD's integrated analytical tools, support for the full data lifecycle, and secure collaboration framework distinguish it from other biodiversity platforms. BOLD v4 brought enhanced data management and analysis capabilities as well as new functionality for data dissemination and publication. Its next version will include features that strengthen its utility to the research community, governments, industry, and society at large.
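To make "identification of specimens" against a reference library concrete, here is a toy sketch of closest-match identification. It is not BOLD's identification engine: it assumes the query and reference barcodes are already aligned to equal length, scores them by uncorrected p-distance over unambiguous bases, and uses a hypothetical 2% acceptance threshold (`max_distance`). The names `Reference`, `p_distance` and `identify` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    species: str
    sequence: str  # assumed already aligned to the query (same length)


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over positions where both sequences carry A, C, G or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip N, gaps and other ambiguity codes
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def identify(query: str, library: list[Reference], max_distance: float = 0.02) -> tuple[str, float] | None:
    """Return (species, distance) for the closest reference, or None if none is close enough.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    best = min(library, key=lambda ref: p_distance(query, ref.sequence))
    distance = p_distance(query, best.sequence)
    return (best.species, distance) if distance <= max_distance else None
```

For example, with a two-entry library `[Reference("Species A", "ACGTACGTAC"), Reference("Species B", "ACGTTCGAAC")]`, the query `"ACGTACGTAN"` matches Species A at distance 0.0 because the trailing `N` is ignored. Real barcode identification also has to handle alignment, incomplete reference coverage and closely related species, which this sketch deliberately leaves out.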


Subjects
Biodiversity , Computational Biology , DNA Barcoding, Taxonomic , DNA Barcoding, Taxonomic/methods , Computational Biology/methods , Software , DNA/genetics
2.
Article in English | MEDLINE | ID: mdl-28092571

ABSTRACT

This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a chromatogram basecalled by KB and selects a recoverable subset of its N-labels to replace with basecalls (A, C, G, T). A correction for an N-label is defined using an additional read of the same sequence and a human-finished sequence: a correction is added to the dataset when an alignment shows that the additional read and the human-finished sequence agree on the identity of the N-label, and KB must also rate the replacement base with a quality value of … in the additional read. Corrections are available only during system training. To develop the system, nearly 850,000 N-labels were obtained from the Barcode of Life Data System (BOLD), the premier database of the genetic markers called DNA barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and supports the correctness of downstream analyses. In keeping with barcoding standards, the system maintains an error rate of … percent, and it applies a correction only when it estimates a low error rate. On this data, the system selects and recovers 79 percent of N-labels from COI (the animal barcode), 80 percent from matK and rbcL (the plant barcodes), and 58 percent from non-protein-coding sequences (across eukaryotes).
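The two rules in this abstract (how a training correction is defined, and when a correction is applied) can be sketched as follows. This is not the authors' code: the per-column data layout, the names `AlignedColumn`, `correction_label` and `apply_corrections`, and the thresholds `min_quality` and `max_error` are hypothetical, since the abstract omits the actual quality and error values. The learned model that produces `(base, estimated_error)` predictions is assumed and not shown.

```python
from __future__ import annotations

from dataclasses import dataclass

BASES = frozenset("ACGT")


@dataclass(frozen=True)
class AlignedColumn:
    """One alignment column at a position of the KB-called read (hypothetical layout)."""
    kb_call: str        # base KB called in the primary read ("N" if uncalled)
    extra_call: str     # base in the additional read of the same sequence
    extra_quality: int  # quality value KB assigned to extra_call
    human_call: str     # base in the human-finished sequence


def correction_label(col: AlignedColumn, min_quality: int) -> str | None:
    """Training label for an N-label: the agreed base, or None if no correction is defined."""
    if col.kb_call != "N":
        return None  # only N-labels are candidates
    if col.extra_call not in BASES or col.extra_call != col.human_call:
        return None  # additional read and human-finished sequence must agree
    if col.extra_quality < min_quality:
        return None  # replacement must be rated with sufficient quality
    return col.extra_call


def apply_corrections(calls: str, predictions: dict[int, tuple[str, float]], max_error: float) -> str:
    """At inference time, replace an N only when the model's estimated error is low enough."""
    out = list(calls)
    for pos, (base, est_error) in predictions.items():
        if out[pos] == "N" and base in BASES and est_error <= max_error:
            out[pos] = base
    return "".join(out)
```

For instance, `apply_corrections("ACNTN", {2: ("G", 0.001), 4: ("T", 0.2)}, max_error=0.01)` fills only position 2, yielding `"ACGTN"`. Separating label construction from the confidence gate mirrors the abstract's point that corrections exist only during training, while at inference the system relies on its own error estimate.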


Subjects
DNA Barcoding, Taxonomic/methods , Genomics/methods , Machine Learning , Animals , Humans , Neural Networks, Computer
3.
Evolution ; 70(9): 1960-78, 2016 09.
Article in English | MEDLINE | ID: mdl-27402284

ABSTRACT

The major branches of life diversified in the marine realm, and numerous taxa have since transitioned between marine and fresh waters. Previous studies have demonstrated higher rates of molecular evolution in crustaceans inhabiting continental saline habitats than in freshwater crustaceans, but it is unclear whether this trend is pervasive or whether it applies to the marine environment. We employ the phylogenetic comparative method to investigate relative molecular evolutionary rates between 148 pairs of marine or continental saline versus freshwater lineages representing disparate eukaryote groups, including bony fish, elasmobranchs, cetaceans, crustaceans, mollusks, annelids, algae, and other eukaryotes, using available protein-coding and noncoding genes. Overall, we observed no consistent pattern in nucleotide substitution rates linked to habitat across all genes and taxa. However, we observed some trends toward higher evolutionary rates in protein-coding genes of freshwater taxa (in comparisons mainly involving bony fish) relative to their marine relatives. The results suggest no systematic difference in substitution rate between marine and freshwater organisms.
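The abstract reports "no consistent pattern" across 148 paired comparisons but does not state the test behind that conclusion. One simple, generic way to ask whether paired rate comparisons lean consistently in one direction is an exact two-sided sign test. The sketch below is that textbook test, not the authors' phylogenetic comparative analysis; its input (one rate difference per lineage pair, for example freshwater minus marine) is an assumption.

```python
from math import comb


def sign_test(differences: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired differences.

    Returns (n_positive, n_negative, p_value). Zero differences (ties) are dropped,
    and under the null each nonzero difference is equally likely to be + or -.
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    # Probability of a split at least as lopsided as observed, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)
```

Ten pairs that all point the same way give p = 2/1024 (about 0.002), while an even split gives p = 1.0. A sign test ignores effect sizes and phylogenetic non-independence between pairs, which is one reason comparative studies usually go further than this.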


Subjects
Aquatic Organisms/genetics , Environment , Eukaryota/genetics , Evolution, Molecular , Animals , Fresh Water/analysis , Invertebrates/genetics , Microalgae/genetics , Phylogeny , Seawater/analysis , Sequence Analysis, DNA , Vertebrates/genetics
4.
BMC Bioinformatics ; 11 Suppl 8: S4, 2010 Oct 26.
Article in English | MEDLINE | ID: mdl-21034429

ABSTRACT

UNLABELLED: This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and it culminates in a Wilcoxon significance test that supports the effectiveness of the system. We further include a complete generalization of the technique to arbitrary grammars and datasets, using a mathematical abstraction that allows researchers in other domains to apply the method to their own work.
BACKGROUND: Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. We also examine a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of benchmark problems to which many of them had been applied: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting toxicology data.
RESULTS: Our system performs better than, or comparably to, existing methods over a broad range of problem types. Unlike those methods, it does not require expert knowledge to be applied to novel problems.
CONCLUSIONS: We attribute the system's success to its ability to (1) encode molecules losslessly before presenting them to the learning system and (2) leverage the design of molecular description languages to identify relevant structural attributes of molecules across different problem domains.
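The evaluation described in this abstract culminates in a Wilcoxon significance test. A minimal sketch of the signed-rank variant for paired scores follows (for example, this system versus a baseline on the same benchmark datasets). The abstract does not specify the exact variant or pairing, so everything here is an assumption: per-dataset paired scores, zero differences dropped, average ranks for tied magnitudes, and a normal-approximation p-value without tie correction, which is only rough for small numbers of pairs.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (W_plus, two-sided p) for paired samples x and y (normal approximation)."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank |differences| from 1..n, giving tied magnitudes their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p
```

With scores `x = [3, 5, 2, 8]` and `y = [1, 1, 3, 4]`, the differences are 2, 4, -1, 4. The two tied magnitudes of 4 share rank 3.5, so W+ = 2 + 3.5 + 3.5 = 9. With only four pairs, an exact permutation distribution would be preferable to the normal approximation used here.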


Subjects
Artificial Intelligence , Computational Biology/methods , Databases, Factual , Pattern Recognition, Automated/methods , Algorithms , Alkaloids , Animals , Mice , Proteins/classification , Quantitative Structure-Activity Relationship , Rats , Regression Analysis , Reproducibility of Results , Software