Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 7 de 7
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
BMC Res Notes ; 7: 365, 2014 Jun 14.
Artigo em Inglês | MEDLINE | ID: mdl-24929426

RESUMO

BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FINDINGS: FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. CONCLUSIONS: The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.


Assuntos
Biologia Computacional/métodos , Validação de Programas de Computador , Software , Bases de Dados de Ácidos Nucleicos , Bases de Dados de Proteínas , Reprodutibilidade dos Testes
2.
PLoS One ; 8(3): e50869, 2013.
Artigo em Inglês | MEDLINE | ID: mdl-23516388

RESUMO

BACKGROUND: The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation. METHODOLOGY/PRINCIPAL FINDINGS: We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious "guilty-by-association" approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated to photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs. CONCLUSION/SIGNIFICANCE: Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. Targeted attempts at DUF characterization in the laboratory or in silico may draw from these insights and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.


Assuntos
Ecossistema , Metagenoma , Metagenômica , Domínios e Motivos de Interação entre Proteínas/fisiologia , Água do Mar/microbiologia , Análise por Conglomerados , Biologia Computacional/métodos , Cianobactérias/classificação , Cianobactérias/genética , Cianobactérias/metabolismo , Ferro/metabolismo , Fotossíntese/fisiologia
3.
FEMS Microbiol Ecol ; 81(2): 373-85, 2012 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-22416918

RESUMO

The Global Ocean Sampling (GOS) expedition is currently the largest and geographically most comprehensive metagenomic dataset, including samples from the Atlantic, Pacific, and Indian Oceans. This study makes use of the wide range of environmental conditions and habitats encompassed within the GOS sites in order to investigate the ecological structuring of bacterial and archaeal taxon ranks. Community structures based on taxonomically classified 16S ribosomal RNA (rRNA) gene fragments at phylum, class, order, family, and genus rank levels were examined using multivariate statistical analysis, and the results were inspected in the context of oceanographic environmental variables and structured habitat classifications. At all taxon rank levels, community structures of neritic, oceanic, estuarine biomes, as well as other exotic biomes (salt marsh, lake, mangrove), were readily distinguishable from each other. A strong structuring of the communities with chlorophyll a concentration and a weaker yet significant structuring with temperature and salinity were observed. Furthermore, there were significant correlations between community structures and habitat classification. These results were used for further investigation of one-to-one relationships between taxa and environment and provided indications for ecological preferences shaped by primary production for both cultured and uncultured bacterial and archaeal clades.


Assuntos
Archaea/classificação , Bactérias/classificação , Metagenômica , Água do Mar/microbiologia , Archaea/genética , Bactérias/genética , Clorofila/análise , Clorofila A , Ecologia , Ecossistema , Genes de RNAr , Geografia , Análise Multivariada , Oceanos e Mares , RNA Ribossômico 16S/genética
4.
PLoS One ; 6(9): e24797, 2011.
Artigo em Inglês | MEDLINE | ID: mdl-21935468

RESUMO

State of the art (DNA) sequencing methods applied in "Omics" studies grant insight into the 'blueprints' of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion.


Assuntos
Biologia Computacional/métodos , Software , Bases de Dados Genéticas , Genômica
5.
Nat Biotechnol ; 29(5): 415-20, 2011 May.
Artigo em Inglês | MEDLINE | ID: mdl-21552244

RESUMO

Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.


Assuntos
Biomarcadores , Meio Ambiente , Metagenômica/normas , Análise de Sequência de DNA/normas , Lista de Checagem , Bases de Dados Genéticas , Genes de RNAr , Variação Genética , Humanos , Armazenamento e Recuperação da Informação/normas , Internet , Linguagens de Programação , Software
6.
BMC Bioinformatics ; 11: 358, 2010 Jun 30.
Artigo em Inglês | MEDLINE | ID: mdl-20591175

RESUMO

BACKGROUND: Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors like sampling location, time and depth/altitude: generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed, to support consistent contextual data acquisition. RESULTS: MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated to their samples. A preconfigured Microsoft Excel spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising of the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance to the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3). CONCLUSION: The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual data enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.


Assuntos
Bases de Dados Genéticas , Genômica , Armazenamento e Recuperação da Informação/métodos , Software , Sequência de Bases , Internet , Linguagens de Programação , Interface Usuário-Computador , Fluxo de Trabalho
7.
Nucleic Acids Res ; 38(Database issue): D391-5, 2010 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-19858098

RESUMO

Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net.


Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Bases de Dados de Ácidos Nucleicos , Bases de Dados de Proteínas , Animais , Biologia Computacional/tendências , Ecologia , Meio Ambiente , Genoma Bacteriano , Geografia , Humanos , Armazenamento e Recuperação da Informação/métodos , Internet , Oceanos e Mares , Estrutura Terciária de Proteína , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...