Results 1 - 20 of 41
1.
PLoS Comput Biol ; 14(1): e1005944, 2018 01.
Article in English | MEDLINE | ID: mdl-29373581

ABSTRACT

The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Animals , Arabidopsis/genetics , Genome, Human , Genome, Plant , Genomics , Humans , Models, Theoretical , Pan troglodytes , Polymorphism, Single Nucleotide , Programming Languages , Sequence Analysis, DNA , Sequence Analysis, Protein
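
The suffix array named in the abstract above can be illustrated with a toy sketch. This is not MUMmer4's implementation: the function names are hypothetical, the construction is the naive sort-all-suffixes approach (fine for short strings only), and the search simply reports every exact occurrence of a query in a reference by binary search over the sorted suffixes.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction for illustration only; real aligners use far more
    memory- and time-efficient algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where pattern occurs exactly in text."""
    k = len(pattern)

    # Lower bound: first suffix whose k-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose k-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


reference = "GATTACAGATTACA"
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "TTAC"))
```

Because the suffixes are sorted, all occurrences of a query sit in one contiguous block of the array, which is why two binary searches are enough to locate them.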
2.
Genome Med ; 9(1): 30, 2017 03 28.
Article in English | MEDLINE | ID: mdl-28351419

ABSTRACT

BACKGROUND: Encoded by the var gene family, highly variable Plasmodium falciparum erythrocyte membrane protein-1 (PfEMP1) proteins mediate tissue-specific cytoadherence of infected erythrocytes, resulting in immune evasion and severe malaria disease. Sequencing and assembling the 40-60 var gene complement for individual infections has been notoriously difficult, impeding molecular epidemiological studies and the assessment of particular var elements as subunit vaccine candidates. METHODS: We developed and validated a novel algorithm, Exon-Targeted Hybrid Assembly (ETHA), to perform targeted assembly of var gene sequences, based on a combination of Pacific Biosciences and Illumina data. RESULTS: Using ETHA, we characterized the repertoire of var genes in 12 samples from uncomplicated malaria infections in children from a single Malian village and showed them to be as genetically diverse as vars from isolates from around the globe. The gene var2csa, a member of the var family associated with placental malaria pathogenesis, was present in each genome, as were vars previously associated with severe malaria. CONCLUSION: ETHA, a tool to discover novel var sequences from clinical samples, will aid the understanding of malaria pathogenesis and inform the design of malaria vaccines based on PfEMP1. ETHA is available at: https://sourceforge.net/projects/etha/ .


Subjects
Algorithms , Genetic Variation , Plasmodium falciparum/metabolism , Protozoan Proteins/genetics , Sequence Analysis, DNA/methods , Child , Humans , Malaria, Falciparum/genetics , Malaria, Falciparum/metabolism , Mali , Plasmodium falciparum/genetics , Software
3.
Brief Bioinform ; 14(2): 213-24, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22199379

ABSTRACT

Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net.


Subjects
Genomics/statistics & numerical data , Sequence Analysis, DNA/statistics & numerical data , Software , Animals , Computational Biology , Computer Graphics , Data Display , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans
4.
Genome Res ; 22(3): 557-67, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22147368

ABSTRACT

New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.


Subjects
Algorithms , Genomics/methods , Sequence Analysis, DNA , Animals , Computational Biology/methods , Genome , Genome, Bacterial/genetics , Humans , Internet , Reproducibility of Results
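
The abstract above refers to "statistics on contiguity" without naming one. As an assumption for illustration, the sketch below computes N50, a widely used contiguity statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. The function name and the example lengths are hypothetical, and the abstract's own point still applies: a high value says nothing about correctness.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first reaches half the total.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids floating-point halves
            return length


# Made-up contig lengths, total 290; the two longest contigs sum to 150.
print(n50([80, 70, 50, 40, 30, 20]))
```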
5.
Nucleic Acids Res ; 40(1): e9, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22102569

ABSTRACT

Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system Glimmer-MG that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences to model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested.


Subjects
Metagenomics/methods , Sequence Analysis, DNA , Software , Cluster Analysis , Gastrointestinal Tract/microbiology , Genes , Humans , Metagenome , Phylogeny
6.
J Bacteriol ; 193(19): 5450-64, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21784931

ABSTRACT

Xanthomonas is a large genus of bacteria that collectively cause disease on more than 300 plant species. The broad host range of the genus contrasts with stringent host and tissue specificity for individual species and pathovars. Whole-genome sequences of Xanthomonas campestris pv. raphani strain 756C and X. oryzae pv. oryzicola strain BLS256, pathogens that infect the mesophyll tissue of the leading models for plant biology, Arabidopsis thaliana and rice, respectively, were determined and provided insight into the genetic determinants of host and tissue specificity. Comparisons were made with genomes of closely related strains that infect the vascular tissue of the same hosts and across a larger collection of complete Xanthomonas genomes. The results suggest a model in which complex sets of adaptations at the level of gene content account for host specificity and subtler adaptations at the level of amino acid or noncoding regulatory nucleotide sequence determine tissue specificity.


Subjects
Genome, Bacterial/genetics , Xanthomonas/genetics , Arabidopsis/microbiology , Molecular Sequence Data , Oryza/microbiology , Xanthomonas/physiology
7.
PLoS One ; 6(3): e14792, 2011 Mar 31.
Article in English | MEDLINE | ID: mdl-21483493

ABSTRACT

Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of a core gene set necessary for cellular functioning was made to determine a consensus set of genes that would be useful for the identification, taxonomy and phylogeny of the species belonging to the subclass Actinobacteridae, which contained two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprised about 85% of the actinobacteria families. The following recommended criteria were used to establish a comprehensive gene set: the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be a single copy, (iv) have at least two regions sufficiently conserved to allow the design of amplification and sequencing primers and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained by using the Sequence VARiability Analysis Program (SVARAP), which allows primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments were selected, ychF, rpoB, and secY, with R² > 0.85. Selected sets of broad range primers were tested from the 3 gene fragments and were demonstrated to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. The intraspecies similarities were 96.3-100% for ychF, 97.8-100% for rpoB and 96.9-100% for secY among 73 strains belonging to 15 species of the subclass Actinobacteridae, compared with 99.4-100% for 16S rRNA. The phylogenetic topology obtained from the combined datasets ychF+rpoB+secY was globally similar to that inferred from the 16S rRNA but with higher confidence. It was concluded that multi-locus sequence analysis using a core gene set might represent the first consensus and valid approach for investigating bacterial identification, phylogeny and taxonomy.


Subjects
Actinobacteria/genetics , Bacterial Proteins/genetics , Multilocus Sequence Typing/methods , Actinobacteria/classification , Bacterial Proteins/classification , Phylogeny
8.
Nat Genet ; 43(2): 109-16, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21186353

ABSTRACT

The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.


Subjects
Fragaria/genetics , Genome, Plant , Algorithms , Chloroplasts/genetics , Chromosome Mapping , Gene Expression Profiling , Genes, Plant , Genetic Linkage , In Situ Hybridization, Fluorescence , Likelihood Functions , Models, Genetic , Phylogeny , Terminal Repeat Sequences , Transcription, Genetic
9.
J Bacteriol ; 192(22): 6101-2, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20833805

ABSTRACT

Pollutants such as polychlorinated biphenyls and dioxins pose a serious threat to human and environmental health. Natural attenuation of these compounds by microorganisms provides one promising avenue for their removal from contaminated areas. Over the past 2 decades, studies of the bacterium Sphingomonas wittichii RW1 have provided a wealth of knowledge about how bacteria metabolize chlorinated aromatic hydrocarbons. Here we describe the finished genome sequence of S. wittichii RW1 and major findings from its annotation.


Subjects
DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Dioxins/metabolism , Genome, Bacterial , Sphingomonas/genetics , Sphingomonas/metabolism , Environmental Pollutants/metabolism , Molecular Sequence Data , Sequence Analysis, DNA
10.
Genome Res ; 20(9): 1165-73, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20508146

ABSTRACT

Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.


Subjects
Genomics/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Genome, Human , Humans
11.
Proc Natl Acad Sci U S A ; 106(40): 17095-100, 2009 Oct 06.
Article in English | MEDLINE | ID: mdl-19805156

ABSTRACT

Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases have been associated with length variation of trinucleotide (triplet) repeats including Huntington's disease, hereditary ataxias and spinobulbar muscular atrophy. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in noncoding DNA repeat lengths. They also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism in humans versus divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles. This suggests a high expansion/contraction rate in all long repeats. Expansions and contractions are not, however, affected by natural selection discernible from our comparison of human-chimpanzee divergence with human RLPs. Our catalog of human triplet repeats and their surrounding flanking regions can be used to produce a cost-effective whole-genome assay to test individuals. This repeat assay could someday complement SNP arrays for producing tests that assess an individual's risk of developing a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.


Subjects
Gene Expression Profiling/statistics & numerical data , Genome, Human/genetics , Trinucleotide Repeat Expansion/genetics , Trinucleotide Repeats/genetics , Animals , Base Sequence , Databases, Nucleic Acid , Gene Frequency , Genetic Predisposition to Disease/genetics , Genetic Variation , Humans , Microsatellite Repeats/genetics , Pan troglodytes/genetics , Polymorphism, Genetic
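
Cataloguing triplet repeats, as the abstract above describes, can be sketched with a regular expression. This is only an illustration under stated assumptions, not the authors' method: the function name, the default threshold of three copies, and the choice to skip single-base runs such as AAAAAA are all hypothetical, and only perfect (uninterrupted) repeats are reported.

```python
import re


def find_triplet_repeats(seq, min_copies=3):
    """Return (start, unit, copies) for each perfect trinucleotide repeat.

    start is the 0-based position of the repeat, unit is the 3-base motif,
    and copies is the number of consecutive occurrences of that motif.
    """
    # One captured triplet followed by at least (min_copies - 1) more copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # assumption: treat homopolymer runs as out of scope
        hits.append((match.start(), unit, len(match.group(0)) // 3))
    return hits


# Four copies of CAG flanked by two bases on each side.
print(find_triplet_repeats("TTCAGCAGCAGCAGTT"))
```

Comparing the copies value for the same locus across several genomes would be one simple way to flag a repeat-length polymorphism of the kind the abstract surveys.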
12.
Nucleic Acids Res ; 37(11): e80, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19429899

ABSTRACT

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64,000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.


Subjects
Genomics/methods , RNA Splicing , Sequence Alignment/methods , Software , Algorithms , Animals , Dogs , Genome, Plant , Humans , Mice , Reference Standards , Sequence Alignment/standards , Vertebrates/genetics
13.
Genome Biol ; 10(4): R42, 2009.
Article in English | MEDLINE | ID: mdl-19393038

ABSTRACT

BACKGROUND: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. RESULTS: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. CONCLUSIONS: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.


Subjects
Cattle/genetics , Genome/genetics , Sequence Analysis, DNA/methods , Animals , Chromosome Mapping , Female , Genome, Human/genetics , Genomics , Humans , Male , Sequence Analysis, DNA/statistics & numerical data , Synteny , Y Chromosome/genetics
14.
Bioinformatics ; 24(24): 2818-24, 2008 Dec 15.
Article in English | MEDLINE | ID: mdl-18952627

ABSTRACT

MOTIVATION: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a 'hybrid' approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data may perform poorly on hybrid data. RESULTS: Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage and heterogeneous read lengths. In tests on four genomes, it generated the longest contigs among all assemblers tested. It exploited the mate constraints provided by paired-end reads from either platform to build larger contigs and scaffolds, which were validated by comparison to a finished reference sequence. A low rate of contig mis-assembly was detected in some CABOG assemblies, but this was reduced in the presence of sufficient mate pair data. AVAILABILITY: The software is freely available as open-source from http://wgs-assembler.sf.net under the GNU Public License.


Subjects
Sequence Analysis, DNA/methods , Software , Computational Biology/methods , Genome , Genomics
15.
BMC Genomics ; 9: 204, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18452608

ABSTRACT

BACKGROUND: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. RESULTS: The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly-evolving CRISPR (clustered regularly interspersed short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. CONCLUSION: Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world.


Subjects
Evolution, Molecular , Genome, Bacterial/genetics , Oryza/microbiology , Xanthomonas/genetics , Bacterial Proteins/genetics , Base Sequence , DNA Transposable Elements/genetics , Gene Duplication , Gene Rearrangement , Gene Transfer, Horizontal , Genomics , Microsatellite Repeats , Reproducibility of Results , Time Factors
16.
Nature ; 452(7190): 991-6, 2008 Apr 24.
Article in English | MEDLINE | ID: mdl-18432245

ABSTRACT

Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3x draft genome sequence of 'SunUp' papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.


Subjects
Carica/genetics , Genome, Plant/genetics , Arabidopsis/genetics , Contig Mapping , Databases, Genetic , Genes, Plant/genetics , Molecular Sequence Data , Plants, Genetically Modified/genetics , Sequence Alignment , Sequence Analysis, DNA , Transcription Factors/genetics , Tropical Climate
17.
BMC Bioinformatics ; 8: 474, 2007 Dec 10.
Article in English | MEDLINE | ID: mdl-18070356

ABSTRACT

BACKGROUND: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. RESULTS: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. CONCLUSION: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.


Subjects
Computer Graphics/instrumentation , Database Management Systems , Sequence Alignment/economics , Sequence Alignment/instrumentation , Animals , Bacillus anthracis/genetics , Base Sequence , Caenorhabditis/genetics , Computer Graphics/economics , Computers/economics , Contig Mapping/economics , Contig Mapping/instrumentation , DNA/ultrastructure , Databases, Genetic , Genomic Library , Listeria monocytogenes/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Streptococcus suis/genetics , Time Factors , Work Simplification
18.
Science ; 317(5845): 1756-60, 2007 Sep 21.
Article in English | MEDLINE | ID: mdl-17885136

ABSTRACT

Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.


Subjects
Brugia malayi/genetics , Genome, Helminth , Animals , Brugia malayi/physiology , Caenorhabditis/genetics , Drosophila melanogaster/genetics , Drug Resistance/genetics , Filariasis/parasitology , Humans , Molecular Sequence Data
19.
Mol Biol Evol ; 24(9): 2091-8, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17642473

ABSTRACT

Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation and repair of overlaps among adjacent genes where the 3' ends either overlap or nearly overlap. Our model, derived from a comprehensive analysis of complete prokaryotic genomes in GenBank, explains the nonuniform distribution of the lengths of such overlap regions far more simply than previously proposed models. Specifically, we explain the distribution of overlap lengths based on random extensions of genes to the next occurring downstream stop codon. Our model also provides an explanation for a newly observed (here) pattern in the distribution of the separation distances of closely spaced nonoverlapping genes. We provide evidence that the newly described biased distribution of separation distances is driven by the same phenomenon that creates the uneven distribution of overlap lengths. This suggests a dynamic picture of continual overlap creation and elimination.


Subjects
Genes, Archaeal/genetics , Genes, Bacterial/genetics , Genes, Overlapping/genetics , Prokaryotic Cells/metabolism , Base Sequence , Evolution, Molecular , Genome, Archaeal , Genome, Bacterial , Molecular Sequence Data
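
The mechanism described in the abstract above, a gene extending to the next downstream stop codon, can be sketched in a few lines. This is an illustration under assumptions, not the authors' model: the function names are hypothetical, coordinates are 0-based on a single strand, and the stop codons are the standard TAA, TAG and TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def next_in_frame_stop(seq, start):
    """Return the position of the first stop codon at or after start that is
    in the same reading frame as start, or None if the sequence has none."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def overlap_length(end_a, start_b):
    """Number of bases shared by an upstream gene whose last base is end_a
    (inclusive) and a downstream gene whose first base is start_b."""
    return max(0, end_a - start_b + 1)


# Suppose a gene in frame 0 has lost its original stop; it now runs on to
# the next in-frame stop, which begins at position 6 in this toy sequence.
sequence = "ATGAAATAGCCC"
stop = next_in_frame_stop(sequence, 0)
print(stop)
```

If the new stop falls beyond the start of the neighbouring gene, `overlap_length` gives the size of the resulting overlap; because the extension length depends only on where the next stop happens to lie, the overlap lengths it produces are random in the sense the abstract describes.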
20.
BMC Bioinformatics ; 8: 64, 2007 Feb 26.
Article in English | MEDLINE | ID: mdl-17324286

ABSTRACT

BACKGROUND: Genome assemblers have grown very large and complex in response to the need for algorithms to handle the challenges of large whole-genome sequencing projects. Many of the most common uses of assemblers, however, are best served by a simpler type of assembler that requires fewer software components, uses less memory, and is far easier to install and run. RESULTS: We have developed the Minimus assembler to address these issues, and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus' performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, unlike other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly. CONCLUSION: We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design Minimus is perfectly suited to be a component of complex assembly pipelines. Minimus is released as an open-source software project and the code is available as part of the AMOS project at Sourceforge.


Subjects
Algorithms , Chromosome Mapping/methods , DNA/chemistry , DNA/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Base Sequence , Molecular Sequence Data , Software Design , User-Computer Interface