Results 1 - 20 of 28
1.
Protein Sci ; 33(8): e5088, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38988311

ABSTRACT

Antibiotic resistance is recognized as an imminent and growing global health threat. New antimicrobial drugs are urgently needed due to the decreasing effectiveness of conventional small-molecule antibiotics. Antimicrobial peptides (AMPs), a class of host defense peptides, are emerging as promising candidates to address this need. The potential sequence space of amino acids is combinatorially vast, making it possible to extend the current arsenal of antimicrobial agents with a practically infinite number of new peptide-based candidates. However, mining naturally occurring AMPs, whether directly by wet lab screening methods or aided by bioinformatics prediction tools, has its theoretical limit regarding the number of samples or genomic/transcriptomic resources researchers have access to. Further, manually designing novel synthetic AMPs requires prior field knowledge, restricting its throughput. In silico sequence generation methods are gaining interest as a high-throughput solution to the problem. Here, we introduce AMPd-Up, a recurrent neural network based tool for de novo AMP design, and demonstrate its utility over existing methods. Validation of candidates designed by AMPd-Up through antimicrobial susceptibility testing revealed that 40 of the 58 generated sequences possessed antimicrobial activity against Escherichia coli and/or Staphylococcus aureus. These results illustrate that AMPd-Up can be used to design novel synthetic AMPs with potent activities.


Subjects
Antimicrobial Peptides; Neural Networks, Computer; Antimicrobial Peptides/chemistry; Antimicrobial Peptides/pharmacology; Antimicrobial Peptides/chemical synthesis; Drug Design; Escherichia coli/drug effects; Escherichia coli/genetics; Staphylococcus aureus/drug effects; Microbial Sensitivity Tests; Anti-Bacterial Agents/pharmacology; Anti-Bacterial Agents/chemistry; Anti-Bacterial Agents/chemical synthesis
2.
Bioinform Adv ; 3(1): vbad162, 2023.
Article in English | MEDLINE | ID: mdl-38023332

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ∼10× faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.
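
The idea of hashing adjacent k-mers recursively can be illustrated with a generic cyclic-polynomial rolling hash over the amino acid alphabet. This is only a minimal sketch, not the aaHash implementation: the seed table `SEEDS`, the function names, and the single hash level are assumptions made for illustration, and the multi-level handling of biochemical similarity described in the abstract is not modelled here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = (1 << 64) - 1

# Hypothetical seed table: one fixed pseudo-random 64-bit word per residue.
_rng = random.Random(42)
SEEDS = {aa: _rng.getrandbits(64) for aa in AMINO_ACIDS}


def rol(x, r):
    """Rotate a 64-bit word left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch.

    Each residue's seed is rotated by its distance from the end of the
    k-mer, and the rotated seeds are XORed together.
    """
    k = len(kmer)
    h = 0
    for i, aa in enumerate(kmer):
        h ^= rol(SEEDS[aa], k - 1 - i)
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every adjacent k-mer with O(1) work per step."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        outgoing, incoming = seq[i - k], seq[i]
        # Shift all residues one position, drop the outgoing one, add the new one.
        h = rol(h, 1) ^ rol(SEEDS[outgoing], k) ^ SEEDS[incoming]
        yield h
```

Each update touches only the outgoing and incoming residues, which is why a recursive scheme can be much faster than rehashing every k-mer from scratch with a generic string hash.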

3.
G3 (Bethesda) ; 14(1)2023 Dec 29.
Article in English | MEDLINE | ID: mdl-37875130

ABSTRACT

Black spruce (Picea mariana [Mill.] B.S.P.) is a dominant conifer species in the North American boreal forest that plays important ecological and economic roles. Here, we present the first genome assembly of P. mariana with a reconstructed genome size of 18.3 Gbp and NG50 scaffold length of 36.0 kbp. A total of 66,332 protein-coding sequences were predicted in silico and annotated based on sequence homology. We analyzed the evolutionary relationships between P. mariana and 5 other spruces for which complete nuclear and organelle genome sequences were available. The phylogenetic tree estimated from mitochondrial genome sequences agrees with biogeography; specifically, P. mariana was strongly supported as a sister lineage to P. glauca and 3 other taxa found in western North America, followed by the European Picea abies. We obtained mixed topologies with weaker statistical support in phylogenetic trees estimated from nuclear and chloroplast genome sequences, indicative of ancient reticulate evolution affecting these 2 genomes. Clustering of protein-coding sequences from the 6 Picea taxa and 2 Pinus species resulted in 34,776 orthogroups, 560 of which appeared to be specific to P. mariana. Analysis of these specific orthogroups and dN/dS analysis of positive selection signatures for 497 single-copy orthogroups identified gene functions mostly related to plant development and stress response. The P. mariana genome assembly and annotation provides a valuable resource for forest genetics research and applications in this broadly distributed species, especially in relation to climate adaptation.


Subjects
Picea; Phylogeny; Picea/genetics; North America
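
The NG50 statistic quoted in this abstract has a simple definition that may be easier to follow in code: it is the length L such that scaffolds of length L or longer together cover at least half of the estimated genome size. The sketch below follows that standard definition; the function name and the toy numbers in the usage note are illustrative and are not taken from the paper.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of sequence lengths.

    Walks the sequences from longest to shortest and returns the length at
    which the running total first reaches half of genome_size. Returns None
    if the assembly never covers half of the genome.
    """
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return None
```

For scaffold lengths 50, 40, 30, 20 and 10 with an assumed genome size of 200, the running total first reaches 100 at the third scaffold, so the NG50 is 30. Using the assembly's own total length (150) as the denominator gives the N50 instead, which is 40 here.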
4.
BMC Genomics ; 24(1): 390, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37430186

ABSTRACT

BACKGROUND: The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas to the east of the Rocky Mountains that comprise populations and species of pines not previously affected. Despite its impacts, there are few tactics available to control MPB populations. Beauveria bassiana is an entomopathogenic fungus used as a biological agent in agriculture and forestry and has potential as a management tactic for the mountain pine beetle population. This work investigates the phenotypic and genomic variation between B. bassiana strains to identify optimal strains against a specific insect. RESULTS: Using comparative genome and transcriptome analyses of eight B. bassiana isolates, we have identified the genetic basis of virulence, which includes oosporein production. Genes unique to the more virulent strains included functions in biosynthesis of mycotoxins, membrane transporters, and transcription factors. Significant differential expression of genes related to virulence, transmembrane transport, and stress response was identified between the different strains, as well as up to nine-fold upregulation of genes involved in the biosynthesis of oosporein. Differential correlation analysis revealed transcription factors that may be involved in regulating oosporein production. CONCLUSION: This study provides a foundation for the selection and/or engineering of the most effective strain of B. bassiana for the biological control of mountain pine beetle and other insect pest populations.


Subjects
Beauveria; Beetles; Animals; Beauveria/genetics; Virulence/genetics; Genomics
5.
Nat Commun ; 14(1): 2906, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217507

ABSTRACT

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap, its most costly step, was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.


Subjects
Algorithms; Genome; Humans; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing
6.
bioRxiv ; 2023 May 10.
Article in English | MEDLINE | ID: mdl-37214907

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ~10X faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.

7.
Curr Protoc ; 3(4): e733, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37039735

ABSTRACT

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technologies, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: ntLink scaffolding using overlap detection
Basic Protocol 2: ntLink scaffolding with gap-filling
Basic Protocol 3: Running in-code iterations of ntLink scaffolding
Alternate Protocol 1: Generating long-read to contig mappings with ntLink
Alternate Protocol 2: Using ntLink mappings for genome assembly correction with Tigmint-long
Support Protocol: Installing ntLink


Subjects
High-Throughput Nucleotide Sequencing; Software; High-Throughput Nucleotide Sequencing/methods; Genomics/methods; Sequence Analysis, DNA/methods; Genome
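
The minimizer-based mappings mentioned in this abstract rest on a general technique that can be sketched briefly: in every window of w consecutive k-mers, keep only the k-mer with the smallest hash. The code below is a generic illustration under stated assumptions, not ntLink's code: the function names, the choice of BLAKE2b as the hash, and the parameters are hypothetical, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
import hashlib


def kmer_hash(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b chosen only for illustration)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq, k, w):
    """Return the ordered list of (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the k-mer with the smallest
    hash is selected; ties are broken by the leftmost position. Consecutive
    windows that select the same position contribute a single entry.
    """
    hashes = [(kmer_hash(seq[i:i + k]), i) for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(hashes) - w + 1):
        _, pos = min(hashes[start:start + w])
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, seq[pos:pos + k]))
    return sketch
```

Because the choice within a window depends only on that window's content, an error-free read drawn from a contig selects the same minimizers as the contig does over the same region. Matching those shared minimizers gives the read's approximate position and orientation without a base-level alignment.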
8.
Antibiotics (Basel) ; 11(12)2022 Nov 27.
Article in English | MEDLINE | ID: mdl-36551368

ABSTRACT

Antimicrobial peptides (AMPs) are a diverse class of short, often cationic biological molecules that present promising opportunities in the development of new therapeutics to combat antimicrobial resistance. Newly developed in silico methods offer the ability to rapidly discover numerous novel AMPs with a variety of physiochemical properties. Herein, using the rAMPage AMP discovery pipeline, we bioinformatically identified 51 AMP candidates from amphibia and insect RNA-seq data and present their in-depth characterization. The studied AMPs demonstrate activity against a panel of bacterial pathogens and have undetected or low toxicity to red blood cells and human cultured cells. Amino acid sequence analysis revealed that 30 of these bioactive peptides belong to either the Brevinin-1, Brevinin-2, Nigrocin-2, or Apidaecin AMP families. Prediction of three-dimensional structures using ColabFold indicated an association between peptides predicted to adopt a helical structure and broad-spectrum antibacterial activity against the Gram-negative and Gram-positive species tested in our panel. These findings highlight the utility of associating the diverse sequences of novel AMPs with their estimated peptide structures in categorizing AMPs and predicting their antimicrobial activity.

9.
Genome Res ; 32(10): 1952-1964, 2022 10.
Article in English | MEDLINE | ID: mdl-36109148

ABSTRACT

We assembled the 9.8-Gbp genome of western redcedar (WRC; Thuja plicata), an ecologically and economically important conifer species of the Cupressaceae. The genome assembly, derived from a uniquely inbred tree produced through five generations of self-fertilization (selfing), was determined to be 86% complete by BUSCO analysis, one of the most complete genome assemblies for a conifer. Population genomic analysis revealed WRC to be one of the most genetically depauperate wild plant species, with an effective population size of approximately 300 and no significant genetic differentiation across its geographic range. Nucleotide diversity, π, is low for a continuous tree species, with many loci showing zero diversity, and the ratio of π at zero- to fourfold degenerate sites is relatively high (approximately 0.33), suggestive of weak purifying selection. Using an array of genetic lines derived from up to five generations of selfing, we explored the relationship between genetic diversity and mating system. Although overall heterozygosity was found to decline faster than expected during selfing, heterozygosity persisted at many loci, and nearly 100 loci were found to deviate from expectations of genetic drift, suggestive of associative overdominance. Nonreference alleles at such loci often harbor deleterious mutations and are rare in natural populations, implying that balanced polymorphisms are maintained by linkage to dominant beneficial alleles. This may account for how WRC remains responsive to natural and artificial selection, despite low genetic diversity.


Subjects
Tracheophyta; Tracheophyta/genetics; Self-Fertilization/genetics; Alleles; Heterozygote; Polymorphism, Genetic; Genetic Variation; Selection, Genetic
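
Nucleotide diversity, π, as used in this abstract, is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below computes it for a small set of equal-length, already aligned sequences. The function name and toy data are illustrative, and the study's actual pipeline (site classes, missing data, genotype calls) is not reproduced.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site among aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatched sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)
```

For the three toy sequences AAAA, AAAT and AATT, the pairwise differences are 1, 2 and 1 over 4 sites, giving π = 4 / (3 × 4) ≈ 0.33. A ratio such as the one reported in the abstract would be obtained by computing π separately over zero-fold and fourfold degenerate sites and dividing the first by the second.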
10.
Antibiotics (Basel) ; 11(7)2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35884206

ABSTRACT

Antibiotic resistance is a global health crisis increasing in prevalence every day. To combat this crisis, alternative antimicrobial therapeutics are urgently needed. Antimicrobial peptides (AMPs), a family of short defense proteins, are produced naturally by all organisms and hold great potential as effective alternatives to small molecule antibiotics. Here, we present rAMPage, a scalable bioinformatics discovery platform for identifying AMP sequences from RNA sequencing (RNA-seq) datasets. In our study, we demonstrate the utility and scalability of rAMPage, running it on 84 publicly available RNA-seq datasets from 75 amphibian and insect species, which are known to have rich AMP repertoires. Across these datasets, we identified 1137 putative AMPs, 1024 of which were deemed novel by a homology search in cataloged AMPs in public databases. We selected 21 peptide sequences from this set for antimicrobial susceptibility testing against Escherichia coli and Staphylococcus aureus and observed that seven of them have high antimicrobial activity. Our study illustrates how in silico methods such as rAMPage can enable the fast and efficient discovery of novel antimicrobial peptides as an effective first step in the strenuous process of antimicrobial drug development.

11.
Plant J ; 111(5): 1469-1485, 2022 09.
Article in English | MEDLINE | ID: mdl-35789009

ABSTRACT

Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related with presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature. 
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.


Subjects
Picea; Tracheophyta; Expressed Sequence Tags; Genome, Plant/genetics; Multigene Family/genetics; Phylogeny; Picea/genetics; Tracheophyta/genetics
12.
BMC Bioinformatics ; 23(1): 246, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729491

ABSTRACT

BACKGROUND: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k - 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so lower values can provide good connectivity in lesser covered regions and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes. RESULTS: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to that of the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads which is used to evaluate the assembly graph path support at branching points and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%. CONCLUSIONS: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the initial and fundamental graph traversal outcome, all downstream ABySS algorithms greatly benefit by working with a more accurate and less complex representation of the genome.
The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver.


Subjects
Genomics; Software; Algorithms; Genome; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA/methods
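
The abstract above describes storing read content in a Bloom filter and then checking how well candidate graph paths are supported. A toy version of that idea is sketched below. It is not the RResolver algorithm: the class, the `path_support` function, the filter size, and the use of a simple supported-fraction score are all assumptions for illustration, and reverse complements are ignored.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter over strings (sizes chosen only for the example)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every k-mer of seq in order."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def path_support(path_seq, read_filter, k):
    """Fraction of the path's k-mers that are present in the read filter."""
    path_kmers = list(kmers(path_seq, k))
    if not path_kmers:
        return 0.0
    return sum(km in read_filter for km in path_kmers) / len(path_kmers)
```

A Bloom filter never reports a stored k-mer as absent, so a path that is fully spelled out by the reads always scores 1.0. A path that joins sequences the reads never connect contains k-mers missing from the filter and scores lower, apart from rare false positives. A resolver built on this idea would drop paths whose score falls below some chosen threshold.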
13.
Curr Protoc ; 2(5): e442, 2022 May.
Article in English | MEDLINE | ID: mdl-35567771

ABSTRACT

High-quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long-read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory-intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment-free, k-mer-based genome finishing protocol that employs memory-efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error-corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol: Automated long-read genome finishing with short reads
Support Protocol: Selecting optimal values for k-mer lengths (k) and Bloom filter size (b)


Subjects
Genome; High-Throughput Nucleotide Sequencing; Genome/genetics; High-Throughput Nucleotide Sequencing/methods; Poland; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA/methods
14.
G3 (Bethesda) ; 12(4)2022 04 04.
Article in English | MEDLINE | ID: mdl-35171977

ABSTRACT

The highly diverse insect family of true weevils, Curculionidae, includes many agricultural and forest pests. Pissodes strobi, commonly known as the spruce weevil or white pine weevil, is a major pest of spruce and pine forests in North America. Pissodes strobi larvae feed on the apical shoots of young trees, causing stunted growth and can destroy regenerating spruce or pine forests. Here, we describe the nuclear and mitochondrial Pissodes strobi genomes and their annotations, as well as the genome of an apparent Wolbachia endosymbiont. We report a substantial expansion of the weevil nuclear genome, relative to other Curculionidae species, possibly driven by an abundance of class II DNA transposons. The endosymbiont observed belongs to a group (supergroup A) of Wolbachia species that generally form parasitic relationships with their arthropod host.


Subjects
Picea; Weevils; Wolbachia; Animals; Forests; Insects; Picea/genetics; Weevils/genetics; Wolbachia/genetics
15.
BMC Bioinformatics ; 22(1): 534, 2021 Oct 30.
Article in English | MEDLINE | ID: mdl-34717540

ABSTRACT

BACKGROUND: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. 
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.


Subjects
Genomics; High-Throughput Nucleotide Sequencing; Genome; Humans; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA
16.
Microbiol Resour Announc ; 9(39)2020 Sep 24.
Article in English | MEDLINE | ID: mdl-32972944

ABSTRACT

Here, we present the chloroplast genome sequence of black spruce (Picea mariana), a conifer widely distributed throughout North American boreal forests. This complete and annotated chloroplast sequence is 123,961 bp long and will contribute to future studies on the genetic basis of evolutionary change in spruce and adaptation in conifers.

17.
Genome Biol Evol ; 12(7): 1174-1179, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32449750

ABSTRACT

Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.


Subjects
Genome, Mitochondrial; Genome, Plant; Picea/genetics; Molecular Structure
18.
Bioinformatics ; 36(12): 3885-3887, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32311025

ABSTRACT

SUMMARY: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies are still not achieving reference-grade quality. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of different applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory. AVAILABILITY AND IMPLEMENTATION: ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing; Software; Genome, Human; Humans; Sequence Analysis, DNA; Synteny
19.
Microbiol Resour Announc ; 8(23)2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31171622

ABSTRACT

Here, we present the complete chloroplast genome sequence of white spruce (Picea glauca, genotype WS77111), a coniferous tree widespread in the boreal forests of North America. This sequence contributes to genomic and phylogenetic analyses of the Picea genus that are part of ongoing research to understand their adaptation to environmental stress.

20.
Microbiol Resour Announc ; 8(24)2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31196920

ABSTRACT

Engelmann spruce (Picea engelmannii) is a conifer found primarily on the west coast of North America. Here, we present the complete chloroplast genome sequence of Picea engelmannii genotype Se404-851. This chloroplast sequence will benefit future conifer genomic research and contribute resources to further species conservation efforts.
