Results 1 - 13 of 13
1.
Front Pharmacol ; 15: 1352311, 2024.
Article in English | MEDLINE | ID: mdl-38495102

ABSTRACT

Friedreich's ataxia (FRDA), the most common recessive inherited ataxia, results from homozygous guanine-adenine-adenine (GAA) repeat expansions in intron 1 of the FXN gene, which leads to the deficiency of frataxin, a mitochondrial protein essential for iron-sulphur cluster synthesis. The study of frataxin protein regulation might yield new approaches for FRDA treatment. Here, we report tumorous imaginal disc 1 (TID1), a mitochondrial J-protein cochaperone, as a binding partner of frataxin that negatively controls frataxin protein levels. TID1 interacts with frataxin both in vivo in mouse cortex and in vitro in cortical neurons. Acute and subacute depletion of frataxin using RNA interference markedly increases TID1 protein levels in multiple cell types. In addition, TID1 overexpression significantly increases frataxin precursor but decreases intermediate and mature frataxin levels in HEK293 cells. In primary cultured human skin fibroblasts, overexpression of TID1S results in decreased levels of mature frataxin and increased fragmentation of mitochondria. This effect is mediated by the last 6 amino acids of TID1S as a peptide made from this sequence rescues frataxin deficiency and mitochondrial defects in FRDA patient-derived cells. Our findings show that TID1 negatively modulates frataxin levels, and thereby suggest a novel therapeutic target for treating FRDA.

2.
Bioinform Adv ; 3(1): vbad162, 2023.
Article in English | MEDLINE | ID: mdl-38023332

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ∼10× faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.
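
The speed-up for adjacent k-mers reported above comes from updating a hash incrementally instead of recomputing it. The following is a minimal sketch of that general idea using a cyclic-polynomial rolling hash; it is not the aaHash implementation. The per-residue seed values are hypothetical (generated from a fixed random seed for illustration), and the multiple hash levels that encode biochemical similarity are omitted.

```python
import random

MASK = (1 << 64) - 1
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 64-bit seed per residue; a real library would ship fixed constants.
_rng = random.Random(0)
SEED = {aa: _rng.getrandbits(64) for aa in AMINO_ACIDS}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash one k-mer from scratch in O(k)."""
    k = len(kmer)
    h = 0
    for i, aa in enumerate(kmer):
        h ^= rol(SEED[aa], k - 1 - i)
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating each from the previous in O(1)."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        out_aa, in_aa = seq[i - k], seq[i]
        # Shift the previous hash, cancel the outgoing residue, add the incoming one.
        h = rol(h, 1) ^ rol(SEED[out_aa], k) ^ SEED[in_aa]
        yield h
```

Each update costs a constant number of rotations and XORs regardless of k, which is why hashing adjacent k-mers this way can outpace a generic string hash that rereads all k characters per window.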

3.
Nat Commun ; 14(1): 2906, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217507

ABSTRACT

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.


Subjects
Algorithms , Genome , Humans , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing
4.
bioRxiv ; 2023 May 10.
Article in English | MEDLINE | ID: mdl-37214907

ABSTRACT

Motivation: K-mer hashing is a common operation in many foundational bioinformatics problems. However, generic string hashing algorithms are not optimized for this application. Strings in bioinformatics use specific alphabets, a trait leveraged for nucleic acid sequences in earlier work. We note that amino acid sequences, with complexities and context that cannot be captured by generic hashing algorithms, can also benefit from a domain-specific hashing algorithm. Such a hashing algorithm can accelerate and improve the sensitivity of bioinformatics applications developed for protein sequences. Results: Here, we present aaHash, a recursive hashing algorithm tailored for amino acid sequences. This algorithm utilizes multiple hash levels to represent biochemical similarities between amino acids. aaHash performs ~10X faster than generic string hashing algorithms in hashing adjacent k-mers. Availability and implementation: aaHash is available online at https://github.com/bcgsc/btllib and is free for academic use.

5.
Curr Protoc ; 3(4): e733, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37039735

ABSTRACT

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technologies, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: ntLink scaffolding using overlap detection
Basic Protocol 2: ntLink scaffolding with gap-filling
Basic Protocol 3: Running in-code iterations of ntLink scaffolding
Alternate Protocol 1: Generating long-read to contig mappings with ntLink
Alternate Protocol 2: Using ntLink mappings for genome assembly correction with Tigmint-long
Support Protocol: Installing ntLink
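
The minimizer-based mappings mentioned in this abstract can be illustrated with a small sketch. It shows the generic window-minimizer technique only: the function names are hypothetical, `blake2b` stands in for whatever hash ntLink actually uses, and the values of k and w are arbitrary.

```python
import hashlib


def kmer_hash(kmer):
    """Stable 64-bit hash of a k-mer (blake2b is a stand-in, not ntLink's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq, k, w):
    """Return (position, k-mer) pairs: the lowest-hash k-mer in each window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        window = range(start, start + w)
        # Ties are broken by position so the choice is deterministic.
        picked.add(min(window, key=lambda i: (hashes[i], i)))
    return [(i, seq[i:i + k]) for i in sorted(picked)]


def shared_minimizers(a, b, k, w):
    """Minimizer k-mers present in both sequences, a cheap proxy for an alignment."""
    return {m for _, m in minimizers(a, k, w)} & {m for _, m in minimizers(b, k, w)}
```

Because a stretch shared by a read and a contig yields the same window minima in both, the shared minimizers hint at which contigs a read spans without computing an alignment.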


Subjects
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , DNA Sequence Analysis/methods , Genome
6.
Bioinformatics ; 38(20): 4812-4813, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36000872

ABSTRACT

MOTIVATION: Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research. RESULTS: ntHash2 is up to 2.1× faster at hashing various spaced seeds than the previous version and 3.8× faster than conventional hashing algorithms with naïve adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism. AVAILABILITY AND IMPLEMENTATION: ntHash2 is freely available online at github.com/bcgsc/ntHash under an MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
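
To make the notion of a spaced seed concrete, here is a minimal sketch of the naive approach that the abstract uses as a baseline: mask the "don't care" positions, then hash each window from scratch. It is not the ntHash2 algorithm. The seed pattern, function names, and use of `blake2b` are assumptions for illustration.

```python
import hashlib


def spaced_seed_hash(kmer, seed):
    """Hash only the positions where seed has '1'; positions with '0' are ignored."""
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed pattern must have the same length")
    masked = "".join(base if care == "1" else "*" for base, care in zip(kmer, seed))
    digest = hashlib.blake2b(masked.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def spaced_seed_hashes(seq, seed):
    """Naive adaptation: recompute the masked hash for every window of the sequence."""
    k = len(seed)
    return [spaced_seed_hash(seq[i:i + k], seed) for i in range(len(seq) - k + 1)]
```

Two k-mers that differ only at masked positions hash to the same value, which is what makes spaced seeds tolerant of base mismatches.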


Subjects
Algorithms , Software , Base Sequence , Seeds , DNA Sequence Analysis
7.
Plant J ; 111(5): 1469-1485, 2022 09.
Article in English | MEDLINE | ID: mdl-35789009

ABSTRACT

Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related with presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature. 
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.


Subjects
Picea , Tracheophyta , Expressed Sequence Tags , Plant Genome/genetics , Multigene Family/genetics , Phylogeny , Picea/genetics , Tracheophyta/genetics
8.
BMC Bioinformatics ; 23(1): 246, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729491

ABSTRACT

BACKGROUND: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k - 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so lower values can provide good connectivity in less-covered regions and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes. RESULTS: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to that of the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads which is used to evaluate the assembly graph path support at branching points and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%. CONCLUSIONS: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the initial and fundamental graph traversal outcome, all downstream ABySS algorithms greatly benefit by working with a more accurate and less complex representation of the genome. 
The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver.
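
The de Bruijn graph construction described in the background of this abstract can be sketched as follows. This is a toy illustration with hypothetical function names, not ABySS or RResolver code: it stores every k-mer in memory and ignores reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are k-mers; an edge u -> v exists when the last k-1 bases of u equal the first k-1 bases of v."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    # Index nodes by their (k-1)-base prefix so successors can be looked up directly.
    by_prefix = defaultdict(set)
    for kmer in kmers:
        by_prefix[kmer[:k - 1]].add(kmer)
    return {kmer: sorted(by_prefix.get(kmer[1:], ())) for kmer in sorted(kmers)}


def branching_nodes(graph):
    """Nodes with more than one successor, where a walk becomes ambiguous."""
    return [node for node, successors in graph.items() if len(successors) > 1]
```

Nodes returned by `branching_nodes` are the points where an assembler must decide which outgoing path is supported by the reads; a larger k removes some of these branches at the cost of connectivity in poorly covered regions.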


Subjects
Genomics , Software , Algorithms , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , DNA Sequence Analysis/methods
9.
Curr Protoc ; 2(5): e442, 2022 May.
Article in English | MEDLINE | ID: mdl-35567771

ABSTRACT

High-quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long-read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory-intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment-free, k-mer-based genome finishing protocol that employs memory-efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error-corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Automated long-read genome finishing with short reads Support Protocol: Selecting optimal values for k-mer lengths (k) and Bloom filter size (b).
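
As a rough illustration of the alignment-free, Bloom-filter-based k-mer checking described above, the sketch below loads short-read k-mers into a small Bloom filter and flags draft positions whose k-mers are absent. It is not ntEdit or Sealer code: the class, function names, and the values chosen for k and the filter size b are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions; may give false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq, k):
    """Generate every k-mer of seq in order."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unsupported_positions(draft, bloom, k):
    """Start positions of draft k-mers absent from the filter (candidate base errors)."""
    return [i for i, kmer in enumerate(kmers(draft, k)) if kmer not in bloom]
```

The memory footprint is fixed by the number of bits rather than by the number of reads, which is the trade-off behind the protocol's guidance on choosing k and the Bloom filter size b: a filter that is too small produces false positives that hide real errors.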


Subjects
Genome , High-Throughput Nucleotide Sequencing , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Poland , Nucleic Acid Repetitive Sequences , DNA Sequence Analysis/methods
10.
BMC Bioinformatics ; 22(1): 534, 2021 Oct 30.
Article in English | MEDLINE | ID: mdl-34717540

ABSTRACT

BACKGROUND: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. RESULTS: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. CONCLUSIONS: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. 
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
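
Contiguity figures such as the fold improvements quoted above are based on N50-style statistics. The sketch below computes N50 and NG50 from a list of sequence lengths. It is a simplified illustration with hypothetical function names: a true NGA50 is computed on aligned blocks after breaking sequences at misassemblies, which requires a reference alignment and is not shown here.

```python
def ng50(lengths, genome_size):
    """Largest length L such that sequences of length >= L together cover at least half of genome_size."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the sequences cover less than half of the genome


def n50(lengths):
    """N50 is NG50 with the assembly's own total length used as the genome size."""
    return ng50(lengths, sum(lengths))


def fold_improvement(before, after, genome_size):
    """Ratio of NG50 after scaffolding to NG50 before it."""
    return ng50(after, genome_size) / ng50(before, genome_size)
```

Passing aligned block lengths instead of raw scaffold lengths to `ng50` would give an NGA50-like value, assuming those blocks were produced by an external aligner.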


Subjects
Genomics , High-Throughput Nucleotide Sequencing , Genome , Humans , Nucleic Acid Repetitive Sequences , DNA Sequence Analysis
11.
BMC Biol ; 19(1): 217, 2021 09 29.
Article in English | MEDLINE | ID: mdl-34587965

ABSTRACT

BACKGROUND: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. We here present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. RESULTS: We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations that are based on several runs of the latest generation of MinION flow cells ("R10.3") which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present novel software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ, demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. CONCLUSIONS: We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital cost with scalability. Small projects can use the flow cell dongle ("Flongle") while large projects can rely on MinION flow cells that can be stopped and re-used after collecting sufficient data for a given project.


Subjects
Biodiversity , Computational Biology , Taxonomic DNA Barcoding , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Software
12.
Bioorg Med Chem ; 28(11): 115472, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32279920

ABSTRACT

Friedreich's Ataxia (FRDA) is an incurable genetic disease caused by an expanded trinucleotide AAG repeat within intronic RNA of the frataxin (FXN) gene. We have previously demonstrated that synthetic antisense oligonucleotides or duplex RNAs that are complementary to the expanded repeat can activate expression of FXN and return levels of FXN protein to near normal. The potency of these compounds, however, was too low to encourage vigorous pre-clinical development. We now report testing of "gapmer" oligonucleotides consisting of a central DNA portion flanked by chemically modified RNA that increases binding affinity. We find that gapmer antisense oligonucleotides are several fold more potent activators of FXN expression relative to previously tested compounds. The potency of FXN activation is similar to a potent benchmark gapmer targeting the nuclear noncoding RNA MALAT-1, suggesting that our approach has potential for developing more effective compounds to regulate FXN expression in vivo.


Subjects
Drug Discovery , Friedreich Ataxia/drug therapy , Iron-Binding Proteins/genetics , Antisense Oligonucleotides/pharmacology , Cultured Cells , Drug Dose-Response Relationship , Friedreich Ataxia/genetics , Friedreich Ataxia/metabolism , Humans , Iron-Binding Proteins/metabolism , Molecular Structure , Antisense Oligonucleotides/chemistry , Structure-Activity Relationship , Frataxin
13.
BMC Genomics ; 19(1): 536, 2018 Jul 13.
Article in English | MEDLINE | ID: mdl-30005633

ABSTRACT

BACKGROUND: Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe. RESULTS: We built a cloud-based targeted de novo transcript assembly and analysis pipeline that incorporates our previously developed cleavage site prediction tool, KLEAT. We applied this pipeline to elucidate the APA profiles of 114 genes in 9939 tumor and 729 tissue normal samples from The Cancer Genome Atlas (TCGA). The full set of 10,668 RNA-Seq samples from 33 cancer types has not been utilized by previous APA studies. By comparing the frequencies of predicted cleavage sites between normal and tumor sample groups, we identified 77 events (i.e. gene-cancer type pairs) of tumor-specific APA regulation in 13 cancer types; for 15 genes, such regulation is recurrent across multiple cancers. Our results also support a previous report showing the 3' UTR shortening of FGF2 in multiple cancers. However, over half of the events we identified display complex changes to 3' UTR length that resist simple classification like shortening or lengthening. CONCLUSIONS: Recurrent tumor-specific regulation of APA is widespread in cancer. However, the regulation pattern that we observed in TCGA RNA-seq data cannot be described as straightforward 3' UTR shortening or lengthening. Continued investigation into this complex, nuanced regulatory landscape will provide further insight into its role in tumor formation and development.


Subjects
Neoplasms/genetics , Messenger RNA/genetics , 3' Untranslated Regions , Cloud Computing , Genetic Databases , Fibroblast Growth Factor 2/genetics , Neoplastic Gene Expression Regulation , Humans , Local Neoplasm Recurrence/genetics , Neoplasms/pathology , Polyadenylation , RNA Cleavage , Messenger RNA/metabolism , Software