1.
Nature ; 619(7971): 793-800, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37380777

ABSTRACT

Aneuploidies (whole-chromosome or whole-arm imbalances) are the most prevalent alteration in cancer genomes [1,2]. However, it is still debated whether their prevalence is due to selection or to ease of generation as passenger events [1,2]. Here we developed a method, BISCUT, that identifies loci subject to fitness advantages or disadvantages by interrogating length distributions of telomere- or centromere-bounded copy-number events. These loci were significantly enriched for known cancer driver genes, including genes not detected through analysis of focal copy-number events, and were often lineage specific. BISCUT identified the helicase-encoding gene WRN as a haploinsufficient tumour-suppressor gene on chromosome 8p, a finding supported by several lines of evidence. We also formally quantified the role of selection and mechanical biases in driving aneuploidy, finding that rates of arm-level copy-number alterations are most highly correlated with their effects on cellular fitness [1,2]. These results provide insight into the driving forces behind aneuploidy and its contribution to tumorigenesis.


Subjects
Aneuploidy; Cell Transformation, Neoplastic; Neoplasms; Humans; Cell Transformation, Neoplastic/genetics; DNA Copy Number Variations/genetics; Neoplasms/genetics; Neoplasms/pathology; Oncogenes/genetics; Telomere/genetics; Centromere/genetics; Cell Lineage; Chromosomes, Human, Pair 8/genetics; Genes, Tumor Suppressor
2.
Cell Syst ; 13(10): 817-829.e3, 2022 10 19.
Article in English | MEDLINE | ID: mdl-36265468

ABSTRACT

Computing the distance between two genomes without alignments, or even without access to assemblies, enables many downstream analyses. However, alignment-free methods, including those in the fast-growing field of genome skimming, are hampered by a significant methodological gap. While accurate methods (many of them k-mer-based) exist for assembly-free distance calculation, measuring the uncertainty of the estimated distances has not been sufficiently studied. In this paper, we show that bootstrapping, the standard non-parametric method of measuring estimator uncertainty, is not accurate for k-mer-based methods that rely on k-mer frequency profiles. Instead, we propose using subsampling (without replacement) in combination with a correction step to reduce the variance of the inferred distribution. We show that the distribution of distances obtained with our procedure matches the true uncertainty of the estimator. The resulting phylogenetic support values effectively differentiate between correct and incorrect branches and identify controversial branches that change across alignment-free and alignment-based phylogenies reported in the literature.
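
The subsample-then-correct idea in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the function names, the `fraction` parameter, and in particular the correction, which is the textbook m-out-of-n rescaling (deviations from the mean are multiplied by the square root of the subsample fraction) and is not claimed to be the exact correction used by the authors. The `estimator` argument stands in for any distance estimator that takes two collections of reads or k-mers.

```python
import math
import random


def subsample_distances(items_a, items_b, estimator,
                        fraction=0.5, replicates=100, seed=0):
    """Estimate a distance on repeated subsamples drawn WITHOUT replacement.

    items_a, items_b: lists of reads or k-mers (hypothetical inputs).
    estimator: any function (list, list) -> float.
    """
    rng = random.Random(seed)
    size_a = max(1, int(len(items_a) * fraction))
    size_b = max(1, int(len(items_b) * fraction))
    values = []
    for _ in range(replicates):
        # random.sample never picks the same element twice,
        # which is what distinguishes subsampling from bootstrapping.
        sub_a = rng.sample(items_a, size_a)
        sub_b = rng.sample(items_b, size_b)
        values.append(estimator(sub_a, sub_b))
    return values


def rescale_spread(values, fraction):
    """Assumed correction: shrink deviations by sqrt(fraction).

    A subsample of relative size `fraction` is noisier than the full
    sample, so the spread is scaled down around the replicate mean.
    """
    center = sum(values) / len(values)
    scale = math.sqrt(fraction)
    return [center + scale * (v - center) for v in values]
```

The rescaled replicates keep the same mean as the raw ones but have a narrower spread, which is the property a variance-reducing correction step needs; whether this particular scaling matches the true estimator uncertainty on real data is not something the sketch establishes.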


Subjects
Algorithms; Genome; Phylogeny; Sequence Alignment; Uncertainty
3.
NAR Genom Bioinform ; 4(2): lqac032, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35493723

ABSTRACT

DNA viruses are important infectious agents known to mediate a large number of human diseases, including cancer. Viral integration into the host genome and the formation of hybrid transcripts are also associated with increased pathogenicity. The high variability of viral genomes, however, requires the use of sensitive ensemble hidden Markov models that add to the computational complexity, often requiring more than 40 CPU-hours per sample. Here, we describe FastViFi, a fast two-stage filtering method that reduces the computational burden. On simulated and cancer genomic data, FastViFi improved the running time by two orders of magnitude with comparable accuracy on challenging data sets. Recently published methods have focused on identifying the location of viral integration into the human host genome using local assembly, but they do not extend to RNA. To identify human-viral hybrid transcripts, we additionally developed ensemble hidden Markov models for the Epstein-Barr virus (EBV) to add to the models for hepatitis B virus (HBV), hepatitis C virus (HCV), and human papillomavirus (HPV), and used FastViFi to query RNA-seq data from gastric cancer (EBV) and liver cancer (HBV/HCV). FastViFi ran in under 10 minutes per sample and identified multiple hybrids that fuse viral and human genes, suggesting new mechanisms for oncoviral pathogenicity. FastViFi is available at https://github.com/sara-javadzadeh/FastViFi.

4.
PLoS Comput Biol ; 17(11): e1009449, 2021 11.
Article in English | MEDLINE | ID: mdl-34780468

ABSTRACT

The cost of sequencing a genome is dropping much faster than the cost of assembling and finishing it. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in the identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show, using a mix of theoretical and empirical analysis, that there are fundamental limitations to estimating k-mer spectra due to ill-conditioned systems, which has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation, compared to the 27% error previously achieved. In shotgun-sequenced read samples with contaminants, RESPECT length estimates had a median error of 4%, in contrast to other methods that had a median error of 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.
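
To make the target quantity concrete, the following minimal sketch computes a k-mer repeat spectrum directly from a known sequence: entry j is the number of distinct k-mers that occur exactly j times. This is only the quantity being estimated, not the estimation method described above, which works from low-coverage reads. The function name is hypothetical, and the sketch assumes a single strand with no reverse-complement canonicalization.

```python
from collections import Counter


def kmer_repeat_spectrum(sequence, k):
    """Return {j: number of distinct k-mers appearing exactly j times}.

    Assumes a fully known sequence; reverse complements are treated as
    different k-mers, which is a simplification.
    """
    # Count every k-mer occurrence along the sequence.
    occurrences = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Count how many distinct k-mers share each multiplicity.
    spectrum = Counter(occurrences.values())
    return dict(sorted(spectrum.items()))
```

For example, in the sequence ACGTACGT with k = 4, the k-mer ACGT appears twice and CGTA, GTAC, and TACG appear once each, so the spectrum has three k-mers of multiplicity 1 and one of multiplicity 2.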


Subjects
Algorithms; Genome; Genomics/statistics & numerical data; Repetitive Sequences, Nucleic Acid; Software; Animals; Computational Biology; Computer Simulation; Databases, Genetic/statistics & numerical data; Humans; Invertebrates/classification; Invertebrates/genetics; Least-Squares Analysis; Linear Models; Mammals/classification; Mammals/genetics; Models, Genetic; Phylogeny; Plants/classification; Plants/genetics; Vertebrates/classification; Vertebrates/genetics
5.
Syst Biol ; 69(3): 566-578, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31545363

ABSTRACT

Placing a new species on an existing phylogeny has increasing relevance to several applications. Placement can be used to update phylogenies in a scalable fashion and can help identify unknown query samples using (meta-)barcoding, skimming, or metagenomic data. Maximum likelihood (ML) methods of phylogenetic placement exist, but these methods do not scale to reference trees with many thousands of leaves, limiting their ability to benefit from the dense taxon sampling in modern reference libraries. They also rely on assembled sequences for the reference set and aligned sequences for the query. Thus, ML methods cannot analyze data sets where the reference consists of unassembled reads, a scenario relevant to emerging applications of genome skimming for sample identification. We introduce APPLES, a distance-based method for phylogenetic placement. Compared to ML, APPLES is an order of magnitude faster and more memory efficient, and unlike ML, it is able to place on large backbone trees (tested for up to 200,000 leaves). We show that using dense references improves accuracy substantially, so that APPLES on dense trees is more accurate than ML on the sparser trees where ML can run. Finally, APPLES can accurately identify samples without assembled references or aligned queries using k-mer-based distances, a scenario that ML cannot handle. APPLES is publicly available at github.com/balabanmetin/apples.


Subjects
Classification/methods; Phylogeny; Software; Algorithms; Base Sequence
6.
Cell Syst ; 8(6): 523-529.e4, 2019 06 26.
Article in English | MEDLINE | ID: mdl-31202632

ABSTRACT

Genome annotation remains a fundamental effort in modern biology. With falling costs and new forms of sequencing technologies, annotations specific to tissue type and experimental conditions are continually being generated (e.g., histone methylation marks). Computing the statistical significance of overlap between two different annotations is key to many biological findings but has not been systematically addressed previously. We formalize the problem as follows: let I and I_f describe collections of n and m intervals of a genome, respectively, each with a particular annotation. Under the null hypothesis that genomic intervals in I are randomly arranged with respect to I_f, what is the significance of k of the m intervals of I_f intersecting with intervals in I? We describe a tool, iSTAT, that implements a combinatorial algorithm to accurately compute p values. We applied iSTAT to simulated and real datasets to obtain precise estimates and contrasted them against previous results using permutation or parametric tests.
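
For orientation, the overlap question posed above can be approximated with a simple permutation test, the kind of baseline the abstract contrasts against. The sketch below is not the combinatorial algorithm of iSTAT. Its assumptions are all illustrative: intervals are half-open (start, end) pairs on a single linear genome, the null is simulated by relocating each interval of I_f uniformly at random while keeping its length, relocated intervals may overlap one another, and all function and parameter names are hypothetical.

```python
import random


def count_overlapping(queries, references):
    """Count query intervals that intersect at least one reference interval.

    Intervals are half-open (start, end) pairs.
    """
    return sum(
        any(q_start < r_end and r_start < q_end
            for r_start, r_end in references)
        for q_start, q_end in queries
    )


def permutation_p_value(queries, references, genome_length,
                        trials=1000, seed=0):
    """Empirical p value for observing at least k overlapping queries."""
    rng = random.Random(seed)
    observed = count_overlapping(queries, references)
    at_least = 0
    for _ in range(trials):
        shuffled = []
        for start, end in queries:
            length = end - start
            # Uniform random placement that keeps the interval length.
            new_start = rng.randint(0, genome_length - length)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, references) >= observed:
            at_least += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)
```

A limitation of this kind of test is resolution: with 1,000 trials the smallest reportable p value is about 0.001, which is one reason an exact computation of the tail probability can be preferable.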


Subjects
Genome, Human; Models, Statistical; Molecular Sequence Annotation; Software; Algorithms; Datasets as Topic; Humans
7.
Genome Biol ; 20(1): 34, 2019 02 13.
Article in English | MEDLINE | ID: mdl-30760303

ABSTRACT

The ability to inexpensively describe taxonomic diversity is critical in this era of rapid climate and biodiversity changes. The recent genome-skimming approach extends current barcoding practices beyond short markers by applying low-pass sequencing and recovering whole organelle genomes computationally. This approach discards the nuclear DNA, which constitutes the vast majority of the data. In contrast, we suggest using all unassembled reads. We introduce an assembly-free and alignment-free tool, Skmer, to compute genomic distances between the query and reference genome skims. Skmer shows excellent accuracy in estimating distances and identifying the closest match in reference datasets.
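
As a rough illustration of an assembly-free, alignment-free distance, the sketch below computes the Jaccard index J of two k-mer sets and converts it to a per-base distance with D = 1 - (2J / (1 + J))^(1/k). That conversion follows from assuming two equal-length, repeat-free sequences in which each k-mer is shared with probability (1 - D)^k. The sketch does not model low coverage or sequencing error, which real genome skims would require, and it is not presented as the implementation of the tool described above; all function names are hypothetical.

```python
def kmer_set(sequence, k):
    """Set of all k-mers in a sequence (single strand, no canonicalization)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(set_a, set_b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def jaccard_to_distance(j, k):
    """Convert a Jaccard index to a distance under the simplified model.

    Assumes equal genome lengths, no repeats, full coverage, no errors.
    """
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximal distance
    return 1.0 - (2.0 * j / (1.0 + j)) ** (1.0 / k)


def kmer_distance(seq_a, seq_b, k):
    """Distance between two sequences from their k-mer sets."""
    return jaccard_to_distance(jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k)), k)
```

Identical sequences give J = 1 and a distance of 0, while sequences with no shared k-mers give a distance of 1.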


Subjects
DNA Barcoding, Taxonomic/methods; Genome, Insect; Genomics/methods; Models, Genetic; Animals; Birds/genetics; Phylogeny