Pesquisa | Portal Regional da BVS

Dot2dot: accurate whole-genome tandem repeats discovery.

Genovese, Loredana M; Mosca, Marco M; Pellegrini, Marco; Geraci, Filippo.

Bioinformatics ; 35(6): 914-922, 2019 03 15.

Artigo em Inglês | MEDLINE | ID: mdl-30165507

RESUMO

MOTIVATION: Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role needs to be elucidated. In particular, tandem repeats (TRs) (i.e. short, almost identical sequences that lie adjacent to each other) have been associated to many cellular processes and, indeed, are also involved in several genetic disorders. The need of comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms. RESULTS: Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging on the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and completes the analysis of the biggest known reference genome in about one day on a standard PC. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are freely available upon paper acceptance at the URL: https://github.com/Gege7177/Dot2dot. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Software , Sequências de Repetição em Tandem , Algoritmos , Eucariotos , Análise de Sequência de DNA

Genome-wide analysis of NGS data to compile cancer-specific panels of miRNA biomarkers.

Bhowmick, Shib Sankar; Saha, Indrajit; Bhattacharjee, Debotosh; Genovese, Loredana M; Geraci, Filippo.

PLoS One ; 13(7): e0200353, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30048452

RESUMO

MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3' UTR of target mRNAs in order to repress protein synthesis. Soon after discovery, microRNA dysregulation has been associated to several pathologies. In particular, they have often been reported as differentially expressed in healthy and tumor samples. This fact suggested that microRNAs are likely to be good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once is now routine. Yet, the collaborative effort of sharing data opens to the possibility of population analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task where, given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that leads to the highest classification accuracy. We fulfill this task leveraging on a sensible combination of data mining tools. In particular, we used: differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and SVM for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels have a very little overlap suggesting that miRNAs are not only predictive of the onset of cancer, but can be used for classification purposes as well. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis proved that miRNAs can be used to evaluate cancer severity. Summarizing: results demonstrated that our method is able to produce cancer-specific panels that are promising candidates for a subsequent in vitro validation.

Assuntos

Detecção Precoce de Câncer , Sequenciamento de Nucleotídeos em Larga Escala , MicroRNAs/genética , Neoplasias/genética , Biomarcadores Tumorais/genética , Biomarcadores Tumorais/metabolismo , Mineração de Dados , Detecção Precoce de Câncer/métodos , Humanos , MicroRNAs/metabolismo , Neoplasias/classificação , Neoplasias/metabolismo , Máquina de Vetores de Suporte

A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies.

Genovese, Loredana M; Geraci, Filippo; Corrado, Lucia; Mangano, Eleonora; D'Aurizio, Romina; Bordoni, Roberta; Severgnini, Marco; Manzini, Giovanni; De Bellis, Gianluca; D'Alfonso, Sandra; Pellegrini, Marco.

Front Genet ; 9: 155, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29770143

RESUMO

Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR consists in a variation found in an individual (or in a population) of the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further distinguished in two main classes: Short Tandem Repeats (STR) when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR) for repeating units of size above 6 base pairs. As larger and larger populations are screened via high throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). We applied four different TR discovery software tools to uncover in the first phase 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included by state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Similarly to other previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci. Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.

SpeedHap: an accurate heuristic for the single individual SNP haplotyping problem with many gaps, high reading error rate and low coverage.

Genovese, Loredana M; Geraci, Filippo; Pellegrini, Marco.

IEEE/ACM Trans Comput Biol Bioinform ; 5(4): 492-502, 2008.

Artigo em Inglês | MEDLINE | ID: mdl-18989037

RESUMO

Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNP's present in a chromosome (called the haplotype) is of interest in a wide area of applications in molecular biology and biomedicine, including diagnostic and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in literature and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.

Assuntos

Algoritmos , Mapeamento Cromossômico/métodos , Análise Mutacional de DNA/métodos , Haplótipos/genética , Polimorfismo de Nucleotídeo Único/genética , Análise de Sequência de DNA/métodos , Software , Sequência de Bases , Humanos , Dados de Sequência Molecular , Reprodutibilidade dos Testes , Sensibilidade e Especificidade

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA