1.
Bioinformatics ; 38(Suppl_2): ii49-ii55, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124798

ABSTRACT

MOTIVATION: Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. RESULTS: We present a new algorithm, MAximal tumor treeS TRajectOries (MASTRO), to discover significantly conserved evolutionary trajectories in cancer. MASTRO discovers all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations. MASTRO assesses the significance of the trajectories using a conditional statistical test that captures the coherence in the order in which alterations are observed in different tumors. We apply MASTRO to data from non-small-cell lung cancer bulk sequencing and to acute myeloid leukemia data from single-cell panel sequencing, and find significant evolutionary trajectories recapitulating and extending the results reported in the original studies. AVAILABILITY AND IMPLEMENTATION: MASTRO is available at https://github.com/VandinLab/MASTRO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Clonal Evolution , Humans , Phylogeny , Software
2.
Bioinformatics ; 38(13): 3343-3350, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35583271

ABSTRACT

MOTIVATION: The extraction of k-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including read classification in genomics and the characterization of RNA-seq datasets. The extraction of all k-mers and their frequencies is extremely demanding in terms of running time and memory, owing to the size of the data and to the exponential number of k-mers to be considered. However, in several applications, only frequent k-mers, which are k-mers appearing in a relatively high proportion of the data, are required by the analysis. RESULTS: In this work, we present SPRISS, a new efficient algorithm to approximate frequent k-mers and their frequencies in next-generation sequencing data. SPRISS uses a simple yet powerful read sampling scheme that extracts a representative subset of the dataset, which can be used, in combination with any k-mer counting algorithm, to perform downstream analyses in a fraction of the time required by the analysis of the whole data, while obtaining comparable answers. Our extensive experimental evaluation demonstrates the efficiency and accuracy of SPRISS in approximating frequent k-mers, and shows that it can be used in various scenarios, such as the comparison of metagenomic datasets, the identification of discriminative k-mers, and SNP (single nucleotide polymorphism) genotyping, to extract insights in a fraction of the time required by the analysis of the whole dataset. AVAILABILITY AND IMPLEMENTATION: SPRISS [a preliminary version (Santoro et al., 2021) of this work was presented at RECOMB 2021] is available at https://github.com/VandinLab/SPRISS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
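
The general idea of sampling reads and then running an ordinary k-mer counter on the sample can be sketched as follows. This is a minimal illustration, not the SPRISS algorithm: it assumes plain uniform sampling with replacement, and the function names, the threshold `theta` and the `sample_size` parameter are hypothetical. It also carries none of the approximation guarantees discussed in the abstract.

```python
import random
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers_from_sample(reads, k, theta, sample_size, seed=0):
    """Estimate k-mers whose relative frequency is at least theta.

    Assumption: reads are drawn uniformly at random with replacement;
    frequencies are computed on the sample only.
    """
    rng = random.Random(seed)
    sample = [rng.choice(reads) for _ in range(sample_size)]
    counts = count_kmers(sample, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items() if c / total >= theta}
```

Any exact k-mer counter could replace `count_kmers` here, since the sampling step is independent of how the counting is done.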


Subjects
High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA , Algorithms , Genomics
3.
Brief Bioinform ; 22(1): 88-95, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32577746

ABSTRACT

The study of microbial communities crucially relies on the comparison of metagenomic next-generation sequencing data sets, for which several methods have been designed in recent years. Here, we review three key challenges in the comparison of such data sets: species identification and quantification, the efficient computation of distances between metagenomic samples and the identification of metagenomic features associated with a phenotype such as disease status. We present current solutions for such challenges, considering both reference-based methods relying on a database of reference genomes and reference-free methods working directly on all sequencing reads from the samples.


Subjects
Metagenomics/methods , Microbiota/genetics , Animals , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Humans , Metagenomics/standards
4.
iScience ; 23(10): 101619, 2020 Oct 23.
Article in English | MEDLINE | ID: mdl-33089107

ABSTRACT

Phenotypic heterogeneity in cancer is often caused by different patterns of genetic alterations. Understanding such phenotype-genotype relationships is fundamental for the advance of personalized medicine. We develop a computational method, named NETPHIX (NETwork-to-PHenotype association with eXclusivity) to identify subnetworks of genes whose genetic alterations are associated with drug response or other continuous cancer phenotypes. Leveraging interaction information among genes and properties of cancer mutations such as mutual exclusivity, we formulate the problem as an integer linear program and solve it optimally to obtain a subnetwork of associated genes. Applied to a large-scale drug screening dataset, NETPHIX uncovered gene modules significantly associated with drug responses. Utilizing interaction information, NETPHIX modules are functionally coherent and can thus provide important insights into drug action. In addition, we show that modules identified by NETPHIX together with their association patterns can be leveraged to suggest drug combinations.

5.
J Comput Biol ; 27(4): 534-549, 2020 04.
Article in English | MEDLINE | ID: mdl-31891535

ABSTRACT

Estimating the abundances of all k-mers in a set of biological sequences is a fundamental and challenging problem with many applications in biological analysis. Although several methods have been designed for the exact or approximate solution of this problem, they all require processing the entire data set, which can be extremely expensive for high-throughput sequencing data sets. Although in some applications it is crucial to estimate all k-mers and their abundances, in other situations it may be sufficient to report only frequent k-mers, which appear with relatively high frequency in a data set. This is the case, for example, in the computation of k-mers' abundance-based distances among data sets of reads, commonly used in metagenomic analyses. In this study, we develop, analyze, and test a sampling-based approach, called Sampling Algorithm for K-mErs approxIMAtion (SAKEIMA), to approximate the frequent k-mers and their frequencies in a high-throughput sequencing data set while providing rigorous guarantees on the quality of the approximation. SAKEIMA employs an advanced sampling scheme and we show how the characterization of the Vapnik-Chervonenkis dimension, a core concept from statistical learning theory, of a properly defined set of functions leads to practical bounds on the sample size required for a rigorous approximation. Our experimental evaluation shows that SAKEIMA rigorously approximates frequent k-mers by processing only a fraction of a data set and that the frequencies estimated by SAKEIMA lead to accurate estimates of k-mer-based distances between high-throughput sequencing data sets. Overall, SAKEIMA is an efficient and rigorous tool to estimate k-mers' abundances providing significant speedups in the analysis of large sequencing data sets.
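
As an illustration of a k-mer abundance-based distance between two read data sets, the sketch below computes the Bray-Curtis dissimilarity on k-mer relative frequencies. The abstract does not name a specific distance, so the choice of Bray-Curtis is an assumption, as are the function names; exact frequencies are used here, whereas a sampling-based method would plug in its estimated frequencies for the frequent k-mers.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Relative frequency of each k-mer over all k-mer occurrences in the reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def bray_curtis(freq_a, freq_b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    shared = sum(min(f, freq_b.get(kmer, 0.0)) for kmer, f in freq_a.items())
    denom = sum(freq_a.values()) + sum(freq_b.values())
    return 1.0 - 2.0 * shared / denom if denom else 0.0
```

Because only k-mers with non-negligible frequency contribute much to the sum, restricting both profiles to frequent k-mers is a natural approximation for this kind of distance.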


Subjects
High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology , Metagenome/genetics , Sample Size
6.
PLoS Comput Biol ; 15(5): e1006802, 2019 05.
Article in English | MEDLINE | ID: mdl-31120875

ABSTRACT

Recent large cancer studies have measured somatic alterations in an unprecedented number of tumours. These large datasets allow the identification of cancer-related sets of genetic alterations by identifying relevant combinatorial patterns. Among such patterns, mutual exclusivity has been employed by several recent methods that have shown its effectiveness in characterizing gene sets associated with cancer. Mutual exclusivity arises because of the complementarity, at the functional level, of alterations in genes which are part of a group (e.g., a pathway) performing a given function. The availability of quantitative target profiles, from genetic perturbations or from clinical phenotypes, provides additional information that can be leveraged to improve the identification of cancer-related gene sets by discovering groups with complementary functional associations with such targets. In this work we study the problem of finding groups of mutually exclusive alterations associated with a quantitative (functional) target. We propose a combinatorial formulation for the problem, and prove that the associated computational problem is hard. We design two algorithms to solve the problem and implement them in our tool UNCOVER. We provide analytic evidence of the effectiveness of UNCOVER in finding high-quality solutions and show experimentally that UNCOVER finds sets of alterations significantly associated with functional targets in a variety of scenarios. In particular, we show that our algorithms find sets which are better than the ones obtained by the state-of-the-art method, even when sets are evaluated using the statistical score employed by the latter. In addition, our algorithms are much faster than the state-of-the-art, allowing the analysis of large datasets of thousands of target profiles from cancer cell lines. We show that on two such datasets, one from project Achilles and one from the Genomics of Drug Sensitivity in Cancer project, UNCOVER identifies several significant gene sets with complementary functional associations with targets. Software available at: https://github.com/VandinLab/UNCOVER.
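
To make the notion of "mutually exclusive alterations associated with a quantitative target" concrete, here is a minimal greedy sketch. It is not UNCOVER's formulation or either of its algorithms: the score (sum of target values over covered samples minus a penalty for each extra alteration hitting an already covered sample), the `penalty` parameter and all names are assumptions made for illustration, and target values are assumed to be centered so that negative values mean "below average".

```python
def score(selected, alterations, target, penalty):
    """Reward covered samples by their target value; penalize overlapping coverage."""
    cover = {}
    for gene in selected:
        for sample in alterations[gene]:
            cover[sample] = cover.get(sample, 0) + 1
    reward = sum(target[sample] for sample in cover)
    overlap = sum(count - 1 for count in cover.values())
    return reward - penalty * overlap


def greedy_exclusive_set(alterations, target, k, penalty=1.0):
    """Greedily add up to k alterations, each time taking the best score improvement."""
    selected, current = [], 0.0
    for _ in range(k):
        best_gene, best_score = None, current
        for gene in sorted(alterations):  # sorted for deterministic tie-breaking
            if gene in selected:
                continue
            candidate = score(selected + [gene], alterations, target, penalty)
            if candidate > best_score:
                best_gene, best_score = gene, candidate
        if best_gene is None:  # no alteration improves the score
            break
        selected.append(best_gene)
        current = best_score
    return selected, current
```

With `alterations = {"A": {"s1", "s2"}, "B": {"s3"}, "C": {"s1", "s4"}}` and a target of 1.0 for `s1` to `s3` and -1.0 for `s4`, the greedy pass selects `A` and then `B`, since `C` both overlaps `A` on `s1` and covers a low-target sample.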


Subjects
Computational Biology/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Algorithms , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Genetic Complementation Test/methods , Genomics/methods , Humans , Mutation , Software
7.
Algorithms Mol Biol ; 14: 10, 2019.
Article in English | MEDLINE | ID: mdl-30976291

ABSTRACT

PROBLEM: We study the problem of identifying differentially mutated subnetworks of a large gene-gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard. ALGORITHM: We propose a novel and efficient algorithm, called DAMOKLE, to identify differentially mutated subnetworks given genome-wide mutation data for two sets of cancer samples. We prove that DAMOKLE identifies subnetworks with statistically significant difference in mutation frequency when the data comes from a reasonable generative model, provided enough samples are available. EXPERIMENTAL RESULTS: We test DAMOKLE on simulated and real data, showing that DAMOKLE does indeed find subnetworks with significant differences in mutation frequency and that it provides novel insights into the molecular mechanisms of the disease not revealed by standard methods.

8.
Front Genet ; 10: 265, 2019.
Article in English | MEDLINE | ID: mdl-31024613

ABSTRACT

Next-generation sequencing technologies allow the measurement of somatic mutations in a large number of patients from the same cancer type: one of the main goals in their analysis is the identification of mutations associated with clinical parameters. The identification of such relationships is hindered by extensive genetic heterogeneity in tumors, with different genes mutated in different patients, due, in part, to the fact that genes and mutations act in the context of pathways: it is therefore crucial to study mutations in the context of interactions among genes. In this work we study the problem of identifying subnetworks of a large gene-gene interaction network with mutations associated with survival time. We formally define the associated computational problem by using a score for subnetworks based on the log-rank statistical test to compare the survival of two given populations. We propose a novel approach, based on a new algorithm, called Network of Mutations Associated with Survival (NoMAS), to find subnetworks of a large interaction network whose mutations are associated with survival time. NoMAS is based on the color-coding technique, which has been previously employed in other applications to find the highest scoring subnetwork with high probability when the subnetwork score is additive. In our case the score is not additive, so our algorithm cannot identify the optimal solution with the same guarantees associated with additive scores. Nonetheless, we prove that, under a reasonable model for mutations in cancer, NoMAS identifies the optimal solution with high probability. We also design a holdout approach to identify subnetworks significantly associated with survival time. We test NoMAS on simulated and cancer data, comparing it to approaches based on single gene tests and to various greedy approaches. We show that our method does indeed find the optimal solution and performs better than the other approaches. Moreover, on three cancer datasets our method identifies subnetworks with significant association to survival when none of the genes has significant association with survival when considered in isolation.
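
The log-rank comparison that underlies a survival-based subnetwork score can be sketched as below. This is the textbook two-group log-rank statistic in its normalized (Z) form, given as an illustration under stated assumptions rather than as the exact score used by NoMAS; the function name and the encoding of the inputs (`events[i]` is 1 for an observed death and 0 for censoring, `in_group[i]` is True when patient `i` carries a mutation in the candidate subnetwork) are hypothetical.

```python
import math


def log_rank_z(times, events, in_group):
    """Normalized log-rank statistic comparing mutated vs. non-mutated patients."""
    observed_minus_expected = 0.0
    variance = 0.0
    # Iterate over the distinct times at which at least one event is observed.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)                                  # patients still at risk
        n1 = sum(1 for i in at_risk if in_group[i])       # of which mutated
        dying = [i for i in at_risk if times[i] == t and events[i]]
        d = len(dying)                                    # events at time t
        d1 = sum(1 for i in dying if in_group[i])         # of which mutated
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance <= 0:
        return 0.0
    return observed_minus_expected / math.sqrt(variance)
```

A positive value indicates more deaths than expected in the mutated group. Because the statistic depends on the whole set of mutated patients rather than being a sum of per-gene terms, it illustrates why the score of a subnetwork is not additive over its genes.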

9.
BMC Bioinformatics ; 20(1): 17, 2019 Jan 09.
Article in English | MEDLINE | ID: mdl-30626316

ABSTRACT

BACKGROUND: Translational and post-translational control mechanisms in the cell result in widely observable differences between measured gene transcription and protein abundances. Herein, protein complexes are among the most tightly controlled entities by selective degradation of their individual proteins. They furthermore act as control hubs that regulate highly important processes in the cell and exhibit a high functional diversity due to their ability to change their composition and their structure. Better understanding and prediction of these functional states demands methods for the characterization of complex composition, behavior, and abundance across multiple cell states. Mass spectrometry provides an unbiased approach to directly determine protein abundances across different cell populations and thus to profile a comprehensive abundance map of proteins. RESULTS: We provide a tool to investigate the behavior of protein subunits in known complexes by comparing their abundance profiles across up to 140 cell types available in ProteomicsDB. Thorough assessment of different randomization methods and statistical scoring algorithms allows determining the significance of concurrent profiles within a complex, therefore providing insights into the conservation of their composition across human cell types as well as the identification of intrinsic structures in complex behavior to determine which proteins orchestrate complex function. This analysis can be extended to investigate common profiles within arbitrary protein groups. CoExpresso can be accessed through http://computproteomics.bmb.sdu.dk/Apps/CoExpresso . CONCLUSIONS: With the CoExpresso web service, we offer a potent scoring scheme to assess proteins for their co-regulation and thereby offer insight into their potential for forming functional groups like protein complexes.


Subjects
Proteins/metabolism , Proteomics/methods , Algorithms , Humans
10.
Eur J Hum Genet ; 27(4): 631-636, 2019 04.
Article in English | MEDLINE | ID: mdl-30659261

ABSTRACT

Genetic interaction is a crucial issue in the understanding of functional pathways underlying complex diseases. However, detecting such interaction effects is challenging in terms of both methodology and statistical power. We address this issue by introducing a disease-concordant twin-case-only design, which applies to both monozygotic and dizygotic twins. To investigate the power, we conducted a computer simulation study by setting a series of parameter schemes with different minor allele frequencies and relative risks. Results from the simulation study reveal that the disease-concordant twin-case-only design largely reduces the sample size required for sufficient power compared to the ordinary case-only design for detecting gene-gene interaction using unrelated individuals. Sample sizes for dizygotic and monozygotic twins were roughly 1/2 and 1/4 of sample sizes in the ordinary case-only design. Since dizygotic twins are as genetically similar as siblings, the enriched power for dizygotic twins also applies to affected siblings, which could help to largely extend the application of the powerful twin-case-only design. In summary, our simulation reveals the high value of disease-concordant twins and siblings in efficiently detecting gene-by-gene interactions.


Subjects
Diseases in Twins/genetics , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/statistics & numerical data , Computer Simulation/statistics & numerical data , Diseases in Twins/pathology , Female , Gene Frequency/genetics , Genetic Diseases, Inborn/pathology , Humans , Male , Risk , Sample Size , Siblings , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
12.
Nucleic Acids Res ; 45(16): e151, 2017 Sep 19.
Article in English | MEDLINE | ID: mdl-28934488

ABSTRACT

Gene expression profiles have been extensively discussed as an aid to guide the therapy by predicting disease outcome for the patients suffering from complex diseases, such as cancer. However, prediction models built upon single-gene (SG) features show poor stability and performance on independent datasets. Attempts to mitigate these drawbacks have led to the development of network-based approaches that integrate pathway information to produce meta-gene (MG) features. Also, MG approaches have only dealt with the two-class problem of good versus poor outcome prediction. Stratifying patients based on their molecular subtypes can provide a detailed view of the disease and lead to more personalized therapies. We propose and discuss a novel MG approach based on de novo pathways, which for the first time have been used as features in a multi-class setting to predict cancer subtypes. Comprehensive evaluation in a large cohort of breast cancer samples from The Cancer Genome Atlas (TCGA) revealed that MGs are considerably more stable than SG models, while also providing valuable insight into the cancer hallmarks that drive them. In addition, when tested on an independent benchmark non-TCGA dataset, MG features consistently outperformed SG models. We provide an easy-to-use web service at http://pathclass.compbio.sdu.dk where users can upload their own gene expression datasets from breast cancer studies and obtain the subtype predictions from all the classifiers.


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/genetics , Gene Expression Profiling/methods , Biomarkers, Tumor/metabolism , Breast Neoplasms/classification , Breast Neoplasms/metabolism , DNA Methylation , Female , Genes, Neoplasm , Humans
13.
Genome Res ; 27(9): 1573-1588, 2017 09.
Article in English | MEDLINE | ID: mdl-28768687

ABSTRACT

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the "random walk facility location" (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: "multihitting time," the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients' survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology.


Subjects
Breast Neoplasms/genetics , DNA Copy Number Variations/genetics , Software , Transcriptome/genetics , Breast Neoplasms/pathology , Computational Biology , Female , Genomics , Humans , Mutation , Protein Interaction Maps/genetics
14.
Front Genet ; 8: 83, 2017.
Article in English | MEDLINE | ID: mdl-28659971

ABSTRACT

Advances in DNA sequencing technologies have allowed the characterization of somatic mutations in a large number of cancer genomes at an unprecedented level of detail, revealing the extreme genetic heterogeneity of cancer at two different levels: inter-tumor, with different patients of the same cancer type presenting different collections of somatic mutations, and intra-tumor, with different clones coexisting within the same tumor. Both inter-tumor and intra-tumor heterogeneity have crucial implications for clinical practices. Here, we review computational methods that use somatic alterations measured through next-generation DNA sequencing technologies for characterizing tumor heterogeneity and its association with clinical variables. We first review computational methods for studying inter-tumor heterogeneity, focusing on methods that attempt to summarize cancer heterogeneity by discovering pathways that are commonly mutated across different patients of the same cancer type. We then review computational methods for characterizing intra-tumor heterogeneity using information from bulk sequencing data or from single cell sequencing data. Finally, we present some of the recent computational methodologies that have been proposed to identify and assess the association between inter- or intra-tumor heterogeneity with clinical variables.

15.
Bioinformatics ; 33(4): 549-551, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27794558

ABSTRACT

Motivation: Epigenome-wide association studies (EWAS) generate big epidemiological datasets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450 (850) thousand cytosines in the CpG dinucleotide context in a set of patients compared to a control group. Many bioinformatics tools exist for raw data analysis. However, most of them require some knowledge of the programming language R, have no user interface, and do not offer all necessary steps to guide users from raw data all the way down to statistically significant differentially methylated regions (DMRs) and the associated genes. Results: Here, we present DiMmeR (Discovery of Multiple Differentially Methylated Regions), the first free standalone software that interactively guides scientists the whole way through EWAS data analysis with a user-friendly graphical user interface (GUI). It offers parallelized statistical methods for efficiently identifying DMRs in both Illumina 450K and 850K EPIC chip data. DiMmeR computes empirical P-values through randomization tests, even for big datasets of hundreds of patients and thousands of permutations, within a few minutes on a standard desktop PC. It is independent of any third-party libraries, computes regression coefficients, P-values and empirical P-values, and it corrects for multiple testing. Availability and Implementation: DiMmeR is publicly available at http://dimmer.compbio.sdu.dk. Contact: diogoma@bmb.sdu.dk. Supplementary information: Supplementary data are available at Bioinformatics online.
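
An empirical P-value from a randomization test at a single CpG site can be sketched as follows. This is a generic permutation test on the difference of group means, offered as an illustration and not as DiMmeR's implementation (which is a Java application with its own statistics); the function name, the use of the mean difference as test statistic, and the "+1" correction in numerator and denominator are assumptions, the last being a common convention that keeps the estimate above zero.

```python
import random
from statistics import mean


def empirical_p_value(cases, controls, n_permutations=1000, seed=0):
    """Permutation P-value for the absolute difference in mean methylation."""
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        permuted = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if permuted >= observed:
            hits += 1
    # +1 in numerator and denominator counts the observed labeling itself.
    return (hits + 1) / (n_permutations + 1)
```

In a genome-wide setting this would be repeated for every site, which is why parallelization and a subsequent multiple-testing correction matter in practice.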


Subjects
CpG Islands , DNA Methylation , Epigenomics/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Humans
16.
Ann Hum Genet ; 81(1): 20-26, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28009044

ABSTRACT

Genome-wide association studies with moderate sample sizes are underpowered, especially when testing SNP alleles with low allele counts, a situation that may lead to a high frequency of false-positive results and lack of replication in independent studies. Related individuals, such as twin pairs concordant for a disease, should confer increased power in genetic association analysis because of their genetic relatedness. We conducted a computer simulation study to explore the power advantage of the disease-concordant twin design, which uses singletons from disease-concordant twin pairs as cases and ordinary healthy samples as controls. We examined the power gain of the twin-based design for various scenarios (i.e., cases from monozygotic and dizygotic twin pairs concordant for a disease) and compared the power with the ordinary case-control design with cases collected from the unrelated patient population. Simulation was done by assigning various allele frequencies and allelic relative risks for different modes of genetic inheritance. In general, for achieving a power estimate of 80%, the sample sizes needed for dizygotic and monozygotic twin cases were one half and one fourth of the sample size of an ordinary case-control design, with variations depending on genetic mode. Importantly, the enriched power for dizygotic twins also applies to disease-concordant sibling pairs, which largely extends the application of the concordant twin design. Overall, our simulation revealed a high value of disease-concordant twins in genetic association studies and encourages the use of genetically related individuals to identify, with high efficiency, both common and rare genetic variants underlying human complex diseases without increasing laboratory cost.


Subjects
Diseases in Twins/genetics , Genome-Wide Association Study , Computer Simulation , Gene Frequency , Genetic Predisposition to Disease , Humans , Models, Genetic , Risk , Twins, Dizygotic/genetics , Twins, Monozygotic/genetics
18.
Ann Hum Genet ; 80(2): 81-7, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26831219

ABSTRACT

Poor nutrition during critical growth phases may alter the structural and physiologic development of vital organs thus "programming" the susceptibility to adult-onset diseases and disease-related health conditions. Epigenome-wide association studies have been performed in birth-weight discordant twin pairs to find evidence for such "programming" effects, but no significant results emerged. We further investigated this issue using a new computational approach: Instead of probing single genomic sites for significant alterations in epigenetic marks, we scan for differentially methylated genomic regions. Whole genome DNA methylation levels were measured in whole blood from 150 pairs of adult identical twins discordant for birth-weight. Intrapair differential DNA methylation was associated with qualitative (large or small) and quantitative (percentage) birth-weight discordance at each genomic site using regression models adjusting for age and sex. Based on the regression results, genomic regions with consistent alteration patterns of DNA methylation were located and tested for significant robustness using computational permutation tests. This yielded an interesting genomic region on chromosome 1, which is significantly differentially methylated for quantitative birth-weight discordance. The region covers two genes (TYW3 and CRYZ) both reportedly associated with metabolism. We conclude that prenatal conditions for birth-weight discordance may result in persistent epigenetic modifications potentially affecting even adult health.


Subjects
Birth Weight , DNA Methylation , Epigenesis, Genetic , Adult , Aged , Female , Genome, Human , Genomics , Humans , Linear Models , Male , Middle Aged , Twins, Monozygotic
20.
J Integr Bioinform ; 13(4): 294, 2016 Dec 18.
Article in English | MEDLINE | ID: mdl-28187410

ABSTRACT

Measuring differential methylation of the DNA is currently the most common approach to linking epigenetic modifications to diseases (called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.


Subjects
Computational Biology , DNA Methylation , Programming Languages , Cohort Studies , Genome-Wide Association Study , Humans