Results 1 - 6 of 6
1.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38597877

ABSTRACT

MOTIVATION: Phylogenetics has moved into the era of genomics, incorporating enormous volumes of data to study questions at both shallow and deep scales. With this increase in information, phylogeneticists need new tools and skills to manipulate and analyze these data. To facilitate these tasks and encourage reproducibility, the community is increasingly moving toward automated workflows. RESULTS: Here we present pipesnake, a phylogenomics pipeline written in Nextflow for the processing, assembly, and phylogenetic estimation of genomic data from short-read sequences. pipesnake is an easy-to-use and efficient software package designed for this next era in phylogenetics. AVAILABILITY AND IMPLEMENTATION: pipesnake is publicly available on GitHub at https://github.com/AusARG/pipesnake and is accompanied by documentation and a wiki/tutorial.


Subject(s)
Genomics , Phylogeny , Software , Genomics/methods
2.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33834181

ABSTRACT

MOTIVATION: The high accuracy of recent haplotype phasing tools is enabling the integration of haplotype (or phase) information more widely in genetic investigations. One such possibility is phase-aware expression quantitative trait loci (eQTL) analysis, where haplotype-based analysis has the potential to detect associations that may otherwise be missed by standard SNP-based approaches. RESULTS: We present eQTLHap, a novel method to investigate associations between gene expression and genetic variants, considering their haplotypic and genotypic effects. Using multiple simulations based on real data, we demonstrate that phase-aware eQTL analysis significantly outperforms typical SNP-based methods when the causal genetic architecture involves multiple SNPs. We show that phase-aware eQTL analysis is robust to phasing errors, which have only a minor impact (<4%) on sensitivity. Applying eQTLHap to real GEUVADIS and GTEx datasets detects numerous novel eQTLs undetected by a single-SNP approach, with 22 eQTLs replicating across studies or tissue types, highlighting the utility of phase-aware eQTL analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ziadbkh/eQTLHap. CONTACT: ziad.albkhetan@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.


Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , Haplotypes , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Algorithms , Gene Expression Regulation , Genotype , Humans , Internet , Linkage Disequilibrium
3.
Brief Bioinform ; 22(4), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33236761

ABSTRACT

Haplotype phasing is a critical step for many genetic applications, but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and voting across multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is freely available at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
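
The voting idea described in this abstract can be illustrated with a minimal sketch. It is not the consHap implementation: the function names, the 0/1 encoding of the allele carried on the first chromosome copy at each heterozygous site, and the tie-breaking rule are all assumptions made for illustration. Because a haplotype and its complement describe the same phase, the sketch votes on whether the phase switches between consecutive heterozygous sites rather than on the raw alleles.

```python
def switch_flags(hap):
    """Return 1 where consecutive heterozygous sites carry different alleles.

    `hap` is a hypothetical encoding: the 0/1 allele on the first chromosome
    copy at each heterozygous site of one individual.
    """
    return [a ^ b for a, b in zip(hap, hap[1:])]


def consensus_phase(haps):
    """Majority-vote consensus over several phasing outputs (illustrative only).

    Ties (possible with an even number of inputs) default to "no switch";
    this tie rule is an assumption, not something stated in the abstract.
    """
    if not haps:
        raise ValueError("at least one phasing output is required")
    n_sites = len(haps[0])
    if any(len(h) != n_sites for h in haps):
        raise ValueError("all phasing outputs must cover the same sites")

    flags = [switch_flags(h) for h in haps]
    consensus = [0]  # the first site is anchored arbitrarily to allele 0
    for votes in zip(*flags):
        flip = 1 if 2 * sum(votes) > len(votes) else 0
        consensus.append(consensus[-1] ^ flip)
    return consensus


# Three hypothetical tool outputs; the second is the complement of the first,
# so both describe the same phase.
tool_outputs = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]]
result = consensus_phase(tool_outputs)
```

Voting on switch flags rather than alleles keeps the result independent of which chromosome copy each tool happens to list first.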


Subject(s)
Algorithms , Databases, Nucleic Acid , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Haplotypes , Humans
4.
BMC Bioinformatics ; 20(1): 540, 2019 Oct 30.
Article in English | MEDLINE | ID: mdl-31666002

ABSTRACT

BACKGROUND: Knowledge of phase, the specific allele sequence on each copy of homologous chromosomes, is increasingly recognized as critical for detecting certain classes of disease-associated mutations. One approach for detecting such mutations is through phased haplotype association analysis. While the accuracy of methods for phasing genotype data has been widely explored, there has been little attention given to phasing accuracy at haplotype block scale. Understanding the combined impact of the accuracy of the phasing tool and the method used to determine haplotype blocks on the error rate within the determined blocks is essential to conduct accurate haplotype analyses. RESULTS: We present a systematic study exploring the relationship between seven widely used phasing methods and two common methods for determining haplotype blocks. The evaluation focuses on the number of haplotype blocks that are incorrectly phased. Insights from these results are used to develop a haplotype estimator based on a consensus of three tools. The consensus estimator achieved the most accurate phasing in all applied tests. Individually, EAGLE2, BEAGLE and SHAPEIT2 alternate in being the best performing tool in different scenarios. Determining haplotype blocks based on linkage disequilibrium leads to more correctly phased blocks compared to a sliding window approach. We find that there is little difference between phasing sections of a genome (e.g. a gene) compared to phasing entire chromosomes. Finally, we show that the location of phasing error varies when the tools are applied to the same data several times, a finding that could be important for downstream analyses. CONCLUSIONS: The choice of phasing and block determination algorithms and their interaction impacts the accuracy of phased haplotype blocks. This work provides guidance and evidence for the different design choices needed for analyses using haplotype blocks. The study highlights a number of issues that may have limited the replicability of previous haplotype analyses.
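
The block-level metric mentioned in this abstract, the number of incorrectly phased haplotype blocks, can be sketched as follows. This is an illustrative assumption rather than the study's actual evaluation code: the function names, the 0/1 encoding of alleles at heterozygous sites, and the half-open `(start, end)` block ranges are all hypothetical. A block is counted as correct when the estimate matches the true haplotype or its complement within that block.

```python
def block_is_correct(truth, estimate):
    """A block is correctly phased if it equals the truth or its complement."""
    complement = [1 - allele for allele in truth]
    return estimate == truth or estimate == complement


def count_incorrect_blocks(truth, estimate, blocks):
    """Count blocks whose phase disagrees with the truth.

    `blocks` is a list of half-open (start, end) index ranges over the
    heterozygous sites; how blocks are determined (e.g. linkage
    disequilibrium or a sliding window) is outside this sketch.
    """
    return sum(
        not block_is_correct(truth[start:end], estimate[start:end])
        for start, end in blocks
    )


# Hypothetical data: the first block is the complement of the truth (correct),
# the second contains a phasing error.
truth = [0, 1, 1, 0, 1, 0]
estimate = [1, 0, 0, 1, 1, 1]
n_bad = count_incorrect_blocks(truth, estimate, [(0, 3), (3, 6)])
```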


Subject(s)
Haplotypes , Algorithms , Linkage Disequilibrium
5.
Methods ; 166: 83-90, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-30853548

ABSTRACT

We present machine learning models of human genome three-dimensional structure that combine one-dimensional (linear) sequence specificity, epigenomic information, and transcription factor binding profiles with polymer-based biophysical simulations in order to explain the extensive long-range chromatin looping observed in ChIA-PET experiments for lymphoblastoid cells. Random Forest, Gradient Boosting Machine (GBM), and Deep Learning models were constructed and evaluated for predicting high-resolution interactions within Topologically Associating Domains (TADs). The predicted interactions are consistent with the experimental long-read ChIA-PET interactions mediated by CTCF and RNAPOL2 for the GM12878 cell line. The contribution of sequence information and of chromatin state defined by epigenomic features to the prediction task is analyzed and reported, both when they are used separately and when they are combined. Furthermore, we design three-dimensional models of chromatin contact domains (CCDs) using real (ChIA-PET) and predicted looping interactions. Initial results show a similarity between both types of 3D computational models (constructed from experimental or predicted interactions). This observation confirms the association between genome sequence, epigenomic and transcription factor profiles, and three-dimensional interactions.


Subject(s)
Chromatin/ultrastructure , Computer Simulation , Epigenomics , Machine Learning , Gene Expression Regulation/genetics , Genome, Human , Humans , Polymers/chemistry , Promoter Regions, Genetic/genetics , Protein Binding/genetics
6.
Sci Rep ; 8(1): 5217, 2018 Mar 26.
Article in English | MEDLINE | ID: mdl-29581440

ABSTRACT

This study aims to understand, through statistical learning, the basic biophysical mechanisms behind three-dimensional folding of epigenomes. The 3DEpiLoop algorithm predicts three-dimensional chromatin looping interactions within topologically associating domains (TADs) from one-dimensional epigenomic and transcription factor profiles using statistical learning. The predictions obtained by 3DEpiLoop are highly consistent with the reported experimental interactions. The complex signatures of epigenomic and transcription factors within the physically interacting chromatin regions (anchors) are similar across all genomic scales: genomic domains, chromosomal territories, cell types, and different individuals. We report the most important epigenetic and transcription factor features used for interaction identification, either shared among or unique to each of sixteen (16) cell lines. The analysis shows that CTCF interaction anchors are enriched in transcription factors yet deficient in histone modifications, while the opposite is true in the case of RNAP II-mediated interactions. The code is available at the repository https://bitbucket.org/4dnucleome/3depiloop.


Subject(s)
CCCTC-Binding Factor/genetics , Chromatin/genetics , Genome, Human/genetics , RNA Polymerase II/genetics , Animals , Cell Line , Epigenomics , Gene Expression Regulation/genetics , Histone Code/genetics , Humans , Mice , Promoter Regions, Genetic/genetics