Results 1 - 17 of 17
1.
iScience ; 24(7): 102755, 2021 Jul 23.
Article in English | MEDLINE | ID: mdl-34278263

ABSTRACT

The need to include the genetic variation within a population into a reference genome led to the concept of a genome sequence graph. Nodes of such a graph are labeled with DNA sequences occurring in the represented genomes. Due to the double-stranded nature of DNA, each node may be oriented in one of two possible ways, resulting in marking one end of the labeling sequence as the in-side and the other as the out-side. Edges join pairs of sides and reflect adjacency between node sequences in the genomes constituting the graph. Linearization of a sequence graph aims at orienting and ordering graph nodes in a way that makes it more efficient for visualization and further analysis, e.g. access and traversal. We propose a new linearization algorithm, called ALIBI - Algorithm for Linearization by Incremental graph BuIlding. The evaluation shows that ALIBI is computationally very efficient and generates high-quality results.
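
The graph model described in this abstract can be illustrated with a minimal sketch. The class and function names below (`Side`, `SequenceGraph`, `count_non_forward_edges`) are hypothetical, and counting edges that do not run left to right is only one plausible way to judge a layout; the abstract does not state which criterion ALIBI optimizes, and this is not the ALIBI algorithm itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Side:
    """One end of a node: 'in' or 'out' (hypothetical representation)."""
    node: str
    end: str


@dataclass
class SequenceGraph:
    labels: dict = field(default_factory=dict)  # node id -> DNA label
    edges: list = field(default_factory=list)   # pairs of joined sides

    def add_node(self, node, label):
        self.labels[node] = label

    def add_edge(self, a, b):
        self.edges.append((a, b))


def right_side(node, flipped):
    """Side facing right in the layout; flipping a node swaps its sides."""
    return Side(node, "in" if node in flipped else "out")


def count_non_forward_edges(graph, order, flipped):
    """Count edges that do not leave the right side of an earlier node
    and enter the left side of a later node (an assumed quality measure)."""
    pos = {node: i for i, node in enumerate(order)}
    bad = 0
    for edge in graph.edges:
        a, b = sorted(edge, key=lambda s: pos[s.node])
        if a.node == b.node:
            bad += 1  # an edge within one node can never be forward
        elif a != right_side(a.node, flipped) or b == right_side(b.node, flipped):
            bad += 1
    return bad


graph = SequenceGraph()
for name, label in [("A", "ACG"), ("B", "TT"), ("C", "GGA")]:
    graph.add_node(name, label)
graph.add_edge(Side("A", "out"), Side("B", "in"))
graph.add_edge(Side("B", "out"), Side("C", "out"))  # C is reached through its out-side

unflipped = count_non_forward_edges(graph, ["A", "B", "C"], flipped=set())
with_c_flipped = count_non_forward_edges(graph, ["A", "B", "C"], flipped={"C"})
```

In this toy graph, reversing the orientation of node C turns the only non-forward edge into a forward one, which is the kind of orientation choice a linearization has to make.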

2.
BMC Genomics ; 21(Suppl 2): 274, 2020 Apr 16.
Article in English | MEDLINE | ID: mdl-32299360

ABSTRACT

BACKGROUND: The term pan-genome was proposed to denominate collections of genomic sequences jointly analyzed or used as a reference. The constant growth of genomic data intensifies development of data structures and algorithms to investigate pan-genomes efficiently. RESULTS: This work focuses on providing a tool for discovering and visualizing the relationships between the sequences constituting a pan-genome. A new structure to represent such relationships - called affinity tree - is proposed. Each node of this tree is assigned a subset of genomes, as well as their homogeneity level and averaged consensus sequence. Moreover, subsets assigned to sibling nodes form a partition of the genomes assigned to their parent. CONCLUSIONS: Functionality of the affinity tree is demonstrated on simulated data and on the Ebola virus pan-genome. Furthermore, two software packages are provided: PangTreeBuild constructs the affinity tree, while PangTreeVis presents its result.
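
The structural property stated in this abstract (sibling subsets partition the parent's genomes) can be sketched as follows. The `AffinityNode` class, its field names and the genome identifiers are assumptions made for illustration; they are not the PangTreeBuild data model.

```python
from dataclasses import dataclass, field


@dataclass
class AffinityNode:
    """Hypothetical affinity tree node: a genome subset with its
    homogeneity level and consensus sequence."""
    genomes: frozenset
    homogeneity: float
    consensus: str
    children: list = field(default_factory=list)


def children_partition_parent(node):
    """Check, recursively, that the children's genome subsets are non-empty,
    pairwise disjoint, and together cover the parent's subset."""
    if not node.children:
        return True
    seen = set()
    for child in node.children:
        if not child.genomes or seen & child.genomes:
            return False
        seen |= child.genomes
    if seen != set(node.genomes):
        return False
    return all(children_partition_parent(child) for child in node.children)


# Placeholder genome identifiers and values, chosen only for the example.
leaf_a = AffinityNode(frozenset({"g1", "g2"}), 0.98, "ACGT")
leaf_b = AffinityNode(frozenset({"g3"}), 1.0, "ACCT")
root = AffinityNode(frozenset({"g1", "g2", "g3"}), 0.90, "ACGT", [leaf_a, leaf_b])
overlapping = AffinityNode(
    frozenset({"g1", "g2", "g3"}), 0.90, "ACGT",
    [leaf_a, AffinityNode(frozenset({"g2", "g3"}), 0.95, "ACCT")],
)
```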


Subject(s)
Ebolavirus/genetics , Genomics/methods , Algorithms , Computational Biology , Computer Simulation , Databases, Genetic , Models, Genetic , Phylogeny , Sequence Alignment , Software
3.
Nat Commun ; 10(1): 2313, 2019 05 24.
Article in English | MEDLINE | ID: mdl-31127121

ABSTRACT

DNA double-strand breaks (DSBs) are among the most lethal types of DNA damage and frequently cause genome instability. Sequencing-based methods for mapping DSBs have been developed but they allow measurement only of relative frequencies of DSBs between loci, which limits our understanding of the physiological relevance of detected DSBs. Here we propose quantitative DSB sequencing (qDSB-Seq), a method providing both DSB frequencies per cell and their precise genomic coordinates. We induce spike-in DSBs by a site-specific endonuclease and use them to quantify detected DSBs (labeled, e.g., using i-BLESS). Utilizing qDSB-Seq, we determine numbers of DSBs induced by a radiomimetic drug and replication stress, and reveal two orders of magnitude differences in DSB frequencies. We also measure absolute frequencies of Top1-dependent DSBs at natural replication fork barriers. qDSB-Seq is compatible with various DSB labeling methods in different organisms and allows accurate comparisons of absolute DSB frequencies across samples.
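
The spike-in idea can be illustrated with a short arithmetic sketch. It assumes that the fraction of cells cut at the spike-in site is measured independently and that read counts scale linearly with break frequency; the function name and the numbers are hypothetical, and the abstract does not give the exact formula used by qDSB-Seq.

```python
def dsbs_per_cell(reads_at_region, reads_at_spike_in, spike_in_cut_fraction):
    """Convert a relative read count into DSBs per cell, assuming reads are
    proportional to break frequency and the spike-in site is cut in a known
    fraction of cells (both are assumptions of this sketch)."""
    if reads_at_spike_in <= 0:
        raise ValueError("spike-in site must have at least one read")
    if not 0.0 <= spike_in_cut_fraction <= 1.0:
        raise ValueError("cut fraction must lie between 0 and 1")
    breaks_per_read = spike_in_cut_fraction / reads_at_spike_in
    return reads_at_region * breaks_per_read


# Illustrative values only: 1000 reads at a spike-in site cut in 20% of cells,
# 50 reads at the region of interest.
estimate = dsbs_per_cell(50, 1000, 0.2)
```

Under these assumptions, the same scaling factor applies to every region in the sample, which is what makes absolute frequencies comparable across samples.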


Subject(s)
Computational Biology/methods , DNA Breaks, Double-Stranded , Whole Genome Sequencing/methods , Cell Line, Tumor , DNA Replication/genetics , DNA Topoisomerases, Type I/metabolism , Genome, Fungal/genetics , Genome, Human/genetics , Humans , Saccharomycetales/genetics
4.
Mol Cell ; 72(2): 250-262.e6, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30270107

ABSTRACT

Double-strand breaks (DSBs) are extremely detrimental DNA lesions that can lead to cancer-driving mutations and translocations. Non-homologous end joining (NHEJ) and homologous recombination (HR) represent the two main repair pathways operating in the context of chromatin to ensure genome stability. Despite extensive efforts, our knowledge of DSB-induced chromatin still remains fragmented. Here, we describe the distribution of 20 chromatin features at multiple DSBs spread throughout the human genome using ChIP-seq. We provide the most comprehensive picture of the chromatin landscape set up at DSBs and identify NHEJ- and HR-specific chromatin events. This study revealed the existence of a DSB-induced monoubiquitination-to-acetylation switch on histone H2B lysine 120, likely mediated by the SAGA complex, as well as higher-order signaling at HR-repaired DSBs whereby histone H1 is evicted while ubiquitin and 53BP1 accumulate over the entire γH2AX domains.


Subject(s)
Chromatin/genetics , DNA Repair/genetics , Histones/genetics , Cell Line, Tumor , DNA Breaks, Double-Stranded , Genomic Instability/genetics , Homologous Recombination/genetics , Humans , K562 Cells , Tumor Suppressor p53-Binding Protein 1/genetics
5.
Plant J ; 93(6): 1017-1031, 2018 03.
Article in English | MEDLINE | ID: mdl-29356198

ABSTRACT

Arabidopsis thaliana contains two nuclear XRN2/3 5'-3' exonucleases that are homologs of yeast and human Rat1/Xrn2 proteins involved in the processing and degradation of several classes of nuclear RNAs and in transcription termination of RNA polymerase II. Using strand-specific short read sequencing we show that knockdown of XRN3 leads to an altered expression of hundreds of genes and the accumulation of uncapped and polyadenylated read-through transcripts generated by inefficiently terminated Pol II. Our data support the notion that XRN3-mediated changes in the expression of a subset of genes are caused by upstream read-through transcription and these effects are enhanced by RNA-mRNA chimeras generated in xrn3 plants. In turn, read-through transcripts that are antisense to downstream genes may trigger production of siRNA. Our results highlight the importance of XRN3 exoribonuclease in Pol II transcription termination in plants and show that disturbance in this process may significantly alter gene expression.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Exoribonucleases/genetics , Gene Expression Regulation, Plant , RNA Interference , Transcription Termination, Genetic , Arabidopsis/metabolism , Arabidopsis Proteins/metabolism , Gene Expression Profiling , Mutation , RNA Polymerase II/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Plant/genetics , RNA, Plant/metabolism
6.
Blood ; 129(18): 2479-2492, 2017 05 04.
Article in English | MEDLINE | ID: mdl-28270450

ABSTRACT

Hematopoietic stem and progenitor cells (HSPCs) are vulnerable to endogenous damage and defects in DNA repair can limit their function. The 2 single-stranded DNA (ssDNA) binding proteins SSB1 and SSB2 are crucial regulators of the DNA damage response; however, their overlapping roles during normal physiology are incompletely understood. We generated mice in which both Ssb1 and Ssb2 were constitutively or conditionally deleted. Constitutive Ssb1/Ssb2 double knockout (DKO) caused early embryonic lethality, whereas conditional Ssb1/Ssb2 double knockout (cDKO) in adult mice resulted in acute lethality due to bone marrow failure and intestinal atrophy featuring stem and progenitor cell depletion, a phenotype unexpected from the previously reported single knockout models of Ssb1 or Ssb2. Mechanistically, cDKO HSPCs showed altered replication fork dynamics, massive accumulation of DNA damage, genome-wide double-strand breaks enriched at Ssb-binding regions and CpG islands, together with the accumulation of R-loops and cytosolic ssDNA. Transcriptional profiling of cDKO HSPCs revealed the activation of p53 and interferon (IFN) pathways, which enforced cell cycling in quiescent HSPCs, resulting in their apoptotic death. The rapid cell death phenotype was reproducible in in vitro cultured cDKO-hematopoietic stem cells, which were significantly rescued by nucleotide supplementation or after depletion of p53. Collectively, Ssb1 and Ssb2 control crucial aspects of HSPC function, including proliferation and survival in vivo by resolving replicative stress to maintain genomic stability.


Subject(s)
Cell Proliferation/physiology , DNA Breaks, Double-Stranded , Genomic Instability/physiology , Hematopoietic Stem Cells/metabolism , Suppressor of Cytokine Signaling Proteins/metabolism , Animals , Cell Survival/physiology , CpG Islands/physiology , Hematopoietic Stem Cells/cytology , Mice , Mice, Knockout , Suppressor of Cytokine Signaling Proteins/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
7.
Int J Genomics ; 2015: 563482, 2015.
Article in English | MEDLINE | ID: mdl-26558255

ABSTRACT

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

8.
BMC Bioinformatics ; 16: 140, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25927199

ABSTRACT

BACKGROUND: For many years now, binding preferences of Transcription Factors have been described by so-called motifs, usually mathematically defined by position weight matrices or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them is left with many competing methods of identifying potential binding sites in a genome of interest and there is little published information regarding the optimality of different choices. Thanks to the availability of a large number of different motif models as well as a number of experimental datasets describing actual binding of TFs in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter. RESULTS: We focus on the task of identifying potential transcription factor binding sites in the human genome. Firstly, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TFs coverage and better motif performance than commercial databases. Secondly, we compare different motif scanners showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Thirdly, we calculate for each motif a detection threshold optimizing the accuracy of prediction. Finally, we provide an in-depth comparison of different methods of choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and therefore most uniformly accurate. CONCLUSION: We provide a guide for researchers working with transcription factor motifs. It is supplemented with detailed results of the analysis and the benchmark datasets at http://bioputer.mimuw.edu.pl/papers/motifs/ .
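
Scanning a sequence with a position weight matrix and a detection threshold, as discussed in this abstract, can be sketched generically. The function names, the uniform background, the pseudocount and the toy count matrix are assumptions of this example; it does not reproduce any of the scanners compared in the paper, and it scans the forward strand only.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a
    uniform background (an assumption of this sketch)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + len(BASES) * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every forward-strand window whose
    summed score reaches the threshold; windows with non-ACGT symbols are skipped."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy motif with consensus TATA, built from 10 hypothetical aligned sites.
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_matrix(counts)
hits = scan("GGTATAGG", pwm, threshold=5.0)
```

The threshold here is fixed by hand; the abstract's point is that how this value is chosen per motif (for example, by a common false-positive rate) strongly affects prediction accuracy.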


Subject(s)
Chromatin Immunoprecipitation/methods , Computational Biology , Databases, Factual , Genome, Human , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices , Transcription Factors/metabolism , Binding Sites , Humans , Protein Binding , Sequence Analysis, DNA
9.
Algorithms Mol Biol ; 9: 12, 2014.
Article in English | MEDLINE | ID: mdl-24735785

ABSTRACT

BACKGROUND: Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, resulting alignments are biased by guide-trees, especially for relatively distant sequences. RESULTS: We propose MSARC, a new graph-clustering based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to the best progressive methods. Furthermore, MSARC outperforms them on sequence sets whose evolutionary distances are difficult to represent by a phylogenetic tree. These datasets are most exposed to the guide-tree bias of alignments. AVAILABILITY: MSARC is available at http://bioputer.mimuw.edu.pl/msarc.

10.
Bioinformatics ; 29(16): 2068-70, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23818512

ABSTRACT

SUMMARY: Bayesian Networks (BNs) are versatile probabilistic models applicable to many different biological phenomena. In biological applications the structure of the network is usually unknown and needs to be inferred from experimental data. BNFinder is a fast software implementation of an exact algorithm for finding the optimal structure of the network given a number of experimental observations. Its second version, presented in this article, represents a major improvement over the previous version. The improvements include (i) a parallelized learning algorithm leading to order-of-magnitude speed-ups in BN structure learning time; (ii) inclusion of an additional scoring function based on mutual information criteria; (iii) possibility of choosing the resulting network specificity based on statistical criteria and (iv) a new module for classification by BNs, including a cross-validation scheme and classifier quality measurements with receiver operating characteristic scores. AVAILABILITY AND IMPLEMENTATION: BNFinder2 is implemented in Python and freely available under the GNU General Public License at the project Web site https://launchpad.net/bnfinder, together with a user's manual, introductory tutorial and supplementary methods.


Subject(s)
Models, Statistical , Software , Algorithms , Bayes Theorem , ROC Curve
11.
Nucleic Acids Res ; 41(15): 7240-59, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23771139

ABSTRACT

Using nuclear factor-κB (NF-κB) ChIP-Seq data, we present a framework for iterative learning of regulatory networks. For every possible transcription factor-binding site (TFBS)-putatively regulated gene pair, the relative distance and orientation are calculated to learn which TFBSs are most likely to regulate a given gene. Weighted TFBS contributions to putative gene regulation are integrated to derive an NF-κB gene network. A de novo motif enrichment analysis uncovers secondary TFBSs (AP1, SP1) at characteristic distances from NF-κB/RelA TFBSs. Comparison with experimental ENCODE ChIP-Seq data indicates that experimental TFBSs highly correlate with predicted sites. We observe that RelA-SP1-enriched promoters have distinct expression profiles from that of RelA-AP1 and are enriched in introns, CpG islands and DNase accessible sites. Sixteen novel NF-κB/RelA-regulated genes and TFBSs were experimentally validated, including TANK, a negative feedback gene whose expression is NF-κB/RelA dependent and requires a functional interaction with the AP1 TFBSs. Our probabilistic method yields more accurate NF-κB/RelA-regulated networks than a traditional, distance-based approach, confirmed by both analysis of gene expression and increased informativity of Genome Ontology annotations. Our analysis provides new insights into how co-occurring TFBSs and local chromatin context orchestrate activation of NF-κB/RelA sub-pathways differing in biological function and temporal expression patterns.


Subject(s)
Chromatin Immunoprecipitation/methods , Chromatin/metabolism , Gene Regulatory Networks , Genome, Human , NF-kappa B/analysis , Alu Elements , Binding Sites , Cell Line, Tumor , Chromatin/genetics , Chromatin Assembly and Disassembly , Humans , Models, Statistical , Molecular Sequence Annotation , NF-kappa B/genetics , Nucleotide Motifs , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, RNA , Transcription Factor RelA/genetics , Transcription Factor RelA/metabolism
12.
Nat Methods ; 10(4): 361-5, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23503052

ABSTRACT

We present a genome-wide approach to map DNA double-strand breaks (DSBs) at nucleotide resolution by a method we termed BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing). We validated and tested BLESS using human and mouse cells and different DSBs-inducing agents and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and we identified >2,000 nonuniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions, with a specificity and resolution unachievable by current techniques.


Subject(s)
DNA Breaks, Double-Stranded , Genomics/methods , Nucleic Acid Amplification Techniques/methods , Animals , Aphidicolin , Base Sequence , Cell Line, Tumor , Cloning, Molecular , DNA Replication , Fibroblasts/metabolism , Humans , Male , Mice , Mice, Inbred C57BL , Microsatellite Repeats , Physical Chromosome Mapping/methods , Sequence Analysis, DNA , Spleen , Testis , Virus Replication
13.
J Comput Biol ; 18(6): 809-19, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21563976

ABSTRACT

The identification of cis-regulatory modules (CRM) is one of the most important problems towards the understanding of transcriptional regulation in higher eukaryotes. Computational methods for CRM detection are gaining importance due to the availability of genomic data on one side, and costs and difficulties of experimental methods on the other side. One of the proposed approaches, called Billboard, predicts CRMs based on the location of transcription factor binding sites in an analyzed sequence and a related one in a so-called informant species. In the present article, we show how to combine information obtained in two symmetric runs (on the sequence of interest and on the related one) of the Billboard tool. In a series of experiments on data from various organisms, we show that the predictive power of our symmetric approach is significantly higher than the power of the one-way approach of Billboard. Moreover, we show that the evolutionary distance between organisms considerably influences the quality of prediction and we provide guidelines on the choice of an informant species.


Subject(s)
Regulatory Elements, Transcriptional , Sequence Analysis, DNA/methods , Software , Animals , Drosophila/genetics , Gene Expression Regulation , Humans , Mice/genetics , Models, Genetic , Promoter Regions, Genetic , Rats/genetics , Statistics, Nonparametric
14.
BMC Syst Biol ; 4: 86, 2010 Jun 17.
Article in English | MEDLINE | ID: mdl-20565733

ABSTRACT

BACKGROUND: It is often desirable to separate effects of different regulators on gene expression, or to identify effects of the same regulator across several systems. Here, we focus on the rat brain following stroke or seizures, and demonstrate how the two tasks can be approached simultaneously. RESULTS: We applied SVD to time-series gene expression datasets from the rat experimental models of stroke and seizures. We demonstrate conservation of two eigensystems, reflecting inflammation and/or apoptosis (eigensystem 2) and neuronal synaptic activity (eigensystem 3), between the stroke and seizures. We analyzed cis-regulation of gene expression in the subspaces of the conserved eigensystems. Bayesian networks analysis was performed separately for either experimental model, with cross-system validation of the highest-ranking features. In this way, we correctly re-discovered the role of AP1 in the regulation of apoptosis, and the involvement of Creb and Egr in the regulation of synaptic activity-related genes. We identified a novel antagonistic effect of the motif recognized by the nuclear matrix attachment region-binding protein Satb1 on AP1-driven transcriptional activation, suggesting a link between chromatin loop structure and gene activation by AP1. The effects of motifs binding Satb1 and Creb on gene expression in brain conform to the assumption of the linear response model of gene regulation. Our data also suggest that numerous enhancers of neuronal-specific genes are important for their responsiveness to the synaptic activity. CONCLUSION: Eigensystems conserved between stroke and seizures separate effects of inflammation/apoptosis and neuronal synaptic activity, exerted by different transcription factors, on gene expression in rat brain.


Subject(s)
Gene Expression Regulation/physiology , Gene Regulatory Networks/physiology , Models, Biological , Seizures/physiopathology , Stroke/physiopathology , Animals , Apoptosis/physiology , Bayes Theorem , Databases, Genetic , Gene Expression Profiling , Matrix Attachment Region Binding Proteins/metabolism , Microarray Analysis , Rats , Regression Analysis , Synapses/physiology , Transcription Factor AP-1/metabolism
15.
BMC Bioinformatics ; 10: 82, 2009 Mar 10.
Article in English | MEDLINE | ID: mdl-19284541

ABSTRACT

BACKGROUND: Finding functional regulatory elements in DNA sequences is a very important problem in computational biology and providing a reliable algorithm for this task would be a major step towards understanding regulatory mechanisms on a genome-wide scale. Major obstacles in this respect are that the amount of non-coding DNA is vast, and that the methods for predicting functional transcription factor binding sites tend to produce results with a high percentage of false positives. This makes the problem of finding regions significantly enriched in binding sites difficult. RESULTS: We develop a novel method for predicting regulatory regions in DNA sequences, which is designed to exploit the evolutionary conservation of regulatory elements between species without assuming that the order of motifs is preserved across species. We have implemented our method and tested its predictive abilities on various datasets from different organisms. CONCLUSION: We show that our approach enables us to find a majority of the known CRMs using only sequence information from different species together with currently publicly available motif data. Also, our method is robust enough to perform well in predicting CRMs, despite differences in tissue specificity and even across species, provided that the evolutionary distances between compared species do not change substantially. The complexity of the proposed algorithm is polynomial, and the observed running times show that it may be readily applied.


Subject(s)
Algorithms , Computational Biology/methods , Evolution, Molecular , Regulatory Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Base Sequence , DNA/chemistry
16.
Bioinformatics ; 25(2): 286-7, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-18826957

ABSTRACT

MOTIVATION: Bayesian methods are widely used in many different areas of research. Recently, they have become very popular tools for biological network reconstruction, due to their ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only a few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable by non-programmers. RESULTS: We present BNFinder, software that allows for Bayesian network reconstruction from experimental data. It supports dynamic Bayesian networks and, if the variables are partially ordered, also static Bayesian networks. The main advantage of BNFinder is the use of an exact algorithm, which is at the same time very efficient (polynomial with respect to the number of observations).


Subject(s)
Bayes Theorem , Software , Algorithms , Computer Simulation , Information Storage and Retrieval/methods
17.
BMC Bioinformatics ; 7: 249, 2006 May 08.
Article in English | MEDLINE | ID: mdl-16681847

ABSTRACT

BACKGROUND: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which makes it possible to deal with the stochastic aspects of gene expression and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring the structure of gene interactions from microarray experiment data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. RESULTS: We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulation data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using an exact learning algorithm instead of heuristic methods are analyzed. CONCLUSION: We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when the considered set of genes is small enough.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Expression/physiology , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Artifacts , Artificial Intelligence , Bayes Theorem , Computer Simulation , Models, Statistical , Pattern Recognition, Automated/methods