Search | VHL Regional Portal

Show: 20 | 50 | 100

Results 1 - 10 de 10

Filter

Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.

Qiu, Yutong; Shen, Yihang; Kingsford, Carl.

Algorithms Mol Biol ; 19(1): 17, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38679703

ABSTRACT

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/ .

Improving Hi-C contact matrices using genome graphs.

Shen, Yihang; Yu, Lingge; Qiu, Yutong; Zhang, Tianyu; Kingsford, Carl.

bioRxiv ; 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37986943

ABSTRACT

Three-dimensional chromosome structure plays an important role in fundamental genomic functions. Hi-C, a high-throughput, sequencing-based technique, has drastically expanded our comprehension of 3D chromosome structures. The first step of Hi-C analysis pipeline involves mapping sequencing reads from Hi-C to linear reference genomes. However, the linear reference genome does not incorporate genetic variation information, which can lead to incorrect read alignments, especially when analyzing samples with substantial genomic differences from the reference such as cancer samples. Using genome graphs as the reference facilitates more accurate mapping of reads, however, new algorithms are required for inferring linear genomes from Hi-C reads mapped on genome graphs and constructing corresponding Hi-C contact matrices, which is a prerequisite for the subsequent steps of the Hi-C analysis such as identifying topologically associated domains and calling chromatin loops. We introduce the problem of genome sequence inference from Hi-C data mediated by genome graphs. We formalize this problem, show the hardness of solving this problem, and introduce a novel heuristic algorithm specifically tailored to this problem. We provide a theoretical analysis to evaluate the efficacy of our algorithm. Finally, our empirical experiments indicate that the linear genomes inferred from our method lead to the creation of improved Hi-C contact matrices. These enhanced matrices show a reduction in erroneous patterns caused by structural variations and are more effective in accurately capturing the structures of topologically associated domains.

Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants.

Qiu, Yutong; Shen, Yihang; Kingsford, Carl.

ArXiv ; 2023 Nov 08.

Article in English | MEDLINE | ID: mdl-37292475

ABSTRACT

Long-term outcomes of COVID-19 convalescents: An 18.5-month longitudinal study in Wuhan.

Guo, Yi; Wang, Hao; Xiao, Mingzhong; Guan, Xin; Lei, Yanshou; Diao, Tingyue; Long, Pinpin; Zeng, Rui; Lai, Xuefeng; Cai, Hao; You, Yutong; Wen, Yuying; Li, Wenhui; Wang, Xi; Wang, Yufei; Chen, Qinlin; Yang, Yuchan; Qiu, Yutong; Chen, Jishuai; Zeng, Huidan; Ni, Wei; Zhao, Youyun; Ouyang, Kani; Wang, Jingzhi; Wang, Qi; Liu, Li; Song, Lulu; Wang, Youjie; Guo, Huan; Li, Xiaodong; Wu, Tangchun; Yuan, Yu.

Int J Infect Dis ; 127: 85-92, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36509334

ABSTRACT

OBJECTIVES: This study aimed to describe the full scope of long-term outcomes and the ongoing pathophysiological alterations among COVID-19 survivors. METHODS: We established a longitudinal cohort of 208 COVID-19 convalescents and followed them at 3.3 (interquartile range [IQR]: 1.3, 4.4, visit 1), 9.2 (IQR: 9.0, 9.6, visit 2), and 18.5 (IQR: 18.2, 19.1, visit 3) months after infection, respectively. Serial changes in multiple physical and psychological outcomes were comprehensively characterized. We, in addition, explored the potential risk factors of SARS-CoV-2 antibody response and sequelae symptoms. RESULTS: We observed continuous improvement of sequelae symptoms, lung function, chest computed tomography (CT), 6-minute walk test, and the Borg dyspnea scale, whereas sequelae symptoms (at least one) and abnormal chest CT patterns still existed in 45.2% and about 30% of participants at 18.5 months, respectively. Anxiety and depression disorders were alleviated for the convalescents, although depression status was sustained for a longer duration. CONCLUSIONS: Most COVID-19 convalescents had an overall improved physical and psychological health status, whereas sequelae symptoms, residual lesions on lung function, exercise impairment, and mental health disorders were still observed in a small proportion of participants at 18.5 months after infection. Implementing appropriate preventive and management strategies for the ever-growing COVID-19 population is warranted.

Subject(s)

COVID-19 , Humans , Longitudinal Studies , SARS-CoV-2 , Antibodies, Viral , Anxiety/epidemiology , Disease Progression

The effect of genome graph expressiveness on the discrepancy between genome graph distance and string set distance.

Qiu, Yutong; Kingsford, Carl.

Bioinformatics ; 38(Suppl 1): i404-i412, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35758819

ABSTRACT

MOTIVATION: Intra-sample heterogeneity describes the phenomenon where a genomic sample contains a diverse set of genomic sequences. In practice, the true string sets in a sample are often unknown due to limitations in sequencing technology. In order to compare heterogeneous samples, genome graphs can be used to represent such sets of strings. However, a genome graph is generally able to represent a string set universe that contains multiple sets of strings in addition to the true string set. This difference between genome graphs and string sets is not well characterized. As a result, a distance metric between genome graphs may not match the distance between true string sets. RESULTS: We extend a genome graph distance metric, Graph Traversal Edit Distance (GTED) proposed by Ebrahimpour Boroojeny et al., to FGTED to model the distance between heterogeneous string sets and show that GTED and FGTED always underestimate the Earth Mover's Edit Distance (EMED) between string sets. We introduce the notion of string set universe diameter of a genome graph. Using the diameter, we are able to upper-bound the deviation of FGTED from EMED and to improve FGTED so that it reduces the average error in empirically estimating the similarity between true string sets. On simulated T-cell receptor sequences and actual Hepatitis B virus genomes, we show that the diameter-corrected FGTED reduces the average deviation of the estimated distance from the true string set distances by more than 250%. AVAILABILITY AND IMPLEMENTATION: Data and source code for reproducing the experiments are available at: https://github.com/Kingsford-Group/gtedemedtest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Genome , Genomics , Sequence Analysis, DNA , Software

Constructing small genome graphs via string compression.

Qiu, Yutong; Kingsford, Carl.

Bioinformatics ; 37(Suppl_1): i205-i213, 2021 07 12.

Article in English | MEDLINE | ID: mdl-34252955

ABSTRACT

MOTIVATION: The size of a genome graph-the space required to store the nodes, node labels and edges-affects the efficiency of operations performed on it. For example, the time complexity to align a sequence to a graph without a graph index depends on the total number of characters in the node labels and the number of edges in the graph. This raises the need for approaches to construct space-efficient genome graphs. RESULTS: We point out similarities in the string encoding mechanisms of genome graphs and the external pointer macro (EPM) compression model. We present a pair of linear-time algorithms that transform between genome graphs and EPM-compressed forms. The algorithms result in an upper bound on the size of the genome graph constructed in terms of an optimal EPM compression. To further reduce the size of the genome graph, we propose the source assignment problem that optimizes over the equivalent choices during compression and introduce an ILP formulation that solves that problem optimally. As a proof-of-concept, we introduce RLZ-Graph, a genome graph constructed based on the relative Lempel-Ziv algorithm. Using RLZ-Graph, across all human chromosomes, we are able to reduce the disk space to store a genome graph on average by 40.7% compared to colored compacted de Bruijn graphs constructed by Bifrost under the default settings. The RLZ-Graph scales well in terms of running time and graph sizes with an increasing number of human genome sequences compared to Bifrost and variation graphs produced by VGtoolkit. AVAILABILITY: The RLZ-Graph software is available at: https://github.com/Kingsford-Group/rlzgraph. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Data Compression , Software , Algorithms , Genome, Human , Humans , Sequence Analysis, DNA

A flexible ChIP-sequencing simulation toolkit.

Zheng, An; Lamkin, Michael; Qiu, Yutong; Ren, Kevin; Goren, Alon; Gymrek, Melissa.

BMC Bioinformatics ; 22(1): 201, 2021 Apr 20.

Article in English | MEDLINE | ID: mdl-33879052

ABSTRACT

BACKGROUND: A major challenge in evaluating quantitative ChIP-seq analyses, such as peak calling and differential binding, is a lack of reliable ground truth data. Accurate simulation of ChIP-seq data can mitigate this challenge, but existing frameworks are either too cumbersome to apply genome-wide or unable to model a number of important experimental conditions in ChIP-seq. RESULTS: We present ChIPs, a toolkit for rapidly simulating ChIP-seq data using statistical models of key experimental steps. We demonstrate how ChIPs can be used for a range of applications, including benchmarking analysis tools and evaluating the impact of various experimental parameters. ChIPs is implemented as a standalone command-line program written in C++ and is available from https://github.com/gymreklab/chips . CONCLUSIONS: ChIPs is an efficient ChIP-seq simulation framework that generates realistic datasets over a flexible range of experimental conditions. It can serve as an important component in various ChIP-seq analyses where ground truth data are needed.

Subject(s)

Chromatin Immunoprecipitation Sequencing , Software , Computer Simulation , Genome , High-Throughput Nucleotide Sequencing , Models, Statistical , Sequence Analysis, DNA

Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.

Qiu, Yutong; Ma, Cong; Xie, Han; Kingsford, Carl.

Algorithms Mol Biol ; 15: 9, 2020.

Article in English | MEDLINE | ID: mdl-32467720

ABSTRACT

BACKGROUND: Transcriptomic structural variants (TSVs)-large-scale transcriptome sequence change due to structural variation - are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. RESULTS: To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a 1 4 -approximation algorithm for k = 1 and a 3 4 -approximation algorithm for the diploid case ( k = 2 ) assuming an oracle for k = 1 . Combining these, we obtain a 3 16 -approximation algorithm for MCAP when k = 2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require k > 1 alleles to satisfy read concordancy and show that such structures are prevalent. CONCLUSIONS: We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms have improved performance on TCGA cancer samples and cancer cell line samples compared to a TSV calling tool, SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.

H3K27me3-mediated silencing of structural genes is required for zebrafish heart regeneration.

Ben-Yair, Raz; Butty, Vincent L; Busby, Michele; Qiu, Yutong; Levine, Stuart S; Goren, Alon; Boyer, Laurie A; Burns, C Geoffrey; Burns, Caroline E.

Development ; 146(19)2019 10 09.

Article in English | MEDLINE | ID: mdl-31427288

ABSTRACT

Deciphering the genetic and epigenetic regulation of cardiomyocyte proliferation in organisms that are capable of robust cardiac renewal, such as zebrafish, represents an attractive inroad towards regenerating the human heart. Using integrated high-throughput transcriptional and chromatin analyses, we have identified a strong association between H3K27me3 deposition and reduced sarcomere and cytoskeletal gene expression in proliferative cardiomyocytes following cardiac injury in zebrafish. To move beyond an association, we generated an inducible transgenic strain expressing a mutant version of histone 3, H3.3K27M, that inhibits H3K27me3 catalysis in cardiomyocytes during the regenerative window. Hearts comprising H3.3K27M-expressing cardiomyocytes fail to regenerate, with wound edge cells showing heightened expression of structural genes and prominent sarcomeres. Although cell cycle re-entry was unperturbed, cytokinesis and wound invasion were significantly compromised. Collectively, our study identifies H3K27me3-mediated silencing of structural genes as requisite for zebrafish heart regeneration and suggests that repression of similar structural components in the border zone of an infarcted human heart might improve its regenerative capacity.

Subject(s)

Gene Silencing , Heart/physiology , Histones/metabolism , Lysine/metabolism , Regeneration/physiology , Zebrafish/genetics , Zebrafish/physiology , Animals , Cell Proliferation , Cytokinesis , Cytoskeleton/metabolism , Gene Expression Regulation, Developmental , Methylation , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Sarcomeres/metabolism

10.

Effects of blood-feeding on olfactory sensitivity of the malaria mosquito Anopheles gambiae: application of mixed linear models to account for repeated measurements.

Qiu, Yu-Tong; Gort, Gerrit; Torricelli, Roberta; Takken, Willem; van Loon, Joop J A.

J Insect Physiol ; 59(11): 1111-8, 2013 Nov.

Article in English | MEDLINE | ID: mdl-24036172

ABSTRACT

Olfaction plays an important role in the host-seeking behavior of the malaria mosquito Anopheles gambiae. After a complete blood meal, female mosquitoes will not engage in host-seeking behavior until oviposition has occurred. We investigated if peripheral olfactory sensitivity changed after a blood meal by recording electroantennograms (EAGs) of female mosquitoes at three time points (2h, 48 h and 72 h) to 15 volatile kairomones of either human origin or documented to emanate from oviposition sites. The EAG-sensitivity was compared with that of females of similar age post eclosion. As is common practice in electrophysiological studies, the EAG recordings were obtained by repeated stimulation of the same antennal preparations. We introduce mixed linear modeling as an improved statistical analysis for electrophysiological data. Two hours after blood ingestion, olfactory sensitivity as quantified through EAG-recording increased significantly and selectively, i.e. for seven compounds, compared to unfed females of the same age. Such short-term electrophysiological sensitization in the olfactory system as a result of feeding has not been documented before for insects. Sensitization to six compounds persisted until 48 h or 72 h post-blood meal at one or more concentrations. Desensitization was observed at 48 and 72 h pbm in response to two and three kairomones, respectively. For several compounds, sensitization at the EAG-level corresponded with sensitization found previously in single sensillum studies on olfactory neurons in antennal sensilla trichodea of An. gambiae females. These effects are likely to reflect sensitization to oviposition cues, as eggs have matured 48-72 h pbm. Knowledge of changes in olfactory sensitivity to kairomones can be applied to increase trap catches of malaria mosquitoes that have taken a blood meal and need to locate oviposition sites.

Subject(s)

Anopheles/physiology , Blood , Cues , Feeding Behavior/physiology , Smell/physiology , Animals , Arthropod Antennae/physiology , Female , Liberia , Linear Models , Oviposition/physiology , Time Factors

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL