1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38349060

ABSTRACT

The recent development of deep learning methods has undoubtedly led to great improvements in various machine learning tasks, especially prediction tasks. These methods have also been adapted to address various problems in bioinformatics, including automatic genome annotation, artificial genome generation and phenotype prediction. In particular, a specific type of deep learning method, the graph neural network (GNN), has repeatedly been reported as a good candidate for predicting phenotypes from gene expression because of its ability to embed information on gene regulation or co-expression through the use of a gene network. However, to date, no complete and reproducible benchmark has been performed to analyze the trade-off between the cost and benefit of this approach compared to more standard (and simpler) machine learning methods. In this article, we provide such a benchmark, based on clear and comparable policies to evaluate the different methods on several datasets. Our conclusion is that GNNs rarely provide a real improvement in prediction performance, especially given the computational effort they require. Our findings on a limited but controlled simulated dataset show that this could be explained by the limited quality or predictive power of the input biological gene network itself.


Subject(s)
Gene Expression Profiling , Transcriptome , Benchmarking , Computational Biology , Neural Networks, Computer
2.
World Neurosurg ; 181: e953-e962, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37952887

ABSTRACT

OBJECTIVES: Symptomatic lumbar spinal stenosis (LSS) leads to functional impairment and pain. While radiologic characterization of the morphological stenosis grade can aid in the diagnosis, it may not always correlate with patient symptoms. Artificial intelligence (AI) may diagnose symptomatic LSS in patients solely based on self-reported history questionnaires. METHODS: We evaluated multiple machine learning (ML) models to determine the likelihood of LSS using a self-reported questionnaire in patients experiencing low back pain and/or numbness in the legs. The questionnaire was built from peer-reviewed literature and the input of a multidisciplinary panel of experts. Random forest, lasso logistic regression, support vector machine, gradient boosting trees, deep neural networks, and automated machine learning models were trained and their performance metrics were compared. RESULTS: Data from 4827 patients (4690 patients without LSS: mean age 62.44, range 27-84 years, 62.8% females, and 137 patients with LSS: mean age 50.59, range 30-71 years, 59.9% females) were retrospectively collected. Among the evaluated models, the random forest model demonstrated the highest predictive accuracy with an area under the receiver operating characteristic curve (AUROC) between model prediction and LSS diagnosis of 0.96, a sensitivity of 0.94, a specificity of 0.88, a balanced accuracy of 0.91, and a Cohen's kappa of 0.85. CONCLUSIONS: Our results indicate that ML can automate the diagnosis of LSS based on self-reported questionnaires with high accuracy. Implementation of a standardized and intelligence-automated workflow may serve as a supportive diagnostic tool to streamline patient management and potentially lower health care costs.
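The balanced accuracy reported above is the mean of sensitivity and specificity, so the three figures can be checked against each other. The short sketch below does that arithmetic in Python; the function names are illustrative and are not taken from the study.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy is the arithmetic mean of the two rates."""
    return (sensitivity + specificity) / 2.0


# Sensitivity and specificity as reported in the abstract for the random forest.
print(round(balanced_accuracy(0.94, 0.88), 2))  # prints 0.91
```

Because the cohort is heavily imbalanced (137 of 4827 patients have LSS), balanced accuracy is more informative here than plain accuracy, which a model could inflate by always predicting the majority class.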


Subject(s)
Spinal Stenosis , Female , Humans , Adult , Middle Aged , Aged , Aged, 80 and over , Male , Spinal Stenosis/diagnosis , Self Report , Artificial Intelligence , Retrospective Studies , Lumbar Vertebrae , Surveys and Questionnaires
3.
Eur Spine J ; 33(3): 941-948, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38150003

ABSTRACT

OBJECTIVES: To develop a three-stage convolutional neural network (CNN) approach to segment anatomical structures, classify the presence of lumbar spinal stenosis (LSS) for all 3 stenosis types (central, lateral recess and foraminal) and assess its severity on spine MRI, and to demonstrate its efficacy as an accurate and consistent diagnostic tool. METHODS: The three-stage model was trained on 1635 annotated lumbar spine MRI studies consisting of T2-weighted sagittal and axial planes at each vertebral level. Accuracy of the model was evaluated on an external validation set of 150 MRI studies graded on a scale of absent, mild, moderate or severe by a panel of 7 radiologists. The reference standard for all types was determined by majority voting and, in case of disagreement, adjudicated by an external radiologist. The radiologists' diagnoses were then compared to the diagnoses of the model. RESULTS: The model showed comparable performance to the radiologist average both in terms of the determination of presence/absence of LSS and in severity classification, for all 3 stenosis types. In the case of central canal stenosis, the sensitivity, specificity and AUROC of the CNN were (0.971, 0.864, 0.963) for binary (presence/absence) classification compared to the radiologist average of (0.786, 0.899, 0.842). For lateral recess stenosis, the sensitivity, specificity and AUROC of the CNN were (0.853, 0.787, 0.907) compared to the radiologist average of (0.713, 0.898, 0.805). For foraminal stenosis, the sensitivity, specificity and AUROC of the CNN were (0.942, 0.844, 0.950) compared to the radiologist average of (0.879, 0.877, 0.878). Multi-class severity classifications showed similarly comparable statistics. CONCLUSIONS: The CNN showed comparable performance to radiologist subspecialists for the detection and classification of LSS. The integration of neural network models in the detection of LSS could bring higher accuracy, efficiency, consistency, and post-hoc interpretability in diagnostic practices.
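The reference standard described above (majority vote of the panel, with an external adjudicator when the panel disagrees) can be sketched as follows. This is a minimal illustration, not the study's code: it assumes that "majority" means a strict majority of the panel, and the function name `reference_grade` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

GRADES = ("absent", "mild", "moderate", "severe")


def reference_grade(panel: Sequence[str], adjudicator: Optional[str] = None) -> str:
    """Return the reference grade for one study.

    Assumption: a grade wins only with a strict majority of panel votes;
    otherwise the external adjudicator's grade is used.
    """
    if not panel or any(grade not in GRADES for grade in panel):
        raise ValueError("panel must contain only known grades")
    grade, votes = Counter(panel).most_common(1)[0]
    if votes > len(panel) / 2:
        return grade
    if adjudicator is None:
        raise ValueError("no majority and no adjudicator grade supplied")
    return adjudicator


# Seven hypothetical readers, four of whom agree on "mild".
print(reference_grade(["mild"] * 4 + ["moderate"] * 3))
```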


Subject(s)
Spinal Stenosis , Humans , Spinal Stenosis/diagnostic imaging , Constriction, Pathologic , Lumbar Vertebrae/diagnostic imaging , Magnetic Resonance Imaging , Neural Networks, Computer
5.
Nature ; 623(7985): 183-192, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37853125

ABSTRACT

The DNA damage response is essential to safeguard genome integrity. Although the contribution of chromatin in DNA repair has been investigated [1,2], the contribution of chromosome folding to these processes remains unclear [3]. Here we report that, after the production of double-stranded breaks (DSBs) in mammalian cells, ATM drives the formation of a new chromatin compartment (D compartment) through the clustering of damaged topologically associating domains, decorated with γH2AX and 53BP1. This compartment forms by a mechanism that is consistent with polymer-polymer phase separation rather than liquid-liquid phase separation. The D compartment arises mostly in G1 phase, is independent of cohesin and is enhanced after pharmacological inhibition of DNA-dependent protein kinase (DNA-PK) or R-loop accumulation. Importantly, R-loop-enriched DNA-damage-responsive genes physically localize to the D compartment, and this contributes to their optimal activation, providing a function for DSB clustering in the DNA damage response. However, DSB-induced chromosome reorganization comes at the expense of an increased rate of translocations, also observed in cancer genomes. Overall, we characterize how DSB-induced compartmentalization orchestrates the DNA damage response and highlight the critical impact of chromosome architecture in genomic instability.


Subject(s)
Cell Compartmentation , Chromatin , DNA Damage , Animals , Ataxia Telangiectasia Mutated Proteins/metabolism , Cell Line , Chromatin/genetics , Chromatin/metabolism , DNA Breaks, Double-Stranded , DNA Repair , DNA-Activated Protein Kinase/metabolism , G1 Phase , Histones/metabolism , Neoplasms/genetics , R-Loop Structures , Tumor Suppressor p53-Binding Protein 1/metabolism
6.
BMC Bioinformatics ; 24(1): 186, 2023 May 05.
Article in English | MEDLINE | ID: mdl-37147561

ABSTRACT

MOTIVATION: Genome-wide association studies have systematically identified thousands of single nucleotide polymorphisms (SNPs) associated with complex genetic diseases. However, the majority of those SNPs were found in non-coding genomic regions, preventing the understanding of the underlying causal mechanism. Predicting molecular processes based on the DNA sequence represents a promising approach to understand the role of those non-coding SNPs. Over the past years, deep learning has been successfully applied to regulatory sequence prediction using supervised learning. Supervised learning requires DNA sequences associated with functional data for training, the amount of which is strongly limited by the finite size of the human genome. Conversely, the amount of mammalian DNA sequences is increasing exponentially due to ongoing large sequencing projects, but without functional data in most cases. RESULTS: To alleviate the limitations of supervised learning, we propose a paradigm shift with semi-supervised learning, which exploits not only labeled sequences (e.g. human genome with ChIP-seq experiment), but also unlabeled sequences available in much larger amounts (e.g. from other species without ChIP-seq experiment, such as chimpanzee). Our approach is flexible and can be plugged into any neural architecture including shallow and deep networks, and shows strong predictive performance improvements compared to supervised learning in most cases (up to [Formula: see text]). AVAILABILITY AND IMPLEMENTATION: https://forgemia.inra.fr/raphael.mourad/deepgnn .


Subject(s)
Genome-Wide Association Study , Genomics , Animals , Humans , Supervised Machine Learning , Sequence Analysis , Genome, Human , Mammals
7.
Global Spine J ; : 21925682231155844, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36752058

ABSTRACT

STUDY DESIGN: Medical vignettes. OBJECTIVES: Lumbar spinal stenosis (LSS) is a degenerative condition with a high prevalence in the elderly population that is associated with a significant economic burden and often requires spinal surgery. Prior authorization of surgical candidates is required before patients can be covered by a health plan and must be approved by medical directors (MDs), a process that is often subjective and clinician specific. In this study, we hypothesized that the prediction accuracy of machine learning (ML) methods regarding surgical candidates is comparable to that of a panel of MDs. METHODS: Based on patient demographic factors, previous therapeutic history, symptoms, physical examinations and imaging findings, we propose an ML model that computes the probability of spinal surgical recommendations for LSS. The model implements a random forest trained on medical vignette data reviewed by MDs. Sets of 400 and 100 medical vignettes reviewed by MDs were used for training and testing. RESULTS: The machine learning model achieved a root mean square error (RMSE) between model predictions and ground truth of .1123, while the average RMSE between individual MDs' recommendations and ground truth was .2661. For binary classification, the AUROC and Cohen's kappa were .959 and .801, while the corresponding average metrics based on individual MDs' recommendations were .844 and .564, respectively. CONCLUSIONS: Our results suggest that ML can be used to automate prior authorization approval of surgery for LSS with performance comparable to a panel of MDs.
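The RMSE used above to compare model and reviewers is the square root of the mean squared difference between predicted and reference probabilities. A minimal sketch follows; the four vignette values are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(truth) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical surgical-recommendation probabilities for four vignettes.
model_output = [0.9, 0.2, 0.6, 0.1]
ground_truth = [1.0, 0.0, 0.5, 0.0]
print(round(rmse(model_output, ground_truth), 4))
```

A lower RMSE means the predicted probabilities sit closer to the reference, which is the sense in which the abstract's .1123 (model) compares favorably with .2661 (individual reviewers).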

8.
Eur Spine J ; 31(8): 2149-2155, 2022 08.
Article in English | MEDLINE | ID: mdl-35802195

ABSTRACT

PURPOSE: Lumbar spinal stenosis (LSS) is a condition affecting several hundreds of thousands of adults in the United States each year and is associated with significant economic burden. The current decision-making practice to determine surgical candidacy for LSS is often subjective and clinician specific. In this study, we hypothesize that the performance of artificial intelligence (AI) methods could prove comparable in terms of prediction accuracy to that of a panel of spine experts. METHODS: We propose a novel hybrid AI model which computes the probability of spinal surgical recommendations for LSS, based on patient demographic factors, clinical symptom manifestations, and MRI findings. The hybrid model combines a random forest model trained from medical vignette data reviewed by surgeons, with an expert Bayesian network model built from peer-reviewed literature and the expert opinions of a multidisciplinary team in spinal surgery, rehabilitation medicine, interventional and diagnostic radiology. Sets of 400 and 100 medical vignettes reviewed by surgeons were used for training and testing. RESULTS: The model demonstrated high predictive accuracy, with a root mean square error (RMSE) between model predictions and ground truth of 0.0964, while the average RMSE between individual doctor's recommendations and ground truth was 0.1940. For dichotomous classification, the AUROC and Cohen's kappa were 0.9266 and 0.6298, while the corresponding average metrics based on individual doctor's recommendations were 0.8412 and 0.5659, respectively. CONCLUSIONS: Our results suggest that AI can be used to automate the evaluation of surgical candidacy for LSS with performance comparable to a multidisciplinary panel of physicians.
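Cohen's kappa, reported above for the dichotomous classification, corrects the observed agreement between two sets of labels for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements that standard formula; the example labels are invented and the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical surgery (1) / no surgery (0) calls for ten vignettes.
model = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
panel = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(model, panel), 2))  # prints 0.6
```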


Subject(s)
Lumbar Vertebrae , Spinal Stenosis , Adult , Artificial Intelligence , Bayes Theorem , Constriction, Pathologic , Humans , Lumbar Vertebrae/diagnostic imaging , Lumbar Vertebrae/surgery , Spinal Stenosis/diagnostic imaging , Spinal Stenosis/surgery
9.
BMC Bioinformatics ; 23(1): 82, 2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35236295

ABSTRACT

BACKGROUND/AIM: In higher eukaryotes, the three-dimensional (3D) organization of the genome is intimately related to numerous key biological functions including gene expression, DNA repair and DNA replication regulations. Alteration of 3D organization, in particular topologically associating domains (TADs), is detrimental to the organism and can give rise to a broad range of diseases such as cancers. METHODS: Here, we propose a versatile regression framework which not only identifies TADs in a fast and accurate manner, but also detects differential TAD borders across conditions, for which few methods exist, and predicts 3D genome reorganization after chromosomal rearrangement. Moreover, the framework is biologically meaningful, has an intuitive interpretation and is easy to visualize. RESULT AND CONCLUSION: The novel regression ranks among the top TAD callers. Moreover, it identifies new features of the genome, which we call TAD facilitators, that are enriched with specific transcription factors. It also unveils the importance of cell-type specific transcription factors in establishing novel TAD borders during neuronal differentiation. Lastly, it compares favorably with the state-of-the-art method for predicting the rearranged 3D genome.


Subject(s)
Genome , Transcription Factors , Chromatin , Transcription Factors/genetics
10.
PLoS Comput Biol ; 17(8): e1009308, 2021 08.
Article in English | MEDLINE | ID: mdl-34383754

ABSTRACT

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double-helix. Later on, other structures of DNA were discovered and shown to play important roles in the cell, in particular G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to map G4s in vitro based on a canonical sequence motif, G-richness and G-skewness or alternatively sequence features including k-mers, and more recently machine/deep learning. Recently, new sequencing techniques were developed to map G4s in vitro (G4-seq) and G4s in vivo (G4 ChIP-seq) at a resolution of a few hundred bases. Here, we propose a novel convolutional neural network (DeepG4) to map cell-type specific active G4 regions (i.e. regions within which G4s form both in vitro and in vivo). DeepG4 predicts active G4 regions in different cell types very accurately. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 region activity. We found that such motifs do not follow a very flexible sequence pattern of the kind current algorithms search for. Instead, active G4 regions are determined by numerous specific motifs. Moreover, among those motifs, we identified known transcription factors (TFs) which could play important roles in G4 activity by contributing either directly to G4 structures themselves or indirectly by participating in G4 formation in the vicinity. In addition, we used DeepG4 to predict active G4 regions in a large number of tissues and cancers, thereby providing a comprehensive resource for researchers. Availability: https://github.com/morphos30/DeepG4.
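For context, the "canonical sequence motif" approach that the abstract contrasts with DeepG4 is usually written as four runs of at least three guanines separated by loops of 1 to 7 bases. The sketch below scans one strand for that pattern with a regular expression. It illustrates the classical motif search only and is not part of DeepG4; the function name is hypothetical.

```python
import re

# Four G-runs (3 or more Gs) separated by three loops of 1-7 nucleotides.
G4_CANONICAL = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_canonical_g4(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_CANONICAL.finditer(seq)]


print(find_canonical_g4("TTGGGAGGGTGGGAGGGTT"))
```

Only the strand passed in is scanned; to find G4 motifs on the opposite strand, the reverse complement would have to be scanned as well.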


Subject(s)
Deep Learning , G-Quadruplexes , Algorithms , Chromatin Immunoprecipitation , Genome , Humans , Neoplasms/genetics , Neoplasms/pathology , Neural Networks, Computer
11.
Nature ; 590(7847): 660-665, 2021 02.
Article in English | MEDLINE | ID: mdl-33597753

ABSTRACT

The repair of DNA double-strand breaks (DSBs) is essential for safeguarding genome integrity. When a DSB forms, the PI3K-related ATM kinase rapidly triggers the establishment of megabase-sized chromatin domains decorated with phosphorylated histone H2AX (γH2AX), which act as seeds for the formation of DNA-damage response foci [1]. It is unclear how these foci are rapidly assembled to establish a 'repair-prone' environment within the nucleus. Topologically associating domains are a key feature of 3D genome organization that compartmentalize transcription and replication, but little is known about their contribution to DNA repair processes [2,3]. Here we show that topologically associating domains are functional units of the DNA damage response, and are instrumental for the correct establishment of γH2AX-53BP1 chromatin domains in a manner that involves one-sided cohesin-mediated loop extrusion on both sides of the DSB. We propose a model in which H2AX-containing nucleosomes are rapidly phosphorylated as they actively pass by DSB-anchored cohesin. Our work highlights the importance of chromosome conformation in the maintenance of genome integrity and demonstrates the establishment of a chromatin modification by loop extrusion.


Subject(s)
DNA Breaks, Double-Stranded , DNA Repair , DNA/chemistry , DNA/metabolism , Nucleic Acid Conformation , Saccharomyces cerevisiae , Cell Cycle Proteins/metabolism , Cell Line , Chromosomal Proteins, Non-Histone/metabolism , DNA/genetics , Genome/genetics , Histones/metabolism , Humans , Nucleosomes/chemistry , Nucleosomes/genetics , Nucleosomes/metabolism , Phosphorylation , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/metabolism , Tumor Suppressor p53-Binding Protein 1/metabolism , Cohesins
12.
Bioinformatics ; 36(5): 1367-1373, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31605131

ABSTRACT

MOTIVATION: The three-dimensional (3D) genome is essential to numerous key processes such as the regulation of gene expression and the replication-timing program. In vertebrates, chromatin looping is often mediated by CTCF, and marked by CTCF motif pairs in convergent orientation. Comparative high-throughput sequencing technique (Hi-C) recently revealed that chromatin looping evolves across species. However, Hi-C experiments are complex and costly, which currently limits their use for evolutionary studies over a large number of species. RESULTS: Here, we propose a novel approach to study the 3D genome evolution in vertebrates using the genomic sequence only, i.e. without the need for Hi-C data. The approach is simple and relies on comparing the distances between convergent and divergent CTCF motifs by computing a ratio we named the 3D ratio or '3DR'. We show that 3DR is a powerful statistic to detect CTCF looping encoded in the human genome sequence, thus reflecting strong evolutionary constraints encoded in DNA and associated with the 3D genome. When comparing vertebrate genomes, our results reveal that 3DR, which underlies CTCF looping and topologically associating domain organization, evolves over time and suggest that ancestral character reconstruction can be used to infer 3DR in ancestral genomes. AVAILABILITY AND IMPLEMENTATION: The R code is available at https://github.com/morphos30/PhyloCTCFLooping. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
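To make the idea of comparing distances between convergent and divergent motif pairs concrete, the sketch below classifies consecutive motifs by strand and compares median distances. The abstract does not give the exact 3DR formula, so `illustrative_ratio` is an assumption made purely for illustration (median divergent distance over median convergent distance) and should not be read as the published definition. The published R code is at the repository cited above.

```python
from statistics import median
from typing import Sequence

Motif = tuple[int, str]  # (genomic position, strand "+" or "-")


def pair_distances(motifs: Sequence[Motif]) -> dict[str, list[int]]:
    """Group distances between consecutive motifs by orientation class."""
    ordered = sorted(motifs)
    distances: dict[str, list[int]] = {"convergent": [], "divergent": [], "tandem": []}
    for (pos_a, strand_a), (pos_b, strand_b) in zip(ordered, ordered[1:]):
        if strand_a == "+" and strand_b == "-":
            kind = "convergent"  # motifs point toward each other
        elif strand_a == "-" and strand_b == "+":
            kind = "divergent"  # motifs point away from each other
        else:
            kind = "tandem"  # same orientation
        distances[kind].append(pos_b - pos_a)
    return distances


def illustrative_ratio(motifs: Sequence[Motif]) -> float:
    """Hypothetical statistic: median divergent / median convergent distance."""
    d = pair_distances(motifs)
    if not d["convergent"] or not d["divergent"]:
        raise ValueError("need at least one convergent and one divergent pair")
    return median(d["divergent"]) / median(d["convergent"])


# Hypothetical motif positions on one chromosome.
toy = [(100, "+"), (300, "-"), (1000, "+"), (1200, "-")]
print(pair_distances(toy))
```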


Subject(s)
Chromatin , Genomics , Animals , Evolution, Molecular , Genome, Human , High-Throughput Nucleotide Sequencing , Humans
13.
Semin Cell Dev Biol ; 90: 128-137, 2019 06.
Article in English | MEDLINE | ID: mdl-30030142

ABSTRACT

In higher eukaryotes, the three-dimensional (3D) organization of the genome is intimately related to numerous key biological functions including gene expression, DNA repair and DNA replication regulations. Alteration of this 3D organization is detrimental to the organism and can give rise to a broad range of diseases such as cancers. Here, we review recent advances in the field. We first describe how the genome is packed in 3D to form chromosome territories, compartments and domains. We also give an overview of the recent techniques that allow the genome to be mapped in 3D up to kilobase resolution. We then discuss potential mechanisms by which genome misfolding can affect proper gene expression by distal enhancers, and how the 3D genome influences the formation of genomic rearrangements.


Subject(s)
Chromosomes/genetics , Disease/genetics , Neoplasms/genetics , Chromosomes/chemistry , Humans
14.
Genome Biol ; 19(1): 34, 2018 03 15.
Article in English | MEDLINE | ID: mdl-29544533

ABSTRACT

Double-strand breaks (DSBs) result from the attack of both DNA strands by multiple sources, including radiation and chemicals. DSBs can cause the abnormal chromosomal rearrangements associated with cancer. Recent techniques allow the genome-wide mapping of DSBs at high resolution, enabling the comprehensive study of their origins. However, these techniques are costly and challenging. Hence, we devise a computational approach to predict DSBs using the epigenomic and chromatin context, for which public data are readily available from the ENCODE project. We achieve excellent prediction accuracy at high resolution. We identify chromatin accessibility, activity, and long-range contacts as the best predictors.


Subject(s)
DNA Breaks, Double-Stranded , DNA/chemistry , Epigenesis, Genetic , Cell Line , Chromatin/metabolism , Histone Code , Humans , Nucleotide Motifs
15.
Nucleic Acids Res ; 46(5): e27, 2018 03 16.
Article in English | MEDLINE | ID: mdl-29272504

ABSTRACT

The three-dimensional (3D) organization of the genome is intimately related to numerous key biological functions including gene expression and DNA replication regulations. The mechanisms by which molecular drivers functionally organize the 3D genome, such as topologically associating domains (TADs), remain to be explored. Current approaches consist of assessing the enrichments or influences of proteins at TAD borders. Here, we propose a TAD-free model to directly estimate the blocking effects of architectural proteins, insulators and DNA motifs on long-range contacts, making the model intuitive and biologically meaningful. In addition, the model allows analyzing the whole Hi-C information content (2D information) instead of only focusing on TAD borders (1D information). The model outperforms multiple logistic regression at TAD borders in terms of parameter estimation accuracy and is validated by enhancer-blocking assays. In Drosophila, the results support the insulating role of simple sequence repeats and suggest that the blocking effects depend on the number of repeats. Motif analysis uncovered the roles of the transcription factors pannier and tramtrack in blocking long-range contacts. In humans, the results suggest that the blocking effects of the well-known architectural proteins CTCF, cohesin and ZNF143 depend on the distance between loci, where each protein may participate at different scales of the 3D chromatin organization.


Subject(s)
Genome/genetics , Insulator Elements/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Algorithms , Animals , Binding Sites/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Humans , Models, Genetic , Protein Binding
16.
PLoS Comput Biol ; 13(5): e1005538, 2017 05.
Article in English | MEDLINE | ID: mdl-28542178

ABSTRACT

Chromosomal organization in 3D plays a central role in regulating cell-type specific transcriptional and DNA replication timing programs. Yet it remains unclear to what extent the resulting long-range contacts depend on specific molecular drivers. Here we propose a model that comprehensively assesses the influence on contacts of DNA-binding proteins, cis-regulatory elements and DNA consensus motifs. Using real data, we validate a large number of predictions for long-range contacts involving known architectural proteins and DNA motifs. Our model outperforms existing approaches including enrichment test, random forests and correlation, and it uncovers numerous novel long-range contacts in Drosophila and human. The model uncovers the orientation-dependent specificity for long-range contacts between CTCF motifs in Drosophila, highlighting its conserved property in 3D organization of metazoan genomes. Our model further unravels long-range contacts depending on co-factors recruited to DNA indirectly, as illustrated by the influence of cohesin in stabilizing long-range contacts between CTCF sites. It also reveals asymmetric contacts such as enhancer-promoter contacts that highlight opposite influences of the transcription factors EBF1, EGR1 or MEF2C depending on RNA Polymerase II pausing.


Subject(s)
Chromatin/chemistry , Chromatin/metabolism , DNA/chemistry , DNA/metabolism , Animals , Binding Sites , Computational Biology , Drosophila , Humans , Reproducibility of Results
17.
PLoS Comput Biol ; 12(5): e1004908, 2016 05.
Article in English | MEDLINE | ID: mdl-27203237

ABSTRACT

Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1.
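The modelling idea above, multiple logistic regression with a statistical interaction between genomic features, can be sketched in plain Python. Everything here is illustrative: the two binary features, the bin counts and the simple gradient-ascent fit are assumptions for the example, not the authors' data or implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(groups, n_iter: int = 20000, lr: float = 1.0) -> list[float]:
    """Fit border ~ x1 + x2 + x1:x2 by gradient ascent on the log-likelihood.

    groups: iterable of (x1, x2, n_bins, n_borders), where x1 and x2 are 0/1
    indicators for two hypothetical protein features in a genomic bin.
    Returns [intercept, beta_x1, beta_x2, beta_interaction].
    """
    groups = list(groups)
    total = sum(n for _, _, n, _ in groups)
    beta = [0.0, 0.0, 0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0, 0.0, 0.0, 0.0]
        for x1, x2, n, k in groups:
            row = (1.0, x1, x2, x1 * x2)  # last column is the interaction term
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            residual = k - n * p  # observed minus expected borders
            for j, x in enumerate(row):
                grad[j] += residual * x
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


# Hypothetical counts: feature 2 matters only when feature 1 is also present.
GROUPS = [(0, 0, 100, 10), (1, 0, 100, 30), (0, 1, 100, 10), (1, 1, 100, 70)]
intercept, b1, b2, b12 = fit_logistic(GROUPS)
print([round(v, 2) for v in (intercept, b1, b2, b12)])
```

A positive coefficient marks a feature that favors domain borders and a negative one a feature that counteracts them. In this toy data the main effect of the second feature is close to zero while its interaction with the first is clearly positive, which is the kind of pattern an enrichment test on single features would miss.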


Subject(s)
Chromatin/chemistry , Chromatin/genetics , Models, Genetic , Animals , CCCTC-Binding Factor , Computational Biology , Computer Simulation , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Genome, Human , Genome, Insect , High-Throughput Nucleotide Sequencing , Humans , Imaging, Three-Dimensional , Logistic Models , Polymorphism, Single Nucleotide , Repressor Proteins/genetics , Sequence Analysis, DNA
18.
AIDS ; 29(15): 1917-25, 2015 Sep 24.
Article in English | MEDLINE | ID: mdl-26355570

ABSTRACT

OBJECTIVE: Antiretroviral-naive HIV-positive individuals contribute to the transmission of drug-resistant viruses, compromising first-line therapy. Using phylogenetic inference, we quantified the proportion of transmitted drug-resistance originating from a treatment-naive source. METHODS: Using a novel phylotype-based approach, 24 550 HIV-1 subtype B partial pol gene sequences from the UK HIV Drug Resistance database were analysed. Ongoing transmission of drug resistance amongst HIV-positive individuals was identified as phylotypes of at least three sequences with at least one shared drug resistance mutation, a maximum intra-clade genetic distance of 4.0% and a basal branch support at least 90%. The time of persistence of the transmission chains was estimated using a fast least-squares molecular clock inference approach. RESULTS: Around 70% of transmitted drug-resistance had a treatment-naive source. The most commonly transmitted mutations were L90M in the protease gene and K103N, T215D and T215S in reverse transcriptase. Reversion to wild type occurred at a low frequency and drug-independent reservoirs of resistance have persisted for up to 13 years. CONCLUSION: These results illustrate the impact of viral fitness on the establishment of resistance reservoirs and support the notion that earlier diagnoses and treatment of HIV infections are warranted for counteracting the spread of antiretroviral resistance. Phylotype-based phylogenetic inference is an attractive approach for the routine surveillance of transmitted drug resistance in HIV as well as in other pathogens for which genotypic resistance data are available.
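The four criteria used above to flag a phylotype as ongoing transmission of drug resistance translate directly into a filter. The sketch below encodes them as stated in the abstract; the `Clade` record and its field names are hypothetical and assume that the clade statistics have already been computed from a phylogeny.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clade:
    n_sequences: int
    shared_resistance_mutations: frozenset  # e.g. frozenset({"K103N"})
    max_genetic_distance: float  # fraction, 0.04 means 4.0%
    basal_branch_support: float  # fraction, 0.90 means 90%


def is_transmission_phylotype(clade: Clade) -> bool:
    """Apply the thresholds listed in the abstract's METHODS."""
    return (
        clade.n_sequences >= 3
        and len(clade.shared_resistance_mutations) >= 1
        and clade.max_genetic_distance <= 0.04
        and clade.basal_branch_support >= 0.90
    )


# Hypothetical clade: five sequences sharing K103N, tight and well supported.
example = Clade(5, frozenset({"K103N"}), 0.031, 0.97)
print(is_transmission_phylotype(example))  # prints True
```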


Subject(s)
Disease Transmission, Infectious , Drug Resistance, Viral , HIV Infections/transmission , HIV Infections/virology , HIV-1/drug effects , Cohort Studies , Computational Biology , Female , Genotype , HIV Infections/epidemiology , HIV-1/classification , HIV-1/genetics , Humans , Male , Molecular Epidemiology , Phylogeny , Sequence Analysis, DNA , United Kingdom/epidemiology , pol Gene Products, Human Immunodeficiency Virus
19.
Genome Biol ; 16: 182, 2015 Aug 29.
Article in English | MEDLINE | ID: mdl-26319942

ABSTRACT

Chromosome folding can reinforce the demarcation between euchromatin and heterochromatin. Two new studies show how epigenetic data, including DNA methylation, can accurately predict chromosome folding in three dimensions. Such computational approaches reinforce the idea of a linkage between epigenetically marked chromatin domains and their segregation into distinct compartments at the megabase scale or topological domains at a higher resolution. Please see related articles: http://dx.doi.org/10.1186/s13059-015-0741-y and http://dx.doi.org/10.1186/s13059-015-0740-z.


Subject(s)
Chromatin/chemistry , Chromatin/metabolism , DNA Methylation , Epigenesis, Genetic , Histone Code , Humans , Male
20.
Nat Commun ; 6: 5965, 2015 Jan 16.
Article in English | MEDLINE | ID: mdl-25591454

ABSTRACT

Common variants at many loci have been robustly associated with asthma but explain little of the overall genetic risk. Here we investigate the role of rare (<1%) and low-frequency (1-5%) variants using the Illumina HumanExome BeadChip array in 4,794 asthma cases, 4,707 non-asthmatic controls and 590 case-parent trios representing European Americans, African Americans/African Caribbeans and Latinos. Our study reveals one low-frequency missense mutation in the GRASP gene that is associated with asthma in the Latino sample (P=4.31 × 10⁻⁶; OR=1.25; MAF=1.21%) and two genes harbouring functional variants that are associated with asthma in a gene-based analysis: GSDMB at the 17q12-21 asthma locus in the Latino and combined samples (P=7.81 × 10⁻⁸ and 4.09 × 10⁻⁸, respectively) and MTHFR in the African ancestry sample (P=1.72 × 10⁻⁶). Our results suggest that associations with rare and low-frequency variants are ethnic specific and not likely to explain a significant proportion of the 'missing heritability' of asthma.
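The frequency classes used above (rare below 1%, low-frequency from 1% to 5%) can be written as a small helper. The treatment of the exact 5% boundary and the "common" label for anything above it are assumptions of this sketch, since the abstract does not spell them out.

```python
def frequency_class(maf: float) -> str:
    """Classify a variant by minor allele frequency (given as a fraction)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:  # assumption: the 5% boundary counts as low-frequency
        return "low-frequency"
    return "common"


# The GRASP missense variant is reported with MAF = 1.21%.
print(frequency_class(0.0121))  # prints low-frequency
```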


Subject(s)
Asthma/genetics , Genome-Wide Association Study/methods , Carrier Proteins/genetics , Female , Genetic Predisposition to Disease/genetics , Humans , Linkage Disequilibrium/genetics , Male , Membrane Proteins/genetics , Neoplasm Proteins/genetics