Results 1 - 10 of 10
1.
Nat Genet ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977852

ABSTRACT

Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.

2.
Nature ; 626(8000): 799-807, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38326615

ABSTRACT

Linking variants from genome-wide association studies (GWAS) to underlying mechanisms of disease remains a challenge [1-3]. For some diseases, a successful strategy has been to look for cases in which multiple GWAS loci contain genes that act in the same biological pathway [1-6]. However, our knowledge of which genes act in which pathways is incomplete, particularly for cell-type-specific pathways or understudied genes. Here we introduce a method to connect GWAS variants to functions. This method links variants to genes using epigenomics data, links genes to pathways de novo using Perturb-seq and integrates these data to identify convergence of GWAS loci onto pathways. We apply this approach to study the role of endothelial cells in genetic risk for coronary artery disease (CAD), and discover 43 CAD GWAS signals that converge on the cerebral cavernous malformation (CCM) signalling pathway. Two regulators of this pathway, CCM2 and TLNRD1, are each linked to a CAD risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. These results suggest a model whereby CAD risk is driven in part by the convergence of causal genes onto a particular transcriptional pathway in endothelial cells. They highlight shared genes between common and rare vascular diseases (CAD and CCM), and identify TLNRD1 as a new, previously uncharacterized member of the CCM signalling pathway. This approach will be widely useful for linking variants to functions for other common polygenic diseases.


Subject(s)
Coronary Artery Disease , Endothelial Cells , Genome-Wide Association Study , Hemangioma, Cavernous, Central Nervous System , Humans , Coronary Artery Disease/genetics , Coronary Artery Disease/pathology , Endothelial Cells/metabolism , Endothelial Cells/pathology , Genetic Predisposition to Disease/genetics , Hemangioma, Cavernous, Central Nervous System/genetics , Hemangioma, Cavernous, Central Nervous System/pathology , Polymorphism, Single Nucleotide , Epigenomics , Signal Transduction/genetics , Multifactorial Inheritance
3.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-37292653

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

4.
Genetics ; 225(3), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37724741

ABSTRACT

The discrete-time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here, we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.


Subject(s)
Biological Specimen Banks , Genetics, Population , Gene Frequency , Genetic Drift , Probability , Models, Genetic , Selection, Genetic
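
The first observation in the abstract above, that DTWF transition probabilities are approximately sparse, can be checked numerically on a small case. The following is a minimal sketch and not the authors' algorithm: it builds one row of the transition matrix by brute force, and the function names, the haploid parameterization with selection coefficient `s` and one-way mutation rate `u`, and the `1e-12` tolerance are all illustrative assumptions.

```python
import math


def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def dtwf_row(n, i, s=0.0, u=0.0):
    """Transition probabilities out of allele count i in a population of size n.

    Assumed toy parameterization (hypothetical, for illustration only):
    selection with coefficient s < 1 against the allele, then one-way
    mutation at rate u toward the allele, then binomial resampling.
    """
    x = i / n
    x_sel = x * (1.0 - s) / (1.0 - s * x)  # frequency after selection
    p = x_sel + (1.0 - x_sel) * u          # frequency after mutation
    return [binom_pmf(n, j, p) for j in range(n + 1)]


def count_above(row, tol):
    """Number of entries in the row whose probability exceeds tol."""
    return sum(1 for q in row if q > tol)


N = 2000
row = dtwf_row(N, 200)          # start at allele frequency 0.1, neutral
total_mass = sum(row)
n_nonnegligible = count_above(row, 1e-12)
print(f"entries above 1e-12: {n_nonnegligible} of {N + 1}")
```

Only a small fraction of the entries in the row carry non-negligible probability, which is the kind of structure a sparse approximation can exploit. The brute-force row construction here costs time linear in the population size per row, so it is suitable only for small examples.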
5.
Res Sq ; 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37398424

ABSTRACT

Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.

6.
bioRxiv ; 2023 May 22.
Article in English | MEDLINE | ID: mdl-37293115

ABSTRACT

The Discrete-Time Wright-Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
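
The second observation in the abstract above, that Binomial distributions with similar success probabilities are close as distributions, can be illustrated with a brute-force total variation distance. This is a minimal sketch for intuition only; the helper names, the population size, and the chosen probabilities are assumptions, and it does not reproduce the low-rank construction described in the abstract.

```python
import math


def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def total_variation(n, p1, p2):
    """Total variation distance between Binomial(n, p1) and Binomial(n, p2)."""
    return 0.5 * sum(
        abs(binom_pmf(n, k, p1) - binom_pmf(n, k, p2)) for k in range(n + 1)
    )


N = 2000
tv_close = total_variation(N, 0.1, 0.1001)  # nearly identical success probabilities
tv_far = total_variation(N, 0.1, 0.2)       # clearly different success probabilities
print(f"TV(0.1, 0.1001) = {tv_close:.4f}")
print(f"TV(0.1, 0.2)    = {tv_far:.4f}")
```

When two rows of a transition matrix are this close, one row can stand in for the other with a small, quantifiable error, which is the intuition behind replacing many similar rows with a few representatives.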

7.
ACS Chem Biol ; 17(6): 1334-1342, 2022 06 17.
Article in English | MEDLINE | ID: mdl-35593877

ABSTRACT

The conversion of N1-methyladenosine (m1A) to N6-methyladenosine (m6A) on RNA is an important step for both allowing efficient reverse transcription read-through for sequencing analysis and mapping modifications in the transcriptome. Enzymatic transformation is often used, but the efficiency of the removal can depend on local sequence context. Chemical conversion through the application of the Dimroth rearrangement, in which m1A rearranges into m6A under heat and alkaline conditions, is an alternative, but the required alkaline conditions result in significant RNA degradation by hydrolysis of the phosphodiester backbone. Here, we report novel, mild pH conditions that catalyze m1A-to-m6A rearrangement using 4-nitrothiophenol as a catalyst. We demonstrate the efficient rearrangement in mononucleosides, synthetic RNA oligonucleotides, and RNAs isolated from human cell lines, thereby validating a new approach for converting m1A to m6A in RNA samples for sequencing analyses.


Subject(s)
Oligonucleotides , RNA , Catalysis , Humans , RNA/metabolism , Transcriptome
8.
Genome Biol ; 23(1): 103, 2022 04 21.
Article in English | MEDLINE | ID: mdl-35449021

ABSTRACT

Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.


Subject(s)
Pangolins , RNA Splicing , Animals , Base Sequence , Mutation , RNA Splice Sites
9.
Elife ; 9, 2020 06 25.
Article in English | MEDLINE | ID: mdl-32584258

ABSTRACT

Little is known about co-transcriptional or post-transcriptional regulatory mechanisms linking noncoding variation to variation in organismal traits. To begin addressing this gap, we used 3' Seq to study the impact of genetic variation on alternative polyadenylation (APA) in the nuclear and total mRNA fractions of 52 HapMap Yoruba human lymphoblastoid cell lines. We mapped 602 APA quantitative trait loci (apaQTLs) at 10% FDR, of which 152 were nuclear specific. Effect sizes at intronic apaQTLs are negatively correlated with eQTL effect sizes. These observations suggest genetic variants can decrease mRNA expression levels by increasing usage of intronic PAS. We also identified 24 apaQTLs associated with protein levels, but not mRNA expression. Finally, we found that 19% of apaQTLs can be associated with disease. Thus, our work demonstrates that APA links genetic variation to variation in gene expression, protein expression, and disease risk, and reveals uncharted modes of genetic regulation.


Subject(s)
Gene Expression Regulation , Polyadenylation/genetics , Cell Line , Humans
10.
Neuroimage ; 168: 141-151, 2018 03.
Article in English | MEDLINE | ID: mdl-28069539

ABSTRACT

Ultra-high field magnetic resonance imaging (MRI) provides superior visualization of brain structures compared to lower fields, but images may be prone to severe geometric inhomogeneity. We propose to quantify local geometric distortion at ultra-high fields in in vivo datasets of human subjects scanned at both ultra-high field and lower fields. By using the displacement field derived from nonlinear image registration between images of the same subject, focal areas of spatial uncertainty are quantified. Through group and subject-specific analysis, we were able to identify regions systematically affected by geometric distortion at air-tissue interfaces prone to magnetic susceptibility, where the gradient coil non-linearity occurs in the occipital and suboccipital regions, as well as with distance from image isocenter. The derived displacement maps, quantified in millimeters, can be used to prospectively evaluate subject-specific local spatial uncertainty that should be taken into account in neuroimaging studies, and also for clinical applications like stereotactic neurosurgery where accuracy is critical. Validation with manual fiducial displacement demonstrated excellent correlation and agreement. Our results point to the need for site-specific calibration of geometric inhomogeneity. Our methodology provides a framework to permit prospective evaluation of the effect of MRI sequences, distortion correction techniques, and scanner hardware/software upgrades on geometric distortion.


Subject(s)
Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Adult , Brain/anatomy & histology , Female , Humans , Image Processing, Computer-Assisted/standards , Magnetic Fields , Magnetic Resonance Imaging/standards , Male , Young Adult