Results 1 - 18 of 18
1.
PLoS One ; 17(11): e0277680, 2022.
Article in English | MEDLINE | ID: mdl-36395175

ABSTRACT

The UK Biobank genotyped about 500k participants using Applied Biosystems Axiom microarrays. Participants were subsequently sequenced by the UK Biobank Exome Sequencing Consortium. Axiom genotyping was highly accurate in comparison to sequencing results, for almost 100,000 variants both directly genotyped on the UK Biobank Axiom array and via whole exome sequencing. However, in a study using the exome sequencing results of the first 50k individuals as reference (truth), it was observed that the positive predictive value (PPV) decreased along with the number of heterozygous array calls per variant. We developed a novel addition to the genotyping algorithm, Rare Heterozygous Adjusted (RHA), to significantly improve PPV in variants with minor allele frequency below 0.01%. The improvement in PPV was roughly equal when comparing to the exome sequencing of 50k individuals, or to the more recent ~200k individuals. Sensitivity was higher in the 200k data. The improved calling algorithm, along with enhanced quality control of array probesets, significantly improved the positive predictive value and the sensitivity of array data, making it suitable for the detection of ultra-rare variants.


Subject(s)
Exome , High-Throughput Nucleotide Sequencing , Humans , Genotype , High-Throughput Nucleotide Sequencing/methods , Retrospective Studies , Biological Specimen Banks , Polymorphism, Single Nucleotide , Algorithms , United Kingdom
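
The positive predictive value and sensitivity discussed in this abstract can be illustrated with a minimal Python sketch. It assumes heterozygous calls are represented as sets of (sample, variant) pairs and that the exome calls are treated as truth; the function name and the toy data are hypothetical and are not taken from the paper.

```python
def ppv_and_sensitivity(array_het, exome_het):
    """Compare heterozygous array calls against exome calls used as truth.

    Both arguments are sets of (sample_id, variant_id) pairs (an assumed
    representation chosen for this illustration).
    """
    true_pos = len(array_het & exome_het)
    # PPV: fraction of array calls confirmed by sequencing.
    ppv = true_pos / len(array_het) if array_het else float("nan")
    # Sensitivity: fraction of sequencing calls recovered by the array.
    sensitivity = true_pos / len(exome_het) if exome_het else float("nan")
    return ppv, sensitivity


# Hypothetical toy data: four array calls, five exome calls, three shared.
array_calls = {("s1", "v1"), ("s2", "v1"), ("s3", "v2"), ("s4", "v3")}
exome_calls = {("s1", "v1"), ("s2", "v1"), ("s5", "v2"), ("s4", "v3"), ("s6", "v3")}
ppv, sens = ppv_and_sensitivity(array_calls, exome_calls)
print(ppv, sens)
```

For ultra-rare variants the number of array calls per variant is small, so a single false heterozygous call shifts the PPV substantially, which is consistent with the trend the abstract reports.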
2.
F1000Res ; 4: 121, 2015.
Article in English | MEDLINE | ID: mdl-26236466

ABSTRACT

Recently, the Mouse ENCODE Consortium reported that comparative gene expression data from human and mouse tend to cluster more by species rather than by tissue. This observation was surprising, as it contradicted much of the comparative gene regulatory data collected previously, as well as the common notion that major developmental pathways are highly conserved across a wide range of species, in particular across mammals. Here we show that the Mouse ENCODE gene expression data were collected using a flawed study design, which confounded sequencing batch (namely, the assignment of samples to sequencing flowcells and lanes) with species. When we account for the batch effect, the corrected comparative gene expression data from human and mouse tend to cluster by tissue, not by species.

3.
PLoS One ; 9(3): e90731, 2014.
Article in English | MEDLINE | ID: mdl-24618913

ABSTRACT

The composition of the human gut microbiome is influenced by many environmental factors. Diet is thought to be one of the most important determinants, though we have limited understanding of the extent to which dietary fluctuations alter variation in the gut microbiome between individuals. In this study, we examined variation in gut microbiome composition between winter and summer over the course of one year in 60 members of a founder population, the Hutterites. Because of their communal lifestyle, Hutterite diets are similar across individuals and remarkably stable throughout the year, with the exception that fresh produce is primarily served during the summer and autumn months. Our data indicate that despite overall gut microbiome stability within individuals over time, there are consistent and significant population-wide shifts in microbiome composition across seasons. We found seasonal differences in both (i) the abundance of particular taxa (false discovery rate <0.05), including highly abundant phyla Bacteroidetes and Firmicutes, and (ii) overall gut microbiome diversity (by Shannon diversity; P = 0.001). It is likely that the dietary fluctuations between seasons with respect to produce availability explain, at least in part, these differences in microbiome composition. For example, high levels of produce containing complex carbohydrates consumed during the summer months might explain increased abundance of Bacteroidetes, which contain complex carbohydrate digesters, and decreased levels of Actinobacteria, which have been negatively correlated to fiber content in food questionnaires. Our observations demonstrate the plastic nature of the human gut microbiome in response to variation in diet.


Subject(s)
Gastrointestinal Tract/microbiology , Metagenome , Microbiota , Seasons , Age Factors , Biodiversity , Feces/microbiology , Female , Humans , Male
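
The Shannon diversity index mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes taxon abundances are given as raw read counts and uses the natural logarithm; the function name and example counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical samples: an even community and one dominated by a single taxon.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
print(shannon_diversity(even_sample), shannon_diversity(skewed_sample))
```

The even sample reaches the maximum for four taxa, ln(4), while the skewed sample scores much lower, which is the sense in which the index summarizes both richness and evenness.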
4.
PLoS One ; 8(1): e53608, 2013.
Article in English | MEDLINE | ID: mdl-23308262

ABSTRACT

Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ∼8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.


Subject(s)
DNA Barcoding, Taxonomic/methods , Genes, Bacterial , Genes, rRNA , Metagenome , Microbial Consortia/genetics , RNA, Ribosomal, 16S/classification , Sequence Analysis, DNA/methods , Bayes Theorem , DNA Barcoding, Taxonomic/statistics & numerical data , High-Throughput Nucleotide Sequencing , Phylogeny , RNA, Ribosomal, 16S/genetics , Research Design , Sequence Analysis, DNA/statistics & numerical data
5.
PLoS Genet ; 8(10): e1003000, 2012.
Article in English | MEDLINE | ID: mdl-23071454

ABSTRACT

Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci ("rdQTLs"). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.


Subject(s)
Gene Expression , Genetic Variation , Quantitative Trait Loci , RNA Stability , Cell Line , Chromosome Mapping , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA Interference
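
One simple way to estimate a decay rate from time-course data of the kind described in this abstract is a least-squares fit on log-transformed expression. The sketch below assumes first-order (exponential) decay, which the abstract does not state explicitly; the function name and the synthetic measurements are hypothetical.

```python
import math


def estimate_decay_rate(times, levels):
    """Fit ln(level) = ln(level_0) - k * t by ordinary least squares; return k.

    Assumes first-order decay and strictly positive expression levels.
    """
    logs = [math.log(x) for x in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx  # decay rate is the negative slope


# Synthetic, noise-free example: true decay rate 0.5 per hour.
times = [0.0, 1.0, 2.0, 4.0]
levels = [100.0 * math.exp(-0.5 * t) for t in times]
k = estimate_decay_rate(times, levels)
half_life = math.log(2) / k
print(k, half_life)
```

Under this model the half-life follows directly as ln(2)/k, so faster decay corresponds to a shorter half-life.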
6.
Curr Biol ; 18(12): 883-9, 2008 Jun 24.
Article in English | MEDLINE | ID: mdl-18571414

ABSTRACT

What evolutionary forces shape genes that contribute to the risk of human disease? Do similar selective pressures act on alleles that underlie simple versus complex disorders [1-3]? Answers to these questions will shed light onto the origin of human disorders (e.g., [4]) and help to predict the population frequencies of alleles that contribute to disease risk, with important implications for the efficient design of mapping studies [5-7]. As a first step toward addressing these questions, we created a hand-curated version of the Mendelian Inheritance in Man database (OMIM). We then examined selective pressures on Mendelian-disease genes, genes that contribute to complex-disease risk, and genes known to be essential in mouse by analyzing patterns of human polymorphism and of divergence between human and rhesus macaque. We found that Mendelian-disease genes appear to be under widespread purifying selection, especially when the disease mutations are dominant (rather than recessive). In contrast, the class of genes that influence complex-disease risk shows little sign of evolutionary conservation, possibly because this category includes targets of both purifying and positive selection.


Subject(s)
Databases, Factual , Genes/genetics , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease/genetics , Genome, Human , Selection, Genetic , Animals , Computational Biology , Humans , Mice , Online Systems , Polymorphism, Genetic
7.
PLoS Genet ; 4(3): e1000018, 2008 Mar 07.
Article in English | MEDLINE | ID: mdl-18369443

ABSTRACT

Transcription factors (TFs) regulate gene expression through specific interactions with short promoter elements. The same regulatory protein may recognize a variety of related sequences. Moreover, once they are detected it is hard to predict whether highly similar sequence motifs will be recognized by the same TF and regulate similar gene expression patterns, or serve as binding sites for distinct regulatory factors. We developed computational measures to assess the functional implications of variations on regulatory motifs and to compare the functions of related sites. We have developed computational means for estimating the functional outcome of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. We predict the effects of nucleotide variations within motifs on gene expression patterns. In cases where such predictions could be compared to suitable published experimental evidence, we found very good agreement. We further accumulated statistics from multiple substitutions across various binding sites in an attempt to deduce general properties that characterize nucleotide substitutions that are more likely to alter expression. We found that substitutions involving Adenine are more likely to retain the expression pattern and that substitutions involving Guanine are more likely to alter expression compared to the rest of the substitutions. Our results should facilitate the prediction of the expression outcomes of binding site variations. One typical important implication is expected to be the ability to predict the phenotypic effect of variation in regulatory motifs in promoters.


Subject(s)
Gene Expression Regulation, Fungal , Genetic Variation , Base Sequence , Binding Sites/genetics , Conserved Sequence , DNA, Fungal/genetics , DNA, Fungal/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Databases, Nucleic Acid , Evolution, Molecular , Genome, Fungal , Phenotype , Point Mutation , Promoter Regions, Genetic , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
8.
PLoS One ; 2(8): e682, 2007 Aug 01.
Article in English | MEDLINE | ID: mdl-17668060

ABSTRACT

BACKGROUND: Olfactory receptors (ORs) are the largest gene family in mammalian genomes. Since nearly all OR genes are orphan receptors, inference of functional similarity or differences between odorant receptors typically relies on sequence comparisons. Based on the alignment of entire coding region sequence, OR genes are classified into families and subfamilies, a classification that is believed to be a proxy for OR gene functional variability. However, the assumption that overall protein sequence diversity is a good proxy for functional properties is untested. METHODOLOGY: Here, we propose an alternative sequence-based approach to infer the similarities and differences in OR binding capacity. Our approach is based on similarities and differences in the predicted binding pockets of OR genes, rather than on the entire OR coding region. CONCLUSIONS: Interestingly, our approach yields markedly different results compared to the analysis based on the entire OR coding-regions. While neither approach can be tested at this time, the discrepancy between the two calls into question the assumption that the current classification reliably reflects OR gene functional variability.


Subject(s)
Genetic Variation , Olfactory Receptor Neurons/physiology , Receptors, Odorant/genetics , Amino Acid Sequence , Animals , Cluster Analysis , Hemiterpenes , Humans , Mice , Molecular Sequence Data , Odorants , Pentanoic Acids/chemistry , Phylogeny , Receptors, Odorant/classification , Sequence Alignment
9.
Nat Genet ; 39(3): 415-21, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17277776

ABSTRACT

A major challenge in comparative genomics is to understand how phenotypic differences between species are encoded in their genomes. Phenotypic divergence may result from differential transcription of orthologous genes, yet less is known about the involvement of differential translation regulation in species phenotypic divergence. In order to assess translation effects on divergence, we analyzed approximately 2,800 orthologous genes in nine yeast genomes. For each gene in each species, we predicted translation efficiency, using a measure of the adaptation of its codons to the organism's tRNA pool. Mining this data set, we found hundreds of genes and gene modules with correlated patterns of translational efficiency across the species. One signal encompassed entire modules that are either needed for oxidative respiration or fermentation and are efficiently translated in aerobic or anaerobic species, respectively. In addition, the efficiency of translation of the mRNA splicing machinery strongly correlates with the number of introns in the various genomes. Altogether, we found extensive selection on synonymous codon usage that modulates translation according to gene function and organism phenotype. We conclude that, like factors such as transcription regulation, translation efficiency affects and is affected by the process of species divergence.


Subject(s)
Genes, Fungal , Phenotype , Protein Biosynthesis/genetics , Yeasts/classification , Yeasts/genetics , Biological Evolution , Codon/metabolism , RNA, Transfer/metabolism , Species Specificity
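
A codon-adaptation measure of the kind this abstract describes is often computed as the geometric mean of per-codon weights that reflect tRNA availability. The sketch below illustrates that general form only; the weights shown are invented placeholders, the function name is hypothetical, and nothing here should be read as the exact measure used in the paper.

```python
import math


def translation_efficiency(coding_seq, codon_weights):
    """Geometric mean of per-codon weights over a coding sequence.

    codon_weights maps codons to values in (0, 1]; codons missing from the
    table (for example stop codons) are skipped. The table is an assumption
    supplied by the caller, not derived here.
    """
    usable = len(coding_seq) - len(coding_seq) % 3
    codons = [coding_seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(codon_weights[c]) for c in codons if c in codon_weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


# Placeholder weights for two alanine codons (illustrative values only).
weights = {"GCT": 1.0, "GCC": 0.25}
score = translation_efficiency("GCTGCC", weights)
print(score)
```

Using a geometric rather than arithmetic mean makes the score sensitive to a few poorly adapted codons, which fits the idea that rare codons can limit translation.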
10.
Bioinformatics ; 21(16): 3435-8, 2005 Aug 15.
Article in English | MEDLINE | ID: mdl-15955783

ABSTRACT

An easy-to-use, versatile and freely available graphic web server, FoldIndex is described: it predicts if a given protein sequence is intrinsically unfolded implementing the algorithm of Uversky and co-workers, which is based on the average residue hydrophobicity and net charge of the sequence. FoldIndex has an error rate comparable to that of more sophisticated fold prediction methods. Sliding windows permit identification of large regions within a protein that possess folding propensities different from those of the whole protein.


Subject(s)
Algorithms , Models, Chemical , Models, Molecular , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , User-Computer Interface , Computer Graphics , Computer Simulation , Energy Transfer , Internet , Protein Conformation , Protein Folding , Proteins/analysis , Structure-Activity Relationship
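
The charge-hydrophobicity calculation this abstract refers to can be sketched as below. The coefficients (2.785 and 1.151) and the use of the Kyte-Doolittle scale rescaled to the range 0 to 1 are recalled from the published description of the method and should be checked against the original paper before use; the function names are hypothetical, and this is not the web server's own code.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fold_index(seq):
    """Score = 2.785 * <H> - |<R>| - 1.151 (coefficients are an assumption).

    <H> is the mean hydropathy rescaled to [0, 1]; <R> is the mean net charge
    per residue. Positive scores suggest a folded segment, negative scores an
    unfolded one. Non-standard residues raise KeyError.
    """
    seq = seq.upper()
    n = len(seq)
    mean_h = sum((KYTE_DOOLITTLE[a] + 4.5) / 9.0 for a in seq) / n
    net_charge = (seq.count("K") + seq.count("R")
                  - seq.count("D") - seq.count("E")) / n
    return 2.785 * mean_h - abs(net_charge) - 1.151


def windowed_fold_index(seq, window):
    """Apply fold_index to every window of the given length along seq."""
    return [fold_index(seq[i:i + window]) for i in range(len(seq) - window + 1)]


print(fold_index("IIII"), fold_index("KKKK"))
```

The sliding-window variant corresponds to the abstract's point that regions of a protein can have folding propensities that differ from the whole-sequence score.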
11.
Genome Res ; 15(2): 224-30, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15687286

ABSTRACT

Olfactory receptor (OR) genes constitute the basis of the sense of smell and are encoded by the largest mammalian gene superfamily, with >1000 members. In humans, but not in mice or dogs, the majority of OR genes have become pseudogenes, suggesting that OR genes in humans evolve under different selection pressures than in other mammals. To explore this further, we compare the OR gene repertoire of human with its closest living evolutionary relative, by taking advantage of the recently sequenced genome of the chimpanzee. In agreement with previous reports based on a small number of ORs, we find that humans have a significantly higher proportion of OR pseudogenes than chimpanzees. Moreover, we can reject the possibility that humans have been accumulating OR pseudogenes at a constant neutral rate since the divergence of human and chimpanzee. The comparison of the two repertoires reveals two chimpanzee-specific OR subfamily expansions and three expansions specific to humans. It also suggests that a subset of OR genes are under positive selection in either the human or the chimpanzee lineage. Thus, although overall there is relaxed constraint on human olfaction relative to chimpanzee, species-specific sensory requirements appear to have shaped the evolution of the functional OR gene repertoires in both species.


Subject(s)
Pan troglodytes/genetics , Receptors, Odorant/genetics , Animals , Evolution, Molecular , Genes/genetics , Humans , Phylogeny , Pseudogenes/genetics , Selection, Genetic
12.
Mol Biol Evol ; 22(3): 432-6, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15496549

ABSTRACT

Bitter taste perception is crucial for the survival of organisms because it enables them to avoid the ingestion of potentially harmful substances. Bitter taste receptors are encoded by a gene family that in humans has been shown to contain 25 putatively functional genes and 8 pseudogenes and in mouse 33 putatively functional genes and 3 pseudogenes. Lineage-specific expansions of bitter taste receptors have taken place in both mouse and human, but very little is known about the evolution of these receptors in primates. We report the analysis of the almost complete repertoires of bitter taste receptor genes in human, great apes, and two Old World monkeys. As a group, these genes seem to be under little selective constraint compared with olfactory receptors and other genes in the studied species. However, in contrast to the olfactory receptor gene repertoire, where humans have a higher proportion of pseudogenes than apes, there is no evidence that the rate of loss of bitter taste receptor genes varies among humans and apes.


Subject(s)
Evolution, Molecular , Haplorhini/genetics , Phylogeny , Receptors, G-Protein-Coupled/genetics , Animals , Humans
13.
Proteins ; 54(1): 20-40, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14705021

ABSTRACT

Availability of complete genome sequences allows in-depth comparison of single-residue and oligopeptide compositions of the corresponding proteomes. We have used principal component analysis (PCA) to study the landscape of compositional motifs across more than 70 genera from all three superkingdoms. Unexpectedly, the first two principal components clearly differentiate archaea, eubacteria, and eukaryota from each other. In particular, we contrast compositional patterns typical of the three superkingdoms and characterize differences between species and phyla, as well as among patterns shared by all compositional proteomic signatures. These species-specific patterns may even extend to subsets of the entire proteome, such as proteins pertaining to individual yeast chromosomes. We identify factors that affect compositional signatures, such as living habitat, and detect strong eukaryotic preference for homopeptides and palindromic tripeptides. We further detect oligopeptides that are either universally over- or underabundant across the whole proteomic landscape, as well as oligopeptides whose over- or underabundance is phylum- or species-specific. Finally, we report that species composition signatures preserve evolutionary memory, providing a new method to compare phylogenetic relationships among species that avoids problems of sequence alignment and ortholog detection.


Subject(s)
Oligopeptides/chemistry , Phylogeny , Proteomics/methods , Sequence Analysis, Protein/methods , Amino Acid Motifs , Archaea/classification , Bacteria/classification , Eukaryotic Cells/classification , Oligopeptides/classification , Principal Component Analysis , Proteome/chemistry
14.
Protein Sci ; 13(1): 240-54, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14691239

ABSTRACT

Olfactory receptors (ORs) are a large family of proteins involved in the recognition and discrimination of numerous odorants. These receptors belong to the G-protein coupled receptor (GPCR) hyperfamily, for which little structural data are available. In this study we predict the binding site residues of OR proteins by analyzing a set of 1441 OR protein sequences from mouse and human. The central insight utilized is that functional contact residues would be conserved among pairs of orthologous receptors, but considerably less conserved among paralogous pairs. Using judiciously selected subsets of 218 ortholog pairs and 518 paralog pairs, we have identified 22 sequence positions that are both highly conserved among the putative orthologs and variable among paralogs. These residues are disposed on transmembrane helices 2 to 7, and on the second extracellular loop of the receptor. Strikingly, although the prediction makes no assumption about the location of the binding site, these amino acid positions are clustered around a pocket in a structural homology model of ORs, mostly facing the inner lumen. We propose that the identified positions constitute the odorant binding site. This conclusion is supported by the observation that all but one of the predicted binding site residues correspond to ligand-contact positions in other rhodopsin-like GPCRs.


Subject(s)
Receptors, Odorant/chemistry , Receptors, Odorant/metabolism , Amino Acid Sequence , Amino Acids/chemistry , Animals , Binding Sites , Cluster Analysis , Consensus Sequence , Conserved Sequence , Databases, Protein , Humans , Mice , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Binding , Receptors, Odorant/genetics , Sequence Homology, Amino Acid , Signal Transduction , Species Specificity
15.
Curr Opin Struct Biol ; 13(3): 353-8, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12831887

ABSTRACT

Groups of related genes abound in large eukaryotic genomes. In such 'subgenomes', homology modeling carried out for a few genes will probably have relevance to the entire group. Subgenomes also afford unique ways of determining protein structural information. In addition to analyses based on the quantification of residue variability in paralogs, two-way comparisons, both within and among species, help to disclose functional amino acids. Comparative studies of gene families throughout the mammalian genome will also help elucidate the functional significance of single nucleotide polymorphisms in coding regions.


Subject(s)
Multigene Family/genetics , Receptors, Odorant/genetics , Sequence Homology , Animals , Binding Sites , Cluster Analysis , Conserved Sequence , Humans , Mammals/genetics , Polymorphism, Single Nucleotide , Synteny
16.
Nat Genet ; 34(2): 143-4, 2003 Jun.
Article in English | MEDLINE | ID: mdl-12730696

ABSTRACT

Of more than 1,000 human olfactory receptor genes, more than half seem to be pseudogenes. We investigated whether the most recent of these disruptions might still segregate with the intact form by genotyping 51 candidate genes in 189 ethnically diverse humans. The results show an unprecedented prevalence of segregating pseudogenes, identifying one of the most pronounced cases of functional population diversity in the human genome.


Subject(s)
Receptors, Odorant/genetics , Animals , Black People/genetics , Genome, Human , Genotype , Humans , Multigene Family , Pan troglodytes/genetics , Phenotype , Polymorphism, Single Nucleotide , Pseudogenes
17.
Proc Natl Acad Sci U S A ; 100(6): 3324-7, 2003 Mar 18.
Article in English | MEDLINE | ID: mdl-12612342

ABSTRACT

Olfactory receptor (OR) genes constitute the basis for the sense of smell and are encoded by the largest mammalian gene superfamily of >1,000 genes. In humans, >60% of these are pseudogenes. In contrast, the mouse OR repertoire, although of roughly equal size, contains only approximately 20% pseudogenes. We asked whether the high fraction of nonfunctional OR genes is specific to humans or is a common feature of all primates. To this end, we have compared the sequences of 50 human OR coding regions, regardless of their functional annotations, to those of their putative orthologs in chimpanzees, gorillas, orangutans, and rhesus macaques. We found that humans have accumulated mutations that disrupt OR coding regions roughly 4-fold faster than any other species sampled. As a consequence, the fraction of OR pseudogenes in humans is almost twice as high as in the non-human primates, suggesting a human-specific process of OR gene disruption, likely due to a reduced chemosensory dependence relative to apes.


Subject(s)
Receptors, Odorant/genetics , Animals , DNA/genetics , Evolution, Molecular , Gene Silencing , Gorilla gorilla/genetics , Humans , Macaca mulatta/genetics , Mice , Molecular Sequence Data , Multigene Family , Pan troglodytes/genetics , Pongo pygmaeus/genetics , Primates/genetics , Pseudogenes , Species Specificity
18.
Hum Mol Genet ; 11(12): 1381-90, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12023980

ABSTRACT

We investigated the population differences in patterns of single nucleotide polymorphisms (SNPs) for a 400 kb olfactory receptor (OR) gene cluster on human chromosome 17p13.3. Samples were drawn from 35 individuals, of four different ethnogeographical origins: Pygmies, Bedouins, Yemenite Jews and Ashkenazi Jews. Of the 74 SNPs identified, two segregated between pseudogenized and intact ORs, while a third involved a change in a highly conserved motif proposed to mediate ligand-induced signal transduction. Linkage disequilibrium (LD) was computed based on phase inference across the cluster using Clark's haplotype subtraction algorithm. We also calculated LD directly from the genotypes using the expectation-maximization (EM) algorithm. Both methods yielded very similar results. Our analyses revealed substantial differences in nucleotide diversity, haplotype distribution and LD patterns among the different human populations. In particular, the two Jewish populations had low haplotype diversity and negligible decay of LD across the entire genomic region. Intriguingly, the three functional SNPs segregated at different frequencies in the different ethnogeographical groups, with the Pygmies having higher frequencies of the intact OR genes. Our data suggest that OR genes may have evolved to create different functional repertoires in distinct human populations.


Subject(s)
Chromosomes, Human, Pair 17 , Haplotypes , Multigene Family , Receptors, Odorant/genetics , Humans , Linkage Disequilibrium , Pseudogenes , Sequence Analysis, DNA
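
Once haplotype phase has been inferred, pairwise linkage disequilibrium of the kind described in this abstract can be computed from haplotype frequencies. The sketch below assumes phased haplotypes for two biallelic SNPs coded as 0/1 and reports the standard D and r-squared statistics; it does not implement Clark's algorithm or the EM step, and the function name and data are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs.

    haplotypes is a list of (allele_at_snp1, allele_at_snp2) tuples with
    alleles coded 0 or 1; phase is assumed to be known.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Hypothetical data: complete LD versus no LD.
complete = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(linkage_disequilibrium(complete), linkage_disequilibrium(independent))
```

Little decay of LD across a region, as reported for two of the populations, would show up as r-squared values that stay high even for SNP pairs that are far apart.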