1.
PLoS Biol ; 22(6): e3002661, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38829909

ABSTRACT

Deuterostomes are a monophyletic group of animals that includes Hemichordata, Echinodermata (together called Ambulacraria), and Chordata. The diversity of deuterostome body plans has made it challenging to reconstruct their ancestral condition and to decipher the genetic changes that drove the diversification of deuterostome lineages. Here, we generate chromosome-level genome assemblies of 2 hemichordate species, Ptychodera flava and Schizocardium californicum, and use comparative genomic approaches to infer the chromosomal architecture of the deuterostome common ancestor and delineate lineage-specific chromosomal modifications. We show that hemichordate chromosomes (1N = 23) exhibit remarkable chromosome-scale macrosynteny when compared to other deuterostomes and can be derived from 24 deuterostome ancestral linkage groups (ALGs). These deuterostome ALGs in turn match previously inferred bilaterian ALGs, consistent with a relatively short transition from the last common bilaterian ancestor to the origin of deuterostomes. Based on this deuterostome ALG complement, we deduced chromosomal rearrangement events that occurred in different lineages. For example, a fusion-with-mixing event produced an Ambulacraria-specific ALG that subsequently split into 2 chromosomes in extant hemichordates, while this homologous ALG further fused with another chromosome in sea urchins. Orthologous genes distributed in these rearranged chromosomes are enriched for functions in various developmental processes. We found that the deeply conserved Hox clusters are located in highly rearranged chromosomes and that maintenance of the clusters is likely due to lower densities of transposable elements within the clusters. We also provide evidence that the deuterostome-specific pharyngeal gene cluster was established via the combination of 3 pre-assembled microsyntenic blocks. We suggest that since chromosomal rearrangement events and formation of new gene clusters may change the regulatory controls of developmental genes, these events may have contributed to the evolution of diverse body plans among deuterostomes.


Subject(s)
Chromosomes , Evolution, Molecular , Genome , Phylogeny , Animals , Chromosomes/genetics , Genome/genetics , Synteny , Genetic Linkage , Chordata/genetics
2.
Plant Genome ; 14(1): e20072, 2021 03.
Article in English | MEDLINE | ID: mdl-33605092

ABSTRACT

Hop (Humulus lupulus L. var Lupulus) is a diploid, dioecious plant with a history of cultivation spanning more than one thousand years. Hop cones are valued for their use in brewing and contain compounds of therapeutic interest including xanthohumol. Efforts to determine how biochemical pathways responsible for desirable traits are regulated have been challenged by the large (2.8 Gb), repetitive, and heterozygous genome of hop. We present a draft haplotype-phased assembly of the Cascade cultivar genome. Our draft assembly and annotation of the Cascade genome is the most extensive representation of the hop genome to date. PacBio long-read sequences from hop were assembled with FALCON and partially phased with FALCON-Unzip. Comparative analysis of haplotype sequences provides insight into selective pressures that have driven evolution in hop. We discovered genes with greater sequence divergence enriched for stress-response, growth, and flowering functions in the draft phased assembly. With improved resolution of long terminal retrotransposons (LTRs) due to long-read sequencing, we found that hop is over 70% repetitive. We identified a homolog of cannabidiolic acid synthase (CBDAS) that is expressed in multiple tissues. The approaches we developed to analyze the draft phased assembly serve to deepen our understanding of the genomic landscape of hop and may have broader applicability to the study of other large, complex genomes.


Subject(s)
Humulus , Diploidy , Genome, Plant , Genomics , Haplotypes , Humulus/genetics
3.
Sci Data ; 7(1): 399, 2020 11 17.
Article in English | MEDLINE | ID: mdl-33203859

ABSTRACT

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


Subject(s)
High-Throughput Nucleotide Sequencing , Mice/genetics , Zea mays/genetics , Animals , Fragaria/genetics , Genome, Plant , Metagenome , Ranidae/genetics , Sequence Analysis, DNA
4.
G3 (Bethesda) ; 10(9): 2911-2925, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32631951

ABSTRACT

In recent years, improved sequencing technology and computational tools have made de novo genome assembly more accessible. Many approaches, however, generate either an unphased or only partially resolved representation of a diploid genome, in which polymorphisms are detected but not assigned to one or the other of the homologous chromosomes. Yet chromosomal phase information is invaluable for the understanding of phenotypic trait inheritance in the cases of compound heterozygosity, allele-specific expression or cis-acting variants. Here we use a combination of tools and sequencing technologies to generate a de novo diploid assembly of the human primary cell line WI-38. First, data from PacBio single molecule sequencing and Bionano Genomics optical mapping were combined to generate an unphased assembly. Next, 10x Genomics linked reads were combined with the hybrid assembly to generate a partially phased assembly. Lastly, we developed and optimized methods to use short-read (Illumina) sequencing of flow cytometry-sorted metaphase chromosomes to provide phase information. The final genome assembly was almost fully (94%) phased with the addition of approximately 2.5-fold coverage of Illumina data from the sequenced metaphase chromosomes. The diploid nature of the final de novo genome assembly improved the resolution of structural variants between the WI-38 genome and the human reference genome. The phased WI-38 sequence data are available for browsing and download at wi38.research.calicolabs.com. Our work shows that assembling a completely phased diploid genome de novo from the DNA of a single individual is now readily achievable.


Subject(s)
Diploidy , Genome, Human , DNA , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
5.
Nat Biotechnol ; 37(10): 1155-1162, 2019 10.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.


Subject(s)
DNA, Circular/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Base Sequence , Genetic Variation , Haplotypes , Humans
6.
Int J Eng Educ ; 34(2B): 599-608, 2018.
Article in English | MEDLINE | ID: mdl-30740001

ABSTRACT

A hands-on learning module was implemented at Marquette University in 2012 to teach biomedical engineering students about basic manufacturing processes, lean manufacturing principles, and design for manufacturability. It incorporates active and student-centered learning as part of in-class assembly line simulations. Since then, it has evolved from three class periods to five. The module begins with two classroom presentations on manufacturing operations and electronics design, assembly, and testing. Students then participate in an in-class assembly line simulation exercise where they build and test an actual product per written work instructions. They reflect on this experience, and suggest design and process changes to improve the assembly line process and quality, save time, and reduce cost and waste. At the end of the module students implement their suggested design and process improvements and repeat the exercise to determine the impact of their improvements. They learn of the importance of Design for Manufacturability, well-written work instructions, process design, and designing a product not only for the end user, but also for the assemblers and inspectors. Details of the module, and its implementation and assessment are presented along with student feedback and faculty observations.

7.
Nature ; 546(7659): 524-527, 2017 06 22.
Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.


Subject(s)
Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , Zea mays/genetics , Centromere/genetics , Chromosomes, Plant/genetics , Contig Mapping , Crops, Agricultural/genetics , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Genes, Plant/genetics , Molecular Sequence Annotation , Optics and Photonics , Phylogeny , RNA, Messenger/analysis , RNA, Messenger/genetics , Reference Standards , Sorghum/genetics
8.
Nat Methods ; 13(12): 1050-1054, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27749838

ABSTRACT

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.


Subject(s)
Diploidy , Genome, Fungal/genetics , Genome, Plant/genetics , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Algorithms , Arabidopsis/genetics , Basidiomycota/genetics , DNA, Fungal/genetics , DNA, Plant/genetics , Haplotypes , Heterozygote , Humans , Sequence Analysis, DNA , Vitis/genetics
9.
Sci Data ; 1: 140045, 2014.
Article in English | MEDLINE | ID: mdl-25977796

ABSTRACT

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.


Subject(s)
Arabidopsis/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Genome, Bacterial , Genome, Fungal , Genome, Insect , Genome, Plant , Neurospora crassa/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Animals , Models, Animal
10.
Genome Biol ; 14(1): R10, 2013 Jan 30.
Article in English | MEDLINE | ID: mdl-23363705

ABSTRACT

BACKGROUND: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. RESULTS: Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. CONCLUSIONS: While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.


Subject(s)
Centromere/genetics , Evolution, Molecular , Tandem Repeat Sequences , Animals , Base Sequence , Molecular Sequence Data , Plants/genetics , Species Specificity
11.
Genome Res ; 23(1): 121-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23064752

ABSTRACT

The human fragile X mental retardation 1 (FMR1) gene contains a (CGG)(n) trinucleotide repeat in its 5' untranslated region (5'UTR). Expansions of this repeat result in a number of clinical disorders with distinct molecular pathologies, including fragile X syndrome (FXS; full mutation range, greater than 200 CGG repeats) and fragile X-associated tremor/ataxia syndrome (FXTAS; premutation range, 55-200 repeats). Study of these diseases has been limited by an inability to sequence expanded CGG repeats, particularly in the full mutation range, with existing DNA sequencing technologies. Single-molecule, real-time (SMRT) sequencing provides an approach to sequencing that is fundamentally different from other "next-generation" sequencing platforms, and is well suited for long, repetitive DNA sequences. We report the first sequence data for expanded CGG-repeat FMR1 alleles in the full mutation range that reveal the confounding effects of CGG-repeat tracts on both cloning and PCR. A unique feature of SMRT sequencing is its ability to yield real-time information on the rates of nucleoside addition by the tethered DNA polymerase; for the CGG-repeat alleles, we find a strand-specific effect of CGG-repeat DNA on the interpulse distance. This kinetic signature reveals a novel aspect of the repeat element; namely, that the particular G bias within the CGG/CCG-repeat element influences polymerase activity in a manner that extends beyond simple nearest-neighbor effects. These observations provide a baseline for future kinetic studies of repeat elements, as well as for studies of epigenetic and other chemical modifications thereof.


Subject(s)
Alleles , Fragile X Mental Retardation Protein/genetics , Sequence Analysis, DNA/methods , 5' Untranslated Regions , Base Sequence , Humans , Molecular Sequence Data , Mutation , Trinucleotide Repeat Expansion/genetics
12.
N Engl J Med ; 365(8): 709-17, 2011 Aug 25.
Article in English | MEDLINE | ID: mdl-21793740

ABSTRACT

BACKGROUND: A large outbreak of diarrhea and the hemolytic-uremic syndrome caused by an unusual serotype of Shiga-toxin-producing Escherichia coli (O104:H4) began in Germany in May 2011. As of July 22, a large number of cases of diarrhea caused by Shiga-toxin-producing E. coli have been reported--3167 without the hemolytic-uremic syndrome (16 deaths) and 908 with the hemolytic-uremic syndrome (34 deaths)--indicating that this strain is notably more virulent than most of the Shiga-toxin-producing E. coli strains. Preliminary genetic characterization of the outbreak strain suggested that, unlike most of these strains, it should be classified within the enteroaggregative pathotype of E. coli. METHODS: We used third-generation, single-molecule, real-time DNA sequencing to determine the complete genome sequence of the German outbreak strain, as well as the genome sequences of seven diarrhea-associated enteroaggregative E. coli serotype O104:H4 strains from Africa and four enteroaggregative E. coli reference strains belonging to other serotypes. Genomewide comparisons were performed with the use of these enteroaggregative E. coli genomes, as well as those of 40 previously sequenced E. coli isolates. RESULTS: The enteroaggregative E. coli O104:H4 strains are closely related and form a distinct clade among E. coli and enteroaggregative E. coli strains. However, the genome of the German outbreak strain can be distinguished from those of other O104:H4 strains because it contains a prophage encoding Shiga toxin 2 and a distinct set of additional virulence and antibiotic-resistance factors. CONCLUSIONS: Our findings suggest that horizontal genetic exchange allowed for the emergence of the highly virulent Shiga-toxin-producing enteroaggregative E. coli O104:H4 strain that caused the German outbreak. More broadly, these findings highlight the way in which the plasticity of bacterial genomes facilitates the emergence of new pathogens.


Subject(s)
Disease Outbreaks , Escherichia coli Infections/microbiology , Genome, Bacterial , Hemolytic-Uremic Syndrome/microbiology , Shiga-Toxigenic Escherichia coli/genetics , Bacterial Typing Techniques , Base Sequence , Diarrhea/epidemiology , Diarrhea/microbiology , Escherichia coli Infections/epidemiology , Feces/microbiology , Female , Germany/epidemiology , Hemolytic-Uremic Syndrome/epidemiology , Humans , Middle Aged , Phylogeny , Polymerase Chain Reaction , Sequence Analysis, DNA , Shiga-Toxigenic Escherichia coli/classification , Shiga-Toxigenic Escherichia coli/isolation & purification
13.
Nucleic Acids Res ; 38(15): e159, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20571086

ABSTRACT

A novel template design for single-molecule sequencing is introduced, a structure we refer to as a SMRTbell template. This structure consists of a double-stranded portion, containing the insert of interest, and a single-stranded hairpin loop on either end, which provides a site for primer binding. Structurally, this format resembles a linear double-stranded molecule, and yet it is topologically circular. When placed into a single-molecule sequencing reaction, the SMRTbell template format enables a consensus sequence to be obtained from multiple passes on a single molecule. Furthermore, this consensus sequence is obtained from both the sense and antisense strands of the insert region. In this article, we present a universal method for constructing these templates, as well as an application of their use. We demonstrate the generation of high-quality consensus accuracy from single molecules, as well as the use of SMRTbell templates in the identification of rare sequence variants.


Subject(s)
DNA/chemistry , Oligonucleotides/chemistry , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Base Sequence , Consensus Sequence , Staphylococcus aureus/genetics , Templates, Genetic
14.
Science ; 323(5910): 133-8, 2009 Jan 02.
Article in English | MEDLINE | ID: mdl-19023044

ABSTRACT

We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.


Subject(s)
DNA-Directed DNA Polymerase/metabolism , Sequence Analysis, DNA/methods , Base Sequence , Consensus Sequence , DNA/biosynthesis , DNA, Circular/chemistry , DNA, Single-Stranded/chemistry , Deoxyribonucleotides/metabolism , Enzymes, Immobilized , Fluorescent Dyes , Kinetics , Nanostructures , Spectrometry, Fluorescence
15.
Mol Pharmacol ; 67(4): 1360-8, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15662043

ABSTRACT

Transcriptional profiling via microarrays holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different microarray platforms, protocols, and informatics often hinders the meaningful comparison of transcriptional profiling data across laboratories. One solution to this problem is to provide a low-cost and centralized resource that enables researchers to share toxicogenomic data that has been generated on a common platform. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols to simplify the analysis of liver gene expression in the mouse model. This resource, referred to as EDGE, was then used to generate a training set of 117 publicly accessible transcriptional profiles that can be accessed at http://edge.oncology.wisc.edu/. The Web-accessible database was also linked to an informatics suite that allows on-line clustering and K-means analyses as well as Boolean and sequence-based searches of the data. We propose that EDGE can serve as a prototype resource for the sharing of toxicogenomics information and be used to develop algorithms for efficient chemical classification and hazard prediction.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis/methods , Toxicogenetics , Animals , Lipopolysaccharides/pharmacology , Liver/drug effects , Liver/metabolism , Mice , PPAR alpha/agonists , Receptors, Aryl Hydrocarbon/agonists
16.
Eur J Hum Genet ; 12(4): 321-32, 2004 Apr.
Article in English | MEDLINE | ID: mdl-14560315

ABSTRACT

Genes involved in the testosterone biosynthetic pathway - such as CYP17A1, CYP3A4, and SRD5A2 - represent strong candidates for affecting prostate cancer. Previous work has detected associations between individual variants in these three genes and prostate cancer risk and aggressiveness. To more comprehensively evaluate CYP17A1, CYP3A4, and SRD5A2, we undertook a two-phase study of the relationship between their genotypes/haplotypes and prostate cancer. Phase I of the study first searched for single-nucleotide polymorphisms (SNPs) in these genes by resequencing 24 individuals from the Coriell Polymorphism Discovery Resource, 92-110 men from prostate cancer case-control sibships, and by leveraging public databases. In all, 87 SNPs were discovered and genotyped in 276 men from case-control sibships. Those SNPs exhibiting preliminary case-control allele frequency differences, or distinguishing (ie, 'tagging') common haplotypes across the genes, were identified for further study (24 SNPs in total). In Phase II of the study, the 24 SNPs were genotyped in an additional 841 men from case-control sibships. Finally, associations between genotypes/haplotypes in CYP17A1, CYP3A4, and SRD5A2 and prostate cancer were evaluated in the total case-control sample of 1117 brothers from 506 sibships. Family-based analyses detected associations between prostate cancer risk or aggressiveness and a number of CYP3A4 SNPs (P-values between 0.006 and 0.05), a CYP3A4 haplotype (P-values 0.05 and 0.009 in nonstratified and stratified analysis, respectively), and two SRD5A2 SNPs in strong linkage disequilibrium (P=0.02). Undertaking a two-phase study comprising SNP discovery, haplotype tagging, and association analyses allowed us to more fully decipher the relation between CYP17A1, CYP3A4, and SRD5A2 and prostate cancer.


Subject(s)
3-Oxo-5-alpha-Steroid 4-Dehydrogenase/genetics , Cytochrome P-450 Enzyme System/genetics , Haplotypes , Prostatic Neoplasms/genetics , Steroid 17-alpha-Hydroxylase/genetics , Case-Control Studies , Cytochrome P-450 CYP3A , Genotype , Humans , Male , Polymorphism, Single Nucleotide
18.
Bioinformatics ; 18(8): 1064-72, 2002 Aug.
Article in English | MEDLINE | ID: mdl-12176829

ABSTRACT

MOTIVATION: In many microarray experiments, relatively few intra- and inter-array replicate measurements are made due to significant cost limitations and sample availability. Compounding this problem is a lack of robust statistical methods for analyzing gene expression data with limited experimental replicates. As a result, the interpretation of the results of these experiments is difficult, with little understanding of the probability of type I and type II errors. RESULTS: The variability in a series of replicate microarray measurements was modelled using a combination of parametric and non-parametric methods. A 3-dimensional surface was created for the conditional distribution of the variability given the mean signal intensity in both the Cy3 and Cy5 channels. The results were used as the basis for developing statistical methods for analyzing gene expression data with limited experimental replicates. AVAILABILITY: The statistical analysis scripts are available upon request.


Subject(s)
DNA/genetics , Gene Expression , Models, Genetic , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , DNA Replication/genetics , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Reproducibility of Results , Sensitivity and Specificity
19.
Pharmacogenetics ; 12(2): 151-63, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11875369

ABSTRACT

The Ahr locus encodes the aryl hydrocarbon receptor (AHR), which plays an important toxicological and developmental role. Sequence variation in this gene was studied in 13 different mouse lines that included eight laboratory strains, two Mus musculus subspecies and three additional Mus species. The data presented represent the largest study of sequence variation across multiple mouse lines in a single gene (approximately 15.9 kb/mouse line). Among all mice, the average frequency of all polymorphisms in the intronic regions was 20.3 variants/kb and the average exonic frequency was 14.1 variants/kb. For substitutions alone, the average frequencies in the intronic and exonic regions for all mice were 13.3 and 8.9 substitutions/kb, respectively. Between laboratory strains, the average intronic and exonic frequencies for all polymorphisms dropped to 5.4 and 2.9 variants/kb, respectively. There were 111 non-synonymous polymorphisms that resulted in 42 different amino acid changes, of which only 10 amino acid changes had been previously identified. Based on the nucleotide sequence, the phylogenetic history of the gene showed mice from the Ahr(b2) and Ahr(d) alleles in separate branches while mice from the Ahr(b1) and Ahr(b3) alleles exhibited a more complex history. Evolutionarily, the AHR protein as a whole appears to be under purifying selective pressure (K(a) : K(s) ratio = 0.237). Despite significant functional constraint in the basic helix-loop-helix and PAS domains, ligand binding is not constrained to the high-affinity allele, which supports further the role of the AHR in development and its importance beyond the adaptive response to environmental toxicants.


Subject(s)
Genetic Variation , Mice, Inbred Strains/genetics , Polymorphism, Genetic , Receptors, Aryl Hydrocarbon/genetics , Amino Acid Sequence , Animals , Evolution, Molecular , Genetic Linkage , Mice , Molecular Sequence Data , Phylogeny , Selection, Genetic , Sequence Homology, Amino Acid , Species Specificity
20.
Environ Health Perspect ; 110 Suppl 6: 919-23, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12634120

ABSTRACT

Traditional models of toxicity have relied on dissecting chemical action into pharmacokinetic and pharmacodynamic processes. However, the integration of genomic information with toxicology will enhance our basic understanding of these processes and significantly change the way we apply toxicological information to risk assessment and regulatory problems. In this article, we summarize the application of gene expression information and polymorphism discovery to four areas in toxicology: toxicity testing, cross-species extrapolation, understanding mechanism of action, and susceptibility.


Subject(s)
Gene Expression Regulation , Genomics , Polymorphism, Genetic , Toxicology/trends , Animals , Disease Models, Animal , Environmental Pollutants/adverse effects , Forecasting , Humans , Oligonucleotide Array Sequence Analysis , Toxicity Tests