1.
Nat Genet ; 37(7): 683-91, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15937480

ABSTRACT

The Human Genome Project and its spin-offs are making it increasingly feasible to determine the genetic basis of complex traits using genome-wide association studies. The statistical challenge of analyzing such studies stems from the severe multiple-comparison problem resulting from the analysis of thousands of SNPs. Our methodology for genome-wide family-based association studies, using single SNPs or haplotypes, can identify associations that achieve genome-wide significance. In relation to developing guidelines for our screening tools, we determined lower bounds for the estimated power to detect the gene underlying the disease-susceptibility locus, which hold regardless of the linkage disequilibrium structure present in the data. We also assessed the power of our approach in the presence of multiple disease-susceptibility loci. Our screening tools accommodate genomic control and use the concept of haplotype-tagging SNPs. Our methods use the entire sample and do not require separate screening and validation samples to establish genome-wide significance, as population-based designs do.


Subject(s)
Genetic Predisposition to Disease , Linkage Disequilibrium , Pedigree , Asthma/genetics , Computer Simulation , Genome, Human , Haplotypes , Humans , Interleukin-10/genetics , Polymorphism, Single Nucleotide , Software
2.
BMC Genet ; 6: 7, 2005 Feb 15.
Article in English | MEDLINE | ID: mdl-15713228

ABSTRACT

BACKGROUND: The identification of disease-associated genes using single nucleotide polymorphisms (SNPs) has been increasingly reported. In particular, the Affymetrix Mapping 10 K SNP microarray platform uses one PCR primer to amplify the DNA samples and determine the genotype of more than 10,000 SNPs in the human genome. This provides the opportunity for large scale, rapid and cost-effective genotyping assays for linkage analysis. However, the analysis of such datasets is nontrivial because of the large number of markers, and visualizing the linkage scores in the context of genome maps remains less automated using the current linkage analysis software packages. For example, the haplotyping results are commonly represented in the text format. RESULTS: Here we report the development of a novel software tool called CompareLinkage for automated formatting of the Affymetrix Mapping 10 K genotype data into the "Linkage" format and the subsequent analysis with multi-point linkage software programs such as Merlin and Allegro. The new software has the ability to visualize the results for all these programs in dChip in the context of genome annotations and cytoband information. In addition we implemented a variant of the Lander-Green algorithm in the dChipLinkage module of dChip software (V1.3) to perform parametric linkage analysis and haplotyping of SNP array data. These functions are integrated with the existing modules of dChip to visualize SNP genotype data together with LOD score curves. We have analyzed three families with recessive and dominant diseases using the new software programs and the comparison results are presented and discussed. CONCLUSIONS: The CompareLinkage and dChipLinkage software packages are freely available. They provide the visualization tools for high-density oligonucleotide SNP array data, as well as the automated functions for formatting SNP array data for the linkage analysis programs Merlin and Allegro and calling these programs for linkage analysis. The results can be visualized in dChip in the context of genes and cytobands. In addition, a variant of the Lander-Green algorithm is provided that allows parametric linkage analysis and haplotyping.


Subject(s)
Genetic Linkage , Oligonucleotides/genetics , Polymorphism, Single Nucleotide , Software , Family Health , Genotype , Haplotypes , Humans , Lod Score
3.
Am J Hum Genet ; 75(6): 948-65, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15514889

ABSTRACT

Prostate cancer is one of the most common cancers among men and has long been recognized to occur in familial clusters. Brothers and sons of affected men have a 2-3-fold increased risk of developing prostate cancer. However, identification of genetic susceptibility loci for prostate cancer has been extremely difficult. Although the suggestion of linkage has been reported for many chromosomes, the most promising regions have been difficult to replicate. In this study, we compare genome linkage scans using microsatellites with those using single-nucleotide polymorphisms (SNPs), performed in 467 men with prostate cancer from 167 families. For the microsatellites, the ABI Prism Linkage Mapping Set version 2, with 402 microsatellite markers, was used, and, for the SNPs, the Early Access Affymetrix Mapping 10K array was used. Our results show that the presence of linkage disequilibrium (LD) among SNPs can lead to inflated LOD scores, and this seems to be an artifact due to the assumption of linkage equilibrium that is required by the current genetic-linkage software. After excluding SNPs with high LD, we found a number of new LOD-score peaks with values of at least 2.0 that were not found by the microsatellite markers: chromosome 8, with a maximum model-free LOD score of 2.2; chromosome 2, with a LOD score of 2.1; chromosome 6, with a LOD score of 4.2; and chromosome 12, with a LOD score of 3.9. The LOD scores for chromosomes 6 and 12 are difficult to interpret, because they occurred only at the extreme ends of the chromosomes. The greatest gain provided by the SNP markers was a large increase in the linkage information content, with an average information content of 61% for the SNPs, versus an average of 41% for the microsatellite markers. The strengths and weaknesses of microsatellite versus SNP markers are illustrated by the results of our genome linkage scans.


Subject(s)
Genetic Linkage/genetics , Genetic Predisposition to Disease/genetics , Genetic Testing/methods , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide/genetics , Prostatic Neoplasms/genetics , Chromosome Mapping , Humans , Lod Score , Male
4.
Genomics ; 84(4): 623-30, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15475239

ABSTRACT

Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating the error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the "dose-response" reference curve between the error rate and the observable error number was derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed. The dose-response reference curve was monotone and almost linear with a large slope. The method was able to estimate accurately the error rate under various pedigree structures and error models and under heterogeneous error rates. Using this method, we found that the average genotyping error rate of the GeneChip Mapping 10K array was about 0.1%. Our method provides a quick and unbiased solution to address the genotype error rate in pedigree data. It behaves well in a wide range of settings and can be easily applied in other genetic projects. The robust estimation of genotyping error rate allows us to estimate power and sample size and conduct unbiased genetic tests. The GeneChip Mapping 10K array has a low overall error rate, which is consistent with the results obtained from alternative genotyping assays.
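The calibration step described in this abstract (reading an error rate off a monotone reference curve from a count of observed errors) can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `estimate_error_rate`, the use of linear interpolation, and the reference-curve values are all assumptions chosen for illustration, and in the study the curve itself comes from simulation on the actual pedigrees.

```python
from bisect import bisect_left


def estimate_error_rate(reference, observed_errors):
    """Invert a monotone reference curve by linear interpolation.

    reference: list of (error_rate, expected_observable_errors) pairs,
               sorted by increasing error_rate, with strictly increasing counts.
    observed_errors: number of errors detected in the real data.
    """
    rates = [rate for rate, _ in reference]
    counts = [count for _, count in reference]

    # Clamp to the ends of the curve rather than extrapolating.
    if observed_errors <= counts[0]:
        return rates[0]
    if observed_errors >= counts[-1]:
        return rates[-1]

    # Find the curve segment that brackets the observed count.
    i = bisect_left(counts, observed_errors)
    c0, c1 = counts[i - 1], counts[i]
    r0, r1 = rates[i - 1], rates[i]
    return r0 + (r1 - r0) * (observed_errors - c0) / (c1 - c0)


# Hypothetical reference curve: (true error rate, mean observable errors).
reference_curve = [(0.0, 0.0), (0.001, 40.0), (0.002, 80.0), (0.005, 200.0)]
estimate = estimate_error_rate(reference_curve, 60.0)
print(f"estimated error rate: {estimate:.4f}")
```

Because the abstract reports the curve as monotone and almost linear, piecewise-linear inversion is a reasonable stand-in here; a different interpolation scheme would change the estimate only slightly.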


Subject(s)
Polymorphism, Single Nucleotide , Recombination, Genetic , Computer Simulation , Evaluation Studies as Topic , Female , Genetic Linkage , Genotype , Humans , Likelihood Functions , Male , Oligonucleotide Array Sequence Analysis , Pedigree
5.
Eur J Hum Genet ; 12(12): 1001-6, 2004 Dec.
Article in English | MEDLINE | ID: mdl-15367915

ABSTRACT

Population-based association design is often compromised by false or nonreplicable findings, partially due to population stratification. Genomic control (GC) approaches were proposed to detect and adjust for this confounder. To date, the performance of this strategy has not been extensively evaluated on real data. More than 10 000 single-nucleotide polymorphisms (SNPs) were genotyped on subjects from four populations (including an Asian, an African-American and two Caucasian populations) using the GeneChip Mapping 10 K array. On these data, we tested the performance of two GC approaches in different scenarios including various numbers of GC markers and different degrees of population stratification. In the scenario of substantial population stratification, both GC approaches are sensitive using only 20-50 random SNPs, and the mixed subjects can be separated into homogeneous subgroups. In the scenario of moderate stratification, both GC approaches have poor sensitivities. However, the bias in the association test can still be corrected even when no statistically significant population stratification is detected. We conducted extensive benchmark analyses on GC approaches using SNPs over the whole human genome. We found that the GC method can cluster subjects into homogeneous subgroups if there is a substantial difference in genetic background. The inflation factor, estimated by GC markers, can effectively adjust for the confounding effect of population stratification regardless of its extent. We also suggest that as few as 50 random SNPs with heterozygosity >40% should be sufficient as genomic controls.
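The inflation-factor adjustment mentioned in this abstract can be sketched as follows. The sketch uses one common median-based formulation of genomic control and is not necessarily either of the two approaches the study evaluated. The function names and the marker statistics are hypothetical; the only fixed constant is the median of a chi-square distribution with one degree of freedom (about 0.4549).

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_marker_chi2):
    """Estimate lambda from 1-df chi-square statistics at presumed-null markers."""
    lam = median(null_marker_chi2) / CHI2_1DF_MEDIAN
    # By convention lambda is not allowed to deflate the test statistics.
    return max(lam, 1.0)


def adjust_statistic(chi2, lam):
    """Divide a candidate marker's chi-square statistic by the inflation factor."""
    return chi2 / lam


# Hypothetical statistics from a handful of genomic-control markers.
null_marker_stats = [0.2, 0.5, 0.91, 1.4, 3.0]
lam = inflation_factor(null_marker_stats)
adjusted = adjust_statistic(7.6, lam)
print(f"lambda = {lam:.3f}, adjusted chi-square = {adjusted:.3f}")
```

A real analysis would use far more markers than this toy list; the abstract's own suggestion is on the order of 50 random SNPs with heterozygosity above 40%.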


Subject(s)
Chromosome Mapping , Chromosomes, Human , Genetics, Population , Genetic Markers , Genotype , Humans , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide
6.
Mol Microbiol ; 50(4): 1111-24, 2003 Nov.
Article in English | MEDLINE | ID: mdl-14622403

ABSTRACT

Hfq, a bacterial member of the Sm family of RNA-binding proteins, is required for the action of many small regulatory RNAs that act by basepairing with target mRNAs. Hfq binds this family of small RNAs efficiently. We have used co-immunoprecipitation with Hfq and direct detection of the bound RNAs on genomic microarrays to identify members of this small RNA family. This approach was extremely sensitive; even Hfq-binding small RNAs expressed at low levels were readily detected. At least 15 of 46 known small RNAs in E. coli interact with Hfq. In addition, high signals in other intergenic regions suggested up to 20 previously unidentified small RNAs bind Hfq; five were confirmed by Northern analysis. Strong signals within genes and operons also were detected, some of which correspond to known Hfq targets. Within the argX-hisR-leuT-proM operon, Hfq appears to compete with RNase E and modulate RNA processing and degradation. Thus Hfq immunoprecipitation followed by microarray analysis is a highly effective method for detecting a major class of small RNAs as well as identifying new Hfq functions.


Subject(s)
Escherichia coli Proteins/metabolism , Host Factor 1 Protein/metabolism , RNA, Bacterial/metabolism , RNA, Messenger/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Oligonucleotide Array Sequence Analysis , Operon , Precipitin Tests , Protein Binding
7.
Genome Res ; 13(2): 216-23, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12566399

ABSTRACT

Subgenic-resolution oligonucleotide microarrays were used to study global RNA degradation in wild-type Escherichia coli MG1655. RNA chemical half-lives were measured for 1036 open reading frames (ORFs) and for 329 known and predicted operons. The half-life of total mRNA was 6.8 min under the conditions tested. We also observed significant relationships between gene functional assignments and transcript stability. Unexpectedly, transcription of a single operon (tdcABCDEFG) was relatively rifampicin-insensitive and showed significant increases 2.5 min after rifampicin addition. This supports a novel mechanism of transcription for the tdc operon, whose promoter lacks any recognizable sigma binding sites. Probe by probe analysis of all known and predicted operons showed that the 5' ends of operons degrade, on average, more quickly than the rest of the transcript, with stability increasing in a 3' direction, supporting and further generalizing the current model of a net 5' to 3' directionality of degradation. Hierarchical clustering analysis of operon degradation patterns revealed that this pattern predominates but is not exclusive. We found a weak but highly significant correlation between the degradation of adjacent operon regions, suggesting that stability is determined by a combination of local and operon-wide stability determinants. The 16 ORF dcw gene cluster, which has a complex promoter structure and a partially characterized degradation pattern, was studied at high resolution, allowing a detailed and integrated description of its abundance and degradation. We discuss the application of subgenic resolution DNA microarray analysis to study global mechanisms of RNA transcription and processing.
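As a rough illustration of how a chemical half-life can be derived from a decay time course, the sketch below fits a straight line to log-transformed signal and converts the slope to a half-life. It assumes simple first-order (exponential) decay, which the abstract does not state explicitly. The function name `half_life`, the sampling times, and the synthetic signal are hypothetical; only the 6.8 min figure comes from the abstract, and it is used here just to generate test data.

```python
import math


def half_life(times_min, signals):
    """Least-squares fit of log(signal) against time; returns half-life in minutes.

    Assumes first-order decay: signal(t) = signal(0) * exp(-k * t).
    """
    logs = [math.log(s) for s in signals]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_log = sum(logs) / n
    # Slope of the regression line equals -k for exponential decay.
    slope = sum(
        (t - mean_t) * (y - mean_log) for t, y in zip(times_min, logs)
    ) / sum((t - mean_t) ** 2 for t in times_min)
    return -math.log(2) / slope


# Synthetic, noise-free time course generated with a 6.8 min half-life.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
signals = [1000.0 * 0.5 ** (t / 6.8) for t in times]
fitted = half_life(times, signals)
print(f"fitted half-life: {fitted:.2f} min")
```

With real microarray data the signal is noisy and may not be strictly exponential, so the fit quality should be checked before a half-life is reported.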


Subject(s)
Escherichia coli/genetics , Gene Expression Profiling/methods , Genome, Bacterial , RNA Stability/genetics , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , Transcription, Genetic/genetics , 3' Untranslated Regions/genetics , 5' Untranslated Regions/genetics , Drug Resistance, Microbial/genetics , Gene Expression Profiling/trends , Multigene Family/genetics , Operon/genetics , Promoter Regions, Genetic/drug effects , Rifampin/metabolism , Time Factors
8.
Nucleic Acids Res ; 30(17): 3732-8, 2002 Sep 01.
Article in English | MEDLINE | ID: mdl-12202758

ABSTRACT

Microarrays traditionally have been used to analyze the expression behavior of large numbers of coding transcripts. Here we present a comprehensive approach for high-throughput transcript discovery in Escherichia coli focused mainly on intergenic regions which, together with analysis of coding transcripts, provides us with a more complete insight into the organism's transcriptome. Using a whole genome array, we detected expression for 4052 coding transcripts and identified 1102 additional transcripts in the intergenic regions of the E. coli genome. Further classification reveals 317 novel transcripts with unknown function. Our results show that, despite sophisticated approaches to genome annotation, many cellular transcripts remain unidentified. Through the experimental identification of all RNAs expressed under a specific condition, we gain a more thorough understanding of all cellular processes.


Subject(s)
Escherichia coli/genetics , Oligonucleotide Array Sequence Analysis/methods , Transcription, Genetic/genetics , 3' Untranslated Regions/genetics , 5' Untranslated Regions/genetics , Gene Expression Regulation, Bacterial , Operon/genetics , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , Reverse Transcriptase Polymerase Chain Reaction
9.
Bioinformatics ; 18 Suppl 1: S337-44, 2002.
Article in English | MEDLINE | ID: mdl-12169564

ABSTRACT

Microarrays traditionally have been used to assay the transcript expression of coding regions of genes. Here, we use Escherichia coli oligonucleotide microarrays to assay transcript expression of both open reading frames (ORFs) and intergenic regions. We then use hidden Markov models to analyse this expression data and estimate transcription boundaries of genes. This approach allows us to identify 5' untranslated regions (5' UTRs) of transcripts as well as genes that are likely to be operon members. The operon elements we identify correspond to documented operons with 99% specificity and 63% sensitivity. Similarly we find that our 5' UTR results accurately coincide with experimentally verified promoter regions for most genes.
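The idea of using a hidden Markov model to locate transcription boundaries along a row of probes can be illustrated with a toy two-state example decoded by the Viterbi algorithm. This is not the model from the paper: the states, the discretized "low"/"high" observations, and every probability below are assumptions made purely for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path, computed in log space."""
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this probe.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical model: probes are either outside ("off") or inside ("on") a transcript.
states = ["off", "on"]
start_p = {"off": 0.5, "on": 0.5}
trans_p = {"off": {"off": 0.9, "on": 0.1}, "on": {"off": 0.1, "on": 0.9}}
emit_p = {"off": {"low": 0.9, "high": 0.1}, "on": {"low": 0.1, "high": 0.9}}

# Discretized probe intensities in genomic order (made-up data).
probes = ["low", "low", "high", "high", "high", "low", "low"]
path = viterbi(probes, states, start_p, trans_p, emit_p)
print(path)
```

In this toy setting the estimated transcript boundaries are the positions where the decoded path switches between "off" and "on"; a realistic model would work with continuous intensities and more states.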


Subject(s)
5' Untranslated Regions/genetics , Escherichia coli/genetics , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Operon/genetics , RNA, Bacterial/genetics , Transcription Factors/genetics , Algorithms , Base Sequence , Chromosome Mapping/methods , Gene Expression Regulation, Bacterial/genetics , Genome, Bacterial , Models, Genetic , Models, Statistical , Molecular Sequence Data , Prokaryotic Cells , Sequence Analysis, DNA/methods , Sequence Homology, Amino Acid