2.
Proc Natl Acad Sci U S A ; 101(26): 9734-9, 2004 Jun 29.
Article in English | MEDLINE | ID: mdl-15210992

ABSTRACT

Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.


Subject(s)
Genetic Variation/genetics , Haplotypes/genetics , Mice, Inbred Strains/genetics , Alleles , Animals , Anxiety/genetics , Mice , Microsatellite Repeats/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA
3.
Genome Res ; 11(12): 1996-2008, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11731489

ABSTRACT

Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.


Subject(s)
Drosophila Proteins/chemistry , Drosophila Proteins/physiology , Drosophila melanogaster/chemistry , Evolution, Molecular , Eye Proteins , Glycoproteins , Repetitive Sequences, Amino Acid , Amino Acid Sequence/genetics , Amino Acid Transport Systems, Neutral , Animals , Antigens, Differentiation, B-Lymphocyte/chemistry , Antigens, Differentiation, B-Lymphocyte/genetics , Antigens, Differentiation, B-Lymphocyte/physiology , Aspartate-tRNA Ligase/chemistry , Aspartate-tRNA Ligase/genetics , Aspartate-tRNA Ligase/physiology , Cystinosis/genetics , Drosophila Proteins/genetics , Drosophila melanogaster/enzymology , Drosophila melanogaster/genetics , Exons/genetics , GTP-Binding Proteins , Gene Duplication , Glycoside Hydrolases/chemistry , Glycoside Hydrolases/genetics , Glycoside Hydrolases/physiology , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/physiology , Humans , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/physiology , Intracellular Signaling Peptides and Proteins , Membrane Proteins/chemistry , Membrane Proteins/genetics , Membrane Proteins/physiology , Membrane Transport Proteins , Molecular Sequence Data , Muscular Dystrophies/genetics , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteins/physiology , Retinitis Pigmentosa/genetics , Signal Transduction/genetics , Species Specificity , Tandem Repeat Sequences
4.
Protein Sci ; 10(2): 285-92, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11266614

ABSTRACT

Sequence similarity is the most common measure currently used to infer homology between proteins. Typically, homologous protein domains show sequence similarity over their entire lengths. Here we identify Asp box motifs, initially found as repeats in sialidases and neuraminidases, in new structural and sequence contexts. These motifs represent significantly similar sequences, localized to beta hairpins within proteins that are otherwise different in sequence and three-dimensional structure. By performing a combined sequence- and structure-based analysis we detect Asp boxes in more than nine protein families, including bacterial ribonucleases, sulfite oxidases, reelin, netrins, some lipoprotein receptors, and a variety of glycosyl hydrolases. Although the function common to each of these proteins, if any, remains unclear, we discuss possible functions of Asp boxes on the basis of previously determined experimental results and discuss different evolutionary scenarios for the origin of Asp-box containing proteins.


Subject(s)
Aspartic Acid/chemistry , Neuraminidase/chemistry , Acetylglucosaminidase/chemistry , Amino Acid Motifs , Amino Acid Sequence , Databases, Factual , Evolution, Molecular , Models, Molecular , Molecular Sequence Data , Protein Folding , Protein Structure, Tertiary , Ribonucleases/chemistry , Sequence Homology, Amino Acid , Water/chemistry
5.
Nature ; 408(6810): 331-6, 2000 Nov 16.
Article in English | MEDLINE | ID: mdl-11099034

ABSTRACT

Genome sequencing projects generate a wealth of information; however, the ultimate goal of such projects is to accelerate the identification of the biological function of genes. This creates a need for comprehensive studies to fill the gap between sequence and function. Here we report the results of a functional genomic screen to identify genes required for cell division in Caenorhabditis elegans. We inhibited the expression of approximately 96% of the approximately 2,300 predicted open reading frames on chromosome III using RNA-mediated interference (RNAi). By using an in vivo time-lapse differential interference contrast microscopy assay, we identified 133 genes (approximately 6%) necessary for distinct cellular processes in early embryos. Our results indicate that these genes represent most of the genes on chromosome III that are required for proper cell division in C. elegans embryos. The complete data set, including sample time-lapse recordings, has been deposited in an open access database. We found that approximately 47% of the genes associated with a differential interference contrast phenotype have clear orthologues in other eukaryotes, indicating that this screen provides putative gene functions for other species as well.
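The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented here, and it assumes that the "approximately 6%" figure is taken relative to the open reading frames actually targeted by RNAi, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract; variable names are illustrative only.
predicted_orfs = 2300             # "approximately 2,300 predicted open reading frames"
fraction_targeted = 0.96          # "approximately 96%" inhibited by RNAi
genes_with_phenotype = 133        # genes necessary for processes in early embryos
fraction_with_orthologues = 0.47  # "approximately 47%" with clear orthologues

# Assumption: the hit rate is computed against the ORFs actually targeted.
orfs_targeted = predicted_orfs * fraction_targeted
hit_rate = genes_with_phenotype / orfs_targeted
genes_with_orthologues = genes_with_phenotype * fraction_with_orthologues

print(f"ORFs targeted: about {orfs_targeted:.0f}")
print(f"Hit rate: about {hit_rate:.1%}")
print(f"Genes with clear orthologues: about {genes_with_orthologues:.0f}")
```

Under that assumption the hit rate comes out close to 6%, consistent with the abstract; dividing by all 2,300 predicted open reading frames instead also gives a value that rounds to 6%.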


Subject(s)
Caenorhabditis elegans/genetics , Cell Division/genetics , Genes, Helminth , RNA, Helminth , Animals , Caenorhabditis elegans/physiology , Chromosomes , Genomics , Open Reading Frames
6.
J Mol Biol ; 303(4): 627-41, 2000 Nov 03.
Article in English | MEDLINE | ID: mdl-11054297

ABSTRACT

We provide statistically reliable sequence evidence indicating that at least 12 of 23 SCOP (βα)₈ (TIM) barrel superfamilies share a common origin. This includes all but one of the known and predicted TIM barrels found in central metabolism. The statistical evidence is complemented by an examination of the details of protein structure, with certain structural locations favouring catalytic residues even though the nature of their molecular function may change. The combined analysis of sequence, structure and function also enables us to propose a phylogeny of TIM barrels. Based on these data, we are able to examine differing theories of pathway and enzyme evolution, by mapping known TIM barrel folds to the pathways of central metabolism. The results favour widespread recruitment of enzymes between pathways, rather than a "backwards evolution" model, and support the idea that modern proteins may have arisen from common ancestors that bound key metabolites.


Subject(s)
Enzymes/chemistry , Enzymes/metabolism , Evolution, Molecular , Protein Structure, Tertiary , Aldehyde-Lyases/chemistry , Aldehyde-Lyases/metabolism , Amino Acid Sequence , Animals , Binding Sites , Computational Biology , Databases as Topic , Humans , Models, Molecular , Molecular Sequence Data , Multigene Family , Phosphates/metabolism , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/metabolism , Phylogeny , Protein Structure, Secondary , Pyruvate Kinase/chemistry , Pyruvate Kinase/metabolism , Sequence Alignment , Sequence Homology, Amino Acid
7.
Nat Genet ; 25(2): 201-4, 2000 Jun.
Article in English | MEDLINE | ID: mdl-10835637

ABSTRACT

Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.


Subject(s)
Computational Biology/methods , Expressed Sequence Tags , Proteins/genetics , Proteins/metabolism , Signal Transduction , Amino Acid Sequence , Automation , Catalytic Domain , Cloning, Molecular/methods , Databases, Factual , Genome, Human , Humans , Internet , Molecular Sequence Data , Monomeric GTP-Binding Proteins/chemistry , Monomeric GTP-Binding Proteins/genetics , Monomeric GTP-Binding Proteins/metabolism , Protein Structure, Tertiary , Proteins/chemistry , Sequence Alignment , Sequence Homology, Amino Acid , Software
9.
Nucleic Acids Res ; 28(1): 231-4, 2000 Jan 01.
Article in English | MEDLINE | ID: mdl-10592234

ABSTRACT

SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures (http://SMART.embl-heidelberg.de). More than 400 domain families found in signalling, extra-cellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.


Subject(s)
Database Management Systems , Internet , Sequence Alignment , Information Storage and Retrieval , Proteins/chemistry
10.
Proteins ; Suppl 3: 141-8, 1999.
Article in English | MEDLINE | ID: mdl-10526363

ABSTRACT

We applied a succession of sequence search and structure prediction methods to the targets in the fold recognition part of the CASP3 experiment. For each target, we expanded an initial sequence space, obtained through PSI-BLAST, by searching for statistically significant relationships to low-scoring sequences and then by searching for conserved sequence patterns. We then divided the proteins in the sequence space into families and built an alignment hierarchically, using the multiple alignment program MACAW. If no significant similarity to a protein of known structure was apparent at this point, we submitted the alignment to the Jpred server for consensus secondary structure prediction and searched the structure space using the secondary structure mapping program MAP. Failing this, we compared the structural properties that we believed we recognized in the aligned proteins to the folds in the SCOP database, using visual inspection. If all these methods failed to uncover a plausible match, we predicted that the target would adopt a novel fold. This procedure yielded correct answers for seven of twenty-one targets and a partly correct answer for one. A retrospective analysis shows that automating the sequence search procedures would have represented a significant improvement, with at least three additional correct predictions.


Subject(s)
Protein Folding , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Models, Molecular , Molecular Sequence Data , Sequence Alignment
12.
Curr Opin Struct Biol ; 9(3): 408-15, 1999 Jun.
Article in English | MEDLINE | ID: mdl-10361098

ABSTRACT

The complete sequence of the nematode worm Caenorhabditis elegans contains the genetic machinery that is required to undertake the core biological processes of single cells. However, the genome also encodes proteins that are associated with multicellularity, as well as others that are lineage-specific expansions of phylogenetically widespread families and yet more that are absent in non-nematodes. Ongoing analysis is beginning to illuminate the similarities and differences among human proteins and proteins that are encoded by the genomes of the multicellular worm and the unicellular yeast, and will be essential in determining the reliability of transferring experimental data among phylogenetically distant species.


Subject(s)
Multigene Family , Proteins/chemistry , Proteins/genetics , Animals , Conserved Sequence , Genome , Humans , Intracellular Fluid/physiology , Phylogeny , Proteins/physiology , Signal Transduction/genetics
15.
J Mol Biol ; 259(3): 349-65, 1996 Jun 14.
Article in English | MEDLINE | ID: mdl-8676374

ABSTRACT

A strategy is presented for protein fold recognition from secondary structure assignments (alpha-helix and beta-strand). The method can detect similarities between protein folds in the absence of sequence similarity. Secondary structure mapping first identifies all possible matches (maps) between a query string of secondary structures and the secondary structures of protein domains of known three-dimensional structure. The maps are then passed through a series of structural filters to remove those that do not obey simple rules of protein structure. The surviving maps are ranked by scores from the alignment of predicted and experimental accessibilities. Searches made with secondary structure assignments for a test set of 11 fold-families put the correct sequence-dissimilar fold in the first rank 8/11 times. With cross-validated predictions of secondary structure this drops to 4/11, which compares favourably with the widely used THREADER program (1/11). The structural class is correctly predicted 10/11 times by the method in contrast to 5/11 for THREADER. The new technique obtains comparable accuracy in the alignment of amino acid residues and secondary structure elements. Searches are also performed with published secondary structure predictions for the von-Willebrand factor type A domain, the proteasome 20 S alpha subunit and the phosphotyrosine interaction domain. These searches demonstrate how the method can find the correct fold for a protein from a carefully constructed secondary structure prediction, multiple sequence alignment and distance restraints. Scans with experimentally determined secondary structures and accessibility recognise the correct fold with high alignment accuracy (86% on secondary structures). This suggests that the accuracy of mapping will improve alongside any improvements in the prediction of secondary structure or accessibility. Application to NMR structure determination is also discussed.
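The first step described in this abstract, enumerating all possible maps between a query string of secondary structures and those of a known domain, can be illustrated with a toy sketch. This is not the published implementation: it assumes, purely for illustration, that a map is an order-preserving assignment of every query element to a target element of the same type ("H" for helix, "E" for strand), and the function name `enumerate_maps` is hypothetical. The structural filters and accessibility-based ranking from the abstract are not modelled.

```python
from itertools import combinations


def enumerate_maps(query, target):
    """Return all order-preserving, type-matching maps of query onto target.

    Both arguments are strings of secondary structure elements, e.g. "EHE".
    Each map is a tuple of target indices, one per query element.
    This is an illustrative simplification, not the published algorithm.
    """
    maps = []
    # combinations() yields index tuples in increasing order, which keeps
    # the query elements in their original sequential order on the target.
    for indices in combinations(range(len(target)), len(query)):
        if all(target[i] == q for i, q in zip(indices, query)):
            maps.append(indices)
    return maps


# Toy example: a strand-helix-strand query against a five-element domain.
candidate_maps = enumerate_maps("EHE", "EHEHE")
print(candidate_maps)
```

For this toy input there are four candidate maps. In the method described above, such candidates would then be pruned by structural filters and ranked, which is where most of the discriminating power presumably lies.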


Subject(s)
Models, Molecular , Protein Folding , Protein Structure, Secondary , Algorithms , Amino Acid Sequence , Cysteine Endopeptidases/chemistry , Molecular Sequence Data , Multienzyme Complexes/chemistry , Phosphotyrosine/metabolism , Proteasome Endopeptidase Complex , Proteins/chemistry , Proteins/metabolism , Sequence Alignment/methods , Software , von Willebrand Factor/chemistry
16.
J Mol Biol ; 242(4): 321-9, 1994 Sep 30.
Article in English | MEDLINE | ID: mdl-7932692

ABSTRACT

The high-resolution X-ray structures of 38 proteins that bind phosphate-containing groups and 36 proteins binding sulphate ions were analysed to characterise the structural features of anion binding sites in proteins. 34 of the 66 phosphates found were in close proximity to the amino terminus of an alpha-helix. 27% of phosphate groups bind to only one amino acid, but there is a wide distribution, with 3% of phosphates binding to seven residues. Similarly, there is a large variability in the number of contacts each phosphate group makes to the protein. This ranges from none (3% of phosphates) to nine (3% of phosphates). The most common number of contacts is two (23% of phosphates). The most commonly found residue at helix-type binding sites is glycine, followed by Arg, Thr, Ser and Lys. At non-helix binding sites, the most commonly found residue is Arg followed by Tyr, His, Lys and Ser. There is no typical phosphate binding site. There are marked differences between propensities for phosphate binding at helix and non-helix type binding sites. Non-helix binding sites show more discrimination between the types of residues involved in binding when compared to the helix set. The propensities for binding of the amino acids reveal the expected trend of positively charged and polar residues being good at binding (although that for lysine is unexpectedly low) with the bulky non-polar residues being poor at binding. Bulky residues are less likely to bind with the amide nitrogen. Sulphate binding sites show similar trends. Analysis of multiple sequence alignments that include phosphate and sulphate binding proteins reveals the degree of conservation at the binding site residues compared to the average conservation of residues in the protein. Phosphate binding site residues are more conserved than sulphate binding sites.


Subject(s)
Phosphates/metabolism , Protein Binding , Sulfates/metabolism , Crystallography, X-Ray , Protein Conformation