Results 1 - 13 of 13
1.
J Theor Biol ; 240(2): 200-8, 2006 May 21.
Article in English | MEDLINE | ID: mdl-16289610

ABSTRACT

As a powerful tool for gene function prediction, gene fusion has been widely studied in prokaryotes and certain groups of eukaryotes, but it has been little applied in studies of mammalian genomes. With the first fully sequenced mammalian genomes (human, mouse, rat) now available, we defined and collected a set of fusion/fission event-linked segments (FFLSs) based on structured organized genomic alignment. The statistics of the sequence features highlighted the FFLSs against their random context. We found that there are three groups of FFLSs with different component pairs (i.e. gene-gene, gene-noncoding and noncoding-noncoding) in all three mammalian genomes. The proteins encoded by the components of FFLSs in the first group showed a strong tendency to interact with each other. The segmental components in the last two groups that did not contain any protein-coding genes were found not only to be transcribed to some level but also to be more conserved than the random background. Thus, these segments possibly carry certain biologically functional elements. We propose that FFLSs may be a potential tool for prediction and analysis of function and functional interaction of genetic elements, including both genes and noncoding elements, in mammalian genomes. The full list of the FFLSs in the genomes of the three mammals is available as supporting information at doi:10.1016/j.jtbi.2005.09.016.


Subject(s)
DNA , Evolution, Molecular , Genome , Mammals/genetics , Models, Genetic , Animals , Base Composition , Base Sequence , DNA Cleavage , Gene Fusion , Genetic Code , Genome, Human , Genomics , Humans , Mice , Molecular Sequence Data , Rats
2.
Article in English | MEDLINE | ID: mdl-17357475

ABSTRACT

The crystal structure data of TACE, MMP-1, MMP-2, MMP-3 and MMP-9 were obtained from the PDB database, and the properties of their catalytic domains, including conformation, molecular surface hydrophobicity and electrostatic potential, were analyzed and compared using the Insight II molecular modeling software. The conformation and molecular surface hydrophobicity of the catalytic domains of TACE and the MMPs were not obviously different, but the molecular surface electrostatic potentials of the catalytic domains of TACE and the MMPs differed clearly. The findings are helpful for the rational drug design of TACE-selective inhibitors.


Subject(s)
ADAM Proteins/chemistry , Matrix Metalloproteinases/chemistry , ADAM17 Protein , Catalytic Domain , Humans
3.
J Hum Genet ; 49(7): 339-348, 2004.
Article in English | MEDLINE | ID: mdl-15173934

ABSTRACT

Y-chromosomes from 76 Chinese men covering 33 ethnic minorities throughout China as well as the Han majority were collected as genetic material for the study of Chinese nonrecombinant Y-chromosome (NRY) phylogeny. Of the accepted worldwide NRY haplogroups, three (haplogroups D, C, O) were significant in this sample, extending previous assessments of Chinese genetic diversity. Based on geographic, linguistic, and ethnohistorical information, the 33 Chinese ethnic minorities in our survey were divided into the following four subgroups: North, Tibet, West, and South. Based on the distribution of the newly found immediate ancestor lineage haplogroup O*, which has M214 but not M175, we argue that the southern origin scenario of this most common Chinese Y haplogroup is not very likely. We tentatively propose a West/North-origin hypothesis, suggesting that haplogroup O originated in West/North China, mainly evolved in China, and thence spread further throughout eastern Eurasia. The nested cladistic analysis revealed in detail a multilayered, multidirectional, and continuous history of ethnic admixture that has shaped the contemporary Chinese population. Our results give some new clues to the evolution and migration of the Chinese population and its subsequent movements within this land, which are in accordance with the historical records.


Subject(s)
Biological Evolution , Chromosomes, Human, Y , Emigration and Immigration , China , Genetic Markers , Haplotypes , Humans , Male , Phylogeny
4.
Bioinformatics ; 20(5): 599-603, 2004 Mar 22.
Article in English | MEDLINE | ID: mdl-15033865

ABSTRACT

MOTIVATION: Small RNA (sRNA) genes in Escherichia coli have been in focus recently, as 44 out of 55 experimentally confirmed sRNA genes have been precisely located in the genome. The objective of this study is to analyze quantitatively the conservation of these sRNA genes and compare it with the conservation of protein-encoding genes, function-unknown regions and tRNA genes. RESULTS: The results show that within an evolutionary distance of 0.26, both sRNA genes and protein-encoding genes display a similar tendency in their degrees of conservation at the nucleotide level. In addition, the conservation of sRNA genes is much stronger than that of function-unknown regions, but much weaker than that of tRNA genes. Based on the conservation of the studied sRNA genes, we also give clues for estimating the total number of sRNA genes in E. coli. SUPPLEMENTARY INFORMATION: Supplementary information is available at http://www.bioinfo.org.cn/SM/sRNAconservation.htm


Subject(s)
Conserved Sequence/genetics , Escherichia coli/genetics , Gene Expression Profiling/methods , Genome, Bacterial , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics , Sequence Analysis, RNA/methods , Algorithms , Base Sequence , Evolution, Molecular , Molecular Sequence Data , Phylogeny , Sequence Alignment/methods , Sequence Homology, Nucleic Acid
5.
BMC Infect Dis ; 4: 3, 2004 Feb 06.
Article in English | MEDLINE | ID: mdl-15028113

ABSTRACT

BACKGROUND: A new respiratory infectious epidemic, severe acute respiratory syndrome (SARS), broke out and spread throughout the world. The putative pathogen of SARS has now been identified as a new coronavirus, a single positive-strand RNA virus. RNA viruses commonly have a high rate of genetic mutation. It is therefore important to know the mutation rate of the SARS coronavirus as it spreads through the population. Moreover, finding a date for the last common ancestor of SARS coronavirus strains would be useful for understanding the circumstances surrounding the emergence of the SARS pandemic and the rate at which the SARS coronavirus diverges. METHODS: We propose a mathematical model to estimate the evolution rate of the SARS coronavirus genome and the time of the last common ancestor of the sequenced SARS strains. Under some common assumptions and justifiable simplifications, a few simple equations incorporating the evolution rate (K) and the time of the last common ancestor of the strains (T0) can be deduced. We then implemented the least squares method to estimate K and T0 from the dataset of sequences and corresponding times. Monte Carlo simulation was employed to examine the results. RESULTS: Based on 6 strains with accurate dates of host death, we estimated the time of the last common ancestor to be about August or September 2002, and the evolution rate to be about 0.16 base/day, that is, the SARS coronavirus would on average change a base every seven days. We validated our method by dividing the strains into two groups, which coincided with the results from comparative genomics. CONCLUSION: The applied method is simple to implement and avoids the difficulty and subjectivity of choosing the root of the phylogenetic tree. Based on 6 strains with accurate dates of host death, we estimated a time of the last common ancestor, which is coincident with epidemic investigations, and an evolution rate in the same range as that reported for the HIV-1 virus.


Subject(s)
Severe Acute Respiratory Syndrome/epidemiology , Severe acute respiratory syndrome-related coronavirus/genetics , China/epidemiology , Chronology as Topic , Disease Outbreaks , Evolution, Molecular , HIV-1/genetics , Humans , Models, Biological , Severe Acute Respiratory Syndrome/virology
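
The least-squares estimate of K and T0 described in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible form of the model, in which the number of substitutions d separating a strain from the last common ancestor grows linearly with its sampling time t, so that d = K * (t - T0). The abstract does not give the paper's actual equations, so this linear form, the function name `fit_rate_and_origin`, and the sample data are all illustrative assumptions.

```python
def fit_rate_and_origin(days, substitutions):
    """Ordinary least-squares fit of d = K * (t - T0).

    days          -- sampling time of each strain, in days from an arbitrary reference date
    substitutions -- substitutions separating each strain from the common ancestor
    Returns (K, T0): K in substitutions per day, T0 in days on the same time axis.
    This linear model is an assumed simplification, not the paper's own equations.
    """
    n = len(days)
    mean_t = sum(days) / n
    mean_d = sum(substitutions) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(days, substitutions))
    k = sxy / sxx                    # slope: evolution rate K
    intercept = mean_d - k * mean_t  # value of d at t = 0
    t0 = -intercept / k              # time at which d = 0, i.e. the ancestor
    return k, t0


# Hypothetical, noise-free data generated with K = 0.16 and T0 = -30 days.
sample_days = [0, 10, 20, 30, 40, 50]
sample_subs = [0.16 * (t + 30) for t in sample_days]
k_est, t0_est = fit_rate_and_origin(sample_days, sample_subs)
```

With real data the fit would be noisy, which is presumably why the authors turned to Monte Carlo simulation to examine the estimates.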
6.
J Biol Phys ; 30(4): 305-12, 2004 Jan.
Article in English | MEDLINE | ID: mdl-23345874

ABSTRACT

To describe eukaryotic autosomes quantitatively and determine differences between them in terms of amino acid sequences of genes, functional classification of proteins, and complete DNA sequences, we applied two theoretical methods, the proteome-vector method and the function of degree of disagreement (FDOD) method, which are based on function and sequence similarity respectively, to autosomes from nine eukaryotes. No matter which aspect of the autosome is considered, the autosomal differences within each organism were smaller than those between species. Our results show that eukaryotic autosomes resemble each other within a species while those from different organisms differ. We propose a hypothesis (named intra-species autosomal random shuffling) as an explanation for our results and suggest that lateral gene transfer (LGT) did not occur frequently during the evolution of eukarya.

7.
J Protein Chem ; 22(4): 335-44, 2003 May.
Article in English | MEDLINE | ID: mdl-13678297

ABSTRACT

A three-dimensional structure of the human melanocortin 4 receptor (hMC4R) is constructed in this study using a computer-aided molecular modeling approach. The human melanocortin 4 receptor is a G protein-coupled receptor (GPCR). We structurally aligned the transmembrane helices with the bovine rhodopsin transmembrane domains, modeled both intracellular and extracellular loop domains on homologous loop regions in other proteins of known 3D structure, and modeled the C terminus on the corresponding part of bovine rhodopsin. Tandem minimization and dynamics calculations were then run to refine the crude structure. The resulting model was tested by docking with a tripeptide (RFF) ligand. The ligand was found to be located among transmembrane regions TM3, TM4, TM5, and TM6 of hMC4R. Consistent with mutational and biochemical data, the binding site is mainly formed as a hydrophobic and negatively charged pocket. The model constructed here might provide a structural framework for making rational predictions in relevant fields.


Subject(s)
Computer Simulation , Models, Molecular , Receptor, Melanocortin, Type 4/chemistry , Receptor, Melanocortin, Type 4/metabolism , Amino Acid Sequence , Animals , Binding Sites , Cattle , Humans , Hydrophobic and Hydrophilic Interactions , Ligands , Molecular Sequence Data , Protein Conformation , Rhodopsin/chemistry , Sequence Alignment , Thermodynamics
8.
Nucleic Acids Res ; 31(9): 2443-50, 2003 May 01.
Article in English | MEDLINE | ID: mdl-12711690

ABSTRACT

Interaction detection methods have led to the discovery of thousands of interactions between proteins, and discerning relevance within large-scale data sets is important to present-day biology. Here, a spectral method derived from graph theory was introduced to uncover hidden topological structures (i.e. quasi-cliques and quasi-bipartites) of complicated protein-protein interaction networks. Our analyses suggest that these hidden topological structures consist of biologically relevant functional groups. This result motivates a new method to predict the function of uncharacterized proteins based on the classification of known proteins within topological structures. Using this spectral analysis method, 48 quasi-cliques and six quasi-bipartites were isolated from a network involving 11,855 interactions among 2617 proteins in budding yeast, and 76 uncharacterized proteins were assigned functions.


Subject(s)
Algorithms , Proteins/metabolism , Saccharomycetales/metabolism , Models, Biological , Protein Binding , Proteins/chemistry
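
As an illustration of the spectral idea in the abstract above, the sketch below looks for a densely connected group of proteins by examining the principal eigenvector of the network's adjacency matrix. It rests on several assumptions not confirmed by the abstract: that nodes with large components in that eigenvector are candidate members of a quasi-clique, that a fixed cutoff of 0.3 is a sensible threshold, and that power iteration is an acceptable way to obtain the eigenvector. The function names and the toy network are hypothetical.

```python
def principal_eigenvector(adj, iterations=200):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    Assumes a connected, non-bipartite graph so that the iteration converges.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def quasi_clique_candidates(adj, threshold=0.3):
    """Return nodes whose eigenvector component is at least the (assumed) threshold."""
    v = principal_eigenvector(adj)
    return [i for i, x in enumerate(v) if abs(x) >= threshold]


def build_adjacency(n, edges):
    """Build a symmetric adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


# Hypothetical toy network: proteins 0-3 all interact with one another,
# and a chain 3-4-5-6 hangs off the group.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
toy_adj = build_adjacency(7, toy_edges)
candidates = quasi_clique_candidates(toy_adj)
```

A fuller treatment would inspect several eigenvectors, since the abstract reports both quasi-cliques and quasi-bipartites; this sketch covers only the leading one.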
9.
Chin Sci Bull ; 48(12): 1175-1178, 2003.
Article in English | MEDLINE | ID: mdl-32214702

ABSTRACT

The origin of SARS-CoV, the pathogen of severe acute respiratory syndrome (SARS), remains a mystery even though a few isolates of the virus have been completely sequenced. To explore the genesis of SARS-CoV, the FDOD method previously developed by us was applied to compare complete genomes from 12 SARS-CoV isolates with those from 12 previously identified coronaviruses, and an unrooted phylogenetic tree was constructed. Our results show that all SARS-CoV isolates clustered into one clique and the previously identified coronaviruses formed another. Meanwhile, the three groups of coronaviruses separate clearly from each other in our tree, which is consistent with the results of previous papers. In contrast to those papers, the topology of our phylogenetic tree shows that SARS-CoV is closer to group 1 within the genus coronavirus. The topology also shows that the 12 SARS-CoV isolates may be divided into two groups according to their association with the SARS-CoV from Hotel M in Hong Kong, which may give some information about the infection relationships of SARS.

10.
Mol Phylogenet Evol ; 25(1): 101-11, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12383754

ABSTRACT

A 17-dimensional vector named the proteome vector is defined to represent an organism. The components of the vector reflect the relative contents of protein-encoding genes of the 17 clusters of orthologous groups of proteins (COGs) classes in the whole genome of the relevant organism. Based on the definition of this proteome vector, fuzzy clustering of 36 completely sequenced organisms (8 archaea, 24 bacteria, and 4 eukarya) was performed and a proteome tree was constructed. Our results show that (1) the 36 organisms can be 100% correctly classified into three clusters corresponding to the three primary kingdoms, (2) our proteome tree is remarkably similar to that derived from 16S rRNA, and (3) the chromosomes and/or plasmids belonging to the same organism have very similar gene composition. Based on these results, we argue that the 17-dimensional proteome vector could be a good criterion for clustering approaches and to a large extent reveals the phylogenetic properties of organisms; the Three Primary Kingdoms Hypothesis is trustworthy although the existence of lateral gene transfer (LGT) brings controversy to the construction of the "universal tree of life."


Subject(s)
Phylogeny , Proteins/genetics , Algorithms , Animals , Archaea/genetics , Bacteria/genetics , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Models, Genetic , Proteome/analysis , Saccharomyces cerevisiae/genetics
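
The proteome vector defined in the abstract above can be sketched as a normalized count of genes per functional class. The abstract does not list the 17 COG classes used, so the sketch takes the class labels as a parameter and uses placeholder names; the function name `proteome_vector` and the toy genome are likewise hypothetical.

```python
from collections import Counter


def proteome_vector(gene_classes, classes):
    """Return the fraction of classified genes that fall in each functional class.

    gene_classes -- one class label per protein-encoding gene in the genome
    classes      -- ordered list of class labels defining the vector's dimensions
    Genes whose label is not in `classes` are ignored (an assumption of this sketch).
    """
    allowed = set(classes)
    counts = Counter(label for label in gene_classes if label in allowed)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes fall in the given classes")
    return [counts[label] / total for label in classes]


# Placeholder labels standing in for the 17 COG classes, which the abstract does not name.
cog_classes = [f"class_{i}" for i in range(1, 18)]

# Hypothetical toy genome: two genes in class_1, two in class_2.
toy_genome = ["class_1", "class_1", "class_2", "class_2"]
toy_vector = proteome_vector(toy_genome, cog_classes)
```

Vectors built this way for several organisms could then be compared with any distance measure before clustering; the abstract's fuzzy clustering step is not reproduced here.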
11.
Yi Chuan Xue Bao ; 29(5): 377-83, 2002 May.
Article in Chinese | MEDLINE | ID: mdl-12043562

ABSTRACT

We have identified and characterized a novel human serine-arginine-rich (SR) splicing regulatory protein 508 (SRrp508) gene that is related to other members of the growing SR superfamily, but is homologous only to the rat (Rattus norvegicus) serine-arginine-rich splicing regulatory protein 86 (SRrp86) gene. The full-length cDNA of 3811 bp for human SRrp508 was cloned through a BLAST search of public databases following the identification of a cDNA contig of 658 bp obtained by fully automated, large-scale EST assembly on a supercomputer. Structurally, human SRrp508 encodes a polypeptide of 508 amino acids, which contains a single amino-terminal RNA recognition motif (RRM) and two carboxy-terminal domains rich in serine-arginine dipeptides that are highly conserved among other members of the SR superfamily. The conserved SR and RRM domains emphasize the biological importance of this gene. The SRrp508 gene, which contains 12 exons ranging from 0.096 to 2.093 kb and 11 introns ranging from 0.14 to 5.153 kb, is mapped to the human cytogenetic region 5q11.2-q12.1 using bioinformatic analysis, and it is not linked to any other genes. Furthermore, we have experimentally cloned and sequenced a cDNA fragment of 1680 bp containing the full-length ORF of 1527 bp of this novel human gene by RT-PCR from the single-stranded human pancreas cDNA library (Clontech); its sequence, as determined by nucleotide sequencing, is fully identical to that obtained by in silico cloning. Thus, we cloned this gene in silico (GenBank accession number AF459094), identifying it solely by bioinformatic analysis of the nucleotide and protein sequences. This novel gene has promoters, a TATA box, several stop codons upstream of the ORF, and a poly(A) signal downstream of the ORF. Based on the above results, it can be concluded that we have obtained a complete novel human gene.
The gene sequence exhibits good overall homology to that of the rat SRrp86 gene, with 84% and 86% identity over the full-length nucleotide and protein, respectively, and with 96% and 86% identity over the serine-rich domain (RS) or arginine-rich domain (RA), respectively. The full-length sequence exhibits little overall homology to any other known protein at either the nucleotide or the amino acid level. The other two most closely related proteins, with 34% and 35% identity over the full-length protein, respectively, or with 51% and 54% identity over the full-length nucleotide of the ORF, respectively, are Drosophila serine-arginine-rich protein 54 (SRp54) and human arginine-rich nuclear protein 54 (p54). When comparisons are restricted to the RS or RA domains, the percent identities for SRp54 and p54 increase to 44% and 54%, or 38% and 43%, respectively. These results demonstrate that only the novel human protein of 508 amino acids cloned here is the human homolog of rat SRrp86, thus correcting the view of Barnard and Patton (Barnard DC, Patton JG. Identification and Characterization of a Novel Serine-Arginine-Rich Splicing Regulatory Protein. Molecular and Cellular Biology, 2000, 20(9): 3049-3057) that human arginine-rich nuclear protein 54 (p54) is the human homolog of rat SRrp86, and suggesting that human SRrp508 is a new member of this growing superfamily of SR proteins. SRrp508 has an extensive expression profile, and may be a transcription factor. On the basis of its sequence and functional properties, we have named this protein SRrp508 for SR-related splicing regulatory protein of 508 amino acids. In summary, by combining bioinformatic analysis with experimental verification, we have successfully cloned the human cDNA homolog of rat SRrp86, which is supported by a series of theoretical and experimental evidence.
The HGNC has just assigned the SRrp508 gene entry the following nomenclature: APPROVED SYMBOL: SFRS12; NAME: splicing factor, arginine/serine-rich 12; and ALIAS: DKFZp564B176, SRrp86. In the nearly one year since we cloned this gene, no one else has registered the same gene in GenBank. Our newly established technical pipeline will be helpful in discovering many more novel human genes.


Subject(s)
Chromosomes, Human, Pair 5/genetics , RNA-Binding Proteins/genetics , Amino Acid Sequence , Animals , Chromosome Mapping , Cloning, Molecular , DNA, Complementary/chemistry , DNA, Complementary/genetics , Genes/genetics , Humans , Molecular Sequence Data , Nuclear Proteins , Phylogeny , RNA-Binding Proteins/metabolism , Rats , Sequence Alignment , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Serine-Arginine Splicing Factors
12.
Genome Res ; 12(5): 689-700, 2002 May.
Article in English | MEDLINE | ID: mdl-11997336

ABSTRACT

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4(T) (GenBank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the rest, 824 CDS (31.8%), are functionally unknown. One of the interesting features of the T. tengcongensis genome is that 86.7% of its genes are encoded on the leading strand of DNA replication. Based on protein sequence similarity, the T. tengcongensis genome is most similar to that of Bacillus halodurans, a mesophilic eubacterium, among all prokaryotic genomes fully sequenced to date. Computational analysis of genes involved in basic metabolic pathways supports the experimental discovery that T. tengcongensis metabolizes sugars as its principal energy and carbon source and utilizes thiosulfate and elemental sulfur, but not sulfate, as electron acceptors. T. tengcongensis, although a gram-negative rod by empirical definitions (such as staining), shares many genes that are characteristic of gram-positive bacteria, whereas it is missing molecular components unique to gram-negative bacteria. A strong correlation between the G + C content of tDNA and rDNA genes and the optimal growth temperature is found among the sequenced thermophiles. It is concluded that thermophiles are a biologically and phylogenetically divergent group of prokaryotes that have converged to sustain extreme environmental conditions over evolutionary timescales.


Subject(s)
Bacillaceae/genetics , Genome, Bacterial , Bacillaceae/cytology , Bacillaceae/metabolism , Bacillaceae/physiology , Base Composition/genetics , Codon/genetics , DNA Repair/genetics , DNA Replication/genetics , GC Rich Sequence/genetics , Genes, Bacterial/genetics , Genomics/methods , Hot Temperature , Ion Transport/genetics , Molecular Sequence Data , Oxygen Consumption/genetics , Protein Biosynthesis/genetics , Recombination, Genetic/genetics , Repetitive Sequences, Nucleic Acid/genetics , Replication Origin/genetics , Sequence Analysis, DNA/methods , Sulfur/metabolism , Transcription, Genetic
13.
J Biol Phys ; 28(1): 55-62, 2002 Mar.
Article in English | MEDLINE | ID: mdl-23345757

ABSTRACT

With the development of genome sequencing, more whole genomes of microorganisms have been completed, and many methods have been introduced to reconstruct the phylogenetic tree of those microorganisms from information extracted from the whole genomes by transforming or mapping the whole genome sequences into other forms that can describe the evolutionary distance in a new way. We think it might be possible that there exists information buried in the whole genome and transferred along the lineage, which remains stable and is more essential than the sequence conservation of individual genes or the arrangement of some genes of a selected set. We need to find one measurement that can involve as many phylogenetic features as possible beyond the genome sequence itself. We converted each genome sequence of the microorganisms into another linear sequence representing the functional structure of the sequence, used a new information function to calculate the discrepancy between sequences and obtain a distance matrix of the genomes, and built a phylogenetic tree with a neighbor-joining method. The resulting tree shows that the major lineages are consistent with the result based on their 16S rRNA sequences. Our method discovered one phylogenetic feature, derived from the genome sequences and the encoded genes, that can rebuild the phylogenetic tree correctly. The mapping of a genome sequence to its new form representing the relative positions of the functional genes provides a new way to measure phylogenetic relationships, and with a more specific classification of gene functions the result could be more sensitive.
