2.
Plant Genome ; 16(2): e20317, 2023 06.
Article in English | MEDLINE | ID: mdl-36896476

ABSTRACT

Fully understanding traditional Chinese medicines (TCMs) is still challenging because of the extreme complexity of their chemical components and mechanisms of action. The TCM Plant Genome Project aimed to obtain genetic information, determine gene functions, discover regulatory networks of herbal species, and elucidate the molecular mechanisms involved in the disease prevention and treatment, thereby accelerating the modernization of TCMs. A comprehensive database that contains TCM-related information will provide a vital resource. Here, we present an integrative genome database of TCM plants (IGTCM) that contains 14,711,220 records of 83 annotated TCM-related herb genomes, including 3,610,350 genes, 3,534,314 proteins and corresponding coding sequences, and 4,032,242 RNAs, as well as 1033 non-redundant component records for 68 herbs, downloaded and integrated from the GenBank and RefSeq databases. For minimal interconnectivity, each gene, protein, and component was annotated using the eggNOG-mapper tool and Kyoto Encyclopedia of Genes and Genomes database to acquire pathway information and enzyme classifications. These features can be linked across several species and different components. The IGTCM database also provides visualization and sequence similarity search tools for data analyses. These annotated herb genome sequences in IGTCM database are a necessary resource for systematically exploring genes related to the biosynthesis of compounds that have significant medicinal activities and excellent agronomic traits that can be used to improve TCM-related varieties through molecular breeding. It also provides valuable data and tools for future research on drug discovery and the protection and rational use of TCM plant resources. The IGTCM database is freely available at http://yeyn.group:96/.


Subject(s)
Drugs, Chinese Herbal , Medicine, Chinese Traditional , Drugs, Chinese Herbal/chemistry , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use
3.
Front Endocrinol (Lausanne) ; 13: 882279, 2022.
Article in English | MEDLINE | ID: mdl-36176465

ABSTRACT

Background: This study aimed to establish and validate an accurate prognostic model, based on demographic and clinical parameters, for predicting the cancer-specific survival (CSS) of patients with poorly differentiated thyroid carcinoma (PDTC). Materials and methods: Patients diagnosed with PDTC between 2004 and 2015 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The data were randomly split into training and validation sets. Kaplan-Meier analysis with the log-rank test was performed to compare the survival distribution among cases. Univariate and multivariate Cox proportional hazards regression analyses were used to identify independent prognostic factors, which were subsequently utilized to construct a nomogram for predicting the 5- and 10-year cancer-specific survival of patients with PDTC. The discriminative ability and calibration of the nomogram model were assessed using the concordance index and calibration plots, respectively. In addition, we performed a decision curve analysis to assess the clinical value of the nomogram. Simultaneously, we compared the predictive performance of the nomogram model against that of the American Joint Committee on Cancer (AJCC) T-, N-, M-stage. Results: A total of 970 eligible patients were randomly assigned to either a training cohort (n = 679) or a validation cohort (n = 291). The Kaplan-Meier analysis revealed that there were no significant differences in cumulative survival based on the race, radiation, and marital status of patients. The stepwise Cox regression model showed that the model was optimal when the following five variables were included: age, tumor size, T-, N-, and M-stage. A nomogram was developed as a graphical representation of the model and exhibited good calibration and discriminative ability in the study.
Compared to the T-, N-, and M-stage, the C-index of nomogram (training group: 0.807, validation group: 0.802), the areas under the receiver operating characteristic curve of the training set (5-year AUC: 0.843, 10-year AUC:0.834) and the validation set (5-year AUC:0.878, 10-year AUC:0.811), and the calibration plots of this model all exhibited better performance. At last, compared with T-, N-, and M-stage, the decision curve analysis indicated that the nomogram had excellent clinical net benefit. Conclusions: The nomogram developed by us can accurately predict the CSS of PDTC patients. It can help clinicians determine appropriate treatment strategies for poorly differentiated thyroid carcinoma patients.
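
The concordance index (C-index) reported in this abstract measures how often a model ranks patients correctly by risk. The following is a minimal sketch of Harrell's C-index in plain Python, not the authors' implementation; the function name `concordance_index` and the toy inputs are assumptions made for illustration only.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. cancer-specific death) was observed, 0 if censored
    risk   : model risk score; higher should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # model ranks the pair correctly
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the SEER cohort).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.3, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.807 and 0.802 should be read.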


Subject(s)
Adenocarcinoma , Thyroid Neoplasms , Adenocarcinoma/pathology , Humans , Neoplasm Staging , Nomograms , SEER Program , Thyroid Neoplasms/epidemiology , Thyroid Neoplasms/therapy
4.
Front Endocrinol (Lausanne) ; 13: 830760, 2022.
Article in English | MEDLINE | ID: mdl-35360080

ABSTRACT

Purpose: Anaplastic thyroid carcinoma (ATC) and primary squamous cell carcinoma of the thyroid (PSCCTh) have similar histological findings and are currently treated using the same approaches; however, the characteristics and prognosis of these cancers are poorly researched. The objective of this study was to determine the differences in characteristics between ATC and PSCCTh and establish prognostic models. Patients and Methods: All variables of patients with ATC and PSCCTh, diagnosed from 2004 to 2015, were retrieved from the Surveillance, Epidemiology, and End Results Program (SEER) database. Percentage differences for categorical data were compared using the Chi-square test. Kaplan-Meier curves, the log-rank test, and Cox regression were used for survival analysis, and the C-index was used to evaluate the performance of the prognostic models. Results: After application of the inclusion and exclusion criteria, a total of 1164 ATC and 124 PSCCTh patients, diagnosed from 2004 to 2015, were included in the study. There were no differences in sex, ethnicity, age, marital status, or percentage of proximal metastases between the two cancers; however, radiotherapy, chemotherapy, incidence of surgical treatment, and presence of multiple primary tumors were higher in patients with ATC than those with PSCCTh. Furthermore, the cancer-specific survival (CSS) of patients with PSCCTh was better than that of patients with ATC. Prognostic factors were not identical for the two cancers. Multivariate Cox model analysis indicated that age, sex, radiotherapy, chemotherapy, surgery, multiple primary tumors, marital status, and distant metastasis status are independent prognostic factors for CSS in patients with ATC, while for patients with PSCCTh, the corresponding factors are age, radiotherapy, multiple primary tumors, and surgery. The C-index values of the two models were both > 0.8, indicating that the models exhibited good discriminative ability.
Conclusion: Prognostic factors influencing CSS were not identical in patients with ATC and PSCCTh. These findings indicate that different clinical treatment and management plans are required for patients with these two types of thyroid cancer.


Subject(s)
Carcinoma, Squamous Cell , Thyroid Carcinoma, Anaplastic , Thyroid Neoplasms , Carcinoma, Squamous Cell/epidemiology , Carcinoma, Squamous Cell/therapy , Epithelial Cells/pathology , Humans , Prognosis , Thyroid Carcinoma, Anaplastic/epidemiology , Thyroid Carcinoma, Anaplastic/therapy , Thyroid Neoplasms/diagnosis , Thyroid Neoplasms/epidemiology , Thyroid Neoplasms/therapy
5.
Comput Struct Biotechnol J ; 19: 4042-4048, 2021.
Article in English | MEDLINE | ID: mdl-34527183

ABSTRACT

Studies on codon property would deepen our understanding of the origin of primitive life and enlighten biotechnical application. Here, we proposed a quantitative measurement of codon-amino acid association and found that seven out of 13 physicochemical properties have stronger associations with the nucleotide identity at the second codon position, indicating that protein structure and function may associate more closely with it than the other two sites. When extending the effect of codon-amino acid association to protein level, it was found that the correlation between the second codon position (measured by the relative frequencies of nucleobase T and A at this codon site) and hydrophobicity (by the form of GRAVY value) became stronger with 96% genomes having R > 0.90 and p < 1e-60. Furthermore, we revealed that informational genes encoding proteins have lower GRAVY values than operational proteins (p < 3e-37) in both prokaryotic and eukaryotic genomes. The above results reveal a complete link from codon identity (A2 versus T2) to amino acid property (hydrophilic versus hydrophobic) and then to protein functions (informational versus operational). Hence, our work may help to understand how the nucleotide sequence determines protein function.
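
The two quantities correlated in this abstract, nucleotide usage at the second codon position and the GRAVY value of the encoded protein, can be computed as in the sketch below. It assumes the standard genetic code and the Kyte-Doolittle hydropathy scale; the function names and the toy coding sequence are illustrative, and the exact frequency measure used by the authors is not given in the abstract.

```python
# Kyte-Doolittle hydropathy values; GRAVY is their mean over a protein.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds):
    """Translate up to, but not including, the first stop codon."""
    protein = []
    for codon in codons(cds):
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in protein) / len(protein)


def second_position_freq(cds):
    """Relative frequency of each base at the second codon position."""
    seconds = [codon[1] for codon in codons(cds)]
    return {base: seconds.count(base) / len(seconds) for base in BASES}


# Toy coding sequence (hypothetical): M F L K D followed by a stop codon.
toy_cds = "ATGTTTCTGAAAGATTAA"
print(translate(toy_cds))             # MFLKD
print(round(gravy(translate(toy_cds)), 2))  # 0.22
print(second_position_freq(toy_cds))
```

In the standard code, codons with T at the second position encode hydrophobic residues (F, L, I, M, V) and codons with A at that position encode hydrophilic ones, which is why a protein-level link between T2/A2 usage and GRAVY is plausible.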

6.
Database (Oxford) ; 2020, 2020 12 11.
Article in English | MEDLINE | ID: mdl-33306800

ABSTRACT

Essential genes are key elements for organisms to maintain their living. Building databases that store essential genes in the form of homologous clusters, rather than storing them as a singleton, can provide more enlightening information such as the general essentiality of homologous genes in multiple organisms. In 2013, the first database to store prokaryotic essential genes in clusters, CEG (Clusters of Essential Genes), was constructed. Afterward, the amount of available data for essential genes increased by a factor >3 since the last revision. Herein, we updated CEG to version 2, including more prokaryotic essential genes (from 16 gene datasets to 29 gene datasets) and newly added eukaryotic essential genes (nine species), specifically the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand-protein interaction, virulence factor and matched drugs, is also provided. Finally, we provided the service of essential gene prediction for both prokaryotes and eukaryotes. We hope our updated database will benefit more researchers in drug targets and evolutionary genomics. Database URL: http://cefg.uestc.cn/ceg.


Subject(s)
Eukaryota , Genes, Essential , Databases, Factual , Genes, Essential/genetics , Genomics , Humans , Proteins
7.
Int J Biol Sci ; 15(7): 1396-1403, 2019.
Article in English | MEDLINE | ID: mdl-31337970

ABSTRACT

Dendritic cells (DCs) are the most potent specialized antigen-presenting cells as now known, which play a crucial role in initiating and amplifying both the innate and adaptive immune responses. Immunologically, the motilities and T cell activation capabilities of DCs are closely related to the resulting immune responses. However, due to the complexity of the immune system, the dynamic changes in the number of cells during the peripheral tissue (e.g. skin and mucosa) immune response induced by DCs are still poorly understood. Therefore, this study simulated dynamic number changes of DCs and T cells in this process by constructing several ordinary differential equations and setting the initial conditions of the functions and parameters. The results showed that these equations could simulate dynamic numerical changes of DCs and T cells in peripheral tissue and lymph node, which was in accordance with the physiological conditions such as the duration of immune response, the proliferation rates and the motilities of DCs and T cells. This model provided a theoretical reference for studying the immunologic functions of DCs and practical guidance for the clinical DCs-based therapy against immune-related diseases.


Subject(s)
Dendritic Cells/cytology , Immunity, Cellular , Models, Theoretical , T-Lymphocytes/cytology , Antigens/immunology , Cell Movement , Cell Proliferation , Humans , Immunotherapy , Inflammation , Lymph Nodes/pathology , Lymphocyte Activation
8.
Genome Biol Evol ; 10(8): 2072-2085, 2018 08 01.
Article in English | MEDLINE | ID: mdl-30060177

ABSTRACT

Pandemic cholera is a major concern for public health because of its high mortality and morbidity. Mutation accumulation (MA) experiments were performed on a representative strain of the current cholera pandemic. Although the base-pair substitution mutation rates in Vibrio cholerae (1.24 × 10⁻¹⁰ per site per generation for wild-type lines and 3.29 × 10⁻⁸ for mismatch repair deficient lines) are lower than those previously reported in other bacteria using MA analysis, we discovered specific high rates (8.31 × 10⁻⁸ per site per generation for wild-type lines and 1.82 × 10⁻⁶ for mismatch repair deficient lines) of base duplication or deletion driven by large-scale copy number variations (CNVs). These duplication-deletions are located in two pathogenic islands, IMEX and the large integron island. Each element of these islands has a distinct rate of rapid integration and excision, which provides clues to the pandemicity evolution of V. cholerae. These results also suggest that large-scale structural variants such as CNVs can accumulate rapidly during short-term evolution. Mismatch repair deficient lines exhibit a significantly increased mutation rate in the larger chromosome (Chr1) at specific regions, and this pattern is not observed in wild-type lines. We propose that the high frequency of GATC sites in Chr1 improves the efficiency of MMR, resulting in similar rates of mutation in the wild-type condition. In addition, different mutation rates and spectra were observed in the MA lines under distinct growth conditions, including minimal media, rich media and antibiotic treatments.
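
The rates in this abstract are expressed per site per generation, which in a mutation accumulation design is conventionally the mutation count divided by the product of lines, generations per line, and analyzed sites. The sketch below illustrates that arithmetic; the input counts are hypothetical values chosen only to give a result of the same order as the reported wild-type rate and are not the study's data.

```python
def mutation_rate(mutations, lines, generations_per_line, sites):
    """Per-site, per-generation rate from pooled mutation accumulation lines."""
    denominator = lines * generations_per_line * sites
    if denominator <= 0:
        raise ValueError("lines, generations and sites must all be positive")
    return mutations / denominator


# Hypothetical inputs (NOT taken from the paper): 124 substitutions observed
# across 50 lines, each propagated for 5000 generations, over 4,000,000 sites.
example_rate = mutation_rate(124, 50, 5000, 4_000_000)
print(f"{example_rate:.2e}")

# Fold increase of mismatch repair deficient over wild-type lines,
# computed from the two substitution rates stated in the abstract.
fold_change = 3.29e-8 / 1.24e-10
print(round(fold_change))
```

The second calculation uses only the reported figures and shows that losing mismatch repair raises the substitution rate by roughly 265-fold.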


Subject(s)
Base Pairing/genetics , Cholera/epidemiology , Cholera/microbiology , Gene Deletion , Gene Duplication , Pandemics , Vibrio cholerae/genetics , Chromosomes, Bacterial/genetics , Culture Media , DNA Replication Timing/drug effects , Genomic Islands , Humans , Mutation Rate , Reproducibility of Results , Rifampin/pharmacology , Vibrio cholerae/drug effects
9.
BMC Syst Biol ; 11(1): 50, 2017 04 19.
Article in English | MEDLINE | ID: mdl-28420402

ABSTRACT

BACKGROUND: Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. Especially if a single reaction is catalyzed by multiple enzymes, then inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding gene. The existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions. But none of these databases especially focus on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. DESCRIPTION: Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms which are identified via flux balance analysis (FBA) combined with manual curation on experimentally validated metabolic network models. Quantitative data on the number of essential reactions, number of the essential reactions associated with their respective enzyme-encoding genes and shared essential reactions across organisms are the main contents of the database. CONCLUSION: SSER would be a prime source to obtain essential reactions data and related gene and metabolite information and it can significantly facilitate the metabolic network models reconstruction and analysis, and drug target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser .


Subject(s)
Computational Biology/methods , Databases, Factual , Metabolic Flux Analysis
10.
Environ Microbiol ; 19(3): 1266-1280, 2017 03.
Article in English | MEDLINE | ID: mdl-28028888

ABSTRACT

Laribacter hongkongensis is a fish-borne pathogen associated with invasive infections and gastroenteritis. Its adaptive mechanisms to oxygen-limiting conditions in various environmental niches remain unclear. In this study, we compared the transcriptional profiles of L. hongkongensis under aerobic and anaerobic conditions using RNA-sequencing. Expression of genes involved in arginine metabolism significantly increased under anoxic conditions. Arginine was exploited as the sole energy source in L. hongkongensis for anaerobic respiration via the arginine catabolism pathway: specifically via the arginine deiminase (ADI) pathway. A transcriptional regulator FNR was identified to coordinate anaerobic metabolism by tightly regulating the expression of arginine metabolism genes. FNR executed its regulatory function by binding to FNR boxes in arc operons promoters. Survival of isogenic fnr mutant in macrophages decreased significantly when compared with wild-type; and expression level of fnr increased 8 h post-infection. Remarkably, FNR directly interacted with ArgR, another regulator that influences the biological fitness and intracellular survival of L. hongkongensis by regulating arginine metabolism genes. Our results demonstrated that FNR and ArgR work in coordination to respond to oxygen changes in both extracellular and intracellular environments, by finely regulating the ADI pathway and arginine anabolism pathways, thereby optimizing bacterial fitness in various environmental niches.


Subject(s)
Arginine/metabolism , Bacterial Proteins/metabolism , Betaproteobacteria/physiology , Gene Expression Regulation, Bacterial , Iron-Sulfur Proteins/metabolism , Acclimatization , Adaptation, Physiological , Anaerobiosis , Bacterial Proteins/genetics , Betaproteobacteria/genetics , Hydrolases/metabolism , Iron-Sulfur Proteins/genetics , Metabolic Networks and Pathways , Operon , Promoter Regions, Genetic
11.
Sci Rep ; 6: 35082, 2016 10 07.
Article in English | MEDLINE | ID: mdl-27713529

ABSTRACT

A minimal gene set (MGS) is critical for the assembly of a minimal artificial cell. We have developed a proposal of simplifying bacterial gene set to approximate a bacterial MGS by the following procedure. First, we base our simplified bacterial gene set (SBGS) on experimentally determined essential genes to ensure that the genes included in the SBGS are critical. Second, we introduced a half-retaining strategy to extract persistent essential genes to ensure stability. Third, we constructed a viable metabolic network to supplement SBGS. The proposed SBGS includes 327 genes and required 431 reactions. This report describes an SBGS that preserves both self-replication and self-maintenance systems. In the minimized metabolic network, we identified five novel hub metabolites and confirmed 20 known hubs. Highly essential genes were found to distribute the connecting metabolites into more reactions. Based on our SBGS, we expanded the pool of targets for designing broad-spectrum antibacterial drugs to reduce pathogen resistance. We also suggested a rough semi-de novo strategy to synthesize an artificial cell, with potential applications in industry.


Subject(s)
Artificial Cells/metabolism , Genes, Bacterial/genetics , Genes, Essential/genetics , Metabolic Networks and Pathways/genetics , Bacterial Proteins/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Genomics/methods , Haemophilus influenzae/genetics , Mycoplasma genitalium/genetics
12.
Mol Biosyst ; 12(9): 2893-900, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27410247

ABSTRACT

Pseudo dinucleotide composition (PseDNC) and Z curve showed excellent performance in the classification issues of nucleotide sequences in bioinformatics. Inspired by the principle of Z curve theory, we improved PseDNC to give the phase-specific PseDNC (psPseDNC). In this study, we used the prediction of recombination spots as a case to illustrate the capability of psPseDNC and also PseDNC fused with Z curve theory based on a novel machine learning method named large margin distribution machine (LDM). We verified that combining the two widely used approaches could generate better performance compared to only using PseDNC with a support vector machine based (SVM-based) model. The best Matthews correlation coefficient (MCC) achieved by our LDM-based model was 0.7037 through the rigorous jackknife test and improved by ∼6.6%, ∼3.2%, and ∼2.4% compared with three previous studies. Similarly, the accuracy was improved by 3.2% compared with our previous iRSpot-PseDNC web server through an independent data test. These results demonstrate that the joint use of PseDNC and Z curve enhances performance and can extract more information from a biological sequence. To facilitate research in this area, we constructed a user-friendly web server for predicting hot/cold spots, HcsPredictor, which can be freely accessed from . In summary, we provided a united algorithm by integrating Z curve with PseDNC. We hope this united algorithm could be extended to other classification issues in DNA elements.


Subject(s)
Computational Biology/methods , DNA/chemistry , DNA/genetics , Nucleotides , Algorithms , Genome, Fungal , ROC Curve , Recombination, Genetic , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine , Web Browser
13.
Nucleic Acids Res ; 44(W1): W550-6, 2016 Jul 08.
Article in English | MEDLINE | ID: mdl-27150808

ABSTRACT

In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/.


Subject(s)
Computer Simulation , Drug Discovery/methods , Drug Evaluation, Preclinical/methods , Internet , Ligands , Proteins/chemistry , Software , Binding Sites , Imaging, Three-Dimensional , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , User-Computer Interface
14.
Int J Mol Sci ; 16(9): 23111-26, 2015 Sep 23.
Article in English | MEDLINE | ID: mdl-26404268

ABSTRACT

Composition bias from Chargaff's second parity rule (PR2) has long been found in sequenced genomes, and is believed to relate strongly with the replication process in microbial genomes. However, some disagreement on the underlying reason for strand composition bias remains. We performed an integrative analysis of various genomic features that might influence composition bias using a large-scale dataset of 1111 genomes. Our results indicate (1) the bias was stronger in obligate intracellular bacteria than in other free-living species (p-value=0.0305); (2) Fusobacteria and Firmicutes had the highest average bias among the 24 microbial phyla analyzed; (3) the strength of selected codon usage bias and generation times were not observably related to strand composition bias (p-value=0.3247); (4) significant negative relationships were found between GC content, genome size, rearrangement frequency, Clusters of Orthologous Groups (COG) functional subcategories A, C, I, Q, and composition bias (p-values<1.0×10⁻⁸); (5) gene density and COG functional subcategories D, F, J, L, and V were positively related with composition bias (p-value<2.2×10⁻¹⁶); and (6) gene density made the most important contribution to composition bias, indicating transcriptional bias was associated strongly with strand composition bias. Therefore, strand composition bias was found to be influenced by multiple factors with varying weights.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Base Composition , Gene Dosage , Genes, Bacterial , Principal Component Analysis , Recombination, Genetic
15.
Methods Mol Biol ; 1279: 205-17, 2015.
Article in English | MEDLINE | ID: mdl-25636621

ABSTRACT

Essential genes are those genes indispensable for the survival of any living cell. Bacterial essential genes constitute the cornerstones of synthetic biology and are often attractive targets in the development of antibiotics and vaccines. Because identifying essential genes by wet-lab methods is often costly and labor-intensive, researchers have turned to computational prediction as an alternative. To help address this issue, our research group (CEFG: group of Computational, Comparative, Evolutionary and Functional Genomics, http://cefg.uestc.edu.cn) has constructed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable for single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. To ensure reliable predictions, the investigated species should belong to the same family (for EGP) or phylum (for CEG_Match and Geptop) as one of the reference species. As pilot software for this problem, their prediction accuracies have been assessed and compared with those of existing algorithms; note that none of the other published algorithms is available as an online service. We hope these services at CEFG will help scientists and researchers in the field of essential genes.


Subject(s)
Computational Biology/methods , Genes, Bacterial , Genes, Essential , Area Under Curve , Base Sequence , Databases, Genetic , Escherichia coli K12/genetics , Evolution, Molecular , Genomics , Multigene Family
16.
Article in English | MEDLINE | ID: mdl-24923821

ABSTRACT

Knowledge of an organism's fitness for survival is important for a complete understanding of microbial genetics and effective drug design. Current essential gene databases provide only binary essentiality data from genome-wide experiments. We therefore developed a new database that Integrates quantitative Fitness Information for Microbial genes (IFIM). The IFIM database currently contains data from 16 experiments and 2186 theoretical predictions. The highly significant correlation between the experiment-derived fitness data and our computational simulations demonstrated that the computer-generated predictions were often as reliable as the experimental data. The data in IFIM can be accessed easily, and the interface allows users to browse through the gene fitness information that it contains. IFIM is the first resource that allows easy access to fitness data of microbial genes. We believe this database will contribute to a better understanding of microbial genetics and will be useful in designing drugs to resist microbial pathogens, especially when experimental data are unavailable. Database URL: http://cefg.uestc.edu.cn/ifim/ or http://cefg.cn/ifim/


Subject(s)
Databases, Genetic , Genes, Microbial , Genetic Fitness , Computational Biology , Data Collection , Electronic Data Processing , Gene Dosage , Genes, Bacterial , Software , User-Computer Interface
17.
Mol Biol Evol ; 31(5): 1302-8, 2014 May.
Article in English | MEDLINE | ID: mdl-24531082

ABSTRACT

Mutation is the ultimate source of genetic variation and evolution. Mutation accumulation (MA) experiments are an alternative approach to study de novo mutation events directly. We have constructed a resource of Spontaneous Mutation Accumulation Lines (SMAL; http://cefg.uestc.edu.cn/smal), which contains all the current publicly available MA lines identified by high-throughput sequencing. We have relocated and mapped the mutations based on the most recent genome annotations. A total of 5,608 single base mutations and 540 other mutations were obtained and are recorded in the current version of the SMAL database. The integrated data in SMAL provide detailed information that can be used in new theoretical analyses. We believe that the SMAL resource will help researchers better understand the processes of genetic variation and the incidence of disease.


Subject(s)
Databases, Genetic , Mutation , Animals , Drosophila melanogaster/genetics , Escherichia coli/genetics , Evolution, Molecular , Female , Genetic Drift , Genetic Fitness , Genomics , High-Throughput Nucleotide Sequencing , Humans , Male , Models, Genetic , Salmonella typhimurium/genetics
18.
BMC Genomics ; 14: 769, 2013 Nov 09.
Article in English | MEDLINE | ID: mdl-24209780

ABSTRACT

BACKGROUND: Essential genes are indispensable for the survival of living entities. They are the cornerstones of synthetic biology, and are potential candidate targets for antimicrobial and vaccine design. DESCRIPTION: Here we describe the Cluster of Essential Genes (CEG) database, which contains clusters of orthologous essential genes. Based on the size of a cluster, users can easily decide whether an essential gene is conserved in multiple bacterial species or is species-specific. It contains the similarity value of every essential gene cluster against human proteins or genes. The CEG_Match tool is based on the CEG database, and was developed for prediction of essential genes according to function. The database is available at http://cefg.uestc.edu.cn/ceg. CONCLUSIONS: Properties contained in the CEG database, such as cluster size, and the similarity of essential gene clusters against human proteins or genes, are very important for evolutionary research and drug design. An advantage of CEG is that it clusters essential genes based on function, and therefore decreases false positive results when predicting essential genes in comparison with using the similarity alignment method.


Subject(s)
Databases, Genetic , Genes, Essential , Internet , Algorithms , Humans , Microarray Analysis , Software , Species Specificity
19.
PLoS One ; 8(8): e72343, 2013.
Article in English | MEDLINE | ID: mdl-23977285

ABSTRACT

Integrative genomics predictors, which score highly in predicting bacterial essential genes, would be unfeasible in most species because the data sources are limited. We developed a universal approach and tool designated Geptop, based on orthology and phylogeny, to offer gene essentiality annotations. In a series of tests, our Geptop method yielded higher area under curve (AUC) scores in the receiver operating characteristic curves than the integrative approaches. In the ten-fold cross-validations among randomly upset samples, Geptop yielded an AUC of 0.918, and in the cross-organism predictions for 19 organisms Geptop yielded AUC scores between 0.569 and 0.959. A test applied to the very recently determined essential gene dataset from Porphyromonas gingivalis, which belongs to a phylum different from those of all of the above 19 bacterial genomes, gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes uniquely identified by the lethal screening, the essential genes predicted only by Geptop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further illustrate the reliability and feasibility of our method in some sense. The web server and standalone version of Geptop are available at http://cefg.uestc.edu.cn/geptop/ free of charge. The tool has been run on 968 bacterial genomes and the results are accessible at the website.
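
The AUC scores quoted in this abstract can be read as the probability that a randomly chosen essential gene receives a higher essentiality score than a randomly chosen non-essential gene. The sketch below computes AUC through that pairwise (Mann-Whitney) formulation; it is a generic illustration, not Geptop's code, and the function name and toy scores are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores : predicted essentiality score per gene (higher = more likely essential)
    labels : 1 for experimentally essential genes, 0 for non-essential genes
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # essential gene ranked above non-essential gene
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores for five genes, two of them essential).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
```

On this scale 0.5 is chance level, so the cross-organism range of 0.569 to 0.959 spans near-random to near-perfect ranking. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for whole-genome score lists.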


Subject(s)
Genes, Essential , Genome, Bacterial , Gram-Negative Bacteria/genetics , Gram-Positive Bacteria/genetics , Software , Area Under Curve , Bacterial Proteins/genetics , Gram-Negative Bacteria/classification , Gram-Positive Bacteria/classification , Molecular Sequence Annotation , Phylogeny , Protein Interaction Mapping , ROC Curve , Reproducibility of Results
20.
DNA Res ; 19(6): 477-85, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23132389

ABSTRACT

There has been significant progress in understanding the process of protein translation in recent years. One of the best examples is the discovery of usage bias in successive synonymous codons and its role in eukaryotic translation efficiency. We observed here a similar type of bias in the other two life domains, bacteria and archaea, although the bias strength was much smaller than in eukaryotes. Among 136 prokaryotic genomes, 98 were found to have significant bias from random use of successive synonymous codons with Z scores larger than three. Furthermore, significantly different bias strengths were found between prokaryotes grouped by various genomic or biochemical characteristics. Interestingly, the bias strength measured by a general Z score could be fitted well (R = 0.83, P < 10⁻¹⁵) by three genomic variables: genome size, G + C content, and tRNA gene number based on multiple linear regression. A different distribution of synonymous codon pairs between protein-coding genes and intergenic sequences suggests that bias is caused by translation selection. The present results indicate that protein translation is tuned by codon (pair) usage, and the intensity of the regulation is associated with genome size, tRNA gene number, and G + C content.


Subject(s)
Archaea/genetics , Bacteria/genetics , Codon/genetics , Eukaryota/genetics , Genome/genetics , Protein Biosynthesis/genetics , Animals , Base Composition , Evolution, Molecular , Gene Dosage , Genome Size , Humans , Linear Models , RNA, Transfer/genetics , Species Specificity