1.
BMC Genomics ; 25(1): 671, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38970011

ABSTRACT

BACKGROUND: The dirigent (DIR) genes encode proteins that act as crucial regulators of plant lignin biosynthesis. In Solanaceae species, members of the DIR gene family are intricately related to plant growth and development, playing a key role in responding to various biotic and abiotic stresses. Analyzing the DIR gene family and its expression profile under various pathogen stresses in Solanaceae species is therefore of considerable practical value. RESULTS: A total of 57 tobacco NtDIRs and 33 potato StDIRs were identified based on their respective genome sequences. Phylogenetic analysis of DIR genes in tobacco, potato, eggplant and Arabidopsis thaliana revealed three distinct subgroups (DIR-a, DIR-b/d and DIR-e). Gene structure and conserved motif analysis showed a high degree of conservation in both exon/intron organization and protein motifs among tobacco and potato DIR genes, especially within members of the same subfamily. A total of 8 pairs of tandem duplication genes (3 pairs in tobacco, 5 pairs in potato) and 13 pairs of segmental duplication genes (6 pairs in tobacco, 7 pairs in potato) were identified based on the analysis of gene duplication events. Cis-regulatory elements of the DIR promoters participated in hormone response, stress responses, circadian control, endosperm expression, and meristem expression. Transcriptomic data analysis under biotic stress revealed diverse response patterns among DIR gene family members to pathogens, indicating their functional divergence. At 96 h post-inoculation with Ralstonia solanacearum L. (Ras), tobacco seedlings exhibited typical symptoms of tobacco bacterial wilt. The qRT-PCR analysis of 11 selected NtDIR genes displayed differential expression patterns in response to infection by the bacterial pathogen Ras. Using line 392278 of potato as material, typical symptoms of potato late blight manifested on the seedling leaves under Phytophthora infestans infection. The qRT-PCR analysis of 5 selected StDIR genes showed up-regulation in response to pathogen infection. Notably, three clustered genes (NtDIR2, NtDIR4, StDIR3) exhibited a robust response to pathogen infection, highlighting their essential roles in disease resistance. CONCLUSION: The genome-wide identification, evolutionary analysis, and expression profiling of DIR genes in response to various pathogen infections in tobacco and potato have provided valuable insights into the roles of these genes under various stress conditions. Our results could provide a basis for further functional analysis of the DIR gene family under pathogen infection conditions.


Subject(s)
Evolution, Molecular , Multigene Family , Nicotiana , Phylogeny , Plant Proteins , Solanum tuberosum , Solanum tuberosum/genetics , Solanum tuberosum/microbiology , Nicotiana/genetics , Nicotiana/microbiology , Plant Proteins/genetics , Gene Expression Regulation, Plant , Plant Diseases/microbiology , Plant Diseases/genetics , Stress, Physiological/genetics , Promoter Regions, Genetic , Gene Duplication , Ralstonia solanacearum , Genes, Plant
2.
J Proteome Res ; 23(1): 494-499, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38069805

ABSTRACT

Plant-pathogen protein-protein interactions (PPIs) play crucial roles in the arms race between plants and pathogens. Therefore, the identification of these interspecies PPIs is very important for the mechanistic understanding of pathogen infection and plant immunity. Computational prediction methods can complement experimental efforts, but their predictive performance still needs to be improved. Motivated by the rapid development of natural language processing and its successful applications in the field of protein bioinformatics, here we present an improved XGBoost-based plant-pathogen PPI predictor (i.e., AraPathogen2.0), in which sequence encodings from the pretrained protein language model ESM2 and Arabidopsis PPI network-related node representations from the graph embedding technique struc2vec are used as input. Stringent benchmark experiments showed that AraPathogen2.0 could achieve a better performance than its predecessor, especially when processing test data sets containing novel proteins unseen in the training data.


Subject(s)
Arabidopsis , Protein Interaction Mapping , Protein Interaction Mapping/methods , Natural Language Processing , Plants , Proteins/metabolism , Arabidopsis/metabolism
3.
Int J Mol Sci ; 24(6), 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36982696

ABSTRACT

Transcription factors (TFs) play critical roles in mediating the plant response to various abiotic stresses, particularly heat stress. Plants respond to elevated temperatures by modulating the expression of genes involved in diverse metabolic pathways, a regulatory process primarily governed by multiple TFs in a networked configuration. Many TFs, such as WRKY, MYB, NAC, bZIP, zinc finger protein, AP2/ERF, DREB, ERF, bHLH, and brassinosteroids, are associated with heat shock factor (Hsf) families, and are involved in heat stress tolerance. These TFs hold the potential to control multiple genes, which makes them ideal targets for enhancing the heat stress tolerance of crop plants. Despite their immense importance, only a small number of heat-stress-responsive TFs have been identified in rice. The molecular mechanisms underpinning the role of TFs in rice adaptation to heat stress still need to be researched. This study identified three TF genes, including OsbZIP14, OsMYB2, and OsHSF7, by integrating transcriptomic and epigenetic sequencing data analysis of rice in response to heat stress. Through comprehensive bioinformatics analysis, we demonstrated that OsbZIP14, one of the key heat-responsive TF genes, contained a basic-leucine zipper domain and primarily functioned as a nuclear TF with transcriptional activation capability. By knocking out the OsbZIP14 gene in the rice cultivar Zhonghua 11, we observed that the knockout mutant OsbZIP14 exhibited dwarfism with reduced tiller during the grain-filling stage. Under high-temperature treatment, it was also demonstrated that in the OsbZIP14 mutant, the expression of the OsbZIP58 gene, a key regulator of rice seed storage protein (SSP) accumulation, was upregulated. Furthermore, bimolecular fluorescence complementation (BiFC) experiments uncovered a direct interaction between OsbZIP14 and OsbZIP58. 
Our results suggested that OsbZIP14 acts as a key TF gene through the concerted action of OsbZIP58 and OsbZIP14 during rice grain filling under heat stress. These findings not only provide good candidate genes for genetic improvement of rice but also offer valuable scientific insights into the mechanism of heat stress tolerance in rice.


Subject(s)
Oryza , Humans , Oryza/metabolism , RNA-Seq , Chromatin Immunoprecipitation Sequencing , Transcription Factors/genetics , Transcription Factors/metabolism , Heat-Shock Response/genetics , Stress, Physiological/genetics , Plants/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Gene Expression Regulation, Plant
4.
Genes (Basel) ; 13(8), 2022 Jul 28.
Article in English | MEDLINE | ID: mdl-36011264

ABSTRACT

The availability of large-scale genomic data resources makes it very convenient to mine and analyze genes that are related to important agricultural traits in rice. Pan-genomes have been constructed to provide insight into the genome diversity and functionality of different plants, which can be used in genome-assisted crop improvement. Thus, a pan-genome comprising all genetic elements is crucial for comprehensive variation study among the heat-resistant and -susceptible rice varieties. In this study, a rice pan-genome was firstly constructed by using 45 heat-tolerant and 15 heat-sensitive rice varieties. A total of 38,998 pan-genome genes were identified, including 37,859 genes in the reference and 1141 in the non-reference contigs. Genomic variation analysis demonstrated that a total of 76,435 SNPs were detected and identified as the heat-tolerance-related SNPs, which were specifically present in the highly heat-resistant rice cultivars and located in the genic regions or within 2 kbp upstream and downstream of the genes. Meanwhile, 3214 upregulated and 2212 downregulated genes with heat stress tolerance-related SNPs were detected in one or multiple RNA-seq datasets of rice under heat stress, among which 24 were located in the non-reference contigs of the rice pan-genome. We then mapped the DEGs with heat stress tolerance-related SNPs to the heat stress-resistant QTL regions. A total of 1677 DEGs, including 990 upregulated and 687 downregulated genes, were mapped to the 46 heat stress-resistant QTL regions, in which 2 upregulated genes with heat stress tolerance-related SNPs were identified in the non-reference sequences. This pan-genome resource is an important step towards the effective and efficient genetic improvement of heat stress resistance in rice to help meet the rapidly growing needs for improved rice productivity under different environmental stresses. 
These findings provide further insight into the functional validation of a number of non-reference genes and, especially, the two genes identified in the heat stress-resistant QTLs in rice.


Subject(s)
Oryza , Thermotolerance , Genes, Plant , Oryza/genetics , Quantitative Trait Loci/genetics , Thermotolerance/genetics , Transcriptome
5.
Genes (Basel) ; 13(2), 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35205268

ABSTRACT

Due to global warming, high temperature is a significant environmental stress for rice production. Rice (Oryza sativa L.), one of the most crucial cereal crops, is also seriously devastated by Magnaporthe oryzae. Therefore, it is essential to breed new rice cultivars with blast resistance and heat tolerance. Although progress has been made in QTL mapping and RNA-seq analysis of rice responses to blast and heat stresses, there are few reports on simultaneously mining blast-resistant and heat-tolerant genes. In this study, we separately conducted meta-analyses of 839 blast-resistant and 308 heat-tolerant QTLs in rice. Consequently, 7054 genes were identified in 67 blast-resistant meta-QTLs with an average interval of 1.00 Mb. Likewise, 6425 genes were obtained in 40 heat-tolerant meta-QTLs with an average interval of 1.49 Mb. Additionally, using differentially expressed genes (DEGs) from previous research and GO enrichment analysis, 55 DEGs were co-located on the common regions of 16 blast-resistant and 14 heat-tolerant meta-QTLs. Among them, OsChib3H-c, OsJAMyb, Pi-k, OsWAK1, OsMT2b, OsTPS3, OsHI-LOX, OsACLA-2 and OsGS2 were the significant candidate genes to be further investigated. These results could provide gene resources for breeding rice with excellent resistance to these 2 stresses, and help to understand how plants respond to the combined stresses of blast fungus and high temperature.


Subject(s)
Oryza , Thermotolerance , Oryza/genetics , Oryza/microbiology , Plant Breeding , Quantitative Trait Loci , RNA-Seq , Thermotolerance/genetics
6.
Arch Insect Biochem Physiol ; 105(1): e21693, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32436316

ABSTRACT

The fruit fly Drosophila melanogaster can be used as a model organism for studying various problems in biomedicine and pest management. A large number of fruit fly transcriptomes have been profiled in various cell types, tissues, development stages, toxicological exposures, and other conditions by microarray. However, no database has yet been developed for exploring these valuable data. Microarray data for 4,367 samples from the National Center for Biotechnology Information Gene Expression Omnibus were collected and analyzed with the weighted gene coexpression network analysis algorithm. Fifty-one gene coexpression modules that are related to cell types, tissues, development stages, and other experimental conditions were identified. The high-dimensional gene expression data were reduced to tens of modules that were associated with experiments/traits, representing signatures for phenotypes. Six modules were enriched with genomic regions of clustered genes. Hub genes could also be screened by intramodule connectivity. By analyzing higher-order module networks, we found that cell signaling modules are more connected than other modules. Module-based gene function identification may help to discover novel gene functions. An easy-to-use database was developed, which provides a new source for gene function study in the fruit fly (http://bioinformatics.fafu.edu.cn/fly/).


Subject(s)
Databases, Genetic , Drosophila melanogaster/genetics , Gene Expression , Animals
7.
Mol Genet Genomics ; 295(4): 1055-1062, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32222838

ABSTRACT

DrugMatrix is a valuable toxicogenomic dataset, which provides in vivo transcriptome data corresponding to hundreds of chemical drugs. However, the relationships between drugs and how those drugs affect biological processes are still unknown. The high dimensionality of the microarray data hinders its application. The aims of this study are to (1) represent the transcriptome data by lower-dimensional vectors, (2) compare drug similarity, (3) represent drug combinations by adding vectors and (4) infer drug mechanism of action (MoA) and genotoxicity features. We borrowed the latent semantic analysis (LSA) technique from natural language processing to represent treatments (drugs with multiple concentrations and time points) by dense vectors, each dimension of which is an orthogonal biological feature. The gProfiler enrichment tool was used to annotate the features of the 100-dimensional vectors. The similarity between treatment vectors was calculated by the cosine function. Adding vectors may represent drug combinations, treatment times or treatment doses that are not present in the original data. Drug-drug interaction pairs had a higher similarity than random drug pairs in the hepatocyte data. The vector features helped to reveal the MoA. Differential feature expression was also implicated for genotoxic and non-genotoxic carcinogens. An easy-to-use Web tool was developed with the Shiny Web application framework for the exploration of treatment similarities and drug combinations (https://bioinformatics.fafu.edu.cn/drugmatrix/). We represented treatments by vectors and provided a tool that is useful for hypothesis generation in toxicogenomics, such as drug similarity, drug repurposing, combination therapy and MoA.


Subject(s)
Drug Combinations , Drug Interactions , Software , Toxicogenetics/methods , Algorithms , Databases, Pharmaceutical/trends , Hepatocytes/drug effects , Hepatocytes/metabolism , Humans , Transcriptome/genetics
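The vector representation described in record 7 can be illustrated with a minimal sketch. It assumes that LSA is realized as a truncated SVD of a treatment-by-gene matrix, which is the standard formulation; the abstract does not state the preprocessing or weighting used, and the toy matrix, the function names `lsa_embed` and `cosine`, and the choice of k=2 are all hypothetical (the study reports 100 dimensions).

```python
import numpy as np

def lsa_embed(expr, k):
    """Project each row (treatment) onto the top-k latent dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(expr, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy data: 4 treatments x 5 genes (not taken from DrugMatrix).
expr = np.array([
    [2.0, 1.9, 0.1, 0.0, 0.2],  # treatment A
    [1.8, 2.1, 0.0, 0.1, 0.1],  # treatment B, similar profile to A
    [0.1, 0.0, 2.2, 1.9, 0.0],  # treatment C
    [0.0, 0.2, 2.0, 2.1, 0.1],  # treatment D, similar profile to C
])

vecs = lsa_embed(expr, k=2)
sim_ab = cosine(vecs[0], vecs[1])   # similar treatments
sim_ac = cosine(vecs[0], vecs[2])   # dissimilar treatments
combo_ac = vecs[0] + vecs[2]        # vector addition as a stand-in for a combination
sim_combo_a = cosine(combo_ac, vecs[0])
```

In this toy setting the summed vector sits between its two components, which is the intuition behind using vector addition to represent a combination.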
8.
Rice (N Y) ; 12(1): 97, 2019 Dec 23.
Article in English | MEDLINE | ID: mdl-31872320

ABSTRACT

BACKGROUND: Rice (Oryza sativa L.) yield is limited inherently by environmental stresses, including biotic and abiotic stresses. Thus, it is of great importance to perform in-depth explorations of the genes that are closely associated with stress-resistant traits in rice. The existing rice SNP databases have made considerable contributions to rice genomic variation information, but none of them has a particular focus on integrating stress-resistant variation and related phenotype data into one web resource. RESULTS: The Rice Stress-Resistant SNP database (http://bioinformatics.fafu.edu.cn/RSRS) mainly focuses on SNPs specific to biotic and abiotic stress resistance in rice, and presents them in a unified web resource platform. The Rice Stress-Resistant SNP (RSRS) database contains over 9.5 million stress-resistant SNPs and 797 stress-resistant candidate genes in rice, which were detected from more than 400 stress-resistant rice varieties. We incorporated SNP function, genome annotation and phenotype information into this database. In addition, the database has a user-friendly web interface for users to query, browse and visualize a specific SNP efficiently. The RSRS database allows users to query SNP information and relevant annotations for one or more varieties. The search results can be visualized graphically in a genome browser or displayed in formatted tables. Users can also align SNPs between two or more rice accessions. CONCLUSION: The RSRS database shows great utility for scientists to further characterize the function of variants related to environmental stress resistance in rice.

9.
Planta ; 249(5): 1487-1501, 2019 May.
Article in English | MEDLINE | ID: mdl-30701323

ABSTRACT

MAIN CONCLUSION: A comprehensive network of the Arabidopsis transcriptome was analyzed and may serve as a valuable resource for candidate gene function investigations. A web tool to explore module information was also provided. Arabidopsis thaliana is a widely studied model plant whose transcriptome has been substantially profiled in various tissues, development stages and other conditions. These data can be reused for research on gene function through a systematic analysis of gene co-expression relationships. We collected microarray data from National Center for Biotechnology Information Gene Expression Omnibus, identified modules of co-expressed genes and annotated module functions. These modules were associated with experiments/traits, which provided potential signature modules for phenotypes. Novel heat shock proteins were implicated according to guilt by association. A higher-order module networks analysis suggested that the Arabidopsis network can be further organized into 15 meta-modules and that a chloroplast meta-module has a distinct gene expression pattern from the other 14 meta-modules. A comparison with the rice transcriptome revealed preserved modules and KEGG pathways. All the module gene information was available from an online tool at http://bioinformatics.fafu.edu.cn/arabi/ . Our findings provide a new source for future gene discovery in Arabidopsis.


Subject(s)
Arabidopsis/genetics , Transcriptome/genetics , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation, Plant/genetics , Gene Expression Regulation, Plant/physiology , Promoter Regions, Genetic/genetics
10.
Plants (Basel) ; 8(2), 2019 Jan 23.
Article in English | MEDLINE | ID: mdl-30678057

ABSTRACT

Rice blast, caused by the fungus Magnaporthe grisea (M. grisea), reduces rice yields widely and destructively, threatening global food security. Although many resistance genes have been isolated and identified in various rice varieties, this is still not enough to clearly understand the mechanism of race-specific resistance in rice, especially at the protein level. In this research, proteomic methods were employed to analyze the differentially expressed proteins (DEPs) in the susceptible rice variety CO39 and its two near isogenic lines (NILs), CN-4a and CN-4b, in response to infection by two isolates with different pathogenicity, GUY11 and 81278ZB15. A total of 50 DEPs with more than 1.5-fold reproducible change were identified. At 24 and 48 hpi of GUY11, 32 and 16 proteins in CN-4b were up-regulated, among which 16 and five paralleled the expression of their corresponding RNAs. Moreover, 13 of the 50 DEPs were reported to be induced by M. grisea in previous publications. Considering the phenotypes of the three tested rice varieties, we found that 21 and 23 up-regulated proteins were responsible for rice resistance to the two different blast isolates, 81278ZB15 and GUY11, respectively. Two distinct branches corresponding to GUY11 and 81278ZB15 were observed in the expression and function of the module cluster of DEPs, indicating that the DEPs could be responsible for race-specific resistance in rice. In other words, DEPs in rice participate in different expression patterns and functional modules in response to infection by different pathogenic races, inducing race-specific resistance in rice.

11.
Brief Bioinform ; 20(1): 274-287, 2019 Jan 18.
Article in English | MEDLINE | ID: mdl-29028906

ABSTRACT

The identification of plant-pathogen protein-protein interactions (PPIs) is an attractive and challenging research topic for deciphering the complex molecular mechanism of plant immunity and pathogen infection. Considering that the experimental identification of plant-pathogen PPIs is time-consuming and labor-intensive, computational methods are emerging as an important strategy to complement the experimental methods. In this work, we first evaluated the performance of traditional computational methods such as interolog, domain-domain interaction and domain-motif interaction in predicting known plant-pathogen PPIs. Owing to the low sensitivity of the traditional methods, we utilized Random Forest to build an inter-species PPI prediction model based on multiple sequence encodings and novel network attributes in the established plant PPI network. Critical assessment of the features demonstrated that the integration of sequence information and network attributes resulted in significant and robust performance improvement. Additionally, we also discussed the influence of Gene Ontology and gene expression information on the prediction performance. The Web server implementing the integrated prediction method, named InterSPPI, has been made freely available at http://systbio.cau.edu.cn/intersppi/index.php. InterSPPI could achieve a reasonably high accuracy with a precision of 73.8% and a recall of 76.6% in the independent test. To examine the applicability of InterSPPI, we also conducted cross-species and proteome-wide plant-pathogen PPI prediction tests. Taken together, we hope this work can provide a comprehensive understanding of the current status of plant-pathogen PPI predictions, and the proposed InterSPPI can become a useful tool to accelerate the exploration of plant-pathogen interactions.


Subject(s)
Plant Proteins/metabolism , Plants/metabolism , Plants/microbiology , Protein Interaction Mapping/methods , Algorithms , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis/microbiology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/immunology , Arabidopsis Proteins/metabolism , Computational Biology/methods , Databases, Protein/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Gene Ontology , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Machine Learning , Models, Biological , Plant Diseases/genetics , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Immunity/genetics , Plant Proteins/genetics , Plant Proteins/immunology , Plants/genetics , Protein Interaction Mapping/statistics & numerical data
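The precision and recall reported for InterSPPI in record 11 can be combined into a single F1 score (their harmonic mean). The sketch below only shows that arithmetic; the abstract itself does not report an F1 value, so the figure computed here is a derived illustration, and the function name `f1_score` is a local helper rather than part of any tool mentioned in the record.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as stated in the abstract (independent test).
precision = 0.738
recall = 0.766
f1 = f1_score(precision, recall)  # roughly 0.752
```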
12.
Brief Bioinform ; 20(2): 448-456, 2019 Mar 22.
Article in English | MEDLINE | ID: mdl-29040362

ABSTRACT

Rice blast disease caused by the fungus Magnaporthe grisea (M. grisea) is one of the most serious diseases of cultivated rice, Oryza sativa (O. sativa). Protein-protein interactions (PPIs) between rice and the fungus might be a key factor in both the disease and the defense against it. In this research, we developed a computational pipeline to predict PPIs between blast fungus and rice. After cross-prediction by interolog-based and domain-based methods, we obtained 532 potential PPIs between 27 fungus proteins and 236 rice proteins. The accuracies of the jackknife test, 10-fold cross-validation test and independent test for these PPIs were 90.43, 93.85 and 84.67%, respectively, using a support vector machine classification method. Meanwhile, the pathogenic genes of blast fungus were enriched in the predicted PPI network when compared with 1000 random interaction networks. The rice regulatory network was downloaded and divided into 228 subnetworks with over six nodes, and the top seven subnetworks affected by blast fungus through PPIs were investigated. The results indicated that 34 upregulated and 12 downregulated master regulators in rice interacted with the fungus proteins in response to blast fungus infection. The common master regulators in rice in response to infection by M. grisea, Xanthomonas oryzae pv. oryzae and rice stripe virus were analyzed. The ubiquitin proteasome pathway was the common pathway in rice regulated by these three pathogens, while the apoptosis signaling pathway was induced by the fungus and the bacterium. In summary, the results in this article provide insight into the process of blast fungus infection.


Subject(s)
Computational Biology/methods , Fungal Proteins/metabolism , Gene Regulatory Networks , Magnaporthe/metabolism , Oryza/metabolism , Plant Proteins/metabolism , Protein Interaction Maps , Fungal Proteins/genetics , Magnaporthe/pathogenicity , Oryza/microbiology , Plant Proteins/genetics
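Record 12 compares the predicted network against 1000 random interaction networks to assess enrichment of pathogenic genes. One common way to turn such a comparison into a significance estimate is an empirical p-value, sketched below. The abstract does not say how the random networks were generated or which statistic was used, so the sampling scheme, the proteome size, the annotation set and the observed count here are all hypothetical toy values.

```python
import random

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

rng = random.Random(0)
all_proteins = list(range(1000))  # hypothetical proteome of 1000 proteins
pathogenic = set(range(50))       # hypothetical set of annotated pathogenic genes
network_size = 30                 # hypothetical number of proteins in the predicted network
observed_hits = 8                 # hypothetical count of pathogenic genes in that network

# Null distribution: pathogenic-gene counts in 1000 random protein sets of the same size.
null_hits = []
for _ in range(1000):
    sample = rng.sample(all_proteins, network_size)
    null_hits.append(sum(1 for p in sample if p in pathogenic))

p_value = empirical_p_value(observed_hits, null_hits)
```

The +1 correction keeps the estimate above zero, so the smallest reportable p-value with 1000 random networks is 1/1001.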
13.
Exp Ther Med ; 16(2): 493-500, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30112021

ABSTRACT

The Connectivity Map (CMap) is a tool that has been extensively utilized to study drug repositioning and side-effect prediction. However, most of these analyses rely on signature genes, ignoring the pathways by which those genes are regulated, as well as the functional overlap of redundant genes. The present study utilized a systems biology approach referred to as Weighted Gene Co-expression Network Analysis (WGCNA) to dissect the transcriptional profiles of CMap and reveal these hidden factors. Seven common modules associated with protein binding, extracellular matrix organization and translation were identified. Furthermore, drugs were clustered based on module expression to infer their mechanism of action (MoA) based on common activity profiles. As an extension of this, an example of disease-based module projection to identify novel drugs was provided. The analysis developed in the present study may provide a novel framework for drug repositioning or discovering MoAs.

14.
Oncol Rep ; 39(6): 2527-2536, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29620224

ABSTRACT

Public transcriptome databases provide a valuable resource for genome-wide co-expression network analysis and investigation of the molecular mechanisms that underlie pathogenesis. To discover genes that may affect patient survival, a large-scale analysis of human colorectal cancer (CRC) datasets that were retrieved from the NCBI Gene Expression Omnibus was performed. A gene co-expression network was constructed using weighted gene co-expression network analysis (WGCNA). A total of 18 co-expressed gene modules were identified, of which two genes corresponded to cell migration and the cell cycle, two genes were involved in immune responses, two genes corresponded to mitochondrial function, and one gene corresponded to RNA splicing. A total of eight hub genes in the cell migration/extracellular matrix module were associated with poor prognosis in CRC, and the P-value for collagen type VI α3 chain (COL6A3) was the lowest. In silico analysis of cell type-specific gene expression and COL6A3 knockout experiments indicated the clinical relevance of COL6A3 in the development of CRC. In summary, the present analysis provides a basis for understanding the molecular characterization of CRC at the transcription level. COL6A3 may be a promising biomarker or target for the prognosis and treatment of CRC.


Subject(s)
Collagen Type VI/genetics , Colorectal Neoplasms/genetics , Down-Regulation , Gene Expression Profiling/methods , Apoptosis , Biomarkers, Tumor/genetics , Cell Line, Tumor , Cell Movement , Cell Proliferation , Computer Simulation , Databases, Genetic , Gene Expression Regulation, Neoplastic , Gene Knockout Techniques , Gene Regulatory Networks , Humans , Prognosis , Survival Analysis
15.
Oncol Lett ; 15(4): 4351-4357, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29541203

ABSTRACT

The stromal and immune cells that form the tumor microenvironment serve a key role in the aggressiveness of tumors. Current tumor-centric interpretations of cancer transcriptome data ignore the roles of stromal and immune cells. The aim of the present study was to investigate the clinical utility of stromal and immune cells in tissue-based transcriptome data. The 'Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data' (ESTIMATE) algorithm was used to probe diverse cancer datasets and the fraction of stromal and immune cells in tumor tissues was scored. The association between the ESTIMATE scores and patient survival data was assessed; it was indicated that the two scores have implications for patient survival, metastasis and recurrence. Analysis of a colorectal cancer progression dataset revealed that decreased levels of immune cells could serve an important role in cancer progression. The results of the present study indicated that transcriptome-derived stromal and immune scores may be a useful indicator of cancer prognosis.

16.
Cells ; 7(3), 2018 Mar 08.
Article in English | MEDLINE | ID: mdl-29518040

ABSTRACT

Network-based systems biology has become an important method for analyzing high-throughput gene expression data and gene function mining. Escherichia coli (E. coli) has long been a popular model organism for basic biological research. In this paper, the weighted gene co-expression network analysis (WGCNA) algorithm was applied to construct gene co-expression networks in E. coli. Thirty-one gene co-expression modules were detected from 1391 microarrays of E. coli data. Further characterization of these modules with the database for annotation, visualization, and integrated discovery (DAVID) tool showed that these modules are associated with several kinds of biological processes, such as carbohydrate catabolism, fatty acid metabolism, amino acid metabolism, transportation, translation, and ncRNA metabolism. Hub genes were also screened by intra-modular connectivity. Genes with unknown functions were annotated by guilt-by-association. Comparison with a previous prediction tool, EcoliNet, suggests that our dataset can expand gene predictions. In summary, 31 functional modules were identified in E. coli, 24 of which were functionally annotated. The analysis provides a resource for future gene discovery.
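The two WGCNA ideas mentioned in this abstract, a weighted co-expression network and hub-gene screening by intra-modular connectivity, can be sketched in a few lines. This is a simplified illustration, not the WGCNA R package: it assumes an unsigned network with adjacency |cor|^beta, takes the module membership as given rather than deriving it by clustering, and uses beta=6 and simulated data purely for demonstration (the abstract does not state the power or parameters used).

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta for a samples x genes matrix."""
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj

def intramodular_connectivity(adj, module_genes):
    """Sum of each module gene's connection weights to the other genes in the module."""
    sub = adj[np.ix_(module_genes, module_genes)]
    return sub.sum(axis=1)

# Simulated data: genes 0-2 share a common driver (one module), gene 3 is independent.
rng = np.random.default_rng(0)
n_samples = 50
driver = rng.normal(size=n_samples)
expr = np.column_stack([
    driver + 0.1 * rng.normal(size=n_samples),  # tightly follows the driver
    driver + 0.5 * rng.normal(size=n_samples),
    driver + 1.0 * rng.normal(size=n_samples),  # noisiest module member
    rng.normal(size=n_samples),                 # unrelated gene
])

adj = soft_threshold_adjacency(expr)
module = [0, 1, 2]
k_in = intramodular_connectivity(adj, module)
hub_gene = module[int(np.argmax(k_in))]  # gene with the highest intra-modular connectivity
```

Raising correlations to a power suppresses weak, likely spurious correlations while preserving strong ones, which is why the unrelated gene ends up almost disconnected from the module.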

17.
Biomed Rep ; 7(2): 153-158, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28804628

ABSTRACT

Network-based systems biology has become an important method for analyzing high-throughput gene expression data and gene function mining. Yeast has long been a popular model organism for biomedical research. In the current study, a weighted gene co-expression network analysis algorithm was applied to construct a gene co-expression network in Saccharomyces cerevisiae. Seventeen stable gene co-expression modules were detected from 2,814 S. cerevisiae microarray data. Further characterization of these modules with the Database for Annotation, Visualization and Integrated Discovery tool indicated that these modules were associated with certain biological processes, such as heat response, cell cycle, translational regulation, mitochondrion oxidative phosphorylation, amino acid metabolism and autophagy. Hub genes were also screened by intra-modular connectivity. Finally, the module conservation was evaluated in a human disease microarray dataset. Functional modules were identified in budding yeast, some of which are associated with patient survival. The current study provided a paradigm for single cell microorganisms and potentially other organisms.

18.
Brief Bioinform ; 18(2): 270-278, 2017 Mar 01.
Article in English | MEDLINE | ID: mdl-26970777

ABSTRACT

Heterotrimeric G protein signaling cascades are one of the primary metazoan sensing mechanisms linking a cell to its environment. However, the number of experimentally identified G protein effectors in plants is limited. We have therefore studied which tools are best suited for predicting G protein effectors in rice. Here, we compared the predictive performance of four classifiers with eight different encoding schemes on the effectors of G proteins by using 10-fold cross-validation. Four methods were evaluated: random forest, naive Bayes, K-nearest neighbors and support vector machine. We applied these methods to experimentally identified effectors of G proteins and randomly selected non-effector proteins, and tested their sensitivity and specificity. The results showed that the random forest classifier combined with the composition of K-spaced amino acid pairs and the composition of motifs or domains (CKSAAP_PROSITE_200) yielded the best performance, with accuracy and the Matthews correlation coefficient reaching 74.62% and 0.49, respectively. We have developed G-Effector, an online predictor, which outperforms BLAST, PSI-BLAST and HMMER in predicting the effectors of G proteins. This provides valuable guidance for researchers selecting classifiers combined with different feature selection encoding schemes. We used G-Effector to screen the effectors of G protein in rice, and confirmed the candidate effectors by gene co-expression data. Interestingly, one of the top 15 candidates, which did not appear in the training data set, was validated in previous research. Therefore, the candidate effector list in this article provides both a clue for researchers as to their function and a framework of validation for future experimental work. It is accessible at http://bioinformatics.fafu.edu.cn/geffector.


Subject(s)
Oryza , Bayes Theorem , Heterotrimeric GTP-Binding Proteins , Plant Proteins , Support Vector Machine
19.
Int J Mol Sci ; 17(11)2016 Nov 11.
Article in English | MEDLINE | ID: mdl-27845739

ABSTRACT

Single nucleotide polymorphisms (SNPs) are widely used in functional genomics and genetics research. The high-quality sequence of the rice genome has provided a genome-wide SNP and proteome resource. However, the impact of SNPs on protein phosphorylation status in rice is not fully understood. In this paper, we first updated the rice SNP resource based on the new rice genome Ver. 7.0, then systematically analyzed the potential impact of non-synonymous SNPs (nsSNPs) on protein phosphorylation status. There were 3,897,312 SNPs in the Ver. 7.0 rice genome, of which 9.9% were nsSNPs. In addition, a total of 2,508,261 phosphorylated sites were predicted in the rice proteome. Interestingly, we observed that 150,197 (39.1%) nsSNPs could influence protein phosphorylation status, among which 52.2% might induce changes of protein kinase (PK) types for adjacent phosphorylation sites. We constructed a database, SNP_rice, to deposit the updated rice SNP resource and phosSNPs information. It was freely available to academic researchers at http://bioinformatics.fafu.edu.cn. As a case study, we detected five nsSNPs that potentially influenced the phosphorylation status of heterotrimeric G proteins in rice, indicating that genetic polymorphisms affect signal transduction by influencing the phosphorylation status of heterotrimeric G proteins. The results of this work could be a useful resource for future experimental identification and provide useful information for rice breeding.


Subject(s)
Heterotrimeric GTP-Binding Proteins/metabolism , Oryza/genetics , Plant Proteins/metabolism , Polymorphism, Single Nucleotide , Protein Processing, Post-Translational , Amino Acid Sequence , Chromosomes, Plant/genetics , Genes, Plant , Genetic Association Studies , Genetic Loci , Oryza/metabolism , Phosphorylation , Plant Breeding , Signal Transduction
20.
J Bioinform Comput Biol ; 14(6): 1650033, 2016 12.
Article in English | MEDLINE | ID: mdl-27696927

ABSTRACT

During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, there are over 3000 chemical compounds in flue-cured tobacco leaves; thus, it is impossible to evaluate the quality of flue-cured tobacco leaves using all the chemical compounds. In this paper, we used the Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, where the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) were 90.95% and 0.80, respectively. The SVM algorithm combined with 19 chemical compounds selected by R-PSO achieved the best assessment performance for the aromatic quality of tobacco leaves, where the PCC and MSE were 0.594 and 0.263, respectively. Finally, we constructed two online tools to classify the fragrant style and evaluate the aromatic quality of flue-cured tobacco leaf samples. These tools can be accessed at http://bioinformatics.fafu.edu.cn/tobacco.
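
For readers unfamiliar with the two classification metrics reported above, the following sketch computes accuracy (ACC) and the Matthews correlation coefficient (MCC) from a confusion matrix. It shows only the binary (two-class) form, which is an assumption made for brevity and may not match the number of fragrant-style classes in the study; the counts are hypothetical and are not the paper's results.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient for a binary classifier.

    tp, tn, fp, fn: counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

# Hypothetical confusion-matrix counts for illustration only.
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, which is one reason the two are often reported together.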


Subject(s)
Algorithms , Machine Learning , Nicotiana/chemistry , Odorants/analysis , Plant Leaves/chemistry , Volatile Organic Compounds/analysis , Gas Chromatography-Mass Spectrometry/methods , Support Vector Machine