Search | VHL Regional Portal

1.

Unveiling orphan receptor-like kinases in plants: novel client discovery using high-confidence library predictions in the Kinase-Client (KiC) assay.

Jorge, Gabriel Lemes; Kim, Daewon; Xu, Chunhui; Cho, Sung-Hwan; Su, Lingtao; Xu, Dong; Bartley, Laura E; Stacey, Gary; Thelen, Jay J.

Front Plant Sci ; 15: 1372361, 2024.

Article in English | MEDLINE | ID: mdl-38633461

ABSTRACT

Plants are remarkable in their ability to adapt to changing environments, with receptor-like kinases (RLKs) playing a pivotal role in perceiving and transmitting environmental cues into cellular responses. Despite extensive research on RLKs from the plant kingdom, the function and activity of many kinases, i.e., their substrates or "clients", remain uncharted. To validate a novel client prediction workflow and learn more about an important RLK, this study focuses on P2K1 (DORN1), which acts as a receptor for extracellular ATP (eATP), playing a crucial role in plant stress resistance and immunity. We designed a Kinase-Client (KiC) assay library of 225 synthetic peptides, incorporating previously identified P2K phosphorylated peptides and novel predictions from a deep-learning phosphorylation site prediction model (MUsite) and a trained hidden Markov model (HMM) based tool, HMMER. Screening the library against purified P2K1 cytosolic domain (CD), we identified 46 putative substrates, including 34 novel clients, 27 of which may be novel peptides, not previously identified experimentally. Gene Ontology (GO) analysis among phosphopeptide candidates revealed proteins associated with important biological processes in metabolism, structure development, and response to stress, as well as molecular functions of kinase activity, catalytic activity, and transferase activity. We offer selection criteria for efficient further in vivo experiments to confirm these discoveries. This approach not only expands our knowledge of P2K1's substrates and functions but also highlights effective prediction algorithms for identifying additional potential substrates. Overall, the results support use of the KiC assay as a valuable tool in unraveling the complexities of plant phosphorylation and provide a foundation for predicting the phosphorylation landscape of plant species based on peptide library results.

2.

High-Resolution Translatome Analysis Reveals Cortical Cell Programs During Early Soybean Nodulation.

Song, Jae Hyo; Montes-Luz, Bruna; Tadra-Sfeir, Michelle Zibetti; Cui, Yaya; Su, Lingtao; Xu, Dong; Stacey, Gary.

Front Plant Sci ; 13: 820348, 2022.

Article in English | MEDLINE | ID: mdl-35498680

ABSTRACT

Nodule organogenesis in legumes is regulated temporally and spatially through gene networks. Genome-wide transcriptome, proteomic, and metabolomic analyses have been used previously to define the functional role of various plant genes in the nodulation process. However, while significant progress has been made, most of these studies have suffered from tissue dilution since only a few cells/root regions respond to rhizobial infection, with much of the root non-responsive. To partially overcome this issue, we adopted translating ribosome affinity purification (TRAP) to specifically monitor the response of the root cortex to rhizobial inoculation using a cortex-specific promoter. While previous studies have largely focused on the plant response within the root epidermis (e.g., root hairs) or within developing nodules, much less is known about the early responses within the root cortex, such as in relation to the development of the nodule primordium or growth of the infection thread. We focused on identifying genes specifically regulated during early nodule organogenesis using roots inoculated with Bradyrhizobium japonicum. A number of novel nodulation gene candidates were discovered, as well as soybean orthologs of nodulation genes previously reported in other legumes. The differential cortex expression of several genes was confirmed using a promoter-GUS analysis, and RNAi was used to investigate gene function. Notably, a number of differentially regulated genes involved in phytohormone signaling, including auxin, cytokinin, and gibberellic acid (GA), were also discovered, providing deep insight into phytohormone signaling during early nodule development.

3.

A Multi-Level Iterative Bi-Clustering Method for Discovering miRNA Co-regulation Network of Abiotic Stress Tolerance in Soybeans.

Chang, Haowu; Zhang, Hao; Zhang, Tianyue; Su, Lingtao; Qin, Qing-Ming; Li, Guihua; Li, Xueqing; Wang, Li; Zhao, Tianheng; Zhao, Enshuang; Zhao, Hengyi; Liu, Yuanning; Stacey, Gary; Xu, Dong.

Front Plant Sci ; 13: 860791, 2022.

Article in English | MEDLINE | ID: mdl-35463453

ABSTRACT

Although growing evidence shows that microRNA (miRNA) regulates plant growth and development, miRNA regulatory networks in plants are not well understood. Current experimental studies cannot characterize miRNA regulatory networks on a large scale. This information gap provides an excellent opportunity to employ computational methods for global analysis and generate valuable models and hypotheses. To address this opportunity, we collected miRNA-target interactions (MTIs) and used MTIs from Arabidopsis thaliana and Medicago truncatula to predict homologous MTIs in soybeans, resulting in 80,235 soybean MTIs in total. A multi-level iterative bi-clustering method was developed to identify 483 soybean miRNA-target regulatory modules (MTRMs). Furthermore, we collected soybean miRNA expression data and corresponding gene expression data in response to abiotic stresses. By clustering these data, 37 MTRMs related to abiotic stresses were identified, including stress-specific MTRMs and shared MTRMs. These MTRMs have gene ontology (GO) enrichment in resistance response, iron transport, positive growth regulation, etc. Our study predicts soybean MTRMs and miRNA-GO networks under different stresses, and provides miRNA targeting hypotheses for experimental analyses. The method can be applied to other biological processes and other plants to elucidate miRNA co-regulation mechanisms.

4.

Large-Scale Integrative Analysis of Soybean Transcriptome Using an Unsupervised Autoencoder Model.

Su, Lingtao; Xu, Chunhui; Zeng, Shuai; Su, Li; Joshi, Trupti; Stacey, Gary; Xu, Dong.

Front Plant Sci ; 13: 831204, 2022.

Article in English | MEDLINE | ID: mdl-35310659

ABSTRACT

Plant tissues are distinguished by their gene expression patterns, which can help identify tissue-specific highly expressed genes and their differential functional modules. For this purpose, large-scale soybean transcriptome samples were collected and processed starting from raw sequencing reads in a uniform analysis pipeline. To address the gene expression heterogeneity in different tissues, we utilized an adversarial deconfounding autoencoder (AD-AE) model to map gene expressions into a latent space and adapted a standard unsupervised autoencoder (AE) model to help effectively extract meaningful biological signals from the noisy data. As a result, four groups of 1,743, 914, 2,107, and 1,451 genes were found highly expressed specifically in leaf, root, seed and nodule tissues, respectively. To obtain key transcription factors (TFs), hub genes and their functional modules in each tissue, we constructed tissue-specific gene regulatory networks (GRNs), and differential correlation networks by using corrected and compressed gene expression data. We validated our results from the literature and gene enrichment analysis, which confirmed many identified tissue-specific genes. Our study represents the largest gene expression analysis in soybean tissues to date. It provides valuable targets for tissue-specific research and helps uncover broader biological patterns. Code is publicly available with open source at https://github.com/LingtaoSu/SoyMeta.

5.

Evolutionary Dynamics of Indels in SARS-CoV-2 Spike Glycoprotein.

Rao, R Shyama Prasad; Ahsan, Nagib; Xu, Chunhui; Su, Lingtao; Verburgt, Jacob; Fornelli, Luca; Kihara, Daisuke; Xu, Dong.

Evol Bioinform Online ; 17: 11769343211064616, 2021.

Article in English | MEDLINE | ID: mdl-34898980

ABSTRACT

SARS-CoV-2, responsible for the current COVID-19 pandemic that claimed over 5.0 million lives, belongs to a class of enveloped viruses that undergo quick evolutionary adjustments under selection pressure. Numerous variants have emerged in SARS-CoV-2, posing a serious challenge to the global vaccination effort and COVID-19 management. The evolutionary dynamics of this virus are only beginning to be explored. In this work, we have analysed 1.79 million spike glycoprotein sequences of SARS-CoV-2 and found that the virus is fine-tuning the spike with numerous amino acid insertions and deletions (indels). Indels seem to have a selective advantage as the proportions of sequences with indels steadily increased over time, currently at over 89%, with similar trends across countries/variants. There were as many as 420 unique indel positions and 447 unique combinations of indels. Despite their high frequency, indels resulted in only minimal alteration of N-glycosylation sites, including both gain and loss. As indels and point mutations are positively correlated and sequences with indels have significantly more point mutations, they have implications in the evolutionary dynamics of the SARS-CoV-2 spike glycoprotein.

6.

Detecting Cancer Survival Related Gene Markers Based on Rectified Factor Network.

Su, Lingtao; Liu, Guixia; Wang, Juexin; Gao, Jianjiong; Xu, Dong.

Front Bioeng Biotechnol ; 8: 349, 2020.

Article in English | MEDLINE | ID: mdl-32426342

ABSTRACT

Detecting gene sets that serve as biomarkers for differentiating patient survival groups may help diagnose diseases robustly and develop multi-gene targeted therapies. However, due to the exponential growth of search space imposed by gene combinations, the performance of existing methods is still far from satisfactory. In this study, we developed a new method called BISG (BIclustering based Survival-related Gene sets detection) based on a rectified factor network (RFN) model, which allows efficient biclustering of gene subsets. By correlating genes in each significant bicluster with patient survival outcomes using a log-rank test and multi-sampling strategy, multiple survival-related gene sets can be detected. Firstly, we applied BISG to three different cancer types, and the resulting gene sets were tested as biomarkers for survival analyses. Secondly, we systematically analyzed 12 different cancer datasets. Our analysis shows that the genes in all the survival-related gene sets are mainly from five gene families: microRNA protein coding host genes, zinc fingers C2H2-type, solute carriers, CD (cluster of differentiation) molecules, and ankyrin repeat domain containing genes. Moreover, we found that they are mainly enriched in heme metabolism, apoptosis, hypoxia and inflammatory response-related pathways. We compared BISG with two other methods, GSAS and IPSOV. Results show that BISG can better differentiate patient survival groups in different datasets. The identified biomarkers suggested by our study provide useful hypotheses for further investigation. BISG is publicly available with open source at https://github.com/LingtaoSu/BISG.

7.

A rectified factor network based biclustering method for detecting cancer-related coding genes and miRNAs, and their interactions.

Su, Lingtao; Liu, Guixia; Wang, Juexin; Xu, Dong.

Methods ; 166: 22-30, 2019 Aug 15.

Article in English | MEDLINE | ID: mdl-31121299

ABSTRACT

Detecting cancer-related genes and their interactions is a crucial task in cancer research. For this purpose, we proposed an efficient method to detect coding genes, microRNAs (miRNAs), and their interactions related to a particular cancer or a cancer subtype using their expression data from the same set of samples. Firstly, biclusters specific to a particular type of cancer are detected based on rectified factor networks and ranked according to their associations with general cancers. Secondly, coding genes and miRNAs in each bicluster are prioritized by considering their differential expression and differential correlation values, protein-protein interaction data, and potential cancer markers. Finally, a rank fusion process is used to obtain the final comprehensive rank by combining multiple ranking results. We applied our proposed method to breast cancer datasets. Results show that our method outperforms other methods in detecting breast cancer-related coding genes and miRNAs. Furthermore, our method is very efficient in computing time and can handle tens of thousands of genes/miRNAs and hundreds of patients in hours on a desktop. This work may aid researchers in studying the genetic architecture of complex diseases and improving the accuracy of diagnosis.

Subject(s)

Breast Neoplasms/genetics , Computational Biology , MicroRNAs/genetics , Algorithms , Breast Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic/genetics , Gene Regulatory Networks/genetics , Humans , RNA, Messenger/genetics

8.

Predicting overlapping protein complexes based on core-attachment and a local modularity structure.

Wang, Rongquan; Liu, Guixia; Wang, Caixia; Su, Lingtao; Sun, Liyan.

BMC Bioinformatics ; 19(1): 305, 2018 Aug 22.

Article in English | MEDLINE | ID: mdl-30134824

ABSTRACT

BACKGROUND: In recent decades, detecting protein complexes (PCs) from protein-protein interaction networks (PPINs) has been an active area of research. There are a large number of excellent graph clustering methods that work very well for identifying PCs. However, most existing methods overlook the inherent core-attachment organization of PCs. As a result, these methods have three major limitations that deserve attention. Firstly, many methods have ignored the importance of seed selection, in particular the impact of using overlapping nodes as seed nodes. Thus, there may be false predictions. Secondly, PCs are generally supposed to be dense subgraphs. However, the subgraphs with high local modularity structure usually correspond to PCs. Thirdly, a number of available methods lack a noise-handling mechanism and miss some peripheral proteins. In summary, all these challenging issues are very important for predicting more biological overlapping PCs. RESULTS: In this paper, to overcome these weaknesses, we propose a clustering method based on core-attachment and local modularity structure, named CALM, to detect overlapping PCs from noisy weighted PPINs. Firstly, we identify overlapping nodes and seed nodes. Secondly, we calculate the support function between a node and a cluster. In CALM, a cluster, which initially consists of only a seed node, is extended by adding its direct neighboring nodes recursively according to the support function, until this cluster forms a locally optimal modularity subgraph. Thirdly, we repeat this process for the remaining seed nodes. Finally, merging and removing procedures are carried out to obtain the final predicted clusters. The experimental results show that CALM outperforms other classical methods and achieves ideal overall performance. Furthermore, CALM can match more complexes with a higher accuracy and provide a better one-to-one mapping with reference complexes in all test datasets. Additionally, CALM is robust against a high rate of noise in the PPIN. CONCLUSIONS: By considering core-attachment and local modularity structure, CALM could detect PCs much more effectively than some representative methods. In short, CALM could potentially identify previously undiscovered overlapping PCs with various densities and high modularity.

Subject(s)

Algorithms , Protein Interaction Mapping/methods , Cluster Analysis , Databases, Protein , Protein Interaction Maps , Proteins/chemistry

9.

MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization.

Su, Lingtao; Liu, Guixia; Bai, Tian; Meng, Xiangyu; Ma, Qingshan.

BMC Bioinformatics ; 19(1): 215, 2018 Jun 05.

Article in English | MEDLINE | ID: mdl-29871590

ABSTRACT

BACKGROUND: Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. So far, gene-centric or network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have been previously reported to have associations with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization. RESULTS: In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes considering information on both individual genes and their affiliated modules, and utilizes a Gene Ontology (GO)-based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated by using both breast cancer and prostate cancer datasets, and by comparison with other methods. Results show that MGOGP outperforms other methods and successfully prioritizes more genes with literature-confirmed evidence. CONCLUSIONS: This work will aid researchers in understanding the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy.

Subject(s)

Algorithms , Gene Regulatory Networks , Genes, Neoplasm , Breast Neoplasms/genetics , Female , Gene Ontology , Heuristics , Humans , Male , Prostatic Neoplasms/genetics

10.

LPRP: A Gene-Gene Interaction Network Construction Algorithm and Its Application in Breast Cancer Data Analysis.

Su, Lingtao; Meng, Xiangyu; Ma, Qingshan; Bai, Tian; Liu, Guixia.

Interdiscip Sci ; 10(1): 131-142, 2018 Mar.

Article in English | MEDLINE | ID: mdl-27640171

ABSTRACT

The importance of constructing gene-gene interaction (GGI) networks to better understand breast cancer has previously been highlighted. In this study, we propose a novel GGI network construction method called linear and probabilistic relations prediction (LPRP) and use it to gain system-level insight into breast cancer mechanisms. We construct separate genome-wide GGI networks for tumor and normal breast samples by applying LPRP to their gene expression datasets profiled by The Cancer Genome Atlas. According to our analysis, a large loss of gene interactions in the tumor GGI network was observed (7436; 88.7% reduction), which also contained fewer functional genes (4757; 32% reduction) than the normal network. The tumor GGI network was characterized by a bigger network diameter and a longer characteristic path length but a smaller clustering coefficient and much sparser network connections. In addition, many known cancer pathways, especially immune response pathways, are enriched by genes in the tumor GGI network. Furthermore, potential cancer genes are filtered in this study, which may act as drug target genes. These findings will allow for a better understanding of breast cancer mechanisms.

Subject(s)

Algorithms , Breast Neoplasms/genetics , Epistasis, Genetic , Gene Regulatory Networks , Statistics as Topic , Cluster Analysis , Down-Regulation/genetics , Female , Gene Expression Regulation, Neoplastic , Humans , Molecular Sequence Annotation , Software , Up-Regulation/genetics

11.

OMPcontact: An Outer Membrane Protein Inter-Barrel Residue Contact Prediction Method.

Zhang, Li; Wang, Han; Yan, Lun; Su, Lingtao; Xu, Dong.

J Comput Biol ; 24(3): 217-228, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27513917

ABSTRACT

Of the two transmembrane protein types, outer membrane proteins (OMPs) perform diverse important biochemical functions, including substrate transport and passive nutrient uptake and intake. Hence their 3D structures are expected to reveal these functions. Because experimental structures are scarce, predicted 3D structures are better suited to OMP research, and the inter-barrel residue contact is becoming one of the most remarkable features, improving prediction accuracy by describing the structural information of OMPs. To predict OMP structures accurately, we explored an OMP inter-barrel residue contact prediction method: OMPcontact. Multiple OMP-specific features were integrated in the method, including residue evolutionary covariation, topology-based transmembrane segment relative residue position, OMP lipid layer accessibility, and residue evolution conservation. These features describe the properties of a residue pair in different respects: sequential, structural, evolutionary, and biochemical. Within a 3-residue sliding window, a Support Vector Machine (SVM) could accurately determine the inter-barrel contact residue pair using the above features. A 5-fold cross-validation process was applied in testing the OMPcontact performance against a non-redundant OMP set of 75 samples. The tests compared four evolutionary covariation methods and screened for the ones suited to inter-barrel contact prediction. The results showed our method not only efficiently realized the prediction, but also scored the possibility for residue pairs reliably. This is expected to improve OMP tertiary structure prediction. Therefore, OMPcontact will be helpful in compiling a structural census of outer membrane proteins.

Subject(s)

Bacterial Outer Membrane Proteins/chemistry , Membrane Lipids/chemistry , Protein Interaction Domains and Motifs , Support Vector Machine , Biological Transport , Databases, Protein , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Structure, Tertiary

12.

Research on single nucleotide polymorphisms interaction detection from network perspective.

Su, Lingtao; Liu, Guixia; Wang, Han; Tian, Yuan; Zhou, Zhihui; Han, Liang; Yan, Lun.

PLoS One ; 10(3): e0119146, 2015.

Article in English | MEDLINE | ID: mdl-25763929

ABSTRACT

Single Nucleotide Polymorphisms (SNPs) found in Genome-Wide Association Studies (GWAS) mainly influence the susceptibility to complex diseases, but they still cannot comprehensively explain the relationships between mutations and diseases. Interactions between SNPs are considered so important for a deep understanding of those relationships that several strategies have been proposed to explore such interactions. However, some of those methods perform poorly when marginal effects of disease loci are weak or absent, others fail to consider high-order SNP interactions, and few methods meet the requirements for both performance and accuracy. For these reasons, not only low-order but also high-order SNP interactions, as well as main-effect SNPs, should be taken into account in detection methods under an acceptable computational complexity. In this paper, a new pairwise (or low-order) interaction detection method, IG (Interaction Gain), is introduced, in which disease models are not required and parallel computing is utilized. Furthermore, high-order SNP interactions are detected by finding closely connected function modules of the network constructed from IG detection results. Tested on a wide range of simulated datasets and four real WTCCC datasets, the proposed methods accurately detected both low-order and high-order SNP interactions as well as disease-associated main-effect SNPs, and surpassed all competitors in performance. The research will advance complex disease research by providing more reliable SNP interactions.

Subject(s)

Computational Biology/methods , Models, Genetic , Polymorphism, Single Nucleotide , Algorithms , Epistasis, Genetic , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans

13.

GECluster: a novel protein complex prediction method.

Su, Lingtao; Liu, Guixia; Wang, Han; Tian, Yuan; Zhou, Zhihui; Han, Liang; Yan, Lun.

Biotechnol Biotechnol Equip ; 28(4): 753-761, 2014 Jul 04.

Article in English | MEDLINE | ID: mdl-26019559

ABSTRACT

Identification of protein complexes is of great importance in the understanding of cellular organization and functions. Traditional computational protein complex prediction methods mainly rely on the topology of protein-protein interaction (PPI) networks but seldom take biological information of proteins (such as Gene Ontology (GO)) into consideration. Meanwhile, the environment relevant analysis of protein complex evolution has been poorly studied, partly due to the lack of high-precision protein complex datasets. In this paper, a combined PPI network is introduced to predict protein complexes which integrate both GO and expression value of relevant protein-coding genes. A novel protein complex prediction method GECluster (Gene Expression Cluster) was proposed based on a seed node expansion strategy, in which a combined PPI network was utilized. GECluster was applied to a training combined PPI network and it predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can efficiently improve protein complex prediction accuracy. In order to study protein complex evolution within cells due to changes in the living environment surrounding cells, GECluster was applied to seven combined PPI networks constructed using the data of a test set including yeast response to stress throughout a wine fermentation process. Our results showed that with the rise of alcohol concentration, protein complexes within yeast cells gradually evolve from one state to another. Besides this, the number of core and attachment proteins within a protein complex both changed significantly.
