Results 1 - 20 of 47
1.
Comput Biol Med ; 174: 108392, 2024 May.
Article in English | MEDLINE | ID: mdl-38608321

ABSTRACT

Proteins must be sorted to specific subcellular compartments to perform their functions. Abnormal protein subcellular localizations are related to many diseases. Although many efforts have been made to predict protein subcellular localization from various kinds of static information, including sequences, structures and interactions, such static information cannot predict protein mis-localization events in diseases. In contrast, IHC (immunohistochemistry) images, which have been widely applied in clinical diagnosis, contain information that can be used to find protein mis-localization events in disease states. In this study, we created the Vislocas method, which is capable of finding mis-localized proteins from IHC images as markers of cancer subtypes. By combining CNNs and vision transformer encoders, Vislocas can automatically extract image features at both the global and the local level. Vislocas can be trained from scratch with full-sized IHC images. It is the first attempt to create an end-to-end IHC image-based protein subcellular location predictor. Vislocas achieved performance comparable to or better than state-of-the-art methods. We applied Vislocas to find significant protein mis-localization events in different subtypes of glioma, melanoma and skin cancer. The mis-localized proteins, which were found purely from IHC images by Vislocas, are consistent with clinical or experimental results in the literature. All code of Vislocas has been deposited in a GitHub repository (https://github.com/JingwenWen99/Vislocas). All datasets of Vislocas have been deposited in Zenodo (https://zenodo.org/records/10632698).


Subject(s)
Immunohistochemistry , Humans , Neoplasms/metabolism , Neoplasms/classification , Neoplasms/pathology , Neoplasm Proteins/metabolism , Biomarkers, Tumor/metabolism , Image Processing, Computer-Assisted/methods
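
Vislocas combines CNNs with vision transformer encoders. As a minimal sketch of one step that vision transformers generally rely on, the code below splits an image (here a hypothetical 2D list of pixel intensities) into non-overlapping square patches and flattens each patch into a token vector. The function name `image_to_patches` and the patch size are illustrative assumptions; this is not the Vislocas implementation, which is available in its GitHub repository.

```python
from typing import List


def image_to_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split a 2D image into non-overlapping patch x patch tiles (row-major order)
    and flatten each tile into one token vector, as in a ViT patch-embedding step.
    Assumes the image height and width are both divisible by `patch`."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten the tile row by row into a single vector.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


# Hypothetical 4 x 4 "image" whose pixel values are 0..15.
example_image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
example_patches = image_to_patches(example_image, 2)
```

In a standard vision transformer, each flattened token would then be linearly projected and combined with a position embedding before entering the encoder.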
2.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36920063

ABSTRACT

Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied. However, the essentiality of non-coding regions is rarely reported. Since most regions of the human genome do not encode proteins, determining the essentiality of non-coding genes is needed. We developed iEssLnc models, which can assign essentiality scores to lncRNA genes. As far as we know, this is the first direct quantitative estimation of the essentiality of lncRNA genes. By taking advantage of a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform genome-wide screenings for essential lncRNA genes in a quantitative manner. We carried out validations and whole-genome screening in the context of human cancer cell lines and the mouse genome. Compared with other methods, which were transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that iEssLnc essentiality scores clustered essential lncRNA genes at high ranks. With the screening results of iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes significantly interact with microRNAs and cytoskeletal proteins, which may be of interest in experimental life sciences. All datasets and code of iEssLnc models have been deposited in GitHub (https://github.com/yyZhang14/iEssLnc).


Subject(s)
MicroRNAs , Neoplasms , RNA, Long Noncoding , Humans , Animals , Mice , Protein Interaction Maps , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , MicroRNAs/metabolism , Neural Networks, Computer
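
iEssLnc runs meta-path-guided random walks on a heterogeneous lncRNA-protein interaction network. The sketch below illustrates the general technique (as popularized by metapath2vec): each step may only move to a neighbor whose node type matches the next type in a cyclic meta-path such as lncRNA-protein-lncRNA. The toy graph, node names and the `L`/`P` type labels are hypothetical, and the exact meta-paths and walk settings used by iEssLnc are not given in the abstract.

```python
import random
from typing import Dict, List, Optional


def metapath_walk(graph: Dict[str, List[str]], node_type: Dict[str, str],
                  start: str, metapath: List[str], length: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Random walk constrained by a cyclic meta-path such as ["L", "P", "L"].
    The first and last types must match so that the pattern can repeat."""
    rng = rng or random.Random()
    if len(metapath) < 2 or metapath[0] != metapath[-1]:
        raise ValueError("meta-path must be cyclic, e.g. ['L', 'P', 'L']")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    cycle = metapath[:-1]  # e.g. ["L", "P"], repeated along the walk
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]  # type required at the next position
        candidates = [n for n in graph.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical heterogeneous network: two lncRNAs (L) and two proteins (P).
toy_graph = {"lnc1": ["p1", "p2"], "lnc2": ["p1"],
             "p1": ["lnc1", "lnc2", "p2"], "p2": ["lnc1", "p1"]}
toy_types = {"lnc1": "L", "lnc2": "L", "p1": "P", "p2": "P"}
toy_walk = metapath_walk(toy_graph, toy_types, "lnc1", ["L", "P", "L"], 5,
                         random.Random(0))
```

Note that the protein-protein edge `p1`-`p2` is never used, because the meta-path forbids two consecutive protein nodes. In metapath2vec-style pipelines, such walks are typically fed to a skip-gram model to obtain node embeddings; the graph neural network stage of iEssLnc is not reproduced here.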
3.
Comput Biol Med ; 157: 106775, 2023 05.
Article in English | MEDLINE | ID: mdl-36921458

ABSTRACT

Aberrant protein sorting has been observed in many conditions, including complex diseases, drug treatments, and environmental stresses. It is important to systematically identify protein mis-localization events in a given condition. Experimental methods for finding mis-localized proteins are always costly and time-consuming. Protein subcellular localization prediction has been studied for many years. However, only a handful of existing works have considered alterations of protein subcellular locations. We proposed a computational method for identifying alterations of protein subcellular locations under drug treatments. We took three drugs, TSA (trichostatin A), bortezomib and tacrolimus, as instances for this study. By introducing dynamic protein-protein interaction networks, we applied graph neural network algorithms to aggregate topological information under different conditions. We systematically report potential protein mis-localization events under drug treatments. As far as we know, this is the first attempt to find protein mis-localization events computationally in drug treatment conditions. The literature supports that a number of proteins, which are highly related to the pharmacological mechanisms of these drugs, may undergo protein localization alterations. We name our method PLA-GNN (Protein Localization Alteration by Graph Neural Networks). It can be extended to other drugs and other conditions. All datasets and code of this study have been deposited in a GitHub repository (https://github.com/quinlanW/PLA-GNN).


Subject(s)
Algorithms , Neural Networks, Computer , Proteins/metabolism , Protein Interaction Maps , Polyesters/metabolism
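
PLA-GNN applies graph neural networks to dynamic protein-protein interaction networks so that a protein's representation reflects its neighborhood in each condition. A minimal sketch of one mean-aggregation message-passing step (a simplified GCN/GraphSAGE-style update without learned weights, not the exact PLA-GNN layer) shows how the same protein can receive different aggregated features in a control network and in a drug-treated network. The protein names, feature vectors and edge lists are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def mean_aggregate(features: Dict[str, List[float]],
                   edges: Set[Tuple[str, str]]) -> Dict[str, List[float]]:
    """One message-passing step: each node's new vector is the mean of its own
    vector and its neighbors' vectors. Edges are undirected and assumed unique."""
    neighbors: Dict[str, List[str]] = {n: [n] for n in features}  # self-loop included
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    out = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        out[node] = [sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(dim)]
    return out


# Hypothetical node features and condition-specific interaction sets.
feats = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
control_edges = {("A", "B")}
treated_edges = {("A", "B"), ("A", "C")}  # hypothetical interaction gained under treatment
h_control = mean_aggregate(feats, control_edges)
h_treated = mean_aggregate(feats, treated_edges)
```

Comparing a protein's aggregated representations across conditions is one way such a model could flag candidate localization changes; a real GNN layer would also apply learned weights and a nonlinearity.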
4.
Interdiscip Sci ; 15(3): 433-438, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37000408

ABSTRACT

Over the last few years, an increasing number of protein mis-localization events have been reported under various conditions. It is important to understand these events and their relationship with complex disorders. Although many efforts have been made to establish models with statistical or machine learning algorithms, a comprehensive database resource is still missing. Since the records of experimentally validated protein mis-localization events are spread across many publications, a collection of all these reports on a single website is needed. In this paper, we created the dbMisLoc database by manually curating conditional protein mis-localization events from the literature. The dbMisLoc database records protein localizations, mis-localizations, the conditions for mis-localization, and the original reports. It allows users to intuitively view, search, visualize and download protein mis-localization records. The dbMisLoc database integrates a BLAST search engine, which can find mis-localized proteins that are similar to user queries. The dbMisLoc database can be accessed directly at https://dbml.pufengdu.org. The source code of the dbMisLoc database is freely available from the GitHub repository (https://github.com/quinlanW/dbMisLoc). Users can host their own mirrors of the dbMisLoc database on their own servers. dbMisLoc is a database of manually curated protein mis-localization events. It contains mis-localization events in 14 categories of conditions, such as diseases, drug treatments and environmental stresses.


Subject(s)
Proteins , Software , Proteins/metabolism , Algorithms , Databases, Factual , Machine Learning
5.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38168841

ABSTRACT

Silencers are repressive cis-regulatory elements that play crucial roles in transcriptional regulation. Experimental methods for identifying silencers are always costly and time-consuming. Computational methods, which rely on genomic sequence features, have been introduced as alternative approaches. However, silencers do not have a significant epigenomic signature. Therefore, we explore a new way to computationally identify silencers by incorporating chromatin structural information. We propose the SilenceREIN method, which focuses on finding silencers on anchors of chromatin loops. By using graph neural networks, we extracted chromatin structural information from a regulatory element interaction network. SilenceREIN integrates the chromatin structural information with linear genomic signatures to find silencers. The predictive performance of SilenceREIN is comparable to or better than that of other state-of-the-art methods. We performed a genome-wide scan to systematically find silencers in the human genome. Results suggest that silencers are widespread on anchors of chromatin loops. In addition, enrichment analysis of transcription factor binding motifs supports our prediction results. As far as we can tell, this is the first attempt to incorporate chromatin structural information in finding silencers. All datasets and source code of SilenceREIN have been deposited in a GitHub repository (https://github.com/JianHPan/SilenceREIN).


Subject(s)
Chromatin , Silencer Elements, Transcriptional , Humans , Chromatin/genetics , Regulatory Sequences, Nucleic Acid , Genome, Human , Neural Networks, Computer
6.
Comput Struct Biotechnol J ; 20: 2657-2663, 2022.
Article in English | MEDLINE | ID: mdl-35685362

ABSTRACT

Long non-coding RNAs (lncRNAs) play important roles in many biological processes. Knocking out or knocking down some lncRNAs will lead to lethality or infertility. These lncRNAs are called essential lncRNAs. Knowledge of essential lncRNAs is important in establishing minimal genomes of living cells and in developing drug therapies and early diagnostic approaches for complex diseases. However, existing databases focus on collecting essential coding genes, and records of essential non-coding genes are rare in them. A comprehensive collection of essential non-coding genes, particularly essential lncRNA genes, is needed. We manually curated 207 essential lncRNAs from the literature to establish a database of essential lncRNAs, named dbEssLnc (Database of essential lncRNAs). The dbEssLnc database has a user-friendly web interface that lets users browse, search, visualize and BLAST-search its records. The dbEssLnc database is freely accessible at https://esslnc.pufengdu.org. All data and source code for mirroring the dbEssLnc database have been deposited in GitHub (https://github.com/yyZhang14/dbEssLnc).

7.
Front Genet ; 13: 896925, 2022.
Article in English | MEDLINE | ID: mdl-35591855

ABSTRACT

5-Hydroxymethylcytosine (5hmC), one of the most important RNA modifications, plays an important role in many biological processes. Accurately identifying RNA modification sites helps in understanding the function of RNA modification. In this work, we propose a computational method for identifying 5hmC-modified regions using machine learning algorithms. We applied a sequence feature embedding method based on the dna2vec algorithm to represent RNA sequences. The results showed that the performance of our model is better than that of state-of-the-art methods. All datasets and source code used in this study are available at: https://github.com/liu-h-y/5hmC_model.
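
The 5hmC model represents RNA sequences with dna2vec, which learns vector embeddings for k-mers. A simplified sketch of turning a sequence into a fixed-length vector: split it into overlapping k-mers, look each one up in an embedding table, and average the vectors. The tiny 2-dimensional table is made up for illustration (real dna2vec vectors are learned and much higher-dimensional), and both the averaging step and the mapping of U to T (assumed because dna2vec embeds DNA k-mers) are assumptions, not necessarily what the paper does.

```python
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    """Overlapping k-mers of a sequence; U is mapped to T (assumption: DNA-trained embeddings)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq: str, k: int, table: Dict[str, List[float]]) -> List[float]:
    """Average the embedding vectors of all k-mers of `seq` that appear in `table`."""
    vecs = [table[km] for km in kmers(seq, k) if km in table]
    if not vecs:
        raise ValueError("no k-mer of the sequence is in the embedding table")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


# Hypothetical 2-dimensional 3-mer embeddings (illustration only, not dna2vec output).
toy_table = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
toy_vector = embed("ACGUA", 3, toy_table)
```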

8.
Front Genet ; 13: 877409, 2022.
Article in English | MEDLINE | ID: mdl-35419029

ABSTRACT

MicroRNAs (miRNAs) play vital roles in the regulation of gene expression. Identification of essential miRNAs is of fundamental importance in understanding their cellular functions. Experimental methods for identifying essential miRNAs are always costly and time-consuming. Therefore, computational methods are considered alternative approaches. Currently, only a handful of studies have focused on predicting essential miRNAs. In this work, we proposed to predict essential miRNAs using the XGBoost framework with CART (Classification and Regression Trees) on various types of sequence-based features. We named this method XGEM (XGBoost for essential miRNAs). The prediction performance of XGEM is promising. In comparison with other state-of-the-art methods, XGEM performed the best, indicating its potential in identifying essential miRNAs.
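
The abstract does not list XGEM's sequence-based features. As an assumed example of the kind of input a tree-based classifier can use, the sketch below computes length, GC content, and mononucleotide and overlapping dinucleotide frequencies for an RNA sequence. The function name and the feature set are illustrative, not XGEM's.

```python
from itertools import product
from typing import Dict


def sequence_features(seq: str) -> Dict[str, float]:
    """Length, GC content and mono-/dinucleotide frequencies of an RNA sequence."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    feats: Dict[str, float] = {"length": float(n)}
    feats["gc_content"] = (seq.count("G") + seq.count("C")) / n
    for base in "ACGU":
        feats[f"freq_{base}"] = seq.count(base) / n
    # Overlapping dinucleotides; counting a list avoids str.count's non-overlapping matches.
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    for a, b in product("ACGU", repeat=2):
        feats[f"freq_{a}{b}"] = pairs.count(a + b) / len(pairs) if pairs else 0.0
    return feats


example_features = sequence_features("GGCAU")
```

A list of such dictionaries, converted to a matrix with a fixed key order, could then be passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`, whose default booster uses regression trees as base learners.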

9.
Front Genet ; 13: 864564, 2022.
Article in English | MEDLINE | ID: mdl-35386279

ABSTRACT

Long noncoding RNAs (lncRNAs) play important roles in a variety of biological processes. Knocking out or knocking down some lncRNA genes can lead to death or infertility. These lncRNAs are called essential lncRNAs. Identifying essential lncRNAs is important for complex disease diagnosis and treatment. However, experimental methods for identifying essential lncRNAs are always costly and time-consuming. Therefore, computational methods can be considered as an alternative approach. We propose a method to identify essential lncRNAs by combining network centrality measures and lncRNA sequence information. By constructing a lncRNA-protein-protein interaction (LPPI) network, we measure the essentiality of lncRNAs from their role in the network and their sequences together. We name our method the systematic gene importance index (SGII). As far as we can tell, this is the first attempt to identify essential lncRNAs by combining sequence and network information. The results of our method indicated that essential lncRNAs play roles in the LPPI network similar to those of essential coding genes in the protein-protein interaction (PPI) network. Another encouraging observation is that network information can significantly boost the predictive performance of the sequence-based method. All source code and datasets of SGII have been deposited in a GitHub repository (https://github.com/ninglolo/SGII).
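
SGII scores lncRNA essentiality from both network position in an LPPI network and sequence information. The abstract does not give the scoring formula, so the sketch below only illustrates the idea: compute normalized degree centrality (one common centrality measure) for every node, then mix it with a per-lncRNA sequence score through a hypothetical weight `alpha`. The network, the sequence scores and the linear combination are all assumptions, not the SGII index.

```python
from typing import Dict, List, Set, Tuple


def degree_centrality(edges: Set[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality, degree / (n - 1), for an undirected simple graph
    (edges assumed unique, no self-loops)."""
    nodes = {u for e in edges for u in e}
    degree = {u: 0 for u in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    if n < 2:
        return {u: 0.0 for u in nodes}
    return {u: d / (n - 1) for u, d in degree.items()}


def combined_score(lncs: List[str], centrality: Dict[str, float],
                   seq_score: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Hypothetical linear mix of network and sequence evidence (not the SGII formula)."""
    return {l: alpha * centrality.get(l, 0.0) + (1 - alpha) * seq_score[l] for l in lncs}


# Hypothetical LPPI network and sequence-based scores.
lppi = {("lncA", "P1"), ("lncA", "P2"), ("P1", "P2"), ("lncB", "P2")}
cent = degree_centrality(lppi)
scores = combined_score(["lncA", "lncB"], cent, {"lncA": 0.2, "lncB": 0.8}, alpha=0.5)
```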

10.
Comput Struct Biotechnol J ; 20: 1345-1351, 2022.
Article in English | MEDLINE | ID: mdl-35356545

ABSTRACT

The application of network pharmacology has greatly advanced the scientific interpretation of the disease treatment mechanisms of traditional Chinese medicine (TCM). However, the data required for network pharmacology analysis are scattered across different resources. In the present work, by integrating and reorganizing data from multiple resources, we developed the intelligent network pharmacology platform unique for traditional Chinese medicine, called INPUT (http://cbcb.cdutcm.edu.cn/INPUT/), for automatically performing network pharmacology analysis. Besides the curated data collected from multiple resources, a series of bioinformatics tools for network pharmacology analysis were also embedded in INPUT, making it the first automatic platform able to explore the disease treatment mechanisms of TCM. With the built-in tools, researchers can also analyze their own in-house data and obtain results on pivotal ingredients, GO and KEGG pathways, protein-protein interactions, etc. In addition, as a proof of principle, INPUT was applied to decipher the antidepressant mechanism of a commonly used prescription. In summary, INPUT is a powerful platform for network pharmacology analysis and will facilitate research on drug discovery.

11.
Curr Gene Ther ; 22(3): 228-244, 2022.
Article in English | MEDLINE | ID: mdl-34254917

ABSTRACT

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability, and they are longer than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform a wide range of functions through their interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focused on supervised learning methods. We summarized the state-of-the-art methods for predicting lncRNA-protein interactions. Furthermore, we compared the performance and characteristics of the different methods. Considering the limitations of existing models, we analyzed the open problems and discussed directions for future research.


Subject(s)
RNA, Long Noncoding , Computational Biology/methods , Gene Expression Regulation , Machine Learning , Proteins/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
12.
IEEE J Biomed Health Inform ; 26(4): 1861-1871, 2022 04.
Article in English | MEDLINE | ID: mdl-34699377

ABSTRACT

ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important for understanding the biological functions of ncRNAs. Since experimental methods to determine ncRNA-protein interactions are always costly and time-consuming, computational methods have been proposed as alternative approaches. We developed a novel method, NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied the Relational Graph Convolutional Network encoder and the DistMult decoder to predict ncRNA-protein interactions accurately and efficiently. Using 5-fold cross-validation, we found that our method achieved performance comparable to all state-of-the-art methods. Our method requires less than 10% of the training time of any of the state-of-the-art methods, which makes it a more efficient choice for large datasets in practice.


Subject(s)
Computational Biology , RNA, Untranslated , Computational Biology/methods , Humans , RNA, Untranslated/metabolism
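
NPI-RGCNAE decodes candidate ncRNA-protein links with DistMult. In the standard DistMult formulation, the score of a (head, relation, tail) triple is the trilinear product sum_i h_i r_i t_i, often passed through a sigmoid to read it as a probability. The sketch below computes that score for hypothetical embeddings; in the actual model the node embeddings would come from the R-GCN encoder and the relation vector would be learned.

```python
import math
from typing import Sequence


def distmult_score(head: Sequence[float], relation: Sequence[float],
                   tail: Sequence[float]) -> float:
    """DistMult trilinear score: sum_i head[i] * relation[i] * tail[i]."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("embeddings must share the same dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def interaction_probability(head: Sequence[float], relation: Sequence[float],
                            tail: Sequence[float]) -> float:
    """Sigmoid of the DistMult score, interpreted as an interaction probability."""
    return 1.0 / (1.0 + math.exp(-distmult_score(head, relation, tail)))


rna = [0.5, -1.0, 2.0]      # hypothetical encoder output for an ncRNA
protein = [1.0, 0.5, 1.0]   # hypothetical encoder output for a protein
rel = [2.0, 1.0, 0.5]       # hypothetical learned "interacts-with" relation vector
```

Because the score is symmetric in head and tail, DistMult suits an undirected relation such as "interacts with".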
13.
Genomics ; 113(6): 4052-4060, 2021 11.
Article in English | MEDLINE | ID: mdl-34666191

ABSTRACT

A super-enhancer (SE) is a cluster of active typical enhancers (TEs) with high levels of the Mediator complex, master transcription factors, and chromatin regulators. SEs play a key role in the control of cell identity and disease. Traditionally, scientists used a variety of high-throughput data on different transcription factors or chromatin marks to distinguish SEs from TEs. Such experimental methods are usually costly and time-consuming. In this paper, we proposed DeepSE, a model based on a deep convolutional neural network, to distinguish SEs from TEs. DeepSE represents DNA sequences using dna2vec feature embeddings. With only DNA sequence information, DeepSE outperformed all state-of-the-art methods. In addition, DeepSE generalizes well across different cell lines, which implies that cell-type-specific SEs may share hidden sequence patterns across different cell lines. The source code and data are stored in GitHub (https://github.com/QiaoyingJi/DeepSE).


Subject(s)
Chromatin , Enhancer Elements, Genetic , Cell Line , Chromatin/genetics , Neural Networks, Computer , Transcription Factors/genetics , Transcription Factors/metabolism
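
DeepSE feeds dna2vec-embedded DNA sequences into a deep convolutional neural network. The core operation of such a network can be sketched for a single filter: slide a width-w weight matrix along the sequence of k-mer embedding vectors, apply a ReLU, and keep the maximum activation (global max pooling). The embeddings, filter weights and the use of global max pooling are hypothetical; DeepSE's actual architecture is in its GitHub repository.

```python
from typing import List, Sequence


def conv1d_max(embeddings: Sequence[Sequence[float]],
               kernel: Sequence[Sequence[float]], bias: float = 0.0) -> float:
    """Slide one filter (width x dim) along the embedding sequence, apply ReLU,
    and return the global max-pooled activation."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence shorter than the filter width")
    activations: List[float] = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of embeddings.
        z = bias + sum(w * x
                       for row_w, row_x in zip(kernel, window)
                       for w, x in zip(row_w, row_x))
        activations.append(max(0.0, z))  # ReLU
    return max(activations)


# Hypothetical 2-dimensional k-mer embeddings and a width-2 filter.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_kernel = [[1.0, 0.0], [0.0, 1.0]]
```

A real network would learn many such filters and stack further layers on the pooled activations.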
14.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822882

ABSTRACT

Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are always costly and time-consuming. Many computational approaches have been developed as alternatives. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performance of existing machine learning-based methods. The graph neural network (GNN) is a recently developed deep learning algorithm for link prediction on complex networks, which had never been applied to predicting NPIs. We constructed a GNN-based method, called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. The NPI-GNN method achieved performance comparable to state-of-the-art methods in 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information does not affect the NPI-GNN prediction performance much, which makes NPI-GNN more robust than other methods. As far as we can tell, NPI-GNN is the first end-to-end GNN predictor for NPIs. All benchmarking datasets in this work and all source code of the NPI-GNN method have been deposited, with documentation, in a GitHub repository (https://github.com/AshuiRUA/NPI-GNN).


Subject(s)
Deep Learning , Proteins/metabolism , RNA, Untranslated/metabolism , Software , Benchmarking , Datasets as Topic , Humans , Internet , Protein Binding , Proteins/genetics , RNA, Untranslated/genetics , Sensitivity and Specificity
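
NPI-GNN, like several other methods in this list, reports performance from 5-fold cross-validation. A minimal sketch of how sample indices can be split: shuffle once with a fixed seed, divide into k nearly equal folds, and use each fold once as the test set while the remaining folds form the training set. The function name and seed are illustrative; for link prediction, the samples being split would be the positive and negative interaction pairs.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(12, k=5, seed=42))
```

In practice, `sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=...)` provides the same kind of split.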
15.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33147622

ABSTRACT

With the development of high-throughput sequencing technology, the number of genomic sequences has increased exponentially over the last decade. To decode these new genomic data, machine learning methods were introduced for genome annotation and analysis. Because most machine learning methods require fixed-length input, biological sequences must be represented as fixed-length numerical vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of the physicochemical properties of k-tuple nucleotides are scattered across different resources. To facilitate studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties, including one for mononucleotides (DNA), 169 for dinucleotides (147 for DNA and 22 for RNA) and 12 for trinucleotides (DNA). KNIndex also provides a user-friendly web-based interface for users to browse, query, visualize and download the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users can display DNA/RNA sequences as curves of multiple physicochemical properties. We hope that KNIndex will facilitate related studies in computational biology.


Subject(s)
DNA/genetics , Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing , Nucleotides/genetics , RNA/genetics , Software , Genomics
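
KNIndex can display a DNA/RNA sequence as a curve of physicochemical property values. The idea can be sketched as a sliding lookup: read the sequence's overlapping k-tuples (dinucleotides by default) and replace each with its value for one property. The property table below uses placeholder numbers, not KNIndex data (real values should be taken from the database itself), and `property_curve` is an illustrative name, not part of any KNIndex API.

```python
from typing import Dict, List


def property_curve(seq: str, table: Dict[str, float], k: int = 2) -> List[float]:
    """Map each overlapping k-tuple of `seq` to its property value (one point per position)."""
    seq = seq.upper()
    curve = []
    for i in range(len(seq) - k + 1):
        ktuple = seq[i:i + k]
        if ktuple not in table:
            raise KeyError(f"no property value for {ktuple!r}")
        curve.append(table[ktuple])
    return curve


# Placeholder values for one dinucleotide property (hypothetical, not KNIndex data).
toy_property = {"AA": 0.1, "AC": 0.2, "CG": 0.5, "GT": 0.3, "TA": 0.4}
toy_curve = property_curve("AACGTA", toy_property)
```

Averaging such a curve, or computing autocorrelations along it, is one common way to turn these property values into the fixed-length vectors that machine learning methods need.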
16.
Front Pharmacol ; 12: 784171, 2021.
Article in English | MEDLINE | ID: mdl-35095495

ABSTRACT

Drug repositioning provides a promising and efficient strategy to discover potential associations between drugs and diseases. Many systematic computational drug-repositioning methods have been introduced, which are based on various similarities of drugs and diseases. In this work, we proposed a new computational model, DDA-SKF (drug-disease association prediction using similarity kernel fusion), which can predict novel drug indications by utilizing similarity kernel fusion (SKF) and Laplacian regularized least squares (LapRLS) algorithms. DDA-SKF integrates multiple similarities of drugs and diseases. The prediction performance of DDA-SKF is better than, or at least comparable to, that of all state-of-the-art methods. DDA-SKF can work without sufficient similarity information between drug indications. This allows us to predict new uses for orphan drugs. The source code and benchmarking datasets are deposited in a GitHub repository (https://github.com/GCQ2119216031/DDA-SKF).

17.
Front Genet ; 11: 615144, 2020.
Article in English | MEDLINE | ID: mdl-33362868

ABSTRACT

Long non-coding RNAs (lncRNAs) play an important role in several biological activities, including transcription, splicing, translation, and other cellular regulatory processes. lncRNAs perform their biological functions by interacting with various proteins. Studies of lncRNA-protein interactions are of great value for understanding lncRNA functional mechanisms. In this paper, we proposed a novel model to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. We named this method LPI-SKF. Various similarities of both lncRNAs and proteins were integrated into LPI-SKF. LPI-SKF can be applied to predicting potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under the receiver operating characteristic curve) of 0.909 in 5-fold cross-validation, which outperforms other state-of-the-art methods. A total of 19 of the top 20 ranked interaction predictions were verified by existing data, suggesting that LPI-SKF has great potential for accurately discovering unknown lncRNA-protein interactions. All data and code of this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
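
LPI-SKF is evaluated by AUROC. AUROC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison, which costs O(P*N) and is adequate for small sets; the labels and scores are made up.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for three true and three false interactions.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.6]
```

For large prediction sets, `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.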

18.
Front Genet ; 11: 600454, 2020.
Article in English | MEDLINE | ID: mdl-33193746

ABSTRACT

Eukaryotic cells contain numerous components, which are known as subcellular compartments or subcellular organelles. Proteins must be sorted to the proper subcellular compartments to carry out their molecular functions. Mis-localized proteins are related to various cancers. Identifying mis-localized proteins is important for understanding the pathology of cancers and for developing therapies. However, the experimental methods used to determine protein subcellular locations are always costly and time-consuming. We tried to identify cancer-related mis-localized proteins in three different cancers using computational approaches. By integrating gene expression profiles and dynamic protein-protein interaction networks, we established DPPN-SVM (Dynamic Protein-Protein Network with Support Vector Machine), a predictive model using the SVM classifier with diffusion kernels. With this predictive model, we identified a number of mis-localized proteins. Because we introduced the dynamic protein-protein network, which had not been considered in existing works, our model is capable of identifying more mis-localized proteins than existing studies. As far as we know, this is the first study to incorporate a dynamic protein-protein interaction network in identifying mis-localized proteins in cancers.
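
DPPN-SVM integrates gene expression profiles with protein-protein interactions to build dynamic networks. One common construction, assumed here because the abstract does not specify the rule, keeps a static PPI edge in a given condition only if both proteins are expressed above a threshold in that condition. The gene names, expression values and threshold are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def condition_network(static_edges: Set[Edge], expression: Dict[str, float],
                      threshold: float) -> Set[Edge]:
    """Keep edges whose two endpoints are both expressed above `threshold`."""
    return {(a, b) for a, b in static_edges
            if expression.get(a, 0.0) > threshold and expression.get(b, 0.0) > threshold}


# Hypothetical static PPI network and condition-specific expression levels.
ppi = {("G1", "G2"), ("G2", "G3"), ("G3", "G4")}
normal = {"G1": 5.0, "G2": 6.0, "G3": 4.0, "G4": 0.5}
tumour = {"G1": 0.2, "G2": 6.5, "G3": 4.2, "G4": 3.0}
net_normal = condition_network(ppi, normal, threshold=1.0)
net_tumour = condition_network(ppi, tumour, threshold=1.0)
rewired = net_normal ^ net_tumour  # edges present in only one condition
```

Edges that differ between conditions mark proteins whose interaction context changes; the SVM with diffusion kernels that DPPN-SVM builds on such networks is not reproduced here.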

19.
Mol Ther Nucleic Acids ; 21: 1044-1049, 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32858457

ABSTRACT

N6-methyladenosine (m6A) is the most abundant post-transcriptional modification and is involved in a series of important biological processes. Therefore, accurate detection of m6A sites is very important for revealing their biological functions and impacts on diseases. Although both experimental and computational methods have been proposed for identifying m6A sites, few of them are able to detect m6A sites in different tissues. Given the spatial specificity of m6A modification, it is necessary to develop methods able to detect m6A sites in different tissues. In this work, using a convolutional neural network (CNN), we proposed a new method, called im6A-TS-CNN, that can identify m6A sites in the brain, liver, kidney, heart, and testis of Homo sapiens, Mus musculus, and Rattus norvegicus. In im6A-TS-CNN, samples are encoded using the one-hot encoding scheme. The results from both 5-fold cross-validation and an independent dataset test demonstrate that im6A-TS-CNN is better than the existing method for the same purpose. The command-line version of im6A-TS-CNN is available at https://github.com/liukeweiaway/DeepM6A_cnn.
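
im6A-TS-CNN encodes samples with a one-hot scheme before the convolutional layers. A minimal sketch, assuming the column order A, C, G, U and an all-zero row for any other symbol (for example N); both the order and the handling of unknown symbols are assumptions rather than details taken from the paper.

```python
from typing import List

ALPHABET = "ACGU"  # assumed column order


def one_hot(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 matrix; T is treated as U, unknown symbols give all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper().replace("T", "U")]


encoded = one_hot("GACN")
```

A CNN then slides 1D filters along the sequence axis of this matrix, so each filter responds to a short nucleotide pattern.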

20.
Brief Bioinform ; 21(1): 11-23, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30239616

ABSTRACT

Cell-penetrating peptides (CPPs) have been shown to be transport vehicles for delivering cargoes into live cells, offering great potential as future therapeutics. It is essential to identify CPPs for a better understanding of their functional mechanisms. Machine learning-based methods have recently emerged as a main approach for the computational identification of CPPs. However, one of the main challenges is to propose an effective feature representation model that sufficiently exploits the inner differences and relevance between CPPs and non-CPPs in order to improve predictive performance. In this paper, we have developed CPPred-FL, a powerful bioinformatics tool for fast, accurate and large-scale identification of CPPs. In our predictor, we introduce a new feature representation learning scheme that enables one to learn feature representations from a total of 45 well-trained random forest models with multiple feature descriptors from different perspectives, such as compositional information, position-specific information and physicochemical properties. We integrate class and probabilistic information into our feature representations. To improve the feature representation ability, we further remove redundant and irrelevant features by feature space optimization. Benchmarking experiments showed that CPPred-FL, using only 19 informative features, is able to achieve better performance than the state-of-the-art predictors. We anticipate that CPPred-FL will be a powerful tool for large-scale identification of CPPs, facilitating the characterization of their functional mechanisms and accelerating their applications in clinical therapy.
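
CPPred-FL draws on multiple feature descriptors, including compositional information. As an example of that descriptor family only, the sketch computes amino acid composition (the fraction of each of the 20 standard residues) for a peptide. It does not reproduce CPPred-FL's feature representation learning, in which descriptors feed 45 random forest models whose class and probabilistic outputs become the features. The arginine-rich peptide is used only as an input example.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in the peptide; non-standard letters are rejected."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(AMINO_ACIDS)
    if unknown or not peptide:
        raise ValueError(f"empty peptide or non-standard residues: {sorted(unknown)}")
    return {aa: peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS}


aac = amino_acid_composition("GRKKRRQRRR")
```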
