1.
Article in English | MEDLINE | ID: mdl-38502632

ABSTRACT

Skeleton-based exercise assessment focuses on evaluating the correctness or quality of an exercise performed by a subject. Skeleton data provide two groups of features (i.e., position and orientation), which existing methods have not fully harnessed. We previously proposed an ensemble-based graph convolutional network (EGCN) that considers both position and orientation features to construct a model-based approach. Integrating these types of features achieved better performance than available methods. However, EGCN lacked a fusion strategy across the data, feature, decision, and model levels. In this paper, we present an advanced framework, EGCN++, for rehabilitation exercise assessment. Based on EGCN, a new fusion strategy called MLE-PO is proposed for EGCN++; this technique considers fusion at the data and model levels. We conduct extensive cross-validation experiments and investigate the consistency between machine and human evaluations on three datasets: UI-PRMD, KIMORE, and EHE. Results demonstrate that MLE-PO outperforms other EGCN ensemble strategies and representative baselines. Furthermore, the MLE-PO's model evaluation scores are more quantitatively consistent with clinical evaluations than other ensemble strategies.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3522-3538, 2023 03.
Article in English | MEDLINE | ID: mdl-35617191

ABSTRACT

Human action recognition (HAR) in RGB-D videos has been widely investigated since the release of affordable depth sensors. Currently, unimodal approaches (e.g., skeleton-based and RGB video-based) have realized substantial improvements with increasingly larger datasets. However, multimodal methods specifically with model-level fusion have seldom been investigated. In this article, we propose a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities via a model-based approach. The objective of our method is to improve ensemble recognition accuracy by effectively applying mutually complementary information from different data modalities. For the model-based fusion scheme, we use a spatiotemporal graph convolution network for the skeleton modality to learn attention weights that will be transferred to the network of the RGB modality. Extensive experiments are conducted on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Upon aggregating the results of multiple modalities, our method is found to outperform state-of-the-art approaches on six evaluation protocols of the five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR. We also tested our MMNet on an RGB video dataset Kinetics 400 that contains more outdoor actions, which shows consistent results with those of RGB-D video datasets.


Subject(s)
Algorithms , Pattern Recognition, Automated , Humans , Benchmarking , Human Activities , Learning
3.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2546-2554, 2021.
Article in English | MEDLINE | ID: mdl-32070992

ABSTRACT

A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of molecule types. In this study, we propose MAN-SDNE, a computational framework based on network representation learning, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease, containing 6,528 molecular nodes and 105,546 associations of 9 kinds. Each node is then represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To demonstrate its ability to predict specific types of interactions, a case study predicting lncRNA-protein interactions with MAN-SDNE is also carried out. Experimental results demonstrate that this work offers systematic insight into the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.


Subject(s)
Models, Biological , Systems Biology/methods , Computer Simulation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Pharmaceutical Preparations/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
4.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32633319

ABSTRACT

MOTIVATION: Identifying microRNAs that are associated with different diseases as biomarkers is a problem of great medical significance. Existing computational methods for uncovering such microRNA-disease associations (MDAs) are mostly developed under the assumption that similar microRNAs tend to associate with similar diseases. Since such an assumption is not always valid, these methods may not be applicable to all kinds of MDAs. Considering that the relationship between long noncoding RNA (lncRNA) and different diseases and the co-regulation relationships between the biological functions of lncRNA and microRNA have been established, we propose here a multiview multitask method that makes use of known lncRNA-microRNA interactions to predict MDAs on a large scale. The investigation is performed in the absence of complete information on microRNAs and of any similarity measurement for them and, to the best of our knowledge, the work represents the first attempt to discover MDAs based on lncRNA-microRNA interactions. RESULTS: In this paper, we propose a deep learning model called MVMTMDA that can create a multiview representation of microRNAs. The model is trained with an end-to-end multitask learning approach so that missing data in the side information can be determined automatically. Experimental results show that the proposed model yields an average area under the ROC curve of 0.8410+/-0.018, 0.8512+/-0.012 and 0.8521+/-0.008 when k is set to 2, 5 and 10, respectively. In addition, we also propose here a statistical approach to predicting lncRNA-disease associations based on these associations and the MDAs discovered using MVMTMDA. AVAILABILITY: Python code and the datasets used in our studies are made available at https://github.com/yahuang1991polyu/MVMTMDA/.


Subject(s)
Disease/genetics , Machine Learning , MicroRNAs , Models, Genetic , RNA, Long Noncoding , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Predictive Value of Tests , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
5.
J Bioinform Comput Biol ; 18(5): 2050035, 2020 10.
Article in English | MEDLINE | ID: mdl-33064052

ABSTRACT

As interactions among genetic variants in different genes can be an important factor for predicting complex diseases, many computational methods have been proposed to detect if a particular set of genes has interaction with a particular complex disease. However, even though many such methods have been shown to be useful, they can be made more effective if the properties of gene-gene interactions can be better understood. Towards this goal, we have attempted to uncover patterns in gene-gene interactions and the patterns reveal an interesting property that can be reflected in an inequality that describes the relationship between two genotype variables and a disease-status variable. We show, in this paper, that this inequality can be generalized to [Formula: see text] genotype variables. Based on this inequality, we establish a conditional independence and redundancy (CIR)-based definition of gene-gene interaction and the concept of an interaction group. From these new definitions, a novel measure of gene-gene interaction is then derived. We discuss the properties of these concepts and explain how they can be used in a novel algorithm to detect high-order gene-gene interactions. Experimental results using both simulated and real datasets show that the proposed method can be very promising.


Subject(s)
Algorithms , Epistasis, Genetic , Case-Control Studies , Computational Biology/methods , Gene Frequency , Genotype , Hemoglobinopathies/genetics , Humans , Linkage Disequilibrium , Malaria, Falciparum/genetics , Polymorphism, Single Nucleotide , alpha-Globins/genetics
6.
iScience ; 23(7): 101261, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32580123

ABSTRACT

Molecular components that are functionally interdependent in human cells constitute molecular association networks. Disease can be caused by disturbance of multiple molecular interactions. New biomolecular regulatory mechanisms can be revealed by discovering new biomolecular interactions. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. Then, these discriminative features are combined to train a random forest classifier to predict intermolecular interactions. MMI-Pred achieves 93.50% accuracy in hybrid association prediction under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method to model and infer complex associations between various biological components.

7.
Bioinformatics ; 36(3): 851-858, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31397851

ABSTRACT

MOTIVATION: MicroRNA (miRNA) therapeutics is becoming increasingly important. However, aberrant expression of miRNAs is known to cause drug resistance and can become an obstacle for miRNA-based therapeutics. At present, little is known about associations between miRNA and drug resistance, and there is no computational tool available for predicting such associations. Since it is known that miRNAs can regulate genes that encode specific proteins that are key to drug efficacy, we propose here a computational approach, called GCMDR, for finding a three-layer latent factor model that can be used to predict miRNA-drug resistance associations. RESULTS: In this paper, we discuss how the problem of predicting such associations can be formulated as a link prediction problem involving a bipartite attributed graph. GCMDR makes use of graph convolution to build a latent factor model, which can effectively utilize the high-dimensional attributes of miRNAs and drugs in an end-to-end learning scheme. In addition, GCMDR also learns graph embedding features for miRNAs and drugs. We leveraged data from multiple databases storing miRNA expression profiles, drug substructure fingerprints, gene ontology and disease ontology. Performance tests show that the GCMDR prediction model can achieve AUCs of 0.9301 ± 0.0005, 0.9359 ± 0.0006 and 0.9369 ± 0.0003 based on 2-fold, 5-fold and 10-fold cross validation, respectively. Using this model, we show that the associations between miRNA and drug resistance can be reliably predicted by properly introducing useful side information such as miRNA expression profiles and drug structure fingerprints. AVAILABILITY AND IMPLEMENTATION: Python code and the dataset are available at https://github.com/yahuang1991polyu/GCMDR/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
MicroRNAs , Algorithms , Area Under Curve , Computational Biology , Drug Resistance
8.
IEEE Trans Cybern ; 50(10): 4318-4331, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31329151

ABSTRACT

Graph clustering, which aims at discovering sets of related vertices in graph-structured data, plays a crucial role in various applications, such as social community detection and biological module discovery. With the huge increase in the volume of data in recent years, graph clustering is used in an increasing number of real-life scenarios. However, the classical and state-of-the-art methods, which consider only single-view features or a single vector concatenating features from different views and neglect the contextual correlation between pairwise features, are insufficient for the task, as features that characterize vertices in a graph are usually from multiple views and the contextual correlation between pairwise features may influence the cluster preference for vertices. To address this challenging problem, we introduce in this paper a novel graph clustering model, dubbed contextual correlation preserving multiview featured graph clustering (CCPMVFGC), for discovering clusters in graphs with multiview vertex features. Unlike most of the aforementioned approaches, CCPMVFGC is capable of learning a shared latent space from multiview features as the cluster preference for each vertex and making use of this latent space to model the inter-relationship between pairwise vertices. CCPMVFGC uses an effective method to compute the degree of contextual correlation between pairwise vertex features and utilizes a view-wise latent space representing the feature-cluster preference to model the computed correlation. Thus, the cluster preference learned by CCPMVFGC is jointly inferred from multiview features, view-wise correlations of pairwise features, and the graph topology. Accordingly, we propose a unified objective function for CCPMVFGC and develop an iterative strategy to solve the formulated optimization problem. We also provide a theoretical analysis of the proposed model, including a convergence proof and a computational complexity analysis. In our experiments, we extensively compare the proposed CCPMVFGC with both classical and state-of-the-art graph clustering methods on eight standard graph datasets (six multiview and two single-view datasets). The results show that CCPMVFGC achieves competitive performance on all eight datasets, which validates the effectiveness of the proposed model.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1516-1524, 2020.
Article in English | MEDLINE | ID: mdl-31796414

ABSTRACT

Long noncoding RNAs (lncRNAs) are an important class of non-protein-coding RNAs. They have recently been found to potentially act as regulatory molecules in some important biological processes. MicroRNAs (miRNAs) have been confirmed to be closely related to the regulation of various human diseases. Recent studies have suggested that lncRNAs could interact with miRNAs to modulate their regulatory roles. Hence, predicting lncRNA-miRNA interactions is biologically significant due to their potential roles in determining the effectiveness of diagnostic biomarkers and therapeutic targets for various human diseases. For the details of the mechanisms to be better understood, it would be useful to develop computational approaches that allow for such investigations. As diverse heterogeneous datasets describing lncRNAs and miRNAs have been made available, it has become more feasible to develop a model that describes potential interactions between lncRNAs and miRNAs. In this work, we present a novel computational approach called LMNLMI for this purpose. LMNLMI works in several phases. First, it learns patterns from expression, sequence and functional data. Based on the patterns, it then constructs several networks, including an expression-similarity network, a functional-similarity network, and a sequence-similarity network. Based on a measure of similarities between these networks, LMNLMI computes an interaction score for each pair of lncRNA and miRNA in the database. The novelty of LMNLMI lies in the use of a network fusion technique to combine the patterns inherent in multiple similarity networks and a matrix completion technique to predict interaction relationships. Using a set of real data, we show that LMNLMI can be a very effective approach for the accurate prediction of lncRNA-miRNA interactions.


Subject(s)
Computational Biology/methods , MicroRNAs , RNA, Long Noncoding , Transcriptome/genetics , Databases, Genetic , Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Models, Genetic , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
10.
Bioinformatics ; 36(13): 4038-4046, 2020 07 01.
Article in English | MEDLINE | ID: mdl-31793982

ABSTRACT

MOTIVATION: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNA as a biomarker gives rise to a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detection of circRNA-disease associations by biological experiments alone is often blind, limited to small scale, costly and time-consuming. Therefore, there is an urgent need for reliable computational methods to rapidly infer potential circRNA-disease associations on a large scale and to provide the most promising candidates for biological experiments. RESULTS: In this article, we propose an efficient computational method based on multi-source information combined with a deep convolutional neural network (CNN) to predict circRNA-disease associations. The method first fuses multi-source information including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, then extracts hidden deep features through the CNN, and finally sends them to an extreme learning machine classifier for prediction. The 5-fold cross-validation results show that the proposed method achieves 87.21% prediction accuracy with 88.50% sensitivity and an area under the curve of 86.67% on the CIRCR2Disease dataset. In comparison with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we also obtained experimental support for the prediction results by searching the published literature. As a result, 7 of the top 15 circRNA-disease pairs with the highest scores were confirmed by the literature. These results demonstrate that the proposed model is a suitable method for predicting circRNA-disease associations and can provide reliable candidates for biological experiments. AVAILABILITY AND IMPLEMENTATION: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer , RNA, Circular , Algorithms , Humans
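
The Gaussian interaction profile (GIP) kernel similarity named in this abstract can be illustrated with a short sketch. This follows the commonly used definition of the kernel and is not the authors' implementation; the function name `gip_kernel`, the bandwidth setting `gamma_prime = 1.0`, and the toy association matrix are assumptions made for illustration only.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 association matrix.

    profiles: list of equal-length 0/1 lists, one interaction profile per entity.
    gamma_prime: assumed bandwidth setting (hypothetical default).
    """
    n = len(profiles)
    # The bandwidth is normalized by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel

# Toy circRNA-by-disease association matrix (hypothetical data).
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(associations)
```

Rows with identical interaction profiles receive a similarity of 1, and the similarity decays as profiles diverge. The same function can be applied to the transposed matrix to obtain the disease-side kernel.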
11.
Stud Health Technol Inform ; 264: 1480-1481, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438191

ABSTRACT

The low proportion and the rapid evolution of major adverse cardiac events (MACE) present challenges for predicting MACE with machine learning models. In this paper, we propose a method to predict MACE from large-scale imbalanced EMR data by using a network-based one-class classifier. It uses only the reliably known MACE samples to establish the hyperspherical model. Experiments show that our model outperforms the state-of-the-art models.


Subject(s)
Acute Coronary Syndrome , Humans
12.
BMC Bioinformatics ; 19(1): 329, 2018 Sep 18.
Article in English | MEDLINE | ID: mdl-30227829

ABSTRACT

BACKGROUND: Quantitative traits or continuous outcomes related to complex diseases can provide more information and therefore more accurate analysis for identifying gene-gene and gene-environment interactions associated with complex diseases. Multifactor Dimensionality Reduction (MDR) was originally proposed to identify gene-gene and gene-environment interactions associated with the binary status of complex diseases. Some efforts have been made to extend it to quantitative traits (QTs) and ordinal traits. However, these and other methods are still not computationally efficient or effective. RESULTS: Generalized Fuzzy Quantitative trait MDR (GFQMDR) is proposed in this paper to strengthen identification of gene-gene interactions associated with a quantitative trait by first transforming it to an ordinal trait and then selecting the best sets of genetic markers, mainly single nucleotide polymorphisms (SNPs) or simple sequence length polymorphic markers (SSLPs), as having strong association with the trait through generalized fuzzy classification using extended member functions. Experimental results on simulated and real datasets show that our algorithm has a better success rate, classification accuracy and consistency in identifying gene-gene interactions associated with QTs. CONCLUSION: The proposed algorithm provides a more effective way to identify gene-gene interactions associated with quantitative traits.


Subject(s)
Computational Biology/methods , Epistasis, Genetic , Fuzzy Logic , Phenotype , Animals , Female , Genetic Markers/genetics , Humans , Mice , Models, Genetic , Polymorphism, Single Nucleotide
13.
Article in English | MEDLINE | ID: mdl-29993661

ABSTRACT

The problem of identifying protein complexes in Protein-Protein Interaction (PPI) networks is usually formulated as the problem of identifying dense regions in such networks. In this paper, we present a novel approach, called TBPCI, to identify protein complexes based instead on the concept of a measure of boundedness. Such a measure is defined as an objective function of a Jaccard Index-based connectedness measure, which takes into consideration how much two proteins within a network are connected to each other, and an association measure, which takes into consideration how much two connecting proteins are associated based on their attributes found in the Gene Ontology database. Based on these two measures, the objective function is derived to capture how strongly the proteins can be considered as bounded together, and the objective value is therefore referred to as the aggregated degree of boundedness. To identify protein complexes, TBPCI computes the degree of boundedness between all possible pairwise proteins. Then, TBPCI uses a Breadth-First-Search method to determine whether a protein pair should be incorporated into the same complex. TBPCI has been tested with several real data sets and the experimental results show it is an effective approach for identifying protein complexes in PPI networks.
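
As a rough illustration of a Jaccard Index-based connectedness measure between two proteins, the sketch below compares their neighborhoods in a PPI graph. It is not TBPCI's actual objective function: the function name `jaccard_connectedness`, the use of closed neighborhoods (each protein counted as its own neighbor), and the toy network are assumptions for illustration.

```python
def jaccard_connectedness(graph, u, v):
    """Jaccard index of the closed neighborhoods of proteins u and v.

    graph: dict mapping each protein to the set of its interaction partners.
    Closed neighborhoods are an assumption made here so that two directly
    connected proteins with no other partners still score above zero.
    """
    neighbors_u = set(graph.get(u, ())) | {u}
    neighbors_v = set(graph.get(v, ())) | {v}
    return len(neighbors_u & neighbors_v) / len(neighbors_u | neighbors_v)

# Toy undirected PPI network (hypothetical data).
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
score_ab = jaccard_connectedness(ppi, "A", "B")  # same closed neighborhood
score_cd = jaccard_connectedness(ppi, "C", "D")  # partial overlap
score_ad = jaccard_connectedness(ppi, "A", "D")  # share only protein C
```

Proteins inside the triangle A-B-C score higher with each other than with the peripheral protein D, which is the intuition behind using neighborhood overlap as a connectedness signal.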

14.
IEEE Trans Cybern ; 48(5): 1369-1382, 2018 May.
Article in English | MEDLINE | ID: mdl-28459699

ABSTRACT

An attributed graph contains vertices that are associated with a set of attribute values. Mining clusters or communities, which are interesting subgraphs in the attributed graph, is one of the most important tasks of graph analytics. Many problems can be defined as the mining of interesting subgraphs in attributed graphs. Algorithms that discover subgraphs based on predefined topologies cannot be used to tackle these problems. To discover interesting subgraphs in the attributed graph, we propose an algorithm called mining interesting subgraphs in attributed graph algorithm (MISAGA). MISAGA performs its tasks by first using a probabilistic measure to determine whether the association between a pair of attribute values is strong enough to be interesting. Given the interesting pairs of attribute values, the degree of association is then computed for each pair of vertices using an information-theoretic measure. Based on the edge structure and degree of association between each pair of vertices, MISAGA identifies interesting subgraphs by formulating the task as a constrained optimization problem and solving it by identifying the optimal affiliation of subgraphs for the vertices in the attributed graph. MISAGA has been tested with several large real graphs and is found to be potentially very useful for various applications.

15.
Article in English | MEDLINE | ID: mdl-28029628

ABSTRACT

This paper presents a graph clustering algorithm, called EGCPI, to discover protein complexes in protein-protein interaction (PPI) networks. In performing its task, EGCPI takes into consideration both network topologies and attributes of interacting proteins, both of which have been shown to be important for protein complex discovery. EGCPI formulates the problem as an optimization problem and tackles it with evolutionary clustering. Given a PPI network, EGCPI first annotates each protein with the corresponding attributes provided in the Gene Ontology database. It then adopts a similarity measure to evaluate how similar the connected proteins are, taking into consideration the network topology. Given this measure, EGCPI then discovers a number of graph clusters within which proteins are densely connected, based on an evolutionary strategy. Finally, EGCPI identifies protein complexes in each discovered cluster based on the homogeneity of attributes exhibited by pairwise proteins. EGCPI has been tested with several real data sets and the experimental results show that EGCPI is very effective at protein complex discovery and that evolutionary clustering helps to identify protein complexes in PPI networks. The EGCPI software can be downloaded from https://github.com/hetiantian1985/EGCPI.


Subject(s)
Cluster Analysis , Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Proteins , Algorithms , Humans , Proteins/chemistry , Proteins/metabolism , Proteins/physiology , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae Proteins/physiology
16.
Curr Protein Pept Sci ; 19(5): 479-487, 2018.
Article in English | MEDLINE | ID: mdl-27829343

ABSTRACT

BACKGROUND: A basic task in drug discovery is to find new medication in the form of candidate compounds that act on a target protein. In other words, a drug has to interact with a target, and such drug-target interactions (DTIs) are not expected to be random. Significant and interesting patterns are expected to be hidden in them. If these patterns can be discovered, new drugs are expected to be more easily discoverable. OBJECTIVE: Currently, a number of computational methods have been proposed to predict DTIs based on their similarity. However, such an approach does not allow biochemical features to be directly considered. As a result, some methods have been proposed to try to discover patterns in physicochemical interactions. Since the number of potential negative DTIs is very high both in absolute terms and in comparison to that of the known ones, these methods are rather computationally expensive and they can only rely on subsets, rather than the full set, of negative DTIs for training and validation. As there is always a relatively high chance for negative DTIs to be falsely identified and as only a subset of such DTIs is considered, existing approaches can be further improved to better predict DTIs. METHOD: In this paper, we present a novel approach, called ODT (one-class drug-target interaction prediction), for this purpose. One main task of ODT is to discover association patterns between interacting drugs and proteins from the chemical structure of the former and the protein sequence network of the latter. ODT does so in two phases. First, the DTI network is transformed to a representation by structural properties. Second, it applies a one-class classification algorithm to build a prediction model based only on known positive interactions. RESULTS: We compared the best AUROC scores of ODT with several state-of-the-art approaches on gold standard data. The prediction accuracy of ODT is superior to that of all the other methods on the GPCR and ion channel datasets. CONCLUSION: Performance evaluation of ODT shows that it can be potentially useful. It confirms that predicting potential or missing DTIs based on the known interactions is a promising direction for solving problems related to the use of uncertain and unreliable negative samples and those related to the great demand for computational resources.


Subject(s)
Computer Simulation , Models, Molecular , Pharmaceutical Preparations/chemistry , Proteins/chemistry , Algorithms , Drug Discovery/methods , Molecular Structure , Protein Binding , Structure-Activity Relationship
17.
Bioinformatics ; 34(5): 812-819, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29069317

ABSTRACT

Motivation: The interaction of miRNA and lncRNA is known to be important for gene regulation. However, not many computational approaches have been developed to analyze known interactions and predict unknown ones. Given that there is now more evidence suggesting that lncRNA-miRNA interactions are closely related to their relative expression levels in the form of a titration mechanism, we analyzed the patterns in large-scale expression profiles of known lncRNA-miRNA interactions. From these uncovered patterns, we noticed that lncRNAs tend to interact collaboratively with miRNAs of similar expression profiles, and vice versa. Results: By representing known interactions between lncRNA and miRNA as a bipartite graph, we propose here a technique, called EPLMI, to construct a prediction model from such a graph. EPLMI performs its tasks based on the assumption that lncRNAs that are highly similar to each other tend to have similar interaction or non-interaction patterns with miRNAs, and vice versa. The effectiveness of the prediction model so constructed has been evaluated using the latest dataset of lncRNA-miRNA interactions. The results show that the prediction model can achieve AUCs of 0.8522 and 0.8447 ± 0.0017 based on leave-one-out cross validation and 5-fold cross validation, respectively. Using this model, we show that lncRNA-miRNA interactions can be reliably predicted. We also show that we can use it to select the most likely lncRNA targets that specific miRNAs would interact with. We believe that the prediction models discovered by EPLMI can yield great insights for further research on the ceRNA regulation network. To the best of our knowledge, EPLMI is the first technique developed for large-scale lncRNA-miRNA interaction profiling. Availability and implementation: Matlab code and the dataset are available at https://github.com/yahuang1991polyu/EPLMI/. Contact: yu-an.huang@connect.polyu.hk or zhuhongyou@ms.xjb.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation , MicroRNAs/metabolism , RNA, Long Noncoding/metabolism , Sequence Analysis, RNA/methods , Algorithms , Area Under Curve , Humans , Sensitivity and Specificity
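
The assumption stated in this abstract, that lncRNAs with similar expression profiles tend to share interaction patterns with miRNAs, can be sketched as a similarity-weighted vote. This is only an illustration of that assumption and not the EPLMI algorithm itself; the function names `pearson` and `neighbor_score`, the choice of Pearson correlation as the similarity, and all identifiers and values in the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def neighbor_score(target, mirna, expression, interactions):
    """Similarity-weighted vote of the other lncRNAs on the pair (target, mirna)."""
    weighted_votes = total_weight = 0.0
    for other, profile in expression.items():
        if other == target:
            continue
        # Anti-correlated neighbors are ignored (an assumption of this sketch).
        weight = max(pearson(expression[target], profile), 0.0)
        if mirna in interactions.get(other, set()):
            weighted_votes += weight
        total_weight += weight
    return weighted_votes / total_weight if total_weight else 0.0

# Hypothetical expression profiles and known interactions.
expression = {
    "lnc1": [1.0, 2.0, 3.0],
    "lnc2": [2.0, 4.0, 6.0],   # perfectly correlated with lnc1
    "lnc3": [3.0, 2.0, 1.0],   # anti-correlated with lnc1
}
interactions = {"lnc2": {"miR-a"}, "lnc3": {"miR-b"}}

score_a = neighbor_score("lnc1", "miR-a", expression, interactions)
score_b = neighbor_score("lnc1", "miR-b", expression, interactions)
```

In this toy case the candidate pair (lnc1, miR-a) scores high because the only similar lncRNA interacts with miR-a, while (lnc1, miR-b) scores zero because only a dissimilar lncRNA supports it.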
18.
Comput Biol Chem ; 69: 202-206, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28396055

ABSTRACT

With the rapid development of high-throughput genomic technologies, a vast amount of protein-protein interaction (PPI) data has been generated for different species. However, such a set of PPIs is rather small when compared with all possible PPIs. Hence, there is a need to develop computational algorithms specifically for large-scale PPI prediction. In response to this need, we propose a parallel algorithm, namely pVLASPD, to perform the prediction task in a distributed manner. In particular, pVLASPD was derived from the VLASPD algorithm with the aim of improving its efficiency while maintaining comparable effectiveness. To do so, we first analyzed VLASPD step by step to identify the steps that caused efficiency bottlenecks. After that, pVLASPD was developed by parallelizing those inefficient steps with the MapReduce framework. Extensive experimental results demonstrate the promising performance of pVLASPD when applied to the prediction of large-scale PPIs.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Proteins/chemistry , Protein Binding , Proteins/metabolism
19.
Article in English | MEDLINE | ID: mdl-26812730

ABSTRACT

Knowing the ways proteins interact with each other is crucial to our understanding of the functional mechanisms of proteins. For this reason, different approaches have been developed in attempts to predict protein-protein interactions (PPIs) computationally. Among them, sequence-based approaches are preferred to the others as they do not require any information about protein properties to perform their tasks. Instead, most sequence-based approaches make use of feature extraction methods to extract features directly from protein sequences so that a feature vector can be constructed for each protein sequence. The feature vectors of every pair of proteins are then concatenated to form two classes of interacting and non-interacting proteins. The prediction of whether or not two proteins interact with each other is then formulated as a classification problem. How accurately PPIs can be predicted therefore depends on how well the features extracted from the protein sequences allow interacting and non-interacting pairs to be distinguished. To this end, instead of extracting such features from individual protein sequences independently of the other protein in the same pair, we propose to jointly consider features from both sequences in a protein pair during the feature extraction process, using a novel coevolutionary feature extraction approach called CoFex. Coevolutionary features extracted by CoFex refer to the covariations found at coevolving positions. Based on the presence and absence of these coevolutionary features in the sequences of two proteins, feature vectors can be composed for pairs of proteins rather than individual proteins. The experimental results show that CoFex is a promising feature extraction approach and can improve the performance of PPI prediction.
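
Illustrative note: the conventional sequence-based pipeline that the abstract above contrasts with CoFex (extract a feature vector from each sequence independently, then concatenate the two vectors for a protein pair) can be sketched as follows. This is not CoFex, which extracts features jointly from both sequences. Amino acid composition is used here only as an assumed stand-in for a per-sequence feature extractor, and the names `composition` and `pair_features` are hypothetical.

```python
# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate independently extracted feature vectors of two proteins."""
    return composition(seq_a) + composition(seq_b)


# Toy sequences (assumed): the result is a 40-dimensional vector for the pair.
x = pair_features("ACDA", "KKLM")
```

Each pair vector would then be labeled as interacting or non-interacting and passed to a classifier of the reader's choice. Note that under this baseline the two halves of the vector carry no information about how the sequences relate to each other, which is the limitation the abstract's joint extraction is meant to address.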


Subject(s)
Conserved Sequence/physiology , Evolution, Molecular , Models, Genetic , Protein Interaction Mapping/methods , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein/methods , Binding Sites , Computer Simulation , Models, Chemical , Protein Binding
20.
Oncotarget ; 7(29): 45948-45958, 2016 Jul 19.
Article in English | MEDLINE | ID: mdl-27322210

ABSTRACT

Accumulating experimental studies have indicated the influence of lncRNAs on various critical biological processes as well as disease development and progression. Calculating lncRNA functional similarity is of high value in inferring lncRNA functions and identifying potential lncRNA-disease associations. However, little effort has been made to measure the functional similarity among lncRNAs on a large scale. In this study, we developed a Fuzzy Measure-based LNCRNA functional SIMilarity calculation model (FMLNCSIM) based on the assumption that functionally similar lncRNAs tend to be associated with similar diseases. The performance improvement of FMLNCSIM mainly comes from the combination of information content and the concept of fuzzy measure, which was applied to the directed acyclic graphs of disease MeSH descriptors. To evaluate the effectiveness of FMLNCSIM, we further combined it with the previously proposed model of Laplacian Regularized Least Squares for lncRNA-Disease Association (LRLSLDA). As a result, the integrated model, LRLSLDA-FMLNCSIM, achieved good performance in the frameworks of global LOOCV (AUCs of 0.8266 and 0.9338 based on the LncRNADisease and MNDR databases, respectively) and 5-fold cross validation (average AUCs of 0.7979 and 0.9237 based on the LncRNADisease and MNDR databases, respectively), significantly improving on the performance of previous classical models. It is anticipated that FMLNCSIM could be used for searching for functionally similar lncRNAs and inferring lncRNA functions in future research.


Subject(s)
Computational Biology/methods , Computer Simulation , Fuzzy Logic , Gene Expression Profiling/methods , RNA, Long Noncoding/genetics , Gene Regulatory Networks , Humans , Transcriptome
...