Results 1 - 20 of 82
1.
Article in English | MEDLINE | ID: mdl-38923482

ABSTRACT

Time series anomaly detection is the task of identifying anomalies within time series data. Its primary challenge is that the model must capture the characteristics of time-independent and abnormal data patterns. In this study, a novel algorithm called the adaptive memory broad learning system (AdaMemBLS) is proposed for time series anomaly detection. The algorithm combines the rapid inference of broad learning with a memory bank's capacity to differentiate between normal and abnormal data. Furthermore, an incremental algorithm based on multiple data augmentation techniques is introduced and applied to multiple ensemble learners, enhancing the model's ability to learn the characteristics of time series data. To strengthen anomaly detection further, a more diverse ensemble strategy and a discriminative anomaly score are proposed. Extensive experiments on various real-world datasets demonstrate that the proposed method offers faster inference and more accurate anomaly detection than existing competitors, and a detailed experimental investigation explains why the method is effective.
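To make the memory-bank idea concrete, the sketch below scores a window by its distance to the nearest stored normal feature; it is an illustrative simplification, not the authors' AdaMemBLS implementation, and the bank size and feature dimensions are assumptions.

```python
# Illustrative memory-bank anomaly scoring (a simplification of the idea in the
# abstract above; not the authors' AdaMemBLS implementation).
import numpy as np

def build_memory_bank(normal_features: np.ndarray, size: int = 256, seed: int = 0) -> np.ndarray:
    """Sub-sample feature vectors extracted from normal training windows."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(normal_features), size=min(size, len(normal_features)), replace=False)
    return normal_features[idx]

def anomaly_score(features: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Score each test feature by its distance to the closest memory entry."""
    # (n_test, n_mem) pairwise Euclidean distances
    d = np.linalg.norm(features[:, None, :] - memory[None, :, :], axis=-1)
    return d.min(axis=1)                      # large distance -> more anomalous

# toy usage: random "normal" features and a shifted "anomalous" batch
normal = np.random.randn(1000, 16)
memory = build_memory_bank(normal)
test = np.vstack([np.random.randn(5, 16), np.random.randn(5, 16) + 4.0])
print(anomaly_score(test, memory))            # shifted rows get clearly larger scores
```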

2.
Sensors (Basel) ; 24(8)2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38676273

ABSTRACT

Deep neural networks must address the dual challenge of delivering high-accuracy predictions and providing user-friendly explanations. Although deep models are widely used in time series modeling, deciphering the core principles that govern their outputs remains a significant challenge. This matters for building trusted models and enabling domain-expert validation, so that users and experts can rely on them in high-risk decision-making contexts (e.g., decision-support systems in healthcare). In this work, we put forward a deep prototype learning model that supports interpretable and manipulable modeling and classification of medical time series (i.e., ECG signals). Specifically, we first optimize the representation of single-heartbeat data with a bidirectional long short-term memory network and an attention mechanism, and then construct prototypes during training. The final classification outcomes (i.e., normal sinus rhythm, atrial fibrillation, and other rhythm) are determined by comparing the input with the learned prototypes. Moreover, the proposed model provides a human-machine collaboration mechanism that allows domain experts to refine the prototypes with their expertise and further improve performance; in contrast to the human-in-the-loop paradigm, where humans act mainly as supervisors or correctors who intervene when required, both parties here engage as partners, enabling more fluid and integrated interaction. Experiments show that on the binary task of distinguishing normal sinus rhythm from atrial fibrillation, the model performs marginally below some established baselines such as convolutional neural networks (CNNs) and bidirectional LSTMs with attention (Bi-LSTM-Attn), but clearly surpasses other state-of-the-art prototype baselines. On the three-class task (normal sinus rhythm, atrial fibrillation, and other rhythm), it performs substantially better than these prototype baselines, reaching a prediction accuracy of 0.8414 with macro precision, recall, and F1-score of 0.8449, 0.8224, and 0.8235, respectively, achieving both high classification accuracy and good interpretability.
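The prototype-comparison step can be sketched as follows in PyTorch, assuming a BiLSTM encoder with additive attention over time and a few learnable prototypes per class; the layer sizes and prototype count are assumptions rather than the paper's configuration.

```python
# Illustrative prototype classifier; dimensions and prototype count are assumptions.
import torch
import torch.nn as nn

class PrototypeECGClassifier(nn.Module):
    def __init__(self, in_dim=1, hidden=64, n_classes=3, protos_per_class=4):
        super().__init__()
        self.n_classes = n_classes
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                       # additive attention over time
        self.prototypes = nn.Parameter(torch.randn(n_classes * protos_per_class, 2 * hidden))
        self.register_buffer("proto_class",
                             torch.arange(n_classes).repeat_interleave(protos_per_class))

    def forward(self, x):                                          # x: (batch, time, in_dim)
        h, _ = self.encoder(x)                                     # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)                     # attention weight per step
        z = (w * h).sum(dim=1)                                     # heartbeat embedding
        d = torch.cdist(z, self.prototypes)                        # distance to every prototype
        # class score = negative distance to the closest prototype of that class
        scores = torch.stack([-d[:, self.proto_class == c].min(dim=1).values
                              for c in range(self.n_classes)], dim=1)
        return scores                                              # train with cross-entropy

model = PrototypeECGClassifier()
logits = model(torch.randn(8, 200, 1))                             # 8 heartbeats of 200 samples
print(logits.shape)                                                # torch.Size([8, 3])
```

Because each class score is the negative distance to the nearest prototype of that class, the prototypes remain directly inspectable, and in principle editable by a domain expert, after training.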


Subject(s)
Electrocardiography , Neural Networks, Computer , Humans , Electrocardiography/methods , Atrial Fibrillation/physiopathology , Atrial Fibrillation/diagnosis , Deep Learning , Heart Rate/physiology , Algorithms , Signal Processing, Computer-Assisted
3.
Food Chem ; 424: 136464, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37247602

ABSTRACT

As a natural polyphenol, curcumin has been used as an alternative to synthetic preservatives in food preservation. Unlike previous reviews, which mainly focus on the pH-responsive discoloration of curcumin for real-time detection of changes in food quality, this paper examines food preservation from the perspective of curcumin delivery systems and photosensitization. Delivery systems are an effective means of overcoming curcumin's limitations, such as instability, hydrophobicity, and low bioavailability, while curcumin as a photosensitizer can effectively sterilize and thereby preserve food. The practical fresh-keeping effects of curcumin delivery systems and photosensitization on foods (fruits/vegetables, animal-derived foods, and grains) are summarized comprehensively, including shelf-life extension and the maintenance of physicochemical properties, nutritional quality, and sensory quality. Future research should focus on developing novel curcumin-loaded materials for food preservation and, most importantly, on the biosafety and cumulative toxicity associated with these materials.


Subject(s)
Curcumin , Animals , Curcumin/pharmacology , Curcumin/chemistry , Food Preservation , Food Quality , Nutritive Value , Fruit
4.
Article in English | MEDLINE | ID: mdl-37028079

ABSTRACT

In this work, we study a more realistic and challenging scenario in multiview clustering (MVC), referred to as incomplete MVC (IMVC), in which some instances in certain views are missing. The key to IMVC is how to adequately exploit complementary and consistency information despite the incompleteness of the data. However, most existing methods address the incompleteness problem at the instance level, and they require sufficient information to perform data recovery. In this work, we develop a new approach to IMVC from a graph propagation perspective. Specifically, a partial graph is used to describe the similarity of samples for incomplete views, so that the issue of missing instances is translated into missing entries of the partial graph. A common graph can then be adaptively learned to self-guide the propagation process by exploiting the consistency information, and the propagated graph of each view is in turn used to refine the common self-guided graph in an iterative manner. Thus, the missing entries can be inferred through graph propagation by exploiting the consistency information across all views. On the other hand, existing approaches focus on the consistency structure only, and the complementary information has not been sufficiently exploited because of the data incompleteness. By contrast, under the proposed graph propagation framework, an exclusive regularization term can be naturally adopted to exploit the complementary information. Extensive experiments demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods. The source code is available at https://github.com/CLiu272/TNNLS-PGP.
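A toy sketch of the self-guided propagation idea, assuming each view provides a partial similarity matrix plus an observed-entry mask; the update rule and names below are illustrative, not the paper's exact algorithm.

```python
# Toy self-guided propagation for incomplete multi-view graphs: missing entries
# of each partial view graph are filled from a common graph that is itself
# re-estimated from the views. Update rules here are assumptions for illustration.
import numpy as np

def propagate_partial_graphs(views, masks, n_iter=20):
    """views: list of (n, n) partial similarity matrices (0 where unobserved).
    masks: list of (n, n) binary matrices, 1 where the entry is observed."""
    graphs = [v.copy() for v in views]
    for _ in range(n_iter):
        # common graph: average of the current (completed) view graphs
        common = np.mean(graphs, axis=0)
        # propagation step: keep observed entries, take missing ones from the common graph
        graphs = [m * v + (1 - m) * common for v, m in zip(views, masks)]
    return common, graphs

# toy data: a block-structured similarity with ~30% of entries missing per view
n = 6
base = np.kron(np.eye(2), np.ones((3, 3)))
rng = np.random.default_rng(0)
masks = [(rng.random((n, n)) > 0.3).astype(float) for _ in range(3)]
views = [base * m for m in masks]
common, completed = propagate_partial_graphs(views, masks)
print(np.round(common, 2))
```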

5.
ISA Trans ; 141: 93-102, 2023 Oct.
Article in English | MEDLINE | ID: mdl-36623991

ABSTRACT

Mobile crowdsensing leverages the ubiquitous sensors of smart devices to facilitate various sensing applications. Users who contribute data usually receive rewards from the task requester, which creates the risk that someone will preempt a task and submit a forged sensing report to earn revenue with minimal effort. Trust assessment is therefore essential to identify such irregular sensing reports. Existing methods mainly consider users' reputations and estimate trustworthiness from the difference between a report and the aggregated result. However, they face a severe problem when a majority of reports are invalid or of low quality because of repeated submission; for example, a user can switch between multiple accounts on a single device to repeatedly submit forged reports. To tackle this problem, we design a trust assessment scheme with an enhanced device fingerprinting algorithm. Briefly, to reduce the influence of repeated sensing reports, we first compute unique fingerprints derived from the intrinsic characteristics of the sensors and assign an initial trust weight to each report. Then, to improve the accuracy of the assessment, we compute the similarity of the reports to obtain their final trust values. Extensive evaluations are conducted to justify the effectiveness of the proposed design.
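The two-stage weighting can be illustrated with the hypothetical sketch below, where a crude fingerprint (rounded accelerometer bias and noise level) stands in for the paper's enhanced device fingerprinting algorithm.

```python
# Hypothetical two-stage trust scoring: down-weight reports that share a sensor
# fingerprint, then refine trust by similarity to the weighted aggregate.
# The fingerprint here is a crude stand-in, not the paper's enhanced algorithm.
import numpy as np

def sensor_fingerprint(accel_trace: np.ndarray, decimals: int = 2) -> tuple:
    """Crude device fingerprint from intrinsic sensor characteristics (bias, noise level)."""
    return (round(float(accel_trace.mean()), decimals),
            round(float(accel_trace.std()), decimals))

def trust_scores(reports: np.ndarray, fingerprints: list) -> np.ndarray:
    # stage 1: initial weight = 1 / number of reports sharing the same fingerprint
    counts = {fp: fingerprints.count(fp) for fp in set(fingerprints)}
    w = np.array([1.0 / counts[fp] for fp in fingerprints])
    # stage 2: similarity to the weighted aggregate refines the trust value
    aggregate = np.average(reports, axis=0, weights=w)
    sim = 1.0 / (1.0 + np.linalg.norm(reports - aggregate, axis=1))
    trust = w * sim
    return trust / trust.sum()

# toy example: three honest devices plus one device submitting three forged copies
rng = np.random.default_rng(1)
traces = [rng.normal(0.0, 0.01 * (i + 1), 500) for i in range(3)]     # distinct sensor noise
dup = rng.normal(0.0, 0.05, 500)
traces += [dup, dup, dup]                                             # one device, three accounts
reports = np.vstack([rng.normal(20.0, 0.5, (3, 4)),                   # honest readings
                     np.tile(rng.normal(35.0, 0.1, (1, 4)), (3, 1))]) # forged duplicates
print(np.round(trust_scores(reports, [sensor_fingerprint(t) for t in traces]), 3))
```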

6.
IEEE Trans Neural Netw Learn Syst ; 34(5): 2284-2297, 2023 May.
Article in English | MEDLINE | ID: mdl-34469316

ABSTRACT

Constructing an optimal classifier for high-dimensional imbalanced data is hard: on such data the performance of classifiers degrades severely. Although many approaches, such as resampling, cost-sensitive learning, and ensemble learning, have been proposed to deal with skewed data, they are constrained by high-dimensional data with noise and redundancy. In this study, we propose an adaptive subspace optimization ensemble method (ASOEM) for high-dimensional imbalanced data classification that overcomes these limitations. To construct accurate and diverse base classifiers, a novel adaptive subspace optimization (ASO) method, based on an adaptive subspace generation (ASG) process and a rotated subspace optimization (RSO) process, is designed to generate multiple robust and discriminative subspaces. A resampling scheme is then applied to each optimized subspace to build a class-balanced dataset for each base classifier. To verify its effectiveness, ASOEM is implemented with different resampling strategies on 24 real-world high-dimensional imbalanced datasets. Experimental results demonstrate that the proposed methods outperform other mainstream imbalance learning approaches and classifier ensemble methods.
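A much-simplified stand-in for the subspace-plus-rebalancing recipe is sketched below, using random feature subspaces, per-member undersampling, and scikit-learn decision trees; the ASG/RSO subspace optimization itself is not reproduced.

```python
# Simplified subspace-ensemble sketch: each base learner sees a random feature
# subspace and a class-rebalanced resample. Not the paper's ASOEM (no ASG/RSO).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

class SubspaceBalancedEnsemble:
    def __init__(self, n_members=15, subspace_frac=0.3, seed=0):
        self.n_members, self.subspace_frac = n_members, subspace_frac
        self.rng = np.random.default_rng(seed)
        self.members = []                              # list of (feature_idx, fitted_tree)

    def fit(self, X, y):
        n_feat = max(1, int(self.subspace_frac * X.shape[1]))
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()
        for _ in range(self.n_members):
            idx = self.rng.choice(X.shape[1], size=n_feat, replace=False)
            # undersample every class to the minority size for a balanced subset
            rows = np.concatenate([
                resample(np.flatnonzero(y == c), n_samples=n_min, replace=False,
                         random_state=int(self.rng.integers(1 << 30)))
                for c in classes])
            tree = DecisionTreeClassifier(max_depth=5).fit(X[rows][:, idx], y[rows])
            self.members.append((idx, tree))
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X[:, idx]) for idx, t in self.members])
        return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)

# toy imbalanced, high-dimensional data (~10% positives, signal in a few features)
X = np.random.randn(600, 200)
y = (np.random.rand(600) < 0.1).astype(int)
X[y == 1, :5] += 2.0
print(SubspaceBalancedEnsemble().fit(X, y).predict(X[:10]))
```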

7.
IEEE Trans Biomed Eng ; 70(1): 307-317, 2023 01.
Article in English | MEDLINE | ID: mdl-35820001

ABSTRACT

Advances in high-throughput experimental methods have made more diverse omic datasets available for clinical analysis. Different types of omic data reveal different cellular aspects and contribute to the understanding of disease progression from those aspects. Survival prediction and subgroup identification are two important problems in clinical analysis, and their performance can be further boosted by exploiting multiple omics data through multi-view learning. However, these two tasks are generally studied separately, and the possibility that they could reinforce each other through collaborative learning has not been adequately considered. In light of this, we propose a View-aware Collaborative Learning (VaCoL) method that jointly boosts survival prediction and subgroup identification by integrating multiple omics data. Specifically, survival analysis and affinity learning, which respectively perform survival prediction and subgroup identification, are integrated into a unified optimization framework so that the two tasks are learned collaboratively. In addition, to account for the diversity of the different data types, we use the log-rank test statistic to evaluate the importance of each view, so the proposed approach adaptively learns an optimal weight for each view during training. Empirical results on several real datasets show that our method significantly improves survival prediction and subgroup identification. A detailed model analysis is also provided to show the effectiveness of the proposed collaborative learning and view-weight learning strategies.
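One ingredient, weighting each omic view by how well its patient clusters separate survival curves, can be sketched as follows; the use of scikit-learn k-means and the lifelines log-rank test, as well as the cluster count and normalization, are illustrative assumptions rather than the paper's procedure.

```python
# Illustrative log-rank-based view weighting (assumptions throughout; not VaCoL).
import numpy as np
from sklearn.cluster import KMeans
from lifelines.statistics import multivariate_logrank_test

def view_weights(views, durations, events, n_clusters=3, seed=0):
    """views: list of (n_patients, n_features) omic matrices."""
    stats = []
    for X in views:
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
        res = multivariate_logrank_test(durations, labels, events)
        stats.append(res.test_statistic)       # larger => clusters separate survival better
    stats = np.asarray(stats)
    return stats / stats.sum()                 # normalized importance per view

# toy example: view 0 carries survival signal, view 1 is noise
rng = np.random.default_rng(0)
n = 120
group = rng.integers(0, 3, n)
durations = rng.exponential(scale=5 + 5 * group)          # risk depends on the latent group
events = rng.random(n) < 0.8
views = [group[:, None] + 0.3 * rng.standard_normal((n, 1)), rng.standard_normal((n, 5))]
print(np.round(view_weights(views, durations, events), 3))
```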


Subject(s)
Interdisciplinary Placement , Machine Learning , Learning , Survival Analysis
8.
Plant Biotechnol J ; 21(1): 202-218, 2023 01.
Article in English | MEDLINE | ID: mdl-36196761

ABSTRACT

Intensive breeding has significantly improved the yield of temperate japonica/geng (GJ) rice, dramatically enhancing global food security, yet little is known about the genomic structural variations (SVs) underlying this improvement. In the present study we compared 58 long-read assemblies of cultivated and wild rice species, revealing 156,319 SVs. A phylogenomic analysis based on the SV dataset identified putatively selected regions of GJ sub-populations. A significant portion of the detected SVs overlapping genic regions was found to influence the expression of the genes involved in the GJ assemblies. Integrating the SVs with causal genetic variants underlying agronomic traits enabled precise identification of breeding signatures resulting from complex breeding histories aimed at stress tolerance, yield potential, and quality improvement. Furthermore, the results provide genomic and genetic evidence that an SV in the promoter of LTG1 accounts for chilling sensitivity, and that increased copy numbers of GNP1 are associated with positive effects on grain number. In summary, this study provides genomic resources for retracing how SVs shaped agronomic traits during past breeding, which will assist future genetic, genomic, and breeding research on rice.


Subject(s)
Oryza , Oryza/genetics , Plant Breeding , Genomics/methods , Phenotype , Edible Grain
9.
Article in English | MEDLINE | ID: mdl-36083961

ABSTRACT

Consensus clustering can derive a more promising and robust clustering result by strategically integrating multiple partitions. However, existing approaches have several limitations: 1) most methods compute the ensemble-information matrix heuristically and lack sufficient optimization; 2) information from the original dataset is rarely considered; and 3) noise in both the label space and the feature space is ignored. To address these issues, we propose a novel consensus clustering method with co-association matrix optimization (CC-CMO), which improves the co-association matrix by taking abundant information from both the label space and the feature space into consideration. In the label space, CC-CMO derives a weighted partition matrix capturing inter-cluster correlation and further designs a least squares regression (LSR) model to explore the global structure of the data. In the feature space, CC-CMO minimizes the reconstruction error with doubly stochastic normalization in a projective subspace to eliminate noisy features and learn the local affinity of the data. To improve the co-association matrix by jointly considering the subspace representation, global structure, and local affinity of the data, we propose a unified optimization framework and design an alternating optimization algorithm for the optimal co-association matrix. Extensive experiments on a variety of real-world datasets demonstrate the superior performance of CC-CMO over state-of-the-art consensus clustering approaches.
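The co-association matrix that CC-CMO starts from and then optimizes is simple to construct: entry (i, j) is the fraction of base partitions that place samples i and j in the same cluster. The sketch below builds it and feeds it to spectral clustering; the optimization framework itself is not shown.

```python
# Plain co-association construction from a pool of base partitions (CC-CMO's
# starting point); the paper's joint optimization is not reproduced here.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

def co_association(partitions: np.ndarray) -> np.ndarray:
    """partitions: (n_partitions, n_samples) integer label matrix."""
    m, n = partitions.shape
    C = np.zeros((n, n))
    for labels in partitions:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / m

# toy ensemble: several k-means runs with different seeds / cluster counts
X = np.vstack([np.random.randn(30, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
parts = np.array([KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
                  for s, k in [(0, 3), (1, 3), (2, 4), (3, 5)]])
C = co_association(parts)
consensus = SpectralClustering(n_clusters=3, affinity="precomputed",
                               random_state=0).fit_predict(C)
print(consensus)
```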

10.
Sensors (Basel) ; 22(16)2022 Aug 15.
Article in English | MEDLINE | ID: mdl-36015865

ABSTRACT

Modern healthcare practice, especially in intensive care units, produces vast amounts of multivariate time series of health-related data, e.g., multi-lead electrocardiogram (ECG), pulse waveform, and blood pressure waveform. Timely and accurate prediction of medical interventions (e.g., intravenous injection) therefore becomes possible by mining such semantically rich time series. Existing work has mainly focused on onset prediction at the granularity of hours, which is not suitable for medication intervention in emergency medicine. This research proposes a Multi-Variable Hybrid Attentive Model (MVHA) that predicts the impending need for medical intervention by jointly mining multiple time series. Specifically, a two-level attention mechanism is designed to capture the fluctuation patterns and trends of the different time series. We applied MVHA to predicting the impending intravenous injection needs of critical patients in intensive care units. Experiments on the MIMIC Waveform Database show that the proposed model achieves a prediction accuracy of 0.8475 and an ROC-AUC of 0.8318, significantly outperforming baseline models.
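A schematic two-level attention module in PyTorch is shown below: attention over time steps within each variable, then attention over variables, before a binary prediction head. The shared GRU encoder and all dimensions are assumptions, not the MVHA architecture.

```python
# Schematic two-level attention (time steps, then variables); not the MVHA model.
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(1, hidden, batch_first=True)     # shared across variables
        self.time_attn = nn.Linear(hidden, 1)
        self.var_attn = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, n_vars, time)
        b, v, t = x.shape
        h, _ = self.encoder(x.reshape(b * v, t, 1))             # (b*v, t, hidden)
        a_t = torch.softmax(self.time_attn(h), dim=1)           # level 1: over time steps
        per_var = (a_t * h).sum(dim=1).reshape(b, v, -1)        # (b, v, hidden)
        a_v = torch.softmax(self.var_attn(per_var), dim=1)      # level 2: over variables
        summary = (a_v * per_var).sum(dim=1)                    # (b, hidden)
        return torch.sigmoid(self.head(summary)).squeeze(-1)    # P(intervention needed)

model = TwoLevelAttention()
prob = model(torch.randn(2, 4, 120))         # 2 patients, 4 waveforms, 120 time steps
print(prob.shape)                            # torch.Size([2])
```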


Subject(s)
Intensive Care Units , Vital Signs , Blood Pressure , Heart Rate , Humans , Time Factors
11.
Article in English | MEDLINE | ID: mdl-35657843

ABSTRACT

High-dimensional class-imbalanced data seriously degrade the performance of classification algorithms. Because of the large number of redundant or invalid features and the class imbalance, it is difficult to construct an optimal classifier for such data. Classifier ensembles have attracted intensive attention because they can achieve better performance than an individual classifier. In this work, we propose a multiview optimization (MVO) method to learn more effective and robust features from high-dimensional imbalanced data, on which an accurate and robust ensemble system is built. Specifically, an optimized subview generation (OSG) step first generates multiple optimized subviews from different scenarios, which strengthens the classification ability of the features and increases the diversity of the ensemble members simultaneously. Second, a new evaluation criterion that considers the distribution of data in each optimized subview is developed, based on which a selective ensemble of optimized subviews (SEOS) performs subview selection. Finally, an oversampling approach is applied to each optimized subview to obtain a class-rebalanced subset for its classifier. Experimental results on 25 high-dimensional class-imbalanced datasets indicate that the proposed method outperforms other mainstream classifier ensemble methods.

12.
Med Phys ; 49(11): 7193-7206, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35746843

ABSTRACT

PURPOSE: To assist physicians in tumor diagnosis and treatment planning, a robust and automatic liver and tumor segmentation method is in high demand in clinical practice. Recently, many researchers have improved the segmentation accuracy of liver and tumor by introducing multiscale contextual information and attention mechanisms, but this tends to add training parameters and computational burden. In addition, tumors vary in size, shape, location, and number, which is the main reason for the poor accuracy of automatic segmentation. Although current loss functions can improve the model's learning of hard samples to a certain extent, they struggle to optimize the segmentation of small tumor regions when large tumor regions dominate a sample. METHODS: We propose a Liver and Tumor Segmentation Network (LiTS-Net) framework. First, the Shift-Channel Attention Module (S-CAM) is designed to model the feature interdependencies of adjacent channels without requiring additional training parameters. Second, the Weighted-Region (WR) loss function is proposed to emphasize the weight of small tumors in dense tumor regions and reduce the weight of easily segmented samples. Moreover, Multiple 3D Inception Encoder Units (MEU) are adopted to capture multiscale contextual information for better segmentation of liver and tumor. RESULTS: The efficacy of LiTS-Net is demonstrated on the public dataset of the MICCAI 2017 Liver Tumor Segmentation (LiTS) challenge, with Dice per case of 96.9% for liver and 75.1% for tumor segmentation. On the 3D Image Reconstruction for Comparison of Algorithm Database (3Dircadb), the Dice scores are 96.47% for liver and 74.54% for tumor segmentation. The proposed LiTS-Net outperforms existing state-of-the-art networks. CONCLUSIONS: We demonstrated the effectiveness of LiTS-Net and its core components for liver and tumor segmentation. The S-CAM models the feature interdependencies of adjacent channels without adding training parameters, and we study the feature shift proportion of adjacent channels in depth to determine the optimal value. In addition, the WR loss function implicitly learns the weights among regions without manual specification; in dense tumor segmentation it enhances the weights of small tumor regions and alleviates the difficulty of optimizing small-tumor segmentation when large tumor regions dominate. Finally, the proposed method outperforms other state-of-the-art methods on both the LiTS and 3Dircadb datasets.
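The parameter-free flavor of operation that S-CAM builds on, shifting a proportion of channels so adjacent channels mix without adding trainable weights, can be sketched as follows; the shift proportion and wiring are illustrative.

```python
# Parameter-free channel shift (illustrative; the actual S-CAM wiring may differ).
import torch

def channel_shift(x: torch.Tensor, proportion: float = 0.25) -> torch.Tensor:
    """x: (batch, channels, *spatial). Shift a slice of channels by one position."""
    c = x.shape[1]
    k = max(1, int(c * proportion))
    out = x.clone()
    out[:, :k] = torch.roll(x[:, :k], shifts=1, dims=1)   # mix information of adjacent channels
    return out

feat = torch.randn(2, 16, 8, 8, 8)        # e.g. a 3D feature map from a CT volume
print(channel_shift(feat).shape)          # torch.Size([2, 16, 8, 8, 8]) -- no new parameters
```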


Subject(s)
Liver , Neoplasms , Humans , Liver/diagnostic imaging
13.
Inf Process Manag ; 59(3): 102935, 2022 May.
Article in English | MEDLINE | ID: mdl-35400028

ABSTRACT

The rapid dissemination of misinformation on social media during the COVID-19 pandemic triggers panic and threatens pandemic preparedness and control. Correction is a crucial countermeasure for debunking misperceptions, yet the mechanisms that make corrections effective on social media are not fully verified. Previous work has focused on psychological theories and experimental studies, and it is unclear how well those conclusions transfer to actual social media. This study explores the determinants of the effectiveness of misinformation corrections on social media by combining a data-driven approach with theories from psychology and communication. Specifically, drawing on the backfire effect, source credibility, and the audience's role in dissemination, we propose five hypotheses involving seven potential factors (regarding correction content and publisher influence), e.g., the proportion of original misinformation quoted and warnings of misinformation. We then collect 1,487 significant COVID-19-related corrections posted on Microblog between January 1, 2020 and April 30, 2020, and annotate each correction with respect to these factors. A comprehensive analysis of the dataset yields several notable conclusions: for example, quoting a large amount of the original misinformation in a correction does not undermine believability within a short period after reading; warnings of misinformation delivered in a demanding tone make corrections less effective; and the determinants of correction effectiveness vary across misinformation topics. Finally, we build a regression model to predict correction effectiveness. These results provide practical suggestions for misinformation correction on social media and a tool to help practitioners revise corrections before publishing, leading to better efficacy.
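The final modeling step, regressing an effectiveness measure on the annotated factors, might look like the sketch below; the feature names, the effectiveness proxy, and the choice of a plain linear model are placeholders, not the study's exact specification.

```python
# Hypothetical regression of correction effectiveness on annotated factors;
# all column names and the synthetic target are placeholders, not the study's data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# hypothetical annotated corrections (one row per correction post)
df = pd.DataFrame({
    "misinfo_proportion":  np.random.rand(200),           # share of original misinformation quoted
    "has_warning":         np.random.randint(0, 2, 200),  # explicit misinformation warning
    "demanding_tone":      np.random.randint(0, 2, 200),
    "publisher_followers": np.random.lognormal(10, 1, 200),
})
# toy effectiveness proxy (e.g., normalized reduction in re-shares of the rumor)
effect = (0.5 - 0.3 * df["demanding_tone"]
          + 0.1 * np.log1p(df["publisher_followers"]) / 15
          + 0.05 * np.random.randn(200))

model = LinearRegression().fit(df, effect)
print(dict(zip(df.columns, np.round(model.coef_, 3))))
```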

14.
Int J Mol Sci ; 23(3)2022 Jan 27.
Article in English | MEDLINE | ID: mdl-35163383

ABSTRACT

Heterotrimeric G protein signaling is an evolutionarily conserved mechanism in diverse organisms that mediates intracellular responses to external stimuli. In rice, G proteins are involved in the regulation of multiple important agronomic traits. In this paper, we report that two type C G protein gamma subunits, DEP1 and GS3, antagonistically regulate grain yield and grain quality. Editing the DEP1 gene significantly increased the grain number per panicle but had a negative impact on taste value, texture properties, and chalkiness-related traits. Editing the GS3 gene decreased grain number per panicle but significantly increased grain length; in addition, the GS3 gene-edited plants showed improved taste value, appearance, texture properties, and Rapid Visco Analyser (RVA) profiles. To combine the advantages of both gs3 and dep1, we performed molecular design breeding at the GS3 locus of a "super rice" variety, SN265, which carries a truncated dep1 allele conferring erect panicle architecture and high yield but mediocre eating quality. The elongated grain size of the sn265/gs3 gene-edited plants further increased grain yield; more importantly, the texture properties and RVA profiles were significantly improved and the taste quality was enhanced. Beyond showcasing the combined function of dep1 and gs3, this paper presents a strategy for simultaneously improving rice grain yield and quality by manipulating two type C G protein gamma subunits.


Subject(s)
Seeds/growth & development , Base Sequence , DNA Shuffling , GTP-Binding Protein gamma Subunits/genetics , Gene Editing , Mutation/genetics , Oryza/genetics , Oryza/ultrastructure , Plant Proteins/genetics , Seeds/ultrastructure
15.
IEEE Trans Cybern ; 52(8): 7513-7526, 2022 Aug.
Article in English | MEDLINE | ID: mdl-34990374

ABSTRACT

Text classification has been widely explored in natural language processing. In this article, we propose a novel adaptive dense ensemble model (AdaDEM) for text classification, which comprises a local ensemble stage (LES) and a global dense ensemble stage (GDES). To strengthen the classification ability and robustness of the enhanced layer, we propose a selective ensemble model based on enhanced attention convolutional neural networks (EnCNNs). To increase the diversity of the ensemble, these EnCNNs are generated in two ways: 1) from different sample subsets and 2) with kernels of different granularity. An evaluation criterion that considers both accuracy and diversity is then proposed in the LES to obtain effective integration results. Furthermore, to make better use of the information flow, we develop an adaptive dense ensemble structure with multiple enhanced layers in the GDES to mitigate the problem of redundant or invalid enhanced layers in the cascade structure. Extensive experiments against state-of-the-art methods on multiple real-world datasets, covering both long and short texts, verify the effectiveness and generality of our method.
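A generic accuracy-plus-diversity selection rule of the kind used in the LES stage is sketched below; the greedy scheme, the trade-off weight, and the use of plain prediction arrays instead of EnCNN members are assumptions.

```python
# Generic accuracy + diversity selective-ensemble sketch (greedy scheme and
# trade-off weight are assumptions; members are plain prediction arrays).
import numpy as np

def select_members(val_preds: np.ndarray, y_val: np.ndarray, k: int = 5, lam: float = 0.5):
    """val_preds: (n_members, n_samples) validation predictions of the pool."""
    acc = (val_preds == y_val).mean(axis=1)
    selected = [int(np.argmax(acc))]                     # seed with the most accurate member
    while len(selected) < k:
        best, best_score = None, -np.inf
        for m in range(len(val_preds)):
            if m in selected:
                continue
            diversity = np.mean([(val_preds[m] != val_preds[s]).mean() for s in selected])
            score = acc[m] + lam * diversity             # accuracy + diversity criterion
            if score > best_score:
                best, best_score = m, score
        selected.append(best)
    return selected

pool = np.random.randint(0, 2, size=(12, 300))           # toy pool of 12 members
y = np.random.randint(0, 2, 300)
print(select_members(pool, y))
```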


Subject(s)
Algorithms , Neural Networks, Computer , Natural Language Processing
16.
IEEE Trans Neural Netw Learn Syst ; 33(2): 654-666, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33079681

ABSTRACT

Recently, multitask learning has been successfully applied to survival analysis problems. A critical challenge in real-world survival analysis is that not all instances and tasks are equally learnable, so a survival analysis model can be improved by considering the complexities of instances and tasks during training. To this end, we propose an asymmetric graph-guided multitask learning approach with self-paced learning for survival analysis. The proposed model improves learning performance by identifying the complex structure among tasks and accounting for the complexities of training instances and tasks during training. In particular, by incorporating the self-paced learning strategy and asymmetric graph-guided regularization, the model learns progressively, from "easy" to "hard" loss terms. Together with the self-paced learning function, the asymmetric graph-guided regularization allows related knowledge to transfer from one task to another asymmetrically, so that knowledge acquired from earlier, easier tasks helps solve complex tasks effectively. Experimental results on both synthetic and real-world TCGA data suggest that the proposed method is indeed useful for improving survival analysis and achieves higher prediction accuracy than previous state-of-the-art methods.
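The self-paced component can be illustrated with the bare-bones sketch below: only instances whose current loss falls under a threshold contribute to an update, and the threshold grows so that harder instances enter later. The asymmetric graph-guided regularizer is not shown, and the schedule is illustrative.

```python
# Bare-bones hard self-paced weighting; the schedule and loss decay are illustrative.
import numpy as np

def self_paced_weights(losses: np.ndarray, lam: float) -> np.ndarray:
    """Hard self-paced regularizer: w_i = 1 if loss_i < lam else 0."""
    return (losses < lam).astype(float)

losses = np.array([0.2, 0.5, 1.3, 2.4, 0.1])
lam = 0.6
for epoch in range(4):
    w = self_paced_weights(losses, lam)
    print(f"epoch {epoch}: lambda={lam:.1f}, active instances={np.flatnonzero(w).tolist()}")
    lam *= 1.5                       # admit progressively "harder" instances
    losses *= 0.8                    # stand-in for the model actually improving
```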

17.
IEEE Trans Neural Netw Learn Syst ; 33(1): 75-88, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33048763

ABSTRACT

Graph-based methods have achieved impressive performance on semisupervised classification (SSC). Traditional graph-based methods have two main drawbacks. First, the graph is predefined before the classifier is trained, which ignores the interaction between classifier training and similarity matrix learning. Second, when handling high-dimensional data with noisy or redundant features, a graph constructed in the original input space is often unsuitable and may lead to poor performance. In this article, we propose an SSC method with novel graph construction (SSC-NGC), in which the similarity matrix is optimized in both the label space and an additional subspace to obtain a better and more robust result than in the original data space. Furthermore, to obtain a high-quality subspace, we learn the projection matrix of the additional subspace by preserving the local and global structure of the data. Finally, we integrate the classifier training, the graph construction, and the subspace learning into a unified framework. Within this framework, the classifier parameters, similarity matrix, and subspace projection matrix are adaptively learned in an iterative scheme to obtain an optimal joint result. We conduct extensive comparative experiments against state-of-the-art methods on multiple real-world datasets, and the results demonstrate the superiority of the proposed method.
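For contrast with the jointly learned graph described above, the classic label-propagation baseline on a fixed Gaussian graph looks like this; sigma, alpha, and the closed-form solve are standard choices, not the paper's method.

```python
# Classic graph-based label propagation on a fixed Gaussian similarity graph
# (the baseline mechanism that SSC-NGC improves on; not SSC-NGC itself).
import numpy as np

def label_propagation(X, y, labeled_mask, alpha=0.9, sigma=1.0):
    """y: integer labels (only trusted where labeled_mask is True)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    Dinv = np.diag(1.0 / np.sqrt(W.sum(1)))
    S = Dinv @ W @ Dinv                                   # normalized similarity
    n_class = y.max() + 1
    Y = np.zeros((len(X), n_class))
    Y[labeled_mask, y[labeled_mask]] = 1.0
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)    # closed-form propagation
    return F.argmax(1)

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 4])
y = np.array([0] * 20 + [1] * 20)
mask = np.zeros(40, dtype=bool)
mask[[0, 25]] = True                                      # only two labeled points
print((label_propagation(X, y, mask) == y).mean())        # accuracy on this toy problem
```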

18.
IEEE Trans Cybern ; 52(5): 3658-3668, 2022 May.
Article in English | MEDLINE | ID: mdl-32924945

ABSTRACT

Ensemble learning has many successful applications because of its effectiveness in boosting the predictive performance of classification models. In this article, we propose a semisupervised multiple choice learning (SemiMCL) approach that jointly trains a network ensemble on partially labeled data. Our model focuses on improving the assignment of labeled data among the constituent networks and on exploiting unlabeled data to capture domain-specific information, so that semisupervised classification is effectively facilitated. Unlike conventional multiple choice learning models, the constituent networks learn multiple tasks during training; specifically, an auxiliary reconstruction task is included to learn domain-specific representations. To perform implicit labeling on reliable unlabeled samples, we adopt a negative l1-norm regularization when minimizing the conditional entropy with respect to the posterior probability distribution. Extensive experiments on multiple real-world datasets verify the effectiveness and superiority of the proposed SemiMCL model.


Subject(s)
Learning , Supervised Machine Learning
19.
IEEE/ACM Trans Comput Biol Bioinform ; 19(2): 1193-1202, 2022.
Article in English | MEDLINE | ID: mdl-32750893

ABSTRACT

Identifying cancer subtypes by integrating multi-omic data improves the understanding of disease progression and provides more precise treatment for patients. Cancer subtype identification is usually accomplished by clustering patients with unsupervised learning approaches, so most existing integrative subtyping methods operate in an entirely unsupervised way. However, an integrative cancer subtyping approach can discover clinically more relevant subtypes when it also considers the clinical survival response variables. In this study, we propose Survival Supervised Graph Clustering (S2GC) for cancer subtyping, which takes survival information into consideration. Specifically, we use a graph to represent patient similarity and develop a multi-omic survival analysis embedding with patient-to-patient similarity graph learning for cancer subtype identification. The multi-view (omic) survival analysis model and the patient graph are learned jointly in a unified way, and the learned optimal graph can be used directly to cluster cancer subtypes. In the proposed model, the survival analysis model and adaptive graph learning positively reinforce each other, so survival time serves as supervised information that improves the quality of the similarity graph and helps explore clinically more relevant subgroups of patients. Experiments on several representative multi-omic cancer datasets demonstrate that the proposed method achieves better results than a number of state-of-the-art methods, and the results also suggest that our method identifies biologically meaningful subgroups for different cancer types. (Our Matlab source code is available online at GitHub: https://github.com/CLiu272/S2GC.)


Subject(s)
Algorithms , Neoplasms , Cluster Analysis , Humans , Neoplasms/genetics , Software , Survival Analysis
20.
IEEE Trans Cybern ; 52(7): 6518-6530, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33284761

ABSTRACT

As an effective method for clustering applications, a clustering ensemble algorithm integrates different clustering solutions into a final one, thus improving the clustering efficiency. The keys to designing a clustering ensemble algorithm are to improve the diversity of the base learners and to optimize the ensemble strategy. To address these problems, we propose a clustering ensemble framework that consists of three parts. First, three view transformation methods, namely random principal component analysis, random nearest neighbor, and a modified fuzzy extension model, are used as base learners to learn different clustering views. A random transformation and hybrid multiview learning-based clustering ensemble method (RTHMC) is then designed to synthesize the multiview clustering results. Second, a new random subspace transformation is integrated into RTHMC to enhance its performance. Finally, a view-based self-evolutionary strategy is developed to further improve the proposed method by optimizing the random subspace sets. Experiments and comparisons demonstrate the effectiveness and superiority of the proposed method for clustering different kinds of data.


Subject(s)
Algorithms , Learning , Cluster Analysis