Results 1 - 20 of 50
1.
IEEE Trans Image Process ; 33: 2730-2745, 2024.
Article in English | MEDLINE | ID: mdl-38578858

ABSTRACT

In Alzheimer's disease (AD) diagnosis, joint feature selection for predicting disease labels (classification) and estimating cognitive scores (regression) with neuroimaging data has received increasing attention. In this paper, we propose a model named Shared Manifold regularized Joint Feature Selection (SMJFS) that performs classification and regression in a unified framework for AD diagnosis. For classification, unlike existing works that build least squares regression models, which extract too little discriminative information for classification, we design an objective function that integrates linear discriminant analysis and subspace sparsity regularization to acquire an informative feature subset. Furthermore, the local data relationships are learned from the samples' transformed distances so that the local data structure is exploited adaptively. For regression, in contrast to previous works that overlook the correlations among cognitive scores, we learn a latent score space to capture these correlations and use it to design a regression model with ℓ2,1-norm regularization, which facilitates feature selection in the regression task. Moreover, missing cognitive scores can be recovered in the latent space, increasing the number of available training samples. Meanwhile, to capture the correlations between the two tasks and describe the local relationships between samples, we construct an adaptive shared graph that simultaneously guides the subspace learning in classification and the latent cognitive score learning in regression. An efficient iterative algorithm is proposed to solve the optimization problem. Extensive experiments on three datasets validate the discriminability of the features selected by SMJFS.
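The ℓ2,1-norm regularizer mentioned in this abstract can be illustrated with a small sketch. It is not the SMJFS objective: it only shows how the norm is computed and why it supports feature selection. The weight matrix `W` (one row per feature, one column per output) and the helper names are hypothetical. Penalizing the sum of row norms pushes whole rows toward zero, so features can be ranked by row norm.

```python
import math

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def rank_features(W):
    """Return feature (row) indices ordered by row norm, largest first."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in W]
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)

# Hypothetical weights: 3 features (rows) x 2 outputs (columns).
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]
print(l21_norm(W))       # 5 + 0 + 1
print(rank_features(W))  # the all-zero row (feature 1) ranks last
```

A row that is entirely zero means the corresponding feature is unused by every output, which is the sense in which this penalty selects features jointly.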


Subject(s)
Alzheimer Disease , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Alzheimer Disease/diagnostic imaging , Algorithms
2.
Int J Clin Exp Pathol ; 17(3): 72-77, 2024.
Article in English | MEDLINE | ID: mdl-38577698

ABSTRACT

Bone cement leakage from the femoral medullary cavity is a rare complication following hip replacement. Currently, there are no reports of bone cement leakage into the heart. Here, we report an 81-year-old female patient with right femoral neck fracture. A thorough preoperative examination showed that bone cement had leaked into the heart during right femoral head replacement, leading to the death of the patient that night. Postoperative cardiac ultrasound showed that bone cement entered the vascular system through the femoral medullary cavity and subsequently entered the heart. Extreme deterioration in the patient's condition resulted in death that night. Unfortunately, the patient's family abandoned the idea of surgical removal of foreign bodies, leading to inevitable death. This case emphasizes the risk of clinical manifestations of cardiac embolism of bone cement after artificial femoral head replacement, suggesting that the risk of such embolism might be underestimated. We propose routine real-time C-arm X-ray guidance and injection of an appropriate amount of bone cement to prevent serious cardiopulmonary failure.

3.
iScience ; 27(2): 108782, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38318372

ABSTRACT

As the influence of transformer-based approaches in general and generative artificial intelligence (AI) in particular continues to expand across various domains, concerns regarding authenticity and explainability are on the rise. Here, we share our perspective on the necessity of implementing effective detection, verification, and explainability mechanisms to counteract the potential harms arising from the proliferation of AI-generated inauthentic content and science. We recognize the transformative potential of generative AI, exemplified by ChatGPT, in the scientific landscape. However, we also emphasize the urgency of addressing associated challenges, particularly in light of the risks posed by disinformation, misinformation, and unreproducible science. This perspective serves as a response to the call for concerted efforts to safeguard the authenticity of information in the age of AI. By prioritizing detection, fact-checking, and explainability policies, we aim to foster a climate of trust, uphold ethical standards, and harness the full potential of AI for the betterment of science and society.

4.
Discov Oncol ; 14(1): 232, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38103068

ABSTRACT

BACKGROUND: Bladder cancer (BLCA) is a prevalent urinary system malignancy. Understanding the interplay of immunological and metabolic genes in BLCA is crucial for prognosis and treatment. METHODS: Immune- and metabolism-related genes were extracted and their expression profiles analyzed. NMF clustering was used to identify prognostic genes. Immunocyte infiltration and the tumor microenvironment were examined. A prognostic risk signature was developed using Cox and LASSO methods. The immunological microenvironment was explored and functional enrichment analysis was performed. Immunotherapy response and somatic mutations were evaluated. RT-qPCR was used to validate gene expression. RESULTS: We investigated these genes in 614 BLCA samples, identifying relevant prognostic genes. We developed a predictive feature and signature comprising 7 genes (POLE2, AHNAK, SHMT2, NR2F1, TFRC, OAS1, CHKB). This immune and metabolism-related gene (IMRG) signature showed superior predictive performance across multiple datasets and was independent of clinical indicators. Immunotherapy response and immune cell infiltration correlated with the risk score. Functional enrichment analysis revealed distinct biological pathways between low- and high-risk groups. The signature demonstrated higher prediction accuracy than other signatures. RT-qPCR confirmed differential gene expression and immunotherapy response. CONCLUSIONS: The model in our work is a novel assessment tool to measure immunotherapy's effectiveness and anticipate BLCA patients' prognosis, offering new avenues for immunological biomarkers and targeted treatments.

5.
Article in English | MEDLINE | ID: mdl-37971920

ABSTRACT

Density peaks clustering (DPC) is a popular clustering algorithm that has been studied and favored by many scholars because it is simple, has few parameters, and requires no iteration. However, previous improvements of DPC did not consider the leakage of private data, and the "Domino" effect caused by the misallocation of noncenters has not been effectively addressed. In view of these shortcomings, a horizontal federated DPC (HFDPC) is proposed. First, HFDPC introduces the idea of horizontal federated learning and proposes a protection mechanism for client parameter transmission. Second, DPC is improved by using a similar density chain (SDC) to alleviate the "Domino" effect caused by multiple local peaks in the flow pattern dataset. Finally, a novel data dimension reduction and image encryption are used to improve the effectiveness of data partitioning. The experimental results show that, compared with DPC and some of its improved variants, HFDPC improves accuracy and speed to a certain degree.
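For readers unfamiliar with baseline DPC, the two quantities it computes per point can be sketched as follows. This is a minimal illustration of plain DPC as it is commonly defined (a cutoff-kernel local density and the distance to the nearest denser point). It does not include the federated protocol, the similar density chain, or the encryption steps of HFDPC. The function name, the sample points, and the cutoff `d_c` are assumptions made for the example.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) for each point, as in baseline DPC.

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point with higher rho
               (for a densest point: its largest distance to any point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Hypothetical data: a tight group of three points and a distant pair.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(points, d_c=1.5)
print(rho)    # local densities
print(delta)  # points with both large rho and large delta are center candidates
```

Because no iteration is involved, both quantities come from a single pass over the pairwise distance matrix, which is the simplicity the abstract refers to.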

6.
Article in English | MEDLINE | ID: mdl-37335781

ABSTRACT

Few-shot knowledge graph completion (FKGC), which aims to infer new triples for a relation using only a few reference triples of the relation, has attracted much attention in recent years. Most existing FKGC methods learn a transferable embedding space, where entity pairs belonging to the same relations are close to each other. In real-world knowledge graphs (KGs), however, some relations may involve multiple semantics, and their entity pairs are not always close due to having different meanings. Hence, the existing FKGC methods may yield suboptimal performance when handling multiple semantic relations in the few-shot scenario. To solve this problem, we propose a new method named adaptive prototype interaction network (APINet) for FKGC. Our model consists of two major components: 1) an interaction attention encoder (InterAE) to capture the underlying relational semantics of entity pairs by modeling the interactive information between head and tail entities and 2) an adaptive prototype net (APNet) to generate relation prototypes adaptive to different query triples by extracting query-relevant reference pairs and reducing the data inconsistency between support and query sets. Experimental results on two public datasets demonstrate that APINet outperforms several state-of-the-art FKGC methods. The ablation study demonstrates the rationality and effectiveness of each component of APINet.

7.
IEEE Trans Neural Netw Learn Syst ; 34(3): 1563-1577, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34473627

ABSTRACT

Recently, causal feature selection (CFS) has attracted considerable attention due to its outstanding interpretability and predictive performance. Such methods primarily include Markov blanket (MB) discovery and feature selection based on Granger causality. Representatively, the max-min MB (MMMB) algorithm can mine an optimal feature subset, i.e., the MB; however, it is unsuitable for streaming features. Online streaming feature selection (OSFS) processes streaming features online and can determine the parents and children (PC), a subset of the MB; however, it cannot mine the MB of the target attribute (T), i.e., a given feature, which results in insufficient prediction accuracy. The Granger selection method (GSM) establishes a causal matrix of all features at excessive time cost; however, it cannot achieve high prediction accuracy and only forecasts fixed multivariate time series data. To address these issues, we propose an online CFS method for streaming features (OCFSSFs) that mines the MB containing PC and spouses and adopts an interleaved PC and spouse learning method. Furthermore, it distinguishes between PC and spouses in real time and can identify children with parents online when identifying spouses. We experimentally evaluated the proposed algorithm on synthetic datasets using precision, recall, and distance. In addition, the algorithm was tested on real-world and time series datasets using classification precision, the number of selected features, and running time. The results validated the effectiveness of the proposed algorithm.

8.
IEEE Trans Cybern ; 53(5): 3288-3300, 2023 May.
Article in English | MEDLINE | ID: mdl-35560099

ABSTRACT

Traditional sequential pattern mining methods were designed for symbolic sequences. As a collection of measurements in chronological order, a time series needs to be discretized into symbolic sequences before users can apply sequential pattern mining methods to discover interesting patterns in it. The discretization not only causes the loss of some important information, which partially destroys the continuity of the time series, but also ignores the order relations between time-series values. Inspired by order-preserving matching, this article explores a new method called order-preserving sequential pattern (OPP) mining, which does not need to discretize time series into symbolic sequences and represents patterns based on the order relations of time series. An inherent advantage of such a representation is that the trend of a time series can be represented by the relative order of its underlying values. We propose an OPP-Miner algorithm to mine frequent patterns in time series with the same relative order. OPP-Miner employs filtration and verification strategies to calculate the support and uses a pattern fusion strategy to generate candidate patterns. To compress the result set, we also study how to find the maximal OPPs. Experimental results validate that OPP-Miner is not only efficient but can also discover similar subsequences in time series. In addition, case studies show that our algorithms have high utility in analyzing the COVID-19 epidemic by identifying critical trends and improving clustering performance. The algorithms and data can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/OPP-Miner.
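The idea of representing a pattern by relative order, rather than by discretized symbols, can be sketched as below. This is a brute-force illustration, not OPP-Miner: it omits the filtration, verification, and pattern fusion strategies described in the abstract. The function names and the sample series are hypothetical, and the sketch assumes the values within each window are distinct.

```python
def relative_order(values):
    """Rank each value within its window: the smallest value gets rank 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return tuple(ranks)

def support(series, pattern):
    """Count contiguous windows whose relative order equals the pattern."""
    m = len(pattern)
    return sum(1 for s in range(len(series) - m + 1)
               if relative_order(series[s:s + m]) == tuple(pattern))

# Hypothetical time series; pattern (1, 3, 2) means "low, high, middle".
series = [10, 15, 12, 20, 26, 21, 30]
print(relative_order(series[0:3]))  # (1, 3, 2)
print(support(series, (1, 3, 2)))   # 2: windows [10, 15, 12] and [20, 26, 21]
```

The two matching windows have different absolute values but the same shape, which is the kind of trend similarity that discretization into symbols can lose.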

9.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2751-2768, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35704541

ABSTRACT

Graph Convolutional Networks (GCNs), as a prominent example of graph neural networks, are receiving extensive attention for their powerful capability in learning node representations on graphs. There are various extensions, in sampling and/or node feature aggregation, that further improve GCNs' performance, scalability and applicability in various domains. Still, there is room for further improvement in learning efficiency, because performing batch gradient descent on the full dataset at every training iteration, which is unavoidable when training (vanilla) GCNs, is not a viable option for large graphs. The good potential of random features in speeding up the training phase in large-scale problems motivates us to consider carefully whether GCNs with random weights are feasible. To investigate this issue theoretically and empirically, we propose a novel model termed Graph Convolutional Networks with Random Weights (GCN-RW) by revising the convolutional layer with random filters and simultaneously adjusting the learning objective with a regularized least squares loss. Theoretical analyses of the model's approximation upper bound, structure complexity, stability and generalization are provided with rigorous mathematical proofs. The effectiveness and efficiency of GCN-RW are verified on semi-supervised node classification tasks with several benchmark datasets. Experimental results demonstrate that, in comparison with some state-of-the-art approaches, GCN-RW can achieve better or matched accuracies with less training time cost.

10.
Mol Cell Biochem ; 478(2): 343-359, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35829871

ABSTRACT

Myocardin-related transcription factor A (MRTF-A) has an inhibitory effect on myocardial infarction; however, the mechanism is not clear. This study reveals the mechanism by which MRTF-A regulates autophagy to alleviate myocardial infarct-mediated inflammation, and the effect of silent information regulator 1 (SIRT1) on the myocardial protective effect of MRTF-A was also verified. MRTF-A significantly decreased cardiac damage induced by myocardial ischemia. In addition, MRTF-A decreased NLRP3 inflammasome activity, and significantly increased the expression of autophagy protein in myocardial ischemia tissue. Lipopolysaccharide (LPS) and 3-methyladenine (3-MA) eliminated the protective effects of MRTF-A. Furthermore, simultaneous overexpression of MRTF-A and SIRT1 effectively reduced the injury caused by myocardial ischemia; this was associated with downregulation of inflammatory factor proteins and upregulation of autophagy-related proteins. Inhibition of SIRT1 activity partially suppressed these MRTF-A-induced cardioprotective effects. SIRT1 has a synergistic effect with MRTF-A to inhibit myocardial ischemia injury through reducing the inflammation response and inducing autophagy.


Subject(s)
Myocardial Infarction , Myocardial Ischemia , Myocardial Reperfusion Injury , Reperfusion Injury , Rats , Animals , Myocardial Reperfusion Injury/metabolism , Rats, Sprague-Dawley , Sirtuin 1/genetics , Sirtuin 1/metabolism , Autophagy , Inflammation , Apoptosis
11.
IEEE Trans Neural Netw Learn Syst ; 34(10): 6872-6886, 2023 Oct.
Article in English | MEDLINE | ID: mdl-36279327

ABSTRACT

Streaming data mining can be applied in many practical applications, such as social media, market analysis, and sensor networks. Most previous efforts assume that all training instances except for the novel class have been completely labeled for novel class detection in streaming data. However, a more realistic situation is that only a few instances in the data stream are labeled. In addition, most existing algorithms are potentially dependent on the strong cohesion between known classes or the greater separation between novel class and known classes in the feature space. Unfortunately, this potential dependence is usually not an inherent characteristic of streaming data. Therefore, to classify data streams and detect novel classes, the proposed algorithm should satisfy: 1) it can handle any degree of separation between novel class and known classes (both easy and difficult novel class detection) and 2) it can use limited labeled instances to build algorithm models. In this article, we tackle these issues by a new framework called semisupervised streaming learning for difficult novel class detection (SSLDN), which consists of three major components: an effective novel class detector based on random trees, a classifier by using the information of nearest neighbors, and an efficient updating process. Empirical studies on several datasets validate that SSLDN can accurately handle different degrees of separation between the novel and known classes in semisupervised streaming data.

12.
J Biochem Mol Toxicol ; 36(10): e23159, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35876212

ABSTRACT

MicroRNAs (miRNAs) feature prominently in regulating the progression of chronic heart failure (CHF). This study was performed to investigate the role of miR-8485 in the injury of cardiomyocytes and CHF. It was found that miR-8485 level was markedly reduced in the plasma of CHF patients, compared with the healthy controls. H2O2 treatment increased tumor necrosis factor-α, interleukin (IL)-6, and IL-1β levels, inhibited the viability of human adult ventricular cardiomyocyte cell line AC16, and increased the apoptosis, while miR-8485 overexpression reversed these effects. Tumor protein p53 inducible nuclear protein 1 (TP53INP1) was identified as a downstream target of miR-8485, and TP53INP1 overexpression weakened the effects of miR-8485 on cell viability, apoptosis, as well as inflammatory responses. Our data suggest that miR-8485 attenuates the injury of cardiomyocytes by targeting TP53INP1, suggesting it is a protective factor against CHF.


Subject(s)
Carrier Proteins , Heat-Shock Proteins , MicroRNAs , Myocytes, Cardiac , Apoptosis , Carrier Proteins/genetics , Carrier Proteins/metabolism , Heat-Shock Proteins/genetics , Heat-Shock Proteins/metabolism , Humans , Interleukins/metabolism , Interleukins/pharmacology , MicroRNAs/genetics , MicroRNAs/metabolism , Myocytes, Cardiac/metabolism , Tumor Necrosis Factor-alpha/metabolism , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
13.
Appl Intell (Dordr) ; 52(9): 9861-9884, 2022.
Article in English | MEDLINE | ID: mdl-35035093

ABSTRACT

Nonoverlapping sequential pattern mining, as a kind of repetitive sequential pattern mining with gap constraints, can find more valuable patterns. Traditional algorithms focus on finding all frequent patterns and therefore find many redundant short patterns, which not only reduces mining efficiency but also makes it harder to obtain the required information. To reduce the number of frequent patterns while retaining their expressive power, this paper focuses on Nonoverlapping Maximal Sequential Pattern (NMSP) mining, which refers to finding frequent patterns whose super-patterns are infrequent. In this paper, we propose an effective mining algorithm, Nettree for NMSP mining (NetNMSP), which has three key steps: calculating the support, generating the candidate patterns, and determining NMSPs. To efficiently calculate the support, NetNMSP employs a backtracking strategy to obtain a nonoverlapping occurrence from the leftmost leaf to its root with the leftmost parent node method in a Nettree. To reduce the candidate patterns, NetNMSP generates candidate patterns by the pattern join strategy. Furthermore, to determine NMSPs, NetNMSP adopts a screening method. Experiments on biological sequence datasets verify that not only does NetNMSP outperform the state-of-the-art algorithms, but NMSP mining also has better compression performance than closed pattern mining. On sales datasets, we validate that our algorithm guarantees the best scalability on large-scale datasets. Moreover, we mine NMSPs and frequent patterns in SARS-CoV-1, SARS-CoV-2 and MERS-CoV. The results show that the three viruses are similar in the short patterns but different in the long patterns. More importantly, NMSP mining makes it easier to find the differences between the virus sequences.
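The notion of a maximal pattern (a frequent pattern with no frequent super-pattern) can be sketched as a simple filter over an already-mined set of frequent patterns. This is only an illustration of the definition, not the screening method of NetNMSP. It assumes, as a simplification, that a super-pattern is any longer pattern containing the shorter one as a contiguous substring. The function name and the sample patterns are hypothetical.

```python
def maximal_patterns(frequent):
    """Keep the frequent patterns that are not contained in a longer frequent one.

    Assumption for this sketch: q is a super-pattern of p when p is a
    contiguous substring of q and p != q.
    """
    freq = set(frequent)
    return sorted(p for p in freq
                  if not any(p != q and p in q for q in freq))

# Hypothetical set of frequent patterns over a DNA-like alphabet.
frequent = {"A", "AT", "ATG", "C", "GC"}
print(maximal_patterns(frequent))  # ['ATG', 'GC']
```

Five frequent patterns compress to two maximal ones here, which illustrates why the abstract describes maximal pattern mining as reducing redundant short patterns.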

14.
IEEE Trans Cybern ; 52(8): 7464-7477, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33400661

ABSTRACT

Domain adaptation aims to facilitate the learning task in an unlabeled target domain by leveraging the auxiliary knowledge in a well-labeled source domain from a different distribution. Almost all existing autoencoder-based domain adaptation approaches focus on learning domain-invariant representations to reduce the distribution discrepancy between source and target domains. However, these approaches still have a weakness: the class-discriminative information of the two domains may be damaged while aligning the distributions of the source and target domains, which brings samples of different classes close to each other and degrades performance. To tackle this issue, we propose a novel dual-representation autoencoder (DRAE) to learn dual-domain-invariant representations for domain adaptation. Specifically, DRAE consists of three learning phases. First, DRAE learns global representations of all source and target data to maximize the interclass distance in each domain and simultaneously minimize the marginal and conditional distribution discrepancies between the two domains. Second, DRAE extracts local representations of instances sharing the same label in both domains to maintain class-discriminative information in each class. Finally, DRAE constructs dual representations by aligning the global and local representations with different weights. Extensive experiments using three text and two image datasets and 12 state-of-the-art domain adaptation methods demonstrate the effectiveness of DRAE.


Subject(s)
Learning , Machine Learning
15.
IEEE Trans Cybern ; 52(11): 11819-11833, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34143749

ABSTRACT

For sequence classification, an important issue is finding discriminative features, and sequential pattern mining (SPM) is often used to find frequent patterns in sequences as features. To improve classification accuracy and pattern interpretability, contrast pattern mining has emerged to discover patterns with high contrast rates between different categories. To date, existing contrast SPM methods face many challenges, including excessive parameter selection and inefficient occurrence counting. To tackle these issues, this article proposes a top-k self-adaptive contrast SPM, which adaptively adjusts the gap constraints to find the top-k self-adaptive contrast patterns (SCPs) from positive and negative sequences. One of the key tasks of the mining problem is to calculate the support (the number of occurrences) of a pattern in each sequence. To support efficient counting, we store all occurrences of a pattern in a special array in a Nettree, an extended tree structure with multiple roots and multiple parents. We employ the array to calculate the occurrences of all its superpatterns with one-way scanning to avoid redundant calculation. Meanwhile, because the contrast SPM problem does not satisfy the Apriori property, we propose Zero and Less strategies to prune candidate patterns and a Contrast-first mining strategy to select patterns with the highest contrast rate as the prefix subpattern and calculate the contrast rate of all its superpatterns. Experiments validate the efficiency of the proposed algorithm and show that contrast patterns significantly outperform frequent patterns for sequence classification. The algorithms and datasets can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/SCP-Miner.


Subject(s)
Data Mining , Pattern Recognition, Automated , Algorithms
16.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 10129-10144, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34914581

ABSTRACT

Adverse drug-drug interaction (ADDI) is a significant life-threatening issue and a leading cause of hospitalizations and deaths in healthcare systems. This paper proposes a unified Multi-Attribute Discriminative Representation Learning (MADRL) model for ADDI prediction. Unlike existing works that treat the features of each attribute equally without discrimination and do not consider the underlying relationships among drugs, we first develop a regularized optimization problem based on CUR matrix decomposition for joint representative drug and discriminative feature selection, such that the selected drugs and features can well approximate the original feature spaces and the critical factors discriminative to ADDIs can be properly explored. Different from existing models that ignore the consistent and unique properties among attributes, a Generative Adversarial Network (GAN) framework is then designed to capture the inter-attribute shared and intra-attribute specific representations of adverse drug pairs, exploiting their consensus and complementary information in ADDI prediction. Meanwhile, MADRL is compatible with any kind of attribute and capable of exploring their respective effects on ADDI prediction. An iterative algorithm based on the alternating direction method of multipliers is developed for optimization. Experiments on a publicly available dataset demonstrate the effectiveness of MADRL when compared with eleven baselines and its six variants.


Subject(s)
Algorithms , Drug Interactions
17.
Front Physiol ; 12: 645041, 2021.
Article in English | MEDLINE | ID: mdl-34220528

ABSTRACT

Myocardial energy metabolism (MEM) is an important factor of myocardial injury. Trimetazidine (TMZ) provides protection against myocardial ischemia/reperfusion injury. The current study set out to evaluate the effect and mechanism of TMZ on MEM disorder induced by myocardial infarction (MI). Firstly, an MI mouse model was established by coronary artery ligation, which was then treated with different concentrations of TMZ (5, 10, and 20 mg kg⁻¹ day⁻¹). The results suggested that TMZ reduced the heart/weight ratio in a concentration-dependent manner. TMZ also reduced the levels of Bax and cleaved caspase-3 and promoted Bcl-2 expression. In addition, TMZ augmented adenosine triphosphate (ATP) production and superoxide dismutase (SOD) activity induced by MI and decreased the levels of lipid peroxide (LPO), free fatty acids (FFA), and nitric oxide (NO) in a concentration-dependent manner (all P < 0.05). Furthermore, an H2O2-induced cell injury model was established and treated with different concentrations of TMZ (1, 5, and 10 µM). The results showed that SIRT1 overexpression promoted ATP production and reactive oxygen species (ROS) activity and reduced the levels of LPO, FFA, and NO in H9C2 cardiomyocytes treated with H2O2 and TMZ. Silencing SIRT1 suppressed ATP production and ROS activity and increased the levels of LPO, FFA, and NO (all P < 0.05). TMZ activated the SIRT1-AMPK pathway by increasing SIRT1 expression and AMPK phosphorylation. In conclusion, TMZ inhibited MI-induced myocardial apoptosis and MEM disorder by activating the SIRT1-AMPK pathway.

18.
IEEE Trans Neural Netw Learn Syst ; 32(3): 1228-1240, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32305941

ABSTRACT

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume that the feature space is fixed or changes by following explicit regularities, limiting their applicability in real-time applications. For example, in a smart healthcare platform, the feature space of the patient data varies when different medical service providers use nonidentical feature sets to describe the patients' symptoms. To fill the gap, in this article we propose a novel learning paradigm, namely, Generative Learning With Streaming Capricious (GLSC) data, which does not make any assumption about the feature space dynamics. In other words, GLSC handles data streams with a varying feature space, where each arriving data instance can arbitrarily carry new features and/or stop carrying some of the old features. Specifically, GLSC trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. The universal feature space is constructed by leveraging the relatedness among features. We propose a generative graphical model to model the construction process, and show that learning from the universal feature space can effectively improve the performance with theoretical guarantees. The experimental results demonstrate that GLSC achieves strong performance on both synthetic and real data sets.

19.
IEEE Trans Neural Netw Learn Syst ; 32(10): 4691-4702, 2021 Oct.
Article in English | MEDLINE | ID: mdl-33021946

ABSTRACT

Traditional feature selection methods assume that all data instances and features are known before learning. However, this is not the case in many real-world applications, where we are more likely to face data streams, feature streams, or both. Feature streams are defined as features that flow in one by one over time, whereas the number of training examples remains fixed. Existing streaming feature selection methods focus on removing irrelevant and redundant features and selecting the most relevant features, but they ignore the interaction between features. A feature might have little correlation with the target concept by itself, but when it is combined with some other features, they can be strongly correlated with the target concept. In other words, interacting features contribute to the target concept as a whole that is greater than the sum of the individual features. Nevertheless, most existing streaming feature selection methods treat features individually, although it is necessary to consider the interaction between features. In this article, we focus on the problem of feature interaction in feature streams and propose a new streaming feature selection method that can select features that interact with each other, named Streaming Feature Selection considering Feature Interaction (SFS-FI). With the formal definition of feature interaction, we design a new metric named interaction gain that can measure the degree of interaction between the newly arriving feature and the selected feature subset. Besides, we analyze and demonstrate the relationship between feature relevance and feature interaction. Extensive experiments conducted on 14 real-world microarray data sets indicate the efficiency of our new method.
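The claim that features can be uninformative alone yet informative together can be checked on a tiny XOR example. The sketch below uses a common information-theoretic quantity, I(F1,F2;T) - I(F1;T) - I(F2;T), as a stand-in. It is an assumption that this resembles the paper's interaction gain, whose exact definition is not given in the abstract. All function names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy (in bits) of a list of hashable observations."""
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in Counter(rows).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def interaction_information(f1, f2, target):
    """Extra information about the target gained by using f1 and f2 jointly."""
    joint = list(zip(f1, f2))
    return (mutual_information(joint, target)
            - mutual_information(f1, target)
            - mutual_information(f2, target))

# XOR target: each feature alone says nothing, together they determine it.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
target = [0, 1, 1, 0]
print(mutual_information(f1, target))           # 0.0
print(mutual_information(f2, target))           # 0.0
print(interaction_information(f1, f2, target))  # 1.0
```

A selector that scores features one at a time would discard both `f1` and `f2` here, which is the failure mode the abstract attributes to methods that ignore feature interaction.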

20.
Knowl Based Syst ; 196: 105812, 2020 May 21.
Article in English | MEDLINE | ID: mdl-32292248

ABSTRACT

Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects pattern repetition within a sequence. To solve this problem, gap constraint SPM was proposed, which can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, requires that no two occurrences use the same sequence letter at the same pattern position. Nonoverlapping SPM can strike a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve the mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP), based on the Nettree structure. NetNCSP is equipped with two key steps, support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies: inheriting, predicting, and determining. These pruning strategies are able to find the redundant patterns effectively, since they can predict the frequency and closeness of the patterns before the candidate patterns are generated. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furthermore, in biological experiments NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses are of similar pattern composition with different combinations.
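The nonoverlapping condition described in this abstract can be expressed as a short check on occurrences. In this sketch an occurrence is a tuple of sequence positions, one per pattern letter. The representation and the function names are assumptions made for illustration. The sketch does not compute support and does not use a Nettree.

```python
def nonoverlapping(occ_a, occ_b):
    """True if the two occurrences never use the same sequence position
    for the same pattern position."""
    return all(a != b for a, b in zip(occ_a, occ_b))

def is_nonoverlapping_set(occurrences):
    """True if every pair of occurrences in the list is nonoverlapping."""
    return all(nonoverlapping(a, b)
               for i, a in enumerate(occurrences)
               for b in occurrences[i + 1:])

# Hypothetical occurrences of a length-3 pattern (1-based positions).
print(nonoverlapping((1, 2, 4), (2, 4, 5)))  # True: positions 2 and 4 are reused,
                                             # but at different pattern positions
print(nonoverlapping((1, 2, 4), (3, 2, 5)))  # False: both use position 2 for
                                             # the second pattern letter
print(is_nonoverlapping_set([(1, 2, 4), (2, 4, 5), (4, 5, 7)]))  # True
```

The first pair shows why this condition is looser than forbidding any shared position: a sequence letter may be reused as long as it plays a different role in each occurrence.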
