1.
Comput Intell Neurosci ; 2021: 8811147, 2021.
Article in English | MEDLINE | ID: mdl-33763125

ABSTRACT

Noise in training data increases the tendency of many machine learning methods to overfit the training data, which undermines their performance. Outliers occur in big data as a result of various factors, including human errors. In this work, we present a novel discriminator model for identifying outliers in training data. We propose a systematic approach for creating training datasets to train the discriminator based on a small number of genuine instances (trusted data). The noise discriminator is a convolutional neural network (CNN). We evaluate the discriminator's performance on several benchmark datasets and with different noise ratios. We inserted random noise into each dataset and trained discriminators to clean them. Different discriminators were trained using different numbers of genuine instances, with and without data augmentation. We compare the performance of the proposed noise-discriminator method with seven other methods from the literature on several benchmark datasets. Our empirical results indicate that the proposed method is highly competitive with the other methods and outperforms them for pair noise.
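The evaluation protocol described above, injecting label noise at a chosen ratio before training a cleaning model, can be sketched as follows. This is an illustrative sketch only, not the paper's implementation; the function and parameter names (`inject_label_noise`, `mode`, `ratio`) are our own. The "pair" mode mimics the pair-noise pattern mentioned in the abstract, where each class is flipped to one fixed partner class.

```python
import random

def inject_label_noise(labels, ratio, classes, mode="uniform", rng=None):
    """Corrupt a fraction `ratio` of labels to simulate noisy training data.

    mode="uniform": flip a label to any *other* class, chosen uniformly.
    mode="pair":    flip class c to a fixed partner class (c + 1) mod k,
                    approximating the "pair noise" setting in the abstract.

    Illustrative sketch; not the paper's code.
    """
    rng = rng or random.Random(0)
    k = len(classes)
    noisy = list(labels)
    # Pick which instances to corrupt, without replacement.
    flip_idx = rng.sample(range(len(labels)), int(ratio * len(labels)))
    for i in flip_idx:
        c = classes.index(noisy[i])
        if mode == "pair":
            noisy[i] = classes[(c + 1) % k]
        else:
            noisy[i] = classes[rng.choice([j for j in range(k) if j != c])]
    return noisy
```

A discriminator would then be trained to separate the flipped instances from the trusted ones before the downstream classifier sees the data.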


Subject(s)
Machine Learning; Neural Networks, Computer; Humans
2.
Comput Intell Neurosci ; 2020: 4717984, 2020.
Article in English | MEDLINE | ID: mdl-33299391

ABSTRACT

Text classification has many applications in text processing and information retrieval. Instance-based learning (IBL) is among the top-performing text classification methods. However, its effectiveness depends on the distance function it uses to determine similar documents. In this study, we evaluate the performance of some popular distance measures and propose new ones that exploit word frequencies and the ordinal relationship between them. In particular, we propose new distance measures based on the value distance metric (VDM) and the inverted specific-class distance measure (ISCDM). The proposed measures are suitable for documents represented as vectors of word frequencies. We compare these measures' performance with their original counterparts and with powerful Naïve Bayesian-based text classification algorithms. We evaluate the proposed distance measures using the kNN algorithm on 18 benchmark text classification datasets. Our empirical results reveal that the distance metrics for nominal values yield better text classification results than the Euclidean distance measure for numeric values. Furthermore, our results indicate that ISCDM substantially outperforms VDM and is also better able to exploit the ordinal nature of term frequencies than VDM. Thus, we were able to propose more ISCDM-based distance measures for text classification than VDM-based measures. We also compare the proposed distance measures with Naïve Bayesian-based text classifiers, namely, multinomial Naïve Bayes (MNB), complement Naïve Bayes (CNB), and the one-versus-all-but-one (OVA) model. It turns out that when kNN uses some of the proposed measures, it outperforms the NB-based text classifiers on most datasets.
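The base VDM that the proposed measures build on compares two nominal values by their conditional class distributions: values that induce similar class probabilities are close. A minimal sketch of that idea follows; the function names and the two-step API (fit tables, then compute distance) are our own, and this is the textbook VDM, not the paper's extended ordinal variants.

```python
from collections import defaultdict

def vdm_tables(X, y):
    """Estimate P(class | attribute value) for each attribute.

    X: instances as lists of nominal attribute values; y: class labels.
    Returns one {value: [P(c | value) for each class]} table per attribute.
    """
    n_attrs = len(X[0])
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n_attrs)]
    for xi, c in zip(X, y):
        for a, v in enumerate(xi):
            counts[a][v][c] += 1
    classes = sorted(set(y))
    tables = []
    for a in range(n_attrs):
        table = {}
        for v, class_counts in counts[a].items():
            total = sum(class_counts.values())
            table[v] = [class_counts[c] / total for c in classes]
        tables.append(table)
    return tables

def vdm_distance(x1, x2, tables, q=2):
    """VDM: sum over attributes of sum_c |P(c|v1) - P(c|v2)|^q."""
    d = 0.0
    for a, (v1, v2) in enumerate(zip(x1, x2)):
        p1, p2 = tables[a][v1], tables[a][v2]
        d += sum(abs(u - w) ** q for u, w in zip(p1, p2))
    return d
```

In a kNN classifier, `vdm_distance` would simply replace the Euclidean metric when ranking training documents by similarity to a query document.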


Subject(s)
Algorithms; Information Storage and Retrieval; Bayes Theorem