Search | VHL Regional Portal

Evaluating Word Embedding Feature Extraction Techniques for Host-Based Intrusion Detection Systems.

Mvula, Paul K; Branco, Paula; Jourdan, Guy-Vincent; Viktor, Herna L.

Discov Data ; 1(1): 2, 2023.

Article in English | MEDLINE | ID: mdl-37035459

ABSTRACT

Research into Intrusion and Anomaly Detectors at the Host level typically pays much attention to extracting attributes from system call traces. These include window-based, Hidden Markov Model-based, and sequence-model-based attributes. Recently, several works have focused on sequence-model-based feature extractors, specifically Word2Vec and GloVe, to extract embeddings from the system call traces due to their ability to capture semantic relationships among system calls. However, due to the nature of the data, these extractors introduce inconsistencies in the extracted features, causing the Machine Learning models built on them to yield inaccurate and potentially misleading results. In this paper, we first highlight the research challenges posed by these extractors. Then, we conduct experiments with new feature sets, assessing their suitability to address the detected issues. Our experiments show that Word2Vec is prone to introducing more duplicated samples than GloVe. Regarding the solutions proposed, we found that concatenating the embedding vectors generated by Word2Vec and GloVe yields the overall best balanced accuracy. In addition to resolving the challenge of data leakage, this approach enables an improvement in performance relative to other alternatives.
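The duplicate-sample issue and the concatenation remedy described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two lookup tables are toy stand-ins for trained Word2Vec and GloVe embeddings, the per-trace averaging is just one common pooling choice, and the helper names are hypothetical. It is not the authors' pipeline.

```python
from collections import Counter

def trace_vector(trace, table):
    """Average the per-call vectors of one trace (assumed pooling choice)."""
    dim = len(next(iter(table.values())))
    sums = [0.0] * dim
    for call in trace:
        for i, value in enumerate(table[call]):
            sums[i] += value
    # Round so that equal rows compare equal despite float noise.
    return tuple(round(s / len(trace), 6) for s in sums)

def concat_features(trace, table_a, table_b):
    """Concatenate the pooled vectors from two embedding tables."""
    return trace_vector(trace, table_a) + trace_vector(trace, table_b)

def count_duplicates(rows):
    """Number of rows that repeat an earlier, identical feature row."""
    return sum(c - 1 for c in Counter(rows).values() if c > 1)

# Toy tables (hypothetical values): in TABLE_A, "open" and "write" share a vector.
TABLE_A = {"open": (1.0, 0.0), "read": (0.0, 1.0), "write": (1.0, 0.0)}
TABLE_B = {"open": (0.5, 0.5), "read": (0.2, 0.8), "write": (0.9, 0.1)}

traces = [["open", "read"], ["write", "read"], ["read", "read"]]
rows_a = [trace_vector(t, TABLE_A) for t in traces]
rows_ab = [concat_features(t, TABLE_A, TABLE_B) for t in traces]

print(count_duplicates(rows_a), count_duplicates(rows_ab))
```

In this toy case, two distinct traces collapse to the same row under the first table alone and become distinguishable once the second table's vector is appended. Identical rows that land on both sides of a train/test split are one way such duplication can turn into data leakage.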

A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning.

Mvula, Paul K; Branco, Paula; Jourdan, Guy-Vincent; Viktor, Herna L.

Discov Data ; 1(1): 4, 2023.

Article in English | MEDLINE | ID: mdl-37038388

ABSTRACT

In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully designed to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and datasets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.

Deformable Protein Shape Classification Based on Deep Learning, and the Fractional Fokker-Planck and Kähler-Dirac Equations.

Paquet, Eric; Viktor, Herna L; Madi, Kamel; Wu, Junzheng.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 391-407, 2023 01.

Article in English | MEDLINE | ID: mdl-35085073

ABSTRACT

The classification of deformable protein shapes, based solely on their macromolecular surfaces, is a challenging problem in protein-protein interaction prediction and protein design. Shape classification is made difficult by the fact that proteins are dynamic, flexible entities with high geometrical complexity. In this paper, we introduce a novel description for such deformable shapes. This description is based on the bifractional Fokker-Planck and Dirac-Kähler equations. These equations analyse and probe protein shapes in terms of a scalar, vectorial and non-commuting quaternionic field, allowing for a more comprehensive description of the protein shapes. An underlying non-Markovian Lévy random walk establishes geometrical relationships between distant regions while recalling previous analyses. Classification is performed with a multiobjective deep hierarchical pyramidal neural network, thus performing a multilevel analysis of the description. Our approach is applied to the SHREC'19 dataset for deformable protein shapes classification and to the SHREC'16 dataset for deformable partial shapes classification, demonstrating the effectiveness and generality of our approach.

Subject(s)

Algorithms , Deep Learning , Neural Networks, Computer

COVID-19 malicious domain names classification.

Mvula, Paul K; Branco, Paula; Jourdan, Guy-Vincent; Viktor, Herna L.

Expert Syst Appl ; 204: 117553, 2022 Oct 15.

Article in English | MEDLINE | ID: mdl-35611122

ABSTRACT

Due to the rapid technological advances that have been made over the years, more people are changing their way of living from traditional ways of doing business to those featuring greater use of electronic resources. This transition has attracted (and continues to attract) the attention of cybercriminals, referred to in this article as "attackers", who make use of the structure of the Internet to commit cybercrimes, such as phishing, in order to trick users into revealing sensitive data, including personal information, banking and credit card details, IDs, passwords, and more important information via replicas of legitimate websites of trusted organizations. In our digital society, the COVID-19 pandemic represents an unprecedented situation. As a result, many individuals were left vulnerable to cyberattacks while attempting to gather credible information about this alarming situation. Unfortunately, by taking advantage of this situation, specific attacks associated with the pandemic dramatically increased. Regrettably, cyberattacks do not appear to be abating. For this reason, cyber-security corporations and researchers must constantly develop effective and innovative solutions to tackle this growing issue. Although several anti-phishing approaches are already in use, such as the use of blacklists, visuals, heuristics, and other protective solutions, they cannot efficiently prevent imminent phishing attacks. In this paper, we propose machine learning models that use a limited number of features to classify COVID-19-related domain names as either malicious or legitimate. Our primary results show that a small set of carefully extracted lexical features, from domain names, can allow models to yield high scores; additionally, the number of subdomain levels as a feature can have a large influence on the predictions.
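As a minimal sketch of what lexical features extracted from a domain name might look like, including the number of subdomain levels mentioned in this abstract: the feature set, the function name `lexical_features`, and the example domain are all hypothetical, and the subdomain count naively assumes a single-label public suffix such as `.com`. The paper's actual features may differ.

```python
def lexical_features(domain):
    """Compute a few simple lexical features from a domain name string."""
    name = domain.strip().lower().rstrip(".")
    labels = name.split(".")
    return {
        "length": len(name),
        "num_digits": sum(ch.isdigit() for ch in name),
        "num_hyphens": name.count("-"),
        # Naive assumption: the last two labels are the registered domain
        # and its suffix; everything before them counts as a subdomain level.
        "subdomain_levels": max(len(labels) - 2, 0),
    }

features = lexical_features("covid19-relief.example.com")
print(features)
```

A dictionary like this can be turned into a numeric row for any standard classifier. Multi-label suffixes such as `.co.uk` would need a public-suffix list to be counted correctly.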

Isometrically invariant and allosterically aware description of deformable macromolecular surfaces: Application to the viral neuraminidase.

Paquet, Eric; Viktor, Herna L.

Vaccine ; 33(48): 6930-7, 2015 Nov 27.

Article in English | MEDLINE | ID: mdl-26413882

ABSTRACT

MOTIVATION: The macromolecular surfaces associated with proteins and macromolecules play a key role in determining their functionality and interactions, and are also of importance in structural analysis and classification. As a result of their interaction with their environment, the macromolecular surfaces experience random conformational deformations. Consequently, a realistic description of the molecular surface must be invariant under these deformations. Further, the motion associated with disconnected regions on the molecular surface may be correlated. This property is known as the allosteric effect. In this paper, we address these two requirements. To this end, we propose an approach based on discrete differential geometry and the fractional Fokker-Planck equation, which provides an isometrically invariant and allosterically aware description of macromolecular surfaces. Our method is applied to the influenza neuraminidase.

Subject(s)

Neuraminidase/chemistry , Viral Proteins/chemistry , Allosteric Regulation , Chemical Phenomena , Protein Conformation , Surface Properties

Molecular dynamics, Monte Carlo simulations, and Langevin dynamics: a computational review.

Paquet, Eric; Viktor, Herna L.

Biomed Res Int ; 2015: 183918, 2015.

Article in English | MEDLINE | ID: mdl-25785262

ABSTRACT

Macromolecular structures, such as neuraminidases, hemagglutinins, and monoclonal antibodies, are not rigid entities. Rather, they are characterised by their flexibility, which is the result of the interaction and collective motion of their constituent atoms. This conformational diversity has a significant impact on their physicochemical and biological properties. Among these are their structural stability, the transport of ions through the M2 channel, drug resistance, macromolecular docking, binding energy, and rational epitope design. To assess these properties and to calculate the associated thermodynamical observables, the conformational space must be efficiently sampled and the dynamics of the constituent atoms must be simulated. This paper presents algorithms and techniques that address the above-mentioned issues. To this end, a computational review of molecular dynamics, Monte Carlo simulations, Langevin dynamics, and free energy calculation is presented. The exposition is made from first principles to promote a better understanding of the potentialities, limitations, applications, and interrelations of these computational methods.

Subject(s)

Computers, Molecular , Molecular Dynamics Simulation , Molecular Structure , Monte Carlo Method , Humans
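To make the idea of Monte Carlo sampling of a conformational space concrete, here is a minimal Metropolis sketch for a single coordinate in a harmonic potential E(x) = x²/2. The potential, step size, seed, and function names are illustrative assumptions and are not taken from the review. For this potential the exact thermal average of x² equals kT, which gives a simple sanity check.

```python
import math
import random

def metropolis_accept(delta_e, kt, u):
    """Metropolis criterion: accept downhill moves always, uphill moves
    with probability exp(-delta_e / kt). `u` is a uniform draw in [0, 1)."""
    return delta_e <= 0.0 or u < math.exp(-delta_e / kt)

def harmonic_energy(x):
    """Toy potential with unit force constant (assumed for illustration)."""
    return 0.5 * x * x

def sample(n_steps, kt=1.0, step=1.0, seed=0):
    """Run a 1-D Metropolis random walk and return the visited positions."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        delta_e = harmonic_energy(trial) - harmonic_energy(x)
        if metropolis_accept(delta_e, kt, rng.random()):
            x = trial
        # A rejected move still counts: the current state is recorded again.
        samples.append(x)
    return samples

samples = sample(20000)
mean_sq = sum(x * x for x in samples) / len(samples)
print(f"estimated <x^2> = {mean_sq:.3f} (exact value for this potential: kT = 1)")
```

Real macromolecular simulations replace the one-dimensional coordinate with full atomic configurations and the toy potential with a force field, but the acceptance rule has the same form.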

Macromolecular structure comparison and docking: an algorithmic review.

Paquet, Eric; Viktor, Herna L.

Curr Pharm Des ; 19(12): 2183-93, 2013.

Article in English | MEDLINE | ID: mdl-23016846

ABSTRACT

The comparison of macromolecular structures, in terms of functionalities, is a crucial step when aiming to identify potential docking sites. Drug designers require the identification of such docking sites for the binding of two proteins, in order to form a stable complex. This paper presents a review of current approaches to macromolecular structure comparison and docking, following an algorithmic approach. We describe techniques based on the Bayesian framework, kernel-based methods, projection-based techniques and spectral approaches. We introduce the use of quantum particle swarm optimization to help find the most appropriate docking sites. We discuss the importance of the heat and Schrödinger equations to address the non-rigid nature of proteins and highlight the strengths and limitations of the various methods.

Subject(s)

Computational Biology , Models, Molecular , Multiprotein Complexes/chemistry , Algorithms , Animals , Bayes Theorem , Databases, Protein , Humans , Molecular Docking Simulation , Multiprotein Complexes/metabolism , Protein Conformation , Protein Stability , Quantum Theory

Addressing the docking problem: finding similar 3-D protein envelopes for computer-aided drug design.

Paquet, Eric; Viktor, Herna L.

Adv Exp Med Biol ; 680: 447-54, 2010.

Article in English | MEDLINE | ID: mdl-20865529

ABSTRACT

Consider a protein (P(X)) that has been identified, during drug design, to constitute a new breakthrough in the design of a drug for treating a terminal illness. That is, this protein has the ability to dock on active sites and mask the subsequent docking of harmful foreign proteins. Unfortunately, protein X has serious side effects and is therefore not suitable for use in drug design. Suppose another protein (P(Y)) with similar outer structure (or envelope) and functionality, but without these side effects, exists. Locating and using such an alternative protein has obvious benefits. This paper introduces an approach to locate such similar protein envelopes by considering their three-dimensional (3D) shapes. We present a system which indexes and searches a large 3D protein database and illustrate its effectiveness against a very large protein repository.

Subject(s)

Computer-Aided Design/statistics & numerical data , Drug Design , Proteins/chemistry , Proteins/metabolism , Binding Sites , Computational Biology , Computer Simulation , Databases, Protein , Models, Molecular , Protein Binding , Protein Conformation , Protein Folding , Proteomics

Exploring protein architecture using 3D shape-based signatures.

Paquet, Eric; Viktor, Herna L.

Annu Int Conf IEEE Eng Med Biol Soc ; 2007: 1204-8, 2007.

Article in English | MEDLINE | ID: mdl-18002179

ABSTRACT

Consider the scenario where, for a prescription drug designed to treat a terminal illness, a particular protein has been successfully identified as a crucial, beneficial component in the drug compound. However, this protein has contra-indications and causes severe adverse effects in a certain subset of the population. If another protein from the same family, with similar structure and functionality, but without these adverse effects, can be found, the subsequent modification of the harmful drug has obvious benefits. This paper describes a new indexing and similarity search system to retrieve such protein structure family members, based on their 3D shape. Our approach is translation, scale and rotation invariant, which eliminates the need for prior structure alignment. Our experimental evaluation against seven (7) diverse protein families indicates that our system accurately and precisely locates all members of a family. We further illustrate this by showing that our system precisely retrieves the Homo sapiens Hemoglobin family members, against a database containing 26,000 protein structures.

Subject(s)

Models, Chemical , Models, Molecular , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Imaging, Three-Dimensional/methods , Molecular Sequence Data , Protein Conformation , Sequence Alignment/methods
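The property of translation, scale and rotation invariance claimed in this abstract can be illustrated with a deliberately simple descriptor: a histogram of point distances from the centroid, normalised by the mean distance. This is an assumed stand-in chosen for brevity, not the signature used by the authors, and the point set and function name are hypothetical.

```python
import math

def shape_signature(points, bins=4):
    """Histogram of centroid distances divided by their mean.

    Centring removes translation, dividing by the mean distance removes
    scale, and distances are unchanged by rotation."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    dists = [math.dist(p, centroid) for p in points]
    mean = sum(dists) / n
    hist = [0] * bins
    for d in dists:
        ratio = d / mean
        # Bins cover ratios in [0, 2); larger ratios fall in the last bin.
        idx = min(int(ratio / 2.0 * bins), bins - 1)
        hist[idx] += 1
    return [h / n for h in hist]

def transform(p):
    """Rotate 90 degrees about z, scale by 3, then translate (toy transform)."""
    x, y, z = p
    return (-y * 3 + 10.0, x * 3 - 5.0, z * 3 + 2.0)

shape = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (-1.0, -1.0, -1.0)]
moved = [transform(p) for p in shape]

print(shape_signature(shape), shape_signature(moved))
```

Because the signature is the same for the original and the transformed point set, two shapes can be compared directly without first aligning them, which is the practical benefit the abstract points to.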
