HELPHED: Hybrid Ensemble Learning PHishing Email Detection

Bountakas, P.; Xenakis, C.

Journal of Network and Computer Applications; 210, 2023.

Article in English | Scopus | ID: covidwho-2239325

ABSTRACT

Phishing email attacks have been a dominant cyber-criminal strategy for decades. Despite their longevity, they evolved during the COVID-19 pandemic, indicating that adversaries exploit critical situations to lure victims. Plenty of detectors have been proposed over the years, which mainly focus on the contents or the textual information of emails; however, to cope with the evolution of phishing emails, more sophisticated approaches should be introduced that exploit all of the emails' traits to enhance the detection capability of Machine Learning/Deep Learning classifiers. To tackle the limitations of existing works, this paper proposes a phishing email detection methodology, named HELPHED, that detects phishing emails by combining Ensemble Learning methods with hybrid features. The hybrid features provide an accurate representation of emails by fusing their content and textual traits. We propose two methods of HELPHED: the first employs the Stacking Ensemble Learning method, while the second uses the Soft Voting Ensemble Learning method. Both methods deploy two different Machine Learning algorithms to handle the hybrid features separately, yet in parallel, minimizing the features' complexity and improving the model's performance. A thorough evaluation is carried out following innovative guidelines that aim to prevent partial and misleading results. Experimental tests verified that the combination of hybrid features with Ensemble Learning achieves, overall, better detection performance than employing only content-based or text-based features. Numerical results on a rich imbalanced dataset (i.e., 32,051 benign and 3,460 phishing email samples) that considers the evolution of phishing emails show that Soft Voting Ensemble Learning outperforms other prominent Machine Learning/Deep Learning algorithms and existing works, yielding an F1-score of 0.9942. © 2022 Elsevier Ltd
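The soft voting idea summarized in the abstract can be illustrated with a minimal sketch: two base models each score a different feature view of the same email, and their class probabilities are averaged. Everything named here is an assumption for illustration only. The abstract does not say which algorithms or features HELPHED uses, so `toy_content_model`, `toy_text_model`, the single-number feature views, and the unweighted average are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence

# A base model maps its own feature view to class probabilities
# in the order [P(benign), P(phishing)].
ProbaFn = Callable[[Sequence[float]], List[float]]


def soft_vote(probas: List[List[float]]) -> List[float]:
    """Average per-class probabilities across base models (unweighted soft voting)."""
    n_models = len(probas)
    n_classes = len(probas[0])
    return [sum(p[c] for p in probas) / n_models for c in range(n_classes)]


def predict(email: Dict[str, Sequence[float]],
            content_model: ProbaFn,
            text_model: ProbaFn) -> int:
    """Return 1 (phishing) or 0 (benign) for a single email."""
    # Each model sees only its own feature view; the two calls are independent,
    # so they could run in parallel.
    avg = soft_vote([content_model(email["content"]), text_model(email["text"])])
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical stand-in models (assumptions, not taken from the paper).
def toy_content_model(features: Sequence[float]) -> List[float]:
    # features[0]: assumed count of links in the email
    p = min(1.0, 0.2 * features[0])
    return [1.0 - p, p]


def toy_text_model(features: Sequence[float]) -> List[float]:
    # features[0]: assumed "urgency wording" score in [0, 1]
    p = features[0]
    return [1.0 - p, p]


suspicious = {"content": [4.0], "text": [0.7]}
harmless = {"content": [0.0], "text": [0.1]}
suspicious_label = predict(suspicious, toy_content_model, toy_text_model)
harmless_label = predict(harmless, toy_content_model, toy_text_model)
```

In a Stacking variant, the averaging step would be replaced by a further model trained on the base models' outputs; the sketch above covers only the Soft Voting case.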

Keywords

Classification (of information); Computer crime; Cybersecurity; Feature extraction; Learning algorithms; Learning systems; Natural language processing systems; Email Detection; Ensemble learning; Hybrid features; Language processing; Learning methods; Machine-learning; Natural language processing; Natural languages; Phishing; Phishing email detection; Electronic mail; Machine Learning

Full text: Available
Collection: Databases of international organizations
Database: Scopus
Language: English
Journal: Journal of Network and Computer Applications
Year: 2023
Document Type: Article
