Phishing and related cyber threats are becoming increasingly sophisticated, with email-based phishing remaining the most persistent attack vector. These attacks exploit human vulnerabilities to deliver malware or gain unauthorized access to sensitive information. Transformer-based models enhance phishing detection through robust contextual language understanding; yet they are often regarded as black boxes due to a lack of interpretability. Moreover, recent AI-enabled attacks further undermine model resilience. To address these challenges, this work proposes a lightweight phishing detection framework based on DistilBERT, a lightweight Transformer model. Robustness to embedding-level perturbations and character-level input noise is enhanced through gradient-based adversarial training using the Fast Gradient Method (FGM), combined with stochastic character-level perturbations. To improve transparency, three prominent Explainable AI (XAI) methods, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients), are integrated to interpret model decision-making. A structured rule-based prompt combines model predictions and XAI features to guide Flan-T5-Small in generating plain-language, evidence-based explanations. Experimental results demonstrate that the proposed framework outperforms a standard DistilBERT-based detection model trained without robustness enhancements in terms of accuracy and resilience. This integrated approach helps bridge the gap between model reliability and user trust, advancing transparent phishing detection.
A Robust and Explainable Transformer-Based Framework for Phishing Email Detection
Phishing and related cyber threats are becoming increasingly sophisticated, with email-based phishing remaining the most persistent attack vector. These attacks exploit human vulnerabilities to deliver malware or gain unauthorized access to sensitive information.
- Year
- 2025
- Hosting
- Full text hostedCC-BY-4.0
Cite
Notes
Only stored in your browser.
Attribution
- Abstract & full text
- arxiv.org/abs/2511.12085CC-BY-4.0
- TL;DR
- Semantic Scholar