
Phishing Email Detection: A Survey of NLP and Deep Learning-Based Techniques | IJET – Volume 12 Issue 2 | IJET-V12I2P14

International Journal of Engineering and Techniques (IJET)
Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303
Volume 12, Issue 2 | Published: March 2026
Authors: Prof. Ulka Bansode, Vishakha Chinchpurkar, Harshada Jagtap, Tejas Jagtap, Vaishnavi Jagtap
DOI: https://doi.org/{{doi}}
Abstract
Phishing remains one of the most widespread cyber threats, deceiving users through fraudulent emails, URLs, and attachments. As phishing techniques become more sophisticated, traditional rule-based filters often fail to identify multilingual, contextually deceptive messages. This paper presents PhishGuardAI, a multilingual, context-aware phishing email detection framework that leverages Natural Language Processing (NLP) and Deep Learning. The system combines linguistic and contextual feature analysis with link, PDF, and image verification to detect malicious intent in real time. A hybrid model that pairs DistilBERT with a Random Forest classifier is employed to enhance detection accuracy while maintaining computational efficiency. PhishGuardAI also incorporates a multilingual pipeline supporting English, Hindi, and Marathi, enabling wide adaptability across linguistic regions. Experimental evaluation demonstrates that PhishGuardAI achieves over 97% detection accuracy, outperforming conventional classifiers. The proposed framework contributes a scalable, language-flexible, and intelligent solution for strengthening email security against phishing attacks in real-world environments.
Keywords
Phishing Detection, Natural Language Processing, Deep Learning, Multilingual NLP, Cybersecurity, DistilBERT, Context-Aware Systems.
Conclusion
This paper has presented a conceptual framework for PhishGuardAI, an NLP and deep learning-based phishing email detection system. The proposed system aims to enhance digital communication security through an integrated approach that combines semantic text analysis, PDF/attachment forensics, Gmail integration, and a quantitative threat scoring mechanism. The system design emphasizes user accessibility via a dashboard interface offering both direct file analysis and real-time Gmail scanning.
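To make the idea of a quantitative threat score concrete, the following is a minimal sketch and not the authors' implementation. It assumes three component signals (text, links, attachments), each normalized to the range 0 to 1, and combines them with a weighted average. The weights, the `Signals` type, the function names, and the risk thresholds are all hypothetical, since the paper does not specify how the score is computed.

```python
from dataclasses import dataclass

# Hypothetical weights: the paper does not state how signals are combined.
WEIGHTS = {"text": 0.5, "links": 0.3, "attachments": 0.2}


@dataclass
class Signals:
    text: float         # assumed: classifier probability that the body is phishing
    links: float        # assumed: fraction of links flagged as suspicious
    attachments: float  # assumed: risk estimate from PDF/OCR checks


def threat_score(s: Signals) -> float:
    """Weighted average of the component signals, scaled to 0..100."""
    for name in WEIGHTS:
        value = getattr(s, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = sum(WEIGHTS[name] * getattr(s, name) for name in WEIGHTS)
    return round(100 * total, 1)


def risk_level(score: float) -> str:
    """Map a 0..100 score to a label; the thresholds are illustrative only."""
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"
```

A dashboard could then display `risk_level(threat_score(...))` next to each scanned message. A linear combination is only one option; a learned combiner would be an equally plausible design.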
The proposed hybrid model integrates DistilBERT embeddings with traditional machine learning classifiers such as Random Forest and SVM to achieve a balance between contextual understanding and interpretability. The inclusion of OCR-based attachment analysis and multilingual NLP support (English, Hindi, and Marathi) broadens the system’s applicability in diverse environments. Furthermore, the use of OAuth 2.0 for Gmail authentication ensures user privacy and data security throughout the analysis process.
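One possible first step in a multilingual pipeline is routing a message by its writing script before any language-specific processing. The sketch below is an assumption for illustration, not a step described in the paper. The function names `dominant_script` and `candidate_languages` are hypothetical. Because Hindi and Marathi both use Devanagari, script detection alone cannot tell them apart, so the router returns both as candidates.

```python
import unicodedata


def dominant_script(text: str) -> str:
    """Return 'devanagari', 'latin', or 'unknown' by counting letters."""
    counts = {"devanagari": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            counts["devanagari"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
    if counts["devanagari"] == 0 and counts["latin"] == 0:
        return "unknown"
    return max(counts, key=counts.get)


def candidate_languages(text: str) -> list[str]:
    """Map the dominant script to candidate language codes (illustrative)."""
    script = dominant_script(text)
    if script == "devanagari":
        return ["hi", "mr"]  # Hindi and Marathi share the script
    if script == "latin":
        return ["en"]
    return []
```

Romanized Hindi or Marathi written in Latin letters would be routed to English by this heuristic, so a real system would likely need a language identification model after the script check.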
Although implementation and experimental validation are ongoing, the theoretical design and architecture suggest that the system can achieve high detection accuracy, low latency, and scalable real-time performance. Once realized, the proposed model will be valuable for both individual users and organizations seeking proactive protection against phishing attacks.
Cite this article
APA
Prof. Ulka Bansode, Vishakha Chinchpurkar, Harshada Jagtap, Tejas Jagtap, Vaishnavi Jagtap (March 2026). Phishing Email Detection: A Survey of NLP and Deep Learning-Based Techniques. International Journal of Engineering and Techniques (IJET), 12(2). https://doi.org/{{doi}}
Prof. Ulka Bansode, Vishakha Chinchpurkar, Harshada Jagtap, Tejas Jagtap, Vaishnavi Jagtap, “Phishing Email Detection: A Survey of NLP and Deep Learning-Based Techniques,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 2, March 2026, doi: {{doi}}.
