VeriSafe: A Multi-Modal Job Fraud Detection Framework with Semantic Embeddings, Transformer Fine – Tuning And OSINT Integration | IJET – Volume 12 Issue 2 | IJET-V12I2P71

International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303

Volume 12, Issue 2  |  Published: April 2026

Authors: Sruti Prusty, Kiran Kumari Nayak, Debasmita Sahu, Lavanya Kumari Raulo, Dr. Debashis Biswal

Abstract

Online recruitment fraud (ORF) has become a major cybersecurity concern, with annual losses exceeding $500 million (≈ ₹4,150 crore). The rapid rise of fraudulent job postings on platforms like LinkedIn and Indeed highlights the need for effective detection systems. This paper presents VeriSafe, a production-ready multi-modal framework for detecting fraudulent job postings. The system is trained on a large multi-source dataset of 40,842 samples and employs advanced preprocessing along with hybrid feature engineering that combines semantic embeddings and TF-IDF. VeriSafe integrates a dual-model ensemble combining XGBoost (AUC: 0.9979) and BERT-Tiny (F1-score: 0.9844), along with a multi-modal scoring mechanism incorporating job content, company data, and email signals. Experimental results show a 9.7% improvement in AUC over existing models, demonstrating high accuracy and efficiency. The system provides actionable outputs such as “DO NOT APPLY,” “REVIEW CAREFULLY,” and “SAFE TO APPLY,” enabling safer decision-making for job seekers.
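As a minimal sketch of how a multi-modal score could be mapped to the three verdicts named above, the following Python example averages the two model probabilities into a content score and blends it with company and email risk. The weights, the thresholds, and the names `Signals`, `fraud_score`, and `verdict` are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass

# Assumed, illustrative values; the paper does not state its weights or cut-offs.
WEIGHTS = {"content": 0.60, "company": 0.25, "email": 0.15}
HIGH_RISK = 0.7  # at or above this score: "DO NOT APPLY"
LOW_RISK = 0.3   # below this score: "SAFE TO APPLY"


@dataclass
class Signals:
    xgb_prob: float      # fraud probability from the XGBoost model, in [0, 1]
    bert_prob: float     # fraud probability from the BERT-Tiny model, in [0, 1]
    company_risk: float  # 0 (trusted) to 1 (suspicious), hypothetical company signal
    email_risk: float    # 0 (trusted) to 1 (suspicious), hypothetical email signal


def fraud_score(s: Signals) -> float:
    """Blend the three modalities into a single risk score in [0, 1]."""
    # Simple average of the two text models stands in for the ensemble.
    content = (s.xgb_prob + s.bert_prob) / 2
    return (
        WEIGHTS["content"] * content
        + WEIGHTS["company"] * s.company_risk
        + WEIGHTS["email"] * s.email_risk
    )


def verdict(score: float) -> str:
    """Map a risk score to one of the three user-facing recommendations."""
    if score >= HIGH_RISK:
        return "DO NOT APPLY"
    if score >= LOW_RISK:
        return "REVIEW CAREFULLY"
    return "SAFE TO APPLY"
```

For example, `verdict(fraud_score(Signals(0.95, 0.90, 0.80, 0.90)))` falls in the high-risk band under these assumed weights. In practice the weights and thresholds would need to be tuned on held-out data.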

Keywords

Online recruitment fraud, multi-modal learning, BERT-Tiny, XGBoost, dataset fusion, semantic embeddings, OSINT, fraud detection, cybersecurity.

Conclusion

This paper presented VeriSafe, a hybrid and production-oriented framework for detecting Online Recruitment Fraud (ORF) using large-scale, multi-source recruitment data combined with machine learning and transformer-based models. The proposed system integrates heterogeneous job-related datasets into a unified corpus of over 2.6 million records, which is subsequently refined into a high-quality dataset of 40,842 samples through preprocessing, semantic deduplication, and feature engineering.

The experimental results demonstrate that job fraud detection is highly effective when leveraging both lexical and semantic features extracted from job titles and descriptions. Classical machine learning models showed strong performance, with Logistic Regression achieving an AUC of 0.9959, Random Forest achieving 0.9950, and XGBoost achieving the highest AUC of 0.9979. Additionally, the lightweight transformer model BERT-Tiny achieved outstanding results, including Accuracy of 0.9935 and F1-score of 0.9844, confirming the effectiveness of deep contextual representations in identifying fraudulent patterns [8].

The findings indicate that fraudulent job postings exhibit identifiable textual, structural, and semantic characteristics that can be consistently learned by classification models when supported by robust preprocessing and feature engineering. The combination of classical models and lightweight transformers provides an optimal balance between accuracy, computational efficiency, and deployability, making the system suitable for real-world applications such as recruitment platforms, browser extensions, and backend verification APIs.

A key contribution of this work is the development of a complete end-to-end framework, extending beyond model training to include data ingestion, preprocessing, feature extraction, model integration, and deployment via a scalable API. This positions VeriSafe as a practical solution for mitigating online recruitment fraud and enhancing trust in digital hiring ecosystems [9].

However, certain limitations remain. Some components, such as company-level OSINT scoring and advanced external validation mechanisms, are currently implemented as extensible modules rather than fully integrated features. Additionally, parts of the labeling process rely on heuristic-based approaches, which may affect generalization in diverse real-world scenarios.
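The semantic deduplication step mentioned in this section can be illustrated with a small sketch. It assumes each posting has already been converted to an embedding vector (for instance by a sentence-embedding model such as the one listed in reference [14]) and greedily drops any posting whose cosine similarity to an already-kept posting reaches a threshold. The function names and the 0.95 threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return the indices of postings to keep.

    Greedy pass: a posting is kept only if its similarity to every
    previously kept posting is below the (assumed) threshold.
    """
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D vectors stand in for real sentence embeddings.
toy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_indices = deduplicate(toy)  # the second vector is a near-duplicate of the first
```

This pairwise comparison is quadratic in the number of postings, so a corpus of millions of records would likely need an approximate nearest-neighbour index or blocking by title before comparison.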

References

[1] Vidros, S., Kolias, C., Kambourakis, G., & Akoglu, L., “Automatic Detection of Online Recruitment Frauds,” IEEE, 2016.
[2] Li, Y., & Wang, J., “BERT-Based Models for Fraud Detection in Online Recruitment Systems,” 2023.
[3] Vidros et al., “Employment Scam Detection using Machine Learning,” 2017.
[4] Sharma and Gupta, “Deep Learning Models for Fraud Detection,” 2023.
[5] Roy, “Bi-LSTM Based Fraud Detection,” 2023.
[6] Arsh Kon, “LinkedIn Job Postings Dataset,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/arshkon/linkedin-job-postings
[7] PromptCloudHQ, “US Technology Jobs on Dice.com,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/PromptCloudHQ/us-technology-jobs-on-dicecom
[8] R. S. Rana, “Job Description Dataset,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
[9] LokKaggle, “Glassdoor Data,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/lokkaggle/glassdoor-data
[10] TheDevastator, “Upwork Jobs Dataset,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/thedevastator/upwork-jobs-a-dataset-for-researchers
[11] Subha Journal, “Phishing Emails Dataset,” Kaggle Dataset. Available: https://www.kaggle.com/datasets/subhajournal/phishingemails
[12] D. Rabin, “Phishing Emails Data,” Hugging Face Dataset. Available: https://huggingface.co/datasets/drorrabin/phishing-emails-data
[13] P. Sharma, “BERT-Tiny (prajjwal1/bert-tiny),” Hugging Face Model. Available: https://huggingface.co/prajjwal1/bert-tiny
[14] Sentence-Transformers, “MiniLM-L6-v2 (all-MiniLM-L6-v2),” Hugging Face Model. Available: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
[15] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[16] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proc. ACM SIGKDD, 2016, pp. 785–794.
[17] T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proc. EMNLP, 2020.
[18] Sentence-Transformers, “Sentence Embeddings using Transformer Models,” 2019.

Cite this article

APA
Sruti Prusty, Kiran Kumari Nayak, Debasmita Sahu, Lavanya Kumari Raulo, Dr. Debashis Biswal (April 2026). VeriSafe: A Multi-Modal Job Fraud Detection Framework with Semantic Embeddings, Transformer Fine – Tuning And OSINT Integration. International Journal of Engineering and Techniques (IJET), 12(2).
IEEE
Sruti Prusty, Kiran Kumari Nayak, Debasmita Sahu, Lavanya Kumari Raulo, Dr. Debashis Biswal, “VeriSafe: A Multi-Modal Job Fraud Detection Framework with Semantic Embeddings, Transformer Fine – Tuning And OSINT Integration,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 2, April 2026.