International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • ISSN: 2395-1303

Volume 12, Issue 3 | Published: June 2026

Title: Empirical Evaluation and Optimization of the MEViT Framework for Generalized Deepfake Detection

Authors: Rushikesh Ganesh Wagh, Sarthak Anil Thorat, Rohan Bhausaheb Pohakar, Prof. S. Y. Mandlik

DOI: https://doi.org/{{doi}}

Abstract

The emergence of advanced synthetic media, particularly deepfakes generated via Generative Adversarial Networks (GANs) and diffusion models, has created a critical demand for forensic detection models that generalize across diverse manipulation methods. Conventional convolutional neural networks (CNNs) achieve high accuracy on closed-set benchmarks but exhibit a significant “generalization gap” when exposed to novel forgery techniques or low-quality social media content. This paper presents the final empirical evaluation and comprehensive performance analysis of the Meta-learning EfficientNet Vision Transformer (MEViT) framework, a hybrid architecture that integrates EfficientNet for local texture feature extraction with Vision Transformers (ViT) for global context modelling. The optimization strategy employs Pair-Discrimination Loss (PDL) and Domain Adjustment Loss (DAL) within an episodic meta-learning schedule to narrow the generalization gap. Extensive experiments on the FaceForensics++ (FF++) and Celeb-DF benchmarks demonstrate that MEViT achieves 98.4% average detection accuracy on FF++ (c23) and maintains an AUC of 89.2% on the unseen Celeb-DF dataset, surpassing the Xception, EfficientNet-B7, and Multi-Domain Transformer baselines by significant margins. Ablation studies confirm that each architectural component contributes to performance, and comparative analyses with multimodal systems validate the competitiveness of the visual-only MEViT approach. Explainability analysis via Grad-CAM further demonstrates that MEViT correctly localizes forensic artifacts in facial regions. These results establish MEViT as a robust, generalizable, and practically deployable solution for next-generation digital forensics.
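
As a reading aid for the cross-dataset AUC reported in the abstract, the following minimal sketch shows one common way an area under the ROC curve can be computed: as the fraction of (fake, real) pairs in which the fake sample receives the higher detector score, with ties counted as one half. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the sample scores are illustrative assumptions; they are not taken from the paper and do not reproduce its evaluation code or results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for fake, 0 for real (an assumed convention).
    scores: detector outputs, where higher means "more likely fake".
    """
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("AUC needs at least one fake and one real sample")
    # Count fake/real pairs that are ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if f > r else 0.5 if f == r else 0.0
        for f in fake
        for r in real
    )
    return wins / (len(fake) * len(real))


# Made-up scores for illustration only; these are not results from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs ranked correctly -> AUC = 0.889
```

Because AUC depends only on how fake and real samples are ranked relative to each other, it does not require choosing a decision threshold, which is one reason it is often preferred over accuracy when a detector is evaluated on a dataset it was not trained on.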

Keywords

Deepfake Detection, MEViT, Meta-Learning, Vision Transformer (ViT), EfficientNet, Generalization, Pair-Discrimination Loss, Domain Adjustment Loss, Digital Forensics, Explainable AI (XAI), FaceForensics++, Celeb-DF.

Cite this article

APA

Wagh, R. G., Thorat, S. A., Pohakar, R. B., & Mandlik, S. Y. (2026, June). Empirical evaluation and optimization of the MEViT framework for generalized deepfake detection. International Journal of Engineering and Techniques (IJET), 12(3). https://doi.org/{{doi}}

IEEE

R. G. Wagh, S. A. Thorat, R. B. Pohakar, and S. Y. Mandlik, “Empirical evaluation and optimization of the MEViT framework for generalized deepfake detection,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 3, Jun. 2026, doi: {{doi}}.