International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303

Volume 12, Issue 3 | Published: June 2026

Authors: Ulgade Shivani Sangram, Sagar Choudhary, Rimmy

DOI: https://doi.org/{{doi}}

Abstract

Software defects remain among the most expensive risks in modern software engineering. Traditional quality assurance depends on static analyzers, dynamic tests, and manual review, yet these methods struggle with semantic complexity, alert fatigue, and limited scalability. Large Language Models (LLMs) and autonomous code agents have introduced a new paradigm for automated bug detection, fault localization, and program repair. This paper presents a systematic review of peer-reviewed and widely cited literature on AI-based bug detection published primarily between 2014 and 2026. We apply structured inclusion criteria to twenty-two primary sources, including SWE-bench (ICLR 2024), RepairAgent (ICSE 2025), IRIS (ICLR 2025), and empirical studies on LLM-assisted static analysis. Comparative tables report published metrics only: for example, Claude 2 resolves 1.96% of SWE-bench issues under BM25 retrieval, while RepairAgent repairs 164 Defects4J bugs. A five-layer reference framework and data-flow diagrams model how inputs, retrieval, reasoning, validation, and feedback interact in DevSecOps pipelines. We conclude that hybrid neuro-symbolic systems with human oversight currently offer the most reliable path to deployment, while fully autonomous repair remains experimental for safety-critical software.

Keywords

Automated program repair, bug detection, large language models, static analysis, dynamic analysis, SWE-bench, CodeBERT, software quality assurance.

Conclusion

This paper presented a systematic review of automated bug detection using artificial intelligence, with emphasis on LLMs and agentic workflows. Section II mapped four evolutionary phases from rule-based SAST to agentic repair. Section III documented the SLR methodology and validity threats. Section IV proposed a five-layer framework with DFD Level 0 and Level 1 diagrams. Section V compared published benchmarks, including SWE-bench resolve rates and RepairAgent results on Defects4J. Sections VI and VII discussed cost, ethics, deployment constraints, and future research directions. The central conclusion is that hybrid systems, which combine deterministic analyzers, test oracles, retrieval, and human review, currently offer the most reliable production path. Pure autonomous repair without rigorous validation remains experimental for safety-critical software. Practical recommendations include: deploy AI as a copilot alongside existing SAST tools; invest in test infrastructure before enabling agentic repair; use retrieval with relevance filtering; require human approval for security patches on production branches; and monitor token cost with CI budgets for agent loops.
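As a rough illustration of three of these recommendations (test oracles as the validation step, a token budget on the agent loop, and human approval before merging), the following minimal Python sketch may help. It is not taken from the reviewed systems: the names `run_repair_loop`, `RepairResult`, `propose_patch`, and `run_tests` are hypothetical, and the model call and test runner are passed in as plain callables so that no particular LLM API or CI system is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RepairResult:
    patch: Optional[str]   # candidate patch, or None if no patch was accepted
    tokens_used: int       # total tokens reported by the model callable
    attempts: int          # number of proposals requested
    status: str            # "needs_review", "budget_exhausted", or "no_valid_patch"


def run_repair_loop(
    propose_patch: Callable[[str, List[str]], Tuple[str, int]],
    run_tests: Callable[[str], bool],
    bug_report: str,
    token_budget: int,
    max_attempts: int = 5,
) -> RepairResult:
    """Hypothetical budget-limited repair loop with a test oracle.

    propose_patch(bug_report, feedback) returns (patch, tokens_spent).
    run_tests(patch) returns True when the test suite passes with the patch.
    """
    tokens_used = 0
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        patch, cost = propose_patch(bug_report, feedback)
        tokens_used += cost
        # Stop as soon as the cumulative cost exceeds the budget;
        # the over-budget proposal is discarded.
        if tokens_used > token_budget:
            return RepairResult(None, tokens_used, attempt, "budget_exhausted")
        # A passing patch is never merged automatically: it is only
        # flagged for human review.
        if run_tests(patch):
            return RepairResult(patch, tokens_used, attempt, "needs_review")
        feedback.append(f"attempt {attempt}: tests failed for {patch!r}")
    return RepairResult(None, tokens_used, max_attempts, "no_valid_patch")


# Stubs standing in for a real model and a real test runner (assumptions).
def fake_model(report: str, feedback: List[str]) -> Tuple[str, int]:
    return f"patch-v{len(feedback) + 1}", 100


def fake_tests(patch: str) -> bool:
    return patch == "patch-v3"


result = run_repair_loop(fake_model, fake_tests, "crash in parser", token_budget=1000)
print(result.status, result.attempts, result.tokens_used)
```

In this sketch a patch that passes the tests is returned with status `needs_review` rather than applied, which mirrors the recommendation that a human approve patches before they reach production branches.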

References

[1] C. E. Jimenez et al., “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” in Proc. ICLR, 2024.
[2] I. Bouzenia, P. Devanbu, and M. Pradel, “RepairAgent: An Autonomous, LLM-Based Agent for Program Repair,” in Proc. ICSE, 2025.
[3] M. M. Mohajer et al., “Effectiveness of ChatGPT for Static Analysis: How Far Are We?” in Proc. AIware, 2024.
[4] Z. Li, S. Dutta, and M. Naik, “IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities,” in Proc. ICLR, 2025.
[5] R. Just et al., “Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies,” in Proc. ISSTA, 2014.
[6] C. Fan et al., “Large Language Models for Software Engineering: Survey and Open Problems,” arXiv:2312.15223, 2023.
[7] S. McIntosh et al., “An Empirical Study of the Impact of Modern Code Review,” in Proc. ICSE, 2016.
[8] Z. Chen et al., “Sequencer: Sequence-to-Sequence Learning for End-to-End Program Repair,” in Proc. ICPC, 2019.
[9] M. Fakhoury et al., “LLM4Code: A Survey of Research on Large Language Models for Code,” arXiv, 2024.
[10] OWASP Foundation, “AI Security and Privacy Guide,” 2025.
[11] C. Bird et al., “Fair and Balanced? Chaos in Defect Prediction,” in Proc. ICSE, 2011.
[12] J. Yang et al., “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,” arXiv:2407.01435, 2024.
[13] W. Wen et al., “Reducing False Positives in Static Bug Detection with LLMs: An Empirical Study in Industry,” arXiv, 2026.
[14] X. Xia et al., “Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each,” in Proc. ASE, 2023.
[15] H. Thung et al., “An Empirical Study of False Negatives and Positives of Static Code Analyzers,” arXiv:2408.13855, 2024.
[16] OpenAI, “Introducing SWE-bench Verified,” OpenAI Blog, 2024.
[17] D. Pezzè and M. Young, Software Testing and Analysis. Hoboken, NJ, USA: Wiley, 2008.
[18] M. Pradel and K. Sen, “Deep Bugs in the Code: A Survey of Static and Dynamic Analysis,” ACM Comput. Surv., vol. 51, no. 3, 2018.
[19] S. Ren et al., “CodeBERT: A Pre-Trained Model for Programming and Natural Languages,” in Proc. EMNLP Findings, 2020.
[20] M. Chen et al., “Evaluating Large Language Models Trained on Code,” arXiv:2107.03374, 2021.
[21] A. Hindle et al., “On the Use of Machine Learning Techniques Towards Predicting Maintainability,” in Proc. CSMR, 2010.
[22] L. Zhang et al., “A Survey of Learning-Based Automated Program Repair,” ACM Comput. Surv., vol. 56, no. 4, 2023.

Cite this article

APA

Ulgade Shivani Sangram, Sagar Choudhary, Rimmy (June 2026). Automated Bug Detection Using Artificial Intelligence: A Systematic Review of LLM-Enhanced and Agentic Approaches. International Journal of Engineering and Techniques (IJET), 12(3). https://doi.org/{{doi}}

IEEE

Ulgade Shivani Sangram, Sagar Choudhary, Rimmy, “Automated Bug Detection Using Artificial Intelligence: A Systematic Review of LLM-Enhanced and Agentic Approaches,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 3, June 2026, doi: {{doi}}.