DataAnalyzer: An AI-Driven Framework for Automated Data Cleaning and Intelligent Business Analytics | IJET – Volume 12 Issue 2 | IJET-V12I2P64

International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303

Volume 12, Issue 2  |  Published: April 2026

Authors: Affan Khan, Mohammed Bagdadi, Ayaan Khan, Armaan Khilji, Anas Dange

DOI: https://doi.org/{{doi}}

Abstract

Effective data analytics and machine learning depend heavily on data quality, but real-world datasets frequently contain missing values, outliers, duplicate records, and structural irregularities that reduce analytical reliability. This work presents an AI-based interactive data cleaning and analysis platform that automates the data preparation lifecycle and supports efficient data exploration. The proposed system is developed in Python with the Streamlit framework and incorporates automated data cleaning methods, including KNN-based imputation for handling missing values and Isolation Forest for anomaly detection. In addition to preprocessing, the platform offers an interactive Exploratory Data Analysis (EDA) module that provides statistical summaries and visual data exploration. To improve usability, the system includes an analytics interface based on a Large Language Model (LLM). This integration bridges the gap between automated data cleaning and intuitive data interpretation. Experimental use of the system indicates that the proposed approach reduces manual preprocessing effort and improves dataset readiness for downstream analytics and machine learning tasks.
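The abstract names KNN-based imputation but does not describe how it is implemented. As a minimal sketch of the general idea, assuming numeric columns and a Euclidean distance over the columns two rows share, the following standard-library Python fills each missing value with the mean of that column over the k nearest rows. The function name `knn_impute`, the parameter `k`, and the sample data are hypothetical and not taken from the paper; a real deployment would more likely rely on a library implementation. Isolation Forest is not covered here.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns both rows have observed,
    rescaled by the fraction of columns that could be compared.
    Hypothetical helper for illustration only.
    """
    n_cols = len(rows[0])
    imputed = [list(r) for r in rows]  # work on a copy, leave the input intact
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # A neighbour must be a different row with column j observed.
                if other_idx == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                dist = math.sqrt(sq * n_cols / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # if no usable neighbour exists, the gap stays None
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical sample: the second row is missing its third column.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 12.0],
    [8.0, 9.0, 50.0],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

In this sample the two closest rows to the incomplete one are the first and third, so the gap is filled with the mean of their third-column values, while the distant fourth row is ignored.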

Keywords

Data Cleaning, Artificial Intelligence, Data Preprocessing, Outlier Detection, Missing Value Imputation, Exploratory Data Analysis, Interactive Data Analytics.

Conclusion

By combining data exploration, cleaning, and insight generation, the data analysis and visualization system presented here provides a practical web-based solution. Rather than requiring users to juggle several disparate tools, the integrated platform lets them upload raw files, carry out preprocessing, and immediately obtain graphical summaries [1][2]. The application applies automated cleaning techniques to make the dataset reliable before in-depth analysis begins. These built-in methods address common data problems by eliminating duplicate rows, imputing missing values, and identifying structural outliers. Once the data has been cleaned, users can quickly see how their variables relate to one another and what the overall distributions look like, using interactive visual tools such as box plots, histograms, and correlation heatmaps [3][4]. The platform also uses machine learning techniques to go beyond simple charts: with methods such as clustering, dimensionality reduction, and anomaly detection, the system can automatically uncover hidden structures and unusual observations that might not be apparent during manual inspection [5][6]. The AI-driven insight generation module is one of the main benefits of this setup. Through its Large Language Model, users can query their datasets in everyday language instead of writing complicated code, and the system responds with plain-English summaries of the analytical findings. This feature lowers technical barriers so that people with little training in data science can analyse their data and make well-informed, data-driven decisions [7][8]. Ultimately, combining automated data preparation, interactive graphics, and AI-powered querying in a single, easily accessible workspace significantly accelerates the entire analytical workflow. Future updates could further enhance these capabilities by adding more sophisticated predictive models to the pipeline and extending the backend to handle large or real-time data streams [9][10].
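Two of the steps mentioned in this section, removing duplicate rows and relating variables to one another, can be illustrated with a short sketch. The code below is not the authors' implementation; it is a standard-library Python example, with hypothetical function names and sample data, that drops exact duplicate rows and then computes the Pearson correlation matrix, which is the quantity a correlation heatmap would typically display.

```python
import math

def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(rows):
    """Pairwise Pearson correlations between all columns."""
    columns = list(zip(*rows))
    return [[pearson(a, b) for b in columns] for a in columns]

# Hypothetical sample data with one exact duplicate row.
raw = [
    (1.0, 2.0, 9.0),
    (2.0, 4.0, 7.0),
    (2.0, 4.0, 7.0),  # exact duplicate of the previous row
    (3.0, 6.0, 5.0),
]
clean = drop_duplicate_rows(raw)
corr = correlation_matrix(clean)
for line in corr:
    print([round(v, 3) for v in line])
```

Deduplicating before computing correlations matters because repeated rows give some observations extra weight in the means and variances.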

References

[1] Kabita Sahoo, Abhaya Kumar Samal, Jitendra Pramanik, Subhendu Kumar Pani (2019). Exploratory Data Analysis using Python. International Journal of Innovative Technology and Exploring Engineering (IJITEE).
[2] Vijay Panwar (2024). AI-Powered Data Cleansing: Innovative Approaches for Ensuring Database Integrity and Accuracy. International Journal of Computer Trends and Technology (IJCTT).
[3] Rahul Cherekar (2024). Automated Data Cleaning: AI Methods for Enhancing Data Quality and Consistency. International Journal of Emerging Trends in Computer Science and Information Technology (IJETCSIT).
[4] Jingyu Zhu, Xintong Zhao, Yu Sun, Shaoxu Song, Xiaojie Yuan (2024). Relational Data Cleaning Meets Artificial Intelligence: A Survey. Data Science and Engineering.
[5] Shuo Zhang, Zezhou Huang, Eugene Wu (2024). Data Cleaning Using Large Language Models. arXiv:2410.15547 [cs.DB].
[6] Alberto Sánchez Pérez, Paolo Papotti, Alaa Boukhary, Luis Castejón Lozano, Adam Elwood (2025). An LLM-Based Approach for Insight Generation in Data Analysis. North American Chapter of the Association for Computational Linguistics (NAACL 2025).
[7] Anjali Kapoor (2025). AI-Driven Data Cleaning: Intelligent Detection and Correction of Data Errors. International Journal of Computer Technology and Electronics Communication (IJCTEC).
[8] Zuleaizal Sidek, Sharifah Sakinah Syed Ahmad, Noor Hasimah Ibrahim Teo (2025). Unsupervised outlier detection in high-dimensional text data: a comparative analysis. Bulletin of Electrical Engineering and Informatics.
[9] Sanjeet Singh, Geetika Madaan, HR Swapna, Amrinder Singh, Binay Kumar Pandey, A. Shaji George, Digvijay Pandey (2025). Unleashing the Power of AI and Data Analysis: Transforming Insights into Action. Interdisciplinary Approaches to AI, Internet of Everything, and Machine Learning (IGI Global).
[10] Alhassan Mumuni, Fuseini Mumuni (2025). Automated data processing and feature engineering for deep learning and big data applications: A survey. Journal of Information and Intelligence.

Cite this article

APA
Affan Khan, Mohammed Bagdadi, Ayaan Khan, Armaan Khilji, Anas Dange (April 2026). DataAnalyzer: An AI-Driven Framework for Automated Data Cleaning and Intelligent Business Analytics. International Journal of Engineering and Techniques (IJET), 12(2). https://doi.org/{{doi}}
Affan Khan, Mohammed Bagdadi, Ayaan Khan, Armaan Khilji, Anas Dange, “DataAnalyzer: An AI-Driven Framework for Automated Data Cleaning and Intelligent Business Analytics,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 2, April 2026, doi: {{doi}}.