SYNTHETIC DATA GENERATOR | IJET – Volume 12 Issue 2 | IJET-V12I2P163

International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303

Volume 12, Issue 2  |  Published: April 2026

Authors: Ms. Neha Kulshrestha, Saurabh Mishra, Shriyansh Tiwari

Abstract

The growing demand for high-quality datasets in machine learning and artificial intelligence is often constrained by data scarcity, high collection costs, and strict privacy regulations. This paper presents a Synthetic Data Generation System (SDGS), a scalable and modular framework designed to generate high-fidelity artificial datasets across multiple modalities, including images, text, and tabular data. The system integrates generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models to replicate the statistical properties of real-world data while preserving privacy.

Conclusion

The Synthetic Data Generation System (SDGS) addresses the data scarcity and privacy constraints that frequently stall artificial intelligence research. By combining several generative frameworks, namely Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models, the proposed platform creates high-fidelity, labeled datasets from minimal initial inputs. This lowers the barrier to these tools: researchers and small-scale developers can augment their datasets efficiently without deep programming expertise or large computational budgets.

The modular architecture makes the system accessible and secure as well as powerful. Because the user interface is decoupled from the compute-intensive model training backend, the platform offers a single workflow in which data is ingested, preprocessed, synthesized, and evaluated for quality. The experimental results indicate that the generated synthetic samples preserve the statistical properties and diversity of real-world data, which makes them effective for training robust machine learning models. The project therefore offers a scalable response to the rising demand for ethical, privacy-preserving data in modern AI applications.
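The ingest, preprocess, synthesize, and evaluate workflow with a swappable model backend can be illustrated with a minimal sketch. This is not the authors' implementation: the `Generator` interface, `GaussianGenerator`, `preprocess`, `evaluate`, and `run_pipeline` are hypothetical names, and the per-column Gaussian sampler is only a lightweight stand-in for a GAN, VAE, or diffusion model on tabular data.

```python
import random
import statistics
from typing import List, Optional, Protocol, Tuple

Row = List[Optional[float]]  # None marks a missing value
Table = List[List[float]]    # cleaned rows of numeric features


class Generator(Protocol):
    """Hypothetical plug-in interface; a GAN, VAE, or diffusion model could implement it."""

    def fit(self, data: Table) -> None: ...
    def sample(self, n: int) -> Table: ...


class GaussianGenerator:
    """Stand-in model (an assumption): independent Gaussians fitted per column."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._params: List[Tuple[float, float]] = []

    def fit(self, data: Table) -> None:
        # Estimate mean and sample standard deviation of each column.
        self._params = [
            (statistics.mean(col), statistics.stdev(col)) for col in zip(*data)
        ]

    def sample(self, n: int) -> Table:
        return [
            [self._rng.gauss(mu, sigma) for mu, sigma in self._params]
            for _ in range(n)
        ]


def preprocess(raw: List[Row]) -> Table:
    """Drop rows that contain missing values."""
    return [list(row) for row in raw if all(v is not None for v in row)]


def evaluate(real: Table, synthetic: Table) -> List[float]:
    """Crude fidelity check: absolute gap between real and synthetic column means."""
    return [
        abs(statistics.mean(r) - statistics.mean(s))
        for r, s in zip(zip(*real), zip(*synthetic))
    ]


def run_pipeline(raw: List[Row], generator: Generator, n_samples: int):
    """Ingest -> preprocess -> synthesize -> evaluate, independent of the model used."""
    clean = preprocess(raw)
    generator.fit(clean)
    synthetic = generator.sample(n_samples)
    return synthetic, evaluate(clean, synthetic)


if __name__ == "__main__":
    raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [None, 5.0], [4.0, 40.0], [5.0, 50.0]]
    synthetic, gaps = run_pipeline(raw, GaussianGenerator(seed=0), n_samples=2000)
    print(len(synthetic), gaps)
```

Because `run_pipeline` depends only on the `fit` and `sample` methods, a different model could be substituted without touching the rest of the workflow. A real evaluation would compare more than column means, for example correlations or downstream model accuracy.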

Cite this article

APA
Ms. Neha Kulshrestha, Saurabh Mishra, Shriyansh Tiwari (April 2026). SYNTHETIC DATA GENERATOR. International Journal of Engineering and Techniques (IJET), 12(2).

IEEE
Ms. Neha Kulshrestha, Saurabh Mishra, Shriyansh Tiwari, “SYNTHETIC DATA GENERATOR,” International Journal of Engineering and Techniques (IJET), vol. 12, no. 2, April 2026.