
Voice Command Recognition for Home Automation using Bi-Directional LSTM | IJET – Volume 12 Issue 2 | IJET-V12I2P78

International Journal of Engineering and Techniques (IJET)
Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303
Volume 12, Issue 2 | Published: April 2026
Authors: S. Elakkiya, J. Dhivya Dharshini, R. Kavya, R. Karthick Rajan, P. Sivapriyan
DOI: https://doi.org/{{doi}} • PDF: Download
Abstract
The rapid advancement of smart technologies has led to the increasing adoption of voice-based systems for home automation. Voice-controlled interfaces enable users to operate household appliances through spoken commands, providing a convenient and hands-free method of interaction. However, accurate recognition of speech commands remains challenging due to variations in speech patterns and background noise. This paper presents a voice command recognition system using a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The system processes audio signals using Mel-Frequency Cepstral Coefficients (MFCC) and applies data augmentation techniques to improve robustness. The trained model classifies voice commands to control appliances such as TV, fan, and lights through a web-based interface. Experimental results show that the system achieves an accuracy of 94%, ensuring reliable performance and usability.
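The MFCC feature extraction and noise-based data augmentation mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the frame length, hop size, filter counts, and the simple additive-noise augmentation are illustrative assumptions chosen to match common 16 kHz speech settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix for a mono signal."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[t * hop: t * hop + frame_len]
                       for t in range(n_frames)]) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft  # power spectrum
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(power @ fb.T + 1e-10)              # log mel energies
    # DCT-II decorrelates the filterbank energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    return log_energies @ dct.T

# One second of 16 kHz audio; additive noise stands in for augmentation.
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noisy = clean + 0.005 * np.random.randn(sr)
features = mfcc(noisy)   # shape (98, 13): 98 frames x 13 coefficients
```

The resulting frame-by-coefficient matrix is the sequence a recurrent model consumes; production systems typically use a tuned library extractor rather than a hand-rolled one.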
Keywords
Voice-Command Recognition (VCR), Bi-Directional LSTM (Bi-LSTM), Audio Preprocessing, Mel-Frequency Cepstral Coefficient (MFCC), Home Automation.
Conclusion
The project “Voice Command Recognition for Home Automation using Bi-Directional LSTM” delivers a voice-based smart home automation system that lets users control household appliances through simple spoken commands. The user’s voice is captured through a microphone and passed through audio preprocessing and feature extraction stages to obtain meaningful speech features. These features are fed to a Bi-Directional Long Short-Term Memory (Bi-LSTM) model, which analyzes the speech sequence in both the forward and backward directions to better capture the context of the command. The proposed model achieved an accuracy of 94%, showing that voice commands can be recognized with high reliability. Once a command is identified, the system sends the appropriate control signal to appliances such as lights and fans. By enabling hands-free control, the system reduces the need for manual switches or mobile applications, and it can be applied in smart homes, assistive living environments for elderly and disabled individuals, and smart workplaces where voice-based automation improves efficiency. In future work, the system can be extended with support for multiple languages, larger datasets, and integration with IoT platforms for richer automation and remote control. Overall, the project demonstrates that the Bi-Directional LSTM is an effective approach for accurate voice command recognition in home automation systems.
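The forward-and-backward processing described above can be sketched as a minimal NumPy inference pass through a Bi-LSTM classifier. This is a sketch, not the authors' trained model: the hidden size, random weights, and five-command vocabulary are illustrative assumptions (a real deployment would load weights learned during training).

```python
import numpy as np

def lstm_last_hidden(x, Wx, Wh, b):
    """Run one LSTM direction over x (T, D); return the final hidden state (H,).
    Wx: (4H, D), Wh: (4H, H), b: (4H,), gates stacked [input, forget, cell, output]."""
    H = Wh.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for xt in x:
        z = Wx @ xt + Wh @ h + b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state update
    return h

def bilstm_predict(x, params):
    """Classify one utterance: forward pass, backward pass, softmax over commands."""
    h_f = lstm_last_hidden(x, *params["fwd"])
    h_b = lstm_last_hidden(x[::-1], *params["bwd"])   # same cell, reversed sequence
    h = np.concatenate([h_f, h_b])                    # (2H,) bidirectional summary
    logits = params["W_out"] @ h + params["b_out"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Illustrative sizes: 13 MFCCs per frame, hidden size 32, 5 commands.
rng = np.random.default_rng(0)
D, H, C = 13, 32, 5
params = {
    "fwd": (0.1 * rng.standard_normal((4 * H, D)),
            0.1 * rng.standard_normal((4 * H, H)), np.zeros(4 * H)),
    "bwd": (0.1 * rng.standard_normal((4 * H, D)),
            0.1 * rng.standard_normal((4 * H, H)), np.zeros(4 * H)),
    "W_out": 0.1 * rng.standard_normal((C, 2 * H)),
    "b_out": np.zeros(C),
}
commands = ["light_on", "light_off", "fan_on", "fan_off", "tv_on"]
probs = bilstm_predict(rng.standard_normal((98, D)), params)  # 98 MFCC frames
predicted = commands[int(np.argmax(probs))]
```

Concatenating the final hidden states of the two directions is what lets the classifier use context both before and after each frame; the predicted label would then be mapped to the corresponding appliance control signal.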
References
[1] A. Chandran, A. Anu, and V. Raj, “IoT Based Smart Home Automation System,” IEEE, vol. 7, no. 3, pp. 45–50, 2019.
[2] D. Yu, G. Hinton, and L. Deng, “Application of DNN for Voice Recognition System,” IEEE Transactions on Audio, Speech and Language Processing, vol. 26, no. 5, pp. 1020–1030, 2018.
[3] I. Ihor, H. Heorhii, and O. Oleksii, “Application of Deep Neural Network for Real-Time Voice Command Recognition,” IEEE Access, vol. 10, pp. 55678–55687, 2022.
[4] J. Oruh and S. Viriri, “Speech Recognition using Long Short-Term Memory,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 4, pp. 1205–1215, 2022.
[5] K. Amin and R. Khalil, “Real-Time Speech Recognition using Deep Learning Techniques,” IEEE Access, vol. 7, pp. 135245–135255, 2019.
[6] L. Filipe and R. Silva Peres, “Voice Controlled Home Automation using Machine Learning,” IEEE Internet of Things Journal, vol. 8, no. 6, pp. 4521–4530, 2021.
[7] P. Kumar, S. Bhudhani, and T. Malche, “Voice Activated Home Automation using TinyML,” Springer Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 2, pp. 987–995, 2025.
[8] V. Venkateswarlu, V. Kumar, and V. Jayasri, “Speech Recognition using Recurrent Neural Network,” International Journal of Scientific and Engineering Research, vol. 8, no. 10, pp. 1120–1125, 2017.
[9] A. Graves, N. Jaitly, and A. Mohamed, “Speech Recognition with Deep Recurrent Neural Networks,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649, 2013.
[10] S. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980.
[11] H. Sak, A. Senior, and F. Beaufays, “Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 338–342, 2014.
[12] T. N. Sainath and B. Li, “Deep Convolutional Neural Networks for Large-Scale Speech Tasks,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 29–39, 2012.
[13] F. Eyben, M. Wöllmer, and B. Schuller, “OpenSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor,” Proceedings of the ACM Multimedia Conference, pp. 1459–1462, 2010.
[14] A. Graves and J. Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM Networks,” Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 2047–2052, 2005.
[15] J. L. Gauvain and C. H. Lee, “Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291–298, 1994.
Cite this article
APA
{{author}} (April 2026). {{title}}. International Journal of Engineering and Techniques (IJET), 12(2). https://doi.org/{{doi}}
IEEE
{{author}}, “{{title}},” International Journal of Engineering and Techniques (IJET), vol. 12, no. 2, April 2026, doi: {{doi}}.
