
Lightweight Transformer Models of Edge Intelligence under Constrained Environments | IJET – Volume 11 Issue 6 | IJET-V11I6P37

International Journal of Engineering and Techniques (IJET)
Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303
Volume 11, Issue 6 | Published: December 2025
Authors: Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha
DOI: https://doi.org/{{doi}}
Abstract
Edge intelligence has emerged as a key enabler for real-time analytics in Internet of Things (IoT), smart healthcare, autonomous systems, and industrial monitoring. However, deploying Transformer-based models at the edge is challenging due to constrained computational power, memory, and energy availability. This paper presents a study on lightweight Transformer models for edge intelligence under constrained environments, focusing on architectural optimization and model compression techniques. The proposed approach integrates parameter sharing, reduced attention heads, low-rank projection, and quantization-aware training to minimize resource usage while preserving accuracy. Experimental evaluation on benchmark edge datasets demonstrates that the optimized lightweight Transformer achieves up to 48% reduction in model size, 42% lower inference latency, and 35% lower energy consumption compared to standard Transformer models. Despite these reductions, the model maintains competitive performance, achieving 94.1% accuracy, with only a 1.8% accuracy drop relative to full-scale models. Furthermore, real-time inference throughput improved by 1.6× on edge devices such as Raspberry Pi and NVIDIA Jetson Nano. The results confirm that carefully designed lightweight Transformers can effectively balance accuracy, efficiency, and responsiveness, making them suitable for deployment in resource-constrained edge environments.
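To make the attention-side optimizations concrete, the following is a minimal sketch in PyTorch of a reduced-head self-attention block with low-rank query/key/value projections of the kind the abstract describes. The class name, head count, and rank value are illustrative assumptions; the paper's actual implementation is not reproduced here.

# Minimal sketch (assumption, not the authors' released code) of a
# reduced-head attention block with low-rank Q/K/V projections.
import torch
import torch.nn as nn

class LowRankAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=2, rank=32):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Low-rank projection: factor each d_model x d_model map into
        # d_model x rank and rank x d_model, cutting parameters when rank << d_model.
        self.q_down, self.q_up = nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_model, bias=False)
        self.k_down, self.k_up = nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_model, bias=False)
        self.v_down, self.v_up = nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        # Project through the low-rank bottleneck, then split into few heads.
        q = self.q_up(self.q_down(x)).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(self.k_down(x)).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(self.v_down(x)).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)

Factoring each d_model x d_model projection into d_model x r and r x d_model matrices reduces its parameter count from d^2 to 2dr, which is where low-rank projection saves memory when r is much smaller than d_model; using two heads instead of eight further trims the attention computation.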
Keywords
Edge Intelligence, Lightweight Transformers, Model Compression, Quantization, Low-Latency Inference, Energy Efficiency
Conclusion
This study presented LT-Edge, a lightweight Transformer-based framework designed to enable efficient edge intelligence under constrained computational, memory, and energy environments. By integrating reduced-head low-rank attention, compressed feed-forward networks, and edge-aware optimization through quantization-aware training, the proposed model successfully addresses the limitations of deploying conventional Transformer architectures on edge devices. Experimental results demonstrate that LT-Edge achieves substantial reductions in model size, inference latency, and energy consumption while maintaining high predictive accuracy comparable to full-scale Transformer models. The balanced trade-off between efficiency and performance makes LT-Edge suitable for real-time edge applications such as anomaly detection, classification, and time-series prediction. Furthermore, the modular design of LT-Edge allows adaptability across diverse edge hardware platforms, enhancing its scalability and practical applicability. Overall, the findings confirm that carefully designed lightweight Transformer architectures can deliver robust and responsive intelligence at the edge, paving the way for widespread adoption of Transformer-based models in next-generation resource-constrained edge computing systems.
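As an illustration of the edge-aware optimization step, the following is a minimal sketch of eager-mode quantization-aware training with PyTorch's torch.ao.quantization API, applied to a stand-in feed-forward block. The module, backend choice, and training loop are assumptions for demonstration, not LT-Edge's published recipe.

# Minimal QAT sketch (assumption, not LT-Edge's actual training code).
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qat_qconfig, prepare_qat, convert)

class TinyFFN(nn.Module):
    # Stand-in for a compressed Transformer feed-forward block.
    def __init__(self, d_model=128, d_hidden=256, n_classes=10):
        super().__init__()
        self.quant = QuantStub()      # marks where tensors enter the int8 domain
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, n_classes))
        self.dequant = DeQuantStub()  # back to float for the loss

    def forward(self, x):
        return self.dequant(self.net(self.quant(x)))

model = TinyFFN()
model.train()
# "qnnpack" targets ARM CPUs such as the Raspberry Pi; use "fbgemm" for x86.
model.qconfig = get_default_qat_qconfig("qnnpack")
qat_model = prepare_qat(model)

opt = torch.optim.Adam(qat_model.parameters(), lr=1e-3)
for _ in range(100):                  # fine-tune with fake-quantized weights/activations
    x = torch.randn(32, 128)          # random data as a placeholder for the real task
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(qat_model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

qat_model.eval()
int8_model = convert(qat_model)       # folds observers into real int8 kernels

Training with fake quantization lets the network adapt to 8-bit rounding before conversion, and the converted model stores weights at roughly a quarter of their float32 size, which is the general mechanism behind the size and energy reductions that quantization-aware training targets.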
Cite this article
APA
Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha (December 2025). Lightweight Transformer Models of Edge Intelligence under Constrained Environments. International Journal of Engineering and Techniques (IJET), 11(6). https://doi.org/{{doi}}
Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha, “Lightweight Transformer Models of Edge Intelligence under Constrained Environments,” International Journal of Engineering and Techniques (IJET), vol. 11, no. 6, December 2025, doi: {{doi}}.
