Lightweight Transformer Models of Edge Intelligence under Constrained Environments | IJET – Volume 11 Issue 6 | IJET-V11I6P37


International Journal of Engineering and Techniques (IJET)

Open Access • Peer Reviewed • High Citation & Impact Factor • ISSN: 2395-1303

Volume 11, Issue 6  |  Published: December 2025

Authors: Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha

DOI: https://doi.org/{{doi}}

Abstract

Edge intelligence has emerged as a key enabler for real-time analytics in Internet of Things (IoT), smart healthcare, autonomous systems, and industrial monitoring. However, deploying Transformer-based models at the edge is challenging due to constrained computational power, memory, and energy availability. This paper presents a study on lightweight Transformer models for edge intelligence under constrained environments, focusing on architectural optimization and model compression techniques. The proposed approach integrates parameter sharing, reduced attention heads, low-rank projection, and quantization-aware training to minimize resource usage while preserving accuracy. Experimental evaluation on benchmark edge datasets demonstrates that the optimized lightweight Transformer achieves up to 48% reduction in model size, 42% lower inference latency, and 35% lower energy consumption compared to standard Transformer models. Despite these reductions, the model maintains competitive performance, achieving 94.1% accuracy, with only a 1.8% accuracy drop relative to full-scale models. Furthermore, real-time inference throughput improved by 1.6× on edge devices such as Raspberry Pi and NVIDIA Jetson Nano. The results confirm that carefully designed lightweight Transformers can effectively balance accuracy, efficiency, and responsiveness, making them suitable for deployment in resource-constrained edge environments.
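The abstract names reduced attention heads, low-rank projection, and parameter sharing as the main architectural levers. The paper itself does not include code; the minimal PyTorch sketch below illustrates one way such an attention block could be built, where the layer names, model width (d_model = 256), head count, and rank are illustrative assumptions rather than the authors' exact configuration.

import math
import torch
import torch.nn as nn

class LowRankSelfAttention(nn.Module):
    # Reduced-head self-attention with a shared low-rank down-projection
    # feeding separate Q/K/V up-projections (illustrative sketch only).
    def __init__(self, d_model: int = 256, n_heads: int = 2, rank: int = 32):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # Parameter sharing plus low rank: about 4 * d_model * rank projection
        # weights here versus 3 * d_model**2 for unfactored Q/K/V projections.
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up_q = nn.Linear(rank, d_model, bias=False)
        self.up_k = nn.Linear(rank, d_model, bias=False)
        self.up_v = nn.Linear(rank, d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        z = self.down(x)  # shared (b, t, rank) representation for Q, K, and V

        def heads(h):     # (b, t, d_model) -> (b, n_heads, t, d_head)
            return h.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = heads(self.up_q(z)), heads(self.up_k(z)), heads(self.up_v(z))
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_head), dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)

# Shape check on a batch of 4 sequences of length 64.
print(LowRankSelfAttention()(torch.randn(4, 64, 256)).shape)  # torch.Size([4, 64, 256])

With rank = 32 and d_model = 256, the Q/K/V projections in this sketch use roughly 33k weights instead of about 197k for an unfactored block, which illustrates the kind of saving that model-size reductions of the reported magnitude rely on.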

Keywords

Edge Intelligence, Lightweight Transformers, Model Compression, Quantization, Low-Latency Inference, Energy Efficiency

Conclusion

This study presented LT-Edge, a lightweight Transformer-based framework designed to enable efficient edge intelligence under constrained computational, memory, and energy environments. By integrating reduced-head low-rank attention, compressed feed-forward networks, and edge-aware optimization through quantization-aware training, the proposed model successfully addresses the limitations of deploying conventional Transformer architectures on edge devices. Experimental results demonstrate that LT-Edge achieves substantial reductions in model size, inference latency, and energy consumption while maintaining high predictive accuracy comparable to full-scale Transformer models. The balanced trade-off between efficiency and performance makes LT-Edge suitable for real-time edge applications such as anomaly detection, classification, and time-series prediction. Furthermore, the modular design of LT-Edge allows adaptability across diverse edge hardware platforms, enhancing its scalability and practical applicability. Overall, the findings confirm that carefully designed lightweight Transformer architectures can deliver robust and responsive intelligence at the edge, paving the way for widespread adoption of Transformer-based models in next-generation resource-constrained edge computing systems.
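The conclusion attributes part of LT-Edge's efficiency to quantization-aware training. As a rough illustration of that step, the snippet below runs PyTorch's eager-mode QAT workflow on a small stand-in classifier; the layer sizes, the fbgemm backend, and the omitted fine-tuning loop are assumptions made for this sketch, not details taken from the paper.

import torch
import torch.nn as nn

# Stand-in edge classifier wrapped with quant/dequant stubs so activations are
# fake-quantized during fine-tuning and truly int8 after conversion.
model = nn.Sequential(
    torch.quantization.QuantStub(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
    torch.quantization.DeQuantStub(),
)

# Attach a QAT configuration and insert fake-quantization observers.
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
model.train()
torch.quantization.prepare_qat(model, inplace=True)

# ... fine-tune here as usual; weights and activations see simulated int8 noise ...

# Convert to a real int8 model for deployment on the edge device.
model.eval()
int8_model = torch.quantization.convert(model)
print(int8_model)

After conversion, the linear layers store int8 weights and run integer arithmetic, which is the general mechanism behind a smaller on-device footprint and lower-latency inference; the specific percentages reported in the paper depend on the full LT-Edge architecture rather than this toy model.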


Cite this article

APA
Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha (December 2025). Lightweight Transformer Models of Edge Intelligence under Constrained Environments. International Journal of Engineering and Techniques (IJET), 11(6). https://doi.org/{{doi}}
IEEE
Shravanchandra G, Vunnam Himasri, Yedulla Nikhitha, "Lightweight Transformer Models of Edge Intelligence under Constrained Environments," International Journal of Engineering and Techniques (IJET), vol. 11, no. 6, December 2025, doi: {{doi}}.