Hardware Compression Method for On-Chip and Interprocessor Networks with Wide Channels and Wormhole Flow Control Policy
- Autores: Surchenko A.V1, Nedbailo Y.A1
-
Afiliações:
- JSC "MCST"
- Edição: Volume 23, Nº 3 (2024)
- Páginas: 859-885
- Seção: Digital information telecommunication technologies
- URL: https://journals.rcsi.science/2713-3192/article/view/265780
- DOI: https://doi.org/10.15622/ia.23.3.8
- ID: 265780
Citar
Texto integral
Resumo
Increasing the number of processing cores is currently a common way to boost processor performance. However, the load on the memory subsystem consequently increases as the number of its agents grows. Hardware data compression is an unconventional approach to improving memory subsystem performance by reducing, firstly, the main memory access rate by increasing the cache capacity and, secondly, data traffic by packing the data more densely. The paper describes the implementation of hardware data compression in the on-chip network and interprocessor links of a configuration with wide data transmission channels and a wormhole flow control policy. The existing solutions cannot be applied to such configurations because they are essentially based on using narrow data channels and flow control policies implying uninterrupted packet transmission, which is not maintained with the wormhole flow control. The method proposed in this paper enables the use of hardware compression in the aforementioned configuration by moving data compression and decompression from networks to the connected devices, as well as by using a number of optimizations to hide the data processing delays. Optimizations of some specific cases, such as the transmission of large data packets with several cache lines or the transmission of zero data, are considered. Special attention is given to data transmission via interprocessor links, where, due to their lower bandwidth compared to the on-chip network, data compression can be the most beneficial. The increase in memory subsystem bandwidth from using hardware data compression was confirmed in the experiments showing the relative IPC increase in SPEC CPU2017 benchmarks up to 14 percent.
Sobre autores
A. Surchenko
JSC "MCST"
Email: Alexander.V.Surchenko@mcst.ru
Vavilova St. 24
Yu. Nedbailo
JSC "MCST"
Email: yuri.nedbailo@mail.ru
Vavilova St. 24
Bibliografia
- Serpa M.S., Moreira F.B., Navaux P.O., Cruz E.H., Diener M., Griebler D., Fernandes L.G. Memory performance and bottlenecks in multicore and GPU architectures. 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 2019. pp. 233–236.
- Mohamed A.M., Mubark N., Zagloul S. Performance aware shared memory hierarchy model for multicore processors. Scientific Reports. 2023. vol. 13(1). no. 7313.
- Iyer R., De V., Illikkal, R., Koufaty, D., Chitlur, B., Herdrich, A., Khellah M., Hamzaoglu F., Karl E. Advances in microprocessor cache architectures over the last 25 years. IEEE Micro. 2021. Т. 41. № 6. С. 78–88.
- Papazian I.E. New 3rd Gen Intel® Xeon® Scalable Processor (Codename: Ice Lake-SP) // Hot Chips Symposium. 2020. С. 1–22.
- Zhan J., Poremba M., Xu Y., Xie Y. NoΔ: Leveraging delta compression for end-to-end memory access in NoC based multicores. 19th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2014. pp. 586–591.
- Deb D., Rohith M.K., Jose J. Flitzip: Effective packet compression for noc in multiprocessor system-on-chip // IEEE Transactions on Parallel and Distributed Systems. 2021. Т. 33. № 1. pp. 117–128.
- Wang Y., Han Y., Zhou J., Li H., Li X. DISCO: A low overhead in-network data compressor for energy-efficient chip multi-processors // Proceedings of the 53rd Annual Design Automation Conference. 2016. С. 1–6.
- Wang Y., Li H., Han Y., Li X. A low overhead in-network data compressor for the memory hierarchy of chip multiprocessors // IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2017. vol. 37. no. 6. pp. 1265–1277.
- Li X., Sondhi T. FlitReduce: Improving Memory Fabric Performance via End-to-End Network Packet Compression. UC Berkeley CS262A Report. 2021. 9 p.
- Pullaiah T., Manjunathachari K., Malleswari B.L. BΔ-NIS: Performance analysis of an efficient data compression technique for on-chip communication network. Integration. 2023. vol. 89. pp. 83–93.
- Pekhimenko G., Seshadri V., Mutlu O., Gibbons P.B., Kozuch M.A., Mowry T.C. Base-delta-immediate compression: Practical data compression for on-chip caches // Proceedings of the 21st international conference on Parallel architectures and compilation techniques. 2012. С. 377–388.
- Gaur J., Alameldeen A.R., Subramoney S. Base-victim compression: An opportunistic cache compression architecture // ACM SIGARCH Computer Architecture News. 2016. vol. 44. no. 3. pp. 317–328.
- Carvalho D.R., Seznec A. Understanding cache compression // ACM Transactions on Architecture and Code Optimization (TACO). 2021. vol. 18. no. 3. pp. 1–27.
- Pekhimenko G., Seshadri V., Kim Y., Xin H., Mutlu O., Gibbons P.B., Kozuch M.A., Mowry T.C. Linearly compressed pages: A low-complexity, low-latency main memory compression framework // Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. С. 172–184.
- Young V., Kariyappa S., Qureshi M.K. CRAM: Efficient Hardware-Based Memory Compression for Bandwidth Enhancement // arXiv preprint arXiv:1807.07685. 2018.
- Choukse E., Erez M., Alameldeen A.R. Compresso: Pragmatic main memory compression // 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018. С. 546–558.
- Сурченко А.В. Исследование применимости аппаратной компрессии данных в межпроцессорных каналах связи процессоров с архитектурой Эльбрус // Труды Института системного программирования РАН. 2022. Т. 34. № 1. С. 49–58.
- Thuresson M., Spracklen L., Stenstrom P. Memory-link compression schemes: A value locality perspective // IEEE Transactions on Computers. 2008. vol. 57. no. 7. pp. 916–927.
- Kozhin A.S., Surchenko A.V. Design of Data Compression Mechanism in Cache Memory of Elbrus Processors // International Conference Engineering and Telecommunication (En&T). IEEE, 2020. С. 1–5.
- Nedbailo Y.A., Surchenko A.V., Bychkov I.N. Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor // Computer Research and Modeling. 2023. vol. 15. no. 3. pp. 639–656.
- Nedbailo Y. Fast and scalable simulation framework for large in-order chip multiprocessors // 26th Conference of Open Innovations Association (FRUCT). IEEE, 2020. pp. 335–345.
Arquivos suplementares

