Abstract
<jats:p>This paper presents the results of a performance study of two new RISC-V processors with vector instruction sets, the T-Head TH1520 and the SpacemiT K1, in the Lichee Pi 4A and Banana Pi BPI-F3 single-board computers. The tests use the sgemm function from the OpenBLAS library, which underlies many modern computational algorithms: neural networks, graphics computations, digital signal processing, and numerical methods. It is shown that the efficiency of sgemm on RISC-V reaches only 40 % of the theoretical maximum, whereas on x86-64 this figure exceeds 80 %. In addition, in some tests on the Lichee Pi 4A board, sgemm was found to be about two times slower than minigemm, the alternative optimized implementation proposed in this work.</jats:p> <jats:p>Purpose. The purpose of this work is to evaluate the effectiveness of the OpenBLAS library when performing dense matrix multiplication (GEMM) on RISC-V processors that support the vector extension. Methodology. The research was conducted through numerical experiments on two single-board computers with RISC-V processors (Lichee Pi 4A with a T-Head TH1520 CPU and Banana Pi BPI-F3 with a SpacemiT K1 CPU) and a laptop with an x86-64 processor (AMD Ryzen 7 5800H). The performance of the OpenBLAS sgemm function was compared with minigemm, a custom optimized matrix multiplication implementation designed specifically for the RISC-V RVV vector extension and able to work with matrices that have a non-unit stride in both dimensions. Findings. The results revealed a significant performance gap. While the x86-64 implementation achieved 80–90 % of its theoretical peak performance, the efficiency of sgemm from OpenBLAS on the RISC-V boards was only 30–40 %. Moreover, the custom minigemm implementation outperformed OpenBLAS by almost a factor of two for matrices of certain sizes. 
It was assumed that the main bottleneck was the memory subsystem, whose bandwidth on the RISC-V processors was lower than in the system with the x86-64 processor, which seriously limited the achievable performance despite the computing potential of the processor cores. Originality/value. This work is a critical analysis of the performance of a fundamental computing operation on available RISC-V hardware. It emphasizes that the current OpenBLAS parameters are not optimal for these specific RISC-V processors, and demonstrates that the memory subsystem, rather than the vector computing units, is currently the main limiting factor.</jats:p>