Matrix multiplication, ARM architecture, Vector operations, Matrix transposition


Background. Matrix multiplication is a computationally expensive operation: it requires a large number of arithmetic operations, and the nonlinear traversal of the matrices in memory adds further cost. It is widely used in fields such as neural networks, the solution of systems of linear equations, and matrix transformations. It is therefore important to develop a multiplication method that takes the layout of the matrices in memory into account and manages data effectively when it is reused.

Objective. The purpose of this paper is to develop a fast method for multiplying two matrices, for multiplying a matrix by a transposed matrix, and for multiplying a matrix by a list of vectors (including the special case of a single vector), and to implement it as a function optimized for ARM architecture processors. The function must handle different data types and submatrices. Integer results can be scaled.
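To make the objective more concrete, the following is a minimal sketch of multiplying a matrix by a transposed matrix with a scaled integer result. It is not the function developed in the paper. The name matmul_bt_i16, the 16-bit input type, the row strides lda, ldb, ldc (used here to address submatrices), and scaling by integer division are all assumptions made for illustration; the abstract does not specify how scaling is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the paper's API):
 * C (m x n) = (A (m x k) * B^T) / divisor, where B is stored as n x k.
 * All matrices are row-major; lda, ldb, ldc are row strides in elements,
 * so a submatrix of a larger matrix can be passed by pointer and stride. */
void matmul_bt_i16(const int16_t *a, size_t lda,
                   const int16_t *b, size_t ldb,
                   int32_t *c, size_t ldc,
                   size_t m, size_t k, size_t n,
                   int32_t divisor)
{
    assert(divisor != 0);

    for (size_t i = 0; i < m; ++i) {
        const int16_t *a_row = a + i * lda;
        for (size_t j = 0; j < n; ++j) {
            /* Row j of B is column j of B^T, so both operands
             * are read contiguously in the inner loop. */
            const int16_t *b_row = b + j * ldb;
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)a_row[p] * (int32_t)b_row[p];
            /* Assumed scaling rule: integer division, truncating toward zero. */
            c[i * ldc + j] = acc / divisor;
        }
    }
}
```

With n = 1 the same routine covers the special case of a single vector. Because B is read row by row, this variant avoids the strided column access that an ordinary row-major product needs.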

Methods. The main ideas of the developed method are to process several rows or columns of the input matrices simultaneously and to split the matrices into blocks, so that the algorithm keeps working on the same region of memory for some time. The method was implemented in the C programming language, with SIMD instructions used to increase performance. An efficient implementation on the ARM architecture also requires properly organized memory preloading.
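The two ideas named above, blocking and processing several rows at once, can be sketched in plain C as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the name matmul_blocked, the float element type, the single block parameter, and the choice of two rows per pass are hypothetical, and the SIMD instructions and memory preloading described in the paper are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the paper's API):
 * C (m x n) = A (m x k) * B (k x n), row-major floats.
 * lda, ldb, ldc are row strides in elements. */
void matmul_blocked(const float *a, size_t lda,
                    const float *b, size_t ldb,
                    float *c, size_t ldc,
                    size_t m, size_t k, size_t n,
                    size_t block)
{
    assert(block > 0);

    /* Clear the output once; each block of k accumulates into it. */
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * ldc + j] = 0.0f;

    for (size_t kk = 0; kk < k; kk += block) {
        size_t k_end = (kk + block < k) ? kk + block : k;
        for (size_t jj = 0; jj < n; jj += block) {
            size_t j_end = (jj + block < n) ? jj + block : n;
            size_t i = 0;
            /* Two rows of A per pass: each loaded element of the
             * current B block is used for two output rows. */
            for (; i + 1 < m; i += 2) {
                for (size_t p = kk; p < k_end; ++p) {
                    float a0 = a[i * lda + p];
                    float a1 = a[(i + 1) * lda + p];
                    for (size_t j = jj; j < j_end; ++j) {
                        float bv = b[p * ldb + j];
                        c[i * ldc + j] += a0 * bv;
                        c[(i + 1) * ldc + j] += a1 * bv;
                    }
                }
            }
            /* Leftover row when m is odd. */
            for (; i < m; ++i) {
                for (size_t p = kk; p < k_end; ++p) {
                    float a0 = a[i * lda + p];
                    for (size_t j = jj; j < j_end; ++j)
                        c[i * ldc + j] += a0 * b[p * ldb + j];
                }
            }
        }
    }
}
```

In this sketch the same block of B (at most block by block elements) is revisited for every pair of rows of A before the loops move on, which is what keeps the working set in the same region of memory for a while. A suitable block size depends on the cache sizes of the target processor and would have to be tuned.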

Results. The study produced a function that performs matrix multiplication by the developed method with the required parameters. Tests on matrices of various sizes and data types showed that the implemented function is faster than its counterparts in the OpenCV2 and Eigen 3 libraries. Testing was carried out with vipmed, a utility for running and measuring functions that was developed for enterprise use at VIT.

Conclusions. The proposed method delivers the expected speedup of matrix multiplication, has passed an evaluation test for use, and meets the target requirements. Further work should study the influence of the different cache levels in more detail and compare the method with other existing libraries.

Author Biographies

Ivan A. Dychka, Igor Sikorsky Kyiv Polytechnic Institute

Іван Андрійович Дичка

Denys A. Vinnyk, Igor Sikorsky Kyiv Polytechnic Institute

Денис Андрійович Вінник

Yuriy V. Bukhtiyarov, Igor Sikorsky Kyiv Polytechnic Institute

Юрій Вікторович Бухтіяров

Vasyl Ya. Yurchyshyn, Igor Sikorsky Kyiv Polytechnic Institute

Василь Якович Юрчишин

