A Specialised Computing Device for Multiplying Square Binary Matrices Based on Multiport Parallel-Pipelined Memory

Main Article Content

Mohammed H. Najajra
https://orcid.org/0009-0004-0712-306X
Aleksei V. Bolgak
https://orcid.org/0009-0004-6081-0395
Eduard I. Vatutin
Ismail F. Amer
https://orcid.org/0000-0002-1561-6559

Abstract

This article considers matrix multiplication in the problem of finding the transitive closure of a binary relation with the transitivity property, as well as in the construction of the reachability and counter-reachability matrices in general graphs. An analysis of approaches to practical implementation for finding the transitive closure of a binary relation is presented: the Floyd-Warshall algorithm and raising the adjacency matrix to a power until it stabilises. The problem of processing large (thousands to millions of elements) graph diagrams of parallel algorithms on a processor (CPU), and the primary methods for optimising matrix calculations at both the software (algorithmic) and hardware levels, are considered. The main types of digital devices based on the parallel-pipeline data-processing principle are identified, and their advantages and disadvantages are outlined. A specialised computing device for fast multiplication of square binary matrices of size n × n is considered, whose distinctive feature is pipelining the data read operation from a specialised multiport memory. A mathematical model and a method for organising the parallel-pipeline memory of a specialised square binary matrix multiplication device are presented. An estimate of the matrix-processing time and hardware complexity for the developed and prototype devices is presented. Computational experiments showed that, despite a slightly higher hardware complexity (up to 8.8×) than the prototype device, the proposed device multiplies square binary matrices of size n ≤ 512 up to 52.4× faster. This represents a significant advantage when implemented in a semi-custom design using field-programmable gate arrays or a custom design based on application-specific integrated circuits. In this paper, we present a novel systolic device whose core innovation is a pipelined multiport memory architecture. By ensuring a continuous, high-bandwidth data flow to the processing elements, our contribution enables the systolic array to operate at its theoretical peak performance.

Article Details

Section

Special Issue

References

Zykov AG, Polyakov VI. Algorithms for Computer-Aided Design of Electronic Computing Machines. Saint Petersburg: ITMO University 2014; 136.

Saenko IB, Mityakov ES, Lauta OS, Sokolov AP. Swarm Control Algorithm for UAVs with Elements of Cluster Analysis. Information and Space 2024; 4: 68–75.

Volkov AS, Baskakov AE. Development of a Bidirectional Search Procedure for Solving the Routing Problem in Software-Defined Transport Networks. Trudy MAI (Proceedings of Moscow Aviation Institute) 2021; 118: 1-15.

Zykov AA. Fundamentals of Graph Theory. Moscow: Nauka Publishing 1986; 384.

Vatutin EI, Zotov IV. Construction of a Relation Matrix in the Problem of Optimal Partitioning of Parallel Control Algorithms. Izvestiya of Kursk State Technical University 2004; 2: 85–89.

Levitin AV. Overcoming Limitations: The Bisection Method. Algorithms: Introduction to Design and Analysis 2006; 349–353.

Ismail H, Toroslu IH. Improving the Floyd–Warshall All-Pairs Shortest Paths Algorithm. Department of Computer Engineering, METU 2021; 1–5.

Bolgak AV, Vatutin EI. Evaluation of the Actual Performance of Intel Core Processors of Various Generations in the Task of Real Matrix Multiplication for Single-Threaded Software Implementation. Cloud and Distributed Computing Systems in E-Government 2024; 98–100.

Vatutin EI, Martynov IA, Titov VS. Evaluation of the Real Performance of Modern GPUs Supporting CUDA Technology in the Matrix Multiplication Problem. Izvestiya of Southwest State University. Series: Control, Computer Engineering, Informatics, Medical Instrumentation 2014; 2: 8–17.

Boreskov AV, Kharlamov AA, Markovsky ND, Mikushin DN, Mortikov EV, Myl’tsev AA, Sakharnykh NA, Frolov VA, Sadovnichiy VA. Parallel Computing on GPUs: Architecture and CUDA Programming Model. Moscow University Press 2012; 336.

Starovoitov IN, Revnyakov EN, Polyakova EN. Parallel Computing on Graphics Processors. Proceedings of the First International Scientific Conference on Digitalisation Issues: EDCRUNCH URAL 2020 2020; 314–319.

Khaldoon AO, Kamil NY, Abbas A, Al Smadi T. A Novel Flying Robot Swarm Formation Technique Based on Adaptive Wireless Communication Using a MUSIC Algorithm. International Journal of Electrical and Electronics Research 2024; 12(2): 688–695.

Yushin AM. Optoelectronic Devices and Their Foreign Analogues: Reference Book. Moscow: RadioSoft Publishing 2000; 1: 512.

Trrad I, Smadi TA, Al-Wahshat H. Application of Fuzzy Logic to Cognitive Wireless Communications. International Journal of Recent Technology and Engineering 2019; 8(3): 2228–2234.

Plaksienko VS, Plaksienko NE, Plaksienko SV, Plaksienko VS. Signal Reception and Processing Devices: A Textbook for Universities. Moscow: Educational and Methodological Publishing Centre “Uchebnaya Literatura” 2004; 376.

Al-Maitah M, Al Smadi TA, Al-Zoubi HQR. Scalable User Interface. Research Journal of Applied Sciences, Engineering and Technology 2014; 7(16): 3273–3279.

Gümüşkaya H, Örencik B. A Parallel Pipelined Computer Architecture for Digital Signal Processing. Turkish Journal of Electrical Engineering and Computer Sciences 1998; 6(2): 107–130.

Strogonov AV. Fundamentals of Digital Signal Processing. Voronezh: Voronezh State Technical University 2014.

Gvozdeva SN, Vatutin EI, Pshenichnykh AO, Titov VS. Device for Multiplying Binary Matrices. Patent of the Russian Federation for Utility Model No. 193927 2019.

Gvozdeva SN, Vatutin EI, Titov VS. Performance Evaluation of a Systolic Structure Device for Binary Matrix Multiplication. Telecommunications 2020; 3: 2–10.

Bolgak AV, Vatutin EI. Device for Multiplying Binary Matrices. Patent for Invention No. 2025104287 2025.

Bolgak AV, Vatutin EI. Mathematical Model and Structural–Functional Organization of a Parallel–Pipelined Memory Device for Fast Multiplication of Square Binary Matrices. XXI Century: Results of the Past and Problems of the Present Plus 2025; 14(2): 27–34.

Bolgak AV, Vatutin EI, Trokoz DA. Estimation of Time Costs for Multiplying Square Binary Matrices in a Device with Pipelined Data Reading from Specialised Multi-Port Memory. Izvestiya Southern Federal University. Engineering Sciences 2025; 4(246): 6–20.

Al-Maitah M, Al Smadi TA, Al-Zoubi HQR. Scalable User Interface. Research Journal of Applied Sciences, Engineering and Technology 2014; 7(16): 3273-3279.‏

Smadi TAA. Computer Application Using a Low-Cost Smart Sensor. International Journal of Computer Aided Engineering and Technology 2012; 4(6): 567.

Vatutin EI, Titov VS. Evaluation of the Actual Performance of Modern Processors in the Matrix Multiplication Problem for Single-Threaded Software Implementation Using SSE Extension. Izvestiya of Southwest State University 2015; 1(4): 26–35.

Vatutin EI, Martynov IA, Titov VS. Evaluation of the Actual Performance of Modern GPUs Supporting CUDA Technology in the Matrix Multiplication Problem. Izvestiya of Southwest State University. Series: Control, Computer Engineering, Informatics, Medical Instrumentation. 2014;2:8–17. (in Russian).

Martynov IA, Vatutin EI, Titov VS. Device for Matrix Multiplication. Patent of the Russian Federation for Utility Model No. 157948 2015.

Gvozdeva SN, Vatutin EI, Titov VS. Device for Use During a Binary Matrix. Patent of the Russian Federation No. 2744239 2021.

Bolgak AV, Vatutin EI. Assessment of the Hardware Complexity of a Device for Multiplying Square Binary Matrices with Pipelined Data Reading from Specialised Multi-Port Memory. Trudy MAI (Proceedings of Moscow Aviation Institute) 2025; 143: 1-20.

Gribok S, Pasca B. Efficient 8-Bit Matrix Multiplication on Intel Agilex-5 Fogas. IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines 2024; 43–53.

Nayak SK, Nayak AK, Laha SR, Tripathy N, Smadi TA. A Robust Deep Learning-Based Speaker Identification System Using a Hybrid Model on the KUI Dataset. International Journal of Electrical and Electronics Research 2024; 12(4): 1502–1507.

Khaleel RZ, et al. Improved Trajectory Planning for a Mobile Robot Based on the Pelican Optimization Algorithm. Journal Européen des Systèmes Automatisés 2024; 57(4): 1125-1134.

Khaleel HZ, Humaidi AJ. Toward Accuracy Improvement in the Solution of the Inverse Kinematic Problem in a Redundant Robot: A Comparative Analysis. International Review of Applied Sciences and Engineering 2024; 15(2): 242–251.

Obied H, et al. Implementation and Derivation Kinematics Modeling Analysis of WidowX 250 6 Degrees of Freedom Robotic Arm. Journal of Engineering and Applied Sciences 2024; 19: 45-56.

Similar Articles

You may also start an advanced similarity search for this article.