Systolic arrays are a key building block in parallel computing. They coordinate a grid of processing elements that operate in lockstep, which makes them efficient at tasks such as matrix multiplication and signal processing. Because operands flow from one element to the next in a regular pattern, each value fetched from memory is reused by several elements, which reduces memory accesses while keeping the compute units busy. Systolic arrays are used in domains ranging from AI model training to scientific simulation, where they speed up computations that sequential approaches handle poorly while making efficient use of hardware resources.
Systolic Array Design
We developed a Processing Element (PE) comprising a multiplier and an accumulator for the Systolic Array implementation within our SoC. To ready the IP for successive operations without a global reset, we gave the accumulator its own separate reset. This allows individual PEs to be reset independently between computation cycles, which keeps the overhead of starting a new operation low and helps the design scale.
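The PE behavior described above can be sketched as a small cycle-level model. This is a behavioral illustration only, not the project's RTL: the class name `ProcessingElement`, the `acc_reset` argument, the 32-bit accumulator width, and the choice of a synchronous clear are all assumptions made for the example.

```python
class ProcessingElement:
    """Behavioral model of one multiply-accumulate PE (illustrative, not RTL)."""

    def __init__(self, acc_width=32):
        # acc_width is an assumed accumulator width, not taken from the design.
        self.mask = (1 << acc_width) - 1
        self.acc = 0

    def step(self, a, b, acc_reset=False):
        """Advance one clock cycle and return the accumulator value.

        acc_reset models the dedicated accumulator reset: when asserted, only
        the accumulator is cleared; nothing else in the system is reset.
        """
        if acc_reset:
            self.acc = 0
        else:
            self.acc = (self.acc + a * b) & self.mask
        return self.acc


def dot(pe, xs, ys):
    """Clear the accumulator, then accumulate one dot product on the PE."""
    pe.step(0, 0, acc_reset=True)
    result = 0
    for x, y in zip(xs, ys):
        result = pe.step(x, y)
    return result


pe = ProcessingElement()
first = dot(pe, [1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6
second = dot(pe, [1, 1], [2, 3])        # starts from a cleared accumulator
```

Because `dot` clears only the accumulator, back-to-back operations reuse the same PE without any wider reset.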
Accelerator IP Interface
We opted for 32-bit wide Non-Sequential (NONSEQ) transfers to deliver data continuously to the Array Interface IP. This choice keeps data transmission efficient and simplifies the overall architecture. Appropriate control signals ensure that each transfer is delivered exactly, which gives the Array IP a streamlined, orderly data flow. Together, the 32-bit data width and careful control signal management provide accurate and timely data transmission and improve system performance.
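As a rough illustration of what a 32-bit NONSEQ transfer looks like on the bus, the sketch below builds the AHB control values for a single word write. The `HTRANS` and `HSIZE` encodings are the standard AHB ones; the function name `nonseq_word_write`, the dictionary layout, and the address `ARRAY_DATA_ADDR` are hypothetical and do not come from the project.

```python
# Standard AHB HTRANS encodings.
HTRANS_IDLE, HTRANS_BUSY, HTRANS_NONSEQ, HTRANS_SEQ = 0b00, 0b01, 0b10, 0b11
# Standard AHB HSIZE encoding for a 32-bit (word) transfer.
HSIZE_WORD = 0b010

# Hypothetical address of the array's data input; not taken from the design.
ARRAY_DATA_ADDR = 0x4000_0000


def nonseq_word_write(addr, data):
    """Return the control values for one 32-bit NONSEQ write.

    On a real bus HADDR/HTRANS/HSIZE/HWRITE belong to the address phase and
    HWDATA follows in the data phase; they are grouped here for readability.
    """
    if addr % 4 != 0:
        raise ValueError("word transfers must be 4-byte aligned")
    return {
        "HADDR": addr,
        "HTRANS": HTRANS_NONSEQ,
        "HSIZE": HSIZE_WORD,
        "HWRITE": 1,
        "HWDATA": data & 0xFFFF_FFFF,
    }


# Each operand is sent as its own single NONSEQ transfer (an assumption).
transfers = [nonseq_word_write(ARRAY_DATA_ADDR, w) for w in (1, 2, 3)]
```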
AHB Memory Interface Design
This Interface implements a byte-addressable memory interface assuming a 32-bit memory width.
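A minimal sketch of byte addressing on top of 32-bit storage follows. It assumes little-endian byte lanes and naturally aligned byte, halfword, and word accesses; the class `WordMemory` and its methods are hypothetical names for illustration, not the interface's actual design.

```python
class WordMemory:
    """Byte-addressable view of a memory built from 32-bit words (sketch)."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    @staticmethod
    def _check(addr, size_bytes):
        if size_bytes not in (1, 2, 4) or addr % size_bytes != 0:
            raise ValueError("unsupported size or misaligned address")

    def write(self, addr, data, size_bytes=4):
        self._check(addr, size_bytes)
        shift = 8 * (addr % 4)                        # byte lane within the word
        mask = ((1 << (8 * size_bytes)) - 1) << shift
        idx = addr // 4                               # word index
        old = self.words[idx]
        # Update only the selected byte lanes; the other lanes keep their value.
        self.words[idx] = (old & ~mask & 0xFFFF_FFFF) | ((data << shift) & mask)

    def read(self, addr, size_bytes=4):
        self._check(addr, size_bytes)
        shift = 8 * (addr % 4)
        return (self.words[addr // 4] >> shift) & ((1 << (8 * size_bytes)) - 1)


mem = WordMemory(num_words=4)
mem.write(0, 0x11223344)             # full word
mem.write(1, 0xAA, size_bytes=1)     # single byte in lane 1 of the same word
```

After these two writes the word at address 0 holds `0x1122AA44`: the byte write changed only bits 15:8.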
AHB GPIO Interface
An AHB-based GPIO interface.
Physical Implementation
The Array IP was implemented in TSMC 65nm using the Cadence Genus and Innovus tools. The table below summarizes the implementation runs carried out during the design phase at different target clock periods. These iterative runs let us tune the IP's performance, power, and area so that the final implementation met the design specifications.
Block Implementation Report

Period (ns) | Frequency (MHz) | Area (um2) | Power (mW) | PPA (mW/um2)
1.33        | 751.88          | 37662.48   | 38.72      | 1.03e-3
2.0         | 500             | 31237.56   | 18.78      | 6.01e-4
4.0         | 250             | 30402.36   | 8.49       | 2.79e-4
10.0        | 100             | 30020.04   | 3.29       | 1.09e-4
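The Frequency and PPA columns appear to be derived from the other three: frequency as the reciprocal of the clock period, and PPA as power divided by area. The sketch below recomputes them from the reported period, area, and power values; the function name `derive` is hypothetical, and the interpretation of PPA as power per unit area is inferred from the column's unit.

```python
# (period_ns, area_um2, power_mw) for each run, copied from the report.
runs = [
    (1.33, 37662.48, 38.72),
    (2.0, 31237.56, 18.78),
    (4.0, 30402.36, 8.49),
    (10.0, 30020.04, 3.29),
]


def derive(period_ns, area_um2, power_mw):
    """Return (frequency in MHz, power per area in mW/um2) for one run."""
    freq_mhz = 1000.0 / period_ns        # 1 / (period in ns) expressed in MHz
    power_per_area = power_mw / area_um2
    return freq_mhz, power_per_area


derived = [derive(*run) for run in runs]
for (period, _, _), (freq, ppa) in zip(runs, derived):
    print(f"{period:5.2f} ns  {freq:7.2f} MHz  {ppa:.2e} mW/um2")
```

Tightening the period from 10.0 ns to 1.33 ns raises power by more than a factor of ten while area grows by roughly a quarter, which is why the power-per-area figure climbs so steeply.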
Array Alignment Interface
Because the data fed to the array has to be aligned, we plan to implement a data storage, alignment, and delivery architecture. It accepts data via incoming AHB-Lite transfers, performs the operation, and delivers the results back to memory via the memory interface.
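One common form of alignment for a systolic array is input skewing, where each row or column stream is delayed by one extra cycle so that matching operands meet in the right PE. The sketch below assumes an output-stationary array (each PE accumulates one result element) and is only an illustration of the idea; the source does not state which alignment scheme is planned, and the names `skew`, `at`, and `systolic_matmul` are hypothetical.

```python
def skew(streams):
    """Delay stream i by i cycles, zero-padding all streams to equal length."""
    n = len(streams)
    return [[0] * i + list(s) + [0] * (n - 1 - i) for i, s in enumerate(streams)]


def at(stream, t):
    """Value on a stream at cycle t, or 0 outside its valid window."""
    return stream[t] if 0 <= t < len(stream) else 0


def systolic_matmul(A, B):
    """Cycle-by-cycle model of an output-stationary array computing A x B."""
    m, k, n = len(A), len(B), len(B[0])
    a_in = skew(A)                                # rows of A enter from the left
    b_in = skew([list(c) for c in zip(*B)])       # columns of B enter from the top
    acc = [[0] * n for _ in range(m)]
    for t in range(k + m + n - 2):                # cycles until the last PE is done
        for i in range(m):
            for j in range(n):
                # A values reach column j after j hops; B values reach row i after i hops.
                acc[i][j] += at(a_in[i], t - j) * at(b_in[j], t - i)
    return acc


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = systolic_matmul(A, B)
```

With the skew applied, PE (i, j) sees `A[i][k]` and `B[k][j]` in the same cycle for every `k`, so the accumulated result matches an ordinary matrix product.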
SoC Integration
The Systolic Array, GPIO, and memory peripherals are integrated with the Cortex-M0, and the SoC is tested.