Competition 2023

Competition: Collaboration/Education

Hell Fire SoC

Systolic arrays are a key building block in parallel computing. A grid of processing elements performs synchronized operations and passes operands from neighbor to neighbor, which suits workloads such as matrix multiplication and signal processing. Because each value fetched from memory is reused as it moves through the grid, the structured data flow reduces memory accesses while keeping the compute units busy, which can yield substantial speedups. Systolic arrays are used in domains ranging from AI model training to scientific simulation, where they speed up computations that sequential approaches handle poorly while making efficient use of hardware resources.

Project Milestones

Systolic Array Design

Target Date

May 30, 2023

Completed Date

July 5, 2023

We developed a Processing Element (PE) comprising a multiplier and an accumulator for the Systolic Array in our SoC. To keep the IP ready for successive rounds of operations without a global reset, the accumulator has its own separate reset. This lets the accumulators of individual PEs be cleared independently between computations, which minimizes overhead and helps the performance and scalability of the overall system.
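As a rough behavioral illustration of the PE described above, the sketch below models a multiply-accumulate unit whose accumulator can be cleared on its own. The class and method names (`ProcessingElement`, `mac`, `reset_acc`) are hypothetical and are not taken from the project's RTL; operand widths and timing are ignored.

```python
class ProcessingElement:
    """Behavioral model of one multiply-accumulate PE (names are illustrative)."""

    def __init__(self):
        self.acc = 0  # accumulator register

    def mac(self, a, b):
        """One compute cycle: multiply the operands and add to the accumulator."""
        self.acc += a * b
        return a, b  # operands would be forwarded to neighboring PEs

    def reset_acc(self):
        """Clear only the accumulator, leaving the rest of the design untouched."""
        self.acc = 0


pe = ProcessingElement()
for a, b in [(1, 2), (3, 4)]:
    pe.mac(a, b)
first = pe.acc            # 1*2 + 3*4

pe.reset_acc()            # start the next dot product without a global reset
pe.mac(5, 6)
second = pe.acc           # 5*6
```

The point of the separate reset is visible in the second half of the example: a new accumulation starts cleanly while no other state has to be reinitialized.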
Accelerator IP Interface

Target Date

June 20, 2023

Completed Date

July 5, 2023

We opted for 32-bit-wide Non-Sequential transfers to provide continuous data delivery to the Array Interface IP. This choice keeps data transmission efficient and simplifies the overall architecture. Appropriate control signals ensure that data is transferred exactly as intended throughout the system, giving a more streamlined and ordered data flow into the Array IP. Together, the 32-bit data width and careful control-signal management ensure accurate and timely data transmission and improve system performance.
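To make the transfer type concrete, the sketch below builds a list of 32-bit write transfers, each marked Non-Sequential. The `HTRANS` and `HSIZE` encodings are the standard AHB-Lite values; everything else (the `AhbWrite` record, `make_word_writes`, the base address, and the use of incrementing addresses) is an assumption made for illustration and does not describe the project's actual interface.

```python
from dataclasses import dataclass

HTRANS_NONSEQ = 0b10  # AHB-Lite encoding for a non-sequential transfer
HSIZE_WORD = 0b010    # AHB-Lite encoding for a 32-bit (word) transfer


@dataclass
class AhbWrite:
    """One write transfer as seen on the address/data phases (illustrative)."""
    haddr: int
    hwdata: int
    htrans: int = HTRANS_NONSEQ
    hsize: int = HSIZE_WORD
    hwrite: int = 1


def make_word_writes(base_addr, words):
    """Turn a list of integers into word-aligned NONSEQ write transfers."""
    if base_addr % 4:
        raise ValueError("base address must be word aligned")
    return [
        AhbWrite(haddr=base_addr + 4 * i, hwdata=w & 0xFFFFFFFF)
        for i, w in enumerate(words)
    ]


# ARRAY_BASE is a made-up address used only for this example.
ARRAY_BASE = 0x4000_0000
writes = make_word_writes(ARRAY_BASE, [1, 2, 3])
```

Treating every transfer as Non-Sequential means each one carries its own address and control information, which keeps the receiving logic simple at the cost of not using burst transfers.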
AHB Memory Interface Design

Target Date

June 30, 2023

Completed Date

July 5, 2023

This interface implements a byte-addressable memory interface, assuming a 32-bit memory width.
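The following sketch shows one way byte addressing can map onto 32-bit storage: the upper address bits select the word and the two lowest bits select the byte lane. It is a behavioral model only; the `WordMemory` class, the little-endian lane order, and the alignment rule are assumptions and are not taken from the project's design.

```python
class WordMemory:
    """32-bit-wide memory with byte addressing (little-endian lanes assumed)."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    @staticmethod
    def _check(addr, size_bytes):
        if size_bytes not in (1, 2, 4) or addr % size_bytes:
            raise ValueError("unsupported size or unaligned address")

    def write(self, addr, data, size_bytes):
        """Write 1, 2, or 4 bytes at a size-aligned byte address."""
        self._check(addr, size_bytes)
        shift = 8 * (addr & 0x3)                       # byte lane within the word
        mask = ((1 << (8 * size_bytes)) - 1) << shift  # lanes being written
        index = addr >> 2                              # word index
        self.words[index] = (self.words[index] & ~mask) | ((data << shift) & mask)

    def read(self, addr, size_bytes):
        """Read 1, 2, or 4 bytes from a size-aligned byte address."""
        self._check(addr, size_bytes)
        shift = 8 * (addr & 0x3)
        return (self.words[addr >> 2] >> shift) & ((1 << (8 * size_bytes)) - 1)


mem = WordMemory(4)
mem.write(0x4, 0xAABBCCDD, 4)   # full word
mem.write(0x5, 0x11, 1)         # overwrite a single byte lane
word = mem.read(0x4, 4)
upper_half = mem.read(0x6, 2)
```

In hardware the same lane selection would typically be derived from the low address bits and the transfer size, so that a byte or halfword write updates only the lanes it covers.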
AHB GPIO Interface

Target Date

July 20, 2023

Completed Date

July 5, 2023

An AHB-based GPIO interface.
Array Alignment Interface

Target Date

August 20, 2023

Completed Date

August 4, 2023

Because the data fed to the array has to be aligned, we plan to implement a data storage, alignment, and delivery architecture. It accepts data via incoming AHB-Lite transfers, performs the operation, and delivers the results back to memory via the memory interface.
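To illustrate why alignment is needed, the sketch below simulates a small systolic array cycle by cycle. It assumes an output-stationary arrangement in which row i of the first matrix and column j of the second are delayed by i and j cycles respectively, so that matching operands meet in the right PE. The dataflow choice and the function names `skew` and `systolic_matmul` are assumptions for illustration, not a description of the project's alignment hardware.

```python
def skew(streams, total_cycles):
    """Delay stream i by i cycles and zero-pad every stream to total_cycles."""
    return [
        [0] * i + list(s) + [0] * (total_cycles - i - len(s))
        for i, s in enumerate(streams)
    ]


def systolic_matmul(a, b):
    """Cycle-by-cycle model of an n x m output-stationary systolic array."""
    n, k, m = len(a), len(b), len(b[0])
    cycles = k + n + m - 2
    a_in = skew(a, cycles)                           # rows of A enter from the left
    b_in = skew([list(c) for c in zip(*b)], cycles)  # columns of B enter from the top
    a_reg = [[0] * m for _ in range(n)]
    b_reg = [[0] * m for _ in range(n)]
    acc = [[0] * m for _ in range(n)]
    for t in range(cycles):
        # Shift operands one PE to the right and one PE down per cycle.
        for i in range(n):
            for j in reversed(range(m)):
                a_reg[i][j] = a_reg[i][j - 1] if j > 0 else a_in[i][t]
        for j in range(m):
            for i in reversed(range(n)):
                b_reg[i][j] = b_reg[i - 1][j] if i > 0 else b_in[j][t]
        # Every PE multiplies what it currently holds and accumulates.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


aligned = skew([[1, 2], [3, 4]], 4)
product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Without the skew, operands with different inner indices would meet in a PE and the accumulated sums would be wrong, which is why an alignment stage sits between the bus interface and the array.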
SoC Integration

Target Date

September 1, 2023

The Systolic Array, GPIO, and memory peripherals are integrated with the Cortex-M0, and the SoC is tested.

Physical Implementation

Target Date

August 16, 2023

Completed Date

August 16, 2023

The Array IP was implemented on TSMC 65nm using the Cadence Genus and Innovus tools. The table below summarizes the implementation runs conducted during the design phase and shows how the Array IP evolved. These iterative runs allowed us to tune the IP's performance, power, and area so that the final implementation met the design specifications.

Block Implementation Report
Period(ns)	Frequency(MHz)	Area(um²)	Power(mW)	PPA(mW/um²)
1.33	751.88	37662.48	38.72	1.03e-3
2.0	500	31237.56	18.78	6.01e-4
4.0	250	30402.36	8.49	2.79e-4
10.0	100	30020.04	3.29	1.09e-4
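The derived columns in the table follow from the period, area, and power figures. The short script below recomputes frequency and the power-per-area figure from the table's own numbers, and additionally derives energy per clock cycle (power times period), a quantity that does not appear in the original report and is included only as an illustration. The helper name `derive` is hypothetical.

```python
# (period_ns, area_um2, power_mw), copied from the table above
runs = [
    (1.33, 37662.48, 38.72),
    (2.0, 31237.56, 18.78),
    (4.0, 30402.36, 8.49),
    (10.0, 30020.04, 3.29),
]


def derive(period_ns, area_um2, power_mw):
    """Return (frequency in MHz, power per area in mW/um^2, energy per cycle in pJ)."""
    freq_mhz = 1000.0 / period_ns          # 1 / ns = GHz, so scale to MHz
    power_per_area = power_mw / area_um2
    energy_pj = power_mw * period_ns       # mW * ns = pJ
    return freq_mhz, power_per_area, energy_pj


for run in runs:
    freq, density, energy = derive(*run)
    print(f"{run[0]:>5} ns  {freq:7.2f} MHz  {density:.2e} mW/um^2  {energy:6.2f} pJ/cycle")
```

Read this way, the table shows area staying within a fairly narrow range while power falls roughly with frequency as the timing constraint is relaxed.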

Team

Name

Srimanth Tenneti

Research Area

Machine Learning | SoC Design

Role

Researcher

Name

Sindhura Maddineni

Name

Sai Sumanth Reddy Chinnasani

Research Area

VLSI/RTL Design

Role

Student

Name

Sindhuja Gangapuram

Role

Student

Name

Nick Gian

Research Area

Hardware Design

Role

Student

Comments

Use of the DMA and NanoSoC reference design

Thanks for the project outline. Hopefully you are well into the Design of a 5x5 systolic array with input alignment. From that and looking at the data requirements you may be able to see how David Mapstone uses the DMA in the NanoSoC reference design to push/pull data from the accelerator. Your wrapper interface is going to differ from his implementation but hopefully it will act as a starting point. If you need any help then please let us know.

We also hope the item on how to structure a Project and our example projects help you get set up quickly for this project.

Structure for the Project

Hopefully we have a clear pathway now to establish the Project and set up the repositories. If I understand correctly, this Project will use the NanoSoC reference design to push/pull data from the accelerator.

Accelerator Interface

Hi Sri,

I hope your project is going well. What is your plan for your accelerator interface? Are you planning on building an AHB-Lite based interface directly into your accelerator or are you planning on building/using a wrapper level to translate your transactions?

Thanks,

David

Hi David,

I plan to use a direct interface.

Thanks & Regards,

Srimanth Tenneti

Update1

Collaboration

Hello Everyone,

Our goal with this project was to introduce an Open Accelerator platform, and now, we're eager to invite all of you to join us in taking it to the next level.

We value your input, creativity, and collaboration. Together, we can enhance this design in countless ways. Your engagement and suggestions are not only welcomed but crucial to our success.

Thank you for being a part of this journey, and we look forward to your valuable contributions.

Warm Regards,
Srimanth Tenneti

GitHub - https://github.com/srimanthtenneti/Hell_Fire_SoC_Demo/

Die are back

Srimanth,

The die are back. Are you ready for testing the design with the test board?

John.

Final demonstration and additional information on the project

Hi,

It would be great to get an update on the final demonstration of the accelerator working on the test board and also an update to the project to reflect on the start to finish milestones and any reflections on the lessons learned along the way.

We look forward to hearing about your thoughts on this.

Project Update

Hello Everyone,

I’m excited to share some significant improvements we’ve made to the IP. We’ve reduced latency from 96 cycles to just 25 cycles and introduced advanced features such as 2x writes before read and support for input-stationary (IS), weight-stationary (WS), and output-stationary (OS) data flows, all without significantly increasing the area of these units.

Additionally, we’ve integrated a Gen-1 matrix transpose block and an activation block into the IP, enhancing its power and capabilities. These upgrades mark a major leap forward in performance and functionality.

While the release is still a work in progress, with more updates on the way, we’re thrilled about what’s coming. Stay tuned for further developments.

Thanks and regards,
Srimanth Tenneti

Project Creator

Srimanth Tenneti

Researcher at University of Cincinnati
Research area: Machine Learning | SoC Design

Interests

Software

Design Flow

Basic System Accelerator Integration Verification

Submitted on Fri, 19/05/2023 - 12:37
