Competition 2023
Competition: Collaboration/Education

Hell Fire SoC

Systolic arrays are a core building block of parallel computing. A grid of processing elements performs synchronized operations, passing operands from neighbor to neighbor, which makes tasks such as matrix multiplication and signal processing efficient. Because each operand fetched from memory is reused as it flows through the grid, memory accesses drop while the processing elements stay busy, which can yield substantial speedups over sequential execution. Systolic arrays are used in domains ranging from AI model training to scientific simulation, where they speed up computations while using hardware resources efficiently.

Project Milestones

  1. Systolic Array Design

    Target Date
    Completed Date

     

    We developed a Processing Element (PE) consisting of a multiplier and an accumulator for the Systolic Array in our SoC. So that the IP is ready for back-to-back operations without a global reset, the accumulator has its own dedicated reset. Individual PEs can therefore be cleared independently for each new computation, which reduces reset overhead and helps the overall system scale.
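
    The PE described above can be illustrated with a small behavioral model. This is only a sketch and not the project's RTL: the class name, the `acc_reset` argument, and the choice to clear and accumulate in the same cycle are assumptions made for illustration.

```python
class ProcessingElement:
    """Behavioral model of one multiply-accumulate PE (hypothetical, not the RTL)."""

    def __init__(self):
        self.acc = 0  # accumulator register

    def step(self, a, b, acc_reset=False):
        """Advance one clock cycle.

        When acc_reset is set, the accumulator is cleared before this cycle's
        product is added, so a new dot product can start with no global reset.
        """
        base = 0 if acc_reset else self.acc
        self.acc = base + a * b
        return self.acc


def dot_products(pe, vector_pairs):
    """Run several dot products back to back on the same PE."""
    results = []
    for a_vec, b_vec in vector_pairs:
        for i, (a, b) in enumerate(zip(a_vec, b_vec)):
            # Assert the accumulator reset only on the first element of each pair.
            pe.step(a, b, acc_reset=(i == 0))
        results.append(pe.acc)
    return results
```

    Calling `dot_products(ProcessingElement(), [([1, 2, 3], [4, 5, 6]), ([1, 1], [2, 3])])` runs two dot products in a row on one PE, with the accumulator reset asserted only at the start of each.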

    System Architecture

     

  2. Accelerator IP Interface

    Target Date
    Completed Date

    We chose 32-bit-wide Non-Sequential transfers to deliver data continuously to the Array Interface IP. This keeps data transmission efficient and simplifies the overall architecture. Appropriate control signals ensure that each transfer is exact, giving the Array IP a streamlined, ordered data flow with accurate and timely delivery.
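
    As a rough illustration of what a stream of 32-bit Non-Sequential transfers looks like on the bus, the sketch below lists the address-phase control values for a run of single-word writes. The encodings follow the AMBA AHB-Lite specification; the `AddressPhase` type, the `word_writes` helper, and the incrementing address pattern are assumptions for illustration, not the project's interface.

```python
from dataclasses import dataclass

# Address-phase encodings defined by the AMBA AHB-Lite specification.
HTRANS_NONSEQ = 0b10   # single transfer (or first beat of a burst)
HBURST_SINGLE = 0b000  # no burst: every transfer stands alone
HSIZE_WORD = 0b010     # 32-bit transfer size


@dataclass(frozen=True)
class AddressPhase:
    haddr: int
    htrans: int
    hburst: int
    hsize: int
    hwrite: int


def word_writes(base_addr, n_words):
    """Return the address phases for n_words independent 32-bit writes."""
    if base_addr % 4:
        raise ValueError("32-bit transfers must be word aligned")
    return [
        AddressPhase(base_addr + 4 * i, HTRANS_NONSEQ, HBURST_SINGLE, HSIZE_WORD, 1)
        for i in range(n_words)
    ]
```

    For example, `word_writes(0x40000000, 3)` (a made-up base address) produces three NONSEQ word writes at consecutive word addresses.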

  3. AHB Memory Interface Design

    Target Date
    Completed Date

    This interface implements a byte-addressable memory, assuming a 32-bit memory width.
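
    A byte-addressable interface on top of 32-bit storage has to split each address into a word index and a byte lane. The model below is a minimal sketch of that decoding, assuming little-endian byte lanes and naturally aligned accesses; the class and method names are hypothetical and do not come from the project.

```python
class ByteAddressableMemory:
    """32-bit-wide storage accessed by byte address (little-endian lanes assumed)."""

    def __init__(self, n_words):
        self.words = [0] * n_words

    @staticmethod
    def _check(addr, size_bytes):
        if size_bytes not in (1, 2, 4) or addr % size_bytes:
            raise ValueError("access must be 1, 2 or 4 bytes and naturally aligned")

    def write(self, addr, data, size_bytes=4):
        self._check(addr, size_bytes)
        shift = (addr & 0x3) * 8                       # byte lane inside the word
        mask = ((1 << (8 * size_bytes)) - 1) << shift  # bits touched by this access
        index = addr >> 2                              # word index
        self.words[index] = (self.words[index] & ~mask) | ((data << shift) & mask)

    def read(self, addr, size_bytes=4):
        self._check(addr, size_bytes)
        shift = (addr & 0x3) * 8
        return (self.words[addr >> 2] >> shift) & ((1 << (8 * size_bytes)) - 1)
```

    Writing a single byte at address 1 changes only bits 15:8 of word 0 and leaves the other three byte lanes untouched.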

  4. AHB GPIO Interface

    Target Date
    Completed Date

    An AHB-based GPIO interface.

  5. Array Alignment Interface

    Target Date
    Completed Date

    Because the data fed to the array must be aligned, we plan to implement a data storage, alignment, and delivery architecture. It accepts data through incoming AHB-Lite transfers, performs the operation, and delivers the results back to memory through the memory interface.
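
    To show why alignment is needed, here is a minimal cycle-level sketch of a square systolic array with skewed inputs. It assumes an output-stationary dataflow in which row i of A is delayed by i cycles and column j of B by j cycles; the function names are hypothetical, and the project's actual alignment block may work differently.

```python
def skew(streams, fill=0):
    """Delay stream i by i cycles and pad so all streams have equal length."""
    n = len(streams)
    return [[fill] * i + list(s) + [fill] * (n - 1 - i) for i, s in enumerate(streams)]


def systolic_matmul(a, b):
    """Multiply two n x n matrices on an n x n output-stationary array model."""
    n = len(a)
    a_in = skew(a)                                # stream i enters row i from the left
    b_in = skew([list(col) for col in zip(*b)])   # stream j enters column j from the top
    a_reg = [[0] * n for _ in range(n)]           # A operand currently held in each PE
    b_reg = [[0] * n for _ in range(n)]           # B operand currently held in each PE
    acc = [[0] * n for _ in range(n)]             # one accumulator per PE
    stream_len = 2 * n - 1
    for t in range(stream_len + n - 1):           # extra cycles drain the pipeline
        for i in range(n):
            for j in range(n - 1, 0, -1):         # A operands move one PE to the right
                a_reg[i][j] = a_reg[i][j - 1]
            a_reg[i][0] = a_in[i][t] if t < stream_len else 0
        for j in range(n):
            for i in range(n - 1, 0, -1):         # B operands move one PE down
                b_reg[i][j] = b_reg[i - 1][j]
            b_reg[0][j] = b_in[j][t] if t < stream_len else 0
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

    With the skew in place, PE(i, j) sees `A[i][k]` and `B[k][j]` in the same cycle for every k, so each accumulator ends up holding one element of the product. Without it, mismatched operands would meet in the PEs.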

     

    System Architecture

     

  6. SoC Integration

    Target Date

    The Systolic Array, GPIO, and memory peripherals are integrated with the Cortex-M0, and the resulting SoC is tested.

     

    System Architecture

     

  7. Physical Implementation

    Target Date
    Completed Date

    The Array IP was implemented on TSMC 65nm using the Cadence Genus and Innovus tools. The table below summarizes the implementation runs carried out during the design phase and shows how the Array IP evolved. These iterative runs let us tune the IP's performance, power, and area so that the final implementation met the design specifications.

     

    Block Implementation Report

    Period (ns)   Frequency (MHz)   Area (um2)   Power (mW)   Power density (mW/um2)
    1.33          751.88            37662.48     38.72        1.03e-3
    2.0           500               31237.56     18.78        6.01e-4
    4.0           250               30402.36     8.49         2.79e-4
    10.0          100               30020.04     3.29         1.09e-4
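
    The derived columns in the table can be cross-checked from the period, area, and power figures. The short sketch below recomputes frequency as 1000 / period (ns to MHz) and the last column as power divided by area; the function name and data layout are illustrative only.

```python
# (period_ns, area_um2, power_mw) for each run, copied from the table above.
RUNS = [
    (1.33, 37662.48, 38.72),
    (2.0, 31237.56, 18.78),
    (4.0, 30402.36, 8.49),
    (10.0, 30020.04, 3.29),
]


def derived_metrics(period_ns, area_um2, power_mw):
    """Return (frequency in MHz, power per unit area in mW/um2)."""
    frequency_mhz = 1000.0 / period_ns
    power_per_area = power_mw / area_um2
    return frequency_mhz, power_per_area


for period_ns, area_um2, power_mw in RUNS:
    freq, density = derived_metrics(period_ns, area_um2, power_mw)
    print(f"{period_ns:5.2f} ns  {freq:7.2f} MHz  {density:.2e} mW/um2")
```

    Relaxing the clock period from 1.33 ns to 10 ns lowers power by roughly a factor of twelve while area shrinks by about 20%, which is why the power-per-area figure falls so sharply across the runs.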

     

Team

Name
Research Area
Hardware Design
Role
student

Comments

Thanks for the project outline. Hopefully you are well into the Design of a 5x5 systolic array with input alignment. From that and looking at the data requirements you may be able to see how David Mapstone uses the DMA in the NanoSoC reference design to push/pull data from the accelerator. Your wrapper interface is going to differ from his implementation but hopefully it will act as a starting point. If you need any help then please let us know.

We also hope the item on how to structure a Project and our example projects help you get set up quickly for this project.

Hello Everyone,

I’m excited to share some significant improvements we’ve made to the IP. We’ve successfully reduced latency from 96 cycles to just 25 cycles while introducing advanced features like 2x writes before read and support for IS, WS, and OS data flows—all without significantly increasing the area of these units.

Additionally, we’ve integrated a Gen-1 matrix transpose block and an activation block into the IP, enhancing its power and capabilities. These upgrades mark a major leap forward in performance and functionality.

While the release is still a work in progress, with more updates on the way, we’re thrilled about what’s coming. Stay tuned for further developments.

Thanks and regards,  
Srimanth Tenneti

Hi Sri,

I hope your project is going well. What is your plan for your accelerator interface? Are you planning on building an AHB-Lite based interface directly into your accelerator or are you planning on building/using a wrapper level to translate your transactions?

Thanks,

David

Hello Everyone

Our goal with this project was to introduce an Open Accelerator platform, and now, we're eager to invite all of you to join us in taking it to the next level.

We value your input, creativity, and collaboration. Together, we can enhance this design in countless ways. Your engagement and suggestions are not only welcomed but crucial to our success.

Thank you for being a part of this journey, and we look forward to your valuable contributions.

Warm Regards,
Srimanth Tenneti

GitHub - https://github.com/srimanthtenneti/Hell_Fire_SoC_Demo/

Project Creator
Srimanth Tenneti

Researcher at University of Cincinnati
Research area: Machine Learning | SoC Design

Interests

Software
