Competition 2024
Competition: Hardware Implementation

Arrhythmia Analysis Accelerator : A-Cube

We propose the A-Cube design methodology to create medical decision support on the edge. The design and implementation of an atrial fibrillation detector hardware core was selected as a proof-of-concept study. To facilitate the required atrial fibrillation functionality, we adopted an established AI model, based on Long Short-Term Memory (LSTM) technology for hardware implementation. The adaptation was done by varying design parameters such as data window and the number of LSTM units. We found that a data window of 40 beats and 20 LSTM units are sufficient to achieve a classification accuracy of 99.02%. We are confident that the A-Cube methodology can be used to implement this model in hardware. Doing so, will create a low power and low latency atrial fibrillation monitoring solution which has the potential to extend the observation duration while being convenient for patients.

 

Project Milestones

  1. Milestone #1

    Target Date

    Implement and test the Soclab encryption example in the ZCU104 FPGA board.

  2. Milestone #2

    Target Date

    Implement an atrial fibrillation detection model in the ZCU104 FPGA board.

  3. Milestone #3

    Target Date

    Analyse the model performance based on different signal length and algorithm complexity.

  4. Milestone #4

    Target Date

    Analyse the model performance based on different quantisation levels.

  5. Milestone #5

    Target Date

    Select a suitable implementation candidate to server as atrial fibrillation detection core.

  6. Milestone #6

    Target Date

    Simulate (RTL) the selected atrial fibrillation detection core.

  7. Milestone #7

    Target Date

    Implement an AHB bus interface for the selected atrial fibrillation detection core.

  8. Milestone #8

    Target Date

    Integrate Soclab and the selected atrial fibrillation detection core.

  9. Milestone #9

    Target Date

    Simulate the integration results.

  10. Milestone #10

    Target Date

    Implement the integration results in the ZCU104 FPGA board.

  11. Milestone #11

    Target Date

    Test the hardware on the ZCU104 FPGA board.

  12. Milestone #12

    Target Date
  13. Milestone #13

    Target Date

Comments

Hi,

Great to see the project live. It is very easy to add the initial milestones and you can come back as each milestone progresses and make updates. 

IITH are a good example of milestones in their project https://soclabs.org/project/real-time-edge-ai-soc-high-speed-low-complexity-reconfigurable-scalable-architecture-deep

You can simply select milestones from the generic design flow stages or create any unique milestones.

add milestones via edit of the Project

The HLS4ML environment can use various backend tool chains for High Level Synthesis (HLS). These tools support the generation of variations of hardware implementation for an algorithm written in a high level language portable language. The benefit of High Level Synthesis is that it bridges hardware and software design domains. 

It was good to get some details of your design environment. As I understand things you are using the Xilinx® Vivado® High-Level Synthesis (HLS) tool chain as your backend. HLS4ML uses streams to pass data and these are mapped into implementation buffers. As with other abstractions it provides an accelerated design environment but then requires effort to optimise the hardware implementation. This can be done by developing optimization directives that direct the tool chain to refine the automated generation of the implementation.

Alternatively some design teams move to use the next layer of tooling down in the design abstraction environment once they have a conceptual design. The data model for your project is relatively easy to understand. It was interesting discussing the use of the  Xilinx® HLS streaming interface.  We also discussed some of the challenges of moving beyond the FPGA fabric support into the full ASIC implementation flow and the subtle differences in performance of logic and memory between the two hardware instantiations. 

It will good to hear how things are progressing. 
 

If you are still planning to use HLS4ML to generate a model you might be interested in one of last year's projects. I have added a few comments on what has been happening on Aba’s project by and the design challenges.

Enhancing HLS4ML: Accelerating DNNs on FPGA and ASIC for Scientific Computing | Review

Is my current overview of their project environment and

Enhancing HLS4ML: Accelerating DNNs on FPGA and ASIC for Scientific Computing | Bridge

Is an overview of the bridge issues from nanosoc to their custom accelerator

 

The workflow proposed by Aba and the team in their project is developing but still has some issues being resolved to map the model to the nanoSoC reference design when dealing with the specifics of physical design for a specific technology node and ensuring there are no DRC violations. The section 'Workflow of Our system' is helpful in giving an overview. You should be able to get some understanding of the next milestones, I think these are 6,7,8.  

One significant difference I am aware of is that the volume of data Aba and team are looking to move through the system is such they plan to use DMA engines to offload the data movement across the bus. In your case you have said the data volume is low and so you don't plan to use DMA. I am assuming the firmware on the M0 will handle the transfers?

Hi,

It was great to hear today how you worked through the analysis to come to the view that the data window of 40 beats and 20 LSTM units are sufficient to achieve a classification accuracy of 99.02%. It was also interesting to explore how the data movement across the bus and into your accelerator will form based on this requirement.

Different people have taken similar but slightly varying approaches to how the data transactions flow into their working registers within their accelerators. There are a few projects that can act as examples:

SHA-2 Accelerator Engine | SoC Labs

Given the volume of data flowing I think you concluded the Master for the bus was likely the M class CPU and not a DMA controller.

All sounds very promising and it will be great to understand more on the design and expected milestones.

Hi,

It was good to open the discussion on what will be your frame work for creating test benches. There are a variety of approaches people are adopting. The Verification pathway parallels the Design pathway through the Design Flow  stages. There are even standarisation efforts in this area: 

Universal Verification Methodology | SoC Labs

It often comes down to experience of various tools and languages that are easily adopted, or the ability to re-use prior verification assets that reduce the overall project timeline rather than extend it with new skills acquisition for unknown techniques. 

There are some  verification assets in SoC Labs and some projects to continue to develop these.

System Verification of NanoSoC | SoC Labs

I think you had some specific focused test bench requirements you wanted to work on for the data flow across the bus. I guess this is part of the usual ebb and flow of design, either top down for the SoC systems architecture or bottom up from foundation IP blocks.

Looking forward to seeing how these develop on your project.

John. 

You are able to build nanoSoC with the DMA removed. The two nanoSoC designs taped out recently had either the DMA 230 or the DMA 350. Aba and the team are developing an accelerator core with micro DMA engines within their accelerator design. This allows them to generate similar firmware for their FPGA implementation as their ASIC implementation. They have been varying the nanoSoC project to build without the  DMA 230 or DMA 350 IP but containing their addressing within the accelerator expansion region. 

We propose the A-Cube design methodology to create medical decision support on the edge. The design and implementation of an atrial fibrillation detector hardware core was selected as a proof-of-concept study. To facilitate the required atrial fibrillation functionality, we adopted an established AI model, based on Long Short-Term Memory (LSTM) technology for hardware implementation. The adaptation was done by varying design parameters such as data window and the number of LSTM units. We found that a data window of 40 beats and 20 LSTM units are sufficient to achieve a classification accuracy of 99.02%. We are confident that the A-Cube methodology can be used to implement this model in hardware. Doing so, will create a low power and low latency atrial fibrillation monitoring solution which has the potential to extend the observation duration while being convenient for patients.

Add new comment

To post a comment on this article, please log in to your account. New users can create an account.

Project Creator
rami hariri

researcher student at anglia ruskin university
Research area: AI

Related Articles

Submitted on

Actions

Log-in to Join the Team