Member for
3 years 5 months
Role
Community lead
Points
3641
SoC Labs Roles
Contributor, Moderator

Projects

Articles

Interests

Design Flow

Technology

Authored Comments

Subject Comment Link to Comment
Verification Methodology

Hi,

I just added a comment on today's call:

High Capacity Memory Subsystem Development | SoC Labs

It notes: 'A discussion on the verification strategy occurred and we agreed that the second meeting in December would be dedicated to verification'.

Do you think you can get involved with this and help determine the verification planning for the project?

We look forward to hearing from you.

view
Sizing potential system design to natural language models

It was good to have the call today and discuss the project. It was helpful to go through the specification for the project, as well as progress on the development of your model and the activities to reduce the size of the model for inference on the edge. As we discussed, small microprocessor SoC reference designs such as nanoSoC are perhaps capable of keyword detection tasks, while a mid-range SoC reference design using a specific acceleration subsystem can handle more demanding speech recognition tasks. Hopefully this diagram from the ML Developers Guide for Cortex-M Processors and Ethos-U NPU is helpful in visualising the types of ML applications.

 

Arm ML processor portfolio

Copyright © 1995-2024 Arm Limited (or its affiliates). All rights reserved.

As we discussed, it is not only the computational intensity of the specific ML application that matters, but also the size of the model and the movement of the data representing the model within the system, which places demands on both the on-chip (System on Chip) memory and the external off-chip memory.
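As a rough illustration of the sizing point, the sketch below estimates the storage needed for a model's weights at different precisions and checks it against a memory budget. The parameter count and the memory sizes are hypothetical numbers chosen only for the example, not figures for nanoSoC or any other reference design, and activations and code size are ignored.

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    """Hypothetical memory sizes for a target SoC, in bytes."""
    on_chip_sram: int
    off_chip: int


def model_bytes(num_parameters: int, bits_per_weight: int) -> int:
    """Storage needed for the weights alone, rounded up to whole bytes."""
    return (num_parameters * bits_per_weight + 7) // 8


def placement(num_parameters: int, bits_per_weight: int, budget: MemoryBudget) -> str:
    """Say where the weights could live given the budget."""
    size = model_bytes(num_parameters, bits_per_weight)
    if size <= budget.on_chip_sram:
        return "on-chip"
    if size <= budget.off_chip:
        return "off-chip"
    return "does not fit"


# Illustrative numbers only: 512 KiB of on-chip SRAM, 8 MiB off-chip.
budget = MemoryBudget(on_chip_sram=512 * 1024, off_chip=8 * 1024 * 1024)
NUM_PARAMETERS = 250_000  # hypothetical model size

for bits in (32, 8):
    size = model_bytes(NUM_PARAMETERS, bits)
    where = placement(NUM_PARAMETERS, bits, budget)
    print(f"{bits}-bit weights: {size} bytes -> {where}")
```

With these assumed numbers, quantising from 32-bit to 8-bit weights is what moves the model from off-chip to on-chip memory, which is the kind of trade-off the size reduction work is aiming at.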

view
Larger language models

We also discussed the option of deploying the model on the Corstone-300 Fixed Virtual Platform, which allows the model's compute to be split between the Ethos-U NPU and the M-class CPU in an optimal way.

Diagram showing how Vela optimisation uses the Ethos-U55 to offload MAC computations

 

LiteRT is the new name for TensorFlow Lite (TFLite), a portable runtime for executing models from a number of AI/ML frameworks, including TensorFlow. Without optimisation the runtime would use portable reference C kernels. Arm provides kernels optimised for its processors in the CMSIS-NN library. Some operations can be highly optimised by executing them within the Ethos-U accelerator. Vela takes a portable model and changes it so that the runtime is asked to use the most efficient Arm compute infrastructure.
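The fallback order described above can be pictured with a small sketch. This is only a toy model of the idea, not how Vela or the LiteRT runtime is implemented; the operator sets below are hypothetical and chosen for illustration, and the real supported-operator lists should be taken from the Vela and CMSIS-NN documentation.

```python
# Hypothetical operator support sets, for illustration only.
NPU_OPS = {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED"}
CMSIS_NN_OPS = NPU_OPS | {"SOFTMAX"}


def choose_backend(op: str) -> str:
    """Pick the most specialised backend assumed to support the operator."""
    if op in NPU_OPS:
        return "Ethos-U"
    if op in CMSIS_NN_OPS:
        return "CMSIS-NN"
    return "reference C kernel"


def plan(model_ops):
    """Map every operator in the model to a backend, in model order."""
    return [(op, choose_backend(op)) for op in model_ops]


# A made-up three-operator model.
example_model = ["CONV_2D", "SOFTMAX", "CUSTOM_OP"]
for op, backend in plan(example_model):
    print(f"{op:12s} -> {backend}")
```

The more of a model's operators that land in the first set, the more of the MAC-heavy work is offloaded from the CPU.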

A baseline might be established by taking your model and seeing if it can be effectively executed on the M55/U55 combination using a Fixed Virtual Platform as a simulation environment. 

view
Generating ASIC implementation from HLS4ML

This interesting Siemens EDA presentation on Catapult + HLS4ML for Inference at the Edge clearly shows why generating the right intermediate code is important for an efficient ASIC implementation.

Example of generated code that does not allow loop unroll for efficient hardware implementation

In this presentation David Burnette illustrates a hardware design pattern, Sliding Window, and shows how generating the appropriate code from the HLS4ML environment allows the pattern to be used to make an efficient implementation.

Example of generated code that does allow loop unroll for efficient hardware implementation
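
To give a feel for the Sliding Window pattern, here is a minimal sketch of a 1D convolution written two ways. It is not the code from the presentation, and Python is used only to model the behaviour; a real HLS flow would express this in C++. The function names and the window size are assumptions made for the example.

```python
from collections import deque

TAPS = 3  # fixed window size, so the inner loop has a constant trip count


def conv_naive(samples, weights):
    """Re-reads TAPS input samples from the array for every output."""
    return [
        sum(weights[k] * samples[i + k] for k in range(TAPS))
        for i in range(len(samples) - TAPS + 1)
    ]


def conv_sliding_window(samples, weights):
    """Reads each input sample once and keeps the last TAPS in a small window."""
    window = deque(maxlen=TAPS)  # models a shift register
    out = []
    for s in samples:            # one new sample per iteration
        window.append(s)         # the oldest sample drops out automatically
        if len(window) == TAPS:
            # Constant-bound loop over a small local buffer: the kind of
            # loop an HLS tool can fully unroll into parallel multipliers.
            out.append(sum(weights[k] * window[k] for k in range(TAPS)))
    return out


samples = [1, 2, 3, 4, 5]
weights = [1, 0, -1]
print(conv_naive(samples, weights))
print(conv_sliding_window(samples, weights))
```

Both functions compute the same result; the difference is that the second touches each input once and works from a small fixed-size buffer, which maps naturally onto registers in hardware.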
view
Update as of end of November

Perhaps a Milestone 14 could be added? Now that Europractice have set out their schedule for 2025 fabrication shuttles, the 16th April mini@sic shuttle looks a good candidate for a nanoSoC version 2 tape-out.

Europractice mini@sic shuttle schedule as of November 2024

 

Is there a Milestone for the implementation of the hyperbolic tangent and sigmoid functions? When will this be complete?

view
Data transfers and Firmware

Looking at this diagram from the ML Developers Guide for Cortex-M Processors and Ethos-U NPU, I suspect your application is in the vibration detection class. As you are planning on using DMA, the M0 processor will handle the data movement through your custom accelerator.

Diagram from Arm showing the various M-class processors and the ML applications for which they are best suited

Copyright © 1995-2024 Arm Limited (or its affiliates). All rights reserved.

I recently updated the Interest on Firmware.  I suspect you will need to develop a device driver for your custom accelerator to handle the interactions and interrupts.
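
As a sketch of what such a driver has to coordinate, the toy model below programs a transfer, starts the accelerator and reacts to a completion interrupt. Everything here is hypothetical: the register names (SRC, LEN, CTRL, STATUS), the bit meanings and the classes are invented for illustration, and on the M0 the real driver would be C code writing memory-mapped registers with an interrupt service routine.

```python
class FakeAccelerator:
    """Stand-in for the custom accelerator; register names are hypothetical."""

    def __init__(self):
        self.regs = {"SRC": 0, "LEN": 0, "CTRL": 0, "STATUS": 0}
        self.irq_handler = None

    def write(self, name, value):
        self.regs[name] = value
        if name == "CTRL" and value & 1:   # assumed: bit 0 starts the job
            self.regs["STATUS"] = 1        # assumed: bit 0 means done
            if self.irq_handler:
                self.irq_handler()         # raise the completion interrupt

    def read(self, name):
        return self.regs[name]


class AcceleratorDriver:
    """Minimal driver: program registers, start, handle the interrupt."""

    def __init__(self, device):
        self.device = device
        self.done = False
        device.irq_handler = self._on_interrupt

    def start_transfer(self, src_addr, length):
        self.done = False
        self.device.write("SRC", src_addr)
        self.device.write("LEN", length)
        self.device.write("CTRL", 1)       # kick off the job

    def _on_interrupt(self):
        self.device.write("STATUS", 0)     # acknowledge the interrupt
        self.done = True


device = FakeAccelerator()
driver = AcceleratorDriver(device)
driver.start_transfer(src_addr=0x2000_0000, length=64)
print("transfer complete:", driver.done)
```

In this model the interrupt fires immediately; on real hardware the processor would be free to do other work, or sleep, between starting the transfer and the interrupt arriving.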

view

User statistics

My contributions: 861
My comments: 306
Overall contributor: #1
2024 contributor: #1
December 2024 contributor: #1
