Projects

Title	Updated date	Comment count
FPGA-Powered Acceleration for NLP Tasks	8 months 2 weeks ago	14

Articles

Interests

Design Flow

Technology

Authored Comments

Subject	Comment	Link to Comment
Tiny-Trans	Team Members: Abhishek Yadav (yadav.49@iitj.ac.in) Ayush Dixit (m23eev006@iitj.ac.in) Binod Kumar (binod@iitj.ac.in)	view
Related to project	Prototype with Zynq: We'll start by prototyping the IP on the Zynq MPSoC FPGA, integrating the HLS-generated IP with the Zynq and other IP. PYNQ Overlay: Create a PYNQ overlay for easy design space exploration and benchmarking. Application Testing: Run the application to evaluate performance and make necessary adjustments. This approach will allow us to iterate quickly and gather critical insights before moving to the physical design process. At present the model size is quite large (in MBs), so we are trying to reduce it using some techniques and would like to know what the ideal model size should be. Would we be good to go, given that there would be some trade-offs in accuracy?	view
FPGA Prototyping	Prototype with Zynq: We'll start by prototyping the IP on the Zynq MPSoC FPGA, integrating the HLS-generated IP with the Zynq and other IP. PYNQ Overlay: Create a PYNQ overlay for easy design space exploration and benchmarking. Application Testing: Run the application to evaluate performance and make necessary adjustments. This approach will allow us to iterate quickly and gather critical insights before moving to the physical design process. At present the model size is quite large (in MBs), so we are trying to reduce it using some techniques and would like to know what the ideal model size should be. Would we be good to go, given that there would be some trade-offs in accuracy?	view
Query regarding interfacing Arm SoC with accelerator	I have been working with Xilinx PYNQ and the Zynq-7000 IP to implement an accelerator using memory-mapped AXI interfaces between the PS and PL parts of the SoC. I would like to understand how to interface my accelerator with the SoC architecture in this new setup. Specifically: 1) How can I replicate the memory-mapped AXI communication approach that I used on the FPGA in an ASIC flow? 2) Which Arm SoC would be apt, and what is the procedure for interfacing the SoC with the custom accelerator? 3) What modifications are needed to efficiently integrate the accelerator with the SoC's memory and CPU subsystems? 4) Which SoC would be most suitable for integrating a custom accelerator, ensuring good support for AXI interfaces, data transfers, and power/performance optimizations? Additionally, I am familiar with HLS tools like HLS4ML and frameworks like TensorFlow for developing accelerators, but they don't support transformers right now. We are happy to schedule a video call at your convenience to discuss interfacing the Arm SoC with the accelerator.	view

User statistics

My contributions

My comments

Overall contributor

#24

Comments

Welcome

Hello,

It would be good to understand what is of interest to you in SoC Labs. We look forward to hearing from you. You can simply reply to this comment to let us know.

John.

Memory size

You asked about the model size in MB and the ideal size to target, weighed against the trade-off in accuracy.

On-chip SRAM is a limiting factor in SoC design because SRAM area is expensive. While the hierarchical memory system for classical compute has been optimised, from the off-chip DRAM all the way through the cache levels, it has not been for custom acceleration. One approach we are using to reduce fabrication costs is chiplet-based SRAM dies, which can be added to a SoC from a stock of pre-fabricated dies rather than adding to the die cost of a custom accelerator.

Classical compute has caches in the low-MB range.
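
As a rough illustration of the sizing question, the sketch below estimates weight storage for a model at several bit widths and checks it against an on-chip SRAM budget. The parameter count and the SRAM budget are hypothetical placeholders, not figures from this project, and the estimate ignores activations, buffers, and any quantization metadata.

```python
def model_size_mb(num_params: int, bits_per_weight: int) -> float:
    """Weight storage in MB (1 MB = 2**20 bytes); weights only."""
    return num_params * bits_per_weight / 8 / 2**20


def fits_on_chip(num_params: int, bits_per_weight: int, sram_budget_mb: float) -> bool:
    """True if the weights alone fit within the given SRAM budget."""
    return model_size_mb(num_params, bits_per_weight) <= sram_budget_mb


if __name__ == "__main__":
    NUM_PARAMS = 4_000_000   # hypothetical parameter count
    SRAM_BUDGET_MB = 2.0     # hypothetical on-chip SRAM budget

    for bits in (32, 16, 8, 4):
        size = model_size_mb(NUM_PARAMS, bits)
        ok = fits_on_chip(NUM_PARAMS, bits, SRAM_BUDGET_MB)
        print(f"{bits:>2}-bit weights: {size:6.2f} MB, fits in budget: {ok}")
```

Halving the bit width halves the weight storage, so a calculation like this gives a first-order target before measuring how much accuracy each precision actually costs.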

Publishing your updates to the project

Hi,

Once you are happy with the changes you have made to your project, don't forget to change the save state from Drafts to Editorial so that it is submitted for publishing on the site. You currently have changes in Draft.
