Collaborative

Active Project

AHB eXecute in Place (XiP) QSPI

The instruction memory in the first tape out of nanosoc was implemented using SRAM. The benefit was very fast read bandwidth from this memory; the downside was that all the code was lost on every power-on reset, because SRAM is volatile. Using non-volatile memory instead would benefit applications where the deployment of the ASIC does not allow the SRAM to be programmed after every power-up, or where there is simply no time to do so.

Non-volatile memory ("NVM") comes in different forms, but for microcontrollers the most typical type of NVM is flash. In industrial scale tape outs, companies may opt for on-chip flash, as the area cost of adding it is typically recovered by selling large volumes of chips. For academic tape outs, the area needed for flash can be costly. The alternative is off-chip flash, which falls into two categories: parallel and serial. As expected, parallel flash gives higher bandwidth at the cost of extra pins, while serial flash offers lower bandwidth with significantly fewer pins.

For nanosoc and other small scale SoCs, it makes sense to opt for serial flash, as the ASIC implementation of nanosoc has previously been I/O constrained (i.e. the die is small, which limits how many pins can fit along the die edge). For serial flash, SPI is the most common interface, and it is often extended to dual, quad or octal SPI by adding extra data I/O pins. This project has opted for a QSPI implementation, as this provides a good trade-off between bandwidth and number of pins. The project may later extend the IP to support dual and octal SPI.
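To put rough numbers on the bandwidth versus pin-count trade-off, the sketch below tabulates the signal count and peak raw data rate for each SPI width. The 50 MHz clock is an assumed example value rather than a nanosoc figure, and the signal counts assume one chip select, one clock, single data rate and no data strobe.

```python
# Illustrative only: the clock frequency is an assumed example value.
SCLK_HZ = 50_000_000

# Interface name -> number of data lines used for a read
WIDTHS = {"SPI": 1, "Dual SPI": 2, "QSPI": 4, "Octal SPI": 8}


def signal_count(data_lines: int) -> int:
    """Chip select + clock + data pins (single SPI still needs MOSI and MISO)."""
    return 2 + max(data_lines, 2)


def peak_bytes_per_second(data_lines: int, sclk_hz: int = SCLK_HZ) -> float:
    """Peak raw rate at single data rate: one bit per data line per clock."""
    return data_lines * sclk_hz / 8


for name, lanes in WIDTHS.items():
    rate_mb = peak_bytes_per_second(lanes) / 1e6
    print(f"{name:10s} {signal_count(lanes):2d} signals  {rate_mb:6.2f} MB/s peak")
```

These are peak figures for the data phase only; command, address and dummy cycles reduce the effective rate.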

Many flash chips implement eXecute in Place (XiP). With this feature, consecutive reads can omit the command byte and send only the address, which slightly increases the bandwidth of flash accesses. For XiP it also makes sense to use a fully memory-mapped interface with an associated cache.
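As a rough illustration of how much omitting the command byte saves, the sketch below counts clock cycles for a single 128-bit read. The 24-bit address, 8 mode bits and 4 dummy cycles are assumed example values (real parts differ), and `cmd_lanes` is a hypothetical parameter covering flashes that send the command on one line rather than four.

```python
def read_cycles(data_bits=128, lanes=4, addr_bits=24, mode_bits=8,
                dummy_cycles=4, send_command=True, cmd_lanes=4):
    """Clock cycles for one read; all field widths are assumed example values."""
    cmd = 8 // cmd_lanes if send_command else 0
    return (cmd
            + addr_bits // lanes   # address phase
            + mode_bits // lanes   # mode bits phase
            + dummy_cycles         # dummy cycles
            + data_bits // lanes)  # data phase


normal = read_cycles(send_command=True)
xip = read_cycles(send_command=False)
print(f"with command: {normal} cycles, XiP: {xip} cycles")
print(f"saving: {100 * (normal - xip) / normal:.1f}%")
```

Under these assumptions the saving is only a few cycles per read, which is consistent with the bandwidth gain being slight.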

The code for this project can be found on our Git here.

Architectural Design

The fundamental design for the architecture is as below:

CG092 Flash Cache

The CG092 flash cache is a cache provided by Arm. It is instantiated between the bus interconnect and the flash controller to support caching. It has been optimised for fetching and caching instructions for M-class processors (particularly M3 and M4). The cache controller has a 32-bit AHB-lite subordinate that connects to the SoC bus, and a 128-bit AHB-lite manager that is connected to the "AHB to QSPI control block". The CG092 also requires an APB port for configuration of the cache controller.

APB Mux

A simple APB mux from the Corstone 101. This is used to combine the CG092 APB interface and the internal APB register interface for the QSPI controller.

APB Regs

This is used to configure the QSPI controller, and also to send configuration over the QSPI interface to the flash. This block is responsible for setting the clock frequency of the QSPI interface, selecting the mode (SPI or QSPI), enabling XiP mode, and setting some parameters of the AHB to QSPI control block. This is also the only interface that can write through to the flash over QSPI (as writing is more complex than reading).

AHB to QSPI control

This takes an AHB transaction as input and converts it to the QSPI control signals used to control the QSPI controller. This block can only read from the QSPI flash, and it will respond with a bus error if the SoC tries to write over this interface. It will also respond with an error if this interface is used to read over the QSPI whilst XiP mode is inactive.
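The error rules above can be captured in a small behavioural model. This is only a sketch of the described behaviour, not the RTL; the names `Resp` and `ahb_response` are hypothetical.

```python
from enum import Enum


class Resp(Enum):
    OKAY = 0
    ERROR = 1


def ahb_response(is_write: bool, xip_active: bool) -> Resp:
    """Model of the response rules described for the AHB to QSPI control block."""
    if is_write:
        return Resp.ERROR   # writes are not supported over the AHB interface
    if not xip_active:
        return Resp.ERROR   # reads are only served while XiP mode is active
    return Resp.OKAY


for is_write in (False, True):
    for xip_active in (False, True):
        print(f"write={is_write!s:5} xip={xip_active!s:5} -> "
              f"{ahb_response(is_write, xip_active).name}")
```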

QSPI Control Mux

Passes the QSPI control signals from either the APB controller or the AHB controller. The selection is made by the XiP mode: if XiP mode is active the AHB interface is selected, otherwise the APB interface is selected.
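In behavioural terms the mux is a two-way select on the XiP mode. A minimal sketch follows; the function name and the dictionary fields standing in for the control signals are placeholders, not the real signal names.

```python
def qspi_control_mux(xip_active: bool, ahb_ctrl: dict, apb_ctrl: dict) -> dict:
    """Forward the AHB-side controls when XiP is active, otherwise the APB-side ones."""
    return ahb_ctrl if xip_active else apb_ctrl


# Placeholder control bundles; the real design uses its own set of signals.
ahb_ctrl = {"source": "AHB", "addr": 0x1000}
apb_ctrl = {"source": "APB", "addr": 0x0000}

print(qspi_control_mux(True, ahb_ctrl, apb_ctrl)["source"])
print(qspi_control_mux(False, ahb_ctrl, apb_ctrl)["source"])
```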

QSPI Controller

Main body of the AHB QSPI IP. This takes the QSPI control instructions and converts them to QSPI instructions. This is implemented with a state machine with the states: IDLE, NO_FETCH, OP, ADDR, MODE, DUMMY, DATA_O, DATA_I.
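The sketch below models how a transaction might walk through these states. The state names are taken from the list above, but the ordering and the idea that unused phases are simply skipped are assumptions for illustration; the actual transitions in the RTL may differ. `phase_sequence` and its parameters are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    NO_FETCH = auto()
    OP = auto()
    ADDR = auto()
    MODE = auto()
    DUMMY = auto()
    DATA_O = auto()
    DATA_I = auto()


def phase_sequence(op=True, addr=False, mode=False, dummy=False,
                   data_out=False, data_in=False, buffer_hit=False):
    """Return the assumed order of states visited between leaving and re-entering IDLE."""
    if buffer_hit:
        return [State.NO_FETCH]  # data is served without a QSPI transaction
    phases = [(op, State.OP), (addr, State.ADDR), (mode, State.MODE),
              (dummy, State.DUMMY), (data_out, State.DATA_O), (data_in, State.DATA_I)]
    return [state for used, state in phases if used]


# Opcode-only command
print([s.name for s in phase_sequence(op=True)])
# Read of a register: opcode followed by data in
print([s.name for s in phase_sequence(op=True, data_in=True)])
# Read with address, mode and dummy phases
print([s.name for s in phase_sequence(op=True, addr=True, mode=True,
                                      dummy=True, data_in=True)])
```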

The QSPI controller has its own line buffer. This is because the AHB interface can only issue transaction requests smaller than 128 bits, which would be wasteful to fetch individually over QSPI. Instead, the QSPI controller always fetches 128 bits when in XiP mode, and if the requested address matches the 128-bit line address held in the internal line buffer (i.e. after masking the least significant 4 bits) it does not issue a QSPI transaction (i.e. NO_FETCH).
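The line buffer check amounts to comparing addresses with the 4 least significant bits masked off. A minimal behavioural sketch, assuming a single 16-byte line and using the hypothetical class name `LineBuffer`:

```python
LINE_MASK = ~0xF  # clears the 4 least significant bits (16-byte / 128-bit line)


class LineBuffer:
    """Single-line buffer model: remembers the address of the last fetched line."""

    def __init__(self):
        self.line_addr = None  # nothing buffered yet
        self.fetches = 0       # number of QSPI transactions issued

    def access(self, addr: int) -> str:
        line = addr & LINE_MASK
        if line == self.line_addr:
            return "NO_FETCH"  # served from the buffer, no QSPI transaction
        self.line_addr = line
        self.fetches += 1
        return "FETCH"         # a full 128-bit line is fetched over QSPI


buf = LineBuffer()
for addr in (0x100, 0x104, 0x10C, 0x110):
    print(f"0x{addr:03X}: {buf.access(addr)}")
```

Sequential 32-bit reads within the same 16-byte line therefore cost one QSPI transaction rather than four.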

Verification

For the initial verification of the subsystem, the simulation environment was set up using cocotb, with the AHB extensions used to drive the AHB and APB ports of this design. The initial coverage results for the tests developed are shown below.

The average coverage from this is 76.73% (ignoring the Arm IP coverage and the sst26vf064b flash model). After examining the coverage report, some extra tests were added to the cocotb verification. These extra tests covered:

FSM transitions in u_qspi_controller
Toggle of bits like address, registers
Tests to read uncovered branches

These additional tests were executed on a revised version of the sub-system with some tidying up of the implementation, particularly for registers that were too large (such as the clock divider register, which was reduced from 8 bits to 5).

Coverage report after some additional tests

Coverage has so far been improved to 97.51%, with 100% coverage of the FSM in u_qspi_controller. The tests are still passing, and assertions are used to check that the design remains functionally correct.

FPGA Implementation

For the FPGA implementation, a Pynq Z2 board was used with a PMOD SF3. This allowed for simple connection to the QSPI flash. In this case a Micron MT25QL256ABA part was used, so care had to be taken to ensure that the correct commands for this part were sent.

Additional wrappers were added because the PS of the Zynq board is natively AXI, so a bridge from AXI to APB and AHB was required, as shown below.

To avoid affecting the timing of the FPGA design, an external logic analyzer was used. Some of the verified behaviour captured by the logic analyzer is shown below. Firstly, a simple opcode transaction (0x35), which puts the QSPI flash into Quad I/O mode.

Then a QSPI read ID register transaction (0xAF). This shows that both the OP state and the DATA_I state are working correctly.

Then a fast read command (0x0B). This was performed after writing to the flash, so it tests the OP, ADDR, MODE, DUMMY, DATA_I and DATA_O states of the QSPI controller.

And lastly, an XiP read over the AHB interface shows that the AHB controller is working as expected.

SoC Integration

Another verification test was to establish whether a SoC design could boot from the QSPI flash. For simplicity, nanosoc is used here. To integrate the IP into nanosoc, first the SRAM instruction memory had to be removed and replaced with an instance of the QSPI controller. Secondly, the APB subsystem had to be edited to allow for control of the QSPI controller. Lastly, top level pads/pins for the QSPI flash were added to the nanosoc pad ring.

In behavioural verification the code is preloaded on the QSPI, and this works as expected. For the FPGA verification, the code has to be first written to the flash before it can boot.

The first method for programming the flash on the FPGA is to use the ADP controller. This is similar to how the existing nanosoc device is programmed, which is to write directly to the SRAM. With the QSPI flash, however, writing first has to be enabled, then data is written from the controller buffer to the flash (currently only 16 bytes at a time), and finally the status register is polled until the flash has finished the write. Using the pynq environment of the Pynq Z2 board, this looks like the following:

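import os
from time import time

# `adp` and the QPI_* helper functions are assumed to be defined earlier
# in the pynq session; they are not shown here.

file_stats = os.stat('hello.hex')
# Each byte is stored as two hex digits plus a newline, i.e. 3 characters per byte
file_len_in_bytes = round(file_stats.st_size / 3)
print(f'file size in bytes is {file_len_in_bytes}')
base_addr = 0x0000
# Number of 16-byte chunks, rounded up so a trailing partial chunk is not dropped
count = -(-file_len_in_bytes // 16)


def read_byte(file):
    # Assumption: bytes past the end of the file are padded with 0xFF,
    # the value of erased flash
    return file.readline().strip() or 'ff'


start = time()
with open('hello.hex', mode='r') as file:
    for i in range(count):
        data = []
        # Build four little-endian 32-bit words (16 bytes) per page program
        for j in range(4):
            a = read_byte(file)
            b = read_byte(file)
            c = read_byte(file)
            d = read_byte(file)
            tmp = d + c + b + a
            data.append(int(tmp, 16))
        addr = base_addr + i * 16
        print(data[0])
        QPI_WRITE_ENABLE(adp)                    # writing must be enabled before each program
        QPI_PAGE_PROGRAM_128(adp, addr, data)    # write 128 bits from the controller buffer
        while QPI_READ_STAT_REG(adp):            # poll until the flash has finished the write
            pass

end = time()
length = end - start
print(f"Programming took {length} seconds")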
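The word-packing step in the loop above can be checked in isolation. The sketch below assumes, as the size calculation in the script implies, that the hex file holds one byte per line as two hex digits; `pack_words` is a hypothetical helper and is not part of the project code.

```python
def pack_words(byte_lines):
    """Pack 16 two-digit hex strings into four little-endian 32-bit words."""
    if len(byte_lines) != 16:
        raise ValueError("expected exactly 16 bytes")
    words = []
    for k in range(0, 16, 4):
        a, b, c, d = byte_lines[k:k + 4]
        # The first byte read becomes the least significant byte of the word
        words.append(int(d + c + b + a, 16))
    return words


# Example chunk: bytes 0x00 .. 0x0f in file order
chunk = [f"{value:02x}" for value in range(16)]
print([f"0x{word:08x}" for word in pack_words(chunk)])
```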

It has been verified that the test code runs as expected and that the "Hello World" and "Test Passed" messages are generated. Below is the QSPI trace for running the hello world program.

Project Milestones

Architectural Design

Design Flow

Architectural Design

Target Date

June 12, 2025

Completed Date

June 12, 2025

High level architecture of the AHB QSPI

Result of Work

Done, image for the architecture added to page above

Getting Started

Design Flow

Getting Started

Target Date

June 12, 2025

Completed Date

June 12, 2025

Set up the environment for the AHB QSPI IP

Result of Work

Environment set up with the Arm IP, and simulation environment set up using the SoCtools git

IP Selection

Design Flow

IP Selection

Target Date

June 12, 2025

Completed Date

June 12, 2025

Result of Work

Arm IP used is the CG092 and some of the Corstone 101 for the bus infrastructure

Behavioural Design

Design Flow

Behavioural Design

Target Date

June 12, 2025

Completed Date

June 12, 2025

Take the architectural model and develop the behavioural model

Result of Work

HDL created for the IP

Simulation

Design Flow

Simulation

Target Date

June 12, 2025

Completed Date

June 12, 2025

Set up the simulation environment and run the initial verification

Result of Work

Completed simulation with no bugs. Initial verification coverage averages 76.73%

Logical verification

Design Flow

Logical verification

Target Date

June 12, 2025

Completed Date

June 12, 2025

Verify the design, functionally and with coverage

Result of Work

Design has been verified with coverage of 97.5%

Team

Name

Daniel Newbrook

Research Area

IoT Devices

Role

Digital Design Engineer

Comments

Comparison of Non-Volatile memory

If anyone is interested in the relative merits of the different types of NVM then this article by Tim Daulby might be helpful.

Project Creator

Daniel Newbrook

Digital Design Engineer at University of Southampton
Research area: IoT Devices
ORCID Profile

Technology

Corstone 101 for m0/m3

SPI/QSPI

Interests

Hardware design

Design Flow

Architectural Design

Behavioural Design

FPGA SoC Prototyping design flows

Specifying your IP

Submitted on Fri, 21/02/2025 - 13:44
