Reference Design
Active Project
soclabs nanosoc microcontroller framework - 2024
soclabs

nanosoc - baseline Cortex-M0 microcontroller SoC (2024 update)

This second generation (2024 update) of the nanoSoC reference design started development in 2024Q4. It upgrades the initial nanoSoC with additional off-chip communications interface options to better support both console (STDIN/STDOUT plus ADP, the ASCII Debug Protocol) and application-dependent 8-bit data I/O streams for improved data transfer on and off chip. The interrupt vectors have been modified to efficiently support the multiple UART-compatible channel interfaces used for these command and data channels.

Rationale

The nanoSoC reference design provides a baseline microcontroller System on Chip design that supports design, implementation, verification and evaluation of academically developed research hardware such as custom accelerators or signal processing subsystems. The reference design is already silicon proven and allows a seamless transition from FPGA to physical silicon implementation. nanoSoC also provides a project development environment that includes the pre-verified SoC system IP (with diagnostic functionality), software, and validation test benches that may be easily adapted and extended as required.

The design is based upon the Arm Cortex-M System Design Kit (CMSDK), allowing reuse of the Arm Academic Access (AAA) pre-verified IP, documentation and software. This has been extended to support simple 'bolting-on' of memory-mapped experimental academic hardware with an appropriate test bench development environment.

The first generation nanoSoC implemented a bit-serial FTDI "FT1248" controller as the off-chip interface to an external host operating on the nanoSoC Test/development Board. This second generation of nanoSoC has an alternative interface to accommodate higher SoC clock frequencies, and adds support for independent bulk data transfer on and off chip to better support data movement through the academically developed research hardware subsystems.

Technical overview

nanoSoC is a Cortex-M0 based microcontroller SoC design with complete pad-ring support, ready for silicon implementation. It has internal address space, plus control and diagnostic support, for integrating custom subsystems or research components:

  • CPU - small Arm Cortex-M0 processor with Serial-Wire Debug integrated support
  • Boot Monitor - Synthesized ROM bootstrap for MCU
  • Code-SRAM bank (configurable size bank of memory primarily for downloaded test programs)
  • Data-SRAM bank (configurable size bank of memory primarily for test program data, stack and heap)
  • System peripherals (serial communications, General Purpose IO - GPIO, system counter timers and clocks)
  • Memory-mapped expansion space
  • Optional support for 1 or 2 Direct Memory Access (DMA) controllers
  • Two banks of DMA-accessible SRAM buffer space for concurrent expansion space usage
  • ASCII Debug Protocol agent, ADP, with clock independent host interface

[Figure: Basic block diagram of nanosoc 'chip' and 'pad-ring' functionality, supporting the hosting of research components or subsystems]

Getting started

SoC Labs provides an implementation framework to support adding an application-specific hardware accelerator into a SoC based on the nanoSoC microcontroller. The nanoSoC is outlined below, but to get started quickly you can access the framework resources and review an example hardware accelerator implementation.

Generic accelerator framework

 See the git resources and README file for:

soclabs/accelerator-project

https://git.soton.ac.uk/soclabs/accelerator-project

This instantiates a nanoSoC microcontroller subsystem, which is provided pre-validated in the nanosoc_tech sub-repository:

soclabs/nanosoc_tech

https://git.soton.ac.uk/soclabs/nanosoc_tech

The example AES128 hardware accelerator implementation 

As a concrete example, a cloned version of the generic accelerator-project is extended to add a memory-mapped AES encryption engine. nanoSoC supports multiple types of DMA engine to provide efficient data movement for any custom subsystem. This implementation uses an enhanced version of the Arm DMA230 AMBA-AHB Direct Memory Access controller, together with test programs written to test both software-driven (memcpy) and DMA-mapped (dma230) methods.

See the git resources and README documentation at: 

soclabs/aes-128-project

https://git.soton.ac.uk/soclabs/aes-128-project

nanosoc Architecture

Interconnect fabric

The simple single AMBA AHB bus design of the Arm CMSDK is upgraded in the nanoSoC reference design to a multi-layer AHB-lite matrix that supports up to 4 concurrent access paths to the primary memory and input/output components.

[Figure: nanosoc AHB bus matrix, supporting concurrent address map access with arbitration only when multiple initiators compete for a shared segment of address space]

More details of how this bus matrix is generated using the Arm Academic Access tools are given at https://soclabs.org/project/building-system-optimised-amba-interconnect.
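The arbitration rule stated above (arbitrate only when initiators compete for the same segment) can be illustrated with a small model. This is a sketch only. The initiator names (`CPU`, `ADP`, `DMA0`) and segment labels are illustrative assumptions, and the policy that decides which initiator wins is not modelled because it is not described here.

```python
from collections import defaultdict

def contended_segments(requests):
    """Return {segment: [initiators]} for segments requested by more than one initiator.

    `requests` maps a (hypothetical) initiator name to the address-map segment
    it wants to access in the same cycle. Segments with a single requester
    need no arbitration and are omitted from the result.
    """
    by_segment = defaultdict(list)
    for initiator, segment in requests.items():
        by_segment[segment].append(initiator)
    return {seg: sorted(who) for seg, who in by_segment.items() if len(who) > 1}

# Three initiators on three different segments: all proceed concurrently.
parallel = contended_segments(
    {"CPU": "Code RAM", "ADP": "System IO", "DMA0": "Expansion RAM (lo)"}
)

# CPU and ADP both target Data RAM: only that segment needs arbitration.
shared = contended_segments(
    {"CPU": "Data RAM", "ADP": "Data RAM", "DMA0": "Expansion RAM (lo)"}
)
```

In the first case the result is empty, which corresponds to fully concurrent access. In the second case only the shared segment is reported.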

Design and validation testbench

The testbench (tb_nanosoc) provides:

  • system clocking and initialisation
  • hardware debug communications port which supports serial communications and ASCII Debug Protocol (ADP) agent control and diagnostics
  • an Arm Serial Wire Debug controller model for validating software debugger connection and functionality
  • an Arm CPU trace model that replicates internal processor state and allows simulated instruction and data trace. This is done for both RTL and gate-level netlist simulation verification to make the verification at each stage of the design process similar, enabling efficient iterations for a design.

[Figure: nanosoc simulation testbench architecture and functionality, with nanosoc_chip_pads "socket"]

FPGA prototyping platform

Two example FPGA targets for Xilinx® Vivado® have been provided to date that support hardware prototyping and verification of the nanosoc functionality:

  • Xilinx® FPGA platform target with a wrapper layer that provides the mapping from nanosoc chip-level ports (inside the pad-ring) to the FPGA pads as well as providing the target clock and reset control from the board-specific peripherals. This supports board-level evaluation and debug at the desk, usually with a USB-connected JTAG interface.
  • Xilinx PYNQ® platform target that supports fully networked validation and can be used as a shareable development resource. This uses the integrated Zynq® Arm Cortex-A processor subsystem to provide the Linux OS, network stack and Python environment with Jupyter notebook test code. The target example is the Xilinx ZCU104 evaluation board, which has first-class PYNQ software support.

The baseline FPGA target simply requires programmable logic and memory block resources, and would normally be connected by USB cable direct to the host development system:

[Figure: FPGA prototyping architecture for the baseline FPGA board target - FPGA 'wrapper' that instantiates the nanosoc_chip level of hierarchy]

The Xilinx Zynq 'PYNQ' platform system development target uses the Programmable Logic (PL) resources to implement the nanosoc design, and the Processing System (PS) integrated Zynq-Arm subsystem to run the PYNQ software environment over an Ethernet network connection. This allows software test and verification to be carried out remotely from a web browser:

[Figure: FPGA prototyping architecture for the PYNQ-enabled FPGA board target - FPGA 'wrapper' with PYNQ host platform support for networked development]

Address Map

The address map is kept closely compatible with the Arm CMSDK to allow reuse of the documentation and the example test programs as a starting point. The bus matrix fabric supports additional expansion memory banks and a large uncommitted address-mapped region for experimental subsystem interfacing, sufficient to configure and control the subsystem as well as to source and sink workload data to and from memory.

nanosoc address map

| start-address | end-address | region | notes |
|---|---|---|---|
| 0xF0000000 | 0xF0003FFF | System table ROM | CPU/DBG config |
| 0xA0000000 | 0xDFFFFFFF | Expansion IO space | Experimental IO |
| 0x90000000 | 0x9FFFFFFF | Expansion RAM (hi) | (DMA memory buffers) |
| 0x80000000 | 0x8FFFFFFF | Expansion RAM (lo) | (DMA memory buffers) |
| 0x60000000 | 0x7FFFFFFF | Expansion IO space | Experimental IO |
| 0x40000000 | 0x4FFFFFFF | System IO | (CPU MCU peripherals) |
| 0x30000000 | 0x3FFFFFFF | Data memory (RAM) | (CPU heap/stack) |
| 0x20000000 | 0x2FFFFFFF | Code memory (RAM) | (CPU execution memory) |
| 0x10000000 | 0x1FFFFFFF | Bootstrap ROM | synthesized, mapped to 0 |
| 0x00000000 | 0x0FFFFFFF | Vectors, run-time code | Boot ROM -> Code RAM (remapped by boot monitor) |

This address map is fully visible to the CPU software environment and the ADP hardware debug agent.

The optional 1 or 2 DMA controller(s) do not have the same full visibility.
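As an illustration of how the address map above can be used from host-side test code, the following sketch classifies an address by region. The region boundaries are transcribed from the table. The function name `region_of` is hypothetical, and addresses the table does not list are reported as `None` rather than guessed.

```python
# (start, end, region) transcribed from the nanosoc address map table.
REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Vectors, run-time code"),
    (0x10000000, 0x1FFFFFFF, "Bootstrap ROM"),
    (0x20000000, 0x2FFFFFFF, "Code memory (RAM)"),
    (0x30000000, 0x3FFFFFFF, "Data memory (RAM)"),
    (0x40000000, 0x4FFFFFFF, "System IO"),
    (0x60000000, 0x7FFFFFFF, "Expansion IO space"),
    (0x80000000, 0x8FFFFFFF, "Expansion RAM (lo)"),
    (0x90000000, 0x9FFFFFFF, "Expansion RAM (hi)"),
    (0xA0000000, 0xDFFFFFFF, "Expansion IO space"),
    (0xF0000000, 0xF0003FFF, "System table ROM"),
]

def region_of(addr):
    """Return the name of the region containing addr, or None if unlisted."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None

code_region = region_of(0x20000100)    # inside Code memory (RAM)
unlisted = region_of(0x50000000)       # not listed in the table
```

A check like this can be useful before issuing a debug read or write, so that a mistyped address is caught on the host rather than on the bus.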

For the system IO region:

nanosoc system IO address map

| start-address | end-address | notes |
|---|---|---|
| 0x40000000 | 0x40000FFF | Timer 0 |
| 0x40001000 | 0x40001FFF | Timer 1 |
| 0x40002000 | 0x40003FFF | Dual Timer |
| 0x40004000 | 0x40004FFF | USRT 0 |
| 0x40005000 | 0x40005FFF | USRT 1 |
| 0x40006000 | 0x40006FFF | UART 2 |
| 0x40008000 | 0x40008FFF | Watchdog Timer |
| 0x4000E000 | 0x4000EFFF | USRT 2 |
| 0x4000F000 | 0x4000FFFF | DMA 0 Base |
| 0x40010000 | 0x40010FFF | GPIO 0 |
| 0x40011000 | 0x40011FFF | GPIO 1 |
| 0x4001F000 | 0x4001FFFF | System Control |
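The system IO table can be expressed in the same way to resolve an address to a peripheral and a byte offset within it. This is a minimal sketch: the ranges are copied from the table as listed, `locate` is a hypothetical helper name, and no register layout inside any peripheral is assumed.

```python
# Peripheral name -> (start, end), transcribed from the system IO table.
PERIPHERALS = {
    "Timer 0":        (0x40000000, 0x40000FFF),
    "Timer 1":        (0x40001000, 0x40001FFF),
    "Dual Timer":     (0x40002000, 0x40003FFF),
    "USRT 0":         (0x40004000, 0x40004FFF),
    "USRT 1":         (0x40005000, 0x40005FFF),
    "UART 2":         (0x40006000, 0x40006FFF),
    "Watchdog Timer": (0x40008000, 0x40008FFF),
    "USRT 2":         (0x4000E000, 0x4000EFFF),
    "DMA 0 Base":     (0x4000F000, 0x4000FFFF),
    "GPIO 0":         (0x40010000, 0x40010FFF),
    "GPIO 1":         (0x40011000, 0x40011FFF),
    "System Control": (0x4001F000, 0x4001FFFF),
}

def locate(addr):
    """Return (peripheral name, byte offset from its base), or None if unmapped."""
    for name, (start, end) in PERIPHERALS.items():
        if start <= addr <= end:
            return name, addr - start
    return None

hit = locate(0x40004008)     # 8 bytes into the USRT 0 block
gap = locate(0x40007000)     # no peripheral listed at this address
```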

CPU Interrupts

| Interrupt Number | Interrupt Name | Interrupt Source |
|---|---|---|
| 0 | USRTRX0_IRQn | USRT 0 RX interrupt |
| 1 | USRTTX0_IRQn | USRT 0 TX interrupt |
| 2 | USRTRX1_IRQn | USRT 1 RX interrupt |
| 3 | USRTTX1_IRQn | USRT 1 TX interrupt |
| 4 | UARTRX2_IRQn | UART 2 RX interrupt |
| 5 | UARTTX2_IRQn | UART 2 TX interrupt |
| 6 | PORT0_ALL_IRQn | Combined interrupt for any pins on GPIO port 0 |
| 7 | PORT1_ALL_IRQn | Combined interrupt for any pins on GPIO port 1 |
| 8 | TIMER0_IRQn | Timer 0 interrupt |
| 9 | TIMER1_IRQn | Timer 1 interrupt |
| 10 | DUALTIMER_IRQn | Dual timer interrupt |
| 11 | EXP0_IRQn | From accelerator - definable by user |
| 12 | EXP1_IRQn | From accelerator - definable by user |
| 13 | EXP2_IRQn | From accelerator - definable by user |
| 14 | EXP3_IRQn | From accelerator - definable by user |
| 15 | DMA_IRQn | Interrupt from DMA; this is a combined interrupt, so it can be from any channel |
| 16-31 | PORT0_x_IRQn | I/O pins on Port 0, from pin x; each pin has its own dedicated interrupt |
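On a Cortex-M0 each interrupt number corresponds to one bit of the NVIC set-enable register (NVIC_ISER at 0xE000E100), so the table translates directly into bit masks. The sketch below computes such a mask on the host. It assumes that PORT0_x_IRQn for pin x is interrupt number 16 + x, which is the natural reading of the 16-31 row but is not stated explicitly. The helper names are hypothetical.

```python
# Interrupt numbers transcribed from the CPU interrupts table.
IRQ = {
    "USRTRX0_IRQn": 0, "USRTTX0_IRQn": 1,
    "USRTRX1_IRQn": 2, "USRTTX1_IRQn": 3,
    "UARTRX2_IRQn": 4, "UARTTX2_IRQn": 5,
    "PORT0_ALL_IRQn": 6, "PORT1_ALL_IRQn": 7,
    "TIMER0_IRQn": 8, "TIMER1_IRQn": 9, "DUALTIMER_IRQn": 10,
    "EXP0_IRQn": 11, "EXP1_IRQn": 12, "EXP2_IRQn": 13, "EXP3_IRQn": 14,
    "DMA_IRQn": 15,
}

def port0_pin_irq(pin):
    """Interrupt number for a GPIO port 0 pin (assumes pin x maps to 16 + x)."""
    if not 0 <= pin <= 15:
        raise ValueError("port 0 has pins 0..15")
    return 16 + pin

def enable_mask(irq_numbers):
    """OR together one bit per interrupt number, as written to NVIC_ISER."""
    mask = 0
    for n in irq_numbers:
        mask |= 1 << n
    return mask

# Enable the UART 2 receive interrupt and the combined DMA interrupt.
mask = enable_mask([IRQ["UARTRX2_IRQn"], IRQ["DMA_IRQn"]])
```

Here `mask` has bits 4 and 15 set. Target firmware would normally use the CMSIS `NVIC_EnableIRQ()` call rather than computing the mask by hand.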

Communications channel

nanosoc supports interfacing to an external testbench via an off-chip protocol.

See the original nanosoc project (2023; version 1) for the original 4-wire FTDI (Future Technology Devices International) "FT1248" serial interface, which allowed both FPGA instantiations and implemented hardware at board level to use a standard USB host communication port (FTDI FT232H chip or similar). That version also provided an additional two-pin Universal Asynchronous Receiver-Transmitter, "UART2" (from the Arm Cortex-M System Design Kit).

As well as a backwards-compatibility mode, the communications channel on the new nanoSoC (2024 update) now supports a full dual-ended handshake interface with an off-chip microcontroller or FPGA that implements 4 virtual channels:

EXTIO communications port

| Virtual Channel | Channel | Direction |
|---|---|---|
| CTX - 8-bit console STDOUT/ADP output | 0 | TX |
| CRX - 8-bit console STDIN/ADP command input | 0 | RX |
| DTX - 8-bit transmit data channel (application dependent) | 1 | TX |
| DRX - 8-bit receive data channel (application dependent) | 1 | RX |

The interface supports 4 AXI-Stream channels and uses a 7-wire packet bus protocol to manage robust data transfer.

The details of the off-chip interface are described on the SoC Labs "Host-IO" page.

The 7-pin interface is mapped onto the seven lowest pins, P1[6:0], of the GPIO Port-1 interface, with P1[7] selecting the interface mode:

Communications I/O mapping - and testbench verification support

| Port-1 pin name | EXTIO mode | (FT1248x1 mode) | EXTIO signal description |
|---|---|---|---|
| P1[0] | IORQ1_o | (FTMISO_i) | Gray-coded transfer REQuest 1 -> host |
| P1[1] | IORQ2_o | (FTCLK_o) | Gray-coded transfer REQuest 2 -> host |
| P1[2] | IOACK_i | (FTMIOSIO_io) | Transfer ACKnowledge <-- host (async) |
| P1[3] | IODATA0_io | (FTSSN_o) | CRX virtual channel status <-- host (async) / nibble data[0] transfer (sync) |
| P1[4] | IODATA1_io | (UART2RXD_i) | CTX virtual channel status <-- host (async) / nibble data[1] transfer (sync) |
| P1[5] | IODATA2_io | (UART2TXD_o) | DRX virtual channel status <-- host (async) / nibble data[2] transfer (sync) |
| P1[6] | IODATA3_io | (user GPIO) | DTX virtual channel status <-- host (async) / nibble data[3] transfer (sync) |
| P1[7] | CFG = 0 | (CFG = 1) | Tie low for EXTIO operation |

This provides nanoSoC with 4 robust handshaking byte communications channels. 
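The two transfer request lines, IORQ1 and IORQ2, are described as Gray-coded. The sketch below shows the general property that makes Gray coding attractive for a request that crosses an asynchronous boundary: only one line changes per step, so a receiver sampling at an arbitrary moment never sees a spurious intermediate code. It illustrates 2-bit Gray coding in general. What each request state means in the EXTIO protocol, and which bit drives which pin, are not assumed here.

```python
def gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_sequence(bits):
    """All Gray codes of the given width, in counting order."""
    return [gray(i) for i in range(1 << bits)]

def bits_changed(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

seq = gray_sequence(2)   # the four states available on a 2-wire request

# Every step, including the wrap from the last state back to the first,
# changes exactly one wire.
steps = [bits_changed(seq[i], seq[(i + 1) % len(seq)]) for i in range(len(seq))]
```

By contrast, a plain binary count steps from 01 to 10 by changing both wires at once, which an asynchronous receiver could momentarily sample as 00 or 11.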

The Console RX channel is routed to the microcontroller STDIN FIFO, but supports hardware state-machine parsing of input data to provide bare-metal physical memory debug access:

An ASCII 'ESC' (0x1B) escape character is interpreted by the on-chip ADP (ASCII Debug Protocol) agent as the code to enter the ADP hardware monitor mode, which is signalled by an ASCII ']' character output prompt. The host console can then control and debug the SoC address map directly, regardless of whether the CPU is running. This may be used to pre-initialise memory and system IO registers, and even to download code images to run on the CPU. The functionality of the ADP is described more fully at https://soclabs.org/project/hardware-soc-bus-level-debugger
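The monitor entry handshake described above (send ESC, wait for the ']' prompt) can be sketched on the host side as follows. This assumes only a port object with `write(data)` and `read(size)` methods, for example a pyserial `Serial` instance opened with a timeout. `enter_adp` and `FakePort` are hypothetical names, and no ADP commands are shown because their syntax is documented on the linked page rather than here.

```python
ESC = b"\x1b"     # ASCII escape: requests ADP hardware monitor mode
PROMPT = b"]"     # ADP agent's prompt character

def enter_adp(port, max_bytes=64):
    """Send ESC, then scan up to max_bytes of reply for the ']' prompt.

    Returns True once the prompt is seen, False if the reply ends
    (for example on a read timeout) or the byte budget is exhausted.
    """
    port.write(ESC)
    for _ in range(max_bytes):
        ch = port.read(1)
        if not ch:
            return False      # no more data available
        if ch == PROMPT:
            return True
    return False

class FakePort:
    """Test stand-in for a serial port that replays a canned reply."""
    def __init__(self, reply):
        self.reply = reply
        self.sent = b""
        self._pos = 0

    def write(self, data):
        self.sent += data
        return len(data)

    def read(self, size=1):
        chunk = self.reply[self._pos:self._pos + size]
        self._pos += size
        return chunk

port = FakePort(b"\r\n]")     # hypothetical reply: line ending, then prompt
entered = enter_adp(port)
```

Scanning for the prompt, rather than expecting it as the first byte, keeps the sketch independent of any characters the agent may emit before it.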

Pinlist

The default pinout found in nanosoc_chip_pads.v is as follows:

| Pin/port | Function |
|---|---|
| SE | Scan enable, for use with scan chains |
| CLK | System clock input; currently the entire system is driven from this external clock input |
| TEST | Test mode control; if held high during system reset this will enter scan mode |
| NRST | Active-low system reset |
| P0[15:0] | GPIO port 0 |
| P1[15:0] | GPIO port 1 (pins [7:0] provide the system-on-chip communications input/output) |
| SWDIO | I/O for serial wire debug |
| SWDCK | Clock for serial wire debug |

Additional implementation wrappers for EuroPractice 65nm tape-outs have different pin counts:

  • 44-pin supports bits [7:0] of GPIO ports
  • 60-pin supports bits [15:0] of GPIO ports

Project Creator
David Flynn

Consultant at University of Southampton
Research area: Low power system design
