dwn @ soclabs

nanosoc re-usable MCU platform

Rationale

nanosoc has been designed to provide a simple microcontroller component appropriate to 'host' and support the development and evaluation of research components or subsystems. The design allows a seamless transition from FPGA to physical silicon implementation via a pre-verified programmable control system. This allows software and diagnostic functionality to be reused for the configuration, control and diagnostic analysis of research hardware such as custom accelerators or signal-processing blocks.

The design is based upon the Arm reference design in the Cortex-M System Design Kit (CMSDK), allowing reuse of the Arm Academic Access (AAA) pre-verified IP, documentation and software, but is architected to support simple 'bolting-on' of memory-mapped experimental hardware with an appropriate testbench development environment.

nanosoc is designed to be used as a base reference SoC for development, implementation, verification and research evaluation, and comes with validation testbenches, but may be adapted and extended as required.

Technical overview

nanosoc is a Cortex-M0 based microcontroller with pad-ring support for silicon implementation. It has internal address space and control and diagnostic support for integrating custom subsystems or research components:

  • CPU - small Arm Cortex-M0 processor with integrated Serial Wire Debug support
  • Boot Monitor - Synthesized ROM bootstrap for MCU
  • Code-SRAM bank (configurable size bank of memory primarily for downloaded test programs)
  • Data-SRAM bank (configurable size bank of memory primarily for test program data, stack and heap)
  • System peripherals (serial communications, General Purpose IO - GPIO, system counter timers and clocks)
  • Memory-mapped expansion space
  • Optional support for 1 or 2 Direct Memory Access (DMA) controllers
  • Two banks of DMA-accessible SRAM buffer space for concurrent expansion space usage
  • ASCII Debug Protocol agent, ADP, with clock independent host interface

Figure: Basic block diagram of nanosoc 'chip' and 'pad-ring' functionality, to support hosting research experimental IP (components or subsystems)

Architecture

Interconnect fabric

The simple single AMBA AHB bus of the Arm CMSDK reference design is upgraded to a multi-layer AHB-Lite matrix to support up to four concurrent accesses to the primary memory and input/output components.

Figure: nanosoc AHB bus matrix supporting concurrent address map access, with arbitration only when multiple initiators compete for a shared segment of address space

More details of how this bus matrix is generated using the Arm Academic Access tools are given at https://soclabs.org/project/building-system-optimised-amba-interconnect.
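
The arbitration rule described above (concurrent access unless two initiators target the same segment) can be illustrated with a small model. This is only a sketch under stated assumptions: the initiator names are hypothetical labels, and treating each top address nibble (a 256 MB window) as one bus-matrix segment is an assumption made for illustration, not the documented configuration of the generated matrix.

```python
from collections import defaultdict


def segment(addr):
    """Return the segment index of a 32-bit address.

    Assumption: one segment per top address nibble (256 MB window).
    """
    return (addr >> 28) & 0xF


def contended_segments(requests):
    """Find segments targeted by more than one initiator in the same cycle.

    requests maps a (hypothetical) initiator name to the address it accesses.
    Returns {segment: sorted initiator names} for contended segments only;
    an empty dict means every access can proceed concurrently.
    """
    by_segment = defaultdict(list)
    for initiator, addr in requests.items():
        by_segment[segment(addr)].append(initiator)
    return {s: sorted(names) for s, names in by_segment.items() if len(names) > 1}


if __name__ == "__main__":
    # Three initiators, three different segments: no arbitration needed.
    print(contended_segments({"CPU": 0x20000000, "DMA0": 0x80000000, "DMA1": 0x90000000}))
    # CPU and DMA0 both target segment 8: arbitration needed there only.
    print(contended_segments({"CPU": 0x80000010, "DMA0": 0x80000000, "ADP": 0x30000000}))
```

In this model only the initiators that share a segment wait on each other; accesses to other segments are unaffected.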

Design and validation testbench

The testbench (tb_nanosoc) provides:

  • system clocking and initialisation
  • a hardware debug communications port which supports serial communications and ASCII Debug Protocol (ADP) agent control and diagnostics
  • an Arm Serial Wire Debug controller model for validating software debugger connection and functionality
  • an Arm CPU trace model that replicates internal processor state and allows simulated instruction and data trace (both for RTL and gate-level netlist simulation verification)

Figure: nanosoc simulation testbench architecture and functionality, with nanosoc_chip_pads "socket"

FPGA prototyping platform

Two example FPGA targets for Xilinx® Vivado® have been provided to date that support hardware prototyping and verification of the nanosoc functionality:

  • Xilinx® FPGA platform target with a wrapper layer that provides the mapping from nanosoc chip-level ports (inside the pad-ring) to the FPGA pads as well as providing the target clock and reset control from the board-specific peripherals. This supports board-level evaluation and debug at the desk, usually with a USB-connected JTAG interface.
  • Xilinx PYNQ® platform target, which provides fully networked validation support and can be used as a shareable development resource. This uses the integrated Zynq® Arm Cortex-A processor subsystem to provide the Linux OS, network stack and Python environment with Jupyter notebook test code. The target example is the Xilinx ZCU104 evaluation board, which has first-class PYNQ software support.

The baseline FPGA target simply requires programmable logic and memory block resources, and would normally be connected by USB cable direct to the host development system:

Figure: FPGA prototyping architecture for the baseline FPGA board target, with the FPGA 'wrapper' that instantiates the nanosoc_chip level of hierarchy

The Xilinx Zynq 'PYNQ' platform system development target uses the Programmable Logic (PL) resources to implement the nanosoc design and the Processing System (PS) integrated Zynq Arm subsystem to run the PYNQ software environment over an Ethernet network connection, allowing browser-based software test and verification to be carried out remotely:

Figure: FPGA prototyping architecture for the PYNQ-enabled FPGA board target, with the FPGA 'wrapper' and PYNQ host platform support for networked development

Address Map

The address map is kept closely compatible with the Arm CMSDK reference design, to allow reuse of the documentation and the example test programs provided as a starting point. The bus matrix fabric supports additional expansion memory banks and a large uncommitted address-mapped region for experimental subsystem interfacing, sufficient to configure, control, and source and sink workload data to and from memory.

nanosoc address map

start-address  end-address  region                  notes
0xF0000000     0xF0003FFF   System table ROM        CPU/DBG config
0xA0000000     0xDFFFFFFF   Expansion IO space      Experimental IO
0x90000000     0x9FFFFFFF   Expansion RAM (hi)      (DMA memory buffers)
0x80000000     0x8FFFFFFF   Expansion RAM (lo)      (DMA memory buffers)
0x60000000     0x7FFFFFFF   Expansion IO space      Experimental IO
0x40000000     0x4FFFFFFF   System IO               (CPU MCU peripherals)
0x30000000     0x3FFFFFFF   Data memory (RAM)       (CPU heap/stack)
0x20000000     0x2FFFFFFF   Code memory (RAM)       (CPU execution memory)
0x10000000     0x1FFFFFFF   Bootstrap ROM           synthesized, mapped to 0
0x00000000     0x0FFFFFFF   Vectors, run-time code  Boot ROM -> Code RAM (remapped by boot monitor)

This address map is fully visible to the CPU software environment and the ADP hardware debug agent.
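
As an illustration of how test software might classify an address against this map, the following minimal sketch transcribes the listed regions into a lookup table. The function name decode_region is hypothetical, and addresses that fall outside the listed rows are simply reported as unlisted rather than guessed at.

```python
# Region table transcribed from the nanosoc address map above.
# Each entry is (start address, inclusive end address, region name).
REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Vectors, run-time code"),
    (0x10000000, 0x1FFFFFFF, "Bootstrap ROM"),
    (0x20000000, 0x2FFFFFFF, "Code memory (RAM)"),
    (0x30000000, 0x3FFFFFFF, "Data memory (RAM)"),
    (0x40000000, 0x4FFFFFFF, "System IO"),
    (0x60000000, 0x7FFFFFFF, "Expansion IO space"),
    (0x80000000, 0x8FFFFFFF, "Expansion RAM (lo)"),
    (0x90000000, 0x9FFFFFFF, "Expansion RAM (hi)"),
    (0xA0000000, 0xDFFFFFFF, "Expansion IO space"),
    (0xF0000000, 0xF0003FFF, "System table ROM"),
]


def decode_region(addr):
    """Return the region name for a 32-bit address, or None if it is unlisted."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None


if __name__ == "__main__":
    for a in (0x20000100, 0x40004000, 0x80001000, 0x50000000):
        print(f"0x{a:08X} -> {decode_region(a)}")
```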

The optional 1 or 2 DMA controller(s) do not have visibility

For the system IO region:

nanosoc system IO address map

start-address  end-address  notes
0x40000000     0x40000FFF   Timer 0
0x40001000     0x40001FFF   Timer 1
0x40002000     0x40003FFF   Dual Timer
0x40004000     0x40004FFF   UART 0
0x40005000     0x40005FFF   UART 1
0x40006000     0x40006FFF   UART 2
0x40008000     0x40008FFF   Watchdog Timer
0x4000E000     0x4000EFFF   USRT 2
0x4000F000     0x4000FFFF   DMA 0 Base
0x40010000     0x40010FFF   GPIO 0
0x40011000     0x40011FFF   GPIO 1
0x4001F000     0x4001FFFF   System Control
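
A host-side script can use the same table to turn a raw bus address into a peripheral and a register offset, for example when reading a debug trace. The sketch below is illustrative only: the short peripheral labels and the locate function are hypothetical names, and the address ranges are copied from the table above without interpreting individual registers.

```python
# System IO windows transcribed from the table above.
# Keys are hypothetical short labels chosen for this sketch.
PERIPHERALS = {
    "TIMER0":    (0x40000000, 0x40000FFF),
    "TIMER1":    (0x40001000, 0x40001FFF),
    "DUALTIMER": (0x40002000, 0x40003FFF),
    "UART0":     (0x40004000, 0x40004FFF),
    "UART1":     (0x40005000, 0x40005FFF),
    "UART2":     (0x40006000, 0x40006FFF),
    "WATCHDOG":  (0x40008000, 0x40008FFF),
    "USRT2":     (0x4000E000, 0x4000EFFF),
    "DMA0":      (0x4000F000, 0x4000FFFF),
    "GPIO0":     (0x40010000, 0x40010FFF),
    "GPIO1":     (0x40011000, 0x40011FFF),
    "SYSCTRL":   (0x4001F000, 0x4001FFFF),
}


def locate(addr):
    """Return (label, byte offset from window start), or None if unmapped."""
    for label, (start, end) in PERIPHERALS.items():
        if start <= addr <= end:
            return label, addr - start
    return None


if __name__ == "__main__":
    for a in (0x40004008, 0x40010004, 0x40007000):
        print(f"0x{a:08X} -> {locate(a)}")
```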

CPU Interrupts

Interrupt Number  Interrupt Name   Interrupt Source
0                 EXP0_IRQn        From accelerator - definable by user
1                 EXP1_IRQn        From accelerator - definable by user
2                 EXP2_IRQn        From accelerator - definable by user
3                 EXP3_IRQn        From accelerator - definable by user
4                 UARTRX2_IRQn     UART 2 RX interrupt
5                 UARTTX2_IRQn     UART 2 TX interrupt
6                 PORT0_ALL_IRQn   Combined interrupt for any pins on GPIO port 0
7                 PORT1_ALL_IRQn   Combined interrupt for any pins on GPIO port 1
8                 TIMER0_IRQn      Timer 0 interrupt
9                 TIMER1_IRQn      Timer 1 interrupt
10                DUALTIMER_IRQn   Dual timer interrupt
11                EXPB_IRQn        Unused
12                EXPC_IRQn        UART 0 overflow interrupt
13                EXPD_IRQn        UART 1 overflow interrupt
14                UARTOVF2_IRQn    UART 2 overflow interrupt
15                DMA_IRQn         Interrupt from DMA; this is a combined interrupt, so it can come from any channel
16-31             PORT0_x_IRQn     I/O pins on Port 0, from pin x; each pin has its own dedicated interrupt
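
The table can be captured as a lookup that also derives the NVIC enable mask for a set of interrupts. This is a minimal sketch: the function names are hypothetical, and mapping interrupt 16 to pin 0 of Port 0 (so that x = number - 16) is an assumption that follows the usual CMSDK convention but is not stated in the table. On the Cortex-M0, writing a 1 to bit n of the NVIC set-enable register enables interrupt n, which is what the mask models.

```python
# Named interrupts 0-15, transcribed from the table above.
NAMED_IRQS = {
    0: "EXP0_IRQn", 1: "EXP1_IRQn", 2: "EXP2_IRQn", 3: "EXP3_IRQn",
    4: "UARTRX2_IRQn", 5: "UARTTX2_IRQn",
    6: "PORT0_ALL_IRQn", 7: "PORT1_ALL_IRQn",
    8: "TIMER0_IRQn", 9: "TIMER1_IRQn", 10: "DUALTIMER_IRQn",
    11: "EXPB_IRQn", 12: "EXPC_IRQn", 13: "EXPD_IRQn",
    14: "UARTOVF2_IRQn", 15: "DMA_IRQn",
}


def irq_name(number):
    """Return the interrupt name for an interrupt number in 0..31."""
    if number in NAMED_IRQS:
        return NAMED_IRQS[number]
    if 16 <= number <= 31:
        # Assumption: interrupt 16 corresponds to Port 0 pin 0.
        return f"PORT0_{number - 16}_IRQn"
    raise ValueError("interrupt number must be in 0..31")


def nvic_enable_mask(*numbers):
    """Build the 32-bit set-enable mask with one bit set per interrupt number."""
    mask = 0
    for n in numbers:
        if not 0 <= n <= 31:
            raise ValueError("interrupt number must be in 0..31")
        mask |= 1 << n
    return mask


if __name__ == "__main__":
    print(irq_name(8), irq_name(20))
    print(hex(nvic_enable_mask(8, 15)))
```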

Communications channel

nanosoc supports interfacing to an external testbench via an off-chip protocol (the FTDI "FT1248" serial interface). This allows both FPGA prototypes and silicon at the board level to use a standard USB host communication port (FTDI FT232H chip or similar).

Unlike a conventional Universal Asynchronous Receiver-Transmitter (UART), this interface allows the serial communications clock to be sourced from the SoC, so there is no need for a known, accurate baud-rate clock (the on-chip clock source can even be a basic R-C oscillator that drifts in frequency over temperature and time). The FT1248 protocol supports 1, 2, 4 or 8 bit bidirectional data bus widths; nanosoc implements the single-bit serial protocol to minimize pin use, and the interface provides full-duplex hardware handshaking over the half-duplex physical channel.

The 4-pin interface is mapped onto the four lower pins of the GPIO Port-1 interface:

IO pad mapping  signal name  description
P1[0]           FT_MISO      status input from FT232H USB bridge (pin 26*)
P1[1]           FT_SCLK      serialiser clock output to FT232H USB bridge (pin 21*)
P1[2]           FT_MIOSIO    bidirectional serial data to/from FT232H USB bridge (pin 13*)
P1[3]           FT_SSN       serialiser select output to FT232H USB bridge (pin 25*)

(* where the FT232H interface chip is configured by serial EEPROM for FT1248 interface mode.)

This provides nanosoc with a robust handshaking serial communications channel. The channel defaults to providing standard character input/output, mapped to STDIN/STDOUT for the microcontroller.

However, an ASCII 'ESC' (0x1B) escape character is interpreted by the on-chip ADP (ASCII Debug Protocol) agent as the code to enter the ADP hardware monitor mode, signalled by an ASCII ']' character prompt. The host console can then debug and control the SoC address map directly, regardless of whether the CPU is running; this may be used to pre-initialise memory and registers and even to download code images to run on the CPU. The functionality of ADP is described more fully at https://soclabs.org/project/hardware-soc-bus-level-debugger
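
The mode switch described above can be modelled on the host side as a simple byte router. This is only a behavioural sketch of entry into monitor mode: the function name is hypothetical, the ESC byte itself is assumed to be consumed by the agent, and neither the ADP command set nor the way monitor mode is left is modelled here (see the linked ADP project page for those details).

```python
ESC = 0x1B      # ASCII escape character that selects ADP monitor mode
PROMPT = b"]"   # prompt character the agent sends once in monitor mode


def route_bytes(data):
    """Split a host byte stream into (stdin bytes, monitor bytes, agent reply).

    Bytes before the first ESC are treated as normal character input for the
    CPU. The first ESC switches to monitor mode, the model replies with the
    ']' prompt, and all later bytes are treated as monitor input.
    """
    stdin = bytearray()
    monitor = bytearray()
    reply = bytearray()
    in_monitor = False
    for byte in data:
        if in_monitor:
            monitor.append(byte)
        elif byte == ESC:
            in_monitor = True
            reply += PROMPT
        else:
            stdin.append(byte)
    return bytes(stdin), bytes(monitor), bytes(reply)


if __name__ == "__main__":
    # "xyz" is placeholder monitor input, not a real ADP command.
    print(route_bytes(b"hello\x1bxyz"))
```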

For systems that have known-frequency stable clock generation there is also the option of using a standard two-pin UART interface:

IO pad mapping  signal name  description
P1[4]           UART_RXD     serial receive data input (from FT232H etc)
P1[5]           UART_TXD     serial transmit data output (to FT232H etc)

(Note: standard baud-rate programming typically results in significant simulation overheads in communicating over UART channels)
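
To give a feel for the simulation overhead mentioned in the note, the arithmetic below estimates how many system clock cycles must be simulated to transfer one UART character. It is a sketch under assumptions: a 10-bit frame (1 start, 8 data, 1 stop bit) is assumed, and the 50 MHz clock and 115200 baud figures are illustrative values, not nanosoc settings.

```python
def cycles_per_char(clk_hz, baud, bits_per_frame=10):
    """Estimate system clock cycles needed to transfer one UART character.

    bits_per_frame defaults to 10 (assumed 1 start + 8 data + 1 stop bit).
    """
    if clk_hz <= 0 or baud <= 0:
        raise ValueError("clock and baud rate must be positive")
    return bits_per_frame * clk_hz / baud


if __name__ == "__main__":
    # Illustrative values only: 50 MHz system clock, 115200 baud.
    cycles = cycles_per_char(50_000_000, 115200)
    print(f"about {cycles:.0f} clock cycles per character")
```

With these example figures each character costs roughly 4340 simulated clock cycles, which is why a standard baud-rate setting slows simulation noticeably.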

Pinlist

The default pinout found in nanosoc_chip_pads.v is as follows:

Pin/port   Function
SE         Scan enable, for use with scan chains
CLK        System clock input; currently the entire system is driven from this external clock input
TEST       Test mode control; if held high during system reset the system enters scan mode
NRST       Active-low system reset
P0 [15:0]  GPIO port 0
P1 [15:0]  GPIO port 1 (pins of this port can be overridden for communications as outlined previously)
SWDIO      I/O for serial wire debug
SWDCK      Clock for serial wire debug

Additional implementation wrappers for recent TSMC 65 nm tape-outs have different pin counts: 28, 38 and 54. The only difference between these pinouts is the number of pins used on each GPIO port: 4, 8 and 16.

Using nanoSoC

If you'd like to use nanoSoC for your accelerator you can find all the files on the nanoSoC tech git. In order to use this in a project we suggest that you implement your accelerator as part of our Accelerator Project structure which allows for easy integration.

Project Creator
David Flynn

Consultant at University of Southampton
Research area: Low power system design
