nanosoc re-usable MCU platform
Rationale
nanosoc has been designed to provide a simple microcontroller component appropriate to 'host' and support the development and evaluation of research components or subsystems. The design allows a seamless transition from FPGA to physical silicon implementation via a pre-verified programmable control system. Software and diagnostic functionality can then be reused for the configuration, control and diagnostic analysis of research hardware such as custom accelerators or signal processing blocks.
The design is based upon the Arm reference design in the Cortex-M System Design Kit (CMSDK), allowing reuse of the Arm Academic Access (AAA) pre-verified IP, documentation and software, but it is architected to support simple 'bolting-on' of memory-mapped experimental hardware with an appropriate testbench development environment.
nanosoc is designed to be used as a base reference SoC for development, implementation, verification and research evaluation, and comes with validation testbenches, but may be adapted and extended as required.
Technical overview
nanosoc is a Cortex-M0 based microcontroller with pad-ring support for silicon implementation. It has internal address space and control and diagnostic support for integrating custom subsystems or research components:
- CPU - small Arm Cortex-M0 processor with Serial-Wire Debug integrated support
- Boot Monitor - Synthesized ROM bootstrap for MCU
- Code-SRAM bank (configurable size bank of memory primarily for downloaded test programs)
- Data-SRAM bank (configurable size bank of memory primarily for test program data, stack and heap)
- System peripherals (serial communications, General Purpose IO - GPIO, system counter timers and clocks)
- Memory-mapped expansion space
- Optional support for 1 or 2 Direct Memory Access (DMA) controllers
- Two banks of DMA-accessible SRAM buffer space for concurrent expansion space usage
- ASCII Debug Protocol agent, ADP, with clock independent host interface
Architecture
Interconnect fabric
The simple single AMBA AHB bus design of the Arm CMSDK reference design is upgraded to a multi-layer AHB-lite matrix to support up to 4 concurrent accesses to the primary memory and input/output components.
More details of how this bus matrix is generated using the Arm Academic Access tools are given at https://soclabs.org/project/building-system-optimised-amba-interconnect.
Design and validation testbench
The testbench (tb_nanosoc) provides:
- system clocking and initialisation
- a hardware debug communications port which supports serial communications and ASCII Debug Protocol (ADP) agent control and diagnostics
- an Arm Serial Wire Debug controller model for validating software debugger connection and functionality
- an Arm CPU trace model that replicates internal processor state and allows simulated instruction and data trace (both for RTL and gate-level netlist simulation verification)
FPGA prototyping platform
Two example FPGA targets for Xilinx® Vivado® have been provided to date that support hardware prototyping and verification of the nanosoc functionality:
- Xilinx® FPGA platform target with a wrapper layer that provides the mapping from nanosoc chip-level ports (inside the pad-ring) to the FPGA pads as well as providing the target clock and reset control from the board-specific peripherals. This supports board-level evaluation and debug at the desk, usually with a USB-connected JTAG interface.
- Xilinx PYNQ® platform target, which supports fully networked validation and can be used as a shareable development resource. This uses the integrated Zynq® Arm Cortex-A processor subsystem to provide the Linux OS, network stack and Python environment with Jupyter notebook test code. The target example is the Xilinx ZCU104 evaluation board, which has first-class PYNQ software support.
The baseline FPGA target simply requires programmable logic and memory block resources, and would normally be connected by USB cable directly to the host development system.
The Xilinx Zynq 'PYNQ' platform system development target uses the Programmable Logic (PL) resources to implement the nanosoc design and the Processing System (PS) integrated Zynq-Arm subsystem to run the PYNQ software environment over an Ethernet network connection. This allows software test and verification to be carried out remotely from a web browser.
Address Map
The address map is kept closely compatible with the Arm CMSDK reference design, to allow reuse of the documentation and the example test programs provided as a starting point. The bus matrix fabric supports additional expansion memory banks and a large uncommitted address-mapped region for experimental subsystem interfacing, sufficient to configure and control the subsystem and to source and sink workload data to and from memory.
start-address | end-address | region | notes |
---|---|---|---|
0xF0000000 | 0xF0003FFF | System table ROM | CPU/DBG config |
0xA0000000 | 0xDFFFFFFF | Expansion IO space | Experimental IO |
0x90000000 | 0x9FFFFFFF | Expansion RAM (hi) | (DMA memory buffers) |
0x80000000 | 0x8FFFFFFF | Expansion RAM (lo) | (DMA memory buffers) |
0x60000000 | 0x7FFFFFFF | Expansion IO space | Experimental IO |
0x40000000 | 0x4FFFFFFF | System IO | (CPU MCU peripherals) |
0x30000000 | 0x3FFFFFFF | Data memory (RAM) | (CPU heap/stack) |
0x20000000 | 0x2FFFFFFF | Code memory (RAM) | (CPU execution memory) |
0x10000000 | 0x1FFFFFFF | Bootstrap ROM | synthesized, mapped to 0 |
0x00000000 | 0x0FFFFFFF | Vectors, run-time code | Boot ROM -> Code RAM (remapped by boot monitor) |
This address map is fully visible to the CPU software environment and the ADP hardware debug agent.
The optional 1 or 2 DMA controller(s) do not have visibility
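As an illustration of how the table above can be used from host-side test code, the following minimal Python sketch decodes an address into its top-level region. The names `REGIONS` and `region_of` are hypothetical helpers introduced here, not part of the nanosoc software; the ranges are transcribed from the table, and addresses the table does not list simply return `None`.

```python
# Top-level nanosoc regions, transcribed from the address map table.
# REGIONS and region_of are illustrative names, not part of the nanosoc release.
REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Vectors, run-time code"),
    (0x10000000, 0x1FFFFFFF, "Bootstrap ROM"),
    (0x20000000, 0x2FFFFFFF, "Code memory (RAM)"),
    (0x30000000, 0x3FFFFFFF, "Data memory (RAM)"),
    (0x40000000, 0x4FFFFFFF, "System IO"),
    (0x60000000, 0x7FFFFFFF, "Expansion IO space"),
    (0x80000000, 0x8FFFFFFF, "Expansion RAM (lo)"),
    (0x90000000, 0x9FFFFFFF, "Expansion RAM (hi)"),
    (0xA0000000, 0xDFFFFFFF, "Expansion IO space"),
    (0xF0000000, 0xF0003FFF, "System table ROM"),
]


def region_of(addr):
    """Return the region name for a 32-bit address, or None if the table does not list it."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None
```

For example, `region_of(0x80001000)` identifies an address in the lower expansion RAM bank, which is useful when checking that a test buffer has been placed in DMA-accessible memory.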
For the system IO region:
start-address | end-address | notes |
---|---|---|
0x40000000 | 0x40000FFF | Timer 0 |
0x40001000 | 0x40001FFF | Timer 1 |
0x40002000 | 0x40003FFF | Dual Timer |
0x40004000 | 0x40004FFF | UART 0 |
0x40005000 | 0x40005FFF | UART 1 |
0x40006000 | 0x40006FFF | UART 2 |
0x40008000 | 0x40008FFF | Watchdog Timer |
0x4000E000 | 0x4000EFFF | USRT 2 |
0x4000F000 | 0x4000FFFF | DMA 0 Base |
0x40010000 | 0x40010FFF | GPIO 0 |
0x40011000 | 0x40011FFF | GPIO 1 |
0x4001F000 | 0x4001FFFF | System Control |
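The system IO table can be captured in the same way for test scripts. This is a minimal sketch, assuming only the ranges listed above; `PERIPHERALS` and `peripheral_at` are hypothetical names, and no register offsets within each peripheral are implied.

```python
# System IO peripherals, transcribed from the table above.
# PERIPHERALS and peripheral_at are illustrative names only.
PERIPHERALS = {
    "Timer 0": (0x40000000, 0x40000FFF),
    "Timer 1": (0x40001000, 0x40001FFF),
    "Dual Timer": (0x40002000, 0x40003FFF),
    "UART 0": (0x40004000, 0x40004FFF),
    "UART 1": (0x40005000, 0x40005FFF),
    "UART 2": (0x40006000, 0x40006FFF),
    "Watchdog Timer": (0x40008000, 0x40008FFF),
    "USRT 2": (0x4000E000, 0x4000EFFF),
    "DMA 0 Base": (0x4000F000, 0x4000FFFF),
    "GPIO 0": (0x40010000, 0x40010FFF),
    "GPIO 1": (0x40011000, 0x40011FFF),
    "System Control": (0x4001F000, 0x4001FFFF),
}


def peripheral_at(addr):
    """Return (peripheral name, byte offset from its base), or None if no peripheral is listed."""
    for name, (start, end) in PERIPHERALS.items():
        if start <= addr <= end:
            return name, addr - start
    return None
```

A lookup such as `peripheral_at(0x40004008)` returns the peripheral name together with the offset into its address window, which helps when interpreting bus traces from the testbench.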
CPU Interrupts
Interrupt Number | Interrupt Name | Interrupt Source |
---|---|---|
0 | EXP0_IRQn | From accelerator - definable by user |
1 | EXP1_IRQn | From accelerator - definable by user |
2 | EXP2_IRQn | From accelerator - definable by user |
3 | EXP3_IRQn | From accelerator - definable by user |
4 | UARTRX2_IRQn | Uart 2 RX interrupt |
5 | UARTTX2_IRQn | Uart 2 TX interrupt |
6 | PORT0_ALL_IRQn | Combined interrupt for any pins on GPIO port 0 |
7 | PORT1_ALL_IRQn | Combined interrupt for any pins on GPIO port 1 |
8 | TIMER0_IRQn | Timer 0 interrupt |
9 | TIMER1_IRQn | Timer 1 interrupt |
10 | DUALTIMER_IRQn | Dual timer interrupt |
11 | EXPB_IRQn | Unused |
12 | EXPC_IRQn | Uart 0 overflow Interrupt |
13 | EXPD_IRQn | Uart 1 overflow Interrupt |
14 | UARTOVF2_IRQn | Uart 2 overflow Interrupt |
15 | DMA_IRQn | Interrupt from DMA, this is a combined interrupt so can be from any channel |
16-31 | PORT0_x_IRQn | I/O pins on Port 0, from pin x, each pin has its own dedicated interrupt |
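On the Cortex-M0, external interrupt n is enabled by writing a 1 to bit n of the NVIC Interrupt Set-Enable Register (ISER) at 0xE000E100. The sketch below, which assumes only the interrupt numbers in the table above, computes the ISER value for a chosen set of interrupts; `IRQ` and `iser_mask` are hypothetical names introduced for illustration.

```python
NVIC_ISER = 0xE000E100  # Armv6-M NVIC Interrupt Set-Enable Register address

# Interrupt numbers 0-15, transcribed from the table above.
IRQ = {
    "EXP0_IRQn": 0, "EXP1_IRQn": 1, "EXP2_IRQn": 2, "EXP3_IRQn": 3,
    "UARTRX2_IRQn": 4, "UARTTX2_IRQn": 5,
    "PORT0_ALL_IRQn": 6, "PORT1_ALL_IRQn": 7,
    "TIMER0_IRQn": 8, "TIMER1_IRQn": 9, "DUALTIMER_IRQn": 10,
    "EXPB_IRQn": 11, "EXPC_IRQn": 12, "EXPD_IRQn": 13,
    "UARTOVF2_IRQn": 14, "DMA_IRQn": 15,
}


def iser_mask(names):
    """Return the 32-bit value to write to NVIC_ISER to enable the named interrupts."""
    mask = 0
    for name in names:
        mask |= 1 << IRQ[name]
    return mask
```

Enabling the first accelerator interrupt together with the combined DMA interrupt, for instance, corresponds to `iser_mask(["EXP0_IRQn", "DMA_IRQn"])`, which sets bits 0 and 15.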
Communications channel
nanosoc supports interfacing to an external testbench via an off-chip protocol (the FTDI "FT1248" serial interface). This allows both FPGA and silicon implementations at the board level to use a standard USB host communication port (FTDI FT232H chip or similar).
Unlike a conventional Universal Asynchronous Receiver-Transmitter (UART), this interface allows the serial communications clock to be sourced from the SoC, so there is no need for a known, accurate baud-rate clock (the on-chip clock source can even be a basic R-C oscillator that drifts in frequency over temperature and time). The FT1248 protocol supports 1, 2, 4 or 8 bit bidirectional data bus widths; nanosoc implements the single-bit serial protocol to minimize pin use, and the interface provides full-duplex hardware handshaking over the half-duplex physical channel.
The 4-pin interface is mapped onto the four lower pins of the GPIO Port-1 interface:
IO pad mapping | signal name | description |
---|---|---|
P1[0] | FT_MISO | status input from FT232H USB bridge (pin 26*) |
P1[1] | FT_SCLK | serialiser clock output to FT232H USB bridge (pin 21*) |
P1[2] | FT_MIOSIO | bidirectional serial data to/from FT232H USB bridge (pin 13*) |
P1[3] | FT_SSN | serialiser select output to FT232H USB bridge (pin 25*) |
(* where the FT232H interface chip is configured by serial EEPROM for FT1248 interface mode.)
This provides nanosoc with a robust handshaking serial communications channel. The channel defaults to providing standard input/output character IO, mapped to STDIN/STDOUT for the microcontroller.
However, an ASCII 'ESC' (0x1B) escape character is interpreted by the on-chip ADP (ASCII Debug Protocol) agent as the code to enter the ADP hardware monitor mode, signalled by an ASCII ']' character prompt. In this mode the host console can debug and control the SoC address map directly, regardless of whether the CPU is running. It may be used to pre-initialise memory and registers, and even to download code images to run on the CPU. The functionality of ADP is described more fully at https://soclabs.org/project/hardware-soc-bus-level-debugger
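The entry handshake described above can be sketched from the host side as follows. This is an illustrative sketch only: it assumes a port object exposing `write(bytes)` and `read(n)`, and `enter_adp_monitor` and `LoopbackPort` are hypothetical names. The `LoopbackPort` class is a stand-in for real hardware, and the line ending it emits before the prompt is an assumption; no ADP commands are shown here, as those are documented at the link above.

```python
ESC = b"\x1b"   # ASCII ESC: requests entry to the ADP hardware monitor
PROMPT = b"]"   # prompt character that signals ADP monitor mode


def enter_adp_monitor(port, max_bytes=256):
    """Send ESC, then read until the ']' prompt arrives.

    `port` is any object with write(bytes) and read(n) -> bytes.
    Returns the bytes seen before the prompt; raises TimeoutError if the
    prompt is not seen within max_bytes or the stream runs dry.
    """
    port.write(ESC)
    seen = bytearray()
    for _ in range(max_bytes):
        ch = port.read(1)
        if not ch:
            break  # nothing more to read (closed stream or read timeout)
        if ch == PROMPT:
            return bytes(seen)
        seen += ch
    raise TimeoutError("no ADP prompt received")


class LoopbackPort:
    """Hypothetical stand-in for the hardware link, used only to exercise the sketch."""

    def __init__(self):
        self._rx = bytearray()

    def write(self, data):
        if ESC in data:
            self._rx += b"\r\n]"  # assumed reply: line ending, then the prompt
        return len(data)

    def read(self, n):
        out = bytes(self._rx[:n])
        del self._rx[:n]
        return out
```

With real hardware, the same function could be given a serial port wrapper for the FT232H link in place of `LoopbackPort`.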
For systems that have known-frequency stable clock generation there is also the option of using a standard two-pin UART interface:
IO pad mapping | signal name | description |
---|---|---|
P1[4] | UART_RXD | serial receive data input (from FT232H etc) |
P1[5] | UART_TXD | serial transmit data output (to FT232H etc) |
(Note: standard baud-rate programming typically results in significant simulation overheads in communicating over UART channels)
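To give a feel for the simulation overhead noted above, the following sketch estimates how many system clock cycles a simulator must run to transfer one character. The 10-bit frame (start bit, 8 data bits, stop bit) and the 50 MHz clock used in the example are assumptions for illustration, not nanosoc settings; `clocks_per_char` is a hypothetical helper.

```python
def clocks_per_char(clock_hz, baud, bits_per_frame=10):
    """Approximate system clock cycles needed to shift one character over a UART.

    bits_per_frame=10 assumes 8N1 framing: 1 start bit, 8 data bits, 1 stop bit.
    """
    return bits_per_frame * clock_hz // baud


# Assumed example: 50 MHz system clock at a standard 115200 baud.
cycles = clocks_per_char(50_000_000, 115_200)
```

Under these assumptions each character costs several thousand simulated clock cycles, which is why standard baud rates are slow to simulate.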
Pinlist
The default pinout found in nanosoc_chip_pads.v is as follows:
Pin/port | Function |
---|---|
SE | Scan enable, for use with scan chains |
CLK | System clock input, currently entire system is driven from this external clock input |
TEST | Test mode control, if held high during system reset this will enter into scan mode |
NRST | Active low system reset |
P0 [15:0] | GPIO port 0 |
P1 [15:0] | GPIO port 1 (pins of this port can be overridden for communications as outlined previously) |
SWDIO | I/O for serial wire debug |
SWDCK | Clock for serial wire debug |
Additional implementation wrappers for recent TSMC 65 nm tape-outs have different pin counts: 28, 38 and 54. The only difference between these pinouts is the number of pins used on each GPIO port: 4, 8 and 16 respectively.
Using nanoSoC
If you'd like to use nanoSoC for your accelerator, you can find all the files on the nanoSoC tech git. To use it in a project, we suggest that you implement your accelerator as part of our Accelerator Project structure, which allows for easy integration.