SDIO Controller Verification
Project Overview
SoC Labs reference designs, such as megaSoC, are developed to allow integration of open source IP alongside the core Arm AAA licensed IP to create a complete System on Chip implementation.
This project's aim is to integrate an open source SD Card interface submodule into megaSoC, with a focus on developing a clear verification strategy to ensure it is known to work end-to-end as a persistent storage mechanism.
Modern embedded SoCs usually require some persistent storage to hold configuration and log data and to exchange files. The industry standard SD Card is an ideal medium for this. MegaSoC can integrate open source IP to support SD Card. This project uses the open source ZipCPU sdspi controller as the basis for an SD card interface submodule of MegaSoC that supports SPI, SDIO, and eMMC modes from a single RTL codebase.
The SD Card standards support a variety of bus interfaces and data transfer performance needs. This project targets the SDIO 4-bit native mode path, using the sdiodrv.c driver.
Integrating a new submodule and knowing it works end-to-end are two very different things. In this case, a controller that synthesises cleanly and passes unit-level RTL simulation can still stall indefinitely during card initialisation, produce silent data corruption on block boundaries, or behave differently across the many and varied SD card brands. A goal of this project is to develop a layered verification strategy that addresses these end-to-end system challenges. The motivation is partly principled: any MegaSoC application that depends on persistent storage needs a trustworthy foundation. It is also practical: the sdspi controller has a known latent issue in its sdrxframe receive module, identified through formal verification, that simulation alone may never have exposed.
The first stage in the layered verification strategy is confirming the correct raw block read and write behaviour at the driver level through C-based integration tests, then building upward to verify the filesystem operations over the same hardware path.
Software-to-Hardware flow
Writing a file through the full software stack involves three layers, each provided by a different source and each with a clearly defined responsibility. Understanding exactly how they connect — and where the boundaries are — is both the intellectual contribution of this integration work and the foundation of the verification methodology.
For top-level file system support, FatFs, the open-source filesystem middleware by ChaN, has been chosen. It fits the SoC Labs ethos of minimising system resource use, and therefore cost: its small memory footprint has led to its use in many microcontrollers and embedded systems.
FatFs handles all filesystem-level concerns: FAT table maintenance, cluster chain resolution, directory entry management, and sector allocation. FatFs is deliberately hardware-agnostic. It communicates downward through exactly five stub functions (disk_initialize, disk_status, disk_read, disk_write, and disk_ioctl), which it calls without any knowledge of what lies beneath them.
Those five stubs are implemented in diskio.c, which is provided as part of the ZipCPU sdspi submodule rather than written from scratch. What makes this implementation architecturally interesting is that it does not call the SDIO driver directly. Instead, every stub routes through a driver abstraction table, DRIVES[], defined in diskiodrvr.h, where each entry holds a hardware base address (fd_addr), a vtable of function pointers (fd_driver: dio_init, dio_read, dio_write, dio_ioctl), and a live driver handle (fd_data) populated at initialisation time. A disk_write() call therefore resolves through the chain: DRIVES[pdrv].fd_driver->dio_write(DRIVES[pdrv].fd_data, sector, count, buff). The SDIO driver is plugged into this table at startup, meaning the same diskio.c can serve either the SDSPI or SDIO backend without modification, simply by changing the table entry. This clean separation of concerns was verified as part of this project rather than assumed.
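The table-driven dispatch described above can be illustrated with a minimal, self-contained sketch. The types and the RAM-backed mock backend below are hypothetical stand-ins, not the definitions in diskiodrvr.h; only the field names fd_addr, fd_driver and fd_data and the shape of the call chain are taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BYTES 512u
#define MOCK_SECTORS 8u

/* Hypothetical, simplified function-pointer table (not the real fd_driver type). */
typedef struct {
    void *(*dio_init)(void *addr);
    int (*dio_write)(void *dev, uint32_t sector, uint32_t count, const uint8_t *buf);
    int (*dio_read)(void *dev, uint32_t sector, uint32_t count, uint8_t *buf);
} sketch_driver_t;

/* One table entry: base address, function table, live handle set at init. */
typedef struct {
    void *fd_addr;
    const sketch_driver_t *fd_driver;
    void *fd_data;
} sketch_drive_t;

/* RAM-backed stand-in for a real backend such as the SDIO driver. */
static uint8_t mock_mem[MOCK_SECTORS * SECTOR_BYTES];

static int mock_range_ok(uint32_t sector, uint32_t count)
{
    return sector < MOCK_SECTORS && count <= MOCK_SECTORS - sector;
}

static void *mock_init(void *addr)
{
    (void)addr;                 /* a real driver would use the base address */
    memset(mock_mem, 0, sizeof mock_mem);
    return mock_mem;            /* non-NULL handle signals success */
}

static int mock_write(void *dev, uint32_t sector, uint32_t count, const uint8_t *buf)
{
    if (!mock_range_ok(sector, count)) return -1;
    memcpy((uint8_t *)dev + (size_t)sector * SECTOR_BYTES, buf,
           (size_t)count * SECTOR_BYTES);
    return 0;
}

static int mock_read(void *dev, uint32_t sector, uint32_t count, uint8_t *buf)
{
    if (!mock_range_ok(sector, count)) return -1;
    memcpy(buf, (const uint8_t *)dev + (size_t)sector * SECTOR_BYTES,
           (size_t)count * SECTOR_BYTES);
    return 0;
}

static const sketch_driver_t MOCK_DRIVER = { mock_init, mock_write, mock_read };

/* Swapping the backend means changing only this table entry. */
static sketch_drive_t DRIVES[] = { { NULL, &MOCK_DRIVER, NULL } };
#define NDRIVES (sizeof DRIVES / sizeof DRIVES[0])

/* Returns 0 on success, -1 on failure (simplified status codes). */
int sketch_disk_initialize(unsigned pdrv)
{
    if (pdrv >= NDRIVES) return -1;
    DRIVES[pdrv].fd_data = DRIVES[pdrv].fd_driver->dio_init(DRIVES[pdrv].fd_addr);
    return DRIVES[pdrv].fd_data ? 0 : -1;
}

int sketch_disk_write(unsigned pdrv, const uint8_t *buff, uint32_t sector, uint32_t count)
{
    assert(buff != NULL);
    if (pdrv >= NDRIVES || DRIVES[pdrv].fd_data == NULL) return -1;
    return DRIVES[pdrv].fd_driver->dio_write(DRIVES[pdrv].fd_data, sector, count, buff);
}

int sketch_disk_read(unsigned pdrv, uint8_t *buff, uint32_t sector, uint32_t count)
{
    assert(buff != NULL);
    if (pdrv >= NDRIVES || DRIVES[pdrv].fd_data == NULL) return -1;
    return DRIVES[pdrv].fd_driver->dio_read(DRIVES[pdrv].fd_data, sector, count, buff);
}
```

Because the stubs only ever touch the table, a test can substitute a mock backend like this one to exercise the dispatch logic on its own, before the real driver and hardware path are involved.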
The bottom layer is sdiodrv.c, part of the SD Card submodule, which the DRIVES[] vtable points to at runtime. It translates dio_read and dio_write calls into native SDIO commands over the 4-bit parallel bus: CMD17 for single block reads, CMD24 for single block writes, CMD25 for multi-block writes. CRC generation and checking are handled entirely in hardware by the sdspi RTL controller. The addressing mode (byte for SDSC, block for SDHC/SDXC) is determined once during sdio_init() from the CCS bit in the ACMD41 response and applied automatically to every subsequent transfer.
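The effect of the CCS bit on command arguments can be shown with a short sketch. The function name sector_to_cmd_arg is hypothetical and is not taken from sdiodrv.c; the sketch assumes 512-byte blocks and CCS at OCR bit 30.

```c
#include <assert.h>
#include <stdint.h>

#define OCR_CCS     0x40000000u  /* OCR bit 30: card capacity status */
#define BLOCK_BYTES 512u         /* assumed block size */

/*
 * Hypothetical helper: convert a logical sector number into the 32-bit
 * argument for CMD17/CMD24/CMD25.
 *   CCS = 1 (SDHC/SDXC): the argument is the block number itself.
 *   CCS = 0 (SDSC):      the argument is a byte address.
 */
uint32_t sector_to_cmd_arg(uint32_t ocr, uint32_t sector)
{
    if (ocr & OCR_CCS)
        return sector;
    /* Byte addressing: the result must still fit in 32 bits. */
    assert(sector <= UINT32_MAX / BLOCK_BYTES);
    return sector * BLOCK_BYTES;
}
```

For example, sector 100 maps to argument 100 on a block-addressed card and to 51200 on a byte-addressed card. Getting this wrong does not produce an error; it silently reads or writes the wrong location, which is why the readback tests matter.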
There is one detail the layer description alone does not make visible: disk_initialize calls dio_init, which calls sdio_init(), which executes the full SDIO initialisation sequence. FatFs triggers this once, at f_mount() time, and trusts the result completely. If sdio_init() returns NULL, disk_initialize returns STA_NODISK, FatFs returns FR_NOT_READY, and nothing in that chain tells you which step of initialisation failed. This opacity is precisely why the initialisation sequence is tested and documented independently before FatFs is involved — the integration test results in this project provide the evidence that the full sequence completes correctly in simulation, confirmed by PASS: sdio_init returned a valid driver handle in the test output.



SDIO Initialisation:
Fully understanding the initialisation sequence matters for integration testing because each stage introduces a new failure mode that C test code needs to handle correctly. The sequence is orchestrated entirely by sdio_init(), and it does far more than simply wake the card up: it negotiates the operating voltage, identifies the card, assigns it an address on the bus, selects it, widens the bus from 1-bit to 4-bit, and then attempts to step the clock up through a hierarchy of high-speed modes depending on what both the card and the PHY can support. If any stage fails, the driver has explicit error paths that either bail entirely or degrade gracefully to a slower mode. The driver's acknowledged limitations are that it currently handles only 3.3V mode, even if 1.8V switching hardware exists, and that it does not recover well from a failed initialisation.
| Command / Action | Driver Function | Register Value | Response Expected | What to Check | Failure Mode / Notes |
|---|---|---|---|---|---|
| Controller reset + PHY setup | sdio_init() | sd_cmd = SDIO_RESET \| SDIO_REMOVED; sd_phy = SPEED_SLOW \| SECTOR_512B | Poll SDIO_HWRESET until clear | Clock confirmed at 400 kHz. Poll until SPEED_SLOW is reflected back in the PHY register. | [PHY phase] Clock phase set based on frontend type: SERDES→20, DDR/raw→16. Wrong phase causes marginal data capture; CMD19 tuning later corrects this. |
| CMD0 - GO_IDLE_STATE | sdio_go_idle() | sd_data = 0; sd_cmd = SDIO_REMOVED \| SDIO_CMD \| SDIO_RNONE \| SDIO_ERR | None | No response byte to decode. Simply wait for busy to clear. Card enters idle state. | Check for timeout conditions while waiting. |
| CMD8 - SEND_IF_COND | sdio_send_if_cond() | sd_data = 0x01a5 (VHS=1, pattern=0xA5); sd_cmd = SDIO_READREG + 8. VHS=1 selects the 2.7-3.6 V voltage range. | R7 via SDIO_READREG; lower byte echoes the pattern (0x01a5) | Check lower byte of response = 0xA5. If 8 != (hcs & 0x80ff) → card is not high capacity capable → hcs=0. | [Critical branch] If echo ≠ 0xA5 → driver frees memory and returns NULL, with no retry. If no response → assume non-HCS, continue with hcs=0. |
| ACMD41 - query (opcond=0) | sdio_send_op_cond() | sd_data = op_cond_query (initially 0); sd_cmd = SDIO_READREG + 41 | R3: OCR register. Stored in dev->d_OCR | Mask response with 0x0ff8000 to extract voltage ranges. If result = 0, no compatible voltages → bail. | [Voltage probe] First ACMD41 is a query with opcond=0. If SDPHY_1P8VSPT is set in the PHY, the driver adds S18R \| XPC flags to the next call. |
| ACMD41 - poll loop | sdio_send_op_cond() | sd_data = op_cond_query (includes HCS/S18R/XPC flags after initial probe) | R3 (OCR) with bit 31 = 1 indicates card power-up complete; loop exits when set | Poll OCR[31] until set (card ready). Verify CCS (bit 30) captured in dev->d_OCR for addressing mode selection. | [No timeout, critical] Poll loop has no timeout guard. A card that never raises bit 31 hangs forever. Wrap sdio_init() with an external watchdog in your test code. |
| CMD11 - VOLTAGE_SWITCH (conditional) | sdio_send_voltage_switch() | sd_cmd = (SDIO_ERR \| SDIO_READREG) + 11 | R1: check SDIO_ERR bit | Only executed if SDPHY_1P8VSPT set AND d_OCR has CCS and S18R. | [Known limitation] This controller only handles 3.3V mode. Even if hardware exists for switching to 1.8V, this driver doesn't (yet) enable it. The controller doesn't really recover well from a failed init. |
| CMD2 - ALL_SEND_CID | sdio_all_send_cid() | sd_cmd = (SDIO_ERR \| SDIO_READR2) + 2 | R2 (136-bit): 4×32-bit words from sd_fifa into d_CID[] | CID contains manufacturer ID, product name, serial number, manufacture date. Logged via sdio_dump_cid() if SDINFO=1. | [SDIO only] Does not exist in SPI mode. Useful during bring-up to confirm correct card identity. |
| CMD3 - SEND_RELATIVE_ADDR | sdio_send_rca() | sd_cmd = (SDIO_ERR \| SDIO_READREG) + 3 | R6: upper 16 bits = RCA. Stored as dev->d_RCA = (r >> 16) & 0xffff | RCA used by every subsequent targeted command. If CMD3 errors, driver retries once before returning RCA=0. | [Silent failure risk] If both CMD3 attempts fail, RCA=0 is returned silently. CMD7 then selects no card. All subsequent reads/writes time out with no clear error. |
| CMD10 - SEND_CID (3.3V path, called before CMD7) | sdio_send_cid() | sd_data = d_RCA << 16; sd_cmd = (SDIO_ERR \| SDIO_READR2) + 10 | R2 (136-bit): 4×32-bit words from sd_fifa into d_CID[] | Reads manufacturer ID, product name, serial number, manufacture date using the RCA assigned by CMD3. Card must be in Stand-by state (not yet selected by CMD7). | In the 3.3V path, CMD10 is called before CMD7 because the card is still in Stand-by state and accepts CID read commands. In the 1.8V path this is reversed: the card must be temporarily unselected because it cannot execute either CMD10 or CMD9 from the transfer state. |
| CMD9 - SEND_CSD (called before CMD7) | sdio_read_csd() | sd_data = d_RCA << 16; sd_cmd = (SDIO_CMD \| SDIO_R2 \| SDIO_ERR) + 9 | R2 (136-bit): 16 bytes of CSD register read via sd_fifa into d_CSD[] | Driver decodes CSD_STRUCTURE (bits [127:126]) to choose parsing path. CSD_STRUCTURE=1 (SDHC) → d_sector_count = (C_SIZE+1) * 1024. CSD_STRUCTURE=0 (SDSC) → more complex calculation. | [Silent failure risk] d_sector_count and d_block_size are only populated here; they are 0 until CMD9 completes. If CMD9 fails silently or returns invalid data, disk_ioctl(GET_SECTOR_COUNT) returns 0, and FatFs refuses to mount. |
| CMD7 - SELECT_CARD | sdio_select_card() | sd_data = d_RCA << 16; sd_cmd = (SDIO_ERR \| SDIO_READREGb) + 7 | R1b: card transitions Stand-by → Transfer state | Without CMD7, all data commands (CMD17, CMD24) are ignored. Uses R1b to handle card-busy phase after selection. | [Critical] If RCA from CMD3 was 0 (silent failure), CMD7 selects no card. All subsequent operations time out with no diagnostic distinguishing this from other failures. |
| ACMD51 - SEND_SCR | sdio_read_scr() | sd_phy sector = SECTOR_64B (3<<24); sd_cmd = (SDIO_ERR \| SDIO_MEM \| SDIO_READREG) + 51. Preceded by CMD55 via sdio_send_app_cmd(). | 64-bit SCR register: 2×32-bit words from sd_fifa into d_SCR[8] | Key bits in SCR[1]: bit 2 = 4-bit bus support. If set → ACMD6 issued next and SDPHY_W4 enabled. If clear → bus stays 1-bit. Line 2220: if (dv->d_SCR[1] & 0x04 | [Bus width gate] SCR is the sole decision point for 4-bit bus width in the 3.3V path. On real PYNQ-Z2 hardware with a real SD card, this bit will be 1, enabling ACMD6 and 4× throughput improvement. Verifying this transition is a concrete hardware-vs-simulation difference to document. |
| ACMD6 - SET_BUS_WIDTH | sdio_set_bus_width(dv,2) | sd_data = 2 (4-bit); sd_cmd = SDIO_READREG + 6; sd_phy \|= SDPHY_W4 | R1: check no error bits set | Widens bus from 1-bit to 4-bit. PHY and card must both be updated; a mismatch causes all data transfers to fail. | [1.8V path only] ACMD6 currently only called in 1.8V PHY path. At default 3.3V, bus stays 1-bit, which limits throughput by 4×. |
| CMD6 - SWITCH_FUNC (speed negotiation) | sdio_switch() | Query: 0x00fffff3; Switch: 0x80fffff1–4 (mode dependent) | R1 + 512-bit status block via sd_fifa | Driver probes SDR104→DDR50→SDR50→SDR25 in order. Falls back to SDR12 if none succeed or PHY cannot sustain speed. | [Speed hierarchy] SDR104 (200 MHz) → DDR50 → SDR50 → SDR25 → SDR12 (25 MHz default). Each checked against PHY before card switch committed. Lines 1977–2150. |
| Init complete | sdio_init() returns SDIODRV *dv | sd_phy reflects final negotiated speed + bus width | Non-NULL = success. NULL = fatal failure (memory, voltage mismatch, or CMD8 echo fail). | Addressing mode determined by dev->d_OCR & 0x40000000 (CCS bit). Used in every subsequent read/write call. | |
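
Several of the checks in the table reduce to simple bit operations on the response words. The helpers below are a hypothetical sketch of those checks, useful for asserting on captured values in test code; the names are illustrative and are not functions from sdiodrv.c. The masks and shifts follow the values quoted in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CMD8_CHECK_PATTERN 0xA5u        /* echoed in the low byte of R7 */
#define OCR_VOLTAGE_MASK   0x00FF8000u  /* voltage window bits */
#define OCR_POWERED_UP     0x80000000u  /* bit 31: power-up complete */
#define OCR_CCS            0x40000000u  /* bit 30: block addressing */

/* CMD8: the card must echo the check pattern in the low byte. */
bool cmd8_echo_ok(uint32_t r7)
{
    return (r7 & 0xFFu) == CMD8_CHECK_PATTERN;
}

/* ACMD41 query: a zero result means no compatible voltage range. */
uint32_t ocr_voltage_window(uint32_t ocr)
{
    return ocr & OCR_VOLTAGE_MASK;
}

/* ACMD41 poll: the loop may exit once bit 31 is set. */
bool ocr_powered_up(uint32_t ocr)
{
    return (ocr & OCR_POWERED_UP) != 0u;
}

/* CCS selects block (true) or byte (false) addressing. */
bool ocr_block_addressed(uint32_t ocr)
{
    return (ocr & OCR_CCS) != 0u;
}

/* CMD3: the RCA is the upper 16 bits of the R6 response. */
uint16_t rca_from_r6(uint32_t r6)
{
    return (uint16_t)((r6 >> 16) & 0xFFFFu);
}

/* An RCA of 0 is the silent-failure value described in the table. */
bool rca_usable(uint16_t rca)
{
    return rca != 0u;
}

/* CMD9, CSD_STRUCTURE = 1: sector count = (C_SIZE + 1) * 1024. */
uint64_t csd_v2_sector_count(uint32_t c_size)
{
    assert(c_size <= 0x3FFFFFu);        /* C_SIZE is a 22-bit field */
    return ((uint64_t)c_size + 1u) * 1024u;
}
```

A test can apply rca_usable() and a non-zero sector count check immediately after initialisation, which turns the two silent-failure rows above into explicit, reportable errors.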
Verification Strategy:
This project addresses verification at two distinct levels: functional correctness of the SDIO controller integration through C-based testing in simulation, and physical correctness through hardware validation on a PYNQ-Z2 FPGA platform. The downstream design flow — from gate-level netlist through to ASIC implementation — is handled within the main MegaSoC project and is out of scope here.
The verification approach is intentionally layered. Simulation establishes that the software stack and RTL controller interact correctly under ideal conditions. Hardware validation then confirms that the same integration survives contact with a real SD card, a real AXI bus, and real signal timing, which are conditions that behavioural simulation cannot reproduce. Simulation without hardware leaves timing and card variability unverified, while hardware bring-up without prior simulation makes debugging far harder because software bugs and hardware bugs become indistinguishable.
C-based Integration Testing in Simulation
Functional correctness is established through a structured C test suite (sdio_tests.c) compiled against the MegaSoC software stack and executed against the RTL simulation. The test suite exercises fourteen scenarios covering initialisation, card information validity, single block read, read repeatability, single block write with readback, multi-block write with readback, invalid argument handling, a repeated transfer stress test across eight iterations, and corner case verification. Pass criteria are explicit for each test: sdio_init() must return a non-NULL handle, card parameters (OCR, RCA, block size, sector count) must be non-zero and self-consistent, and every write/readback comparison must match byte-for-byte using compare_buffers().
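A byte-for-byte write/readback check of the kind described above can be sketched as follows. The signature of compare_buffers() is not shown in this article, so first_mismatch() and fill_pattern() are hypothetical stand-ins rather than the functions in sdio_tests.c. The pattern depends on the sector number so that a block landing at the wrong address, or corruption at a block boundary, is detectable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 512u

/*
 * Fill nblocks consecutive blocks with data derived from the sector
 * number and byte offset, so each block has distinct content.
 */
void fill_pattern(uint8_t *buf, uint32_t first_sector, uint32_t nblocks)
{
    assert(buf != NULL);
    for (uint32_t b = 0; b < nblocks; b++) {
        for (uint32_t i = 0; i < BLOCK_BYTES; i++) {
            buf[(size_t)b * BLOCK_BYTES + i] =
                (uint8_t)(((first_sector + b) * 31u) ^ (i * 7u) ^ (i >> 8));
        }
    }
}

/*
 * Return -1 if the buffers are identical, otherwise the index of the
 * first differing byte. Reporting the offset shows whether a failure
 * sits on a block boundary.
 */
long first_mismatch(const uint8_t *expected, const uint8_t *actual, size_t len)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < len; i++) {
        if (expected[i] != actual[i])
            return (long)i;
    }
    return -1;
}
```

In a readback test, the expected buffer is regenerated with fill_pattern() and compared against the data returned by the read; a result of 512, for example, would point at the first byte of the second block.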
The simulation model (mdl_sdio.v) was itself modified during this project to correctly respond to ACMD51 (SCR register transfer), CMD6 (SWITCH_FUNC status block), and the CMD24/CMD25 write path with DAT0 busy indication. This is an important limitation to state explicitly: the simulation model was modified while the tests were being run against it. This is not a weakness of the testing methodology; it is a known and documented constraint that motivates the hardware validation phase.
All fourteen test cases currently pass in simulation with zero errors, confirmed by the ** TEST PASSED ** output and Total SDIO errors = 0. The simulation log provides timestamped evidence of each command transaction — CMD17 and CMD24 sector addresses, data patterns at write and read, and CRC results — making the result reproducible and auditable.
Hardware Validation on PYNQ-Z2
For full integration confidence, the design will be validated on a physical PYNQ-Z2 development board connected to a real microSD card. The PYNQ-Z2 carries a Xilinx Zynq-7000 SoC (XC7Z020), which combines Artix-7 FPGA fabric with a hard dual-core ARM Cortex-A9 Processing System. In this project, the FPGA fabric hosts the MegaSoC RTL — including the sdspi controller — while the ARM Cortex-A9 acts as the host processor, running the same C test suite used in simulation. This mirrors the intended MegaSoC deployment architecture closely enough to constitute a meaningful integration test rather than a purely academic exercise.
The hardware validation is structured as five sequential stages, each of which must pass before the next begins. Stage 1 confirms the SDIO controller clock reaches 400 kHz before any command is issued, verified by reading the sd_phy register directly. Stage 2 issues only CMD0 and CMD8 to confirm physical card presence and command line integrity. Stage 3 runs the full sdio_init() sequence, where two specific differences from simulation are expected and will be documented: the ACMD41 poll loop will iterate significantly more times than in simulation, and the SCR register will report 4-bit bus width support, causing ACMD6 to fire for the first time. Stage 4 runs the existing test suite unchanged on real hardware. Stage 5 adds a FatFs layer test (f_mount, f_open, f_write, f_read, f_close) that exercises the complete path from application code through the diskio.c vtable through sdiodrv.c to the physical card and back. Passing Stage 5 closes the stated project goal.
Three specific hardware-vs-simulation differences are anticipated and will be explicitly documented when observed: the 4-bit bus width activation via ACMD6 (SCR bit 2 = 1 on real cards versus 0 in the simulation model), the ACMD41 timing on a cold card versus the single-cycle response of the model, and the byte ordering through the OPT_LITTLE_ENDIAN = 1'b1 AXI path which can only be verified as correct by confirming specific byte values at specific buffer offsets on real hardware. These are not expected failures — they are predicted observations that the verification plan is designed to capture and document.
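The byte-ordering observation can be made concrete by writing a known, asymmetric 32-bit word and classifying how its bytes appear in the readback buffer. The helper below is a hypothetical sketch; it reports which ordering was observed and makes no claim about which one the OPT_LITTLE_ENDIAN path should produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ORDER_LITTLE, ORDER_BIG, ORDER_UNKNOWN } byte_order_t;

/*
 * Classify how `word` is laid out in the four bytes at buf[offset].
 * Use an asymmetric test word such as 0x12345678; a palindromic word
 * (for example 0x11222211) cannot distinguish the two orderings.
 */
byte_order_t classify_word_layout(const uint8_t *buf, size_t offset, uint32_t word)
{
    assert(buf != NULL);
    const uint8_t b0 = (uint8_t)(word & 0xFFu);          /* least significant */
    const uint8_t b1 = (uint8_t)((word >> 8) & 0xFFu);
    const uint8_t b2 = (uint8_t)((word >> 16) & 0xFFu);
    const uint8_t b3 = (uint8_t)((word >> 24) & 0xFFu);  /* most significant */
    const uint8_t *p = buf + offset;

    if (p[0] == b0 && p[1] == b1 && p[2] == b2 && p[3] == b3)
        return ORDER_LITTLE;
    if (p[0] == b3 && p[1] == b2 && p[2] == b1 && p[3] == b0)
        return ORDER_BIG;
    return ORDER_UNKNOWN;
}
```

Recording the classification at a few fixed offsets in both simulation and hardware runs documents the byte ordering as a predicted observation rather than leaving it to be inferred from a failing comparison.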