PTP Hardware Clock: Sub-Nanosecond Timekeeping for Chiplet Systems

Introduction

Precision timekeeping is a foundational service in any distributed system. Whether synchronising Ethernet frames to a PTP grandmaster, timestamping die-to-die packet exchanges between chiplets, or scheduling time-critical hardware events, the system needs a clock that is accurate, capturable at multiple points simultaneously, and adjustable by both hardware servo loops and software without stopping.

The PTP Hardware Clock (PHC) is an IEEE 1588-aligned timestamp counter IP designed for embedded SoC and chiplet applications. It maintains a 110-bit timestamp (48-bit seconds + 30-bit nanoseconds + 32-bit sub-nanosecond fraction), provides four independent capture banks for simultaneous timestamping from different sources, integrates a dual-source hardware servo multiplexer for runtime-selectable clock discipline, and exposes a programmable alarm with pulse-per-second output.

At approximately 10,000 um^2 in TSMC 65nm (including the AHB-to-APB bridge), the PHC is small enough to instantiate on every chiplet in a multi-die system, giving each node its own high-resolution time reference that can be synchronised to a global master via TideLink PTP or IEEE 1588 Ethernet PTP.

This article describes the PHC's architecture, timestamp format, capture and servo mechanisms, register interface, and verification, developed as part of the SoC Labs platform.

Timestamp Architecture

Format

The PHC maintains a continuously incrementing timestamp with three fields:

  ┌──────────────────┬──────────────────────┬──────────────────────────┐
  │ seconds (48-bit) │ nanoseconds (30-bit) │ sub-nanoseconds (32-bit) │
  │ 0 to 2^48 - 1    │  0 to 999,999,999    │ fractional accumulator   │
  └──────────────────┴──────────────────────┴──────────────────────────┘
   ~8.9 million years    1-second range         ~0.23 as (2^-32 ns) granularity

The 48-bit seconds field matches the IEEE 1588 PTP timestamp format. The 30-bit nanoseconds field counts from 0 to 999,999,999, rolling over into the seconds field. The 32-bit sub-nanosecond accumulator provides fractional precision --- its carry-out adds 1 ns to the nanoseconds field.

Per-Cycle Increment

Two registers control how much time advances each clock cycle:

  • NS_INCR (8-bit): Integer nanoseconds added per cycle. Default 4 for a 250 MHz clock (1 cycle = 4 ns).
  • NS_INCR_FRAC (32-bit): Sub-nanosecond fraction added per cycle. The accumulator carries into the nanosecond field when it overflows.

This split representation allows the PHC to track non-integer clock periods. For example, a 200 MHz clock (5.0 ns period) uses NS_INCR=5, NS_INCR_FRAC=0, while a 233.33 MHz clock (4.2857 ns period) uses NS_INCR=4, NS_INCR_FRAC=0x49249249 (the fraction represents 2/7, about 0.2857 ns, in units of 2^-32 ns, so the accumulator carries an extra nanosecond twice in every seven cycles).
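
As a rough illustration of the split representation, the sketch below derives the two increment values from a clock frequency in hertz by computing the period as an unsigned 8.32 fixed-point number of nanoseconds. The function name phc_increment_from_hz and the round-to-nearest choice are assumptions made for this example; they are not part of the PHC driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of phc.h): split a clock period into the
 * integer (NS_INCR) and fractional (NS_INCR_FRAC) per-cycle increments.
 * The period is 10^9 / clk_hz nanoseconds, held in 8.32 fixed point. */
bool phc_increment_from_hz(uint64_t clk_hz, uint8_t *ns_incr,
                           uint32_t *ns_incr_frac)
{
    assert(ns_incr != NULL && ns_incr_frac != NULL);
    if (clk_hz == 0) {
        return false;
    }

    /* 10^9 << 32 is about 4.3e18, which still fits in 64 bits. */
    const uint64_t ns_per_s_q32 = UINT64_C(1000000000) << 32;

    /* Round to nearest so the per-cycle error is at most half an LSB. */
    uint64_t period_q32 = (ns_per_s_q32 + clk_hz / 2) / clk_hz;

    if ((period_q32 >> 32) > UINT8_MAX) {
        return false; /* period does not fit the 8-bit NS_INCR field */
    }

    *ns_incr = (uint8_t)(period_q32 >> 32);
    *ns_incr_frac = (uint32_t)(period_q32 & 0xFFFFFFFFu);
    return true;
}
```

With this sketch, 250 MHz gives NS_INCR = 4 and a zero fraction, 400 MHz gives NS_INCR = 2 and a fraction of 0x80000000 (0.5 ns), and 233333333 Hz gives NS_INCR = 4 with a fraction close to 2/7 ns.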

Carries and Rollover

The increment logic uses a 33-bit adder for the sub-nanosecond accumulator (its top bit is the carry into nanoseconds) and a 31-bit adder for nanoseconds (a second rollover is detected when the sum reaches 1,000,000,000). On second rollover, the seconds field increments, nanoseconds wraps to the residual, and a pulse-per-second (PPS) output asserts for exactly one clock cycle.

  Every cycle:
    sub_ns_sum    = sub_nanoseconds + ns_incr_frac       (33-bit)
    carry         = sub_ns_sum[32]                        (overflow)
    ns_sum        = nanoseconds + ns_incr + carry         (31-bit)

    if (ns_sum >= 1,000,000,000):
        seconds      <= seconds + 1
        nanoseconds  <= ns_sum - 1,000,000,000
        pps          <= 1                                 (one cycle pulse)
    else:
        nanoseconds  <= ns_sum
        pps          <= 0
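
The same per-cycle update can be sketched as a small C model, which is handy for checking increment settings before touching hardware. This is a behavioural approximation of the pseudocode above, not the RTL; the type and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHC_NS_PER_SEC 1000000000u

/* Software model of the timestamp state; field names are illustrative. */
typedef struct {
    uint64_t seconds;      /* only the low 48 bits are meaningful */
    uint32_t nanoseconds;  /* 0 .. 999,999,999 */
    uint32_t sub_ns;       /* 32-bit fractional accumulator */
    bool     pps;          /* set by the tick that rolls the second over */
} phc_model_t;

/* Advance the model by one clock cycle. */
void phc_model_tick(phc_model_t *m, uint8_t ns_incr, uint32_t ns_incr_frac)
{
    assert(m != NULL && m->nanoseconds < PHC_NS_PER_SEC);

    /* 33-bit sum: bit 32 is the carry into the nanoseconds field. */
    uint64_t sub_ns_sum = (uint64_t)m->sub_ns + ns_incr_frac;
    uint32_t carry = (uint32_t)(sub_ns_sum >> 32);
    m->sub_ns = (uint32_t)sub_ns_sum;

    /* At most 999,999,999 + 255 + 1, so this cannot overflow 32 bits. */
    uint32_t ns_sum = m->nanoseconds + ns_incr + carry;

    if (ns_sum >= PHC_NS_PER_SEC) {
        m->seconds = (m->seconds + 1) & UINT64_C(0xFFFFFFFFFFFF);
        m->nanoseconds = ns_sum - PHC_NS_PER_SEC;  /* keep the residual */
        m->pps = true;
    } else {
        m->nanoseconds = ns_sum;
        m->pps = false;
    }
}
```

Starting from zero with NS_INCR = 4 and NS_INCR_FRAC = 0x80000000, two ticks of this model advance nanoseconds to 9: 4 ns per tick plus one carry from the fraction.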

Four Capture Banks

The PHC provides four independent capture banks, each atomically snapshotting the full 110-bit timestamp on its respective trigger. All captures are combinational --- they complete in a single clock cycle with no bus stalling or multi-cycle sequencing.

                              ┌──────────────┐
  CTRL[2] (APB write) ───────►│  SW Capture  │──► CAP_SECONDS/NS/FRAC (0x020)
                              ├──────────────┤
  hw_capture (external) ─────►│  HW Capture  │──► HW_CAP_SECONDS/NS/FRAC (0x040)
                              ├──────────────┤
  eth_rx_capture ────────────►│ ETH RX Cap   │──► ETH_RX_CAP_SECONDS/NS/FRAC (0x060)
                              ├──────────────┤
  eth_tx_capture ────────────►│ ETH TX Cap   │──► ETH_TX_CAP_SECONDS/NS/FRAC (0x080)
                              └──────────────┘

Software Capture

Triggered by writing CAPTURE=1 (W1S) to the CTRL register via APB. The current timestamp is latched into the CAP_* registers, which firmware then reads. Used for periodic status checks, synchronisation verification, and diagnostic snapshots.
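
At register level, a software capture might look like the sketch below. The offsets and bit positions come from the register map in this article; the macro names, the phc_capture_t type and the choice to write back only the EN bit alongside CAPTURE are assumptions for illustration, and the real driver (phc.c) may do this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets from the register map; the macro names are illustrative. */
#define PHC_CTRL            0x000u
#define PHC_CAP_SECONDS_LO  0x020u
#define PHC_CAP_SECONDS_HI  0x024u
#define PHC_CAP_NANOSECONDS 0x028u
#define PHC_CAP_NS_FRAC     0x02Cu
#define PHC_CTRL_EN         (1u << 0)
#define PHC_CTRL_CAPTURE    (1u << 2)   /* write-1-to-set */

typedef struct {
    uint64_t seconds;      /* 48-bit */
    uint32_t nanoseconds;  /* 30-bit */
    uint32_t sub_ns;       /* 32-bit fraction */
} phc_capture_t;

/* Trigger a software capture and read back the latched timestamp.
 * 'regs' points at the start of the PHC register block. */
void phc_sw_capture(volatile uint32_t *regs, phc_capture_t *out)
{
    assert(regs != NULL && out != NULL);

    /* Assumption: preserve EN and avoid re-writing SET_TIME by masking. */
    uint32_t ctrl = regs[PHC_CTRL / 4] & PHC_CTRL_EN;
    regs[PHC_CTRL / 4] = ctrl | PHC_CTRL_CAPTURE;

    /* The CAP_* registers hold the snapshot until the next capture. */
    uint32_t lo = regs[PHC_CAP_SECONDS_LO / 4];
    uint32_t hi = regs[PHC_CAP_SECONDS_HI / 4] & 0xFFFFu;
    out->seconds = ((uint64_t)hi << 32) | lo;
    out->nanoseconds = regs[PHC_CAP_NANOSECONDS / 4] & 0x3FFFFFFFu;
    out->sub_ns = regs[PHC_CAP_NS_FRAC / 4];
}
```

Because the snapshot is latched as a whole, the four reads can be spread over several bus cycles without the seconds and nanoseconds words coming from different instants.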

Hardware Capture

Triggered by an external hw_capture pulse, typically from the TideLink PTP autonomous servo. The captured timestamp appears in the HW_CAP_* registers, readable by both the local APB interface and the external servo source. This enables hardware servo loops to read timestamps without any APB bus transaction, eliminating software latency from the synchronisation loop.

Ethernet PTP Captures

Two dedicated capture banks for IEEE 1588 PTP frame timestamping:

  • RX Capture: Triggered by eth_rx_capture when a PTP frame (EtherType 0x88F7) is received on an Ethernet MAC's MII interface. Records the ingress timestamp (t2 or t4 in PTP terminology).
  • TX Capture: Triggered by eth_tx_capture when a PTP frame is transmitted. Records the egress timestamp (t1 or t3).

These triggers come from the PTP event detector in the Ethernet Subsystem, which parses MII nibbles at line speed and generates single-cycle pulses on PTP frame detection.
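
For context, the four timestamps feed the standard IEEE 1588 delay request-response calculation. The sketch below shows that arithmetic with timestamps flattened to signed nanoseconds; the type and function names are illustrative, and it assumes a symmetric path delay, as the basic PTP calculation does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timestamps in nanoseconds since a common epoch.
 * t1: master TX of Sync,      t2: slave RX of Sync,
 * t3: slave TX of Delay_Req,  t4: master RX of Delay_Req. */
typedef struct {
    int64_t t1, t2, t3, t4;
} ptp_exchange_t;

/* Slave-minus-master clock offset, assuming a symmetric path. */
int64_t ptp_offset_ns(const ptp_exchange_t *x)
{
    assert(x != NULL);
    return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* One-way path delay under the same symmetry assumption. */
int64_t ptp_mean_path_delay_ns(const ptp_exchange_t *x)
{
    assert(x != NULL);
    return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}
```

On a subordinate node the RX capture bank supplies t2 and the TX capture bank supplies t3; t1 and t4 arrive from the master in the PTP messages themselves.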

Capture Independence

All four banks operate independently. A hardware capture can occur simultaneously with an Ethernet RX capture without either being lost or corrupted. Each bank has its own set of registers, and the snapshot logic is purely combinational (a parallel register load), so there is no arbitration or priority between captures.
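
A behavioural sketch of that independence: in each cycle, every bank whose trigger is asserted loads the current timestamp, and the others hold their previous value. The names are invented for the example and the trigger bit order is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bank indices; the ordering is an assumption made for this model. */
enum {
    PHC_BANK_SW,
    PHC_BANK_HW,
    PHC_BANK_ETH_RX,
    PHC_BANK_ETH_TX,
    PHC_NUM_BANKS
};

typedef struct {
    uint64_t seconds;
    uint32_t nanoseconds;
    uint32_t sub_ns;
} phc_snapshot_t;

/* One clock cycle of capture logic. Bit i of trigger_mask is the trigger
 * for bank i. There is no arbitration: each bank decides on its own. */
void phc_capture_cycle(phc_snapshot_t banks[PHC_NUM_BANKS],
                       const phc_snapshot_t *now, unsigned trigger_mask)
{
    assert(banks != NULL && now != NULL);
    for (int i = 0; i < PHC_NUM_BANKS; i++) {
        if (trigger_mask & (1u << i)) {
            banks[i] = *now;  /* whole timestamp copied in one step */
        }
    }
}
```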

Dual-Source Hardware Servo

The PHC integrates a two-source servo multiplexer, allowing the clock to be disciplined from either of two external sources at runtime:

  ┌───────────────────────┐
  │  Source 0 (TideLink)  │──┐
  │  hw_capture_0         │  │      ┌──────────┐
  │  hw_set_time_0        │  ├─────►│ Servo    │──► Clock Core
  │  hw_adj_ns_incr_frac_0│  │  ┌──►│   Mux    │   (set_time, adj_frac)
  └───────────────────────┘  │  │   │          │
                             │  │   │ SRC_SEL  │◄── SERVO_CTRL[0]
  ┌───────────────────────┐  │  │   └──────────┘
  │  Source 1 (HA1588)    │──┘  │
  │  hw_capture_1         │─────┘
  │  hw_set_time_1        │
  │  hw_adj_ns_incr_frac_1│
  └───────────────────────┘

Source 0: TideLink PTP

The default source. The TideLink PTP subsystem performs autonomous inter-chiplet time synchronisation using hardware-timestamped PTP exchanges over the die-to-die link. It drives hw_capture_0 to snapshot the local time, reads back the captured values, computes offset and delay, and applies corrections via hw_set_time_0 (phase step) or hw_adj_ns_incr_frac_0 (frequency steering).
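
Frequency steering amounts to scaling the per-cycle increment by a small ratio. The sketch below shows the arithmetic for an adjustment expressed in parts per billion, applied to the increment held as an unsigned 8.32 fixed-point value (NS_INCR in the upper bits, NS_INCR_FRAC in the lower 32). The function name, the ppb interface and the ±1000 ppm limit are assumptions for illustration; the article does not describe how the TideLink servo computes its corrections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHC_MAX_ADJ_PPB 1000000  /* +/-1000 ppm keeps the product below 2^63 */

/* Scale a nominal increment (8.32 fixed-point ns) by (1 + ppb / 10^9).
 * Positive ppb makes the clock run faster. Hypothetical helper. */
bool phc_steer_increment(uint64_t nominal_q32, int32_t ppb,
                         uint64_t *adjusted_q32)
{
    assert(adjusted_q32 != NULL);
    if (nominal_q32 >= (UINT64_C(1) << 40) ||
        ppb > PHC_MAX_ADJ_PPB || ppb < -PHC_MAX_ADJ_PPB) {
        return false;
    }

    /* |nominal * ppb| < 2^40 * 10^6 < 2^63, so int64_t cannot overflow.
     * Integer division truncates toward zero. */
    int64_t delta = ((int64_t)nominal_q32 * ppb) / 1000000000;
    int64_t adjusted = (int64_t)nominal_q32 + delta;

    if (adjusted < 0 || adjusted >= (INT64_C(1) << 40)) {
        return false;  /* result no longer fits the 8.32 format */
    }
    *adjusted_q32 = (uint64_t)adjusted;
    return true;
}
```

For a 250 MHz nominal increment (4 ns, zero fraction), +100 ppb adds 1717 LSBs to the fraction. A negative adjustment applied to a zero nominal fraction borrows from the integer part, so the integer increment changes as well.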

Source 1: HA1588 Hardware Servo

An alternative source for Ethernet-based time synchronisation. The HA1588 servo module (in the Ethernet Subsystem) periodically compares the HA1588 RTC time to the PHC time and applies corrections. This is used when the PHC should be disciplined from the Ethernet PTP grandmaster rather than from a neighbouring chiplet.

Runtime Switching

The servo source is selected by SERVO_CTRL[0] (SRC_SEL). Switching is immediate (combinational mux) --- the clock continues running without interruption. Both sources always observe the same hw_cap_* outputs, so a source can monitor the clock even when it is not the active disciplining source.

Fractional Increment Arbitration

The NS_INCR_FRAC register can be written by both the APB interface (software) and the active hardware servo (hw_adj_valid). The PHC uses a last-writer-wins protocol: when the hardware servo asserts hw_adj_valid, it takes ownership of the fractional increment; when software writes a new value to NS_INCR_FRAC via APB, software reclaims ownership. This prevents the hardware servo's adjustments from being silently overwritten by stale software values, while still allowing software to override the servo when needed.
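
One way to picture the last-writer-wins rule is as an ownership flag that selects between a software-held value and a servo-held value. The model below is a sketch under that assumption; the article does not describe the internal structure, and the tie-break when both writers act in the same cycle (software wins here) is a guess made only so the model is fully defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FRAC_OWNER_SW, FRAC_OWNER_HW } frac_owner_t;

typedef struct {
    uint32_t     sw_value;  /* last value written over APB */
    uint32_t     hw_value;  /* last value driven with hw_adj_valid */
    frac_owner_t owner;     /* who wrote most recently */
} frac_arbiter_t;

/* Apply the write events of one clock cycle. */
void frac_arbiter_cycle(frac_arbiter_t *a,
                        bool apb_write, uint32_t apb_data,
                        bool hw_adj_valid, uint32_t hw_data)
{
    assert(a != NULL);
    if (hw_adj_valid) {
        a->hw_value = hw_data;
        a->owner = FRAC_OWNER_HW;
    }
    if (apb_write) {
        /* Assumption: an APB write in the same cycle takes precedence. */
        a->sw_value = apb_data;
        a->owner = FRAC_OWNER_SW;
    }
}

/* Fractional increment actually used by the clock core. */
uint32_t frac_arbiter_effective(const frac_arbiter_t *a)
{
    assert(a != NULL);
    return (a->owner == FRAC_OWNER_HW) ? a->hw_value : a->sw_value;
}
```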

Alarm and PPS

Programmable Alarm

The PHC includes a time-of-day alarm comparator:

  1. Load the target time into ALARM_SECONDS_LO/HI and ALARM_NANOSECONDS
  2. Set ALARM_CTRL[0] = ARM to enable the comparator
  3. When (seconds == alarm_seconds) && (nanoseconds >= alarm_nanoseconds), alarm_hit asserts
  4. If ALARM_CTRL[1] = AUTO_DISARM, the alarm automatically disarms after firing (one-shot mode)

The alarm drives STATUS[2] and the alarm_irq output (gated by INT_EN[1]). Nanosecond-granularity alarm resolution enables precise hardware event scheduling.
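
The comparator condition in step 3 can be sketched as a per-cycle function. Two consequences follow directly from that condition: without AUTO_DISARM the match holds for every remaining cycle of the target second, and an alarm armed after the target second has passed never fires. The structure and names below are illustrative, not the RTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t alarm_seconds;      /* ALARM_SECONDS_HI:LO */
    uint32_t alarm_nanoseconds;  /* ALARM_NANOSECONDS */
    bool     armed;              /* ALARM_CTRL[0] */
    bool     auto_disarm;        /* ALARM_CTRL[1] */
} phc_alarm_t;

/* Evaluate the comparator for one clock cycle.
 * Returns true in cycles where alarm_hit would assert. */
bool phc_alarm_cycle(phc_alarm_t *a, uint64_t seconds, uint32_t nanoseconds)
{
    assert(a != NULL);
    bool hit = a->armed &&
               seconds == a->alarm_seconds &&
               nanoseconds >= a->alarm_nanoseconds;

    if (hit && a->auto_disarm) {
        a->armed = false;  /* one-shot: fire once, then disarm */
    }
    return hit;
}
```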

Pulse-Per-Second

The PPS output asserts for exactly one clock cycle on every second rollover, in the cycle where nanoseconds wraps to its residual (within one clock period of nanoseconds = 0). It drives:

  • The pps_out top-level output (for external test equipment or oscilloscope triggering)
  • A sticky status bit STATUS[1] (cleared on register read)
  • The pps_irq interrupt (gated by INT_EN[0])

Register Map

Core Configuration (0x000--0x01C)

  Offset  Name             Access  Description
  0x000   CTRL             RW      [0] EN, [1] SET_TIME (W1S), [2] CAPTURE (W1S)
  0x004   STATUS           RO      [0] RUNNING, [1] PPS (sticky, clear-on-read), [2] ALARM_HIT
  0x008   NS_INCR          RW      [7:0] Integer ns per cycle (default 4)
  0x00C   NS_INCR_FRAC     RW      [31:0] Fractional sub-ns per cycle
  0x010   SET_SECONDS_LO   RW      [31:0] Seconds to load (lower)
  0x014   SET_SECONDS_HI   RW      [15:0] Seconds to load (upper)
  0x018   SET_NANOSECONDS  RW      [29:0] Nanoseconds to load
  0x01C   INT_EN           RW      [0] PPS_IRQ_EN, [1] ALARM_IRQ_EN

Software Capture + Alarm (0x020--0x03C)

  Offset  Name               Access  Description
  0x020   CAP_SECONDS_LO     RO      Captured seconds (lower)
  0x024   CAP_SECONDS_HI     RO      Captured seconds (upper)
  0x028   CAP_NANOSECONDS    RO      Captured nanoseconds
  0x02C   CAP_NS_FRAC        RO      Captured sub-nanoseconds
  0x030   ALARM_SECONDS_LO   RW      Alarm target seconds (lower)
  0x034   ALARM_SECONDS_HI   RW      Alarm target seconds (upper)
  0x038   ALARM_NANOSECONDS  RW      Alarm target nanoseconds
  0x03C   ALARM_CTRL         RW      [0] ARM, [1] AUTO_DISARM

Hardware Capture (0x040--0x04C)

  Offset         Name      Access  Description
  0x040--0x04C   HW_CAP_*  RO      Hardware-captured seconds, nanoseconds, sub-ns

Ethernet PTP Captures (0x060--0x08C)

  Offset         Name          Access  Description
  0x060--0x06C   ETH_RX_CAP_*  RO      Ethernet RX PTP capture
  0x080--0x08C   ETH_TX_CAP_*  RO      Ethernet TX PTP capture

Servo Configuration (0x0A0--0x0A8)

  Offset  Name           Access  Description
  0x0A0   SERVO_CTRL     RW      [0] SRC_SEL (0=TideLink, 1=HA1588), [1] HA1588_SERVO_EN
  0x0A4   SYNC_INTERVAL  RW      [29:0] Sync period in nanoseconds (default ~1 Hz)
  0x0A8   SERVO_STATUS   RO      [0] LOCKED, [1] PHASE_STEP_ACTIVE
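
For firmware that accesses the registers directly, the map can be captured in a small header. The sketch below is hand-written from the tables in this section: the macro names are illustrative and are not those of the generated header, and it assumes the hardware and Ethernet capture banks use the same four-word layout as the software bank (seconds low, seconds high, nanoseconds, fraction).

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets from the register map (names are illustrative). */
#define PHC_REG_CTRL              0x000u
#define PHC_REG_STATUS            0x004u
#define PHC_REG_NS_INCR           0x008u
#define PHC_REG_NS_INCR_FRAC      0x00Cu
#define PHC_REG_SET_SECONDS_LO    0x010u
#define PHC_REG_SET_SECONDS_HI    0x014u
#define PHC_REG_SET_NANOSECONDS   0x018u
#define PHC_REG_INT_EN            0x01Cu
#define PHC_REG_ALARM_SECONDS_LO  0x030u
#define PHC_REG_ALARM_SECONDS_HI  0x034u
#define PHC_REG_ALARM_NANOSECONDS 0x038u
#define PHC_REG_ALARM_CTRL        0x03Cu
#define PHC_REG_SERVO_CTRL        0x0A0u
#define PHC_REG_SYNC_INTERVAL     0x0A4u
#define PHC_REG_SERVO_STATUS      0x0A8u

/* Capture banks start at 0x020 and sit 0x20 bytes apart. */
typedef enum {
    PHC_BANK_SW     = 0,  /* 0x020 */
    PHC_BANK_HW     = 1,  /* 0x040 */
    PHC_BANK_ETH_RX = 2,  /* 0x060 */
    PHC_BANK_ETH_TX = 3   /* 0x080 */
} phc_bank_t;

/* Field offsets inside a bank (assumed identical for all four banks). */
#define PHC_CAP_SECONDS_LO  0x0u
#define PHC_CAP_SECONDS_HI  0x4u
#define PHC_CAP_NANOSECONDS 0x8u
#define PHC_CAP_NS_FRAC     0xCu

/* Byte offset of one capture register. */
uint32_t phc_capture_reg(phc_bank_t bank, uint32_t field)
{
    assert((unsigned)bank <= 3u && field <= 0xCu && (field % 4u) == 0u);
    return 0x020u + 0x20u * (uint32_t)bank + field;
}
```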

Hardware Architecture

The PHC is structured as four SystemVerilog modules:

  PHC_AHB (top-level, AHB slave)
  └── cmsdk_ahb_to_apb (CMSDK bridge)
       └── phc (APB top-level)
            ├── phc_apb_regs (register interface, alarm, interrupts)
            └── phc_clock_core (timestamp counter, 4 capture banks, PPS)
                 └── servo source mux (combinational, in phc.sv)

  Module             Lines  Description
  phc_clock_core.sv  235    Timestamp counter, increment logic, 4 capture banks, PPS
  phc_apb_regs.sv    403    APB register decode, alarm comparator, interrupt gating
  phc.sv             323    APB top-level, servo source mux, signal routing
  PHC_AHB.sv         211    AHB-to-APB bridge wrapper

Estimated area (TSMC 65nm): ~10,000 um^2 including the CMSDK AHB-to-APB bridge. The clock core alone is approximately 3,000--4,000 um^2.

Software Driver

A CMSIS-compliant C driver (phc.h, phc.c) provides the firmware interface:

phc_t clock;
phc_init(&clock, PHC_BASE_ADDR);

// Configure for 250 MHz
phc_set_increment(&clock, 4, 0x00000000);
phc_enable(&clock);

// Set initial time
phc_timestamp_t t = {.seconds_lo = 1000, .nanoseconds = 0};
phc_set_time(&clock, &t);

// Read current time via software capture
phc_capture(&clock);
phc_timestamp_t now;
phc_read_capture(&clock, &now);

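// Set alarm 5 seconds after the captured time. Copying 'now' keeps the
// captured nanoseconds; a carry out of seconds_lo is not handled here.
phc_timestamp_t alarm = now;
alarm.seconds_lo += 5;
phc_set_alarm(&clock, &alarm);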
phc_arm_alarm(&clock);
phc_enable_alarm_irq(&clock);

// Select TideLink PTP as servo source
phc_set_servo_source(&clock, 0);

Register definitions are auto-generated from SystemRDL (phc_apb_regs.rdl to phc_apb_regs.generated.h).

Integration

In the Ethernet-NanoSoClet

The PHC sits on the base tier's AHB interconnect, accessible to the host Cortex-M0 and via the external slave ports. It integrates with both the Ethernet subsystem and TideLink:

  • Source 0 (TideLink PTP): The TideLink PTP subsystem drives hw_capture_0 during autonomous SYNC/DELAY_REQ exchanges, reads the captured time, computes offset, and applies corrections via hw_adj_ns_incr_frac_0. This runs entirely in hardware --- no firmware involvement after initial configuration.
  • Source 1 (HA1588): The HA1588 hardware servo in the Ethernet subsystem periodically compares the HA1588 RTC time to the PHC and applies corrections. This disciplines the PHC from the Ethernet network's PTP grandmaster.
  • Ethernet captures: The PTP event detector in the Ethernet subsystem drives eth_rx_capture and eth_tx_capture on PTP frame detection, latching timestamps into the dedicated Ethernet capture banks.

Time Distribution Hierarchy

In a multi-chiplet system with Ethernet:

  Ethernet PTP Grandmaster
         │ (IEEE 1588 over MII)
         ▼
  HA1588 RTC (Ethernet-NanoSoClet)
         │ (HA1588 servo, Source 1)
         ▼
  PHC (Ethernet-NanoSoClet, Grandmaster)
         │ (TideLink PTP, Source 0)
         ▼
  PHC (Codec-NanoSoClet, Subordinate)
         │ (TideLink PTP, Source 0)
         ▼
  PHC (Sensor-NanoSoClet, Subordinate)

Each hop adds the TideLink PTP jitter budget (< 10 ns hardware, dependent on software servo loop performance). End-to-end accuracy from the Ethernet grandmaster to the leaf chiplet depends on the Ethernet PTP performance (typically sub-microsecond) plus the accumulated TideLink PTP jitter.

Verification

The PHC is verified through 99 cocotb tests across four hierarchy levels:

  Level  Testbench       Tests  Focus
  1      phc_clock_core  21     Timestamp counter, increment, rollover, PPS, captures
  2      phc_apb_regs    46     Register read/write, alarm, interrupts, defaults
  3      phc             20     Integration: servo mux, cross-module interaction
  4      PHC_AHB         12     AHB bridge, C driver, end-to-end register access

Additional verification:

  • Cross-IP tests (in ethernet-mac-ahb): 4 tests for end-to-end PTP sync from Ethernet MAC to PHC, 10 tests for HA1588 servo CDC handshake
  • Formal: X-propagation analysis on all modules
  • CDC: SpyGlass single-clock validation (PHC core is fully synchronous)
  • CI pipeline: Lint, regression, synthesis (DC + RTLA, TSMC 65nm), coverage merge, dashboard generation

Key Test Scenarios

  • Fractional carry accumulation: Verify that NS_INCR_FRAC=0x80000000 (0.5 ns) produces a carry every other cycle
  • Second rollover and PPS: Set nanoseconds near 999,999,999, verify rollover, PPS pulse width (exactly 1 cycle), and seconds increment
  • Atomic capture: Trigger capture mid-rollover and verify the snapshot is consistent (no partial update)
  • Servo source switching: Switch from source 0 to source 1 mid-operation and verify the clock continues without glitch
  • Last-writer-wins: Hardware servo writes NS_INCR_FRAC, then APB writes a different value --- verify APB wins and retains ownership
  • Alarm one-shot: Arm alarm with AUTO_DISARM, verify it fires once and automatically disarms

Performance

  Metric                   Value
  Timestamp precision      ~0.23 as (2^-32 ns, 32-bit sub-nanosecond)
  Maximum time range       ~8.9 million years (48-bit seconds)
  PPS accuracy             Exactly 1 clock cycle, aligned to ns=0
  Capture latency          0 cycles (combinational snapshot)
  Servo switch latency     0 cycles (combinational mux)
  Register access latency  2 APB cycles (via CMSDK bridge)
  Alarm comparison         Combinational (same cycle as match)
  Clock frequency range    200--250 MHz (configurable increment)


 

Comments

Some work is needed to understand how this IP acts as the coordinating system clock in a reference design for chiplet-based systems within an Arm-based ecosystem.

The Arm Chiplet System Architecture defines two System Types, a Hub based model and a decentralised compute fabric based model within which time must be coordinated. 

In Chapter 6, the role of the System Counter is defined. The system counter is specified by the Arm Architecture Reference Manual as a sub-component of the Generic Timer, a necessary component in an Arm system. It measures the passing of time in real-time, providing a uniform view of system time to all components in the Arm system that require it. This includes, in addition to the Generic Timer, trace timestamps and RAS telemetry.

In a system that is composed of chiplets the system counter is distributed over multiple chiplets. There is a primary system counter in a single chiplet that provides the source of the count, and secondary system counters in all other chiplets that require a view of system time. These secondary system counters are used for local distribution of the system count within their respective chiplet.

A system counter meets the requirements as specified by the Arm ARM Generic Timer and by the Arm Base System Architecture Clock and Timer Subsystem. 

There are separate Arm Architecture Reference Manuals for M-class and A-class implementations.

The NanoSoC reference design is an M-class architecture. That architecture states:

The system timer, SysTick 

Generated by the SysTick timer that is an integral component of an Armv7-M processor. SysTick is permanently enabled. An Armv7-M implementation must include a system timer, SysTick, that provides a simple, 24-bit clear-on-write, decrementing, wrap-on-zero counter with a flexible control mechanism.

The timer is clocked by a reference clock. Whether the reference clock is the processor clock or an external clock source is implementation defined. If an implementation uses an external clock, it must document the relationship between the processor clock and the external reference. This is required for system timing calibration, taking account of metastability, clock skew and jitter.

Global timestamping 

When an implementation includes global timestamping, the ITM includes an external ITM timestamp interface, providing a 48-bit or 64-bit global timestamp count value and a clock change signal that the system asserts if there is a change in the ratio between the global timestamp clock frequency and the processor clock frequency.

This project will need to consider the interaction between the various time elements within an Arm based Chiplet system.

 

Project Creator
David Mapstone

SoC Labs Team at University of Southampton
