
key2/amaranth-pcie

amaranth-pcie

PCIe controller for Amaranth HDL — a complete PCIe endpoint stack with TLP processing, crossbar routing, DMA, and Wishbone/AXI bridges.

Overview

amaranth-pcie is a port of LitePCIe to the Amaranth HDL ecosystem. It provides a layered PCIe endpoint architecture:

┌─────────────────────────────────────────────────────────────────┐
│                        User Logic                               │
│   ┌──────────┐  ┌──────────────┐  ┌──────────────┐             │
│   │ Wishbone │  │   PCIeDMA    │  │  PCIeAXI     │             │
│   │  Bridge  │  │ (SG + R/W)   │  │   Slave      │             │
│   └────┬─────┘  └──────┬───────┘  └──────┬───────┘             │
│        │               │                 │        Frontend      │
├────────┴───────────────┴─────────────────┴──────────────────────┤
│                      PCIeCrossbar                               │
│   Slave ports (Host→FPGA)    Master ports (FPGA→Host)          │
│        │                          │               Core          │
├────────┴──────────────────────────┴─────────────────────────────┤
│              TLP Depacketizer / Packetizer                       │
│                    TLPController                    TLP          │
├─────────────────────────────────────────────────────────────────┤
│                    PCIe PHY                                      │
│   SimPCIePHY │ S7PCIEPHY (Xilinx) │ GW5ASTPCIePHY (Gowin) PHY  │
└─────────────────────────────────────────────────────────────────┘

Supported Platforms

| Vendor | Device | PHY Class | PCIe Gen | Max Lanes | Status |
|---|---|---|---|---|---|
| Xilinx | 7-Series (Artix-7, Kintex-7, Virtex-7) | S7PCIEPHY | Gen1, Gen2 | x8 | Ported from LitePCIe |
| Gowin | GW5AST-138 (FCPBGA676A) | GW5ASTPCIePHY | Gen2, Gen3 | x4 | New |
| Simulation | | SimPCIePHY | | | For testing |

Installation

# From a checkout of the monorepo, install with PDM
cd amaranth-pcie/
pdm install

# Install with dev dependencies (pytest)
pdm install -G dev

Dependencies

| Package | Description |
|---|---|
| amaranth | Amaranth HDL core |
| amaranth-soc | SoC components (Wishbone, CSR) |
| amaranth-stream | Stream infrastructure (FIFO, arbiter, CDC, etc.) |
| amaranth-lib | Generic DMA engine and utilities |

Quick Start

"""Minimal PCIe endpoint with Wishbone bridge and DMA."""
from amaranth import *
from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster
from amaranth_pcie.frontend.dma import PCIeDMA

class MyPCIeDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()

        # 1. Create PHY (simulation for testing)
        m.submodules.phy = phy = SimPCIePHY(data_width=64)

        # 2. Create endpoint (connects PHY ↔ TLP ↔ Crossbar)
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint

        # 3. Add Wishbone bridge for host CSR access
        m.submodules.wb = wb = PCIeWishboneMaster(endpoint)

        # 4. Add DMA with loopback for testing
        m.submodules.dma = dma = PCIeDMA(
            endpoint, with_loopback=True,
        )

        # 5. Connect user logic to DMA streams
        # dma.source = data from host (reader output)
        # dma.sink   = data to host (writer input)

        return m

Architecture

Layer Diagram

PHY Layer          TLP Layer              Core Layer           Frontend Layer
─────────          ─────────              ──────────           ──────────────
                   ┌──────────────┐
SimPCIePHY ──────► │TLPDepacketizer│──► ┌──────────────┐    ┌──────────────────┐
S7PCIEPHY          │              │    │              │    │ PCIeWishboneMaster│
GW5ASTPCIePHY      │              │    │              │    │                    │
                   │  (PHY→typed) │    │  PCIeCrossbar│◄──►│ PCIeWishboneSlave │
                   └──────────────┘    │              │    │ PCIeDMA           │
                   ┌──────────────┐    │  (routing)   │    │ PCIeAXISlave      │
              ◄──  │TLPPacketizer │◄── │              │    └──────────────────┘
                   │              │    └──────────────┘
                   │  (typed→PHY) │         ▲
                   └──────────────┘         │
                   ┌──────────────┐    ┌────┴─────┐
                   │TLPController │◄──►│PCIeEndpoint│
                   │ (tag mgmt)   │    │ (assembly)│
                   └──────────────┘    └──────────┘

PHY Feature Comparison

| Feature | SimPCIePHY | S7PCIEPHY | GW5ASTPCIePHY |
|---|---|---|---|
| Vendor | | Xilinx | Gowin |
| Target device | Simulation | 7-Series | GW5AST-138 |
| PCIe Gen | | Gen1/Gen2 | Gen2/Gen3 |
| Max lanes | | x8 | x4 |
| Data widths | 64, 128, 256 | 64, 128 | 64, 128, 256 |
| Native data width | user-selected | 64 or 128 | 256 (IP-fixed) |
| Endianness | "big" | "big" | "big" |
| Shared channel | False | True | False |
| LTSSM width | 6 (default) | 6 | 5 |
| Clock domain | sync | pcie (IP-generated) | user-provided (sync) |
| CDC required | No | Yes (sync↔pcie) | No |
| Width conversion | No | Optional (64↔128) | Optional (64/128↔256) |
| BAR hit | No | No | Yes (bar_hit in RX stream) |
| TX credits | No | No | Yes (extended: header+data counts) |
| MSI interface | Basic (valid/ready) | Basic (valid/ready) | Extended (ack/status/msinum) |
| Config access (DRP) | No | No | Yes (read/write) |
| IP generation | N/A | External (Vivado) | Automatic (gowin_pcie_gen.py) |
| LTSSM tracer | No | No | Yes (built-in) |

Shared vs. Separate Channel Modes

The endpoint supports two PHY channel modes:

Shared channel (default) — single sink/source on PHY:

PHY.source → Depacketizer → {req_source → Crossbar.phy_slave_sink,
                              cmp_source → Crossbar.phy_master_sink}
{Crossbar.phy_slave_source → Packetizer.cmp_sink,
 Crossbar.phy_master_source → Packetizer.req_sink} → PHY.sink

Separate channels (req_source/cmp_source and req_sink/cmp_sink on PHY):

PHY.req_source → req_depacketizer → Crossbar.phy_slave_sink
PHY.cmp_source → cmp_depacketizer → Crossbar.phy_master_sink
Crossbar.phy_slave_source → cmp_packetizer → PHY.cmp_sink
Crossbar.phy_master_source → req_packetizer → PHY.req_sink

| PHY | shared_channel | Description |
|---|---|---|
| SimPCIePHY | False | Independent TX/RX |
| S7PCIEPHY | True | Xilinx 7-Series shared channel |
| GW5ASTPCIePHY | False | Gowin GW5AST separate request/completion paths |

API Reference — Common

amaranth_pcie/common.py

Size Helpers

| Constant | Value | Description |
|---|---|---|
| KB | 1024 | Kilobyte |
| MB | 1024² | Megabyte |
| GB | 1024³ | Gigabyte |

get_bar_mask(size)

Compute the BAR mask for a given BAR size in bytes (must be power of two).

from amaranth_pcie.common import get_bar_mask, MB
mask = get_bar_mask(1 * MB)  # → 0xFFF00000
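
For intuition, the value in the example above is what you get by clearing the address bits that index into the BAR. The following plain-Python sketch reproduces it; `bar_mask_sketch` is a hypothetical name, and both the formula and the 32-bit truncation are assumptions inferred from the documented result, not the library's source.

```python
def bar_mask_sketch(size: int, address_width: int = 32) -> int:
    """Return a mask with the BAR offset bits cleared (illustrative only)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a power of two")
    full = (1 << address_width) - 1
    # size - 1 has ones exactly in the offset bits; invert and truncate.
    return full & ~(size - 1)


print(hex(bar_mask_sketch(1024 * 1024)))  # 0xfff00000
```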

Stream Signatures

All stream signatures except msi_signature() use amaranth_stream.Signature with has_first_last=True for packet framing.

| Function | Payload Fields | Description |
|---|---|---|
| phy_signature(data_width) | dat, be | PHY-level raw data + byte enables |
| request_signature(data_width, address_width=32) | req_id, we, adr, len, tag, dat, channel, user_id | Memory read/write request |
| completion_signature(data_width, address_width=32) | req_id, cmp_id, adr, len, end, err, tag, dat, channel, user_id | Completion |
| configuration_signature(data_width) | req_id, we, bus_number, device_no, func, ext_reg, register_no, tag, dat, channel | Configuration request |
| ptm_signature(data_width) | request, response, requester_id, length, message_code, master_time, dat, channel | PTM |
| msi_signature() | dat(8) | MSI interrupt (no framing) |
| dma_signature(data_width) | payload(data_width) | DMA data with first/last |

Payload Layouts

| Function | Returns | Key Fields |
|---|---|---|
| phy_layout(data_width) | StructLayout | dat(data_width), be(data_width//8) |
| request_layout(data_width, address_width) | StructLayout | req_id(16), we(1), adr(addr_w), len(10), tag(8), dat(data_w), channel(8), user_id(8) |
| completion_layout(data_width, address_width) | StructLayout | req_id(16), cmp_id(16), adr(addr_w), len(10), end(1), err(1), tag(8), dat(data_w), channel(8), user_id(8) |
| msi_layout() | StructLayout | dat(8) |

API Reference — TLP Layer

amaranth_pcie/tlp/

TLP Constants

amaranth_pcie/tlp/common.py

| Constant | Value | Description |
|---|---|---|
| max_payload_size | 512 | Maximum TLP payload (bytes) |
| max_request_size | 512 | Maximum TLP request (bytes) |
| tlp_common_header_length | 16 | Common header = 4 DWORDs = 16 bytes |

Format/Type Dictionaries

from amaranth_pcie.tlp.common import fmt_type_dict

fmt_type_dict["mem_rd32"]  # 0b00_00000 — Memory Read 32-bit
fmt_type_dict["mem_rd64"]  # 0b01_00000 — Memory Read 64-bit
fmt_type_dict["mem_wr32"]  # 0b10_00000 — Memory Write 32-bit
fmt_type_dict["mem_wr64"]  # 0b11_00000 — Memory Write 64-bit
fmt_type_dict["cpld"]      # 0b10_01010 — Completion with Data
fmt_type_dict["cpl"]       # 0b00_01010 — Completion without Data
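
The codes above pack a 2-bit format field over a 5-bit type field. As a reading aid, the sketch below decodes them in plain Python using the usual PCIe convention that the low format bit selects a 4-DWORD header and the high format bit signals a data payload. `FMT_TYPE` and `describe` are hypothetical names for this illustration; the values are copied from the listing above.

```python
# Values copied from the fmt_type_dict listing; the dict name is illustrative.
FMT_TYPE = {
    "mem_rd32": 0b00_00000,
    "mem_rd64": 0b01_00000,
    "mem_wr32": 0b10_00000,
    "mem_wr64": 0b11_00000,
    "cpld":     0b10_01010,
    "cpl":      0b00_01010,
}


def describe(code: int) -> dict:
    """Split a 7-bit fmt/type code into its header size, payload flag and type."""
    fmt = (code >> 5) & 0b11
    return {
        "header_dwords": 4 if fmt & 0b01 else 3,  # low fmt bit: 4DW header
        "has_data": bool(fmt & 0b10),             # high fmt bit: payload present
        "type": code & 0b11111,
    }


for name, code in FMT_TYPE.items():
    print(name, describe(code))
```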

Completion Status

from amaranth_pcie.tlp.common import cpl_dict

cpl_dict["sc"]   # 0b000 — Successful Completion
cpl_dict["ur"]   # 0b001 — Unsupported Request
cpl_dict["crs"]  # 0b010 — Configuration Request Retry Status
cpl_dict["ca"]   # 0b011 — Completer Abort

Header Layouts

Both HeaderLayout (with byte offsets for wire-level packing) and StructLayout (for logical field access) are provided:

| Layout | Type | Fields |
|---|---|---|
| tlp_request_header | HeaderLayout | fmt, type, tc, td, ep, attr, length, requester_id, tag, last_be, first_be, address(64) |
| tlp_completion_header | HeaderLayout | fmt, type, tc, td, ep, attr, length, completer_id, status, bcm, byte_count, requester_id, tag, lower_address |
| tlp_request_header_layout | StructLayout | Same fields as above, flat |
| tlp_completion_header_layout | StructLayout | Same fields as above, flat |

TLP Stream Signatures

| Function | Description |
|---|---|
| tlp_raw_signature(data_width) | Raw TLP: fmt(2) + header(128) + dat + be |
| tlp_request_signature(data_width) | Request header fields + dat + be |
| tlp_completion_signature(data_width) | Completion header fields + dat + be |

dword_endianness_swap(src, dst, data_width, endianness, *, mode="dat", ndwords=None)

Generates combinational assignments for DWORD-level endianness swap. Used by packetizer/depacketizer for big-endian PHYs.
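
To make the transformation concrete, here is a software model of a DWORD-level swap on an integer. It assumes the swap reverses the four bytes inside each 32-bit DWORD and leaves the DWORD order unchanged; that reading is an assumption based on the description above, and `dword_byteswap` is a hypothetical helper, not part of the library.

```python
def dword_byteswap(value: int, data_width: int) -> int:
    """Reverse the byte order inside every 32-bit DWORD of `value`."""
    if data_width % 32:
        raise ValueError("data_width must be a multiple of 32")
    out = 0
    for i in range(data_width // 32):
        dword = (value >> (32 * i)) & 0xFFFFFFFF
        # Reinterpret the same four bytes with the opposite byte order.
        swapped = int.from_bytes(dword.to_bytes(4, "little"), "big")
        out |= swapped << (32 * i)
    return out


print(hex(dword_byteswap(0x11223344_AABBCCDD, 64)))  # 0x44332211ddccbbaa
```

Applying the function twice returns the original value, which is why the same routine can serve both directions.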


TLPDepacketizer

amaranth_pcie/tlp/depacketizer.py

Converts raw PHY streams into typed request/completion streams.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | | PHY data width (64 or 128 bits) |
| endianness | str | | "big" or "little" |
| address_mask | int | 0 | BAR0 address mask for request filtering |
| capabilities | list[str] | | List of "REQUEST", "COMPLETION", "PTM", "CONFIGURATION" |

Ports

| Port | Direction | Description |
|---|---|---|
| sink | In | PHY stream input (phy_signature) |
| req_source | Out | Request stream output (if "REQUEST" in capabilities) |
| cmp_source | Out | Completion stream output (if "COMPLETION" in capabilities) |

TLPPacketizer

amaranth_pcie/tlp/packetizer.py

Converts typed request/completion streams into raw PHY streams. Supports both 3DW and 4DW TLP headers.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | | PHY data width (64 or 128 bits) |
| endianness | str | | "big" or "little" |
| address_width | int | 32 | Address width (32 or 64) |
| capabilities | list[str] | | List of "REQUEST", "COMPLETION", "PTM" |

Ports

| Port | Direction | Description |
|---|---|---|
| req_sink | In | Request stream input |
| cmp_sink | In | Completion stream input |
| source | Out | PHY stream output (phy_signature) |

TLPController

amaranth_pcie/tlp/controller.py

Manages tag allocation for outstanding read requests and reorders completions to match request order.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | | Data width in bits |
| max_pending_requests | int | | Maximum outstanding read requests |
| cmp_bufs_buffered | bool | True | Use buffered FIFOs for completion buffers |
| address_width | int | 32 | Address width |

Ports

| Port | Direction | Description |
|---|---|---|
| req_sink | In | Request stream from crossbar |
| req_source | Out | Request stream to PHY (with tag assigned) |
| cmp_sink | In | Completion stream from PHY |
| cmp_source | Out | Completion stream to crossbar (reordered) |
| ctrl_rst | In(1) | Controller reset |
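
The tag allocation and reordering behaviour described for TLPController can be pictured with a small software model. This is a behavioural sketch only: `TagReorderModel` and its methods are hypothetical, and the real component works on streams and per-tag completion buffers rather than Python containers.

```python
from collections import deque


class TagReorderModel:
    """Toy model: allocate tags for reads, release completions in request order."""

    def __init__(self, max_pending_requests: int):
        self.free_tags = deque(range(max_pending_requests))
        self.issue_order = deque()  # tags in the order the reads were issued
        self.completed = {}         # tag -> completion data received so far

    def issue(self):
        """Allocate a tag for a new read, or return None if all are in flight."""
        if not self.free_tags:
            return None
        tag = self.free_tags.popleft()
        self.issue_order.append(tag)
        return tag

    def complete(self, tag, data):
        """Record a completion, which may arrive in any order."""
        self.completed[tag] = data

    def drain(self):
        """Return completions in request order, stopping at the first gap."""
        out = []
        while self.issue_order and self.issue_order[0] in self.completed:
            tag = self.issue_order.popleft()
            out.append(self.completed.pop(tag))
            self.free_tags.append(tag)  # tag can be reused by a later read
        return out


model = TagReorderModel(max_pending_requests=2)
first, second = model.issue(), model.issue()
model.complete(second, "B")   # second read completes first
print(model.drain())          # nothing yet: the first read is still pending
model.complete(first, "A")
print(model.drain())          # both released, in request order
```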

API Reference — Core Layer

amaranth_pcie/core/

Port Types

amaranth_pcie/core/common.py

| Class | Direction | User sees | Description |
|---|---|---|---|
| PCIeSlaveInternalPort | Host → FPGA | | Internal crossbar port |
| PCIeSlavePort | Host → FPGA | sink=requests, source=completions | User-facing slave port |
| PCIeMasterInternalPort | FPGA → Host | | Internal crossbar port |
| PCIeMasterPort | FPGA → Host | sink=completions, source=requests | User-facing master port |

PCIeCrossbar

amaranth_pcie/core/crossbar.py

Central routing fabric connecting frontend ports to the TLP layer through arbitration and dispatch logic.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | | PHY data width in bits |
| address_width | int | 32 | Address width (32 or 64) |
| max_pending_requests | int | 4 | Max outstanding read requests for TLP controller |
| cmp_bufs_buffered | bool | True | Buffered completion FIFOs |
| with_configuration | bool | False | Support configuration TLPs |

Methods

get_slave_port(address_decoder)

Register and return a new slave port with address-based routing.

# address_decoder: function(address_signal) → 1-bit match signal
port = crossbar.get_slave_port(lambda a: a[20:] == 0)
# port.sink = request stream (from host)
# port.source = completion stream (to host)
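
In the example above, `a[20:] == 0` compares every address bit from bit 20 upward with zero, so the port matches the lowest 1 MiB. The plain-Python sketch below mirrors that with integer shifts and shows how a dispatcher could pick a port. `make_window_decoder` and `dispatch` are hypothetical helpers, and "first match wins" is an assumption made for the illustration; the real decoder returns an Amaranth expression, not a bool.

```python
def make_window_decoder(base: int, size: int):
    """Return a predicate that matches addresses in [base, base + size)."""
    if size <= 0 or size & (size - 1) or base % size:
        raise ValueError("size must be a power of two and base aligned to it")
    shift = size.bit_length() - 1
    # Same idea as `a[shift:] == base >> shift` on an Amaranth signal.
    return lambda addr: (addr >> shift) == (base >> shift)


def dispatch(addr: int, decoders):
    """Return the index of the first decoder that matches, or None."""
    for index, matches in enumerate(decoders):
        if matches(addr):
            return index
    return None


decoders = [
    make_window_decoder(0x000000, 1 << 20),  # equivalent to a[20:] == 0
    make_window_decoder(0x100000, 1 << 20),
]
print(dispatch(0x0FFFFF, decoders), dispatch(0x100000, decoders))
```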

get_master_port(write_only=False, read_only=False)

Register and return a new master port with auto-assigned channel ID.

rd_port = crossbar.get_master_port(read_only=True)
wr_port = crossbar.get_master_port(write_only=True)
# port.source = request stream (to host)
# port.sink = completion stream (from host)

Routing Architecture

Slave path (Host → FPGA):
  PHY slave sink (requests) → Dispatcher (by address_decoder) → user slave sources
  user slave sinks (completions) → Arbiter → PHY slave source

Master path (FPGA → Host):
                  ┌─────────┐
    RD ports ────►│ Arb/Disp├──► TLPController ──┐
    RW ports ────►│         │                     │
                  └─────────┘                     ├──► Arb/Disp ──► PHY master
                  ┌─────────┐                     │
    WR ports ────►│ Arb/Disp├─────────────────────┘
                  └─────────┘

Write-only ports bypass the TLPController to avoid blocking writes when reads are throttled.


PCIeEndpoint

amaranth_pcie/core/endpoint.py

Top-level assembly connecting PHY ↔ TLP ↔ Crossbar. This is the main entry point for building a PCIe design.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| phy | object or dict | | PHY instance or config dict |
| max_pending_requests | int | 4 | Max outstanding read requests |
| address_width | int | 32 | Address width (32 or 64) |
| endianness | str | "big" | Byte order within each DWORD |
| cmp_bufs_buffered | bool | True | Buffered completion FIFOs |
| with_ptm | bool | False | Support PTM TLPs |
| with_configuration | bool | False | Support configuration TLPs |

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| crossbar | PCIeCrossbar | The internal crossbar instance |
| data_width | int | PHY data width |
| bar0_mask | int | BAR0 address mask |
| get_slave_port() | method | Delegates to crossbar.get_slave_port() |
| get_master_port() | method | Delegates to crossbar.get_master_port() |

Example

from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint

# With a real PHY
phy = SimPCIePHY(data_width=64)
endpoint = PCIeEndpoint(phy)

# With a config dict (for testing without PHY)
endpoint = PCIeEndpoint({
    "data_width": 64,
    "bar0_mask": 0xFFF00000,
    "id": 0x0001,
    "max_request_size": 512,
    "max_payload_size": 128,
})

PCIeMSI / PCIeMSIMultiVector / PCIeMSIX

amaranth_pcie/core/msi.py

MSI interrupt controllers.

PCIeMSI — Single-vector, edge-triggered

| Parameter | Type | Default | Description |
|---|---|---|---|
| width | int | 32 | Number of IRQ sources |

| Port | Direction | Width | Description |
|---|---|---|---|
| irqs | In | width | One bit per IRQ source |
| source | Out | msi_signature() | MSI output stream |
| enable | In | width | Per-IRQ enable mask |
| clear | In | width | Per-IRQ clear mask |
| clear_strobe | In | 1 | Strobe for clear |
| vector | Out | width | Current pending IRQ vector |

PCIeMSIMultiVector — Multi-vector, priority-encoded

Same ports as PCIeMSI but source.payload.dat carries the IRQ number (lower index = higher priority). No clear/clear_strobe — cleared automatically on MSI acceptance.
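
The "lower index = higher priority" rule amounts to a priority encoder over the enabled, pending IRQ bits. A plain-Python sketch of that selection follows; `next_msi_vector` is a hypothetical name, and the hardware implements this combinationally rather than with integer arithmetic.

```python
def next_msi_vector(irqs: int, enable: int, width: int = 32):
    """Return the lowest-numbered enabled pending IRQ, or None if there is none."""
    pending = irqs & enable & ((1 << width) - 1)
    if pending == 0:
        return None
    # pending & -pending isolates the lowest set bit.
    return (pending & -pending).bit_length() - 1


print(next_msi_vector(irqs=0b1010, enable=0b1111))  # 1
print(next_msi_vector(irqs=0b1010, enable=0b1000))  # 3
```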

PCIeMSIX — MSI-X via TLP writes

| Port | Direction | Width | Description |
|---|---|---|---|
| irqs | In | width | One bit per IRQ source |
| enable | In | width | Per-IRQ enable mask |
| pba | Out | width | Pending Bit Array |

Exposes msix_wr_valid, msix_wr_ready, msix_wr_adr, msix_wr_dat signals for external TLP write connection.


API Reference — Frontend Layer

amaranth_pcie/frontend/

PCIeDMA

amaranth_pcie/frontend/dma.py

Full scatter-gather bi-directional DMA over PCIe with optional plugins.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint instance |
| data_width | int or None | None | User data width (defaults to PHY data width) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| with_loopback | bool | False | Enable loopback plugin |
| with_synchronizer | bool | False | Enable synchronizer plugin |
| with_buffering | bool | False | Enable buffering plugin |
| buffering_depth | int | 2048 | Depth for buffering FIFOs (bytes) |
| writer_buffering_depth | int or None | None | Override writer buffering depth |
| reader_buffering_depth | int or None | None | Override reader buffering depth |
| with_reader | bool | True | Enable DMA reader |
| with_writer | bool | True | Enable DMA writer |

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| source | stream | Data from host (reader output) |
| sink | stream | Data to host (writer input) |
| irq | Signal | Combined IRQ from reader + writer |
| reader | PCIeDMAReader | Reader sub-component (if enabled) |
| writer | PCIeDMAWriter | Writer sub-component (if enabled) |
| loopback | DMALoopback | Loopback plugin (if enabled) |
| synchronizer | DMASynchronizer | Synchronizer plugin (if enabled) |
| buffering | DMABuffering | Buffering plugin (if enabled) |

DMA Architecture

Reader path (Host → FPGA):
  ScatterGather → Splitter → DMAReader ←→ ReaderAdapter ←→ CrossbarMasterPort(read_only)
                                 ↓
                            data_source → [plugins] → user source

Writer path (FPGA → Host):
  user sink → [plugins] → data_sink
                              ↓
  ScatterGather → Splitter → DMAWriter ←→ WriterAdapter ←→ CrossbarMasterPort(write_only)
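
As an illustration of the Splitter stage in the paths above, the sketch below cuts one descriptor into transfers no larger than a given limit. That the splitter works this way, with the limit being the max request size (reads) or max payload size (writes), is an assumption based on the stage name and the LitePCIe heritage; `split_descriptor` is a hypothetical helper and ignores details such as alignment rules.

```python
def split_descriptor(address: int, length: int, max_size: int):
    """Split (address, length) into chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    chunks = []
    offset = 0
    while offset < length:
        size = min(max_size, length - offset)
        chunks.append((address + offset, size))
        offset += size
    return chunks


# A 1300-byte descriptor with a 512-byte limit becomes three transfers.
for addr, size in split_descriptor(0x1000, 1300, 512):
    print(hex(addr), size)
```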

Example

from amaranth import *
from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.dma import PCIeDMA

class DMADesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()

        m.submodules.phy = phy = SimPCIePHY(data_width=64)
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint

        m.submodules.dma = dma = PCIeDMA(
            endpoint,
            with_loopback=True,
            with_buffering=True,
            buffering_depth=4096,
        )

        # User logic: consume data from host
        with m.If(dma.source.valid & dma.source.ready):
            # Process dma.source.payload (data_width bits)
            pass

        return m

PCIeDMAReader / PCIeDMAWriter

amaranth_pcie/frontend/dma.py

Individual DMA reader/writer with scatter-gather + splitter + bus adapter. Used internally by PCIeDMA but can be instantiated separately.

PCIeDMAReader Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint |
| port | PCIeMasterPort | | Crossbar master port (read_only) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| data_width | int or None | None | User data width |

PCIeDMAWriter Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint |
| port | PCIeMasterPort | | Crossbar master port (write_only) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| data_width | int or None | None | User data width |

Sub-components (accessible for CSR mapping)

| Attribute | Type | Description |
|---|---|---|
| table | DMAScatterGather | Descriptor table |
| splitter | DMADescriptorSplitter | Descriptor splitter |
| reader/writer | DMAReader/DMAWriter | Generic DMA core |
| adapter | PCIeDMAReaderAdapter/PCIeDMAWriterAdapter | PCIe bus adapter |

PCIeWishboneMaster

amaranth_pcie/frontend/wishbone.py

Lets the host access the FPGA's Wishbone bus through a PCIe BAR. The bridge obtains a slave port from the crossbar.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint |
| address_decoder | callable or None | None | Address match function (default: match all) |
| base_address | int | 0x00000000 | Base address offset for Wishbone |
| qword_aligned | bool | False | Handle 64-bit aligned access |
| wb_addr_width | int | 32 | Wishbone address width |
| wb_data_width | int | 32 | Wishbone data width |

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| port | PCIeSlavePort | Crossbar slave port |
| wb | Wishbone interface | Wishbone master bus |

FSM

IDLE → DO-WRITE → IDLE           (host writes to FPGA register)
IDLE → DO-READ → ISSUE-READ-COMPLETION → IDLE  (host reads FPGA register)

Example

from amaranth import *
from amaranth_soc.wishbone.sram import WishboneSRAM
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster

class WishboneDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()

        # ... create phy, endpoint ...

        # Wishbone bridge — host can read/write FPGA registers
        m.submodules.wb_bridge = wb_bridge = PCIeWishboneMaster(
            endpoint,
            base_address=0x00000000,
        )

        # Connect Wishbone bus to an SRAM
        m.submodules.sram = sram = WishboneSRAM(size=4096)
        # Connect wb_bridge.wb to sram.wb ...

        return m

PCIeWishboneSlave

amaranth_pcie/frontend/wishbone.py

Lets the FPGA access host memory through a Wishbone interface. The bridge obtains a master port from the crossbar.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint |
| wb_addr_width | int | 32 | Wishbone address width |
| wb_data_width | int | 32 | Wishbone data width |
| qword_aligned | bool | False | Handle 64-bit aligned access |

Key Attributes

| Attribute | Type | Description |
|---|---|---|
| port | PCIeMasterPort | Crossbar master port |
| wb | Wishbone interface | Wishbone slave bus (with err feature) |

FSM

IDLE → ISSUE-WRITE → IDLE                          (FPGA writes to host memory)
IDLE → ISSUE-READ → RECEIVE-READ-COMPLETION → IDLE (FPGA reads from host memory)

Includes a WaitTimer(2**16) for timeout/error handling.


PCIeAXISlave

amaranth_pcie/frontend/axi.py

AXI4 slave interface for PCIe DMA. Converts AXI4 read/write transactions into PCIe DMA operations without scatter-gather tables.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | | PCIe endpoint |
| data_width | int | 32 | AXI data width |
| address_width | int | 32 | AXI address width |
| id_width | int | 1 | AXI ID width |

AXI4 Signal Groups

| Channel | Signals |
|---|---|
| Write Address (AW) | aw_valid, aw_ready, aw_addr, aw_len, aw_id |
| Write Data (W) | w_valid, w_ready, w_data, w_last |
| Write Response (B) | b_valid, b_ready, b_id, b_resp |
| Read Address (AR) | ar_valid, ar_ready, ar_addr, ar_len, ar_id |
| Read Data (R) | r_valid, r_ready, r_data, r_last, r_id, r_resp |

API Reference — PHY Layer

amaranth_pcie/phy/

SimPCIePHY

amaranth_pcie/phy/sim.py

Simulation-only PCIe PHY. Implements the same interface as a real PHY but without vendor IP. TX data is optionally looped back to RX through a FIFO.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | 64 | Data bus width in bits |
| bar0_size | int | 1 * MB | BAR0 region size in bytes |
| max_request_size | int | 512 | Simulated max request size |
| max_payload_size | int | 128 | Simulated max payload size |
| with_loopback | bool | True | Loop TX back to RX |

Attributes

| Attribute | Type | Description |
|---|---|---|
| data_width | int | Data bus width |
| endianness | str | Always "big" |
| id | Signal(16) | PCIe device ID (init=0x0001) |
| bar0_size | int | BAR0 size |
| bar0_mask | int | BAR0 mask |
| max_request_size | Signal(16) | Max request size |
| max_payload_size | Signal(16) | Max payload size |
| sink | stream | TX stream (core → "host") |
| source | stream | RX stream ("host" → core) |
| msi | stream | MSI interrupt stream |
| link_up | Signal | Link-up status (always 1) |

S7PCIEPHY

amaranth_pcie/phy/s7pciephy.py

Xilinx 7-Series PCIe PHY wrapper. Wraps the Xilinx PCIe hard IP block with proper datapath handling (CDC, width conversion, endianness).

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| nlanes | int | 1 | Number of PCIe lanes (1, 2, 4, or 8) |
| data_width | int | 64 | Core-side data width (64 or 128) |
| pcie_data_width | int or None | None | PHY-side data width (defaults to data_width) |
| bar0_size | int | 1 * MB | BAR0 size |
| clock_domain | str | "sync" | Core clock domain |

Attributes

| Attribute | Type | Description |
|---|---|---|
| data_width | int | Core-side data width |
| endianness | str | Always "big" |
| shared_channel | bool | Always True |
| config | PCIeConfig | Standardized config interface |
| link_up | Signal | Link-up status |
| ltssm_state | Signal(6) | LTSSM state |
| sink | stream | TX stream (core → PCIe) |
| source | stream | RX stream (PCIe → core) |
| msi | stream | MSI interrupt stream |

Clock Architecture

The Xilinx 7-Series PCIe IP generates its own clock (user_clk_out), which drives the pcie clock domain. CDC (clock domain crossing) is required between the pcie domain and the user's sync domain. The PHYTXDatapath and PHYRXDatapath handle this automatically.


GW5ASTPCIePHY

amaranth_pcie/phy/gowin.py

Gowin GW5AST-138 PCIe PHY wrapper. Wraps the Gowin SerDes_Top hard IP block for use with the amaranth-pcie stack.

Overview

The PHY supports:

  • Lane widths: X1, X2, X4
  • Link speeds: Gen2 (5.0 GT/s), Gen3 (8.0 GT/s)
  • TLP clock frequencies: 100 MHz, 125 MHz, 150 MHz
  • Up to 6 BARs with configurable sizes (16B to 1MB+), 32-bit or 64-bit, prefetchable
  • MSI interrupts (optional, with extended ack/status/msinum interface)
  • Max payload sizes: 128B to 4096B
  • Configurable Vendor/Device/Class IDs
  • DRP config space access (runtime read/write of PCIe configuration registers)
  • TX flow control credits (extended format with header/data counts)
  • Hardware BAR decode (bar_hit field in RX stream)
  • Built-in LTSSM tracer for link training debug

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | 64 | User-facing data width (64, 128, or 256). The Gowin IP always uses 256-bit internally; width conversion is inserted automatically when data_width < 256. |
| gowin_config | GowinPCIeConfig | | IP generation configuration (device, lanes, BARs, vendor/device ID, etc.). See GowinPCIeConfig below. |
| gowin_path | str | | Path to the Gowin IDE installation directory (e.g. "/opt/gowin/IDE"). Required for IP generation. |
| ip_output_dir | str or None | None | Optional override for IP output directory. If None, uses a cache directory under ~/.cache/amaranth-pcie/gowin/<config_hash>. |
| clock_domain | str | "sync" | Clock domain name for the TLP clock. The Gowin IP uses a user-provided TLP clock (unlike Xilinx which generates its own). |

Attributes / Interfaces

| Attribute | Type | Description |
|---|---|---|
| data_width | int | User-facing data width |
| endianness | str | Always "big" |
| shared_channel | bool | Always False (separate request/completion paths) |
| ltssm_width | int | Always 5 |
| config | PCIeConfig | Standardized config interface (bus/device/function numbers, max payload/request sizes) |
| bar0_size | int | BAR0 region size in bytes (from gowin_config.bars[0]) |
| bar0_mask | int | BAR0 address mask |
| link_up | Signal | Link-up status |
| ltssm | Signal(5) | 5-bit LTSSM state |
| source | stream | RX stream (PCIe → core), with bar_hit field |
| sink | stream | TX stream (core → PCIe) |
| msi | stream | Basic MSI stream (valid/ready/dat) |
| msi_ext | PCIeMSIInterface | Extended MSI interface (ack/status/msinum) |
| credits | PCIeCreditInterface | TX credit availability (extended mode with counts) |
| config_access | PCIeConfigAccess | DRP configuration space read/write port |
| ltssm_tracer | LTSSMTracer | Built-in LTSSM state transition tracer |
| perst_n | Signal | PERST# input (active low). Connect to board's PERST# pad, or leave unconnected for auto-start (tied high). |
| nlanes | int | Number of PCIe lanes |
| msi_enable | bool | Whether MSI is enabled in the IP configuration |

Clock Architecture

Unlike the Xilinx PHY, whose IP generates its own clock, the Gowin GW5AST PCIe IP uses a user-provided TLP clock. The clock is supplied via the pcie_tl_clk_i input of the SerDes_Top instance, driven by ClockSignal(clock_domain).

Key implication: No CDC is needed. Since the IP runs in the user's clock domain (typically sync), there is no clock domain crossing between the PHY and the rest of the stack. This simplifies the design and reduces latency compared to Xilinx PHYs.

Xilinx:  IP generates pcie_clk → CDC needed between pcie and sync domains
Gowin:   User provides sync_clk → IP runs in sync domain → no CDC needed

When data_width < 256, the PHYTXDatapath and PHYRXDatapath are used for width conversion only (no CDC), since the clock domain is the same on both sides.

Reset Sequence

The PHY implements a two-phase reset sequence controlled by the perst_n signal:

  1. PERST# debounce — A 26-bit counter at 100 MHz (≈670 ms) debounces the PERST# input. The counter resets to zero whenever PERST# is asserted (low).

  2. Delayed start — After PERST# deasserts and the debounce counter saturates, a 20-bit delay counter (≈10 ms) runs before releasing the IP's pcie_rstn signal.

PERST# asserted (low)  →  debounce_cnt = 0, pcie_rstn = 0
PERST# deasserted (high) →  debounce_cnt counts up
debounce_cnt saturates   →  delay_cnt counts up
delay_cnt saturates      →  pcie_rstn = 1 (IP starts link training)

If perst_n is left unconnected, it defaults to high (auto-start mode).

Width Conversion (256 → 64/128)

The Gowin PCIe IP always uses a 256-bit internal TLP bus. When the user requests a narrower data_width (64 or 128), the PHY automatically inserts width conversion datapaths:

TX path:  user sink (64/128-bit) → PHYTXDatapath → StrideConverter → adapter sink (256-bit) → IP
RX path:  IP → adapter source (256-bit) → PHYRXDatapath → StrideConverter → user source (64/128-bit)

The bar_hit signal is not carried through the width converter. Instead, it is latched from the adapter on SOP (start of packet) and held for the duration of the packet.

When data_width == 256, the adapter connects directly to the user-facing streams with no conversion.
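
Conceptually, the RX conversion slices each 256-bit beat into several narrower beats, and the TX conversion reassembles them. The plain-Python sketch below shows that slicing on integers. The least-significant-slice-first ordering is an assumption, and `split_beat` and `merge_beats` are hypothetical helpers; the actual datapaths also handle byte enables and first/last framing.

```python
def split_beat(word: int, wide: int = 256, narrow: int = 64):
    """Slice one wide beat into narrow beats, least-significant slice first."""
    if wide % narrow:
        raise ValueError("wide must be a multiple of narrow")
    mask = (1 << narrow) - 1
    return [(word >> (narrow * i)) & mask for i in range(wide // narrow)]


def merge_beats(beats, narrow: int = 64) -> int:
    """Inverse of split_beat: pack narrow beats back into one wide word."""
    word = 0
    for i, beat in enumerate(beats):
        word |= beat << (narrow * i)
    return word


word = int.from_bytes(bytes(range(32)), "little")  # 256-bit test pattern
beats = split_beat(word)
print(len(beats), hex(beats[0]))
print(merge_beats(beats) == word)
```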

Credit Format

The Gowin IP provides TX credit information via three 32-bit registers (creditsp, creditsnp, creditscpl):

| Bit Range | Field | Description |
|---|---|---|
| [31] | available | Credits available (boolean) |
| [15:8] | header_count | Number of header credits |
| [11:0] | data_count | Number of data credits |

These are decoded into the PCIeCreditInterface (extended mode):

| Credit Channel | Available Signal | Header Count | Data Count |
|---|---|---|---|
| Posted | credits.posted_header_available | credits.posted_header_count | credits.posted_data_count |
| Non-Posted | credits.non_posted_header_available | credits.non_posted_header_count | credits.non_posted_data_count |
| Completion | credits.completion_header_available | credits.completion_header_count | credits.completion_data_count |

The TLPPacketizer automatically gates TX based on credit availability when phy.has_credits() returns True.

MSI Interface

When MSI is enabled in the IP configuration (gowin_config.msi_enable=True), the PHY provides two MSI interfaces:

Basic MSI (phy.msi) — Compatible with the standard amaranth-pcie MSI flow:

phy.msi.valid   # Assert to request MSI
phy.msi.ready   # Asserted when MSI is accepted (driven by IP's ack)
phy.msi.payload.dat  # 8-bit vector number

Extended MSI (phy.msi_ext) — Gowin-specific with additional status:

phy.msi_ext.valid    # MSI request
phy.msi_ext.ack      # MSI acknowledge from IP
phy.msi_ext.status   # Signal(3) — MSI status
phy.msi_ext.msinum   # Signal(5) — MSI vector number (up to 32 vectors)

When MSI is disabled, phy.msi.ready is tied to 0.

DRP / Config Access

The PHY exposes a PCIeConfigAccess port for runtime read/write access to the PCIe configuration space via the Gowin DRP (Dynamic Reconfiguration Port):

# Read a config register
m.d.comb += [
    phy.config_access.read_en.eq(1),
    phy.config_access.read_addr.eq(0x000),  # 12-bit DWORD-aligned address
]
# phy.config_access.read_data is valid when phy.config_access.read_valid is asserted

# Write a config register
m.d.comb += [
    phy.config_access.write_en.eq(1),
    phy.config_access.write_addr.eq(0x004),
    phy.config_access.write_data.eq(0xDEADBEEF),
    phy.config_access.write_be.eq(0xF),  # 4-bit byte enables
]

Check capability at construction time: phy.has_config_access() returns True.

BAR Configuration Examples

BARs are configured via the BARConfig dataclass in the GowinPCIeConfig:

from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig as GowinPCIeConfig, BARConfig

# Unused slots pass enabled=False explicitly, because BARConfig defaults to enabled=True.

# Example 1: Single 1MB 64-bit BAR
config = GowinPCIeConfig(
    bars=[
        BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True),    # BAR0: 1MB, 64-bit
        BARConfig(enabled=False),                                        # BAR1: consumed by BAR0
        BARConfig(enabled=False), BARConfig(enabled=False),              # BAR2-3: disabled
        BARConfig(enabled=False), BARConfig(enabled=False),              # BAR4-5: disabled
    ],
)

# Example 2: Three 32-bit BARs
config = GowinPCIeConfig(
    bars=[
        BARConfig(enabled=True, size_bytes=1024),                        # BAR0: 1KB
        BARConfig(enabled=True, size_bytes=2048),                        # BAR1: 2KB
        BARConfig(enabled=True, size_bytes=4096, is_prefetchable=True),  # BAR2: 4KB, prefetchable
        BARConfig(enabled=False), BARConfig(enabled=False),              # BAR3-4: disabled
        BARConfig(enabled=False),                                        # BAR5: disabled
    ],
)

# Example 3: Mixed 64-bit and 32-bit BARs
config = GowinPCIeConfig(
    bars=[
        BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True),    # BAR0-1: 1MB, 64-bit
        BARConfig(enabled=False),                                        # BAR1: consumed by BAR0
        BARConfig(enabled=True, size_bytes=64*1024),                     # BAR2: 64KB, 32-bit
        BARConfig(enabled=False), BARConfig(enabled=False),              # BAR3-4: disabled
        BARConfig(enabled=False),                                        # BAR5: disabled
    ],
)

BAR size constraints: Must be a power of 2, minimum 16 bytes.
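The size constraints and the 64-bit slot rule can be checked before running IP generation. The following is a sketch only: BarSpec is a hypothetical stand-in that mirrors the documented BARConfig fields, check_bars is not a library function, and the rule that the slot after a 64-bit BAR must be disabled is an assumption drawn from the examples above.

```python
from dataclasses import dataclass


@dataclass
class BarSpec:
    """Hypothetical stand-in mirroring the documented BARConfig fields."""
    enabled: bool = True
    size_bytes: int = 1024
    is_64bit: bool = False


def check_bars(bars):
    """Return a list of problems found in a BAR list (empty if none)."""
    problems = []
    if len(bars) != 6:
        problems.append(f"expected 6 BAR entries, got {len(bars)}")
    consumed = False  # True when the previous BAR was 64-bit
    for i, bar in enumerate(bars):
        if consumed:
            # Assumption: the slot after a 64-bit BAR must stay disabled.
            if bar.enabled:
                problems.append(f"BAR{i} is the upper half of 64-bit BAR{i - 1}")
            consumed = False
            continue
        if not bar.enabled:
            continue
        size = bar.size_bytes
        if size < 16 or size & (size - 1):
            problems.append(f"BAR{i} size {size} is not a power of 2 >= 16")
        if bar.is_64bit:
            if i == len(bars) - 1:
                problems.append(f"BAR{i} is 64-bit but has no following slot")
            consumed = True
    return problems


layout = [BarSpec(True, 1 << 20, True), BarSpec(False), BarSpec(True, 64 * 1024),
          BarSpec(False), BarSpec(False), BarSpec(False)]
print(check_bars(layout))
```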

Integration with PCIeEndpoint

from amaranth import *
from amaranth_pcie.phy.gowin import GW5ASTPCIePHY
from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig as GowinPCIeConfig, BARConfig
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster
from amaranth_pcie.frontend.dma import PCIeDMA

class GowinPCIeDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()

        # 1. Configure the Gowin PCIe IP
        gowin_config = GowinPCIeConfig(
            lane_width="X4",
            gen="gen3",
            vendor_id="1234",
            device_id="5678",
            class_code="0580",       # Memory controller
            msi_enable=True,
            max_payload=1024,
            tl_clk_freq=100,
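            bars=[
                BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True),  # BAR0-1: 1MB, 64-bit
                BARConfig(enabled=False),                                      # BAR1: consumed by BAR0
                BARConfig(enabled=True, size_bytes=64*1024),                   # BAR2: 64KB, 32-bit
                # BAR3-5: disabled (BARConfig defaults to enabled=True)
                BARConfig(enabled=False), BARConfig(enabled=False), BARConfig(enabled=False),
            ],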
        )

        # 2. Create the PHY
        m.submodules.phy = phy = GW5ASTPCIePHY(
            data_width=64,
            gowin_config=gowin_config,
            gowin_path="/opt/gowin/IDE",
            clock_domain="sync",
        )

        # 3. Connect PERST# (optional — defaults to auto-start)
        # phy.perst_n can be connected to a board PERST# pad

        # 4. Create endpoint
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint

        # 5. Add Wishbone bridge for host CSR access
        m.submodules.wb = wb = PCIeWishboneMaster(endpoint)

        # 6. Add DMA
        m.submodules.dma = dma = PCIeDMA(
            endpoint,
            with_loopback=True,
            with_buffering=True,
            buffering_depth=4096,
        )

        # 7. Connect MSI interrupt from DMA
        m.d.comb += phy.msi.valid.eq(dma.irq)

        return m

Key Classes

| Class | Module | Description |
|---|---|---|
| GW5ASTPCIePHY | gowin.py | Main PHY class (inherits PCIePHY) |
| GowinTLPAdapter | gowin.py | Bridge between Gowin SOP/EOP/valid[7:0] and amaranth stream interface |
| GowinPCIeConfig | gowin_pcie_gen.py | IP generation configuration dataclass |
| BARConfig | gowin_pcie_gen.py | Per-BAR configuration dataclass |
| GowinPCIeGenerator | gowin_pcie_gen.py | IP generation pipeline orchestrator |

GowinTLPAdapter

amaranth_pcie/phy/gowin.py

Bridge between the Gowin PCIe IP's 256-bit TLP interface (SOP/EOP/valid[7:0]/data[255:0]) and the amaranth-pcie stream interface (valid/ready/first/last with dat/be payload).

DWORD Ordering

The Gowin IP uses big-endian DWORD ordering: DWORD7 at bits [255:224] is the first DWORD on the wire. The adapter reverses the DWORD order so that the first wire DWORD appears at bits [31:0], matching the convention used by the rest of the stack.
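The reordering can be illustrated on plain integers. This is a software model of the described behavior, not the adapter's actual implementation, and reverse_dwords is a hypothetical helper name.

```python
def reverse_dwords(word: int, num_dwords: int = 8) -> int:
    """Reverse the order of 32-bit DWORDs inside a (32 * num_dwords)-bit word."""
    out = 0
    for i in range(num_dwords):
        dword = (word >> (32 * i)) & 0xFFFF_FFFF
        # DWORD i moves to position (num_dwords - 1 - i)
        out |= dword << (32 * (num_dwords - 1 - i))
    return out


# DWORD7 (first on the wire) sits at bits [255:224] on the IP side ...
ip_word = (0xAAAA_AAAA << 224) | 0x1111_1111
stream_word = reverse_dwords(ip_word)
# ... and ends up at bits [31:0] on the stream side.
print(hex(stream_word & 0xFFFF_FFFF))
```

Because the operation is its own inverse, the same function models both the RX and TX directions.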

RX Path (IP → Stream)

| IP Signal | Width | Stream Mapping |
|---|---|---|
| rx_sop | 1 | source.first |
| rx_eop | 1 | source.last |
| rx_data | 256 | source.payload.dat (DWORD-swapped) |
| rx_valid | 8 | source.payload.be (each bit → 4 byte enables, reversed) |
| rx_bardec | 6 | source.payload.bar_hit |
| rx_wait | 1 | ~source.ready (backpressure) |

TX Path (Stream → IP)

| Stream Signal | IP Mapping |
|---|---|
| sink.first & sink.valid | tx_sop |
| sink.last & sink.valid | tx_eop |
| sink.payload.dat | tx_data (DWORD-swapped) |
| sink.payload.be | tx_valid (4 bytes → 1 bit, reversed) |
| ~tx_wait | sink.ready (backpressure) |
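The valid/be mappings in both tables follow the same DWORD reversal as the data. The sketch below models them on integers. The function names are hypothetical, and treating a DWORD as valid on TX when any of its four byte enables is set is an assumption; the adapter may require all four.

```python
def valid_to_be(valid: int, num_dwords: int = 8) -> int:
    """RX: expand per-DWORD valid bits to byte enables, reversing DWORD order."""
    be = 0
    for j in range(num_dwords):
        # Stream DWORD j corresponds to IP DWORD (num_dwords - 1 - j)
        if (valid >> (num_dwords - 1 - j)) & 1:
            be |= 0xF << (4 * j)
    return be


def be_to_valid(be: int, num_dwords: int = 8) -> int:
    """TX: collapse byte enables to per-DWORD valid bits, reversing DWORD order."""
    valid = 0
    for j in range(num_dwords):
        # Assumption: any enabled byte marks the whole DWORD as valid.
        if (be >> (4 * j)) & 0xF:
            valid |= 1 << (num_dwords - 1 - j)
    return valid


# Three valid DWORDs starting with the first one on the wire (DWORD7)
print(hex(valid_to_be(0b1110_0000)))
```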

GowinPCIeConfig

amaranth_pcie/phy/gowin_pcie_gen.py

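Dataclass containing all parameters needed to generate a Gowin PCIe IP core. The class is defined as PCIeConfig in gowin_pcie_gen.py; the examples in this document import it as GowinPCIeConfig, which keeps it distinct from the PHY-layer PCIeConfig described under Multi-Vendor PHY Support.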

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| gowin_dir | str | "" | Path to Gowin IDE installation |
| output_dir | str | "" | Output directory for generated files |
| device | str | "GW5AST-138" | Target device |
| device_version | str | "C" | Device version |
| package | str | "FCPBGA676A" | Package type |
| part_number | str | "GW5AST-LV138FPG676AC1/I0" | Full part number |
| device_id_short | str | "gw5ast138c-003" | Short device ID for project files |
| vendor_id | str | "22C2" | PCIe Vendor ID (hex) |
| device_id | str | "1100" | PCIe Device ID (hex) |
| class_code | str | "0580" | PCIe Class Code (hex, e.g. "0580" = Memory controller) |
| revision | str | "00" | PCIe Revision ID (hex) |
| lane_width | str | "X4" | Lane width: "X1", "X2", or "X4" |
| gen | str | "gen3" | PCIe generation: "gen2" or "gen3" |
| ref_clock | str | "100MHz" | Reference clock frequency |
| ref_clock_source | str | "Q0 REFCLK0" | Reference clock source |
| bars | list[BARConfig] | 3 enabled (1KB, 2KB, 2KB) | List of 6 BAR configurations |
| msi_enable | bool | True | Enable MSI capability |
| module_name | str | "SerDes_Top" | Top-level module name |
| file_name | str | "serdes" | Base file name for generated files |
| pcie_module_name | str | "PCIE_Controller_Top" | PCIe controller module name |
| max_payload | int | 1024 | Max payload size in bytes |
| tl_clk_freq | int | 100 | TLP clock frequency in MHz |

Derived Properties

Property Returns Description
series str Device series (e.g. "GW5AST")
device_name_with_version str Device with version (e.g. "GW5AST-138C")
lane_count int Number of lanes (1, 2, or 4)
base_class str First two hex digits of class_code
sub_class str Last two hex digits of class_code
is_gen3 bool Whether gen == "gen3"
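The derived properties are simple string operations on the fields above. The stand-in below reproduces the documented example values. ConfigSketch is hypothetical and the property bodies are guesses at how the real dataclass computes them.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for the generator config; only five fields shown."""
    device: str = "GW5AST-138"
    device_version: str = "C"
    class_code: str = "0580"
    lane_width: str = "X4"
    gen: str = "gen3"

    @property
    def series(self) -> str:
        return self.device.split("-")[0]            # "GW5AST-138" -> "GW5AST"

    @property
    def device_name_with_version(self) -> str:
        return self.device + self.device_version    # "GW5AST-138" + "C"

    @property
    def lane_count(self) -> int:
        return int(self.lane_width[1:])             # "X4" -> 4

    @property
    def base_class(self) -> str:
        return self.class_code[:2]                  # first two hex digits

    @property
    def sub_class(self) -> str:
        return self.class_code[2:4]                 # last two hex digits

    @property
    def is_gen3(self) -> bool:
        return self.gen == "gen3"


cfg = ConfigSketch()
print(cfg.series, cfg.device_name_with_version, cfg.lane_count)
```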

BARConfig

amaranth_pcie/phy/gowin_pcie_gen.py

Dataclass for a single PCIe Base Address Register configuration.

Field Type Default Description
enabled bool True Whether this BAR is enabled
size_bytes int 1024 BAR size in bytes (must be power of 2, minimum 16)
is_64bit bool False Whether this is a 64-bit BAR (consumes the next BAR slot)
is_prefetchable bool False Whether this BAR is prefetchable

GowinPCIeGenerator — IP Generation Tool

amaranth_pcie/phy/gowin_pcie_gen.py

Programmatically generates Gowin PCIe IP cores without the GUI, replicating the exact 4-step pipeline that the Gowin IDE performs.

Generation Pipeline

Step Tool Description
1 GowinSynthesis Synthesize PCIe controller from encrypted Verilog
2 GowinSynthesis Synthesize UPAR arbiter from Verilog sources
3 GowinModGen Generate SerDes top-level wrapper (serdes.v)
4 serdes_toml_to_csr Convert TOML config to CSR register writes (serdes.csr)

Generated Files

File Description
src/serdes/serdes.v SerDes top-level wrapper (Verilog)
src/serdes/serdes.csr CSR register configuration
src/serdes/pcie_controller/pcie_controller.v Synthesized PCIe controller netlist
src/serdes/upar_arbiter/upar_arbiter.v Synthesized UPAR arbiter netlist
src/serdes/serdes.ipc IP configuration file
src/serdes/serdes.mod Module definition file
src/serdes/serdes_tmp.toml SerDes TOML configuration

Caching

Generated files are cached based on a SHA-256 hash of the configuration parameters. The cache directory is ~/.cache/amaranth-pcie/gowin/<hash>. Regeneration only occurs when parameters change.
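A content-addressed cache of this kind can be sketched with the standard library. How the generator actually serializes its parameters before hashing is not documented here, so the JSON canonicalization below is an assumption, and cache_dir_for is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path


def cache_dir_for(params: dict) -> Path:
    """Map a parameter dict to ~/.cache/amaranth-pcie/gowin/<sha256 hex digest>."""
    # Sorting keys makes the digest independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Path.home() / ".cache" / "amaranth-pcie" / "gowin" / digest


a = cache_dir_for({"gen": "gen3", "lane_width": "X4"})
b = cache_dir_for({"gen": "gen2", "lane_width": "X4"})
print(a != b)  # changing any parameter selects a different cache directory
```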

Programmatic Usage

from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig, GowinPCIeGenerator

config = PCIeConfig(
    gowin_dir="/opt/gowin/IDE",
    output_dir="./my_pcie_project",
    vendor_id="AABB",
    device_id="1234",
    lane_width="X4",
    gen="gen3",
    msi_enable=True,
    max_payload=1024,
    tl_clk_freq=100,
)
gen = GowinPCIeGenerator(config)
gen.generate()

CLI Usage

The generator can also be run as a standalone command-line tool:

python -m amaranth_pcie.phy.gowin_pcie_gen \
    --gowin-dir /opt/gowin/IDE \
    --output-dir ./output \
    --vendor-id AABB \
    --device-id 1234 \
    --lanes X4 \
    --gen gen3 \
    --bars 1048576,0,65536 \
    --max-payload 1024 \
    --tl-clk-freq 100 \
    -v

| CLI Option | Default | Description |
|---|---|---|
| --gowin-dir | (required) | Path to Gowin IDE installation |
| --output-dir | (required) | Output directory |
| --device | GW5AST-138 | Target device |
| --device-version | C | Device version |
| --package | FCPBGA676A | Package |
| --part-number | GW5AST-LV138FPG676AC1/I0 | Full part number |
| --vendor-id | 22C2 | PCIe Vendor ID (hex) |
| --device-id | 1100 | PCIe Device ID (hex) |
| --class-code | 0580 | PCIe Class Code (hex) |
| --revision | 00 | Revision ID (hex) |
| --lanes | X4 | Lane width (X1, X2, X4) |
| --gen | gen3 | PCIe generation (gen2, gen3) |
| --ref-clock | 100MHz | Reference clock frequency |
| --bars | 1024,2048,2048 | Comma-separated BAR sizes (0=disabled) |
| --no-msi | | Disable MSI capability |
| --max-payload | 1024 | Max payload size |
| --tl-clk-freq | 100 | TLP clock frequency (MHz) |
| -v / --verbose | | Enable verbose logging |
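The --bars format maps a short comma-separated list onto the six BAR slots. The sketch below shows one plausible interpretation. It assumes that omitted trailing slots are disabled, and parse_bars is a hypothetical helper rather than the tool's own parser.

```python
def parse_bars(spec: str, slots: int = 6) -> list:
    """Turn '1048576,0,65536' into a list of (enabled, size_bytes) per BAR slot."""
    sizes = [int(field) for field in spec.split(",")]
    if len(sizes) > slots:
        raise ValueError(f"at most {slots} BAR sizes are allowed")
    # Assumption: slots not listed on the command line are disabled.
    sizes += [0] * (slots - len(sizes))
    return [(size != 0, size) for size in sizes]


for index, (enabled, size) in enumerate(parse_bars("1048576,0,65536")):
    print(f"BAR{index}: {'%d bytes' % size if enabled else 'disabled'}")
```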

PHYTXDatapath / PHYRXDatapath

amaranth_pcie/phy/common.py

Common datapath modules for CDC, width conversion, and pipeline buffering between core and PHY clock domains.

PHYTXDatapath — Core → PCIe TX

Pipeline: sink → PipeValid → StreamCDC(sync→pcie) → StrideConverter → PipeReady → source

Parameter Type Description
phy_data_width int PHY-side data width
core_data_width int Core-side data width
clock_domain str Core clock domain (default "sync")

PHYRXDatapath — PCIe → Core RX

Pipeline: sink → [Aligner] → PipeReady → StrideConverter → CDC(pcie→sync) → PipeValid → source

Parameter Type Description
phy_data_width int PHY-side data width
core_data_width int Core-side data width
clock_domain str Core clock domain (default "sync")
with_aligner bool Include 128-bit aligner

LTSSMTracer

amaranth_pcie/phy/common.py

Debug module tracking LTSSM state transitions via a synchronous FIFO. Records state changes for software to trace link training.

Parameters

Parameter Type Description
ltssm Signal or None LTSSM state signal (in sync domain)
ltssm_width int Width of the LTSSM state signal (default 6; Gowin uses 5)

Ports

Port Direction Width Description
history_new Out ltssm_width New LTSSM state
history_old Out ltssm_width Previous LTSSM state
history_ovfl Out 1 Overflow flag
history_valid Out 1 FIFO output valid
history_re In 1 Read-enable to pop entry
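The tracer's behavior can be approximated in software to show what the host sees when it drains the FIFO. This is a behavioral model only. The FIFO depth, the choice to drop new entries when full, and the sticky overflow flag are assumptions, and LtssmTraceModel is a hypothetical name.

```python
from collections import deque


class LtssmTraceModel:
    """Software model: records (old, new) pairs whenever the state changes."""

    def __init__(self, depth: int = 4, initial: int = 0):
        self.depth = depth
        self.fifo = deque()
        self.prev = initial
        self.overflow = False   # models history_ovfl (assumed sticky)

    def sample(self, state: int) -> None:
        """Call once per clock with the current LTSSM state."""
        if state == self.prev:
            return
        if len(self.fifo) < self.depth:
            self.fifo.append((self.prev, state))   # (history_old, history_new)
        else:
            self.overflow = True                   # assumed: entry is dropped
        self.prev = state

    def pop(self):
        """Models a history_re pulse: return the oldest entry, or None if empty."""
        return self.fifo.popleft() if self.fifo else None


trace = LtssmTraceModel(depth=2)
for state in [0, 0, 1, 1, 2, 3]:
    trace.sample(state)
print(trace.pop(), trace.pop(), trace.overflow)
```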

Multi-Vendor PHY Support

amaranth_pcie/phy/common.py

The PHY layer provides a set of abstract interfaces and base classes that enable writing vendor-independent PCIe endpoint logic. These were introduced as 12 improvement proposals (A1–H1) to support PHYs beyond Xilinx 7-Series.

PCIePHY Base Class (A1)

All PHY implementations inherit from PCIePHY, which defines the formal contract:

from amaranth import Signal
from amaranth_pcie.phy.common import PCIePHY, PCIeConfig  # PCIeConfig location assumed

MB = 1024 * 1024  # bytes

class MyVendorPHY(PCIePHY):
    endianness = "big"       # or "little" or "native"
    shared_channel = False   # True if TX/RX share one stream pair

    def __init__(self, data_width=64):
        self.data_width = data_width
        self.bar0_size = 1 * MB
        self.config = PCIeConfig()  # standardized config interface
        self.link_up = Signal()
        # Create sink/source/msi streams ...

    def elaborate(self, platform):
        ...

Capability query helpers avoid hasattr() checks:

Method Returns True when…
phy.has_bar_hit() PHY provides hardware BAR decode (bar_hit attribute)
phy.has_credits() PHY provides credit-based flow control (credits attribute)
phy.has_config_access() PHY provides config space read/write (config_access attribute)

Capability support by PHY:

| Capability | SimPCIePHY | S7PCIEPHY | GW5ASTPCIePHY |
|---|---|---|---|
| has_bar_hit() | | | Yes |
| has_credits() | | | Yes |
| has_config_access() | | | Yes |

Legacy property accessors (phy.id, phy.max_request_size, phy.max_payload_size) delegate to phy.config for backward compatibility.

PCIeConfig — Standardized Config Interface (C1)

PCIeConfig provides runtime-negotiated configuration values as Amaranth Signals:

Attribute Width Description
bus_number 8 PCI bus number
device_number 5 PCI device number
function_number 3 PCI function number
max_payload_size 16 Negotiated max payload size (bytes)
max_request_size 16 Negotiated max request size (bytes)
command 16 PCI command register
status 16 PCI status register
id 16 Composed BDF: Cat(function, device, bus)
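Amaranth's Cat() places its first argument in the least significant bits, so the composed id has the function number in bits [2:0], the device number in bits [7:3], and the bus number in bits [15:8]. The same packing on plain integers, with compose_id as a hypothetical helper name:

```python
def compose_id(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a 16-bit BDF, matching Cat(function, device, bus)."""
    return (function & 0x7) | ((device & 0x1F) << 3) | ((bus & 0xFF) << 8)


def split_id(bdf: int) -> tuple:
    """Inverse of compose_id: return (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


print(hex(compose_id(bus=0x03, device=0x00, function=0x0)))
```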

Signal vs. int convention (C2): phy.config.max_request_size is a Signal for hardware use. phy.max_request_size_bytes is a Python int for construction-time arithmetic (FIFO depths, etc.).

PCIeCreditInterface — Credit-Based Flow Control (D1)

PCIeCreditInterface tracks TX credit availability:

from amaranth_pcie.phy.common import PCIeCreditInterface

# Basic mode — boolean availability flags
credits = PCIeCreditInterface()
credits.posted_header_available     # Signal
credits.completion_data_available   # Signal
# ... 6 channels total

# Extended mode — adds per-channel credit counts
credits = PCIeCreditInterface(extended=True)
credits.posted_header_count         # Signal(8)
credits.posted_data_count           # Signal(12)

When a PHY sets self.credits = PCIeCreditInterface(), the TLPPacketizer automatically gates TX based on credit availability.

PHYs with credit support: GW5ASTPCIePHY (extended mode with header/data counts).
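The gating decision can be modeled as a pure function of the six availability flags. The rule used here (a TLP needs a header credit, plus a data credit when it carries a payload) is an assumption about what the packetizer does; the flag names follow the credits attributes, and may_send is a hypothetical name.

```python
def may_send(kind: str, has_payload: bool, avail: dict) -> bool:
    """Return True if a TLP of the given kind may be transmitted.

    kind is one of "posted", "non_posted", "completion".
    avail maps names such as "posted_header_available" to booleans.
    """
    if not avail[f"{kind}_header_available"]:
        return False
    # Assumed rule: only TLPs with a payload consume data credits.
    return (not has_payload) or avail[f"{kind}_data_available"]


flags = {
    "posted_header_available": True, "posted_data_available": False,
    "non_posted_header_available": True, "non_posted_data_available": True,
    "completion_header_available": False, "completion_data_available": True,
}
print(may_send("posted", True, flags))       # memory write: blocked on data credits
print(may_send("non_posted", False, flags))  # memory read: header credit suffices
```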

PCIeMSIInterface — MSI PHY Interface (F1)

PCIeMSIInterface abstracts the MSI handshake:

from amaranth_pcie.phy.common import PCIeMSIInterface

# Basic mode (Xilinx-compatible)
msi = PCIeMSIInterface(extended=False)
msi.valid   # MSI request valid
msi.ready   # MSI request accepted
msi.dat     # Signal(8) — vector number

# Extended mode (Gowin-compatible)
msi = PCIeMSIInterface(extended=True)
msi.ack     # MSI acknowledge
msi.status  # Signal(3) — MSI status
msi.msinum  # Signal(5) — MSI vector number

PHYs with extended MSI: GW5ASTPCIePHY (when msi_enable=True).

PCIeConfigAccess — Runtime Config Space Access (H1)

PCIeConfigAccess provides register-level read/write access to the PCIe configuration space:

Signal Width Description
read_en 1 Read enable
read_addr 12 Read address (DWORD-aligned)
read_data 32 Read data
read_valid 1 Read data valid
write_en 1 Write enable
write_addr 12 Write address
write_data 32 Write data
write_be 4 Write byte enables
write_done 1 Write completion

PHYs with config access: GW5ASTPCIePHY (via Gowin DRP interface).

PCIeResource — Board Resource Descriptor (G1)

PCIeResource captures physical PCIe slot parameters:

from amaranth_pcie.phy.common import PCIeResource

res = PCIeResource(0,
    lanes=4,
    refclk_freq=100e6,
    refclk=platform.request("pcie_refclk"),
    perst=platform.request("pcie_perst"),
    extras={"speed": "gen2"},
)

256-bit Data Width Support (B1)

All stream signatures, PHY models, and TLP modules support 256-bit data widths:

phy = SimPCIePHY(data_width=256)
depkt = TLPDepacketizer(data_width=256, endianness="big")
pkt = TLPPacketizer(data_width=256, endianness="big")

The Gowin GW5AST PCIe IP natively uses 256-bit data, making it the primary use case for this width.

"native" Endianness (B2)

PHYs that deliver data in the host's native byte order can declare endianness = "native", which skips the DWORD byte-swap in the packetizer/depacketizer:

class NativeOrderPHY(PCIePHY):
    endianness = "native"  # no swap needed
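The swap that "native" skips reverses the four bytes inside each 32-bit DWORD while leaving the DWORD positions unchanged. A minimal integer model follows. The helper name is hypothetical, and which of the non-native settings triggers the swap is not specified here, so the sketch shows only the swap itself.

```python
def swap_dword_bytes(word: int, data_width: int) -> int:
    """Reverse the byte order inside every 32-bit DWORD of a data_width-bit word."""
    out = 0
    for i in range(data_width // 32):
        dword = (word >> (32 * i)) & 0xFFFF_FFFF
        swapped = int.from_bytes(dword.to_bytes(4, "little"), "big")
        out |= swapped << (32 * i)   # DWORD stays in the same position
    return out


word = 0x11223344_AABBCCDD
print(hex(swap_dword_bytes(word, 64)))
```

A PHY declaring endianness = "native" would pass word through unchanged instead.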

Hardware BAR Decode Passthrough (E1)

PHYs that provide hardware BAR decode can include a bar_hit field in the PHY stream:

sig = phy_signature(64, with_bar_hit=True)
# sig.payload has: dat(64), be(8), bar_hit(6)

PHYs with BAR hit: GW5ASTPCIePHY provides bar_hit on the RX stream, decoded from the Gowin IP's rx_bardec output.

LTSSM Width (A2)

LTSSMTracer supports configurable LTSSM state signal widths (5, 6, or 8 bits) via the ltssm_width parameter.

PHY LTSSM Width
SimPCIePHY 6 (default)
S7PCIEPHY 6
GW5ASTPCIePHY 5

How to Write a New PHY Backend

  1. Subclass PCIePHY and set class attributes:

    class GW5APCIEPHY(PCIePHY):
        endianness = "little"
        shared_channel = False
        ltssm_width = 5

  2. Create PCIeConfig in __init__:

    self.config = PCIeConfig()
    self.max_request_size_bytes = 256
    self.max_payload_size_bytes = 256

  3. Create streams using phy_signature() and msi_signature():

    self.sink = phy_signature(data_width).create(path=("sink",))
    self.source = phy_signature(data_width).create(path=("source",))
    self.msi = msi_signature().create(path=("msi",))

  4. Optionally add a credit interface, config access, and BAR hit:

    self.credits = PCIeCreditInterface(extended=True)
    self.config_access = PCIeConfigAccess()

  5. Implement elaborate() with vendor IP instantiation, clock domain creation, and datapath wiring.


Examples

See the examples/ directory:

Example Description
basic_endpoint.py Minimal PCIe endpoint with Wishbone + DMA
dma_loopback.py DMA loopback simulation test
wishbone_bridge.py Wishbone bridge with FPGA registers

Testing

# Run all tests
cd amaranth-pcie/
pdm run pytest tests/ -v

# Run specific tests
pdm run pytest tests/test_common.py -v
pdm run pytest tests/test_tlp_common.py -v
pdm run pytest tests/test_msi.py -v
pdm run pytest tests/test_endpoint_sim.py -v
pdm run pytest tests/test_gowin_phy.py -v

Available Tests

Test file Covers
tests/test_common.py Stream signatures, layouts, BAR mask
tests/test_tlp_common.py TLP headers, endianness swap
tests/test_msi.py MSI/MSI-X controllers
tests/test_endpoint_sim.py Full endpoint simulation
tests/test_phy_abstractions.py Multi-vendor PHY abstractions (90 tests)
tests/test_gowin_phy.py Gowin GW5AST PHY, TLP adapter, and IP generator

Comparison with LitePCIe

| LitePCIe (Migen) | amaranth-pcie (Amaranth) | Notes |
|---|---|---|
| LitePCIePHY7Series | S7PCIEPHY | Xilinx 7-Series wrapper |
| (none) | GW5ASTPCIePHY | Gowin GW5AST-138 wrapper (new) |
| (none) | GowinPCIeGenerator | Gowin IP generation tool (new) |
| LitePCIeEndpoint | PCIeEndpoint | Top-level assembly |
| LitePCIeCrossbar | PCIeCrossbar | Routing fabric |
| LitePCIeTLPDepacketizer | TLPDepacketizer | PHY → typed streams |
| LitePCIeTLPPacketizer | TLPPacketizer | Typed streams → PHY |
| LitePCIeTLPController | TLPController | Tag management |
| LitePCIeMSI | PCIeMSI | Single-vector MSI |
| LitePCIeMSIMultiVector | PCIeMSIMultiVector | Multi-vector MSI |
| LitePCIeMSIX | PCIeMSIX | MSI-X |
| LitePCIeWishboneMaster | PCIeWishboneMaster | Host → FPGA Wishbone |
| LitePCIeWishboneSlave | PCIeWishboneSlave | FPGA → Host Wishbone |
| LitePCIeDMA | PCIeDMA | Full DMA with plugins |
| LitePCIeDMAScatterGather | DMAScatterGather | Moved to amaranth-lib |
| LitePCIeDMAReader | DMAReader | Moved to amaranth-lib |
| LitePCIeDMAWriter | DMAWriter | Moved to amaranth-lib |
| LitePCIeDMALoopback | DMALoopback | Moved to amaranth-lib |
| LitePCIeDMASynchronizer | DMASynchronizer | Moved to amaranth-lib |
| LitePCIeDMABuffering | DMABuffering | Moved to amaranth-lib |

Key architectural difference: The DMA engine is split into bus-agnostic cores (amaranth-lib) and PCIe-specific adapters (amaranth-pcie), enabling reuse with other bus protocols.
