PCIe controller for Amaranth HDL — a complete PCIe endpoint stack with TLP processing, crossbar routing, DMA, and Wishbone/AXI bridges.
amaranth-pcie is a port of LitePCIe to the Amaranth HDL ecosystem. It provides a layered PCIe endpoint architecture:
┌─────────────────────────────────────────────────────────────────┐
│ User Logic │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Wishbone │ │ PCIeDMA │ │ PCIeAXI │ │
│ │ Bridge │ │ (SG + R/W) │ │ Slave │ │
│ └────┬─────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ Frontend │
├────────┴───────────────┴─────────────────┴──────────────────────┤
│ PCIeCrossbar │
│ Slave ports (Host→FPGA) Master ports (FPGA→Host) │
│ │ │ Core │
├────────┴──────────────────────────┴─────────────────────────────┤
│ TLP Depacketizer / Packetizer │
│ TLPController TLP │
├─────────────────────────────────────────────────────────────────┤
│ PCIe PHY │
│ SimPCIePHY │ S7PCIEPHY (Xilinx) │ GW5ASTPCIePHY (Gowin) PHY │
└─────────────────────────────────────────────────────────────────┘
| Vendor | Device | PHY Class | PCIe Gen | Max Lanes | Status |
|---|---|---|---|---|---|
| Xilinx | 7-Series (Artix-7, Kintex-7, Virtex-7) | S7PCIEPHY | Gen1, Gen2 | x8 | Ported from LitePCIe |
| Gowin | GW5AST-138 (FCPBGA676A) | GW5ASTPCIePHY | Gen2, Gen3 | x4 | New |
| — | Simulation | SimPCIePHY | — | — | For testing |
# Clone the monorepo and install with PDM
cd amaranth-pcie/
pdm install
# Install with dev dependencies (pytest)
pdm install -G dev

| Package | Description |
|---|---|
| amaranth | Amaranth HDL core |
| amaranth-soc | SoC components (Wishbone, CSR) |
| amaranth-stream | Stream infrastructure (FIFO, arbiter, CDC, etc.) |
| amaranth-lib | Generic DMA engine and utilities |
"""Minimal PCIe endpoint with Wishbone bridge and DMA."""
from amaranth import *
from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster
from amaranth_pcie.frontend.dma import PCIeDMA
class MyPCIeDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()
        # 1. Create PHY (simulation for testing)
        m.submodules.phy = phy = SimPCIePHY(data_width=64)
        # 2. Create endpoint (connects PHY ↔ TLP ↔ Crossbar)
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint
        # 3. Add Wishbone bridge for host CSR access
        m.submodules.wb = wb = PCIeWishboneMaster(endpoint)
        # 4. Add DMA with loopback for testing
        m.submodules.dma = dma = PCIeDMA(
            endpoint, with_loopback=True,
        )
        # 5. Connect user logic to DMA streams
        # dma.source = data from host (reader output)
        # dma.sink = data to host (writer input)
        return m

PHY Layer TLP Layer Core Layer Frontend Layer
───────── ───────── ────────── ──────────────
┌──────────────┐
SimPCIePHY ──────► │TLPDepacketizer│──► ┌──────────────┐ ┌──────────────────┐
S7PCIEPHY │ │ │ │ │ PCIeWishboneMaster│
GW5ASTPCIePHY │ │ │ │ │ │
│ (PHY→typed) │ │ PCIeCrossbar│◄──►│ PCIeWishboneSlave │
└──────────────┘ │ │ │ PCIeDMA │
┌──────────────┐ │ (routing) │ │ PCIeAXISlave │
◄── │TLPPacketizer │◄── │ │ └──────────────────┘
│ │ └──────────────┘
│ (typed→PHY) │ ▲
└──────────────┘ │
┌──────────────┐ ┌────┴─────┐
│TLPController │◄──►│PCIeEndpoint│
│ (tag mgmt) │ │ (assembly)│
└──────────────┘ └──────────┘
| Feature | SimPCIePHY | S7PCIEPHY | GW5ASTPCIePHY |
|---|---|---|---|
| Vendor | — | Xilinx | Gowin |
| Target device | Simulation | 7-Series | GW5AST-138 |
| PCIe Gen | — | Gen1/Gen2 | Gen2/Gen3 |
| Max lanes | — | x8 | x4 |
| Data widths | 64, 128, 256 | 64, 128 | 64, 128, 256 |
| Native data width | user-selected | 64 or 128 | 256 (IP-fixed) |
| Endianness | "big" | "big" | "big" |
| Shared channel | False | True | False |
| LTSSM width | 6 (default) | 6 | 5 |
| Clock domain | sync | pcie (IP-generated) | user-provided (sync) |
| CDC required | No | Yes (sync↔pcie) | No |
| Width conversion | No | Optional (64↔128) | Optional (64/128↔256) |
| BAR hit | No | No | Yes (bar_hit in RX stream) |
| TX credits | No | No | Yes (extended: header+data counts) |
| MSI interface | Basic (valid/ready) | Basic (valid/ready) | Extended (ack/status/msinum) |
| Config access (DRP) | No | No | Yes (read/write) |
| IP generation | N/A | External (Vivado) | Automatic (gowin_pcie_gen.py) |
| LTSSM tracer | No | No | Yes (built-in) |
The endpoint supports two PHY channel modes:
Shared channel (default) — single sink/source on PHY:
PHY.source → Depacketizer → {req_source → Crossbar.phy_slave_sink,
cmp_source → Crossbar.phy_master_sink}
{Crossbar.phy_slave_source → Packetizer.cmp_sink,
Crossbar.phy_master_source → Packetizer.req_sink} → PHY.sink
Separate channels — req_source/cmp_source and req_sink/cmp_sink on PHY:
PHY.req_source → req_depacketizer → Crossbar.phy_slave_sink
PHY.cmp_source → cmp_depacketizer → Crossbar.phy_master_sink
Crossbar.phy_slave_source → cmp_packetizer → PHY.cmp_sink
Crossbar.phy_master_source → req_packetizer → PHY.req_sink
| PHY | shared_channel | Description |
|---|---|---|
| SimPCIePHY | False | Independent TX/RX |
| S7PCIEPHY | True | Xilinx 7-Series shared channel |
| GW5ASTPCIePHY | False | Gowin GW5AST separate request/completion paths |

| Constant | Value | Description |
|---|---|---|
| KB | 1024 | Kilobyte |
| MB | 1024² | Megabyte |
| GB | 1024³ | Gigabyte |
Compute the BAR mask for a given BAR size in bytes (must be power of two).
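As a rough illustration of what such a mask represents, the sketch below derives it in plain Python for a 32-bit address space. The helper name `bar_mask_sketch` is hypothetical and is not the library's implementation; it only assumes that the mask clears the address bits that index into the BAR.

```python
def bar_mask_sketch(size: int, address_width: int = 32) -> int:
    """Hypothetical stand-in: return a mask with the in-BAR offset bits cleared."""
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR size must be a power of two")
    full = (1 << address_width) - 1   # all address bits set
    return full & ~(size - 1)         # clear the bits below the BAR size


print(hex(bar_mask_sketch(1024 * 1024)))
```

The library function itself is used as follows.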
from amaranth_pcie.common import get_bar_mask, MB
mask = get_bar_mask(1 * MB) # → 0xFFF00000

All stream signatures use amaranth_stream.Signature with has_first_last=True for packet framing.

| Function | Payload Fields | Description |
|---|---|---|
| phy_signature(data_width) | dat, be | PHY-level raw data + byte enables |
| request_signature(data_width, address_width=32) | req_id, we, adr, len, tag, dat, channel, user_id | Memory read/write request |
| completion_signature(data_width, address_width=32) | req_id, cmp_id, adr, len, end, err, tag, dat, channel, user_id | Completion |
| configuration_signature(data_width) | req_id, we, bus_number, device_no, func, ext_reg, register_no, tag, dat, channel | Configuration request |
| ptm_signature(data_width) | request, response, requester_id, length, message_code, master_time, dat, channel | PTM |
| msi_signature() | dat(8) | MSI interrupt (no framing) |
| dma_signature(data_width) | payload(data_width) | DMA data with first/last |
| Function | Returns | Key Fields |
|---|---|---|
| phy_layout(data_width) | StructLayout | dat(data_width), be(data_width//8) |
| request_layout(data_width, address_width) | StructLayout | req_id(16), we(1), adr(addr_w), len(10), tag(8), dat(data_w), channel(8), user_id(8) |
| completion_layout(data_width, address_width) | StructLayout | req_id(16), cmp_id(16), adr(addr_w), len(10), end(1), err(1), tag(8), dat(data_w), channel(8), user_id(8) |
| msi_layout() | StructLayout | dat(8) |

| Constant | Value | Description |
|---|---|---|
| max_payload_size | 512 | Maximum TLP payload (bytes) |
| max_request_size | 512 | Maximum TLP request (bytes) |
| tlp_common_header_length | 16 | Common header = 4 DWORDs = 16 bytes |
from amaranth_pcie.tlp.common import fmt_type_dict
fmt_type_dict["mem_rd32"] # 0b00_00000 — Memory Read 32-bit
fmt_type_dict["mem_rd64"] # 0b01_00000 — Memory Read 64-bit
fmt_type_dict["mem_wr32"] # 0b10_00000 — Memory Write 32-bit
fmt_type_dict["mem_wr64"] # 0b11_00000 — Memory Write 64-bit
fmt_type_dict["cpld"] # 0b10_01010 — Completion with Data
fmt_type_dict["cpl"] # 0b00_01010 — Completion without Data

from amaranth_pcie.tlp.common import cpl_dict
cpl_dict["sc"] # 0b000 — Successful Completion
cpl_dict["ur"] # 0b001 — Unsupported Request
cpl_dict["crs"] # 0b010 — Configuration Request Retry Status
cpl_dict["ca"] # 0b011 — Completer Abort

Both HeaderLayout (with byte offsets for wire-level packing) and StructLayout (for logical field access) are provided:

| Layout | Type | Fields |
|---|---|---|
| tlp_request_header | HeaderLayout | fmt, type, tc, td, ep, attr, length, requester_id, tag, last_be, first_be, address(64) |
| tlp_completion_header | HeaderLayout | fmt, type, tc, td, ep, attr, length, completer_id, status, bcm, byte_count, requester_id, tag, lower_address |
| tlp_request_header_layout | StructLayout | Same fields as tlp_request_header, flat |
| tlp_completion_header_layout | StructLayout | Same fields as tlp_completion_header, flat |

| Function | Description |
|---|---|
| tlp_raw_signature(data_width) | Raw TLP: fmt(2) + header(128) + dat + be |
| tlp_request_signature(data_width) | Request header fields + dat + be |
| tlp_completion_signature(data_width) | Completion header fields + dat + be |
Generates combinational assignments for DWORD-level endianness swap. Used by packetizer/depacketizer for big-endian PHYs.
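A plain-Python model can make the swap concrete. The sketch below assumes the swap reverses the byte order inside each 32-bit DWORD while leaving the DWORD positions unchanged, which is consistent with the endpoint's description of endianness as the byte order within each DWORD. The function name is hypothetical, and this is a software model rather than the hardware implementation.

```python
def dword_endianness_swap_model(value: int, data_width: int) -> int:
    """Reverse the 4 bytes of every 32-bit DWORD in a data_width-bit word."""
    if data_width % 32:
        raise ValueError("data_width must be a multiple of 32")
    out = 0
    for i in range(data_width // 32):
        dword = (value >> (32 * i)) & 0xFFFFFFFF
        # Re-read the big-endian byte sequence as little-endian to swap bytes.
        swapped = int.from_bytes(dword.to_bytes(4, "big"), "little")
        out |= swapped << (32 * i)
    return out


print(hex(dword_endianness_swap_model(0x11223344_AABBCCDD, 64)))
```

Applying the model twice returns the original value, which is why the same operation can serve both the packetizer and the depacketizer.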
Converts raw PHY streams into typed request/completion streams.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | — | PHY data width (64 or 128 bits) |
| endianness | str | — | "big" or "little" |
| address_mask | int | 0 | BAR0 address mask for request filtering |
| capabilities | list[str] | — | List of "REQUEST", "COMPLETION", "PTM", "CONFIGURATION" |

| Port | Direction | Description |
|---|---|---|
| sink | In | PHY stream input (phy_signature) |
| req_source | Out | Request stream output (if "REQUEST" in capabilities) |
| cmp_source | Out | Completion stream output (if "COMPLETION" in capabilities) |
Converts typed request/completion streams into raw PHY streams. Supports both 3DW and 4DW TLP headers.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | — | PHY data width (64 or 128 bits) |
| endianness | str | — | "big" or "little" |
| address_width | int | 32 | Address width (32 or 64) |
| capabilities | list[str] | — | List of "REQUEST", "COMPLETION", "PTM" |

| Port | Direction | Description |
|---|---|---|
| req_sink | In | Request stream input |
| cmp_sink | In | Completion stream input |
| source | Out | PHY stream output (phy_signature) |
Manages tag allocation for outstanding read requests and reorders completions to match request order.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | — | Data width in bits |
| max_pending_requests | int | — | Maximum outstanding read requests |
| cmp_bufs_buffered | bool | True | Use buffered FIFOs for completion buffers |
| address_width | int | 32 | Address width |

| Port | Direction | Description |
|---|---|---|
| req_sink | In | Request stream from crossbar |
| req_source | Out | Request stream to PHY (with tag assigned) |
| cmp_sink | In | Completion stream from PHY |
| cmp_source | Out | Completion stream to crossbar (reordered) |
| ctrl_rst | In(1) | Controller reset |
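
The tag allocation and reordering behavior can be sketched as a small software model. This is a simplified illustration under assumptions, not the controller's actual logic: the class and method names are hypothetical, each read is assumed to produce exactly one completion, and tags are assumed to be recycled once their completion has been forwarded.

```python
from collections import deque


class TagReorderModel:
    """Behavioral model: hand out tags, release completions in request order."""

    def __init__(self, max_pending_requests: int):
        self.free_tags = deque(range(max_pending_requests))
        self.issued = deque()   # tags of outstanding reads, oldest first
        self.buffers = {}       # tag -> completion data received so far

    def issue_read(self):
        """Return a tag for a new read, or None when all tags are in flight."""
        if not self.free_tags:
            return None
        tag = self.free_tags.popleft()
        self.issued.append(tag)
        return tag

    def receive_completion(self, tag, data):
        """Buffer a completion, which may arrive out of order."""
        self.buffers[tag] = data

    def drain(self):
        """Return buffered completions in request order and recycle their tags."""
        out = []
        while self.issued and self.issued[0] in self.buffers:
            tag = self.issued.popleft()
            out.append(self.buffers.pop(tag))
            self.free_tags.append(tag)
        return out
```

In this model a completion for a later request is held back until every earlier request has completed, which mirrors the role of the per-tag completion buffers described in the parameter table.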
| Class | Direction | User sees | Description |
|---|---|---|---|
| PCIeSlaveInternalPort | Host → FPGA | — | Internal crossbar port |
| PCIeSlavePort | Host → FPGA | sink=requests, source=completions | User-facing slave port |
| PCIeMasterInternalPort | FPGA → Host | — | Internal crossbar port |
| PCIeMasterPort | FPGA → Host | sink=completions, source=requests | User-facing master port |
Central routing fabric connecting frontend ports to the TLP layer through arbitration and dispatch logic.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | — | PHY data width in bits |
| address_width | int | 32 | Address width (32 or 64) |
| max_pending_requests | int | 4 | Max outstanding read requests for TLP controller |
| cmp_bufs_buffered | bool | True | Buffered completion FIFOs |
| with_configuration | bool | False | Support configuration TLPs |
Register and return a new slave port with address-based routing.
# address_decoder: function(address_signal) → 1-bit match signal
port = crossbar.get_slave_port(lambda a: a[20:] == 0)
# port.sink = request stream (from host)
# port.source = completion stream (to host)

Register and return a new master port with auto-assigned channel ID.

rd_port = crossbar.get_master_port(read_only=True)
wr_port = crossbar.get_master_port(write_only=True)
# port.source = request stream (to host)
# port.sink = completion stream (from host)

Slave path (Host → FPGA):
PHY slave sink (requests) → Dispatcher (by address_decoder) → user slave sources
user slave sinks (completions) → Arbiter → PHY slave source
Master path (FPGA → Host):
┌─────────┐
RD ports ────►│ Arb/Disp├──► TLPController ──┐
RW ports ────►│ │ │
└─────────┘ ├──► Arb/Disp ──► PHY master
┌─────────┐ │
WR ports ────►│ Arb/Disp├─────────────────────┘
└─────────┘
Write-only ports bypass the TLPController to avoid blocking writes when reads are throttled.
Top-level assembly connecting PHY ↔ TLP ↔ Crossbar. This is the main entry point for building a PCIe design.
| Parameter | Type | Default | Description |
|---|---|---|---|
| phy | object or dict | — | PHY instance or config dict |
| max_pending_requests | int | 4 | Max outstanding read requests |
| address_width | int | 32 | Address width (32 or 64) |
| endianness | str | "big" | Byte order within each DWORD |
| cmp_bufs_buffered | bool | True | Buffered completion FIFOs |
| with_ptm | bool | False | Support PTM TLPs |
| with_configuration | bool | False | Support configuration TLPs |

| Attribute | Type | Description |
|---|---|---|
| crossbar | PCIeCrossbar | The internal crossbar instance |
| data_width | int | PHY data width |
| bar0_mask | int | BAR0 address mask |
| get_slave_port() | method | Delegates to crossbar.get_slave_port() |
| get_master_port() | method | Delegates to crossbar.get_master_port() |
from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint
# With a real PHY
phy = SimPCIePHY(data_width=64)
endpoint = PCIeEndpoint(phy)
# With a config dict (for testing without PHY)
endpoint = PCIeEndpoint({
    "data_width": 64,
    "bar0_mask": 0xFFF00000,
    "id": 0x0001,
    "max_request_size": 512,
    "max_payload_size": 128,
})

MSI interrupt controllers.

| Parameter | Type | Default | Description |
|---|---|---|---|
| width | int | 32 | Number of IRQ sources |

| Port | Direction | Width | Description |
|---|---|---|---|
| irqs | In | width | One bit per IRQ source |
| source | Out | msi_signature() | MSI output stream |
| enable | In | width | Per-IRQ enable mask |
| clear | In | width | Per-IRQ clear mask |
| clear_strobe | In | 1 | Strobe for clear |
| vector | Out | width | Current pending IRQ vector |
Same ports as PCIeMSI but source.payload.dat carries the IRQ number (lower index = higher priority). No clear/clear_strobe — cleared automatically on MSI acceptance.
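
The priority rule (lower index wins) can be illustrated with a short software model. The function name is hypothetical, and the sketch assumes that disabled IRQs are masked out before arbitration, which the description above does not state explicitly.

```python
def highest_priority_irq(pending: int, enable: int, width: int = 32):
    """Return the lowest set bit index of (pending & enable), or None if idle."""
    active = pending & enable & ((1 << width) - 1)
    if active == 0:
        return None
    # active & -active isolates the least significant set bit.
    return (active & -active).bit_length() - 1


print(highest_priority_irq(pending=0b1010, enable=0b1111))
```

With IRQs 1 and 3 pending and all sources enabled, the model selects IRQ 1; masking IRQ 1 through the enable input would let IRQ 3 through instead.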
| Port | Direction | Width | Description |
|---|---|---|---|
| irqs | In | width | One bit per IRQ source |
| enable | In | width | Per-IRQ enable mask |
| pba | Out | width | Pending Bit Array |
Exposes msix_wr_valid, msix_wr_ready, msix_wr_adr, msix_wr_dat signals for external TLP write connection.
Full scatter-gather bi-directional DMA over PCIe with optional plugins.
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint instance |
| data_width | int or None | None | User data width (defaults to PHY data width) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| with_loopback | bool | False | Enable loopback plugin |
| with_synchronizer | bool | False | Enable synchronizer plugin |
| with_buffering | bool | False | Enable buffering plugin |
| buffering_depth | int | 2048 | Depth for buffering FIFOs (bytes) |
| writer_buffering_depth | int or None | None | Override writer buffering depth |
| reader_buffering_depth | int or None | None | Override reader buffering depth |
| with_reader | bool | True | Enable DMA reader |
| with_writer | bool | True | Enable DMA writer |

| Attribute | Type | Description |
|---|---|---|
| source | stream | Data from host (reader output) |
| sink | stream | Data to host (writer input) |
| irq | Signal | Combined IRQ from reader + writer |
| reader | PCIeDMAReader | Reader sub-component (if enabled) |
| writer | PCIeDMAWriter | Writer sub-component (if enabled) |
| loopback | DMALoopback | Loopback plugin (if enabled) |
| synchronizer | DMASynchronizer | Synchronizer plugin (if enabled) |
| buffering | DMABuffering | Buffering plugin (if enabled) |
Reader path (Host → FPGA):
ScatterGather → Splitter → DMAReader ←→ ReaderAdapter ←→ CrossbarMasterPort(read_only)
↓
data_source → [plugins] → user source
Writer path (FPGA → Host):
user sink → [plugins] → data_sink
↓
ScatterGather → Splitter → DMAWriter ←→ WriterAdapter ←→ CrossbarMasterPort(write_only)
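
To illustrate the splitter stage in both paths, the sketch below breaks one scatter-gather descriptor into transfers no larger than a maximum size. It is a behavioral model under assumptions: the function name and the (address, length) tuple format are hypothetical, the default of 512 bytes simply reuses the documented max_payload_size and max_request_size constants, and the real splitter may apply alignment rules that are not modeled here.

```python
def split_descriptor(address: int, length: int, max_size: int = 512):
    """Split (address, length) into chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    chunks = []
    offset = 0
    while offset < length:
        size = min(max_size, length - offset)
        chunks.append((address + offset, size))
        offset += size
    return chunks


for addr, size in split_descriptor(0x1000, 1200):
    print(hex(addr), size)
```

A 1200-byte descriptor therefore becomes two full 512-byte transfers followed by one 176-byte remainder.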
from amaranth import *
from amaranth_pcie.phy.sim import SimPCIePHY
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.dma import PCIeDMA
class DMADesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()
        m.submodules.phy = phy = SimPCIePHY(data_width=64)
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint
        m.submodules.dma = dma = PCIeDMA(
            endpoint,
            with_loopback=True,
            with_buffering=True,
            buffering_depth=4096,
        )
        # User logic: consume data from host
        with m.If(dma.source.valid & dma.source.ready):
            # Process dma.source.payload (data_width bits)
            pass
        return m

Individual DMA reader/writer with scatter-gather + splitter + bus adapter. Used internally by PCIeDMA but can be instantiated separately.
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint |
| port | PCIeMasterPort | — | Crossbar master port (read_only) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| data_width | int or None | None | User data width |

| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint |
| port | PCIeMasterPort | — | Crossbar master port (write_only) |
| table_depth | int | 256 | Scatter-gather table depth |
| address_width | int | 32 | Address width |
| data_width | int or None | None | User data width |

| Attribute | Type | Description |
|---|---|---|
| table | DMAScatterGather | Descriptor table |
| splitter | DMADescriptorSplitter | Descriptor splitter |
| reader/writer | DMAReader/DMAWriter | Generic DMA core |
| adapter | PCIeDMAReaderAdapter/PCIeDMAWriterAdapter | PCIe bus adapter |
Host accesses FPGA's Wishbone bus via PCIe BAR. Gets a slave port from the crossbar.
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint |
| address_decoder | callable or None | None | Address match function (default: match all) |
| base_address | int | 0x00000000 | Base address offset for Wishbone |
| qword_aligned | bool | False | Handle 64-bit aligned access |
| wb_addr_width | int | 32 | Wishbone address width |
| wb_data_width | int | 32 | Wishbone data width |

| Attribute | Type | Description |
|---|---|---|
| port | PCIeSlavePort | Crossbar slave port |
| wb | Wishbone interface | Wishbone master bus |
IDLE → DO-WRITE → IDLE (host writes to FPGA register)
IDLE → DO-READ → ISSUE-READ-COMPLETION → IDLE (host reads FPGA register)
from amaranth import *
from amaranth_soc.wishbone.sram import WishboneSRAM
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster

class WishboneDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()
        # ... create phy, endpoint ...
        # Wishbone bridge: host can read/write FPGA registers
        m.submodules.wb_bridge = wb_bridge = PCIeWishboneMaster(
            endpoint,
            base_address=0x00000000,
        )
        # Connect Wishbone bus to an SRAM
        m.submodules.sram = sram = WishboneSRAM(size=4096)
        # Connect wb_bridge.wb to sram.wb ...
        return m

The FPGA accesses host memory via a Wishbone interface. Gets a master port from the crossbar.
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint |
| wb_addr_width | int | 32 | Wishbone address width |
| wb_data_width | int | 32 | Wishbone data width |
| qword_aligned | bool | False | Handle 64-bit aligned access |

| Attribute | Type | Description |
|---|---|---|
| port | PCIeMasterPort | Crossbar master port |
| wb | Wishbone interface | Wishbone slave bus (with err feature) |
IDLE → ISSUE-WRITE → IDLE (FPGA writes to host memory)
IDLE → ISSUE-READ → RECEIVE-READ-COMPLETION → IDLE (FPGA reads from host memory)
Includes a WaitTimer(2**16) for timeout/error handling.
AXI4 slave interface for PCIe DMA. Converts AXI4 read/write transactions into PCIe DMA operations without scatter-gather tables.
| Parameter | Type | Default | Description |
|---|---|---|---|
| endpoint | PCIeEndpoint | — | PCIe endpoint |
| data_width | int | 32 | AXI data width |
| address_width | int | 32 | AXI address width |
| id_width | int | 1 | AXI ID width |

| Channel | Signals |
|---|---|
| Write Address (AW) | aw_valid, aw_ready, aw_addr, aw_len, aw_id |
| Write Data (W) | w_valid, w_ready, w_data, w_last |
| Write Response (B) | b_valid, b_ready, b_id, b_resp |
| Read Address (AR) | ar_valid, ar_ready, ar_addr, ar_len, ar_id |
| Read Data (R) | r_valid, r_ready, r_data, r_last, r_id, r_resp |
Simulation-only PCIe PHY. Implements the same interface as a real PHY but without vendor IP. TX data is optionally looped back to RX through a FIFO.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | 64 | Data bus width in bits |
| bar0_size | int | 1 * MB | BAR0 region size in bytes |
| max_request_size | int | 512 | Simulated max request size |
| max_payload_size | int | 128 | Simulated max payload size |
| with_loopback | bool | True | Loop TX back to RX |

| Attribute | Type | Description |
|---|---|---|
| data_width | int | Data bus width |
| endianness | str | Always "big" |
| id | Signal(16) | PCIe device ID (init=0x0001) |
| bar0_size | int | BAR0 size |
| bar0_mask | int | BAR0 mask |
| max_request_size | Signal(16) | Max request size |
| max_payload_size | Signal(16) | Max payload size |
| sink | stream | TX stream (core → "host") |
| source | stream | RX stream ("host" → core) |
| msi | stream | MSI interrupt stream |
| link_up | Signal | Link-up status (always 1) |
Xilinx 7-Series PCIe PHY wrapper. Wraps the Xilinx PCIe hard IP block with proper datapath handling (CDC, width conversion, endianness).
| Parameter | Type | Default | Description |
|---|---|---|---|
| nlanes | int | 1 | Number of PCIe lanes (1, 2, 4, or 8) |
| data_width | int | 64 | Core-side data width (64 or 128) |
| pcie_data_width | int or None | None | PHY-side data width (defaults to data_width) |
| bar0_size | int | 1 * MB | BAR0 size |
| clock_domain | str | "sync" | Core clock domain |

| Attribute | Type | Description |
|---|---|---|
| data_width | int | Core-side data width |
| endianness | str | Always "big" |
| shared_channel | bool | Always True |
| config | PCIeConfig | Standardized config interface |
| link_up | Signal | Link-up status |
| ltssm_state | Signal(6) | LTSSM state |
| sink | stream | TX stream (core → PCIe) |
| source | stream | RX stream (PCIe → core) |
| msi | stream | MSI interrupt stream |

The Xilinx 7-Series PCIe IP generates its own clock (user_clk_out), which drives the pcie clock domain. CDC (clock domain crossing) is required between the pcie domain and the user's sync domain. The PHYTXDatapath and PHYRXDatapath handle this automatically.
Gowin GW5AST-138 PCIe PHY wrapper. GW5ASTPCIePHY wraps the Gowin SerDes_Top hard IP block for use with the amaranth-pcie stack. It supports:
- Lane widths: X1, X2, X4
- Link speeds: Gen2 (5.0 GT/s), Gen3 (8.0 GT/s)
- TLP clock frequencies: 100 MHz, 125 MHz, 150 MHz
- Up to 6 BARs with configurable sizes (16B to 1MB+), 32-bit or 64-bit, prefetchable
- MSI interrupts (optional, with extended ack/status/msinum interface)
- Max payload sizes: 128B to 4096B
- Configurable Vendor/Device/Class IDs
- DRP config space access (runtime read/write of PCIe configuration registers)
- TX flow control credits (extended format with header/data counts)
- Hardware BAR decode (bar_hit field in RX stream)
- Built-in LTSSM tracer for link training debug
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_width | int | 64 | User-facing data width (64, 128, or 256). The Gowin IP always uses 256-bit internally; width conversion is inserted automatically when data_width < 256. |
| gowin_config | GowinPCIeConfig | — | IP generation configuration (device, lanes, BARs, vendor/device ID, etc.). See GowinPCIeConfig below. |
| gowin_path | str | — | Path to the Gowin IDE installation directory (e.g. "/opt/gowin/IDE"). Required for IP generation. |
| ip_output_dir | str or None | None | Optional override for IP output directory. If None, uses a cache directory under ~/.cache/amaranth-pcie/gowin/<config_hash>. |
| clock_domain | str | "sync" | Clock domain name for the TLP clock. The Gowin IP uses a user-provided TLP clock (unlike Xilinx which generates its own). |

| Attribute | Type | Description |
|---|---|---|
| data_width | int | User-facing data width |
| endianness | str | Always "big" |
| shared_channel | bool | Always False (separate request/completion paths) |
| ltssm_width | int | Always 5 |
| config | PCIeConfig | Standardized config interface (bus/device/function numbers, max payload/request sizes) |
| bar0_size | int | BAR0 region size in bytes (from gowin_config.bars[0]) |
| bar0_mask | int | BAR0 address mask |
| link_up | Signal | Link-up status |
| ltssm | Signal(5) | 5-bit LTSSM state |
| source | stream | RX stream (PCIe → core), with bar_hit field |
| sink | stream | TX stream (core → PCIe) |
| msi | stream | Basic MSI stream (valid/ready/dat) |
| msi_ext | PCIeMSIInterface | Extended MSI interface (ack/status/msinum) |
| credits | PCIeCreditInterface | TX credit availability (extended mode with counts) |
| config_access | PCIeConfigAccess | DRP configuration space read/write port |
| ltssm_tracer | LTSSMTracer | Built-in LTSSM state transition tracer |
| perst_n | Signal | PERST# input (active low). Connect to board's PERST# pad, or leave unconnected for auto-start (tied high). |
| nlanes | int | Number of PCIe lanes |
| msi_enable | bool | Whether MSI is enabled in the IP configuration |

Unlike Xilinx PHYs which generate their own clock domain, the Gowin GW5AST PCIe IP uses a user-provided TLP clock. The clock is supplied via the pcie_tl_clk_i input of the SerDes_Top instance, driven by ClockSignal(clock_domain).
Key implication: No CDC is needed. Since the IP runs in the user's clock domain (typically sync), there is no clock domain crossing between the PHY and the rest of the stack. This simplifies the design and reduces latency compared to Xilinx PHYs.
Xilinx: IP generates pcie_clk → CDC needed between pcie and sync domains
Gowin: User provides sync_clk → IP runs in sync domain → no CDC needed
When data_width < 256, the PHYTXDatapath and PHYRXDatapath are used for width conversion only (no CDC), since the clock domain is the same on both sides.
The PHY implements a two-phase reset sequence controlled by the perst_n signal:
1. PERST# debounce — A 26-bit counter at 100 MHz (≈670 ms) debounces the PERST# input. The counter resets to zero whenever PERST# is asserted (low).
2. Delayed start — After PERST# deasserts and the debounce counter saturates, a 20-bit delay counter (≈10 ms) runs before releasing the IP's pcie_rstn signal.
PERST# asserted (low) → debounce_cnt = 0, pcie_rstn = 0
PERST# deasserted (high) → debounce_cnt counts up
debounce_cnt saturates → delay_cnt counts up
delay_cnt saturates → pcie_rstn = 1 (IP starts link training)
If perst_n is left unconnected, it defaults to high (auto-start mode).
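
The quoted durations follow directly from the counter widths and the 100 MHz clock. The short calculation below reproduces them; the function and constant names are illustrative only.

```python
CLK_HZ = 100_000_000  # 100 MHz, the clock rate quoted for the counters


def saturation_time(counter_bits: int, clk_hz: int = CLK_HZ) -> float:
    """Seconds for a free-running counter of counter_bits bits to count 2**bits cycles."""
    return (2 ** counter_bits) / clk_hz


debounce_s = saturation_time(26)   # PERST# debounce counter
delay_s = saturation_time(20)      # delayed-start counter
print(f"debounce: {debounce_s * 1e3:.1f} ms, delay: {delay_s * 1e3:.2f} ms")
```

If the counters were clocked at a different rate, both durations would scale inversely with the clock frequency under the same assumption of fixed counter widths.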
The Gowin PCIe IP always uses a 256-bit internal TLP bus. When the user requests a narrower data_width (64 or 128), the PHY automatically inserts width conversion datapaths:
TX path: user sink (64/128-bit) → PHYTXDatapath → StrideConverter → adapter sink (256-bit) → IP
RX path: IP → adapter source (256-bit) → PHYRXDatapath → StrideConverter → user source (64/128-bit)
The bar_hit signal is not carried through the width converter. Instead, it is latched from the adapter on SOP (start of packet) and held for the duration of the packet.
When data_width == 256, the adapter connects directly to the user-facing streams with no conversion.
The Gowin IP provides TX credit information via three 32-bit registers (creditsp, creditsnp, creditscpl):
| Bit Range | Field | Description |
|---|---|---|
| [31] | available | Credits available (boolean) |
| [15:8] | header_count | Number of header credits |
| [11:0] | data_count | Number of data credits |

These are decoded into the PCIeCreditInterface (extended mode):

| Credit Channel | Available Signal | Header Count | Data Count |
|---|---|---|---|
| Posted | credits.posted_header_available | credits.posted_header_count | credits.posted_data_count |
| Non-Posted | credits.non_posted_header_available | credits.non_posted_header_count | credits.non_posted_data_count |
| Completion | credits.completion_header_available | credits.completion_header_count | credits.completion_data_count |

The TLPPacketizer automatically gates TX based on credit availability when phy.has_credits() returns True.
When MSI is enabled in the IP configuration (gowin_config.msi_enable=True), the PHY provides two MSI interfaces:
Basic MSI (phy.msi) — Compatible with the standard amaranth-pcie MSI flow:
phy.msi.valid # Assert to request MSI
phy.msi.ready # Asserted when MSI is accepted (driven by IP's ack)
phy.msi.payload.dat # 8-bit vector number

Extended MSI (phy.msi_ext) — Gowin-specific with additional status:

phy.msi_ext.valid # MSI request
phy.msi_ext.ack # MSI acknowledge from IP
phy.msi_ext.status # Signal(3) — MSI status
phy.msi_ext.msinum # Signal(5) — MSI vector number (up to 32 vectors)

When MSI is disabled, phy.msi.ready is tied to 0.
The PHY exposes a PCIeConfigAccess port for runtime read/write access to the PCIe configuration space via the Gowin DRP (Dynamic Reconfiguration Port):
# Read a config register
m.d.comb += [
phy.config_access.read_en.eq(1),
phy.config_access.read_addr.eq(0x000), # 12-bit DWORD-aligned address
]
# phy.config_access.read_data is valid when phy.config_access.read_valid is asserted
# Write a config register
m.d.comb += [
phy.config_access.write_en.eq(1),
phy.config_access.write_addr.eq(0x004),
phy.config_access.write_data.eq(0xDEADBEEF),
phy.config_access.write_be.eq(0xF), # 4-bit byte enables
]

Check capability at construction time: phy.has_config_access() returns True.
BARs are configured via the BARConfig dataclass in the GowinPCIeConfig:
from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig as GowinPCIeConfig, BARConfig
# Example 1: Single 1MB 64-bit BAR
config = GowinPCIeConfig(
bars=[
BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True), # BAR0: 1MB, 64-bit
BARConfig(), # BAR1: consumed by BAR0
BARConfig(), BARConfig(), BARConfig(), BARConfig(), # BAR2-5: disabled
],
)
# Example 2: Three 32-bit BARs
config = GowinPCIeConfig(
bars=[
BARConfig(enabled=True, size_bytes=1024), # BAR0: 1KB
BARConfig(enabled=True, size_bytes=2048), # BAR1: 2KB
BARConfig(enabled=True, size_bytes=4096, is_prefetchable=True), # BAR2: 4KB, prefetchable
BARConfig(), BARConfig(), BARConfig(), # BAR3-5: disabled
],
)
# Example 3: Mixed 64-bit and 32-bit BARs
config = GowinPCIeConfig(
bars=[
BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True), # BAR0-1: 1MB, 64-bit
BARConfig(), # BAR1: consumed
BARConfig(enabled=True, size_bytes=64*1024), # BAR2: 64KB, 32-bit
BARConfig(), BARConfig(), BARConfig(), # BAR3-5: disabled
],
)

BAR size constraints: Must be a power of 2, minimum 16 bytes.
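
The size constraint is easy to check before building a configuration. The helper below is a hypothetical pre-check, not part of the library, and encodes only the two rules stated above.

```python
def is_valid_bar_size(size_bytes: int) -> bool:
    """True when size_bytes is a power of two and at least 16 bytes."""
    return size_bytes >= 16 and (size_bytes & (size_bytes - 1)) == 0


for size in (8, 16, 3000, 64 * 1024, 1024 * 1024):
    print(size, is_valid_bar_size(size))
```

Of the sizes listed, 8 fails the minimum and 3000 is not a power of two; the other three pass.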
from amaranth import *
from amaranth_pcie.phy.gowin import GW5ASTPCIePHY
from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig as GowinPCIeConfig, BARConfig
from amaranth_pcie.core.endpoint import PCIeEndpoint
from amaranth_pcie.frontend.wishbone import PCIeWishboneMaster
from amaranth_pcie.frontend.dma import PCIeDMA
class GowinPCIeDesign(Elaboratable):
    def elaborate(self, platform):
        m = Module()
        # 1. Configure the Gowin PCIe IP
        gowin_config = GowinPCIeConfig(
            lane_width="X4",
            gen="gen3",
            vendor_id="1234",
            device_id="5678",
            class_code="0580", # Memory controller
            msi_enable=True,
            max_payload=1024,
            tl_clk_freq=100,
            bars=[
                BARConfig(enabled=True, size_bytes=1024*1024, is_64bit=True),
                BARConfig(),
                BARConfig(enabled=True, size_bytes=64*1024),
                BARConfig(), BARConfig(), BARConfig(),
            ],
        )
        # 2. Create the PHY
        m.submodules.phy = phy = GW5ASTPCIePHY(
            data_width=64,
            gowin_config=gowin_config,
            gowin_path="/opt/gowin/IDE",
            clock_domain="sync",
        )
        # 3. Connect PERST# (optional, defaults to auto-start)
        # phy.perst_n can be connected to a board PERST# pad
        # 4. Create endpoint
        endpoint = PCIeEndpoint(phy)
        m.submodules.endpoint = endpoint
        # 5. Add Wishbone bridge for host CSR access
        m.submodules.wb = wb = PCIeWishboneMaster(endpoint)
        # 6. Add DMA
        m.submodules.dma = dma = PCIeDMA(
            endpoint,
            with_loopback=True,
            with_buffering=True,
            buffering_depth=4096,
        )
        # 7. Connect MSI interrupt from DMA
        m.d.comb += phy.msi.valid.eq(dma.irq)
        return m

| Class | Module | Description |
|---|---|---|
| GW5ASTPCIePHY | gowin.py | Main PHY class (inherits PCIePHY) |
| GowinTLPAdapter | gowin.py | Bridge between Gowin SOP/EOP/valid[7:0] and amaranth stream interface |
| GowinPCIeConfig | gowin_pcie_gen.py | IP generation configuration dataclass |
| BARConfig | gowin_pcie_gen.py | Per-BAR configuration dataclass |
| GowinPCIeGenerator | gowin_pcie_gen.py | IP generation pipeline orchestrator |

Bridge between the Gowin PCIe IP's 256-bit TLP interface (SOP/EOP/valid[7:0]/data[255:0]) and the amaranth-pcie stream interface (valid/ready/first/last with dat/be payload).
The Gowin IP uses big-endian DWORD ordering: DWORD7 at bits [255:224] is the first DWORD on the wire. The adapter reverses the DWORD order so that the first wire DWORD appears at bits [31:0], matching the convention used by the rest of the stack.
| IP Signal | Width | Stream Mapping |
|---|---|---|
| rx_sop | 1 | source.first |
| rx_eop | 1 | source.last |
| rx_data | 256 | source.payload.dat (DWORD-swapped) |
| rx_valid | 8 | source.payload.be (each bit → 4 byte enables, reversed) |
| rx_bardec | 6 | source.payload.bar_hit |
| rx_wait | 1 | ~source.ready (backpressure) |
| Stream Signal | IP Mapping |
|---|---|
| sink.first & sink.valid | tx_sop |
| sink.last & sink.valid | tx_eop |
| sink.payload.dat | tx_data (DWORD-swapped) |
| sink.payload.be | tx_valid (4 bytes → 1 bit, reversed) |
| ~tx_wait | sink.ready (backpressure) |
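The RX-side reordering described above can be modelled with plain integers. This is a behavioural sketch, not the adapter's HDL: `swap_dwords` and `valid_to_be` are hypothetical names, and the exact pairing of each rx_valid bit with a DWORD position is an assumption inferred from the "reversed" note in the table.

```python
DWORDS = 8  # a 256-bit bus carries eight 32-bit DWORDs


def swap_dwords(data: int) -> int:
    """Reverse the order of the eight DWORDs in a 256-bit word."""
    out = 0
    for i in range(DWORDS):
        dword = (data >> (32 * i)) & 0xFFFF_FFFF
        out |= dword << (32 * (DWORDS - 1 - i))
    return out


def valid_to_be(valid: int) -> int:
    """Expand an 8-bit per-DWORD valid mask into 32 byte enables, reversing DWORD order."""
    be = 0
    for i in range(DWORDS):
        if (valid >> i) & 1:
            # Valid bit i covers DWORD i on the IP side, which lands at
            # DWORD (7 - i) on the stream side after the swap.
            be |= 0xF << (4 * (DWORDS - 1 - i))
    return be


if __name__ == "__main__":
    first_on_wire = 0xAABBCCDD << 224  # DWORD7 on the IP side
    print(hex(swap_dwords(first_on_wire)))
    print(hex(valid_to_be(0x80)))
```

Because the swap is its own inverse, the same function models the TX direction as well.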
Dataclass containing all parameters needed to generate a Gowin PCIe IP core.
| Field | Type | Default | Description |
|---|---|---|---|
| gowin_dir | str | "" | Path to Gowin IDE installation |
| output_dir | str | "" | Output directory for generated files |
| device | str | "GW5AST-138" | Target device |
| device_version | str | "C" | Device version |
| package | str | "FCPBGA676A" | Package type |
| part_number | str | "GW5AST-LV138FPG676AC1/I0" | Full part number |
| device_id_short | str | "gw5ast138c-003" | Short device ID for project files |
| vendor_id | str | "22C2" | PCIe Vendor ID (hex) |
| device_id | str | "1100" | PCIe Device ID (hex) |
| class_code | str | "0580" | PCIe Class Code (hex, e.g. "0580" = Memory controller) |
| revision | str | "00" | PCIe Revision ID (hex) |
| lane_width | str | "X4" | Lane width: "X1", "X2", or "X4" |
| gen | str | "gen3" | PCIe generation: "gen2" or "gen3" |
| ref_clock | str | "100MHz" | Reference clock frequency |
| ref_clock_source | str | "Q0 REFCLK0" | Reference clock source |
| bars | list[BARConfig] | 3 enabled (1KB, 2KB, 2KB) | List of 6 BAR configurations |
| msi_enable | bool | True | Enable MSI capability |
| module_name | str | "SerDes_Top" | Top-level module name |
| file_name | str | "serdes" | Base file name for generated files |
| pcie_module_name | str | "PCIE_Controller_Top" | PCIe controller module name |
| max_payload | int | 1024 | Max payload size in bytes |
| tl_clk_freq | int | 100 | TLP clock frequency in MHz |
| Property | Returns | Description |
|---|---|---|
| series | str | Device series (e.g. "GW5AST") |
| device_name_with_version | str | Device with version (e.g. "GW5AST-138C") |
| lane_count | int | Number of lanes (1, 2, or 4) |
| base_class | str | First two hex digits of class_code |
| sub_class | str | Last two hex digits of class_code |
| is_gen3 | bool | Whether gen == "gen3" |
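To make the derived properties concrete, the following stand-alone sketch recomputes them from the default field values listed earlier. `ConfigSketch` is a hypothetical stand-in rather than the real configuration dataclass, and the string operations used here are an assumption about how such properties could be derived, not a copy of the library code.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in holding only the fields the properties depend on."""
    device: str = "GW5AST-138"
    device_version: str = "C"
    class_code: str = "0580"
    lane_width: str = "X4"
    gen: str = "gen3"

    @property
    def series(self) -> str:
        # Everything before the first dash, e.g. "GW5AST".
        return self.device.split("-")[0]

    @property
    def device_name_with_version(self) -> str:
        return f"{self.device}{self.device_version}"

    @property
    def lane_count(self) -> int:
        # "X4" -> 4
        return int(self.lane_width[1:])

    @property
    def base_class(self) -> str:
        return self.class_code[:2]

    @property
    def sub_class(self) -> str:
        return self.class_code[-2:]

    @property
    def is_gen3(self) -> bool:
        return self.gen == "gen3"


if __name__ == "__main__":
    cfg = ConfigSketch()
    print(cfg.series, cfg.device_name_with_version, cfg.lane_count)
    print(cfg.base_class, cfg.sub_class, cfg.is_gen3)
```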
Dataclass for a single PCIe Base Address Register configuration.
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | True | Whether this BAR is enabled |
| size_bytes | int | 1024 | BAR size in bytes (must be power of 2, minimum 16) |
| is_64bit | bool | False | Whether this is a 64-bit BAR (consumes the next BAR slot) |
| is_prefetchable | bool | False | Whether this BAR is prefetchable |
Programmatically generates Gowin PCIe IP cores without the GUI, replicating the exact 4-step pipeline that the Gowin IDE performs.
| Step | Tool | Description |
|---|---|---|
| 1 | GowinSynthesis | Synthesize PCIe controller from encrypted Verilog |
| 2 | GowinSynthesis | Synthesize UPAR arbiter from Verilog sources |
| 3 | GowinModGen | Generate SerDes top-level wrapper (serdes.v) |
| 4 | serdes_toml_to_csr | Convert TOML config to CSR register writes (serdes.csr) |
| File | Description |
|---|---|
| src/serdes/serdes.v | SerDes top-level wrapper (Verilog) |
| src/serdes/serdes.csr | CSR register configuration |
| src/serdes/pcie_controller/pcie_controller.v | Synthesized PCIe controller netlist |
| src/serdes/upar_arbiter/upar_arbiter.v | Synthesized UPAR arbiter netlist |
| src/serdes/serdes.ipc | IP configuration file |
| src/serdes/serdes.mod | Module definition file |
| src/serdes/serdes_tmp.toml | SerDes TOML configuration |
Generated files are cached based on a SHA-256 hash of the configuration parameters. The cache directory is ~/.cache/amaranth-pcie/gowin/<hash>. Regeneration only occurs when parameters change.
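The caching scheme can be illustrated with the standard library. This is a sketch under assumptions: `cache_dir_for` is a hypothetical helper, and serialising the parameters as sorted-key JSON before hashing is one plausible choice, not necessarily the one the generator uses.

```python
import hashlib
import json
from pathlib import Path

CACHE_ROOT = Path("~/.cache/amaranth-pcie/gowin")


def cache_dir_for(params: dict) -> Path:
    """Map a parameter dictionary to a cache directory named after its SHA-256 hash."""
    # Sorting keys makes the hash independent of dictionary insertion order.
    serialized = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()
    return CACHE_ROOT / digest


if __name__ == "__main__":
    path = cache_dir_for({"vendor_id": "AABB", "device_id": "1234", "gen": "gen3"})
    # expanduser() resolves "~" when the directory is actually accessed.
    print(path.expanduser())
```

Any change to a parameter value yields a different digest, so a modified configuration never reuses stale output.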
from amaranth_pcie.phy.gowin_pcie_gen import PCIeConfig, GowinPCIeGenerator

config = PCIeConfig(
    gowin_dir="/opt/gowin/IDE",
    output_dir="./my_pcie_project",
    vendor_id="AABB",
    device_id="1234",
    lane_width="X4",
    gen="gen3",
    msi_enable=True,
    max_payload=1024,
    tl_clk_freq=100,
)
gen = GowinPCIeGenerator(config)
gen.generate()

The generator can also be run as a standalone command-line tool:
python -m amaranth_pcie.phy.gowin_pcie_gen \
    --gowin-dir /opt/gowin/IDE \
    --output-dir ./output \
    --vendor-id AABB \
    --device-id 1234 \
    --lanes X4 \
    --gen gen3 \
    --bars 1048576,0,65536 \
    --max-payload 1024 \
    --tl-clk-freq 100 \
    -v

| CLI Option | Default | Description |
|---|---|---|
| --gowin-dir | (required) | Path to Gowin IDE installation |
| --output-dir | (required) | Output directory |
| --device | GW5AST-138 | Target device |
| --device-version | C | Device version |
| --package | FCPBGA676A | Package |
| --part-number | GW5AST-LV138FPG676AC1/I0 | Full part number |
| --vendor-id | 22C2 | PCIe Vendor ID (hex) |
| --device-id | 1100 | PCIe Device ID (hex) |
| --class-code | 0580 | PCIe Class Code (hex) |
| --revision | 00 | Revision ID (hex) |
| --lanes | X4 | Lane width (X1, X2, X4) |
| --gen | gen3 | PCIe generation (gen2, gen3) |
| --ref-clock | 100MHz | Reference clock frequency |
| --bars | 1024,2048,2048 | Comma-separated BAR sizes (0=disabled) |
| --no-msi | — | Disable MSI capability |
| --max-payload | 1024 | Max payload size |
| --tl-clk-freq | 100 | TLP clock frequency (MHz) |
| -v / --verbose | — | Enable verbose logging |
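The --bars format (comma-separated sizes, 0 meaning disabled) can be parsed as sketched below. `parse_bars` is a hypothetical function written for illustration; padding the list to six slots and rejecting invalid sizes are assumptions about reasonable behaviour, not a description of the actual CLI code.

```python
def parse_bars(arg: str, num_slots: int = 6) -> list[int]:
    """Parse a --bars style string into a list of num_slots sizes (0 = disabled)."""
    sizes = [int(field) for field in arg.split(",")]
    if len(sizes) > num_slots:
        raise ValueError(f"at most {num_slots} BARs are supported, got {len(sizes)}")
    for size in sizes:
        # Enabled BARs must be a power of two and at least 16 bytes.
        if size != 0 and (size < 16 or size & (size - 1)):
            raise ValueError(f"invalid BAR size: {size}")
    # Slots that are not listed are treated as disabled.
    return sizes + [0] * (num_slots - len(sizes))


if __name__ == "__main__":
    print(parse_bars("1048576,0,65536"))
```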
Common datapath modules for CDC, width conversion, and pipeline buffering between core and PHY clock domains.
Pipeline: sink → PipeValid → StreamCDC(sync→pcie) → StrideConverter → PipeReady → source
| Parameter | Type | Description |
|---|---|---|
| phy_data_width | int | PHY-side data width |
| core_data_width | int | Core-side data width |
| clock_domain | str | Core clock domain (default "sync") |
Pipeline: sink → [Aligner] → PipeReady → StrideConverter → CDC(pcie→sync) → PipeValid → source
| Parameter | Type | Description |
|---|---|---|
| phy_data_width | int | PHY-side data width |
| core_data_width | int | Core-side data width |
| clock_domain | str | Core clock domain (default "sync") |
| with_aligner | bool | Include 128-bit aligner |
Debug module tracking LTSSM state transitions via a synchronous FIFO. Records state changes for software to trace link training.
| Parameter | Type | Description |
|---|---|---|
| ltssm | Signal or None | LTSSM state signal (in sync domain) |
| ltssm_width | int | Width of the LTSSM state signal (default 6; Gowin uses 5) |
| Port | Direction | Width | Description |
|---|---|---|---|
| history_new | Out | ltssm_width | New LTSSM state |
| history_old | Out | ltssm_width | Previous LTSSM state |
| history_ovfl | Out | 1 | Overflow flag |
| history_valid | Out | 1 | FIFO output valid |
| history_re | In | 1 | Read-enable to pop entry |
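The tracer's behaviour can be approximated in software as a bounded queue of (new, old) state pairs. The model below is a sketch, not the hardware: `LTSSMTracerModel` is a hypothetical class, and the choices of a sticky overflow flag, dropping entries when the FIFO is full, and an initial state of 0 are all assumptions.

```python
from collections import deque


class LTSSMTracerModel:
    """Software model: record each LTSSM state change as a (new, old) pair."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.fifo = deque()
        self.prev = 0          # assumed reset state
        self.overflow = False  # models history_ovfl (assumed sticky)

    def sample(self, state: int) -> None:
        """Call once per clock cycle with the current LTSSM state."""
        if state != self.prev:
            if len(self.fifo) < self.depth:
                self.fifo.append((state, self.prev))
            else:
                self.overflow = True  # entry is dropped
            self.prev = state

    @property
    def valid(self) -> bool:
        """Models history_valid: True while an entry is waiting."""
        return bool(self.fifo)

    def pop(self) -> tuple[int, int]:
        """Models a history_re pulse: return (history_new, history_old)."""
        return self.fifo.popleft()


if __name__ == "__main__":
    tracer = LTSSMTracerModel(depth=2)
    for state in [0, 0, 1, 1, 2, 3]:
        tracer.sample(state)
    while tracer.valid:
        print(tracer.pop())
    print("overflow:", tracer.overflow)
```

Software would typically drain the history by polling the valid flag and pulsing the read-enable until the FIFO is empty.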
The PHY layer provides a set of abstract interfaces and base classes that enable writing vendor-independent PCIe endpoint logic. These were introduced as 12 improvement proposals (A1–H1) to support PHYs beyond Xilinx 7-Series.
All PHY implementations inherit from PCIePHY, which defines the formal contract:
from amaranth_pcie.phy.common import PCIePHY

class MyVendorPHY(PCIePHY):
    endianness = "big"      # or "little" or "native"
    shared_channel = False  # True if TX/RX share one stream pair

    def __init__(self, ...):
        self.data_width = 64
        self.bar0_size = 1 * MB
        self.config = PCIeConfig()  # standardized config interface
        self.link_up = Signal()
        # Create sink/source/msi streams ...

    def elaborate(self, platform):
        ...

Capability query helpers avoid hasattr() checks:

| Method | Returns True when… |
|---|---|
| phy.has_bar_hit() | PHY provides hardware BAR decode (bar_hit attribute) |
| phy.has_credits() | PHY provides credit-based flow control (credits attribute) |
| phy.has_config_access() | PHY provides config space read/write (config_access attribute) |
Capability support by PHY:
| Capability | SimPCIePHY | S7PCIEPHY | GW5ASTPCIePHY |
|---|---|---|---|
| has_bar_hit() | ✗ | ✗ | ✓ |
| has_credits() | ✗ | ✗ | ✓ |
| has_config_access() | ✗ | ✗ | ✓ |
Legacy property accessors (phy.id, phy.max_request_size, phy.max_payload_size) delegate to phy.config for backward compatibility.
PCIeConfig provides runtime-negotiated configuration values as Amaranth Signals:
| Attribute | Width | Description |
|---|---|---|
| bus_number | 8 | PCI bus number |
| device_number | 5 | PCI device number |
| function_number | 3 | PCI function number |
| max_payload_size | 16 | Negotiated max payload size (bytes) |
| max_request_size | 16 | Negotiated max request size (bytes) |
| command | 16 | PCI command register |
| status | 16 | PCI status register |
| id | 16 | Composed BDF: Cat(function, device, bus) |
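Because Amaranth's Cat() places its first argument in the least significant bits, the composed id has the function number in bits [2:0], the device number in bits [7:3], and the bus number in bits [15:8]. The integer equivalent is sketched below; `compose_bdf` is a hypothetical helper for illustration, not a library function.

```python
def compose_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a 16-bit ID, matching Cat(function, device, bus)."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device, or function out of range")
    return (bus << 8) | (device << 3) | function


if __name__ == "__main__":
    # Bus 1, device 0, function 0
    print(hex(compose_bdf(1, 0, 0)))
```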
Signal vs. int convention (C2): phy.config.max_request_size is a Signal for hardware use. phy.max_request_size_bytes is a Python int for construction-time arithmetic (FIFO depths, etc.).
PCIeCreditInterface tracks TX credit availability:
from amaranth_pcie.phy.common import PCIeCreditInterface

# Basic mode — boolean availability flags
credits = PCIeCreditInterface()
credits.posted_header_available     # Signal
credits.completion_data_available   # Signal
# ... 6 channels total

# Extended mode — adds per-channel credit counts
credits = PCIeCreditInterface(extended=True)
credits.posted_header_count  # Signal(8)
credits.posted_data_count    # Signal(12)

When a PHY sets self.credits = PCIeCreditInterface(), the TLPPacketizer automatically gates TX based on credit availability.
PHYs with credit support: GW5ASTPCIePHY (extended mode with header/data counts).
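The gating idea can be shown with a small software model in which a TLP may be sent only when its header credit, and its data credit if it carries a payload, are available. This is a sketch of the general PCIe flow-control rule, not the packetizer's implementation: `CreditFlags` and `can_send` are hypothetical, and only posted_header_available and completion_data_available are names taken from the interface above; the other four flag names are assumed by analogy.

```python
from dataclasses import dataclass


@dataclass
class CreditFlags:
    """Boolean availability for the six credit channels (names partly assumed)."""
    posted_header_available: bool = True
    posted_data_available: bool = True
    non_posted_header_available: bool = True
    non_posted_data_available: bool = True
    completion_header_available: bool = True
    completion_data_available: bool = True


def can_send(kind: str, has_data: bool, credits: CreditFlags) -> bool:
    """Return True if a TLP of the given kind may be transmitted now."""
    if kind not in ("posted", "non_posted", "completion"):
        raise ValueError(f"unknown TLP kind: {kind}")
    ok = getattr(credits, f"{kind}_header_available")
    if has_data:
        # TLPs with a payload also need a data credit.
        ok = ok and getattr(credits, f"{kind}_data_available")
    return ok


if __name__ == "__main__":
    flags = CreditFlags(posted_data_available=False)
    print(can_send("posted", has_data=True, credits=flags))
    print(can_send("non_posted", has_data=False, credits=flags))
```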
PCIeMSIInterface abstracts the MSI handshake:
from amaranth_pcie.phy.common import PCIeMSIInterface

# Basic mode (Xilinx-compatible)
msi = PCIeMSIInterface(extended=False)
msi.valid   # MSI request valid
msi.ready   # MSI request accepted
msi.dat     # Signal(8) — vector number

# Extended mode (Gowin-compatible)
msi = PCIeMSIInterface(extended=True)
msi.ack     # MSI acknowledge
msi.status  # Signal(3) — MSI status
msi.msinum  # Signal(5) — MSI vector number

PHYs with extended MSI: GW5ASTPCIePHY (when msi_enable=True).
PCIeConfigAccess provides register-level read/write access to the PCIe configuration space:
| Signal | Width | Description |
|---|---|---|
| read_en | 1 | Read enable |
| read_addr | 12 | Read address (DWORD-aligned) |
| read_data | 32 | Read data |
| read_valid | 1 | Read data valid |
| write_en | 1 | Write enable |
| write_addr | 12 | Write address |
| write_data | 32 | Write data |
| write_be | 4 | Write byte enables |
| write_done | 1 | Write completion |
PHYs with config access: GW5ASTPCIePHY (via Gowin DRP interface).
PCIeResource captures physical PCIe slot parameters:
from amaranth_pcie.phy.common import PCIeResource

res = PCIeResource(0,
    lanes=4,
    refclk_freq=100e6,
    refclk=platform.request("pcie_refclk"),
    perst=platform.request("pcie_perst"),
    extras={"speed": "gen2"},
)

All stream signatures, PHY models, and TLP modules support 256-bit data widths:

phy = SimPCIePHY(data_width=256)
depkt = TLPDepacketizer(data_width=256, endianness="big")
pkt = TLPPacketizer(data_width=256, endianness="big")

The Gowin GW5AST PCIe IP natively uses 256-bit data, making it the primary use case for this width.
PHYs that deliver data in the host's native byte order can declare endianness = "native", which skips the DWORD byte-swap in the packetizer/depacketizer:
class NativeOrderPHY(PCIePHY):
    endianness = "native"  # no swap needed

PHYs that provide hardware BAR decode can include a bar_hit field in the PHY stream:

sig = phy_signature(64, with_bar_hit=True)
# sig.payload has: dat(64), be(8), bar_hit(6)

PHYs with BAR hit: GW5ASTPCIePHY provides bar_hit on the RX stream, decoded from the Gowin IP's rx_bardec output.
LTSSMTracer supports configurable LTSSM state signal widths (5, 6, or 8 bits) via the ltssm_width parameter.
| PHY | LTSSM Width |
|---|---|
| SimPCIePHY | 6 (default) |
| S7PCIEPHY | 6 |
| GW5ASTPCIePHY | 5 |
- Subclass PCIePHY and set class attributes:

      class GW5APCIEPHY(PCIePHY):
          endianness = "little"
          shared_channel = False
          ltssm_width = 5

- Create PCIeConfig in __init__:

      self.config = PCIeConfig()
      self.max_request_size_bytes = 256
      self.max_payload_size_bytes = 256

- Create streams using phy_signature(), msi_signature():

      self.sink = phy_signature(data_width).create(path=("sink",))
      self.source = phy_signature(data_width).create(path=("source",))
      self.msi = msi_signature().create(path=("msi",))

- Optionally add credit interface, config access, BAR hit:

      self.credits = PCIeCreditInterface(extended=True)
      self.config_access = PCIeConfigAccess()

- Implement elaborate() with vendor IP instantiation, clock domain creation, and datapath wiring.
See the examples/ directory:
| Example | Description |
|---|---|
| basic_endpoint.py | Minimal PCIe endpoint with Wishbone + DMA |
| dma_loopback.py | DMA loopback simulation test |
| wishbone_bridge.py | Wishbone bridge with FPGA registers |
# Run all tests
cd amaranth-pcie/
pdm run pytest tests/ -v

# Run specific tests
pdm run pytest tests/test_common.py -v
pdm run pytest tests/test_tlp_common.py -v
pdm run pytest tests/test_msi.py -v
pdm run pytest tests/test_endpoint_sim.py -v
pdm run pytest tests/test_gowin_phy.py -v

| Test file | Covers |
|---|---|
| tests/test_common.py | Stream signatures, layouts, BAR mask |
| tests/test_tlp_common.py | TLP headers, endianness swap |
| tests/test_msi.py | MSI/MSI-X controllers |
| tests/test_endpoint_sim.py | Full endpoint simulation |
| tests/test_phy_abstractions.py | Multi-vendor PHY abstractions (90 tests) |
| tests/test_gowin_phy.py | Gowin GW5AST PHY, TLP adapter, and IP generator |
| LitePCIe (Migen) | amaranth-pcie (Amaranth) | Notes |
|---|---|---|
| LitePCIePHY7Series | S7PCIEPHY | Xilinx 7-Series wrapper |
| — | GW5ASTPCIePHY | Gowin GW5AST-138 wrapper (new) |
| — | GowinPCIeGenerator | Gowin IP generation tool (new) |
| LitePCIeEndpoint | PCIeEndpoint | Top-level assembly |
| LitePCIeCrossbar | PCIeCrossbar | Routing fabric |
| LitePCIeTLPDepacketizer | TLPDepacketizer | PHY → typed streams |
| LitePCIeTLPPacketizer | TLPPacketizer | Typed streams → PHY |
| LitePCIeTLPController | TLPController | Tag management |
| LitePCIeMSI | PCIeMSI | Single-vector MSI |
| LitePCIeMSIMultiVector | PCIeMSIMultiVector | Multi-vector MSI |
| LitePCIeMSIX | PCIeMSIX | MSI-X |
| LitePCIeWishboneMaster | PCIeWishboneMaster | Host → FPGA Wishbone |
| LitePCIeWishboneSlave | PCIeWishboneSlave | FPGA → Host Wishbone |
| LitePCIeDMA | PCIeDMA | Full DMA with plugins |
| LitePCIeDMAScatterGather | DMAScatterGather | Moved to amaranth-lib |
| LitePCIeDMAReader | DMAReader | Moved to amaranth-lib |
| LitePCIeDMAWriter | DMAWriter | Moved to amaranth-lib |
| LitePCIeDMALoopback | DMALoopback | Moved to amaranth-lib |
| LitePCIeDMASynchronizer | DMASynchronizer | Moved to amaranth-lib |
| LitePCIeDMABuffering | DMABuffering | Moved to amaranth-lib |
Key architectural difference: The DMA engine is split into bus-agnostic cores (amaranth-lib) and PCIe-specific adapters (amaranth-pcie), enabling reuse with other bus protocols.