Use Metal ICBs for multi-draw indirect count #9679

Draft
matthargett wants to merge 22 commits into gfx-rs:trunk from rebeckerspecialties:metal-icb-multi-draw-indirect-count
matthargett commented Jun 15, 2026

Contributor

Connections

Stacked on #9640, which adds the Metal ICB infrastructure for fixed-count multi-draw indirect.

Related alternative: #9659 implements MULTI_DRAW_INDIRECT_COUNT by GPU-preprocessing into a temporary buffer followed by CPU-driven indirect draw loops. This PR instead keeps the command count GPU-resident and executes through Metal Indirect Command Buffers.

Description

This exposes Features::MULTI_DRAW_INDIRECT_COUNT on Metal when the adapter supports both render and compute Indirect Command Buffers, then routes non-indexed, indexed, and mesh count-buffer multi-draw calls through the existing ICB generation path.
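The gating rule above can be sketched as a small predicate. This is only an illustration: `IcbSupport` and `exposes_multi_draw_indirect_count` are hypothetical names, not identifiers from this PR or from wgpu-hal, and the real capability query may look different.

```rust
/// Hypothetical summary of what the adapter reports about
/// Indirect Command Buffer support (assumed for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct IcbSupport {
    /// Adapter can encode render commands into an ICB.
    render: bool,
    /// Adapter can encode compute commands into an ICB.
    compute: bool,
}

/// Returns true only when both render and compute ICBs are available,
/// which is the condition the description gives for exposing
/// MULTI_DRAW_INDIRECT_COUNT on Metal.
fn exposes_multi_draw_indirect_count(support: IcbSupport) -> bool {
    support.render && support.compute
}
```

With this shape, an adapter that reports only render ICB support would keep the feature hidden rather than taking a different path.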

Metal does not have a Vulkan-style draw_indirect_count entrypoint, but it can execute an MTLIndirectCommandBuffer with an indirect execution range. The implementation records up to max_count ICB commands on the GPU from the WebGPU indirect-argument buffer, runs a tiny compute kernel that clamps the guest-provided count buffer value to max_count, and then calls executeCommandsInBuffer:indirectBuffer:indirectBufferOffset:. No CPU readback of the count buffer is needed.
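For readers unfamiliar with the clamp step, here is a CPU-side model of what such a kernel might compute. It is a sketch, not the kernel from this PR: `ExecutionRange` mirrors the two `uint32_t` fields of Metal's `MTLIndirectCommandBufferExecutionRange`, `clamp_execution_range` is a hypothetical helper name, and starting the range at location 0 is an assumption.

```rust
/// Mirrors the layout of Metal's `MTLIndirectCommandBufferExecutionRange`
/// (two 32-bit fields: first command index and number of commands).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExecutionRange {
    location: u32,
    length: u32,
}

/// CPU model of the clamp: the value read from the count buffer is
/// limited to `max_count`, so the ICB never executes more commands
/// than were encoded. Assumes the range starts at command 0.
fn clamp_execution_range(count_from_buffer: u32, max_count: u32) -> ExecutionRange {
    ExecutionRange {
        location: 0,
        length: count_from_buffer.min(max_count),
    }
}
```

On the GPU the result would be written to the buffer that is later passed as the indirect range, which is why no CPU readback is required.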

This PR also makes the ICB resume path restore active render bind groups before executing the ICB. Without that, count draws with material/uniform bind groups can hit the ICB path but render with missing inherited buffer state after the render pass is suspended and resumed. The restore reuses the existing Metal bind-group update code and keeps the existing immediates/query/resume gates in place.
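The restore behaviour can be modelled roughly as follows. This is a simplified simulation under stated assumptions, not the code in this PR: `PassState`, its fields, and its methods are hypothetical, and bind groups are reduced to opaque `u64` ids.

```rust
use std::collections::BTreeMap;

/// Hypothetical tracker for bind groups set on a render pass.
#[derive(Default)]
struct PassState {
    /// Bind-group slot index -> opaque group id (latest value wins).
    active_bind_groups: BTreeMap<u32, u64>,
    /// What the simulated encoder currently has bound, in call order.
    encoder_bound: Vec<(u32, u64)>,
}

impl PassState {
    /// Records the bind group and binds it on the current encoder.
    fn set_bind_group(&mut self, index: u32, group: u64) {
        self.active_bind_groups.insert(index, group);
        self.encoder_bound.push((index, group));
    }

    /// Simulates suspending and resuming the pass: the fresh encoder
    /// starts with nothing bound, but the tracked state survives.
    fn suspend_and_resume(&mut self) {
        self.encoder_bound.clear();
    }

    /// Re-applies every tracked bind group, in slot order, so that a
    /// subsequent ICB execution sees the inherited state.
    fn restore_bind_groups(&mut self) {
        self.encoder_bound
            .extend(self.active_bind_groups.iter().map(|(&i, &g)| (i, g)));
    }
}
```

In this model, skipping `restore_bind_groups` after a resume leaves `encoder_bound` empty, which corresponds to the missing-state rendering described above.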

Testing

Local M4 Max / Metal validation only; I do not have the iPhone XS/A12 device with me for this follow-up.

  • cargo fmt --check
  • git diff --check
  • cargo check -p wgpu-hal --features metal --no-default-features
  • WGPU_BACKEND=metal WGPU_METAL_REQUIRE_ICB_MDI=1 cargo test -p wgpu-test --test wgpu-gpu multi_draw_indirect_count -- --test-threads=1 --nocapture, 3 passed; diagnostics showed non-indexed count and mesh count executing through Metal ICB.
  • WGPU_BACKEND=metal WGPU_METAL_REQUIRE_ICB_MDI=1 cargo test -p wgpu-test --test wgpu-gpu multi_draw_indexed_indirect_count -- --test-threads=1 --nocapture, 2 passed; diagnostics showed indexed count executing through Metal ICB.
  • WGPU_BACKEND=metal WGPU_METAL_REQUIRE_ICB_MDI=1 cargo test -p wgpu-test --test wgpu-gpu multi_draw_indirect_bind_group -- --test-threads=1 --nocapture, 1 passed; diagnostics showed fixed-count bind-group MDI now executing through Metal ICB after bind-group restore.
  • WGPU_BACKEND=metal cargo test -p wgpu-test --test wgpu-gpu draw_indirect -- --test-threads=1 --nocapture, 38 passed.
  • cargo xtask cts 'webgpu:api,validation,encoding,cmds,render,indirect_multi_draw:*', 6 / 6 passed.

Squash or Rebase?

Squash before merge. This is intentionally stacked on #9640; once #9640 lands, this should be rebased so the final diff is just the count-buffer support and tests.

Checklist

  • I self-reviewed and fully understand this PR.
  • WebGPU implementations built with wgpu may be affected behaviorally on Metal when they request MULTI_DRAW_INDIRECT_COUNT.
  • Validation and feature gates are in place to confine behavioral changes.
  • Tests demonstrate the validation and altered logic works.
  • CHANGELOG.md entries for the user-facing effects of this change are present.
  • The PR is minimal, and doesn't make sense to land as multiple PRs after #9640 (use Metal's Indirect Command Buffers for true GPU-side multi draw indirect).
  • Commits are logically scoped and individually reviewable.
  • The PR description has enough context to understand the motivation and solution implemented.

Bromles commented Jun 15, 2026


One small correction: #9659 doesn't use a CPU-side loop. It injects a compute shader to fill a temp buffer, then issues a single draw_indirect command to render from it.

Other than that, thank you. I have been looking for this feature for a long time.

I have a Mac Mini M1, a Mac Mini M4 Pro, an iPhone 13 Pro Max, and an iPad Pro 12.9" (6th gen) if you need help with testing.


3 participants