[UR][L0] Restrict USM residency to peers with enabled P2P access#21889
[UR][L0] Restrict USM residency to peers with enabled P2P access#21889ldorau wants to merge 2 commits intointel:syclfrom
Conversation
7e50ca5 to
d0b4788
Compare
There was a problem hiding this comment.
Pull request overview
Implements peer-access–driven memory residency management for the Level Zero v2 adapter, wiring ext_oneapi_enable_peer_access/disable through UR to update USM pool residency, and adjusting SYCL’s peer-access API to avoid cross-platform usage.
Changes:
- Add L0 v2 peer-access implementation that toggles per-device peer state and propagates residency updates to all tracked contexts.
- Extend USM pool/provider plumbing to support runtime resident-device changes and add pool-manager iteration helpers.
- Update SYCL peer-access enable/disable to validate platforms; add initial (currently placeholder) UR tests.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp | Adds multi-device residency tests for peer-access (currently placeholders). |
| unified-runtime/source/common/ur_pool_manager.hpp | Adds descriptor helpers and pool-manager iteration with descriptor access. |
| unified-runtime/source/common/backtrace_lin.cpp | Introduces a constant for max backtrace frames. |
| unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp | New L0 v2 implementation for peer access enable/disable/info and context propagation. |
| unified-runtime/source/adapters/level_zero/v2/usm.hpp | Exposes USM pool API to change resident devices. |
| unified-runtime/source/adapters/level_zero/v2/usm.cpp | Updates provider creation to use peer-enabled residency model and adds residency-change plumbing. |
| unified-runtime/source/adapters/level_zero/v2/memory.cpp | Switches P2P eligibility check to the new “enabled peers” model. |
| unified-runtime/source/adapters/level_zero/v2/context.hpp | Adds APIs to query enabled peer relationships and to propagate residency changes. |
| unified-runtime/source/adapters/level_zero/v2/context.cpp | Removes precomputed P2P tables; tracks contexts; adds peer-access query helpers and residency propagation. |
| unified-runtime/source/adapters/level_zero/usm_p2p.cpp | Updates v1 behavior to log enable/disable as ignored (always enabled). |
| unified-runtime/source/adapters/level_zero/platform.hpp | Updates platform comment to reflect v2 peer-access usage of tracked contexts. |
| unified-runtime/source/adapters/level_zero/platform.cpp | Initializes per-device peer tables based on L0 P2P capability/properties. |
| unified-runtime/source/adapters/level_zero/device.hpp | Adds peer-status table to devices and stream operators. |
| unified-runtime/source/adapters/level_zero/device.cpp | Implements stream operators for device id and peer status. |
| unified-runtime/source/adapters/level_zero/context.cpp | Minor comment adjustment around context tracking in v1. |
| unified-runtime/source/adapters/level_zero/CMakeLists.txt | Moves usm_p2p.cpp into the v2 adapter build. |
| sycl/source/device.cpp | Adds same-platform validation for enable/disable peer access calls. |
| .github/copilot-instructions.md | Expands repository instructions/documentation for Copilot usage. |
d0b4788 to
24eaab3
Compare
174e3f1 to
e2e57bf
Compare
e2e57bf to
8346b35
Compare
8346b35 to
9e50ee8
Compare
9e50ee8 to
65c706f
Compare
65c706f to
a2b54f4
Compare
a2b54f4 to
08a08ab
Compare
9eff3df to
8709bd0
Compare
|
Please review @lslusarczyk - this is a continuation of #19257 |
8709bd0 to
4b075c4
Compare
4b075c4 to
1c04c0f
Compare
1c04c0f to
b3434ed
Compare
b3434ed to
efb4316
Compare
|
Please review @intel/llvm-reviewers-runtime @intel/unified-runtime-reviewers @intel/unified-runtime-reviewers-level-zero |
Previously, USM memory was made resident on all peer devices regardless
of whether peer access had been explicitly enabled by the user. This
change limits memory residency to only those peer devices for which
peer access has been enabled via urUsmP2PEnablePeerAccessExp.
To support this, a per-device `peers` vector is introduced
(ur_device_handle_t_::PeerStatus) that tracks the peer-access state
(ENABLED, DISABLED, NO_CONNECTION) for every device on the platform.
The vector is populated at platform initialisation time using
zeDeviceGetP2PProperties and updated atomically when
urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp are called.
Key changes:
- device.hpp / device.cpp: introduce the PeerStatus enum
(ENABLED, DISABLED, NO_CONNECTION) and peers vector on
ur_device_handle_t_; add stream operators for logging device id and
PeerStatus values.
- platform.cpp: populate `peers` for every device during device-cache
initialisation.
- v2/context.cpp: add changeResidentDevice(),
getDevicesWhoseAllocationsCanBeAccessedFrom(), and
getDevicesWhichCanAccessAllocationsPresentOn() to update memory
residency in existing USM pools when peer-access state changes.
Fix urContextRelease to only remove the context from
Platform->Contexts and delete it when the reference count actually
reaches zero, preventing live contexts from being missed during
subsequent peer-access updates.
Track contexts created via urContextCreateWithNativeHandle in
Platform->Contexts so that peer-access updates propagate to them.
Add an explicit numDevices > 0 guard in urContextCreate and
urContextCreateWithNativeHandle, returning
UR_RESULT_ERROR_INVALID_VALUE for zero-device contexts; the L0 v2
adapter requires at least one device to initialise platform-level
caches from phDevices[0]->Platform.
Guard urContextRelease's Platform->Contexts removal with a
!getDevices().empty() check to avoid UB (dereferencing hDevices[0])
for contexts with no devices.
Fix a potential double-free of the zeContext handle in urContextCreate:
the constructor now takes v2::raii::ze_context_handle_t&& so the raw
handle is wrapped in RAII immediately after zeContextCreate and moved
into the constructor; if the constructor throws after the hContext
member is initialised, the moved-from local RAII wrapper is empty and
no double-free occurs.
Fix UR_LOG format strings in changeResidentDevice: replace
printf-style %p/%d with {} placeholders (fmtlib-style required by the
logger); use (void*) casts for pointer arguments to avoid dereferencing
potentially invalid handles on the error path.
Fix changeResidentDevice to hold a shared lock on the context Mutex
while iterating usmPoolHandles, preventing a data race with
concurrent addUsmPool/removeUsmPool calls.
Fix a data race in getDevicesWhichCanAccessAllocationsPresentOn where
peerCandidateDevice->peers.size() was read in UR_FASSERT before the
scoped_lock was acquired; the lock is now moved before the assert,
consistent with getDevicesWhoseAllocationsCanBeAccessedFrom.
- v2/context.hpp: update the constructor signature to take
v2::raii::ze_context_handle_t&& instead of (ze_context_handle_t, bool).
- v2/memory.cpp: update getP2PDevices() call to
getDevicesWhoseAllocationsCanBeAccessedFrom(); enhance the UR_LOG(WARN)
when P2P is not accessible to include the requesting device and active
allocation device pointers for easier diagnosis.
- v2/usm.cpp: build the resident-devices list from the source device
plus only the explicitly enabled peer devices (via
getDevicesWhichCanAccessAllocationsPresentOn), instead of passing
all platform devices to umfLevelZeroMemoryProviderParamsSetResidentDevices.
The previous code computed the enabled-peer set but discarded it,
making memory resident on all devices unconditionally.
Use native ZeDevice handle comparison (desc.hDevice->ZeDevice ==
hDevice->ZeDevice) instead of pointer equality when searching pools in
changeResidentDevice, correctly matching pools keyed by any handle
alias for the same physical device (e.g. root vs sub-device handles).
- v2/usm_p2p.cpp: new file implementing urUsmP2PEnablePeerAccessExp,
urUsmP2PDisablePeerAccessExp, and urUsmP2PPeerAccessGetInfoExp for
the v2 adapter. Fix a data race where peers[] was written under a
shared (read) lock; changed to an exclusive scoped_lock.
Copy Platform->Contexts under ContextsMutex and iterate outside the
critical section to avoid holding the mutex during heavy
changeResidentDevice calls, reducing deadlock risk.
Move validateP2PDevicePair call to the top of
urUsmP2PPeerAccessGetInfoExp (before the switch) so it validates
both UR_PEER_ACCESS_SUPPORT and UR_PEER_ATOMICS_SUPPORT cases
uniformly.
Add null checks for both handles in validateP2PDevicePair, returning
UR_RESULT_ERROR_INVALID_NULL_HANDLE as required by the spec; move the
UR_LOG call in urUsmP2PChangePeerAccessExp to after
validateP2PDevicePair to avoid logging unvalidated pointer values.
- CMakeLists.txt: move usm_p2p.cpp from the v1 build into v2/usm_p2p.cpp
in the v2 build so the v2 adapter uses its own P2P implementation.
- usm_p2p.cpp (v1): add diagnostic log messages to
urUsmP2PEnablePeerAccessExp and urUsmP2PDisablePeerAccessExp
clarifying that the operations are no-ops in v1 because P2P is
always enabled there; use (void*) casts for pointer arguments in
UR_LOG calls to avoid dereferencing potentially null handles.
- sycl/source/device.cpp: add a cross-platform check in
ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access and
throw sycl::exception(errc::invalid) when the two devices belong to
different platforms. Fix exception messages: "Can not" -> "Cannot".
- ur_pool_manager.hpp: propagate resident-devices information through
pool descriptors.
- backtrace.hpp / backtrace_lin.cpp / backtrace_win.cpp: replace
#define MAX_BACKTRACE_FRAMES in backtrace.hpp with a static constexpr
int; remove the now-redundant redeclaration from backtrace_lin.cpp,
the Windows equivalent from backtrace_win.cpp, and the duplicate
definition from ur_leak_check.hpp. This avoids macro leakage and
improves type safety.
Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk@intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Fill in the three placeholder multi-device tests in memory_residency.cpp to verify the peer-access-driven residency behaviour introduced by the parent commit. allocationInitiallyAbsentOnPeer Allocates USM memory on devices[0] without calling urUsmP2PEnablePeerAccessExp first and asserts that the allocation is NOT made resident on devices[1]. Verified by checking that free memory on devices[0] (source) decreases by at least allocSize, while free memory on devices[1] (peer) does not decrease by a full allocSize. enablePeerAccessStateMachineAndSourceAllocation Enables peer access from devices[1] to devices[0] (so devices[1] can access allocations on devices[0], causing new allocations on devices[0] to be made resident on devices[1] too) and verifies that a second enable attempt returns UR_RESULT_ERROR_INVALID_OPERATION (state machine check). Allocates USM memory on devices[0] with P2P enabled and asserts that source-device free memory decreases by at least allocSize (the allocation succeeded and memory is on devices[0]). Also verifies end-to-end P2P data transfer: fills the allocation with a known pattern, copies it from devices[0] to devices[1] using devices[1]'s command queue (exercising the enabled P2P path), reads the result back to host memory, and asserts that all bytes match the fill pattern. Disables peer access and frees the allocation on exit. Note: peer-device free memory is not checked because UR_DEVICE_INFO_GLOBAL_MEM_FREE does not reliably reflect zeContextMakeMemoryResident behaviour for device USM allocations. disablePeerAccessStateMachineAndSourceAllocationPersists Enables peer access from devices[1] to devices[0] and allocates USM memory on devices[0]. Disables peer access (the runtime evicts the allocation from devices[1]) and verifies that a second disable attempt returns UR_RESULT_ERROR_INVALID_OPERATION (state machine check). Asserts that source-device free memory still shows the allocation is present. Frees the allocation on exit. Note: peer-device eviction is not checked via free memory for the same reason as above (unreliable for device USM). All three tests: - Skip when either device is not PVC (UR_DEVICE_INFO_GLOBAL_MEM_FREE is only reliably accurate on PVC). - Skip when no hardware P2P connection exists in the direction used by the tests (devices[1] accessing devices[0] allocations), querying UR_EXP_PEER_INFO_UR_PEER_ACCESS_SUPPORT for (devices[1], devices[0]). - Use uint64_t (as required by the UR spec) for UR_DEVICE_INFO_GLOBAL_MEM_FREE query values. - Use the peerAccessEnabled flag in the fixture so that TearDown can disable peer access even if a test assertion fails mid-way, keeping subsequent tests in a clean state. Also fixes the pre-existing allocatingDeviceMemoryWillResultInOOM test to use uint64_t (instead of size_t) for UR_DEVICE_INFO_GLOBAL_MEM_FREE queries, consistent with the UR spec. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
| return exceptionToResult(std::current_exception()); | ||
| ur_context_handle_t *phContext) { | ||
| *phContext = nullptr; | ||
| if (deviceCount == 0 || phDevices == nullptr) { |
| ur_context_handle_t_::ur_context_handle_t_(v2::raii::ze_context_handle_t zeCtx, | ||
| uint32_t numDevices, | ||
| const ur_device_handle_t *phDevices, | ||
| bool ownZeContext) |
There was a problem hiding this comment.
ownZeContext might still be needed because of #21789.
| *phContext = nullptr; | ||
| // The L0 v2 adapter requires at least one device: the constructor | ||
| // initialises platform-level caches using phDevices[0]->Platform. | ||
| if (numDevices == 0 || phDevices == nullptr) { |
There was a problem hiding this comment.
Perhaps the same comment as for urContextCreate applies here too.
| // Ref count dropped to zero - remove from the tracked list before destroying. | ||
| // Only do this when the context has devices, since getPlatform() relies on | ||
| // the first device and would otherwise dereference an empty device list. | ||
| if (!hContext->getDevices().empty()) { |
There was a problem hiding this comment.
You are checking for non-zero device count during context creation in urContextCreate/urContextCreateWithNativeHandle, this might be unnecessary.
| @@ -0,0 +1,130 @@ | |||
| //===----------- usm_p2p.cpp - L0 Adapter ---------------------------------===// | |||
| // | |||
| // Copyright (C) 2023 Intel Corporation | |||
There was a problem hiding this comment.
Wrong date, and the copyright clause was changed recently.
eg.
//===--------- usm.cpp - Level Zero Adapter ------------------------------===//
//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM
// Exceptions. See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
| // cache), and peerDevice's id must be a valid index into commandDevice->peers. | ||
| static ur_result_t validateP2PDevicePair(ur_device_handle_t commandDevice, | ||
| ur_device_handle_t peerDevice) { | ||
| if (!commandDevice || !peerDevice) { |
There was a problem hiding this comment.
commandDevice and peerDevice handle null-checks are already performed at validation layer for urUsmP2P... functions.
@pbalcer should adapter code always assume that parameter handles are valid, or are these checks still necessary given that the validation layer is optional?
|
|
||
| // L0 has peer devices enabled by default | ||
| UR_LOG(INFO, | ||
| "user enables peer access from {} to memory of {}, ignored, in V1 P2P " |
There was a problem hiding this comment.
Consider something more descriptive, eg. ignoring P2P enable request from {} to memory of {}, P2P is enabled by default for the Level Zero V1 adapter.
| if (!uur::isPVC(devices[0]) || !uur::isPVC(devices[1])) { | ||
| GTEST_SKIP() << "Test requires PVC devices"; | ||
| } | ||
| if (!hasHardwareP2PSupport()) { |
There was a problem hiding this comment.
I see that all the urMemoryMultiResidencyTest testcases check for hasHardwareP2PSupport, perhaps this check should be moved to the fixture
Previously, USM memory was made resident on all peer devices regardless
of whether peer access had been explicitly enabled by the user. This
change limits memory residency to only those peer devices for which
peer access has been enabled via urUsmP2PEnablePeerAccessExp.
To support this, a per-device
peersvector is introduced(ur_device_handle_t_::PeerStatus) that tracks the peer-access state
(ENABLED, DISABLED, NO_CONNECTION) for every device on the platform.
The vector is populated at platform initialisation time using
zeDeviceGetP2PProperties and updated atomically when
urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp are called.
Key changes:
(ENABLED, DISABLED, NO_CONNECTION) and peers vector on
ur_device_handle_t_; add stream operators for logging device id and
PeerStatus values.
peersfor every device during device-cacheinitialisation.
getDevicesWhoseAllocationsCanBeAccessedFrom(), and
getDevicesWhichCanAccessAllocationsPresentOn() to update memory
residency in existing USM pools when peer-access state changes.
Fix urContextRelease to only remove the context from
Platform->Contexts and delete it when the reference count actually
reaches zero, preventing live contexts from being missed during
subsequent peer-access updates.
Track contexts created via urContextCreateWithNativeHandle in
Platform->Contexts so that peer-access updates propagate to them.
Add an explicit numDevices > 0 guard in urContextCreate and
urContextCreateWithNativeHandle, returning
UR_RESULT_ERROR_INVALID_VALUE for zero-device contexts; the L0 v2
adapter requires at least one device to initialise platform-level
caches from phDevices[0]->Platform.
Guard urContextRelease's Platform->Contexts removal with a
!getDevices().empty() check to avoid UB (dereferencing hDevices[0])
for contexts with no devices.
Fix a potential double-free of the zeContext handle in urContextCreate:
the constructor now takes v2::raii::ze_context_handle_t&& so the raw
handle is wrapped in RAII immediately after zeContextCreate and moved
into the constructor; if the constructor throws after the hContext
member is initialised, the moved-from local RAII wrapper is empty and
no double-free occurs.
Fix UR_LOG format strings in changeResidentDevice: replace
printf-style %p/%d with {} placeholders (fmtlib-style required by the
logger); use (void*) casts for pointer arguments to avoid dereferencing
potentially invalid handles on the error path.
Fix changeResidentDevice to hold a shared lock on the context Mutex
while iterating usmPoolHandles, preventing a data race with
concurrent addUsmPool/removeUsmPool calls.
Fix a data race in getDevicesWhichCanAccessAllocationsPresentOn where
peerCandidateDevice->peers.size() was read in UR_FASSERT before the
scoped_lock was acquired; the lock is now moved before the assert,
consistent with getDevicesWhoseAllocationsCanBeAccessedFrom.
v2::raii::ze_context_handle_t&& instead of (ze_context_handle_t, bool).
getDevicesWhoseAllocationsCanBeAccessedFrom(); enhance the UR_LOG(WARN)
when P2P is not accessible to include the requesting device and active
allocation device pointers for easier diagnosis.
plus only the explicitly enabled peer devices (via
getDevicesWhichCanAccessAllocationsPresentOn), instead of passing
all platform devices to umfLevelZeroMemoryProviderParamsSetResidentDevices.
The previous code computed the enabled-peer set but discarded it,
making memory resident on all devices unconditionally.
Use native ZeDevice handle comparison (desc.hDevice->ZeDevice ==
hDevice->ZeDevice) instead of pointer equality when searching pools in
changeResidentDevice, correctly matching pools keyed by any handle
alias for the same physical device (e.g. root vs sub-device handles).
urUsmP2PDisablePeerAccessExp, and urUsmP2PPeerAccessGetInfoExp for
the v2 adapter. Fix a data race where peers[] was written under a
shared (read) lock; changed to an exclusive scoped_lock.
Copy Platform->Contexts under ContextsMutex and iterate outside the
critical section to avoid holding the mutex during heavy
changeResidentDevice calls, reducing deadlock risk.
Move validateP2PDevicePair call to the top of
urUsmP2PPeerAccessGetInfoExp (before the switch) so it validates
both UR_PEER_ACCESS_SUPPORT and UR_PEER_ATOMICS_SUPPORT cases
uniformly.
Add null checks for both handles in validateP2PDevicePair, returning
UR_RESULT_ERROR_INVALID_NULL_HANDLE as required by the spec; move the
UR_LOG call in urUsmP2PChangePeerAccessExp to after
validateP2PDevicePair to avoid logging unvalidated pointer values.
in the v2 build so the v2 adapter uses its own P2P implementation.
urUsmP2PEnablePeerAccessExp and urUsmP2PDisablePeerAccessExp
clarifying that the operations are no-ops in v1 because P2P is
always enabled there; use (void*) casts for pointer arguments in
UR_LOG calls to avoid dereferencing potentially null handles.
ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access and
throw sycl::exception(errc::invalid) when the two devices belong to
different platforms. Fix exception messages: "Can not" -> "Cannot".
pool descriptors.
#define MAX_BACKTRACE_FRAMES in backtrace.hpp with a static constexpr
int; remove the now-redundant redeclaration from backtrace_lin.cpp,
the Windows equivalent from backtrace_win.cpp, and the duplicate
definition from ur_leak_check.hpp. This avoids macro leakage and
improves type safety.
Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk AT intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau AT intel.com>