Skip to content

[UR][L0] Restrict USM residency to peers with enabled P2P access#21889

Open
ldorau wants to merge 2 commits intointel:syclfrom
ldorau:URL0_Memory_resident_limit_to_enabled_peers
Open

[UR][L0] Restrict USM residency to peers with enabled P2P access#21889
ldorau wants to merge 2 commits intointel:syclfrom
ldorau:URL0_Memory_resident_limit_to_enabled_peers

Conversation

@ldorau
Copy link
Copy Markdown
Contributor

@ldorau ldorau commented Apr 28, 2026

Previously, USM memory was made resident on all peer devices regardless
of whether peer access had been explicitly enabled by the user. This
change limits memory residency to only those peer devices for which
peer access has been enabled via urUsmP2PEnablePeerAccessExp.

To support this, a per-device peers vector is introduced
(ur_device_handle_t_::PeerStatus) that tracks the peer-access state
(ENABLED, DISABLED, NO_CONNECTION) for every device on the platform.
The vector is populated at platform initialisation time using
zeDeviceGetP2PProperties and updated atomically when
urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp are called.

Key changes:

  • device.hpp / device.cpp: introduce the PeerStatus enum
    (ENABLED, DISABLED, NO_CONNECTION) and peers vector on
    ur_device_handle_t_; add stream operators for logging device id and
    PeerStatus values.
  • platform.cpp: populate peers for every device during device-cache
    initialisation.
  • v2/context.cpp: add changeResidentDevice(),
    getDevicesWhoseAllocationsCanBeAccessedFrom(), and
    getDevicesWhichCanAccessAllocationsPresentOn() to update memory
    residency in existing USM pools when peer-access state changes.
    Fix urContextRelease to only remove the context from
    Platform->Contexts and delete it when the reference count actually
    reaches zero, preventing live contexts from being missed during
    subsequent peer-access updates.
    Track contexts created via urContextCreateWithNativeHandle in
    Platform->Contexts so that peer-access updates propagate to them.
    Add an explicit numDevices > 0 guard in urContextCreate and
    urContextCreateWithNativeHandle, returning
    UR_RESULT_ERROR_INVALID_VALUE for zero-device contexts; the L0 v2
    adapter requires at least one device to initialise platform-level
    caches from phDevices[0]->Platform.
    Guard urContextRelease's Platform->Contexts removal with a
    !getDevices().empty() check to avoid UB (dereferencing hDevices[0])
    for contexts with no devices.
    Fix a potential double-free of the zeContext handle in urContextCreate:
    the constructor now takes v2::raii::ze_context_handle_t&& so the raw
    handle is wrapped in RAII immediately after zeContextCreate and moved
    into the constructor; if the constructor throws after the hContext
    member is initialised, the moved-from local RAII wrapper is empty and
    no double-free occurs.
    Fix UR_LOG format strings in changeResidentDevice: replace
    printf-style %p/%d with {} placeholders (fmtlib-style required by the
    logger); use (void*) casts for pointer arguments to avoid dereferencing
    potentially invalid handles on the error path.
    Fix changeResidentDevice to hold a shared lock on the context Mutex
    while iterating usmPoolHandles, preventing a data race with
    concurrent addUsmPool/removeUsmPool calls.
    Fix a data race in getDevicesWhichCanAccessAllocationsPresentOn where
    peerCandidateDevice->peers.size() was read in UR_FASSERT before the
    scoped_lock was acquired; the lock is now moved before the assert,
    consistent with getDevicesWhoseAllocationsCanBeAccessedFrom.
  • v2/context.hpp: update the constructor signature to take
    v2::raii::ze_context_handle_t&& instead of (ze_context_handle_t, bool).
  • v2/memory.cpp: update getP2PDevices() call to
    getDevicesWhoseAllocationsCanBeAccessedFrom(); enhance the UR_LOG(WARN)
    when P2P is not accessible to include the requesting device and active
    allocation device pointers for easier diagnosis.
  • v2/usm.cpp: build the resident-devices list from the source device
    plus only the explicitly enabled peer devices (via
    getDevicesWhichCanAccessAllocationsPresentOn), instead of passing
    all platform devices to umfLevelZeroMemoryProviderParamsSetResidentDevices.
    The previous code computed the enabled-peer set but discarded it,
    making memory resident on all devices unconditionally.
    Use native ZeDevice handle comparison (desc.hDevice->ZeDevice ==
    hDevice->ZeDevice) instead of pointer equality when searching pools in
    changeResidentDevice, correctly matching pools keyed by any handle
    alias for the same physical device (e.g. root vs sub-device handles).
  • v2/usm_p2p.cpp: new file implementing urUsmP2PEnablePeerAccessExp,
    urUsmP2PDisablePeerAccessExp, and urUsmP2PPeerAccessGetInfoExp for
    the v2 adapter. Fix a data race where peers[] was written under a
    shared (read) lock; changed to an exclusive scoped_lock.
    Copy Platform->Contexts under ContextsMutex and iterate outside the
    critical section to avoid holding the mutex during heavy
    changeResidentDevice calls, reducing deadlock risk.
    Move validateP2PDevicePair call to the top of
    urUsmP2PPeerAccessGetInfoExp (before the switch) so it validates
    both UR_PEER_ACCESS_SUPPORT and UR_PEER_ATOMICS_SUPPORT cases
    uniformly.
    Add null checks for both handles in validateP2PDevicePair, returning
    UR_RESULT_ERROR_INVALID_NULL_HANDLE as required by the spec; move the
    UR_LOG call in urUsmP2PChangePeerAccessExp to after
    validateP2PDevicePair to avoid logging unvalidated pointer values.
  • CMakeLists.txt: move usm_p2p.cpp from the v1 build into v2/usm_p2p.cpp
    in the v2 build so the v2 adapter uses its own P2P implementation.
  • usm_p2p.cpp (v1): add diagnostic log messages to
    urUsmP2PEnablePeerAccessExp and urUsmP2PDisablePeerAccessExp
    clarifying that the operations are no-ops in v1 because P2P is
    always enabled there; use (void*) casts for pointer arguments in
    UR_LOG calls to avoid dereferencing potentially null handles.
  • sycl/source/device.cpp: add a cross-platform check in
    ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access and
    throw sycl::exception(errc::invalid) when the two devices belong to
    different platforms. Fix exception messages: "Can not" -> "Cannot".
  • ur_pool_manager.hpp: propagate resident-devices information through
    pool descriptors.
  • backtrace.hpp / backtrace_lin.cpp / backtrace_win.cpp: replace
    #define MAX_BACKTRACE_FRAMES in backtrace.hpp with a static constexpr
    int; remove the now-redundant redeclaration from backtrace_lin.cpp,
    the Windows equivalent from backtrace_win.cpp, and the duplicate
    definition from ur_leak_check.hpp. This avoids macro leakage and
    improves type safety.

Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk AT intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau AT intel.com>

@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 7e50ca5 to d0b4788 Compare April 28, 2026 07:23
@ldorau ldorau requested a review from Copilot April 28, 2026 07:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements peer-access–driven memory residency management for the Level Zero v2 adapter, wiring ext_oneapi_enable_peer_access/disable through UR to update USM pool residency, and adjusting SYCL’s peer-access API to avoid cross-platform usage.

Changes:

  • Add L0 v2 peer-access implementation that toggles per-device peer state and propagates residency updates to all tracked contexts.
  • Extend USM pool/provider plumbing to support runtime resident-device changes and add pool-manager iteration helpers.
  • Update SYCL peer-access enable/disable to validate platforms; add initial (currently placeholder) UR tests.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Adds multi-device residency tests for peer-access (currently placeholders).
unified-runtime/source/common/ur_pool_manager.hpp Adds descriptor helpers and pool-manager iteration with descriptor access.
unified-runtime/source/common/backtrace_lin.cpp Introduces a constant for max backtrace frames.
unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp New L0 v2 implementation for peer access enable/disable/info and context propagation.
unified-runtime/source/adapters/level_zero/v2/usm.hpp Exposes USM pool API to change resident devices.
unified-runtime/source/adapters/level_zero/v2/usm.cpp Updates provider creation to use peer-enabled residency model and adds residency-change plumbing.
unified-runtime/source/adapters/level_zero/v2/memory.cpp Switches P2P eligibility check to the new “enabled peers” model.
unified-runtime/source/adapters/level_zero/v2/context.hpp Adds APIs to query enabled peer relationships and to propagate residency changes.
unified-runtime/source/adapters/level_zero/v2/context.cpp Removes precomputed P2P tables; tracks contexts; adds peer-access query helpers and residency propagation.
unified-runtime/source/adapters/level_zero/usm_p2p.cpp Updates v1 behavior to log enable/disable as ignored (always enabled).
unified-runtime/source/adapters/level_zero/platform.hpp Updates platform comment to reflect v2 peer-access usage of tracked contexts.
unified-runtime/source/adapters/level_zero/platform.cpp Initializes per-device peer tables based on L0 P2P capability/properties.
unified-runtime/source/adapters/level_zero/device.hpp Adds peer-status table to devices and stream operators.
unified-runtime/source/adapters/level_zero/device.cpp Implements stream operators for device id and peer status.
unified-runtime/source/adapters/level_zero/context.cpp Minor comment adjustment around context tracking in v1.
unified-runtime/source/adapters/level_zero/CMakeLists.txt Moves usm_p2p.cpp into the v2 adapter build.
sycl/source/device.cpp Adds same-platform validation for enable/disable peer access calls.
.github/copilot-instructions.md Expands repository instructions/documentation for Copilot usage.

Comment thread sycl/source/device.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/common/backtrace_lin.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from d0b4788 to 24eaab3 Compare April 28, 2026 07:53
@ldorau ldorau requested a review from Copilot April 28, 2026 07:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/common/ur_pool_manager.hpp
Comment thread unified-runtime/source/adapters/level_zero/usm_p2p.cpp
Comment thread sycl/source/device.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch 2 times, most recently from 174e3f1 to e2e57bf Compare April 28, 2026 09:58
@ldorau ldorau requested a review from Copilot April 28, 2026 10:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/platform.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/device.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from e2e57bf to 8346b35 Compare April 28, 2026 12:47
@ldorau ldorau requested a review from Copilot April 28, 2026 12:49
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 8346b35 to 9e50ee8 Compare April 28, 2026 12:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/common/backtrace.hpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 9e50ee8 to 65c706f Compare April 28, 2026 13:34
@ldorau ldorau requested a review from Copilot April 28, 2026 13:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
Comment thread unified-runtime/source/adapters/level_zero/v2/usm_p2p.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 65c706f to a2b54f4 Compare April 28, 2026 14:24
@ldorau ldorau requested a review from Copilot April 28, 2026 14:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from a2b54f4 to 08a08ab Compare April 28, 2026 14:50
@ldorau ldorau requested a review from Copilot April 28, 2026 14:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 9eff3df to 8709bd0 Compare April 29, 2026 10:04
@ldorau ldorau requested review from Copilot and lslusarczyk April 29, 2026 10:05
@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented Apr 29, 2026

Please review @lslusarczyk - this is a continuation of #19257

@ldorau ldorau changed the title [DRAFT] [UR][L0] Restrict USM residency to peers with enabled P2P access [UR][L0] Restrict USM residency to peers with enabled P2P access Apr 29, 2026
@ldorau ldorau marked this pull request as ready for review April 29, 2026 10:07
@ldorau ldorau requested review from a team as code owners April 29, 2026 10:07
@ldorau ldorau requested a review from againull April 29, 2026 10:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

Comment thread unified-runtime/source/adapters/level_zero/v2/usm.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/memory.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 8709bd0 to 4b075c4 Compare April 29, 2026 11:50
@ldorau ldorau requested a review from Copilot April 29, 2026 11:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 4b075c4 to 1c04c0f Compare April 29, 2026 12:07
@ldorau ldorau requested a review from Copilot April 29, 2026 12:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp Outdated
Comment thread unified-runtime/source/adapters/level_zero/v2/context.cpp Outdated
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from 1c04c0f to b3434ed Compare April 29, 2026 12:23
@ldorau ldorau requested a review from Copilot April 29, 2026 12:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/test/adapters/level_zero/v2/memory_residency.cpp
@ldorau ldorau force-pushed the URL0_Memory_resident_limit_to_enabled_peers branch from b3434ed to efb4316 Compare April 29, 2026 13:23
@ldorau
Copy link
Copy Markdown
Contributor Author

ldorau commented Apr 29, 2026

Please review @intel/llvm-reviewers-runtime @intel/unified-runtime-reviewers @intel/unified-runtime-reviewers-level-zero

ldorau added 2 commits April 29, 2026 14:23
Previously, USM memory was made resident on all peer devices regardless
of whether peer access had been explicitly enabled by the user. This
change limits memory residency to only those peer devices for which
peer access has been enabled via urUsmP2PEnablePeerAccessExp.

To support this, a per-device `peers` vector is introduced
(ur_device_handle_t_::PeerStatus) that tracks the peer-access state
(ENABLED, DISABLED, NO_CONNECTION) for every device on the platform.
The vector is populated at platform initialisation time using
zeDeviceGetP2PProperties and updated atomically when
urUsmP2PEnablePeerAccessExp / urUsmP2PDisablePeerAccessExp are called.

Key changes:
- device.hpp / device.cpp: introduce the PeerStatus enum
  (ENABLED, DISABLED, NO_CONNECTION) and peers vector on
  ur_device_handle_t_; add stream operators for logging device id and
  PeerStatus values.
- platform.cpp: populate `peers` for every device during device-cache
  initialisation.
- v2/context.cpp: add changeResidentDevice(),
  getDevicesWhoseAllocationsCanBeAccessedFrom(), and
  getDevicesWhichCanAccessAllocationsPresentOn() to update memory
  residency in existing USM pools when peer-access state changes.
  Fix urContextRelease to only remove the context from
  Platform->Contexts and delete it when the reference count actually
  reaches zero, preventing live contexts from being missed during
  subsequent peer-access updates.
  Track contexts created via urContextCreateWithNativeHandle in
  Platform->Contexts so that peer-access updates propagate to them.
  Add an explicit numDevices > 0 guard in urContextCreate and
  urContextCreateWithNativeHandle, returning
  UR_RESULT_ERROR_INVALID_VALUE for zero-device contexts; the L0 v2
  adapter requires at least one device to initialise platform-level
  caches from phDevices[0]->Platform.
  Guard urContextRelease's Platform->Contexts removal with a
  !getDevices().empty() check to avoid UB (dereferencing hDevices[0])
  for contexts with no devices.
  Fix a potential double-free of the zeContext handle in urContextCreate:
  the constructor now takes v2::raii::ze_context_handle_t&& so the raw
  handle is wrapped in RAII immediately after zeContextCreate and moved
  into the constructor; if the constructor throws after the hContext
  member is initialised, the moved-from local RAII wrapper is empty and
  no double-free occurs.
  Fix UR_LOG format strings in changeResidentDevice: replace
  printf-style %p/%d with {} placeholders (fmtlib-style required by the
  logger); use (void*) casts for pointer arguments to avoid dereferencing
  potentially invalid handles on the error path.
  Fix changeResidentDevice to hold a shared lock on the context Mutex
  while iterating usmPoolHandles, preventing a data race with
  concurrent addUsmPool/removeUsmPool calls.
  Fix a data race in getDevicesWhichCanAccessAllocationsPresentOn where
  peerCandidateDevice->peers.size() was read in UR_FASSERT before the
  scoped_lock was acquired; the lock is now moved before the assert,
  consistent with getDevicesWhoseAllocationsCanBeAccessedFrom.
- v2/context.hpp: update the constructor signature to take
  v2::raii::ze_context_handle_t&& instead of (ze_context_handle_t, bool).
- v2/memory.cpp: update getP2PDevices() call to
  getDevicesWhoseAllocationsCanBeAccessedFrom(); enhance the UR_LOG(WARN)
  when P2P is not accessible to include the requesting device and active
  allocation device pointers for easier diagnosis.
- v2/usm.cpp: build the resident-devices list from the source device
  plus only the explicitly enabled peer devices (via
  getDevicesWhichCanAccessAllocationsPresentOn), instead of passing
  all platform devices to umfLevelZeroMemoryProviderParamsSetResidentDevices.
  The previous code computed the enabled-peer set but discarded it,
  making memory resident on all devices unconditionally.
  Use native ZeDevice handle comparison (desc.hDevice->ZeDevice ==
  hDevice->ZeDevice) instead of pointer equality when searching pools in
  changeResidentDevice, correctly matching pools keyed by any handle
  alias for the same physical device (e.g. root vs sub-device handles).
- v2/usm_p2p.cpp: new file implementing urUsmP2PEnablePeerAccessExp,
  urUsmP2PDisablePeerAccessExp, and urUsmP2PPeerAccessGetInfoExp for
  the v2 adapter. Fix a data race where peers[] was written under a
  shared (read) lock; changed to an exclusive scoped_lock.
  Copy Platform->Contexts under ContextsMutex and iterate outside the
  critical section to avoid holding the mutex during heavy
  changeResidentDevice calls, reducing deadlock risk.
  Move validateP2PDevicePair call to the top of
  urUsmP2PPeerAccessGetInfoExp (before the switch) so it validates
  both UR_PEER_ACCESS_SUPPORT and UR_PEER_ATOMICS_SUPPORT cases
  uniformly.
  Add null checks for both handles in validateP2PDevicePair, returning
  UR_RESULT_ERROR_INVALID_NULL_HANDLE as required by the spec; move the
  UR_LOG call in urUsmP2PChangePeerAccessExp to after
  validateP2PDevicePair to avoid logging unvalidated pointer values.
- CMakeLists.txt: move usm_p2p.cpp from the v1 build into v2/usm_p2p.cpp
  in the v2 build so the v2 adapter uses its own P2P implementation.
- usm_p2p.cpp (v1): add diagnostic log messages to
  urUsmP2PEnablePeerAccessExp and urUsmP2PDisablePeerAccessExp
  clarifying that the operations are no-ops in v1 because P2P is
  always enabled there; use (void*) casts for pointer arguments in
  UR_LOG calls to avoid dereferencing potentially null handles.
- sycl/source/device.cpp: add a cross-platform check in
  ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access and
  throw sycl::exception(errc::invalid) when the two devices belong to
  different platforms. Fix exception messages: "Can not" -> "Cannot".
- ur_pool_manager.hpp: propagate resident-devices information through
  pool descriptors.
- backtrace.hpp / backtrace_lin.cpp / backtrace_win.cpp: replace
  #define MAX_BACKTRACE_FRAMES in backtrace.hpp with a static constexpr
  int; remove the now-redundant redeclaration from backtrace_lin.cpp,
  the Windows equivalent from backtrace_win.cpp, and the duplicate
  definition from ur_leak_check.hpp. This avoids macro leakage and
  improves type safety.

Co-authored-by: Łukasz Ślusarczyk <lukasz.slusarczyk@intel.com>
Co-authored-by: Lukasz Dorau <lukasz.dorau@intel.com>

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Fill in the three placeholder multi-device tests in
memory_residency.cpp to verify the peer-access-driven residency
behaviour introduced by the parent commit.

allocationInitiallyAbsentOnPeer
  Allocates USM memory on devices[0] without calling
  urUsmP2PEnablePeerAccessExp first and asserts that the allocation is
  NOT made resident on devices[1]. Verified by checking that free memory
  on devices[0] (source) decreases by at least allocSize, while free
  memory on devices[1] (peer) does not decrease by a full allocSize.

enablePeerAccessStateMachineAndSourceAllocation
  Enables peer access from devices[1] to devices[0] (so devices[1] can
  access allocations on devices[0], causing new allocations on
  devices[0] to be made resident on devices[1] too) and verifies that a
  second enable attempt returns UR_RESULT_ERROR_INVALID_OPERATION (state
  machine check). Allocates USM memory on devices[0] with P2P enabled
  and asserts that source-device free memory decreases by at least
  allocSize (the allocation succeeded and memory is on devices[0]).
  Also verifies end-to-end P2P data transfer: fills the allocation with
  a known pattern, copies it from devices[0] to devices[1] using
  devices[1]'s command queue (exercising the enabled P2P path), reads
  the result back to host memory, and asserts that all bytes match the
  fill pattern.
  Disables peer access and frees the allocation on exit.
  Note: peer-device free memory is not checked because
  UR_DEVICE_INFO_GLOBAL_MEM_FREE does not reliably reflect
  zeContextMakeMemoryResident behaviour for device USM allocations.

disablePeerAccessStateMachineAndSourceAllocationPersists
  Enables peer access from devices[1] to devices[0] and allocates USM
  memory on devices[0]. Disables peer access (the runtime evicts the
  allocation from devices[1]) and verifies that a second disable attempt
  returns UR_RESULT_ERROR_INVALID_OPERATION (state machine check).
  Asserts that source-device free memory still shows the allocation is
  present. Frees the allocation on exit.
  Note: peer-device eviction is not checked via free memory for the
  same reason as above (unreliable for device USM).

All three tests:
- Skip when either device is not PVC (UR_DEVICE_INFO_GLOBAL_MEM_FREE
  is only reliably accurate on PVC).
- Skip when no hardware P2P connection exists in the direction used by
  the tests (devices[1] accessing devices[0] allocations), querying
  UR_EXP_PEER_INFO_UR_PEER_ACCESS_SUPPORT for (devices[1], devices[0]).
- Use uint64_t (as required by the UR spec) for
  UR_DEVICE_INFO_GLOBAL_MEM_FREE query values.
- Use the peerAccessEnabled flag in the fixture so that TearDown can
  disable peer access even if a test assertion fails mid-way, keeping
  subsequent tests in a clean state.

Also fixes the pre-existing allocatingDeviceMemoryWillResultInOOM test
to use uint64_t (instead of size_t) for UR_DEVICE_INFO_GLOBAL_MEM_FREE
queries, consistent with the UR spec.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Copy link
Copy Markdown
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sycl/* changes LGTM.

return exceptionToResult(std::current_exception());
ur_context_handle_t *phContext) {
*phContext = nullptr;
if (deviceCount == 0 || phDevices == nullptr) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the issue I've mentioned here.

ur_context_handle_t_::ur_context_handle_t_(v2::raii::ze_context_handle_t zeCtx,
uint32_t numDevices,
const ur_device_handle_t *phDevices,
bool ownZeContext)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ownZeContext might still be needed because of #21789.

*phContext = nullptr;
// The L0 v2 adapter requires at least one device: the constructor
// initialises platform-level caches using phDevices[0]->Platform.
if (numDevices == 0 || phDevices == nullptr) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the same comment as for urContextCreate applies here too.

// Ref count dropped to zero - remove from the tracked list before destroying.
// Only do this when the context has devices, since getPlatform() relies on
// the first device and would otherwise dereference an empty device list.
if (!hContext->getDevices().empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are checking for non-zero device count during context creation in urContextCreate/urContextCreateWithNativeHandle, this might be unnecessary.

@@ -0,0 +1,130 @@
//===----------- usm_p2p.cpp - L0 Adapter ---------------------------------===//
//
// Copyright (C) 2023 Intel Corporation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong date, and the copyright clause was changed recently.
eg.

//===--------- usm.cpp - Level Zero Adapter ------------------------------===//
//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM
// Exceptions. See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

// cache), and peerDevice's id must be a valid index into commandDevice->peers.
static ur_result_t validateP2PDevicePair(ur_device_handle_t commandDevice,
ur_device_handle_t peerDevice) {
if (!commandDevice || !peerDevice) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commandDevice and peerDevice handle null-checks are already performed at validation layer for urUsmP2P... functions.

@pbalcer should adapter code always assume that parameter handles are valid, or are these checks still necessary given that the validation layer is optional?


// L0 has peer devices enabled by default
UR_LOG(INFO,
"user enables peer access from {} to memory of {}, ignored, in V1 P2P "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider something more descriptive, eg. ignoring P2P enable request from {} to memory of {}, P2P is enabled by default for the Level Zero V1 adapter.

if (!uur::isPVC(devices[0]) || !uur::isPVC(devices[1])) {
GTEST_SKIP() << "Test requires PVC devices";
}
if (!hasHardwareP2PSupport()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that all the urMemoryMultiResidencyTest testcases check for hasHardwareP2PSupport, perhaps this check should be moved to the fixture

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants