Commit 5ea16f3

[UR][L0] Fix event refcount bugs in L0 v1 adapter barrier handling
Fixes three issues in the Level Zero v1 adapter that caused "urEventWait must not be called for an internal event" crashes and segfaults when mixing multiple in-order queues with ext_oneapi_submit_barrier.

1. Fix insertBarrierIntoCmdList passing InterruptBasedEventsEnabled as IsMultiDevice to createEventAndAssociateQueue. Barrier events do not need multi-device visibility; pass false instead.

2. In urEventWait, replace die() with continue when hasExternalRefs() is false. A recycled event (RefCountExternal == 0) was already completed before recycling, so skipping the wait is safe.

3. In urEventRelease, use CleanupCompletedEvent (which is a no-op when CleanedUp is true) instead of calling urEventReleaseInternal directly. The old code double-released the internal refcount when CleanupEventListFromResetCmdList had already cleaned the event.

UR_L0_SERIALIZE=2 masked the race by forcing synchronous execution. UR_L0_DISABLE_EVENTS_CACHING=1 turned the recycling into a delete, escalating the bug from a stale-data read to a segfault.

Fixes: #21704

Signed-off-by: Zhang, Winston <winston.zhang@intel.com>
1 parent fa5e54d commit 5ea16f3
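
The double release described in item 3 can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ModelEvent`, `releaseInternal`, `cleanupCompleted`, `userRelease`, `ReleasePolicy`, and the starting count of three references are all hypothetical and are not taken from the adapter. The point it shows is that an idempotent cleanup drops its reference at most once, whereas the old branch decremented a second time when the event had already been cleaned up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an adapter event; NOT the real ur_event_handle_t_.
struct ModelEvent {
  // Assumed holders: the user, the pending cleanup, and one other internal owner.
  uint32_t InternalRefs = 3;
  bool CleanedUp = false;
};

// Drops exactly one internal reference.
inline void releaseInternal(ModelEvent &E) {
  assert(E.InternalRefs > 0 && "internal refcount underflow");
  --E.InternalRefs;
}

// Idempotent cleanup: the cleanup reference is dropped at most once.
inline void cleanupCompleted(ModelEvent &E) {
  if (E.CleanedUp)
    return; // already cleaned, nothing to do
  E.CleanedUp = true;
  releaseInternal(E);
}

enum class ReleasePolicy { Old, Fixed };

// Models the user-facing release and returns the remaining internal count.
inline uint32_t userRelease(ModelEvent &E, ReleasePolicy P) {
  releaseInternal(E); // the user's own reference
  if (E.InternalRefs == 0)
    return 0; // event is gone, no further cleanup
  if (P == ReleasePolicy::Old && E.CleanedUp)
    releaseInternal(E); // old branch: decrements again after cleanup already ran
  else
    cleanupCompleted(E); // fixed path: no-op when CleanedUp is already set
  return E.InternalRefs;
}

// Command-list reset cleans the event first, then the user releases it.
inline uint32_t refsAfterResetThenRelease(ReleasePolicy P) {
  ModelEvent E;
  cleanupCompleted(E);
  return userRelease(E, P);
}
```

In this model, `refsAfterResetThenRelease(ReleasePolicy::Old)` leaves zero references even though one internal owner still expects the event to exist, while `ReleasePolicy::Fixed` leaves that owner's single reference intact.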

1 file changed

Lines changed: 8 additions & 11 deletions

unified-runtime/source/adapters/level_zero/event.cpp

@@ -198,9 +198,10 @@ ur_result_t urEnqueueEventsWaitWithBarrierExt(
       [&Queue](ur_command_list_ptr_t CmdList, ur_ze_event_list_t &EventWaitList,
                ur_event_handle_t &Event, bool IsInternal,
                bool InterruptBasedEventsEnabled) {
+    (void)InterruptBasedEventsEnabled;
     UR_CALL(createEventAndAssociateQueue(
         Queue, &Event, UR_COMMAND_EVENTS_WAIT_WITH_BARRIER, CmdList,
-        IsInternal, InterruptBasedEventsEnabled));
+        IsInternal, /* IsMultiDevice */ false));
 
     Event->WaitList = EventWaitList;
 
@@ -795,7 +796,7 @@ urEventWait(uint32_t NumEvents,
       //
       ur_event_handle_t_ *Event = ur_cast<ur_event_handle_t_ *>(e);
       if (!Event->hasExternalRefs())
-        die("urEventWait must not be called for an internal event");
+        continue;
 
       ze_event_handle_t ZeHostVisibleEvent;
       if (auto Res = Event->getOrCreateHostVisibleEvent(ZeHostVisibleEvent))
@@ -821,7 +822,7 @@ urEventWait(uint32_t NumEvents,
     {
       std::shared_lock<ur_shared_mutex> EventLock(Event->Mutex);
       if (!Event->hasExternalRefs())
-        die("urEventWait must not be called for an internal event");
+        continue;
 
       if (!Event->Completed) {
         auto HostVisibleEvent = Event->HostVisibleEvent;
@@ -893,15 +894,11 @@ urEventRelease(/** [in] handle of the event object */ ur_event_handle_t Event) {
   UR_CALL(urEventReleaseInternal(Event, &isEventDeleted));
   // If this is a Completed Event Wait Out Event, then we need to cleanup the
   // event at user release and not at the time of completion.
-  // If the event is labelled as completed and no additional references are
-  // removed, then we still need to decrement the event, but not mark as
-  // completed.
+  // Use CleanupCompletedEvent which is a no-op if the event was already
+  // cleaned up (e.g. by CleanupEventListFromResetCmdList), preventing a
+  // double-release of the internal reference count.
   if (isEventsWaitCompleted & !isEventDeleted) {
-    if (Event->CleanedUp) {
-      UR_CALL(urEventReleaseInternal(Event));
-    } else {
-      UR_CALL(CleanupCompletedEvent((Event), false, false));
-    }
+    UR_CALL(CleanupCompletedEvent((Event), false, false));
   }
 
   return UR_RESULT_SUCCESS;
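
The urEventWait hunks above swap a fatal error for a skip. A rough standalone sketch of that control flow follows; `WaitModelEvent` and `countEventsToWaitOn` are hypothetical names invented for illustration, and the model only counts which events would still need a wait rather than calling any Level Zero API. It assumes, as the commit message states, that an event with no external references was already completed before it was recycled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an adapter event; NOT the real ur_event_handle_t_.
struct WaitModelEvent {
  uint32_t ExternalRefs; // 0 models a recycled (internal-only) event
  bool Completed;        // true once the event is known to have finished
};

// Returns how many events in the list would actually require a host wait.
inline std::size_t
countEventsToWaitOn(const std::vector<WaitModelEvent> &Events) {
  std::size_t Pending = 0;
  for (const WaitModelEvent &E : Events) {
    // Fixed behavior: skip a recycled event instead of aborting the process.
    if (E.ExternalRefs == 0)
      continue;
    // Events already marked complete need no further wait.
    if (E.Completed)
      continue;
    ++Pending;
  }
  return Pending;
}
```

With the pre-fix behavior, the first event with `ExternalRefs == 0` would have terminated the whole wait; here it is simply ignored and the loop proceeds to the remaining events.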
