
LLVM and SPIRV-LLVM-Translator pulldown (WW18 2026)#21908

Draft
iclsrc wants to merge 1966 commits into sycl from llvmspirv_pulldown

Conversation

@iclsrc
Collaborator

@iclsrc iclsrc commented Apr 30, 2026

huntergr-arm and others added 30 commits April 14, 2026 14:47
…plitting (#191417)

If we need to split the memory operation, we'll also need to split the
mask.

This has a performance benefit in some cases when the loop vectorizer is
asked to maximize bandwidth and ends up choosing a VF that's too high
when tail folding. The costs of splitting the masks are not accounted
for in the current model, so this is something of a brute-force approach
to avoiding the wider VFs.
This patch adds the C1-Ultra scheduling model. This model is largely
based on the Neoverse V3 scheduling model with appropriate changes made
based on information available in the software optimization guide for
this core.

https://developer.arm.com/documentation/111079/3-0
Sort the std::set ProcItinList by Record name, not the pointer address.

---------

Co-authored-by: Bao, Qiaojin (Fred) <Qiaojin.Bao@amd.com>
On s390x, the changes to `control_tool.c` cause a different return
address to be returned from the call to `print_current_address(0)`. Due
to the strictness of the current address returned by this call, this
led to a test failure.

Since the return values of `omp_control_tool` are checked in separate
tests already, revert the changes to ensure that the return address
stays at the expected value.

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…egal vector types during vector op legalization. (#190914)

This code needs to create a step vector but we only have a mask vector
type. If the step vector is too large it might not be an MVT. This
causes the getSimpleVT() call for getTypeAction to fail. We can replace
that with the EVT version of getTypeAction, but we'll still fail trying
to legalize the vselect. The getOperationAction query will return Expand
for non-simple VTs. ExpandVSELECT will try to unroll the vselect which
will fail for scalable vectors. We could hack that to not unroll
scalable vectors, but that would be a hack.

To fix this, split the FIND_LAST_ACTIVE into two if the step vector
needs to be split. Those will recursively legalize and eventually arrive
at a size we can create a valid step vector for.
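
A minimal sketch of the splitting idea, modeled on plain Python lists rather than SelectionDAG nodes. The helper names, the `max_lanes` threshold, and the choice to return 0 when no lane is active are assumptions made for illustration; the patch itself does not spell out how the two halves are recombined.

```python
def find_last_active_legal(mask):
    """Base case: the mask is small enough to build a step vector for."""
    step = range(len(mask))                                  # step vector <0, 1, 2, ...>
    selected = [i if m else 0 for i, m in zip(step, mask)]   # vselect(mask, step, 0)
    return max(selected)                                     # unsigned max reduction


def find_last_active(mask, max_lanes=4):
    """Split the mask in two until each piece fits in `max_lanes` lanes.

    `max_lanes` is a hypothetical stand-in for "largest legal step vector".
    """
    if len(mask) <= max_lanes:
        return find_last_active_legal(mask)
    half = len(mask) // 2
    lo, hi = mask[:half], mask[half:]
    if any(hi):
        # The last active lane lives in the high half; rebase its index.
        return half + find_last_active(hi, max_lanes)
    return find_last_active(lo, max_lanes)


mask = [True, False, True, False, False, True, False, False]
print(find_last_active(mask, max_lanes=2))
```

Each recursive call halves the lane count, so the recursion ends at a size where the step-vector formulation applies directly.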

One existing test changes because it created an illegal type which
happened to still be an MVT. This allowed getOperationAction to return
Legal, even though the type isn't legal.

Fixes the assertion mentioned in #187458.

Assisted-by: Claude Sonnet 4.5
It already got inconsistent because new changes require complying with
clang-format on CI, while everything old is not complying with it.
…part 44) (#191926)

Tests converted from test/Lower/Intrinsics: verify.f90
Tests converted from test/Lower: io-char-array.f90,
io-implied-do-fixes.f90, io-item-list.f90, io-statement-1.f90
…z%re, z%im) (#191846)

fir.slice with a path component (z%re, z%im) was silently dropped by
FIRToMemRef. Since memref.reinterpret_cast cannot change element type,
layout must come from the projected box descriptor via
fir.box_dims/fir.box_elesize rather than the triplets. Only
complex-array projections are handled here —
sizeof(complex<T>)/sizeof(T) = 2 is always exact for divsi. Derived-type
component projections bail out to downstream FIR-to-LLVM lowering where
strides can be non-integer.
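
The stride arithmetic for a complex-array projection can be sketched as follows. This only models the layout on a flat Python list; the real lowering reads the layout from the box descriptor, and `flatten` and `project` are hypothetical names, not part of FIRToMemRef.

```python
COMPONENTS_PER_COMPLEX = 2  # sizeof(complex<T>) / sizeof(T), always exact


def flatten(zs):
    """Lay out complex values as interleaved [re0, im0, re1, im1, ...]."""
    flat = []
    for z in zs:
        flat.extend((z.real, z.imag))
    return flat


def project(flat, part, array_stride=1):
    """View z%re or z%im as a strided slice measured in units of T.

    `array_stride` is the stride of the original array in complex elements.
    """
    offset = 0 if part == "re" else 1
    stride = array_stride * COMPONENTS_PER_COMPLEX
    return flat[offset::stride]


flat = flatten([1 + 2j, 3 + 4j, 5 + 6j])
print(project(flat, "re"), project(flat, "im"))
```

Because the ratio is exactly 2, the projected stride is always a whole number of `T` elements, which is not guaranteed for arbitrary derived-type components.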
…#190965)

We were bailing out of checking call expressions in a dependent
context, but if the expression itself was not dependent, it was never
checked again.

Fixes #135694
…ws (#176276)

Windows Control Flow Guard (CFG) has two different "mechanisms" or
"patterns":
* Dispatch: the caller calls into the CFG function, which both checks
the target callee and then calls it.
* Check: the caller calls the CFG function which only checks the target
callee and then must separately call the callee.

LLVM has followed MSVC's pattern for selecting the mechanism based on
the target architecture. These defaults in MSVC are based on tests for
performance: Dispatch produces a smaller code size, whereas Check is
more friendly to branch predictors.

It is possible, however, that for a given workload, call pattern, or
target CPU, someone may want to select a different mechanism for
their code.

This change adds a new Clang and CC1 flag to force a CFG mechanism:
`-fwin-cfg-mechanism`. This can be set to `automatic` (lets LLVM choose
a mechanism), `force-dispatch` or `force-check`.

Also adds the support for the equivalent MSVC flag
`/d2guardcfgdispatch`.

NOTE: Arm64EC only supports the check mechanism. It should be noted that
MSVC emits the "dispatch" name for the call checker (for legacy reasons)
but uses the check mechanism.
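
As a purely conceptual model of the two mechanisms (not how the Windows runtime or LLVM implements them), the difference is in who performs the final call. All names below are hypothetical.

```python
VALID_TARGETS = set()  # stand-in for the table of valid call targets


class GuardViolation(Exception):
    """Raised when an indirect call target is not a registered function."""


def register(fn):
    VALID_TARGETS.add(fn)
    return fn


def guard_check(target):
    """Check mechanism: validate only; the caller makes the call itself."""
    if target not in VALID_TARGETS:
        raise GuardViolation(repr(target))


def guard_dispatch(target, *args):
    """Dispatch mechanism: validate the target and then call it."""
    guard_check(target)
    return target(*args)


def call_with_check(target, *args):
    guard_check(target)      # first call: the guard function
    return target(*args)     # second, separate call: the callee


def call_with_dispatch(target, *args):
    return guard_dispatch(target, *args)  # a single call site


@register
def add_one(x):
    return x + 1


print(call_with_check(add_one, 1), call_with_dispatch(add_one, 1))
```

The check form has two calls at every site (larger code, but the callee call is visible to the branch predictor as its own site), while the dispatch form funnels everything through one shared routine.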
This patch enables NEON to generate more efficient `cttz` intrinsics by
utilising `rbit` and `ctlz` instructions when they are legal.

# Alive Proof
https://alive2.llvm.org/ce/z/qgrT_7
```
define <8 x i8> @src_v8i8(<8 x i8> %a) {
#0:
  %r = cttz <8 x i8> %a, 1
  ret <8 x i8> %r
}
=>
define <8 x i8> @tgt_v8i8(<8 x i8> %a) {
#0:
  %rbit = bitreverse <8 x i8> %a
  %clz = ctlz <8 x i8> %rbit, 0
  ret <8 x i8> %clz
}
Transformation seems to be correct!


----------------------------------------
define <16 x i8> @src_v16i8(<16 x i8> %a) {
#0:
  %r = cttz <16 x i8> %a, 1
  ret <16 x i8> %r
}
=>
define <16 x i8> @tgt_v16i8(<16 x i8> %a) {
#0:
  %rbit = bitreverse <16 x i8> %a
  %clz = ctlz <16 x i8> %rbit, 0
  ret <16 x i8> %clz
}
Transformation seems to be correct!
```
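
The same identity can be checked exhaustively for 8-bit lanes outside of Alive2. This is a small sketch with hand-written helpers (`bitreverse`, `ctlz`, `cttz` are defined here, not library calls); it treats a zero input as returning the lane width.

```python
WIDTH = 8  # lane width in bits, matching the <N x i8> proofs above


def bitreverse(x, width=WIDTH):
    """Reverse the order of the low `width` bits of x."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def ctlz(x, width=WIDTH):
    """Count leading zeros; returns `width` for x == 0."""
    return width - x.bit_length()


def cttz(x, width=WIDTH):
    """Count trailing zeros; returns `width` for x == 0."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


# Collect every 8-bit value where the rewrite would disagree with cttz.
mismatches = [x for x in range(1 << WIDTH) if cttz(x) != ctlz(bitreverse(x))]
print(mismatches)  # → []
```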
…able callees (#189244)

The `MLProgramPipelineGlobals` pass crashed with a null pointer dereference
when a `CallOpInterface` operation referred to a callee symbol that could not
be resolved in the IR (e.g. an external function defined outside the module).

Instead, conservatively bail out when a callee symbol cannot be resolved,
preserving all loads/stores. This is consistent with
how Value-based callees are handled.

Fixes #109649

Assisted-by: Claude Code
This header assumed these had been imported
Avoid the confusing `Runtime unrolling with count: 0` `LLVM_DEBUG`
statement.
Add an InstCombine fold for masked overwrite patterns where the add
constant matches the cleared bits in the mask:

  (X + C) + (Y & ~C) -> X + (Y | C)

Since `Y & ~C` clears all bits set in C, adding C cannot generate carry
through those bits and is equivalent to setting them with `or`.

Proof: https://alive2.llvm.org/ce/z/277UFK
Fixed: llvm/llvm-project#191171
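
The fold can also be brute-forced over a small bit width as a sanity check. The sketch below uses 6-bit wrapping arithmetic to keep the search space small; the width is an arbitrary choice for illustration.

```python
WIDTH = 6
MASK = (1 << WIDTH) - 1


def before_fold(x, y, c):
    """(X + C) + (Y & ~C), with wraparound at WIDTH bits."""
    return ((x + c) + (y & ~c)) & MASK


def after_fold(x, y, c):
    """X + (Y | C), with wraparound at WIDTH bits."""
    return (x + (y | c)) & MASK


# Y & ~C has no bits in common with C, so adding C to it never carries
# and is the same as OR-ing C in.
fold_holds = all(
    before_fold(x, y, c) == after_fold(x, y, c)
    for x in range(MASK + 1)
    for y in range(MASK + 1)
    for c in range(MASK + 1)
)
print(fold_holds)  # → True
```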
  CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
When -fclangir is passed and the input is LLVM IR (e.g. during the
backend phase of OpenMP offloading), the CIR frontend pipeline is not
applicable.


Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Adds support for the $arch-unknown-serenity target to the Clang front
end. This makes the compiler look for libraries and headers in the right
places, and enables some security mitigations like stack-smashing
protection and position-independent code by default.

----

A first attempt at upstreaming this patch was made
[here](https://reviews.llvm.org/D154396). I hope I fixed everything
mentioned there.

I intentionally kept `/usr/local/` in the default lookup path. I
consider it the more practical option, and I’d prefer to have the patch
merged as is and revisit the FIXME later. If this is absolutely
unacceptable to the maintainers, I will happily drop it and keep it as a
local patch until we address the underlying issue.

@MaskRay, @brad0 as you already reviewed the original patch.

---------

Co-authored-by: Daniel Bertalan <dani@danielbertalan.dev>
Co-authored-by: kleines Filmröllchen <filmroellchen@serenityos.org>
Co-authored-by: Andrew Kaster <akaster@serenityos.org>
Co-authored-by: Dan Klishch <danilklishch@gmail.com>
Included __llvm-libc-common.h in __futex_word.h to fix a build failure
with GCC.

GCC in C++ mode does not recognize _Alignas without the mapping to
alignas provided in __llvm-libc-common.h.

The failure was introduced in commit 91c0fdf.
…on (#182592)

Preparation change before implementing stride-multiversioning as a
VPlan-based transformation. Might help
llvm/llvm-project#147297 as well.
Without the fix, bytecode serialization roundtrip breaks for types that
don't have custom bytecode serializers and contain quant types, since
the fallback mechanism prints the type and the quant printer coerces
signed to signless types. E.g. `!custom<!quant.uniform<ui8:f32, 0.1>>`
will print as `u8` when serializing and later be created as a signless
`i8` when deserializing.
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGLoopInfo.cpp
…090)

Fixes buildbot report
(https://lab.llvm.org/buildbot/#/builders/66/builds/29379):


/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_bitvector_test.cpp:64:29:
error: format specifies type 'unsigned long' but the argument has type
'uptr' (aka 'unsigned int') [-Werror,-Wformat]
   64 |     fprintf(stderr, "%lu ", idx);
      |                      ~~~    ^~~
      |                      %u
RecordType::getTypeSizeInBits for unions was calling
dataLayout.getTypeSize (which returns bytes) instead of
dataLayout.getTypeSizeInBits.  This returned a value 8x too
small.  Also handle the empty-union case where
getLargestMember returns nullptr.
The memchr LLVM declaration created by MemChrOp lowering had no
arg_attrs, so the lowered IR was missing `noundef` on all three
parameters.  OGCG emits `noundef` on them.

Adds `noundef` to both the `@memchr` declaration and each
`call @memchr` instruction.

Made with [Cursor](https://cursor.com)
…ro (#192038)

This mostly replaces `"0x%" PRIx64` with `"{:x}"`, but also replaces
'%d' (used for register / scheme numbers and CFA offsets) and '%s' with
simple `{}`, removing the now redundant casts and calls to
`GetCString()` / `AsCString()`.

`UnwindLogMsg()` is no longer used and has been removed.
The e32alt and e64alt encodings for vtype are reserved.

Non-fp instructions ignore altfmt and we want to use that to avoid
vtype toggle when using load, store, slide, gather, etc. to manipulate
bf16 vectors. This is why we have a Demanded bit for AltFmt.

We need to make sure we don't keep the AltFmt set when we're changing
SEW to 32 or 64.

A new isValidVType function has been added to help catch illegal
vtype earlier.
jsji and others added 11 commits April 30, 2026 05:28
CONFLICT (content): Merge conflict in libclc/CMakeLists.txt
CONFLICT (content): Merge conflict in libclc/cmake/modules/AddLibclc.cmake
CONFLICT (content): Merge conflict in libclc/test/CMakeLists.txt
)

Some spirv-val invocations are failing with newer SPIRV-Tools
(`e4bceacf`) for PtrCastToGeneric OpSpecConstantOps with the error:

    Expected input and Result Type to point to the same type

Disable the spirv-val steps temporarily.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@7de094bda017e2a
Update for llvm-project commit llvm/llvm-project@193d7a6ace9f
("[MC,CodeGen] Update .prefalign for symbol-based preferred alignment
(#184032)", 2026-04-11).

Temporarily match both the old and new patterns, to ensure the test
also passes with outdated LLVM apt binaries on our CI.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@f417c2db8409dfb
…puts (#3675)

Packed Int4/FP4 conversion builtins only worked correctly when the
packed container was `i8` or `i32`.
Using `i16` or `i64` caused an `"Invalid floating point encoding"`
assertion during forward translation.
Using a vector of integers (e.g. `<2 x i8>`) as the packed container
produced a size-mismatched bitcast that crashed the round-trip
translation.

Extend support to all integer widths (8, 16, 32, 64 bits) and
vector-of-integer packed containers for both Int4 and FP4 conversions in
both directions.

Extended `SPV_INTEL_float4/conversions_packed.ll` and
`SPV_INTEL_int4/conversions_packed.ll` with `i16`, `i64`, and `<2 x i8>`
packed container cases in both directions.
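
To illustrate what "packed container" means here, the sketch below packs and unpacks signed 4-bit values in integer containers of each supported width. The low-nibble-first ordering and the helper names are assumptions for illustration only; they are not taken from the extension specs or the translator.

```python
SUPPORTED_WIDTHS = (8, 16, 32, 64)


def unpack_int4(container, width):
    """Split a `width`-bit integer into width/4 signed 4-bit values."""
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported container width: {width}")
    values = []
    for i in range(width // 4):
        nibble = (container >> (4 * i)) & 0xF
        # Sign-extend: nibbles with the top bit set are negative.
        values.append(nibble - 16 if nibble & 0x8 else nibble)
    return values


def pack_int4(values):
    """Pack signed 4-bit values into one integer, low nibble first."""
    container = 0
    for i, v in enumerate(values):
        if not -8 <= v <= 7:
            raise ValueError(f"value out of int4 range: {v}")
        container |= (v & 0xF) << (4 * i)
    return container


print(unpack_int4(0xF7, 8))  # → [7, -1]
```

An `i16` container carries four such values and an `i64` carries sixteen, which is why the element count of the converted vector has to scale with the container width.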

AI-assisted: Claude Sonnet 4.6 (commercial SaaS)

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@31ba7d132319efc
This patch should be reverted in the future as it supports translation
of invalid SPIR-V modules.

This is forward-porting from:
KhronosGroup/SPIRV-LLVM-Translator#3476

This is a workaround related to:
f34e5458aa63

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@daa7d18d7e42615
`OpSubgroupBlockPrefetchINTEL` instruction prefetches a contiguous byte
block from CrossWorkgroup memory per subgroup invocation, with an
optional Memory Operands bitmask.

Spec:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_subgroup_buffer_prefetch.asciidoc

AI-assisted: Claude Sonnet 4.6 (commercial SaaS)

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@c8862dd465aba4a
llvm/llvm-project@d19e954b83 made `-use-constant-fp-for-fixed-length-splat` default to true.
This causes `ConstantFP` to represent fixed-length vector splats natively instead of using `ConstantDataVector`.

The translator only handled scalar `ConstantFP`, so vector-typed `ConstantFP` splats were silently mistranslated.
e.g., this LLVM IR call argument:

  `call <8 x i8> @convert(<8 x half> <half 0xH3C00, half 0xH3C00, ...>)`

produced a scalar `OpConstant` with a vector type:

  `%53 = OpConstant %v8half 15360    ; invalid: scalar op, vector type`

instead of the correct `OpConstantComposite`:
```
  %53 = OpConstant %half 15360
  %60 = OpConstantComposite %v8half %53 %53 %53 %53 %53 %53 %53 %53
```

Build an `OpConstantComposite` from repeated scalar references instead.
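
A rough sketch of the intended output shape, written as a text emitter rather than against the translator's real classes. `half_bits` and `emit_splat` are hypothetical helpers; the IDs and type names are copied from the example above.

```python
import struct


def half_bits(value):
    """Return the IEEE 754 binary16 bit pattern of `value` as an integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def emit_splat(result_id, scalar_id, vec_type, scalar_type, value, lanes):
    """Emit one scalar OpConstant plus a composite that repeats it."""
    scalar = f"%{scalar_id} = OpConstant %{scalar_type} {half_bits(value)}"
    operands = " ".join([f"%{scalar_id}"] * lanes)
    composite = f"%{result_id} = OpConstantComposite %{vec_type} {operands}"
    return [scalar, composite]


# half 0xH3C00 is 1.0, whose bit pattern is 15360 in decimal.
for line in emit_splat(60, 53, "v8half", "half", 1.0, 8):
    print(line)
```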

Existing tests that cover this:
- extensions/EXT/SPV_EXT_float8/conversions_scalar_vector.ll
- extensions/INTEL/SPV_INTEL_float4/conversions_packed.ll
- extensions/INTEL/SPV_INTEL_float4/conversions_scalar_vector.ll
- extensions/INTEL/SPV_INTEL_int4/conversions_packed.ll

AI-assisted: Claude Sonnet 4.6 (commercial SaaS)

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@d4d65d5ea9449ae
SPIR-V spec section 2.16.1 requires OpVariable instructions to be first
in the entry block of a function.
The translator was inserting `DebugFunctionDefinition` before the fix.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@39e94f8985301fb
Bare-string substitutions match as substrings, and the replacement path
contains the tool name, causing corrupted RUN lines.

Port of the original patch in LLVM SPIR-V backend:
llvm/llvm-project#192462

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@bd774ef4c90b7ad
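
The failure mode can be reproduced with plain regular expressions. This is a sketch, not lit's implementation: the build path, the tool list, and the exact guard pattern are made up for illustration.

```python
import re

# Hypothetical layout: the build directory name contains a tool name.
BUILD_BIN = "/build/llvm-spirv/bin"
TOOLS = ["llvm-as", "llvm-spirv"]
RUN_LINE = "llvm-as in.ll -o in.bc && llvm-spirv in.bc"


def apply_substitutions(line, make_pattern):
    """Apply one substitution per tool, in order, like a test runner would."""
    for tool in TOOLS:
        replacement = f"{BUILD_BIN}/{tool}"
        line = re.sub(make_pattern(tool), lambda m: replacement, line)
    return line


def bare(tool):
    """Matches the tool name anywhere, including inside expanded paths."""
    return re.escape(tool)


def guarded(tool):
    """Matches only when not adjacent to word, '.', '/' or '-' characters."""
    return r"(?<![\w./-])" + re.escape(tool) + r"(?![\w./-])"


corrupted = apply_substitutions(RUN_LINE, bare)
clean = apply_substitutions(RUN_LINE, guarded)
print(corrupted)
print(clean)
```

With the bare pattern, the second substitution rewrites the `llvm-spirv` directory component inside the already expanded `llvm-as` path; the guarded pattern leaves it alone.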
@iclsrc iclsrc added the disable-lint Skip linter check step and proceed with build jobs label Apr 30, 2026
jsji and others added 7 commits April 30, 2026 18:22
Changed nullptr to false for the TrackInlineHistory bool parameter in
InlineFunction call. The previous code incorrectly passed nullptr to a
bool parameter, which requires direct-initialization and caused a
compilation error with -fpermissive.

Also corrected the parameter comment from /*ForwardVarArgsTo*/ to
/*TrackInlineHistory*/ to match the actual parameter name.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…36056)

Update addSimpleArrayInit to use InitializeMemberImplicit() instead of
InitializeMember() to properly indicate implicit member initialization.
This change is required after upstream commit 45ac2db refactored
InitializedEntity booleans into enums and split InitializeMember() into
separate functions for normal vs implicit initialization.

Without this fix, array kernel parameters fail with "array initializer
must be an initializer list" error because the initialization system no
longer recognizes this as an implicit initialization that allows array
copy operations.

Fixes: Clang::CodeGenSYCL/array-kernel-param.cpp test failure

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix sycl-cconv.cpp test that broke after commit 034d4dc
which changed SemaChecking to diagnose invalid non-dependent
calls in dependent contexts.

The commit now checks variadic function calls even in template
definitions, not just during instantiation. This means the printf
call on line 18 now emits an error both during template definition
and during instantiation, requiring 2 expected-error directives.

Fixes: CMPLRLLVM-74970

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix Clang::OpenMP/spirv_variant_match.cpp test that broke after commit
7d0bf88 which changed OpenMP variant arch matching to use
Triple::parseArch instead of getArchTypeForLLVMName.

Sync the tests to upstream version to fix the failures.

Fixes: CMPLRLLVM-74596

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This is a reland of 8c2b0d4 tracked in
#12455

SourceLocExpr that may produce a function name are marked dependent so
that the non-instantiated
name of a function does not get evaluated.

In GH78128, the name('s size) is used as
template argument to a `DeclRef` that is not otherwise dependent, and
therefore cached and not transformed when the function is
instantiated, leading to 2 different values existing at the same time
for the same function.

Fixes #78128

Fixes: https://jira.devtools.intel.com/browse/CMPLRLLVM-74680

Co-authored-by: cor3ntin <corentinjabot@gmail.com>
@jsji
Contributor

jsji commented Apr 30, 2026

@wenju-he Can you please help to restore the native_cpu support for libclc. Thanks!

…_native

The wrong native_cpu clang_triple came from a bad merge conflict
resolution in 80e398e.
@wenju-he
Contributor

wenju-he commented May 1, 2026

> @wenju-he Can you please help to restore the native_cpu support for libclc. Thanks!

@jsji fixed in 21a7e47

@jsji
Contributor

jsji commented May 1, 2026

> > @wenju-he Can you please help to restore the native_cpu support for libclc. Thanks!
>
> @jsji fixed in 21a7e47

Thanks!

Following 121f5a9, this PR removes the deprecated
`LLVM_ENABLE_RUNTIMES=libclc` build approach from the SYCL toolchain.

For an nvptx64-nvidia-cuda build, pass the following options to the cmake configure step:
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libclc
    -DLLVM_RUNTIME_TARGETS="nvptx64-nvidia-cuda"
@wenju-he
Contributor

wenju-he commented May 1, 2026

6adc223 (cherry-picked from #21843) fixes the clang-cl.exe used in the Windows libclc build (due to 12f636d).

The following test is failing with the oneAPI build compiler:

Testing:  0.. 10.. 20.. 30..  40.. 50..
FAIL: Clang :: Driver/dxc_spirv-val_missing.hlsl (15082 of 25696)
******************** TEST 'Clang :: Driver/dxc_spirv-val_missing.hlsl' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
env PATH="" d:\github\_work\llvm\llvm\build\bin\clang.exe --driver-mode=dxc -spirv -I test -Tlib_6_3 -Fo D:\github\_work\llvm\llvm\build\tools\clang\test\Driver\Output\dxc_spirv-val_missing.hlsl.tmp.spv -### D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl 2>&1 | d:\github\_work\llvm\llvm\build\bin\filecheck.exe D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl
 executed command: env PATH= 'd:\github\_work\llvm\llvm\build\bin\clang.exe' --driver-mode=dxc -spirv -I test -Tlib_6_3 -Fo 'D:\github\_work\llvm\llvm\build\tools\clang\test\Driver\Output\dxc_spirv-val_missing.hlsl.tmp.spv' '-###' 'D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl'
 note: command had no output on stdout or stderr
 error: command failed with exit status: 0xc0000135
@jsji
Contributor

jsji commented May 1, 2026

This is ready for review:

@intel/dpcpp-nativecpu-reviewers

@intel/dpcpp-clang-driver-reviewers

@jsji jsji closed this May 1, 2026
@jsji jsji reopened this May 1, 2026

Labels

disable-lint Skip linter check step and proceed with build jobs
