LLVM and SPIRV-LLVM-Translator pulldown (WW18 2026)#21908
Draft
…plitting (#191417) If we need to split the memory operation, we'll also need to split the mask. This has a performance benefit in some cases when the loop vectorizer is asked to maximize bandwidth and ends up choosing a VF that's too high when tail folding. The costs of splitting the masks are not accounted for in the current model, so this is something of a brute-force approach to avoiding the wider VFs.
…ductions, NFC Reviewers: Pull Request: llvm/llvm-project#192072
This patch adds the C1-Ultra scheduling model. This model is largely based on the Neoverse V3 scheduling model with appropriate changes made based on information available in the software optimization guide for this core. https://developer.arm.com/documentation/111079/3-0
Sort the std::set ProcItinList by Record name, not the pointer address. --------- Co-authored-by: Bao, Qiaojin (Fred) <Qiaojin.Bao@amd.com>
On s390x, the changes to `control_tool.c` cause a different return address to be returned from the call to `print_current_address(0)`. Because the test checks the address returned by this call strictly, this led to a test failure. Since the return values of `omp_control_tool` are checked in separate tests already, revert the changes to ensure that the return address stays at the expected value. Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
…egal vector types during vector op legalization. (#190914) This code needs to create a step vector but we only have a mask vector type. If the step vector is too large it might not be an MVT. This causes the getSimpleVT() call for getTypeAction to fail. We can replace that with the EVT version of getTypeAction, but we'll still fail trying to legalize the vselect. The getOperationAction query will return Expand for non-simple VTs. ExpandVSELECT will try to unroll the vselect which will fail for scalable vectors. We could hack that to not unroll scalable vectors, but that would be a hack. To fix this, split the FIND_LAST_ACTIVE into two if the step vector needs to be split. Those will recursively legalize and eventually arrive at a size we can create a valid step vector for. One existing test changes because it created an illegal type which happened to still be an MVT. This allowed getOperationAction to return Legal, even though the type isn't legal. Fixes the assertion mentioned in #187458. Assisted-by: Claude Sonnet 4.5
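The splitting strategy described above can be sketched outside of SelectionDAG. This is a minimal Python model, not the LLVM implementation: `max_lanes` is a hypothetical stand-in for "the largest size we can create a valid step vector for", and returning `None` when no lane is active is a choice made for this sketch only.

```python
def find_last_active(mask):
    """Index of the last True lane, or None if no lane is active."""
    for i in range(len(mask) - 1, -1, -1):
        if mask[i]:
            return i
    return None


def find_last_active_split(mask, max_lanes):
    """Same result as find_last_active, but splits the mask in two until
    each piece has at most max_lanes lanes (hypothetical legality limit)."""
    if max_lanes < 1:
        raise ValueError("max_lanes must be at least 1")
    if len(mask) <= max_lanes:
        return find_last_active(mask)
    half = len(mask) // 2
    lo, hi = mask[:half], mask[half:]
    # The high half wins if it has any active lane; offset its index.
    hi_idx = find_last_active_split(hi, max_lanes)
    if hi_idx is not None:
        return half + hi_idx
    return find_last_active_split(lo, max_lanes)
```

Each recursive call works on a strictly smaller mask, which mirrors the "recursively legalize and eventually arrive at a size" argument in the commit message.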
It already got inconsistent because new changes require complying with clang-format on CI, while everything old is not complying with it.
…part 44) (#191926) Tests converted from test/Lower/Intrinsics: verify.f90 Tests converted from test/Lower: io-char-array.f90, io-implied-do-fixes.f90, io-item-list.f90, io-statement-1.f90
…z%re, z%im) (#191846) fir.slice with a path component (z%re, z%im) was silently dropped by FIRToMemRef. Since memref.reinterpret_cast cannot change element type, layout must come from the projected box descriptor via fir.box_dims/fir.box_elesize rather than the triplets. Only complex-array projections are handled here — sizeof(complex<T>)/sizeof(T) = 2 is always exact for divsi. Derived-type component projections bail out to downstream FIR-to-LLVM lowering where strides can be non-integer.
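The divisibility argument can be illustrated with plain integer arithmetic. This is a sketch only: the function name and the byte sizes below are assumptions for illustration, not values read from a real box descriptor.

```python
def element_stride(byte_stride, elem_size):
    """Convert a byte stride into a stride counted in elements.

    Returns None when the division is not exact, which models the case
    where the projection has to be left to later lowering.
    """
    quotient, remainder = divmod(byte_stride, elem_size)
    if remainder != 0:
        return None
    return quotient


# complex<f32>: 8 bytes per complex value, 4 bytes per real component,
# so walking z%re over a contiguous array is an exact stride of 2.
complex_f32_stride = element_stride(8, 4)

# Hypothetical derived type of 6 bytes with a 4-byte component:
# the stride is 1.5 elements, which an integer division cannot express.
derived_stride = element_stride(6, 4)
```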
…#190965) We were bailing out from checking call expressions in a dependent context, but if the expression itself was not dependent, it was never checked again. Fixes #135694
…ws (#176276)

Windows Control Flow Guard (CFG) has two different "mechanisms" or "patterns":
* Dispatch: the caller calls into the CFG function, which both checks the target callee and then calls it.
* Check: the caller calls the CFG function, which only checks the target callee; the caller must then call the callee separately.

LLVM has followed MSVC's pattern for selecting the mechanism based on the target architecture. These defaults in MSVC are based on performance tests: Dispatch produces smaller code, whereas Check is friendlier to branch predictors. For a given workload, call pattern, or target CPU, however, someone may want to select a different mechanism for their code.

This change adds a new Clang and CC1 flag to force a CFG mechanism: `-fwin-cfg-mechanism`. This can be set to `automatic` (lets LLVM choose a mechanism), `force-dispatch` or `force-check`. It also adds support for the equivalent MSVC flag `/d2guardcfgdispatch`.

NOTE: Arm64EC only supports the check mechanism. MSVC emits the "dispatch" name for the call checker (for legacy reasons) but uses the check mechanism.
This patch enables NEON to generate more efficient `cttz` intrinsics by utilising `rbit` and `ctlz` instructions when they are legal.

# Alive Proof

https://alive2.llvm.org/ce/z/qgrT_7

```
define <8 x i8> @src_v8i8(<8 x i8> %a) {
#0:
  %r = cttz <8 x i8> %a, 1
  ret <8 x i8> %r
}
=>
define <8 x i8> @tgt_v8i8(<8 x i8> %a) {
#0:
  %rbit = bitreverse <8 x i8> %a
  %clz = ctlz <8 x i8> %rbit, 0
  ret <8 x i8> %clz
}
Transformation seems to be correct!

----------------------------------------
define <16 x i8> @src_v16i8(<16 x i8> %a) {
#0:
  %r = cttz <16 x i8> %a, 1
  ret <16 x i8> %r
}
=>
define <16 x i8> @tgt_v16i8(<16 x i8> %a) {
#0:
  %rbit = bitreverse <16 x i8> %a
  %clz = ctlz <16 x i8> %rbit, 0
  ret <16 x i8> %clz
}
Transformation seems to be correct!
```
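The identity behind the proof, that counting trailing zeros equals counting leading zeros of the bit-reversed value, can also be checked exhaustively for a single 8-bit lane. The helper names below are made up for this sketch and model scalar `i8` semantics only. For a zero input the helpers return 8, which is the defined result on the target side; the source `cttz` treats zero as poison, so any result is acceptable there.

```python
def bitreverse8(x):
    """Reverse the bit order of an 8-bit value."""
    return int(f"{x:08b}"[::-1], 2)


def ctlz8(x):
    """Count leading zeros of an 8-bit value; 8 for zero."""
    return 8 - x.bit_length()


def cttz8(x):
    """Count trailing zeros of an 8-bit value; 8 for zero."""
    if x == 0:
        return 8
    return (x & -x).bit_length() - 1


# Exhaustive check over every i8 bit pattern.
all_match = all(cttz8(x) == ctlz8(bitreverse8(x)) for x in range(256))
```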
…able callees (#189244) The `MLProgramPipelineGlobals` pass crashed with a null pointer dereference when a `CallOpInterface` operation referred to a callee symbol that could not be resolved in the IR (e.g. an external function defined outside the module). Instead, conservatively bail out when a callee symbol cannot be resolved, preserving all loads/stores. This is consistent with how Value-based callees are handled. Fixes #109649 Assisted-by: Claude Code
This header assumed these had been imported
Avoid the confusing `Runtime unrolling with count: 0` `LLVM_DEBUG` statement.
…… (#191863) … value
Add an InstCombine fold for masked overwrite patterns where the add constant matches the cleared bits in the mask: (X + C) + (Y & ~C) -> X + (Y | C) Since `Y & ~C` clears all bits set in C, adding C cannot generate carry through those bits and is equivalent to setting them with `or`. Proof: https://alive2.llvm.org/ce/z/277UFK Fixed: llvm/llvm-project#191171
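As a complement to the Alive2 proof, the fold can be brute-forced at a small bit width. The sketch below assumes a 6-bit wrapping integer type purely to keep the search space small; the width and the helper names are not from the patch.

```python
WIDTH = 6                 # assumed small width so the check stays fast
MASK = (1 << WIDTH) - 1


def before_fold(x, y, c):
    """(X + C) + (Y & ~C), with wrapping arithmetic."""
    return ((x + c) + (y & ~c & MASK)) & MASK


def after_fold(x, y, c):
    """X + (Y | C), with wrapping arithmetic."""
    return (x + (y | c)) & MASK


# Y & ~C has no bit in common with C, so adding C to it never carries
# and is the same as OR-ing C in. Check every X, Y, C at this width.
fold_holds = all(
    before_fold(x, y, c) == after_fold(x, y, c)
    for x in range(1 << WIDTH)
    for y in range(1 << WIDTH)
    for c in range(1 << WIDTH)
)
```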
CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
When -fclangir is passed and the input is LLVM IR (e.g. during the backend phase of OpenMP offloading), the CIR frontend pipeline is not applicable. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Adds support for the $arch-unknown-serenity target to the Clang front end. This makes the compiler look for libraries and headers in the right places, and enables some security mitigations like stack-smashing protection and position-independent code by default. ---- A first attempt at upstreaming this patch was made [here](https://reviews.llvm.org/D154396). I hope I fixed everything mentioned there. I intentionally kept `/usr/local/` in the default lookup path. I consider it the more practical option, and I’d prefer to have the patch merged as is and revisit the FIXME later. If this is absolutely unacceptable to the maintainers, I will happily drop it and keep it as a local patch until we address the underlying issue. @MaskRay, @brad0 as you already reviewed the original patch. --------- Co-authored-by: Daniel Bertalan <dani@danielbertalan.dev> Co-authored-by: kleines Filmröllchen <filmroellchen@serenityos.org> Co-authored-by: Andrew Kaster <akaster@serenityos.org> Co-authored-by: Dan Klishch <danilklishch@gmail.com>
Included __llvm-libc-common.h in __futex_word.h to fix a build failure with GCC. GCC in C++ mode does not recognize _Alignas without the mapping to alignas provided in __llvm-libc-common.h. The failure was introduced in commit 91c0fdf.
…on (#182592) Preparation change before implementing stride-multiversioning as a VPlan-based transformation. Might help llvm/llvm-project#147297 as well.
Without the fix, bytecode serialization roundtrip breaks for types that don't have custom bytecode serializers and contain quant types, since the fallback mechanism prints the type and the quant printer coerces signed to signless types. E.g. `!custom<!quant.uniform<ui8:f32, 0.1>>` will print as `u8` when serializing and later be created as a signless `i8` when deserializing.
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGLoopInfo.cpp
…090) Fixes buildbot report (https://lab.llvm.org/buildbot/#/builders/66/builds/29379): /home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_bitvector_test.cpp:64:29: error: format specifies type 'unsigned long' but the argument has type 'uptr' (aka 'unsigned int') [-Werror,-Wformat] 64 | fprintf(stderr, "%lu ", idx); | ~~~ ^~~ | %u
RecordType::getTypeSizeInBits for unions was calling dataLayout.getTypeSize (which returns bytes) instead of dataLayout.getTypeSizeInBits. This returned a value 8x too small. Also handle the empty-union case where getLargestMember returns nullptr.
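The factor of 8 is easy to see in a toy model. This sketch is not the CIR code: it takes member sizes in bytes as a plain list, ignores alignment and padding, and returning 0 for an empty union is an assumption made here.

```python
def union_size_in_bits(member_sizes_in_bytes):
    """Size of a union in bits: the largest member, converted to bits."""
    if not member_sizes_in_bytes:
        # Empty union: there is no largest member to query.
        return 0
    return 8 * max(member_sizes_in_bytes)


def union_size_in_bits_buggy(member_sizes_in_bytes):
    """Models the bug: a byte count returned where bits were expected."""
    return max(member_sizes_in_bytes)
```

For a union whose members are 4, 8 and 1 bytes wide, the first function gives 64 while the buggy one gives 8.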
The memchr LLVM declaration created by MemChrOp lowering had no arg_attrs, so the lowered IR was missing `noundef` on all three parameters. OGCG emits `noundef` on them. Adds `noundef` to both the `@memchr` declaration and each `call @memchr` instruction. Made with [Cursor](https://cursor.com)
…ro (#192038)
This mostly replaces `"0x%" PRIx64` with `"{:x}"`, but also replaces
'%d' (used for register / scheme numbers and CFA offsets) and '%s' with
simple `{}`, removing the now redundant casts and calls to
`GetCString()` / `AsCString()`.
`UnwindLogMsg()` is no longer used and has been removed.
The e32alt and e64alt encodings for vtype are reserved. Non-fp instructions ignore altfmt and we want to use that to avoid vtype toggle when using load, store, slide, gather, etc. to manipulate bf16 vectors. This is why we have a Demanded bit for AltFmt. We need to make sure we don't keep the AltFmt set when we're changing SEW to 32 or 64. A new isValidVType function has been added to help catch illegal vtype earlier.
CONFLICT (content): Merge conflict in libclc/CMakeLists.txt CONFLICT (content): Merge conflict in libclc/cmake/modules/AddLibclc.cmake CONFLICT (content): Merge conflict in libclc/test/CMakeLists.txt
) Some spirv-val invocations are failing with newer SPIRV-Tools (`e4bceacf`) for PtrCastToGeneric OpSpecConstantOps with the error: Expected input and Result Type to point to the same type Disable the spirv-val steps temporarily. Original commit: KhronosGroup/SPIRV-LLVM-Translator@7de094bda017e2a
Update for llvm-project commit llvm/llvm-project@193d7a6ace9f ("[MC,CodeGen] Update .prefalign for symbol-based preferred alignment (#184032)", 2026-04-11). Temporarily match both the old and new patterns, to ensure the test also passes with outdated LLVM apt binaries on our CI. Original commit: KhronosGroup/SPIRV-LLVM-Translator@f417c2db8409dfb
…sts (#3669) Enable tests that pass on the latest LLVM revision Original commit: KhronosGroup/SPIRV-LLVM-Translator@508c2aeb96aab8f
…puts (#3675) Packed Int4/FP4 conversion builtins only worked correctly when the packed container was `i8` or `i32`. Using `i16` or `i64` caused an `"Invalid floating point encoding"` assertion during forward translation. Using a vector of integers (e.g. `<2 x i8>`) as the packed container produced a size-mismatched bitcast that crashed the round-trip translation. Extend support to all integer widths (8, 16, 32, 64 bits) and vector-of-integer packed containers for both Int4 and FP4 conversions in both directions. Extended `SPV_INTEL_float4/conversions_packed.ll` and `SPV_INTEL_int4/conversions_packed.ll` with `i16`, `i64`, and `<2 x i8>` packed container cases in both directions. AI-assisted: Claude Sonnet 4.6 (commercial SaaS) Original commit: KhronosGroup/SPIRV-LLVM-Translator@31ba7d132319efc
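To make the "packed container" idea concrete, here is a small sketch of packing signed 4-bit values into integers of the supported widths. The low-nibble-first ordering is an assumption of this example and is not taken from the translator or the extension spec; a `<2 x i8>` container is modelled simply as a 16-bit value.

```python
SUPPORTED_WIDTHS = (8, 16, 32, 64)


def pack_int4(values):
    """Pack signed 4-bit values into one integer, lowest nibble first
    (ordering assumed for this sketch)."""
    container = 0
    for i, value in enumerate(values):
        if not -8 <= value <= 7:
            raise ValueError("value does not fit in a signed 4-bit integer")
        container |= (value & 0xF) << (4 * i)
    return container


def unpack_int4(container, width_bits):
    """Unpack width_bits // 4 signed 4-bit values from container."""
    if width_bits not in SUPPORTED_WIDTHS:
        raise ValueError("unsupported container width")
    values = []
    for i in range(width_bits // 4):
        nibble = (container >> (4 * i)) & 0xF
        # Sign-extend the nibble from 4 bits.
        values.append(nibble - 16 if nibble >= 8 else nibble)
    return values
```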
This patch should be reverted in the future as it supports translation of invalid SPIR-V modules. This is forward-ported from: KhronosGroup/SPIRV-LLVM-Translator#3476 This is a workaround related to: f34e5458aa63 Original commit: KhronosGroup/SPIRV-LLVM-Translator@daa7d18d7e42615
`OpSubgroupBlockPrefetchINTEL` instruction prefetches a contiguous byte block from CrossWorkgroup memory per subgroup invocation, with an optional Memory Operands bitmask. Spec: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_subgroup_buffer_prefetch.asciidoc AI-assisted: Claude Sonnet 4.6 (commercial SaaS) Original commit: KhronosGroup/SPIRV-LLVM-Translator@c8862dd465aba4a
llvm/llvm-project@d19e954b83 made `-use-constant-fp-for-fixed-length-splat` default to true. This causes `ConstantFP` to represent fixed-length vector splats natively instead of using `ConstantDataVector`. The translator only handled scalar `ConstantFP`, so vector-typed `ConstantFP` splats were silently mistranslated. e.g., this LLVM IR call argument: `call <8 x i8> @convert(<8 x half> <half 0xH3C00, half 0xH3C00, ...>)` produced a scalar `OpConstant` with a vector type: `%53 = OpConstant %v8half 15360 ; invalid: scalar op, vector type` instead of the correct `OpConstantComposite`: ``` %53 = OpConstant %half 15360 %60 = OpConstantComposite %v8half %53 %53 %53 %53 %53 %53 %53 %53 ``` Build an `OpConstantComposite` from repeated scalar references instead. Existing tests that cover this: - extensions/EXT/SPV_EXT_float8/conversions_scalar_vector.ll - extensions/INTEL/SPV_INTEL_float4/conversions_packed.ll - extensions/INTEL/SPV_INTEL_float4/conversions_scalar_vector.ll - extensions/INTEL/SPV_INTEL_int4/conversions_packed.ll AI-assisted: Claude Sonnet 4.6 (commercial SaaS) Original commit: KhronosGroup/SPIRV-LLVM-Translator@d4d65d5ea9449ae
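The shape of the corrected output can be reproduced with a few lines of Python. This is only a text-level sketch of the two instructions quoted above, not translator code; the function names are invented, and the result ids 53 and 60 are reused from the example in the commit message.

```python
import struct


def half_bits(value):
    """IEEE 754 binary16 bit pattern of value, as an unsigned integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def emit_splat(scalar_id, composite_id, lanes, value):
    """Emit one scalar OpConstant and an OpConstantComposite that
    references it once per lane."""
    scalar = f"%{scalar_id} = OpConstant %half {half_bits(value)}"
    operands = " ".join(f"%{scalar_id}" for _ in range(lanes))
    composite = (
        f"%{composite_id} = OpConstantComposite %v{lanes}half {operands}"
    )
    return [scalar, composite]


lines = emit_splat(53, 60, 8, 1.0)
```

Here `half_bits(1.0)` is `0x3C00`, which is the `15360` literal seen in the SPIR-V text.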
SPIR-V spec section 2.16.1 requires OpVariable instructions to be first in the entry block of a function. Before this fix, the translator was inserting `DebugFunctionDefinition` ahead of them. Original commit: KhronosGroup/SPIRV-LLVM-Translator@39e94f8985301fb
Bare-string substitutions match as substrings, and the replacement path contains the tool name, causing corrupted RUN lines. Port of the original patch in the LLVM SPIR-V backend: llvm/llvm-project#192462 Original commit: KhronosGroup/SPIRV-LLVM-Translator@bd774ef4c90b7ad
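The failure mode can be reproduced with ordinary string replacement. The sketch below does not model how lit applies substitutions; the tool path `/ci/llvm-spirv/bin/llvm-spirv` and the RUN line are hypothetical, and the boundary pattern is just one possible way to avoid substring matches.

```python
import re

TOOL = "llvm-spirv"
PATH = "/ci/llvm-spirv/bin/llvm-spirv"   # hypothetical install path
RUN_LINE = "llvm-spirv %s -o %t.llvm-spirv.spv"


def naive_substitute(line, name, path):
    """Bare-string substitution: replaces every substring occurrence."""
    return line.replace(name, path)


def bounded_substitute(line, name, path):
    """Only replace the name when it stands alone as a command word,
    not inside a path or a file name."""
    pattern = r"(?<![\w/.-])" + re.escape(name) + r"(?![\w.-])"
    return re.sub(pattern, lambda match: path, line)


naive_once = naive_substitute(RUN_LINE, TOOL, PATH)
naive_twice = naive_substitute(naive_once, TOOL, PATH)
bounded_once = bounded_substitute(RUN_LINE, TOOL, PATH)
bounded_twice = bounded_substitute(bounded_once, TOOL, PATH)
```

With the naive version, the output file name is rewritten too, and a second application rewrites the tool name inside the already-substituted path. The bounded version leaves both alone and is stable under repeated application.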
Changed nullptr to false for the TrackInlineHistory bool parameter in InlineFunction call. The previous code incorrectly passed nullptr to a bool parameter, which requires direct-initialization and caused a compilation error with -fpermissive. Also corrected the parameter comment from /*ForwardVarArgsTo*/ to /*TrackInlineHistory*/ to match the actual parameter name. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…36056) Update addSimpleArrayInit to use InitializeMemberImplicit() instead of InitializeMember() to properly indicate implicit member initialization. This change is required after upstream commit 45ac2db refactored InitializedEntity booleans into enums and split InitializeMember() into separate functions for normal vs implicit initialization. Without this fix, array kernel parameters fail with "array initializer must be an initializer list" error because the initialization system no longer recognizes this as an implicit initialization that allows array copy operations. Fixes: Clang::CodeGenSYCL/array-kernel-param.cpp test failure Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix sycl-cconv.cpp test that broke after commit 034d4dc which changed SemaChecking to diagnose invalid non-dependent calls in dependent contexts. The commit now checks variadic function calls even in template definitions, not just during instantiation. This means the printf call on line 18 now emits an error both during template definition and during instantiation, requiring 2 expected-error directives. Fixes: CMPLRLLVM-74970 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix Clang::OpenMP/spirv_variant_match.cpp test that broke after commit 7d0bf88 which changed OpenMP variant arch matching to use Triple::parseArch instead of getArchTypeForLLVMName. Sync the tests to upstream version to fix the failures. Fixes: CMPLRLLVM-74596 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This is a reland of 8c2b0d4, tracked in #12455. SourceLocExprs that may produce a function name are marked dependent so that the non-instantiated name of a function does not get evaluated. In GH78128, the name('s size) is used as a template argument to a `DeclRef` that is not otherwise dependent, and is therefore cached and not transformed when the function is instantiated, leading to 2 different values existing at the same time for the same function. Fixes #78128 Fixes: https://jira.devtools.intel.com/browse/CMPLRLLVM-74680 Co-authored-by: cor3ntin <corentinjabot@gmail.com>
@wenju-he Can you please help to restore the native_cpu support for libclc? Thanks!
…_native The wrong native_cpu clang_triple came from a bad merge conflict resolution in 80e398e.
Following 121f5a9, this PR removes the deprecated `LLVM_ENABLE_RUNTIMES=libclc` build approach from the SYCL toolchain. For the nvptx64-nvidia-cuda build, pass the following options to the cmake configure step: -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=libclc -DLLVM_RUNTIME_TARGETS="nvptx64-nvidia-cuda"
The test is failing with the oneAPI build compiler:

Testing: 0.. 10.. 20.. 30.. 40.. 50..
FAIL: Clang :: Driver/dxc_spirv-val_missing.hlsl (15082 of 25696)
******************** TEST 'Clang :: Driver/dxc_spirv-val_missing.hlsl' FAILED ********************
Exit Code: 2
Command Output (stdout):
--
env PATH="" d:\github\_work\llvm\llvm\build\bin\clang.exe --driver-mode=dxc -spirv -I test -Tlib_6_3 -Fo D:\github\_work\llvm\llvm\build\tools\clang\test\Driver\Output\dxc_spirv-val_missing.hlsl.tmp.spv -### D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl 2>&1 | d:\github\_work\llvm\llvm\build\bin\filecheck.exe D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl
executed command: env PATH= 'd:\github\_work\llvm\llvm\build\bin\clang.exe' --driver-mode=dxc -spirv -I test -Tlib_6_3 -Fo 'D:\github\_work\llvm\llvm\build\tools\clang\test\Driver\Output\dxc_spirv-val_missing.hlsl.tmp.spv' '-###' 'D:\github\_work\llvm\llvm\src\clang\test\Driver\dxc_spirv-val_missing.hlsl'
note: command had no output on stdout or stderr
error: command failed with exit status: 0xc0000135
This is ready for review:
@intel/dpcpp-nativecpu-reviewers @intel/dpcpp-clang-driver-reviewers
LLVM: llvm/llvm-project@4a24c68
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@bd774ef