How to run Qwen3.5 9B GRPO GSM8K #9556

@whyseu

Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

I want to train Qwen3.5 9B with GRPO on the GSM8K dataset, but I am running into an error.
I am unable to install causal_conv1d.

Here is the output of pip list:

Package Version
absl-py 2.4.0
accelerate 1.14.0
addict 2.4.0
aiofiles 24.1.0
aiohappyeyeballs 2.6.2
aiohttp 3.14.1
aiosignal 1.4.0
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-doc 0.0.4
annotated-types 0.7.0
anthropic 0.109.1
antlr4-python3-runtime 4.13.2
anyio 4.13.0
apache-tvm-ffi 0.1.9
astor 0.8.1
attrdict 2.0.1
attrs 26.1.0
av 17.1.0
binpacking 2.0.1
blake3 1.0.8
brotli 1.2.0
cachetools 7.1.4
cbor2 6.1.2
certifi 2026.5.20
cffi 2.0.0
charset-normalizer 3.4.7
click 8.4.1
cloudpickle 3.1.2
compressed-tensors 0.17.0
contourpy 1.3.3
cpm-kernels 1.0.11
crcmod 1.7
cryptography 49.0.0
cuda-bindings 13.3.1
cuda-core 1.0.1
cuda-pathfinder 1.5.5
cuda-python 13.3.1
cuda-tile 1.3.0
cuda-toolkit 13.0.2
cycler 0.12.1
dacite 1.9.2
datasets 4.8.4
deepspeed 0.19.1
depyf 0.20.0
detect-installer 0.1.0
dill 0.4.1
diskcache 5.6.3
distro 1.9.0
dnspython 2.8.0
docstring_parser 0.18.0
einops 0.8.2
email-validator 2.3.0
fastapi 0.136.3
fastapi-cli 0.0.24
fastapi-cloud-cli 0.20.0
fastar 0.11.0
fastsafetensors 0.3.2
ffmpy 1.0.0
filelock 3.29.4
fla-core 0.5.0
flash_attn 2.8.3
flash-linear-attention 0.5.0
flashinfer-cubin 0.6.12
flashinfer-python 0.6.12
fonttools 4.63.0
frozenlist 1.8.0
fsspec 2026.2.0
gguf 0.19.0
googleapis-common-protos 1.75.0
gradio 5.50.0
gradio_client 1.14.0
groovy 0.1.2
grpcio 1.81.1
h11 0.16.0
hf-xet 1.5.1
hjson 3.1.0
httpcore 1.0.9
httptools 0.8.0
httpx 0.28.1
httpx-sse 0.4.3
huggingface_hub 1.19.0
humming-kernels 0.1.4
idna 3.18
ijson 3.5.0
importlib_metadata 9.0.0
interegular 0.3.3
Jinja2 3.1.6
jiter 0.15.0
jmespath 0.10.0
joblib 1.5.3
json_repair 0.60.1
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
kiwisolver 1.5.0
lark 1.2.2
latex2sympy2_extended 1.0.6
liger_kernel 0.8.0
llguidance 1.7.6
llvmlite 0.47.0
lm-format-enforcer 0.11.3
loguru 0.7.3
Markdown 3.10.2
markdown-it-py 4.2.0
MarkupSafe 3.0.3
math-verify 0.5.2
matplotlib 3.11.0
mcp 1.27.2
mdurl 0.1.2
mistral_common 1.11.3
ml_dtypes 0.5.4
model-hosting-container-standards 0.1.15
modelscope 1.37.1
mpmath 1.3.0
ms_swift 4.3.0
msgpack 1.2.0
msgspec 0.21.1
multidict 6.7.1
multiprocess 0.70.19
networkx 3.6.1
ninja 1.13.0
nltk 3.9.4
numba 0.65.0
numpy 2.3.5
nvidia-cublas 13.1.0.3
nvidia-cublas-cu12 12.8.3.14
nvidia-cuda-cccl 13.3.3.3.1
nvidia-cuda-crt 13.3.33
nvidia-cuda-cupti 13.0.85
nvidia-cuda-cupti-cu12 12.8.57
nvidia-cuda-nvcc 13.2.78
nvidia-cuda-nvrtc 13.0.88
nvidia-cuda-nvrtc-cu12 12.8.61
nvidia-cuda-runtime 13.0.96
nvidia-cuda-runtime-cu12 12.8.57
nvidia-cuda-tileiras 13.2.78
nvidia-cudnn-cu12 9.7.1.26
nvidia-cudnn-cu13 9.19.0.56
nvidia-cudnn-frontend 1.25.0
nvidia-cufft 12.0.0.61
nvidia-cufft-cu12 11.3.3.41
nvidia-cufile 1.15.1.6
nvidia-cufile-cu12 1.13.0.11
nvidia-curand 10.4.0.35
nvidia-curand-cu12 10.3.9.55
nvidia-cusolver 12.0.4.66
nvidia-cusolver-cu12 11.7.2.55
nvidia-cusparse 12.6.3.3
nvidia-cusparse-cu12 12.5.7.53
nvidia-cusparselt-cu12 0.6.3
nvidia-cusparselt-cu13 0.8.0
nvidia-cutlass-dsl 4.5.2
nvidia-cutlass-dsl-libs-base 4.5.2
nvidia-cutlass-dsl-libs-cu13 4.5.2
nvidia-ml-py 13.610.43
nvidia-nccl-cu12 2.26.2
nvidia-nccl-cu13 2.28.9
nvidia-nvjitlink 13.0.88
nvidia-nvjitlink-cu12 12.8.61
nvidia-nvshmem-cu13 3.4.5
nvidia-nvtx 13.0.85
nvidia-nvtx-cu12 12.8.55
nvidia-nvvm 13.2.78
openai 2.41.1
openai-harmony 0.0.8
opencv-python-headless 4.13.0.92
opentelemetry-api 1.42.1
opentelemetry-exporter-otlp 1.42.1
opentelemetry-exporter-otlp-proto-common 1.42.1
opentelemetry-exporter-otlp-proto-grpc 1.42.1
opentelemetry-exporter-otlp-proto-http 1.42.1
opentelemetry-proto 1.42.1
opentelemetry-sdk 1.42.1
opentelemetry-semantic-conventions 0.63b1
opentelemetry-semantic-conventions-ai 0.5.1
orjson 3.11.9
oss2 2.19.1
outlines_core 0.2.14
packaging 26.0
pandas 2.3.3
partial-json-parser 0.2.1.1.post7
peft 0.19.1
pillow 11.3.0
pip 26.1.1
prometheus_client 0.25.0
prometheus-fastapi-instrumentator 8.0.0
propcache 0.5.2
protobuf 6.33.6
psutil 7.2.2
py-cpuinfo 9.0.0
pyarrow 24.0.0
pybase64 1.4.3
pycountry 26.2.16
pycparser 3.0
pycryptodome 3.23.0
pydantic 2.12.3
pydantic_core 2.41.4
pydantic-extra-types 2.11.1
pydantic-settings 2.14.1
pydub 0.25.1
pyelftools 0.33
Pygments 2.20.0
PyJWT 2.13.0
pyparsing 3.3.2
python-dateutil 2.9.0.post0
python-dotenv 1.2.2
python-json-logger 4.1.0
python-multipart 0.0.32
pytz 2026.2
PyYAML 6.0.3
pyzmq 27.1.0
quack-kernels 0.5.0
qwen-vl-utils 0.0.14
referencing 0.37.0
regex 2026.5.9
requests 2.34.2
rich 15.0.0
rich-toolkit 0.20.1
rignore 0.7.6
rouge 1.0.1
rpds-py 2026.5.1
ruff 0.15.17
safehttpx 0.1.7
safetensors 0.8.0
scipy 1.17.1
semantic-version 2.10.0
sentencepiece 0.2.1
sentry-sdk 2.62.0
setproctitle 1.3.7
setuptools 80.10.2
shellingham 1.5.4
simplejson 4.1.1
six 1.17.0
sniffio 1.3.1
sortedcontainers 2.4.0
sse-starlette 3.4.4
starlette 1.3.1
supervisor 4.3.0
sympy 1.14.0
tabulate 0.10.0
tensorboard 2.20.0
tensorboard-data-server 0.7.2
tiktoken 0.13.0
tilelang 0.1.9
tokenizers 0.22.2
tokenspeed-mla 0.1.2
tokenspeed-triton 3.7.10.post20260531
tomlkit 0.13.3
torch 2.11.0
torch_c_dlpack_ext 0.1.5
torchvision 0.26.0
tqdm 4.68.2
transformers 5.12.0
transformers-stream-generator 0.0.5
triton 3.6.0
trl 0.29.1
typer 0.25.1
typer-slim 0.24.0
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2026.2
urllib3 2.7.0
uvicorn 0.49.0
uvloop 0.22.1
vllm 0.23.0
watchfiles 1.2.0
websockets 15.0.1
Werkzeug 3.1.8
wheel 0.46.3
xgrammar 0.2.2
xxhash 3.7.0
yarl 1.24.2
z3-solver 4.15.4.0
zipp 4.1.0
zstandard 0.25.0
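
To narrow down the causal_conv1d problem, the state of this environment can also be checked from Python. The following is only a minimal sketch, not an official ms-swift diagnostic: the helper name `package_status` is hypothetical, and it assumes the package imports as `causal_conv1d`, the name used above. It reports whether a distribution is registered with pip and whether the matching module can be found on the import path.

```python
import importlib.util
from importlib import metadata


def package_status(dist_name, module_name):
    """Return (installed_version_or_None, importable_flag) for a package.

    `package_status` is a hypothetical helper written for this issue.
    """
    try:
        # Version recorded in the installed distribution metadata, if any.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec returns None when a top-level module cannot be located.
    importable = importlib.util.find_spec(module_name) is not None
    return version, importable


if __name__ == "__main__":
    # Names taken from the question and the pip list above.
    for dist, module in [("torch", "torch"), ("causal_conv1d", "causal_conv1d")]:
        version, importable = package_status(dist, module)
        print(f"{dist}: version={version}, importable={importable}")
```

With the pip list above, the `causal_conv1d` line would be expected to show no installed version, since the package does not appear in the list.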
