Dichotomous Diffusion Policy Optimization

Paper Project page

This repository provides an official implementation of DIPOLE on RL benchmarks, including ExORL and OGBench. Please refer to the corresponding directories for detailed experimental settings and instructions.

Overview

DIPOLE (DIchotomous diffusion POLicy improvEment) is a reinforcement learning (RL) algorithm designed for stable and controllable optimization of diffusion-based policies. It addresses key challenges in applying RL to large diffusion policies, including training instability, inefficient credit assignment, and limited controllability of policy greediness.

Core Idea

DIPOLE reformulates KL-regularized RL with a greedified policy regularization objective, which enables the optimal diffusion policy to be decomposed into two dichotomous policies:

Positive policy: reward maximization by emphasizing high-return actions.
Negative policy: reward minimization by emphasizing low-return actions.

Both policies are trained using bounded sigmoid-based weighting, ensuring stable and efficient learning without loss explosion.
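A minimal sketch of the bounded weighting idea, in Python with NumPy. It assumes the positive policy's denoising loss is weighted by σ(β·A) and the negative policy's by σ(-β·A), where A is an advantage estimate and β a temperature. This exact form, and every function and parameter name below (`dichotomous_weights`, `weighted_denoising_loss`, `beta`), is a hypothetical illustration, not the repository's code.

```python
import numpy as np


def sigmoid(x):
    # Logistic function written via tanh, which never overflows for large |x|.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def dichotomous_weights(advantage, beta):
    # Hypothetical weighting (assumed form): the positive policy emphasizes
    # high-advantage actions, the negative policy low-advantage ones.
    # Both weights lie in [0, 1].
    z = beta * np.asarray(advantage, dtype=float)
    return sigmoid(z), sigmoid(-z)


def weighted_denoising_loss(eps_pred, eps_true, weights):
    # Per-sample MSE between predicted and true noise (averaged over action dims),
    # then a weighted mean over the batch.
    per_sample = np.mean((eps_pred - eps_true) ** 2, axis=-1)
    return float(np.mean(weights * per_sample))


# Usage with extreme advantages: the weights stay bounded.
advantages = np.array([-1e4, -1.0, 0.0, 1.0, 1e4])
w_pos, w_neg = dichotomous_weights(advantages, beta=1.0)

rng = np.random.default_rng(0)
eps_true = rng.standard_normal((5, 3))  # true noise for 5 samples, 3-D actions
eps_pred_pos = eps_true + 0.1 * rng.standard_normal((5, 3))  # stand-in network outputs
loss_pos = weighted_denoising_loss(eps_pred_pos, eps_true, w_pos)
```

Because σ(z) + σ(-z) = 1, each sample's two weights sum to one and stay in [0, 1] even for the ±1e4 advantages above, whereas an unbounded exponential weight such as exp(β·A) would overflow to infinity for the last entry. This is one way to read "without loss explosion" under the stated assumptions.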

Controllable Inference

At inference time, actions are generated by linearly combining the score functions of the positive and negative policies. This combination:

exposes a greediness factor that controls the trade-off between exploitation and stability.
is mathematically analogous to classifier-free guidance in diffusion models.

As a result, DIPOLE enables explicit and continuous control over policy optimality without retraining.
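The inference-time rule can be sketched as follows, in plain Python. It assumes the classifier-free-guidance form (1 + ω)·s₊ - ω·s₋ for the combined score, with a single greediness factor ω. The name `omega`, the helper functions, and the toy unit-variance Gaussian "policies" are hypothetical stand-ins chosen so the effect can be checked by hand; they are not the repository's API. Since a noise prediction is a scaled score, the same linear rule applies if the networks output noise instead.

```python
def combine_scores(score_pos, score_neg, omega):
    # Assumed CFG-style rule: omega = 0 recovers the positive policy; larger omega
    # pushes samples further away from the negative policy (greedier behavior).
    return (1.0 + omega) * score_pos - omega * score_neg


def gaussian_score(x, mu):
    # Score (gradient of the log-density) of a unit-variance Gaussian N(mu, 1).
    return -(x - mu)


def guided_mode(mu_pos, mu_neg, omega, step=0.1, iters=500):
    # Follow the combined score by gradient ascent to locate the mode of the
    # guided density; each step contracts the error by a factor of (1 - step).
    x = 0.0
    for _ in range(iters):
        s = combine_scores(gaussian_score(x, mu_pos), gaussian_score(x, mu_neg), omega)
        x += step * s
    return x


# Sweep the greediness factor without retraining either toy "policy".
for omega in [0.0, 0.5, 1.0, 2.0]:
    print(omega, round(guided_mode(mu_pos=1.0, mu_neg=-1.0, omega=omega), 3))
```

For these Gaussians the combined score simplifies to -(x - m) with m = μ₊ + ω(μ₊ - μ₋), so the mode moves continuously from the positive policy's mean at ω = 0 further away from the negative policy as ω grows, with only the inference-time ω changing.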

Acknowledgments

Our implementation is built upon CFGRL and FQL, on top of OGBench's reference implementations. We thank the authors of these prior works for making their work publicly available.

Bibtex

If you find this repository useful, please cite:

@article{liang2026dipole,
  title={Dichotomous Diffusion Policy Optimization},
  author={Ruiming Liang and Yinan Zheng and Kexin Zheng and Tianyi Tan and Jianxiong Li and Liyuan Mao and Zhihao Wang and Guang Chen and Hangjun Ye and Jingjing Liu and Jinqiao Wang and Xianyuan Zhan},
  journal={arXiv preprint arXiv:2601.00898},
  year={2026}
}

License

This project is licensed under the MIT License.
