Whiterrrrr/dipole-rl

Dichotomous Diffusion Policy Optimization

This repository provides an official implementation of DIPOLE on RL benchmarks, including ExORL and OGBench. Please refer to the corresponding directories for detailed experimental settings and instructions.

Overview

DIPOLE (DIchotomous diffusion POLicy improvEment) is a reinforcement learning (RL) algorithm designed for stable and controllable optimization of diffusion-based policies. It addresses key challenges in applying RL to large diffusion policies, including training instability, inefficient credit assignment, and limited controllability of policy greediness.

Core Idea

DIPOLE reformulates KL-regularized RL with a greedified policy regularization objective, which enables the optimal diffusion policy to be decomposed into two dichotomous policies:

  • Positive policy: reward maximization by emphasizing high-return actions.
  • Negative policy: reward minimization by emphasizing low-return actions.

Both policies are trained using bounded sigmoid-based weighting, ensuring stable and efficient learning without loss explosion.
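The effect of bounded weighting can be illustrated with a minimal sketch. It assumes the positive and negative weights are sigmoids of a scaled advantage; the function names, the `beta` temperature, and the exact weighting form are illustrative assumptions, not taken from this repository. The unbounded exponential weight common in advantage-weighted methods is included only for contrast.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dichotomous_weights(advantage: float, beta: float = 1.0) -> Tuple[float, float]:
    """Hypothetical per-sample loss weights for the two policies.

    The positive weight grows with the advantage and the negative weight
    shrinks with it; both stay inside [0, 1] by construction.
    """
    w_pos = sigmoid(beta * advantage)   # emphasizes high-return actions
    w_neg = sigmoid(-beta * advantage)  # emphasizes low-return actions
    return w_pos, w_neg


def exponential_weight(advantage: float, beta: float = 1.0) -> float:
    """Unbounded exponential weight, shown only for comparison."""
    return math.exp(beta * advantage)


if __name__ == "__main__":
    for adv in (-50.0, 0.0, 50.0):
        w_pos, w_neg = dichotomous_weights(adv)
        print(adv, w_pos, w_neg, exponential_weight(adv))
```

Under these assumptions, an outlier advantage of 50 yields a sigmoid weight of at most 1, while the exponential weight exceeds 1e21, which is the kind of loss blow-up that bounded weighting avoids.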

Controllable Inference

At inference time, actions are generated by linearly combining the score functions of the positive and negative policies. This combination

  • lets a greediness factor control the trade-off between exploitation and stability.
  • is mathematically analogous to classifier-free guidance in diffusion models.

As a result, DIPOLE enables explicit and continuous control over policy optimality without retraining.
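A minimal sketch of such a combination follows, assuming the same extrapolation form used by classifier-free guidance. The function name `combine_scores`, the symbol `omega` for the greediness factor, and the exact coefficients are illustrative assumptions; the repository's actual combination rule may differ.

```python
from typing import List, Sequence


def combine_scores(
    score_pos: Sequence[float],
    score_neg: Sequence[float],
    omega: float,
) -> List[float]:
    """Hypothetical guided score: (1 + omega) * s_pos - omega * s_neg.

    omega = 0 recovers the positive policy alone; larger omega pushes the
    sample further away from what the negative policy would generate.
    """
    if len(score_pos) != len(score_neg):
        raise ValueError("score vectors must have the same length")
    return [(1.0 + omega) * p - omega * n for p, n in zip(score_pos, score_neg)]


if __name__ == "__main__":
    s_pos = [1.0, 2.0]  # toy score from the positive policy
    s_neg = [0.5, 1.0]  # toy score from the negative policy
    print(combine_scores(s_pos, s_neg, omega=0.0))  # [1.0, 2.0]
    print(combine_scores(s_pos, s_neg, omega=1.0))  # [1.5, 3.0]
```

Because `omega` enters only at sampling time, sweeping it in this sketch requires no change to either trained network, which matches the claim that greediness can be adjusted without retraining.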

Acknowledgments

Our implementation is built upon CFGRL and FQL, on top of OGBench's reference implementations. We thank the authors of these prior works for making their code publicly available.

Bibtex

If you find this repository useful, please cite:

@article{liang2026dipole,
  title={Dichotomous Diffusion Policy Optimization},
  author={Ruiming Liang and Yinan Zheng and Kexin Zheng and Tianyi Tan and Jianxiong Li and Liyuan Mao and Zhihao Wang and Guang Chen and Hangjun Ye and Jingjing Liu and Jinqiao Wang and Xianyuan Zhan},
  journal={arXiv preprint arXiv:2601.00898},
  year={2026}
}

License

This project is licensed under the MIT License.

About

[ICLR 2026] The official implementation of Dichotomous Diffusion Policy Optimization (DIPOLE) on RL benchmarks
