Replies: 1 comment
I think RLHF is meant to be done after SFT is finished; in theory that is the order. I also hope the author can open-source the RM and PPO code.
Would replacing the SFT stage of Chinese-Alpaca with RLHF + PPO for instruction fine-tuning give better results?