Compared to the loss function of PPO, BPPO does not introduce any extra constraint or regularization. The only difference is the advantage approximation, corresponding to the code difference between ...
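To illustrate the point about the shared loss, here is a minimal sketch in plain Python, not the authors' code. The clipped surrogate loss is the standard PPO form and takes the advantages as an input, so only the function that produces those advantages changes. All function names are hypothetical. The offline estimator `q_minus_v_advantages` rests on an assumption: that the advantage is approximated as Q(s, a) - V(s) from critics fit to the dataset, which the text above does not spell out.

```python
def clipped_surrogate_loss(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (negated objective), averaged over samples.

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio to [1 - eps, 1 + eps].
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(r * a, clipped_r * a)
    return -total / len(ratios)


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one on-policy trajectory.

    `values` must have one more entry than `rewards` (the bootstrap value).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def q_minus_v_advantages(q_values, v_values):
    """ASSUMPTION: offline advantage taken as Q(s, a) - V(s) from learned critics."""
    return [q - v for q, v in zip(q_values, v_values)]


# The same loss function is used in both settings; only the advantages differ.
online_adv = gae_advantages(rewards=[1.0, 1.0], values=[0.0, 0.0, 0.0])
offline_adv = q_minus_v_advantages(q_values=[1.5, 0.2], v_values=[1.0, 0.4])
online_loss = clipped_surrogate_loss([1.1, 0.9], online_adv)
offline_loss = clipped_surrogate_loss([1.1, 0.9], offline_adv)
```

Passing the advantages in as an argument keeps the loss untouched, which mirrors the claim that no extra constraint or regularization term is added.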