Pairwise value implementation

Alexis Carras requested to merge pairwise-value-implementation into policy-learning

Implemented a new kind of Q-learning module that ranks each possible move (from the current school) individually. That is, the state vector is different from the one used in the target branch. This required a few changes in the loop and a whole new file. Along the way I also fixed a few bugs (i.e., all code in this branch should be better than that in the target).
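
For reviewers unfamiliar with the idea, here is a minimal sketch of what scoring each move individually can look like. It is not the code in this branch: the names `pair_features` and `PairwiseQ`, the concatenated feature layout, and the linear value function are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def pair_features(current: Sequence[float], candidate: Sequence[float]) -> Vector:
    """Hypothetical featurizer: one input vector per (current, candidate) pair."""
    return list(current) + list(candidate)


class PairwiseQ:
    """Illustrative linear Q-function that scores one candidate move at a time."""

    def __init__(self, n_features: int) -> None:
        self.weights: Vector = [0.0] * n_features

    def value(self, current: Sequence[float], candidate: Sequence[float]) -> float:
        # Dot product of the weights with the pair's feature vector.
        x = pair_features(current, candidate)
        return sum(w * xi for w, xi in zip(self.weights, x))

    def best_move(self, current: Sequence[float],
                  candidates: Sequence[Sequence[float]]) -> int:
        # Score every candidate separately, then return the index of the best one.
        scores = [self.value(current, c) for c in candidates]
        return max(range(len(candidates)), key=scores.__getitem__)

    def update(self, current: Sequence[float], move: Sequence[float], reward: float,
               next_state: Sequence[float],
               next_candidates: Sequence[Sequence[float]],
               alpha: float = 0.1, gamma: float = 0.9) -> float:
        # Standard Q-learning target: reward plus discounted best next value
        # (0.0 when there are no further moves).
        next_best = max((self.value(next_state, c) for c in next_candidates),
                        default=0.0)
        td_error = reward + gamma * next_best - self.value(current, move)
        x = pair_features(current, move)
        self.weights = [w + alpha * td_error * xi
                        for w, xi in zip(self.weights, x)]
        return td_error


# Usage: two candidate moves, one rewarded terminal transition.
q = PairwiseQ(n_features=4)
current = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
td = q.update(current, candidates[1], reward=1.0,
              next_state=candidates[1], next_candidates=[])
chosen = q.best_move(current, candidates)
```

The point of the sketch is only that the network (here a linear model) sees one pair at a time, so the number of candidate moves can vary between steps without changing the input size.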
