Pairwise value implementation
Implemented a new kind of Q-learning module that scores each possible move (from the current school) individually and ranks the moves by that score. This means the state vector is different from the one used before. The change required a few edits to the loop and one new file. Along the way I also fixed a few bugs, so all code in this branch should be better than the code in the target branch.
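For reviewers unfamiliar with the idea, here is a minimal sketch of pairwise scoring, not the code in this branch. It assumes a linear Q-function and that each school is described by a plain feature list; the names `pair_features`, `q_value`, `rank_moves`, and `td_update` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def pair_features(current: Sequence[float], candidate: Sequence[float]) -> List[float]:
    """Build one state vector per (current, candidate) pair (assumed layout)."""
    return list(current) + list(candidate)


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear Q estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def rank_moves(
    weights: Sequence[float],
    current: Sequence[float],
    candidates: Sequence[Sequence[float]],
) -> List[int]:
    """Score every candidate move on its own, then return indices, best first."""
    scores = [q_value(weights, pair_features(current, c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def td_update(
    weights: Sequence[float],
    features: Sequence[float],
    reward: float,
    next_best: float,
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> List[float]:
    """One Q-learning step for the pair that was actually taken.

    next_best is the highest Q-value among the moves available afterwards.
    """
    target = reward + gamma * next_best
    error = target - q_value(weights, features)
    return [w + alpha * error * x for w, x in zip(weights, features)]
```

Under these assumptions, the loop calls `rank_moves` once per step, takes the top-ranked move (or explores), and then calls `td_update` with the feature vector of the chosen pair.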