Othello AI

I have wanted to gain a better understanding of neural networks for a while now, so I decided to implement one from scratch to try to achieve this. Rather than something like the tune generator, this time I opted to make it play a board game instead. Plenty of strong AIs are already available for chess – my first thought – such as Stockfish NNUE, so I settled on Othello (sometimes called Reversi) instead.

The idea of Othello is to have the most pieces on the board when neither player can move. Every turn, a counter is played such that it sandwiches one or more of your opponent’s counters in a straight line – which flips them to your colour. If no counter can be played, the turn passes to the other player.

As a game, Othello is weakly solved – meaning that from the starting position, if both players play perfectly, both the result (a draw in this case) and a strategy that achieves it are known. This may sound like it trivialises the problem, but it doesn’t – the game is only solved from the starting position, so by intentionally playing an imperfect move you leave your opponent without a known sequence of perfect moves, and they can no longer guarantee a win/draw. (adapted from a post by “cvoss” here)

I started off by implementing the game itself. My first thought was to use two boolean arrays, one for occupancy and one for colour, but after some reading on the Chess Programming Wiki I discovered that, since the board is 8×8, each of those arrays can in fact be packed into the binary representation of an unsigned 64-bit integer (a bitboard). 1s in the integer represent true, and 0s represent false. Using some simple bit manipulation, individual bits can be set, unset or read (see code below). This is also good for performance – a 64-bit integer fits in a single register, so a bitwise operation over the whole board completes in a single cycle on most processors. It also reduces the amount of branching in the code, allowing better pipelining.

#include <cstdint>

typedef uint64_t u64;
typedef uint8_t u8;

// Set (value = true) or clear (value = false) the bit at `index` (0-63).
// Note the 1ULL – shifting a plain 32-bit 1 by more than 31 is undefined.
u64 set(u64 bits, u8 index, bool value) {
    if (value) {
        return bits | (1ULL << index);
    } else {
        return bits & ~(1ULL << index);
    }
}

// Toggle the bit at `index`.
u64 flip(u64 bits, u8 index) {
    return bits ^ (1ULL << index);
}

// Read the bit at `index`.
bool get(u64 bits, u8 index) {
    return (bits & (1ULL << index)) != 0;
}

The move generation works by taking a mask of all of the current player’s pieces and sliding it in the chosen direction: as long as it lands on opponent pieces it keeps sliding, and if it then reaches an empty square, that square is added to the list of valid moves. A similar algorithm is used for flipping pieces. I did not come up with this algorithm myself and instead found it on GitHub – my previous versions didn’t properly take advantage of the bit representation of the board.
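As an illustration, here is a minimal sketch of the idea for a single direction (east). This is not the code from the repository – the board layout (bit 0 = a1, files running a–h along each byte), the edge mask and the loop count are my own assumptions.

#include <cstdint>

typedef uint64_t u64;

u64 moves_east(u64 own, u64 opp) {
    const u64 not_a_file = ~0x0101010101010101ULL; // stops wrap-around onto the a-file
    u64 empty = ~(own | opp);

    // Opponent pieces directly east of one of our pieces.
    u64 flood = ((own << 1) & not_a_file) & opp;

    // Keep sliding east while we stay on opponent pieces (a run is at most 6 long).
    for (int i = 0; i < 5; ++i) {
        flood |= ((flood << 1) & not_a_file) & opp;
    }

    // A move is legal if the square just past the run of opponent pieces is empty.
    return ((flood << 1) & not_a_file) & empty;
}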

Next, I made the neural network. My original plan was to use NEAT (https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies), which lets the network reshape its own topology so the programmer does not need to know the optimal configuration in advance. However, this became difficult to implement efficiently, and I wasn’t sure I would see any benefit – from what I read, NEAT tends to be better suited to networks with fewer inputs/outputs, such as those that play physics-based games. In the end, I settled on a fixed network size. I used the Eigen library to perform the matrix operations needed for each layer: the layers can be represented as matrices of weights and vectors of biases, and with a proper library these operations can be performed quickly using SIMD instructions. (I have no idea whether that is what Eigen is actually doing in this case, but it seems to be one of the best libraries for this use.)
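For concreteness, a forward pass through such a fixed-size network with Eigen might look something like the sketch below. The struct names, the tanh activation and the layer layout are my own assumptions, not necessarily what the project uses.

#include <Eigen/Dense>
#include <vector>

// A layer is just a weight matrix plus a bias vector.
struct Layer {
    Eigen::MatrixXf weights; // rows = outputs, cols = inputs
    Eigen::VectorXf biases;
};

// Forward pass: each layer is one matrix-vector product plus a bias,
// followed by a tanh activation (the activation choice is an assumption).
Eigen::VectorXf forward(const std::vector<Layer>& layers, Eigen::VectorXf x) {
    for (const Layer& layer : layers) {
        x = (layer.weights * x + layer.biases).array().tanh().matrix();
    }
    return x;
}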

To train it, I used a method called self-play – having the AI play against itself to determine which variation is stronger. The training consists of several generations: in each one, every network plays every other network in the generation, and its win/loss record is noted. A random selection of the networks that win the most is carried into the next generation, some mutated and some cloned unchanged. A selection of losers is also kept, along with some freshly randomised networks – this helps to ensure that the networks don’t just learn to beat themselves. By repeating this process many times, a good network should eventually emerge. I later parallelised the training so it could make use of multiple cores.
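Roughly, one generation of this scheme looks like the sketch below. The `Network`, `play_game`, `mutate` and `Network::random` names are hypothetical placeholders, and the split between winners, losers and fresh networks is an assumption rather than the project’s actual ratios.

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// `Network`, `play_game`, `Result`, `mutate` and `Network::random` are
// hypothetical placeholders standing in for the real training code.
std::vector<Network> next_generation(const std::vector<Network>& population, std::mt19937& rng) {
    const size_t n = population.size();

    // Round-robin: every network plays every other one, counting wins.
    std::vector<int> wins(n, 0);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            if (i != j && play_game(population[i], population[j]) == Result::FirstPlayerWins)
                ++wins[i];

    // Rank the networks by win count, best first.
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return wins[a] > wins[b]; });

    // Assumed split: half winners (alternately cloned and mutated),
    // a quarter losers kept for diversity, a quarter fresh random networks.
    std::vector<Network> next;
    for (size_t k = 0; k < n; ++k) {
        if (k < n / 2) {
            Network child = population[order[k]];
            if (k % 2 == 1) child.mutate(rng);
            next.push_back(child);
        } else if (k < 3 * n / 4) {
            next.push_back(population[order[k]]);
        } else {
            next.push_back(Network::random(rng));
        }
    }
    return next;
}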

The best version of the network I have trained so far claimed to win around 70% of games against randomised opponents. While this doesn’t seem very high, the fact that it is clearly above 50% surprised me, and I think I may be able to get better results if I let it train for longer. I also found that during training the evaluation would sometimes suddenly drop back to around 50% – something I am still not quite sure about and hope to correct.

I later added a simple UI with SDL2 so that games can be played more fluidly, with moves made by clicking on the board.
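As a rough illustration (not the project’s actual UI code), mapping a click to a board square with SDL2 might look like this; the window and cell sizes are assumptions.

#include <SDL2/SDL.h>

// Assumed layout: a 512x512 window, 64 pixels per board square.
const int CELL_SIZE = 64;

// Convert a mouse click into a bitboard index (0-63).
int square_from_click(const SDL_MouseButtonEvent& e) {
    int file = e.x / CELL_SIZE; // column 0-7
    int rank = e.y / CELL_SIZE; // row 0-7
    return rank * 8 + file;
}

// In the main loop, clicks are read from the SDL event queue:
// SDL_Event event;
// while (SDL_PollEvent(&event)) {
//     if (event.type == SDL_MOUSEBUTTONDOWN) {
//         int move = square_from_click(event.button);
//         // ... attempt to play `move` if it is legal
//     }
// }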

View project (currently lacks docs, hope to add some later).
