Reinforcement Learning (Q-Learning & SARSA) applied to the snake game --- My course project for the Reinforcement Learning course.
In Reinforcement Learning, one does not teach the agent (bot). The agent's controller (the environment) merely tells it, through rewards and penalties, what is good and what is bad.
The snake then learns on its own, through trial and error across many games, what it should do and what it should not. It is rather like human learning.
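To make the "rewards and penalties" idea concrete, here is a minimal sketch of how an environment might report feedback. The event names and reward values are illustrative assumptions, not the project's actual reward scheme:

```python
# Hypothetical reward table: the environment never explains *why*,
# it only hands the agent a scalar signal after each move.
REWARDS = {
    "ate_food": 1.0,   # good: the snake ate the food (assumed value)
    "died": -1.0,      # bad: the snake hit a wall or itself (assumed value)
    "moved": 0.0,      # neutral: an ordinary step (assumed value)
}

def step_reward(event):
    """Map a game event to the reward the environment reports."""
    return REWARDS[event]
```

Everything the agent ever learns must be extracted from this one number per step.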
The source can be found on GitHub. To download it, simply run
$ git clone https://github.com/spranesh/rl-snake.git
Watch these three (bad quality) screencasts for examples:
SARSA seems to perform better, but it needs considerably more training. The long_train.sara training file in the source is the result of training SARSA for 8 hours (~30,000 games). Q-Learning does well compared to SARSA when the training period is short; 15-20 minutes on my machine seems ideal.
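The difference between the two algorithms comes down to one term in their update rules: Q-Learning bootstraps off-policy from the best next action, while SARSA bootstraps on-policy from the action actually taken. A tabular sketch (function names and the learning parameters are illustrative, not the project's code):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9          # assumed learning rate and discount factor
Q = defaultdict(float)           # Q-table keyed by (state, action)

def q_learning_update(s, a, r, s_next, actions):
    # Off-policy: target uses the *greedy* value of the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: target uses the action the agent *actually* took next.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
```

Because SARSA's targets track the exploring policy itself, its estimates are noisier early on, which is consistent with it needing longer training before it pays off.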
Both algorithms show some learning even with just 2-3 minutes (~100 games) of training. This is mainly due to the compact state space used (state_mappers/quadrant_view.py). Storing merely which quadrant (relative to the snake) the food is in turns out to be sufficient. Using the food's exact position leads to a huge increase in the size of the state space, which slows the rate of learning.
Note: Unlike most RL setups, a completely relative state space is used for this problem: the world is described with respect to the snake's head.
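A head-relative quadrant state can be sketched as follows. This is an illustrative stand-in for the project's quadrant_view.py, not its actual code:

```python
def food_quadrant(head, food):
    """Return which quadrant the food lies in, relative to the snake's head.

    Encoded as (dx_sign, dy_sign) with each sign in {-1, 0, 1}: at most
    9 distinct states, versus one state per exact (x, y) food position.
    """
    hx, hy = head
    fx, fy = food
    sign = lambda v: (v > 0) - (v < 0)
    return (sign(fx - hx), sign(fy - hy))
```

For example, food up and to the right of the head always maps to the same state, no matter where on the board the snake is, so experience transfers across board positions and the Q-table stays tiny.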
Thanks to Prashanth for spotting a reward bug that I had been banging my head on.
Strong correlations have been observed between the models reinforcement learning uses and the reinforcement that happens in our neurons; TD Learning is one such model.