Reinforcement Learning (Q-Learning & SARSA) applied to the snake game --- My course project for the Reinforcement Learning course.
In Reinforcement Learning, one does not teach the agent (bot). The agent's controller (the environment) merely tells it, through rewards and penalties, what is good and what is bad.
The snake then learns on its own, through trial and error across many games, what it should do and what it should not. It is rather like human learning.
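To make the "rewards and penalties" idea concrete, here is a minimal sketch of how an environment might report feedback. The event names and reward values are illustrative assumptions, not the project's actual reward scheme:

```python
# Hypothetical reward table: the environment never explains *why*,
# it only hands the agent a scalar signal after each move.
REWARDS = {
    "ate_food": 1.0,   # good: the snake ate the food (assumed value)
    "died": -1.0,      # bad: the snake hit a wall or itself (assumed value)
    "moved": 0.0,      # neutral: an ordinary step (assumed value)
}

def step_reward(event):
    """Map a game event to the reward the environment reports."""
    return REWARDS[event]
```

Everything the agent ever learns must be extracted from this one number per step.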
The source can be found on GitHub. To download it, simply run
$ git clone https://github.com/spranesh/rl-snake.git
Watch these three (bad quality) screencasts for examples:
SARSA seems to perform better, but it needs considerably more training. The long_train.sara training file in the source is the result of training SARSA for 8 hours (~30,000 games). Q-Learning does well compared to SARSA when the training period is short; 15-20 minutes on my machine seems ideal.
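The difference between the two algorithms comes down to one term in their update rules: Q-Learning bootstraps off-policy from the best next action, while SARSA bootstraps on-policy from the action actually taken. A tabular sketch (function names and the learning parameters are illustrative, not the project's code):

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9          # assumed learning rate and discount factor
Q = defaultdict(float)           # Q-table keyed by (state, action)

def q_learning_update(s, a, r, s_next, actions):
    # Off-policy: target uses the *greedy* value of the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: target uses the action the agent *actually* took next.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])
```

Because SARSA's targets track the exploring policy itself, its estimates are noisier early on, which is consistent with it needing longer training before it pays off.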
Both algorithms show some learning even with just 2-3 minutes (~100 games) of training. This is mainly due to the compact state space used (state_mappers/quadrant_view.py). Storing merely which quadrant (relative to the snake) the food is in turns out to be sufficient. Using the food's exact position leads to a huge increase in the size of the state space, which slows the rate of learning.
Note: Unlike most RL setups, a completely relative state space is used for this problem: the world is described with respect to the snake's head.
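A head-relative quadrant state can be sketched as follows. This is an illustrative stand-in for the project's quadrant_view.py, not its actual code:

```python
def food_quadrant(head, food):
    """Return which quadrant the food lies in, relative to the snake's head.

    Encoded as (dx_sign, dy_sign) with each sign in {-1, 0, 1}: at most
    9 distinct states, versus one state per exact (x, y) food position.
    """
    hx, hy = head
    fx, fy = food
    sign = lambda v: (v > 0) - (v < 0)
    return (sign(fx - hx), sign(fy - hy))
```

For example, food up and to the right of the head always maps to the same state, no matter where on the board the snake is, so experience transfers across board positions and the Q-table stays tiny.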
Thanks to Prashanth for spotting a reward bug that I had been banging my head on.
Strong correlations have been observed between the models reinforcement learning uses and the reinforcement that happens in our neurons; TD Learning is one such model.