Converting Block Sudoku for OpenAI Gym
I won't go into the specific folder structure needed to make an OpenAI Gym environment, but if you're curious, you can check out the source code for my OpenAI Gym environment on my GitHub. The main file we'll be working with is gym_blocksudoku/envs/blocksudoku_env.py.
Importing the game class we built earlier is straightforward; we now just need to connect it to our BlockSudoku environment class. Two important things to note are the action_space and observation_space variables, which define the shapes of the action and observation spaces, respectively.
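```python
import gym
import numpy as np
from gym import spaces

# BlockSudokuGame and BlockFactory are the game classes built earlier


class BlockSudoku(gym.Env):
    def __init__(self):
        self.game = BlockSudokuGame()
        self.factory = BlockFactory()

        # Generate the game board and the first three blocks
        self.main_board = self.game.new_board()
        self.block_queue = self.factory.generate_3blocks()

        # Reset the game state, score and step counter
        self.is_running = True
        self.total_score = 0
        self.current_steps = 0
        self.max_steps = 2000

        # Action space: 3 queued blocks x 81 board tiles = 243 discrete actions
        self.action_space = spaces.Discrete(3 * 9 * 9)

        # Observation space: a 15x15 binary grid with a single channel
        self.observation_space = spaces.Box(
            low=0, high=1, shape=(15, 15, 1), dtype=np.uint8
        )
```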
The Action Space
Since we have at most 3 blocks in our queue, and 81 tiles each block could be placed on, our action space has 3 × 9 × 9 = 243 discrete actions. This is a significantly larger action space than many of the OpenAI Gym examples, but I'm hoping it won't be too much of a problem; at worst, it may complicate training our network later. Given a number between 0 and 242, the environment decodes it into a block in the queue and its x and y position. Below are some examples.
| Action | Block Queue | X Position | Y Position |
|---|---|---|---|
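The decoding described above can be sketched with two `divmod` calls. This is a minimal sketch, not the environment's actual code: the constants and the function names `decode_action` and `encode_action` are hypothetical, and the block-major ordering (`action = block * 81 + y * 9 + x`) is an assumption, so the real environment may order x and y differently.

```python
from __future__ import annotations

NUM_BLOCKS = 3   # at most three blocks in the queue
BOARD_SIZE = 9   # the board is 9x9, so 81 tiles per block
NUM_ACTIONS = NUM_BLOCKS * BOARD_SIZE * BOARD_SIZE  # 243


def decode_action(action: int) -> tuple[int, int, int]:
    """Split an action in [0, 242] into (block index, x, y).

    Assumed block-major ordering: action = block * 81 + y * 9 + x.
    """
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action {action} is outside the action space")
    block, tile = divmod(action, BOARD_SIZE * BOARD_SIZE)  # which block, which tile
    y, x = divmod(tile, BOARD_SIZE)                          # tile -> row (y), column (x)
    return block, x, y


def encode_action(block: int, x: int, y: int) -> int:
    """Inverse of decode_action under the same assumed ordering."""
    return block * BOARD_SIZE * BOARD_SIZE + y * BOARD_SIZE + x


print(decode_action(0))    # (0, 0, 0)
print(decode_action(100))  # (1, 1, 2)
print(decode_action(242))  # (2, 8, 8)
```

Keeping the block index as the most significant "digit" means every action for block 0 comes first, then block 1, then block 2, which makes it easy to mask out a whole block once it has been used.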
It's important to note that certain actions are impossible. We can't place a block if it overlaps a block already on the board, and we can't place a block from queue position 2 if only 1 block is left in the queue. In these cases, we mark the action as invalid and return a negative reward. The only way to earn a positive reward is to clear lines off the board.
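A minimal sketch of these checks, assuming the board is a 9x9 list of lists (1 = filled) and each queued block is a list of `(dx, dy)` offsets from an anchor tile. The names `is_valid_placement` and `reward_for`, and the penalty value `INVALID_REWARD`, are hypothetical rather than taken from the environment's source. The sketch also rejects placements that would hang off the edge of the board, which the environment presumably has to handle as well.

```python
BOARD_SIZE = 9
INVALID_REWARD = -1  # hypothetical penalty; the real environment may use another value


def is_valid_placement(board, queue, block_index, x, y):
    """Return True if queue[block_index] can be placed with its anchor at (x, y).

    board: 9x9 list of lists, 1 = filled tile, 0 = empty tile.
    queue: the blocks still available; each block is a list of (dx, dy) offsets.
    """
    # e.g. asking for queue position 2 when only one block is left
    if not 0 <= block_index < len(queue):
        return False
    for dx, dy in queue[block_index]:
        cx, cy = x + dx, y + dy
        # every tile of the block must land on the board...
        if not (0 <= cx < BOARD_SIZE and 0 <= cy < BOARD_SIZE):
            return False
        # ...and must not overlap a tile that is already filled
        if board[cy][cx]:
            return False
    return True


def reward_for(valid, lines_cleared):
    """Invalid moves are penalized; clearing lines is the only positive reward."""
    if not valid:
        return INVALID_REWARD
    return lines_cleared  # hypothetical: one point per cleared line


board = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[0][1] = 1                 # one filled tile at x=1, y=0
queue = [[(0, 0), (1, 0)]]      # a single horizontal domino left in the queue
print(is_valid_placement(board, queue, 0, 0, 0))  # False: overlaps the filled tile
print(is_valid_placement(board, queue, 0, 3, 3))  # True
print(is_valid_placement(board, queue, 2, 3, 3))  # False: no block at position 2
```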
The Observation Space (Game state)
We represent the state of the game as a 15x15 array (the (15, 15, 1) shape of observation_space), with the board on top and the blocks in the queue along the bottom. This state is returned on every step of the environment, after we receive an action.
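One way to pack the board and queue into that 15x15 grid is sketched below. The layout details are assumptions, not the environment's documented format: the board sits in the top-left 9x9 corner, row 9 is left empty, and each of the three queued blocks gets its own 5x5 slot in rows 10 to 14. `build_observation` is a hypothetical helper, and each block is assumed to be a list of `(dx, dy)` offsets that fits in a 5x5 box.

```python
OBS_SIZE = 15    # matches the 15x15 observation
BOARD_SIZE = 9
SLOT_SIZE = 5    # assumed: every block fits in a 5x5 bounding box


def build_observation(board, queue):
    """Pack the 9x9 board and up to three queued blocks into a 15x15 grid of 0/1.

    Hypothetical layout: board in the top-left 9x9 corner, row 9 left empty,
    and queued block i drawn in the 5x5 slot at rows 10-14, columns 5*i to 5*i+4.
    """
    obs = [[0] * OBS_SIZE for _ in range(OBS_SIZE)]
    # Board on top
    for y in range(BOARD_SIZE):
        for x in range(BOARD_SIZE):
            obs[y][x] = 1 if board[y][x] else 0
    # Queued blocks along the bottom, one slot each
    top = OBS_SIZE - SLOT_SIZE  # row 10
    for i, block in enumerate(queue[:3]):
        for dx, dy in block:
            obs[top + dy][i * SLOT_SIZE + dx] = 1
    return obs


board = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[8][8] = 1
queue = [[(0, 0)], [(0, 0), (1, 0)], [(4, 4)]]  # single tile, domino, corner tile
obs = build_observation(board, queue)
for row in obs:
    print("".join(map(str, row)))
```

In a Gym environment the result would still need the trailing channel axis from `observation_space`, for example `np.array(obs, dtype=np.uint8).reshape(15, 15, 1)`.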
Our first agent - Random Agent
Now that OpenAI Gym is set up for our project, the first "AI" agent we'll subject our game to is Random Agent. As the name suggests, Random Agent picks a random action from the action space. It ignores the next state and doesn't act on rewards. Implementing Random Agent is pretty straightforward:
```python
import random

import gym
import numpy as np
from gym_blocksudoku.envs.blocksudoku_env import BlockSudoku

env = gym.make('blocksudoku-v0')
env.reset()

for _ in range(10000):
    env.render()

    # Take a random action: row 0 selects the queued block (0-2),
    # row 1 the x position and row 2 the y position (0-8)
    t = np.zeros((3, 9))
    t[0, random.randint(0, 2)] = 1
    t[1, random.randint(0, 8)] = 1
    t[2, random.randint(0, 8)] = 1

    state, reward, done, _ = env.step(t)
    print('reward: ' + str(reward))

    if done:
        print('Game is finished')
        print('Total Score: ' + str(env.total_score))
        break

env.close()
```
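The random agent above builds a 3x9 one-hot array, while `action_space` is declared as `Discrete(3*9*9)`. A hypothetical helper that collapses such an array into a single index in the 0 to 242 range might look like this; the row meanings and the ordering `block * 81 + y * 9 + x` are assumptions rather than the environment's documented encoding.

```python
BOARD_SIZE = 9


def one_hot_to_index(t):
    """Collapse a 3x9 one-hot action into a single index in [0, 242].

    Assumptions: row 0 marks the queued block (only columns 0-2 are used),
    row 1 marks x, row 2 marks y, and index = block * 81 + y * 9 + x.
    Works on nested lists or on a NumPy array like the random agent's `t`.
    """
    # list.index raises ValueError if a row has no 1 in it
    block = list(t[0]).index(1)
    x = list(t[1]).index(1)
    y = list(t[2]).index(1)
    return block * BOARD_SIZE * BOARD_SIZE + y * BOARD_SIZE + x


t = [[0] * BOARD_SIZE for _ in range(3)]
t[0][1] = 1   # second block in the queue
t[1][3] = 1   # x = 3
t[2][4] = 1   # y = 4
print(one_hot_to_index(t))  # 120
```

With a flat index, a random action could also be drawn directly with `random.randrange(243)`, or with `env.action_space.sample()`, which every Gym space provides.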
The results of running Random Agent are as follows:
Random Agent got pretty lucky in the first two trials, then scored nothing until Trial 7. Since none of the scores came close to the lowest score (5) in the baseline human benchmarks, we can conclude that choosing positions at random isn't an effective strategy.
Trying to solve BlockSudoku for yourself
I published the OpenAI Gym implementation of BlockSudoku to PyPI, so if you'd like to try designing an agent for it, you can install it with pip:
pip install gym-blocksudoku
Next up, building a smarter algorithm to help solve the game for us!