General Game Playing

What is GGP?

General Game Playing (GGP) is a research area concerned with a design of autonomous programs capable of playing a variety of games, including the ones they have never encountered before. This project refers to the particular embodiment of the GGP idea proposed by Stanford University in 2005. Since then, an annual GGP competition has been organized, that brings an additional challenge, excitement and the way to define the current trends in the field. The competitors, i.e. computer programs, are given the rules of games at run-time and have to figure out how to play without any human intervention. Therefore, they need to combine various interesting aspect such as knowledge extraction and representation, efficient state-search, abstract reasoning and decision making.

Here are some good introductions to the problem:

The rules are written in the so-called Game Description Language (GDL). Any finite, deterministic, perfect-information synchronous game can be encoded in GDL. The official language specification can be found here.

What is MINI-Player

MINI-Player is our GGP agent. It was the first player from Poland (ax-aequo with Magician) to participate in the GGP Competition and the first one to reach the final day of this tournament (in 2012).

In 2012, MINI-Player has advanced to the final day of the completion with the 5^th best score. The final place was 7^th. In 2013, it encountered some issues with single-player games and ended the tournament as 11-th not qualifying to the final day. In 2014, there were a record number of 49 players submitted to the qualifying round of the competition thanks to a massive online open course about GGP. Our player, together with 16 other ones, successfully went through the qualification round. Then it reach the final day with 11 other players. It managed to eliminate one opponent (2-0) but lost against two (1-2, 1-2) in a double-elimination format and finished the competition on the 7-8^th place.

The major parts and contributions of MINI-Player:

A portfolio of the so-called strategies, which guide the Monte Carlo phase in the MCTS algorithm. In MINI-Player, seven strategies called: were implemented.
Dynamic self-adaptation of the strategies based on their performance in a currently played game
New method of parallelization of the MCTS/UCT algorithm in GGP
A dedicated approach for single-player games
A high-performance interpreter for the Game Description Language
Communication module to take part in the GGP competition

In order to illustrate the way the MCTS algorithm works, let us consider a game between two MCTS-based machine players in a well-known game of Connect-4. In each step, each of the players was allotted 7 second for a move. Four interesting steps were selected from the played game. The following figure shows snapshots of the game board taken in these four steps (after the move was performed). The statistics computed for each available action (move) just prior to the presented game positions are provided below in the text. Please note that Q and N denote the parameters from the UCT formula. The boundary values of Q are equal to 0 (expected loss for red) and 100 (expected victory for red), respectively.

In step 1, two moves putting stones in the middlemost columns, i.e., 4 and 5, were clearly the best for the red player.

Action	MCTS assessment (Q):	The number of simulations (N):
[Drop in column 1]	50.0	2319
[Drop in column 2]	53.3	3294
[Drop in column 3]	58.4	6743
[Drop in column 4]	61.7	12829
[Drop in column 5]	61.4	11896
[Drop in column 6]	57.4	5677
[Drop in column 7]	52.5	3017
[Drop in column 8]	50.8	2522

In step 3, the best strategy for red was to continue putting stones in the middle and block the blue player:

Action	MCTS assessment (Q):	The number of simulations (N):
[Drop in column 1]	56.1	6263
[Drop in column 2]	55.9	6036
[Drop in column 3]	54.4	4882
[Drop in column 4]	59.3	11395
[Drop in column 5]	59.7	12208
[Drop in column 6]	53.1	4094
[Drop in column 7]	55.4	5601
[Drop in column 8]	53.9	4552

In step 15, the highest Q value was equal to 55.6, which means that the red player was slightly more likely to win, but the MCTS/UCT algorithm does not see any path leading to a very strong position (very likely victory) of this player (and of the other, as well):

Action	MCTS assessment (Q):	The number of simulations (N):
[Drop in column 1]	55.6	44908
[Drop in column 2]	46.0	4945
[Drop in column 3]	47.1	5788
[Drop in column 4]	44.8	4241
[Drop in column 5]	47.0	5737
[Drop in column 6]	43.7	3679
[Drop in column 7]	44.8	4205
[Drop in column 8]	43.4	3547

Finally, in step 41, three out of four actions were assigned Q = 100 meaning that they lead to a victory for red. Only the last action resulted in an immediate win, but the length of a winning path makes no difference to the MCTS/UCT assessment.

Action	MCTS assessment (Q):	The number of simulations (N):
[Drop in column 2]	100.0	2567553
[Drop in column 3]	26.3	259
[Drop in column 7]	100.0	2567833
[Drop in column 8]	100.0	2681408

Example games and selected results

As stated before, games in GGP are written in a GDL language. The language is based on Datalog and Prolog. Here, you can download [82KB] a compressed file with games used by us for the testing purposes.

The most famous public repositories can be found here:

http://www.ggp.org/view/tiltyard/games/

http://ggpserver.general-game-playing.de/ggpserver/public/show_games.jsp

http://arrogant.stanford.edu/ggp/php/showgames

The GGP competition is a specific testing environment as it is organized only once a year and very few matches between the players of a given match are carried out. Typically, a pair of GGP agents face each other from 2-4 times during a single day of the competition. Moreover, the choice of games is arbitrary and quite limited, so the luck/random factor plays a larger role than in a typical offline experiment with many repetitions. We have performed various experiments including MINI-Player that were carried out using 250 repetitions. We report the average scores obtained by MINI-Player, which is normalized into the [0,100] interval, where 0 is interpreted as a loss, 50 as a draw and 100 as a win of MINI-Player. Therefore, the higher the score, the better. For statistical significance we used the 95% confidence intervals computed from t-student distribution. They are presented as one number V, which should be interpreted that the score is burdened with the error of [–V, +V] with 95% statistical confidence.

TABLE 1

In the following table, the results achieved by MINI-Player against the basic implementation of the MCTS/UCT player are presented. Since the same basis is used in MINI-Player, the experiment reveals insights about the novel contributions, which are: portfolio of strategies and the mechanism of choosing a strategy for a simulation. Depending on the game, both players were given from 45 to 90 seconds for the initial analysis of the rules and from 10 to 30 seconds for making a move.

Game	Score	95% Confidence
Alquerque	67,2	4,12
Breakthrough	46,8	6,19
Cephalopod Micro	53,2	6,19
Checkers	71,2	5,22
Chinese Checkers	50,1	5
Chinook	61,6	4,6
Connect Four	58,6	5,62
Connect Four Suicide	50,4	5,6
Farmers	57,8	4,31
Farming Quandries	78,6	3,66
Free for ALL 2P	66,8	5,65
Hex	50	6,2
Nine Board Tic-Tac-Toe	67,8	5,42
Othello	51	5,85
Pacman	60,4	3,75
Pentago	44	5,79
Pilgrimage	55,8	3,12
Average:	58,31	5,08

TABLE 2

In the next table, MINI-Player was pitted against the publicly available version of CadiaPlayer from 2012. Again, the scores are reported from the MINI-Player’s perspective.

Game	Score	95% Confidence
Alquerque	61,2	4,1
Breakthrough	36	5,95
Cephalopod Micro	43,2	6,14
Checkers	61,2	5,7
Chinese Checkers	51	5,39
Chinook	50,8	3,42
Connect Four	41,4	5,86
Connect Four Suicide	61,8	5,39
Farmers	66,4	4,31
Farming Quandries	83,4	2,92
Free for ALL 2P	63	5,71
Hex	60	6,07
Nine Board Tic-Tac-Toe	73	5,17
Othello	48,6	6,01
Pacman	56,4	3,81
Pentago	26,6	4,74
Pilgrimage	61,4	3,04
Average:	55,61	4,93

For readers interested in more results, please take a look at our publications.

TABLE 3

The following table shows the best-evaluated strategy (on average) by MINI-Player in various phases of each game. Apart from the random strategy, each of them is the most used one in at least 3 cases. However, the random strategy is crucial to keep in the system, because it is the fastest one, so allows to perform more simulations per second. Moreover, it is not biased and displays synergistic behavior with other strategies.

Game	Early phase (before the 1 move)	Middle of the game	End of the game
Alquerque	HH	AGE	AGE
Breakthrough	HH	HH	HH
Cephalopod Micro	AGE	HH	HH
Checkers	AGE	SSC	S
Chinese Checkers	HH	HH	HH
Chinook	AGE	M	M
Connect Four	HH	AGE	AGE
Connect Four Suicide	HH	HH	HH
Farmers	S	S	AGE
Farming Quandries	E	E	E
Free for ALL 2P	E	SSC	SSC
Hex	AGE	AGE	AGE
Nine Board Tic-Tac-Toe	AGE	AGE	AGE
Othello	M	SSC	SSC
Pacman	M	HH	SSC
Pentago	AGE	AGE	AGE
Pilgrimage	E	E	E

Next, we present examples from 4 games of how the usage of strategies (in %) varies as the game progresses. In all charts, the X axis denotes normalized game progress, i.e. current game step divided by the length of the match in steps converted to percentage. The Y axis denotes portion

of simulations, also converted to percentage, assigned to strategies. For instance,

point (20, 30) means that when 20% of the game elapsed, strategy corresponding to

the point had performed 30% of simulations until that moment.

In Connect-4, the History Heuristic is identified as the best strategy at the beginning, but the Approximate Goal Evaluation quickly becomes the best one. The possible reason is that it is the best to start in the middle of the board – so there exist clearly the best action. The goal of the game, however, is to complete four-in-a-row, which works very well with the AGE strategy.

Connect-4

In Pilgrimage below, there exist the best strategy (Exploration) throughout the whole game. The strategy favours actions which result in big changes in the game states. It is particularly suitable for games, in which one of more conditions hold: (1) – the actions can return to the states already observed in the game making loops possible, (2) – the state-space is large, (3) – the goal cannot be broken down into simpler tasks to be achieved step by step, (4) – there exist no good material-based evaluation function.

Pilgrimage

In Farmers below, the best-evaluated strategies are different in the start (HH) and the end (AGE) of the game, respectively. The Score strategy is the choice in the middle of the game and seems to be reasonably well-suited for the whole game.

Farmers

In the last example of Othello, there is a huge dynamism in the evaluation of the strategies. This is most likely caused by the facts that Othello is a highly dynamic game and it is quite expensive to simulate in GGP, what makes the assessment less stable.

Othello

Play with and against MINI-Player!

Here you can download the manual how to operate the programs.

Here, you can download [258 KB] a program to play Inverted Pentago.

Inverted Pentago is a game played on a 6×6 board divided into four 3×3 sub-boards (or quadrants). Taking turns, the two players place a marble of their color (either red or blue) onto an unoccupied space on the board, and then rotate any one of the sub-boards by 90 degrees either clockwise or anti-clockwise. A player wins by making their opponent get five of their marbles in a vertical, horizontal or diagonal row (either before or after the sub-board rotation in their move). If all 36 spaces on the board are occupied without a row of five being formed then the game is a draw. Participants play as blue and are the second player to have a turn. This is a variant of the well-known Pentago game with inverted goals.

Here, you can download [258 KB] a program to play Ceplalopod.

Cephalopod is a game played on a 5×5 board. Taking turns, the two players place a die of their own colour (either white or black) onto an empty cell on the board. Immediately after the die is placed, the capturing conditions are checked and enforced if apply. The placement is capturing if and only if: (1) – the currently added die is horizontally or vertically adjacent (so no more than four neighbours are possible) to at least two dice and (2) – the sum of the pip counts on those adjacent dice is six or less. In both cases it does not matter what colours the adjacent dice are. If the die placement is capturing, then all the horizontally and vertically adjacent dice are removed from the board and the new dice is set to show the sum of the pip counts of the captured dice. Capturing are mandatory only when placing a die onto a square where capturing conditions apply. The game terminates when the board is full. The player who controls more dice wins. Draws and ties are impossible. Participants play as black and are the second player to have a turn.

Adaptive metaheuristic methods in dynamically changing environments

NSC Grant DEC–2012/07/B/ST6/01527
07/2013 - 02/2017

General Game Playing