Playing Capitals with OpenCV and Python

On Monday evening I had dinner with my friend Alysia Promislow, who showed me the game Capitals and suggested it might be interesting to play it programmatically.

I spent the last 18-20 working hours doing that. (Github link with code and example images). Hopefully it's redundant to mention that I've done this because it let me geek out on a goofy computational problem, not because I'm interested in cheating at a phone game1.

90% of getting anything done is knowing things exist, and the first two things I thought about after seeing the game is:

  • The low-complexity shapes and clear separation make it amenable to some simple computer vision techniques that I have employed before
  • Hexagonal coordinate systems are a thing

Decoding game state from a screenshot

I spent Tuesday afternoon writing a Python script to decode the game state from a screenshot, using OpenCV and Tesseract OCR engine.

It takes an RGB image, such as:

RGB screenshot

Next, it greyscales and runs the Canny edge detector to produce:


To detect and isolate hexagons, I follow a similar approach as the OpenCV example squares.cpp:

  • Find contours in the edge-detected image
  • Using the Ramer-Douglas-Peucker algorithm, attempt to approximate polygonal curves as lower-degree polygons
  • Take all detected six-sided convex polygons having a certain minimum area
  • It is also useful to check for approximately 120 degree angles, but it's not necessary for this application.

Taking only the contours which meet the criteria, we have:

Isolated contours

Iterating through these I use a series of increasingly-dubious heuristics to classify the hexagons:

  • Take the average RGB value of each hexagon:
    • Hexagons that are much more red than blue belong to red player
    • Hexagons that are much more blue than red belong to blue player
  • Hexagons that are mostly white belong to neither, and I determine the letter by passing the masked image to Tesseract
  • Rather than searching for the icon that denotes the capital, I count white pixels in each red and blue hexagon. e.g., a hexagon that is mostly red but has a significant white space is the red capital.

Now I can find possible words, but only some of them are useful. Also, there are typically thousands of candidates. In order to consider word connectedness, I find the centroids of each hexagon and estimate their position on a hexagonal grid. I've encountered hexagonal coordinate systems in wireless communications, but I didn't know about the Q*bert equivalence. (There's also a great animation here, search for "convert to cube coordinates").

Here is my derivation for the mapping from rectangular to hexagonal coordinates, given the hexagonal side length (estimated from detected contours) and an origin (arbitrarily chosen, it only needs to be relatively consistent). Also there is a spacing between hexagons in this game, which I have denoted 'b'.

Finding useful moves from game state

On Wednesday afternoon I wrote code to score candidate words. It is trivial to find all possible words, given a list of letters and a dictionary. However as mentioned above, a word's length is not the most useful indicator of its fitness as a move in the game. For a played word, letters that are connected to the player's territory will become a part of it. Isolated letters can be used to construct a word, but will not become part of player territory. Also, if connected tiles in a word are adjacent to enemy territory, the opposing player will lose that territory. Finally, it is frequently possible to create the same word using differently-located tiles, and this has a strategic impact (so words are not unique, combinations of tiles are).

I score candidate word choices by counting the length of the word, number of tiles it will add to player territory, number of tiles it will remove from enemy territory, and whether one of those tiles is the enemy capital (thus granting the player an extra turn, usually allowing a win).2

I determine connectedness of a candidate word by considering the list of all currently-owned tiles. For each of these, I check each of the six adjacent tiles. If it is part of the candidate word, add to the list of owned tiles, and note that I've visited it. Iterate until I am done with the list of owned tiles. The result is a list of all connected tiles in the candidate word, which I then use to check enemy player adjacency. For each candidate word a vector of these score variables is generated, and I do some rudimentary sorting to produce a suggestion, such as this suggestion for the red player:3

word:    retrenching, territory gain   10, enemy territory loss    6
word:    incremented, territory gain   10, enemy territory loss    5
word:     converting, territory gain   10, enemy territory loss    5
word:     converting, territory gain   10, enemy territory loss    5
word:     monteverdi, territory gain   10, enemy territory loss    5
word:     reentering, territory gain    9, enemy territory loss    6
word:    retrenching, territory gain    9, enemy territory loss    6
word:    retrenching, territory gain    9, enemy territory loss    6
word:      comintern, territory gain    9, enemy territory loss    5


  1. I'm terrible at this game, though. Scrabble too. 

  2. There are other variables to consider such as protecting one's own capital, avoiding being the player to bridge the gap, and positioning (ie gaining tiles in the center is more important than gaining isolated tiles which have no path to the enemy player). My thought is to construct a feature vector with these variables, weight them by some vector $\alpha$, and use a genetic algorithm to pit several of these automated players against each other in order to learn $\alpha$. I may never get around to it. 

  3. Although here I'd rather play "incremented" in order to gain the 'm' tile, protecting the red capital. 

Written by Ian Kilgore in computer vision on Thu 04 June 2015. Tags: computer math, hexagons, capitals,