Generating an image using random triangles via a genetic algorithm

Показать описание

A quick test of a genetic algorithm code. Not real time, the original runs took about 50 mins each.

GAs are straightforward to code. This one is about 200-250 lines of Python/OpenCV. The steps are intuitive: decide on a 'genome' coding system, randomize a batch of individuals, generate an image from each individual genome, score each image for similarity to the reference image, randomize mutations and crossovers based on scoring fitness, repeat the generation and scoring. With minor changes to the 'genome' and mutation code, the code works on other polygons and shapes also.

The triangles are coded with six 'genes' for the location of the three vertices (x, y), three for color (RGB), and an alpha factor for blending. I fixed the alpha blending on this run to 0.8 (no specific reason on number, just better/faster results). Population size is 40 individuals. I restricted the color set based on the colors of the reference image to limit the search space and speed up the search. [Full randomization can be done better in HSV or Lab color spaces. RGB does not really lend itself easily to sensible color mutations.]

I retain the best fit and mutate 20 clones of the elite individual, and crossover the previous generation for the remaining 20 slots. By mutating the best fit without crossover, this is not a strict GA implementation. However, I noticed it tended to form faster images vs pure cross-overs only. Cross-overs are done over a random fixed point (left-right gene splicing). I originally had a fully random cross-over across the genome, but it tended to meander, likely due to the alpha blending, which can swing results abruptly if the top layers change.

On each cycle, a random set of crossed individuals are mutated. Out of these, a random set of triangles are mutated per affected individual. For each triangle, a random set of features are mutated, and the mutation amount is also randomly determined. The color randomization is based on a uniform probability (of colors that are in the reference image).

The first four tests (A through D) used 300 triangles, the last four used 150. Each group had different starting conditions for the randomized size range of the triangles. This is only for 500 generations, running slightly less than an hour for each run. Longer runs naturally produce better images, but finer progress takes longer. OpenCV's blending is known to be inefficient and is the biggest bottleneck on this code. The scoring is surprisingly not a huge time sink. I use Euclidian distance over all three channels and is vectorized into a single line for all images. I skip the square root since the ranking is the same without the expensive operation.