Playing Wordle with a Computer

Wordle is a word game in which you must guess a hidden five-letter word within six guesses. After each guess you are given hints about which letters are correct and which are not: the square under each letter turns green, yellow, or grey. If the letter is green, it is in the answer and in the correct position. If the letter is yellow, it is in the answer but in a different position. If the letter is grey, it is not in the answer at all. For instance, if your guess is NOTED and the answer is MONEY, the “N” is colored yellow, the “O” and the “E” are green, and the “T” and the “D” are grey. You can only play the official Wordle game once a day.
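The coloring rule can be sketched in a few lines of Python. This is only an illustration: the function name `score` and the G/Y/- encoding of green, yellow, and grey are assumptions made here, not something taken from the game's own code. Repeated letters are handled by marking greens first and then handing out yellows only while unmatched copies of the letter remain in the answer.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Return one character per letter: G (green), Y (yellow), or - (grey)."""
    result = ["-"] * len(guess)
    # Answer letters that were not matched exactly; these can still earn a yellow.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: exact matches are green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    # Second pass: a non-green letter is yellow while unmatched copies remain.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

print(score("NOTED", "MONEY"))  # YG-G-
```

The printed pattern matches the example above: yellow “N”, green “O” and “E”, grey “T” and “D”.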

The game was created by Brooklyn software engineer Josh Wardle for his girlfriend. In the later years of the pandemic, it was quite popular, filling Facebook and Twitter feeds with colored blocks.

Figure 1: From Google Trends data

To be honest, I wasn’t that into the game. I played occasionally, sometimes winning and sometimes losing, with no real stake in the outcome because I rarely posted my wins. I didn’t really think it was interesting until 3Blue1Brown, a mathematics YouTuber, posted the video below:

The point of this video is to use information theory to find the first word that gives the most information, on average, across all possible answer words. The answer that he comes up with in the video is CRANE. He later made another video in which he explains that he made a slight mistake, and SALET, TRACE, and CRATE become the top contenders for the first guess.
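To make "most information" concrete, here is a rough sketch of the expected information of a guess, measured in bits. It assumes every candidate answer is equally likely and uses a tiny made-up answer list; the names `score` and `expected_information` are illustrative and are not taken from the video's code.

```python
import math
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Wordle feedback as a string of G (green), Y (yellow), - (grey)."""
    result = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def expected_information(guess: str, answers: list[str]) -> float:
    """Entropy (in bits) of the feedback pattern, assuming uniform answers."""
    patterns = Counter(score(guess, a) for a in answers)
    n = len(answers)
    # A pattern seen c times has probability c/n and carries log2(n/c) bits.
    return sum((c / n) * math.log2(n / c) for c in patterns.values())

answers = ["MONEY", "HONEY", "NOTED", "GAMER"]  # hypothetical toy list
print(expected_information("NOTED", answers))  # 1.5
```

A guess that splits the remaining answers into many small groups scores higher, which is why the best opening word is the one whose feedback patterns are spread most evenly.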

In the end, by running a simulation against all 2,300 Wordle answers (with a word list of 13,000), his best average number of guesses was around 3.43. He goes on to explain that it is impossible to get an average score lower than 3 because it takes more than 3 guesses to reduce the uncertainty enough to guarantee the correct guess.

How Word Data Can Help You Play

My initial reaction to the video wasn’t to make my own algorithm but to use the word data to help me play the game better, and I have come up with some insights. Should you use ADIEU as the first guess? I don’t think so, mostly because the answer will most likely have a consonant-to-vowel ratio of 3 to 2. That means three consonants and two vowels in a five-letter word. So the question then becomes: which vowels should you start with?
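One way to check the 3-to-2 claim against a word list is to tally how many vowels each word contains. The sketch below uses a handful of sample words rather than the real Wordle list, and it assumes “Y” counts as a consonant; both choices are illustrative.

```python
from collections import Counter

VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant

def vowel_count(word: str) -> int:
    """Number of vowels in a word."""
    return sum(ch in VOWELS for ch in word.upper())

def vowel_count_distribution(words: list[str]) -> Counter:
    """Map 'number of vowels' -> 'how many words have that many'."""
    return Counter(vowel_count(w) for w in words)

sample = ["MONEY", "NOTED", "ADIEU", "CRANE", "GAMER"]  # hypothetical sample
print(vowel_count_distribution(sample))
```

On this sample, four of the five words have exactly two vowels, and ADIEU is the outlier with four.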

Doing a frequency analysis, we can see that “E” and “A” are the most probable letters, which strengthens the case for the words 3Blue1Brown found: TRACE, CRATE, and SALET. So, if you are going to pick a starting word, pick one with 3 consonants and 2 vowels, and make sure those vowels are “A” and “E”.
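A frequency analysis of this kind can be sketched as follows. The word list here is a small stand-in, so the exact numbers say nothing about the real Wordle list; the function name `letter_frequencies` is made up for the example.

```python
from collections import Counter

def letter_frequencies(words: list[str]) -> dict[str, float]:
    """Relative frequency of each letter across all words, most common first."""
    counts = Counter(ch for w in words for ch in w.upper())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.most_common()}

sample = ["TRACE", "CRATE", "SALET", "MONEY"]  # hypothetical sample
freqs = letter_frequencies(sample)
for letter, p in list(freqs.items())[:3]:
    print(letter, round(p, 2))
```

Running the same function over the full word list would give the letter probabilities behind the analysis described above.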

The next question I asked was which letter is the best letter for each position, so I did the following frequency analysis.

For the first and last letters of a word, the best letter to guess is “S”. For the second letter the best options are “A” and “O”, for the third slot “A” and “R” are the best options, and the fourth slot greatly favors “E”.
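The per-position analysis is the same counting exercise done once per slot. Here is a minimal sketch, again on a small hypothetical sample instead of the real word list, with `positional_counts` as an illustrative name.

```python
from collections import Counter

def positional_counts(words: list[str], length: int = 5) -> list[Counter]:
    """One Counter per position, counting which letters appear in that slot."""
    tables = [Counter() for _ in range(length)]
    for w in words:
        if len(w) != length:
            continue  # skip anything that is not a five-letter word
        for i, ch in enumerate(w.upper()):
            tables[i][ch] += 1
    return tables

sample = ["SORES", "SALET", "GAMER", "GAZER", "CRANE"]  # hypothetical sample
tables = positional_counts(sample)
for i, table in enumerate(tables, start=1):
    print(i, table.most_common(2))
```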

Not all letters are created equal! I once played a game where my final guess was GAZER and the answer was GAMER. If you are down to one letter, or are choosing between letters, you should always go with the more frequently occurring letter.

Playing with the Computer

First, something that a computer can do that a human can’t is filter the whole word list based on guesses. For example, let’s go back to our example from the beginning with NOTED and MONEY. Once the computer knows that “T” is a grey letter, it can filter out all the words containing the letter “T” and then choose from the list of filtered words. This is the first thing that I built in my algorithm and, to be honest, due to my poor programming ability it took me a couple of months to code correctly.
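A minimal sketch of this filtering step follows. It assumes the feedback is encoded as a string of G/Y/- characters, an encoding chosen here for illustration and not necessarily the one the original program used. Rather than handling grey, yellow, and green letters one rule at a time, it keeps every word that would have produced the same feedback had it been the answer, which covers the grey-letter rule as well as the other two colors.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Wordle feedback as a string of G (green), Y (yellow), - (grey)."""
    result = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def filter_candidates(candidates: list[str], guess: str, pattern: str) -> list[str]:
    """Keep only the words consistent with the feedback seen for this guess."""
    return [w for w in candidates if score(guess, w) == pattern]

words = ["MONEY", "HONEY", "NOTED", "GAMER", "TONER"]  # hypothetical list
print(filter_candidates(words, "NOTED", "YG-G-"))  # ['MONEY', 'HONEY']
```

TONER is dropped because it contains a “T”, GAMER because it has no “O” in second place, and NOTED itself because it would have scored all green.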

I used this basic filtering method to create a benchmark for my other algorithms. It works by choosing both the hidden word and the guess word at random from the 13,000-word Wordle list. Each guess is scored (meaning each letter is assigned a color), and the score is then used to filter words from the Wordle list. Then, while the hidden word remains the same, another random guess is chosen and scored, and the whole process repeats until the game is won, that is, until the score is all green. I ran this process 1,000 times, and with the entire Wordle word list it took about an hour to process. The results are below.
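The benchmark loop can be sketched like this. It is an illustration under stated assumptions and not the original program: the word list is a short placeholder, each new guess is drawn from the filtered candidates, there is no six-guess cap, and the names `play_random` and `benchmark` are invented for the example.

```python
import random
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Wordle feedback as a string of G (green), Y (yellow), - (grey)."""
    result = ["-"] * len(guess)
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def play_random(answer: str, words: list[str], rng: random.Random) -> int:
    """Guess at random from the remaining candidates; return guesses used."""
    candidates = list(words)
    guesses = 0
    while True:
        guess = rng.choice(candidates)
        guesses += 1
        pattern = score(guess, answer)
        if pattern == "G" * len(answer):
            return guesses
        # Keep only words that would have produced the same feedback.
        candidates = [w for w in candidates if score(guess, w) == pattern]

def benchmark(words: list[str], games: int = 1000, seed: int = 0) -> float:
    """Average number of guesses over many games with random hidden words."""
    rng = random.Random(seed)
    total = sum(play_random(rng.choice(words), words, rng) for _ in range(games))
    return total / games

words = ["MONEY", "HONEY", "NOTED", "GAMER", "GAZER", "CRANE"]  # placeholder
print(benchmark(words, games=200))
```

The loop always terminates: a wrong guess is inconsistent with its own feedback, so it is removed by the filter, while the hidden word always survives.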

The average number of guesses is around 3.93, which means that the algorithm typically wins in about 4 guesses. I consider any algorithm with a higher average than this to be a bad algorithm.

I then went on to change the algorithm by ranking the words by the probability of drawing each of their letters individually. One could imagine randomly picking letters out of a bag (with replacement); since not every letter is equally likely to be picked, some words are more likely to be made than others.

Since each pick of a letter is independent, the probability of a word, and therefore its rank, is the product of the probabilities of its letters:

$$P(\text{word}) = \prod_{i=1}^{5} p(\text{letter}_i)$$
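As a sketch, the ranking could be computed like this. The letter probabilities are estimated from the word list itself, which is an assumption about how the bag is filled, and the four-word list is only a placeholder.

```python
import math
from collections import Counter

def letter_probabilities(words: list[str]) -> dict[str, float]:
    """Probability of drawing each letter from one shared bag of all letters."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def word_probability(word: str, probs: dict[str, float]) -> float:
    """Product of the individual letter probabilities."""
    return math.prod(probs.get(ch, 0.0) for ch in word)

def rank_by_letter_probability(words: list[str]) -> list[str]:
    """Words sorted from most to least probable under the bag model."""
    probs = letter_probabilities(words)
    return sorted(words, key=lambda w: word_probability(w, probs), reverse=True)

words = ["ESSES", "SORES", "CRANE", "MONEY"]  # hypothetical placeholder list
print(rank_by_letter_probability(words)[:2])  # ['ESSES', 'SORES']
```

Even on this toy list, the word built from repeated common letters comes out on top, because nothing in the product penalizes using the same letter more than once.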

This method produces a starting word of “ESSES”, which seems good but ends up producing unsatisfactory results.

The average number of guesses is 4.23, which is greater than the average for randomly guessing a word, so I consider this a bad algorithm. But the really important question is: why is it bad? I think it is because of a lack of letter diversity. Take the first word, “ESSES”: yes, it has a 3 to 2 consonant-to-vowel ratio, but most of its letters are the same because they are the most common ones. A better algorithm is one that leans more toward letter diversity.

That is why my next method ranks words using position. Instead of picking out of one bag with the same letter probabilities for every position, we pick out of a different bag for each position, where each letter has a different probability in each position (refer to Figure 3). This produces a first guess of “SORES”. It still uses two “S”s, but it has much better letter diversity than “ESSES”, which uses only 2 distinct letters. The results are below.
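The position-aware ranking might look like the sketch below, which keeps one probability table per slot. As before, the tables are estimated from the list being ranked and the five-word list is a placeholder, so this illustrates the idea and is not a reconstruction of the original code.

```python
import math
from collections import Counter

def positional_probabilities(words: list[str], length: int = 5) -> list[dict[str, float]]:
    """One letter-probability table ("bag") per position."""
    tables = [Counter() for _ in range(length)]
    for w in words:
        for i, ch in enumerate(w):
            tables[i][ch] += 1
    n = len(words)
    return [{ch: c / n for ch, c in table.items()} for table in tables]

def positional_word_probability(word: str, tables: list[dict[str, float]]) -> float:
    """Product over positions of the probability of that letter in that slot."""
    return math.prod(tables[i].get(ch, 0.0) for i, ch in enumerate(word))

def rank_by_position(words: list[str]) -> list[str]:
    """Words sorted from most to least probable under the per-position model."""
    tables = positional_probabilities(words)
    return sorted(words, key=lambda w: positional_word_probability(w, tables), reverse=True)

words = ["SORES", "ESSES", "CRANE", "MONEY", "GAMER"]  # hypothetical placeholder
print(rank_by_position(words)[0])  # SORES
```

A letter now only helps a word's rank if it is common in that particular slot, which is why a word like “ESSES” no longer benefits as much from repeating “S” everywhere.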

This did much better with an average number of guesses of 3.79.

References:

https://jonathanolson.net/experiments/optimal-wordle-solutions