Faster Wordle Estimators
Two years ago, the game of Wordle was released.
I wrote an article back then on finding good words to solve it,
but I focused on implementing a good-enough algorithm,
which was a bit slow, and switched programming languages
to try to tame its runtime. I ended it saying: “Algorithms do matter too.”
It would be nice if, instead of language-specific tweaks,
we found an algorithm that made the runtime plummet
from half an hour to a second.
It would be nicer if we created an algorithm that achieved perfect play.
It would be even neater if that algorithm were so generic
that it could solve a whole class of similar games.
Statistical §
The strategy we left off with in the previous article had merit:
it computed the average number of possible solutions left over
after making a particular guess.
Given the set of solutions S, we compute for each guess g:
$$R(g) \overset{\text{def}}{=} \frac{1}{|S|} \sum_{s \in S} \bigl|\{\, s' \in S \mid C(s' \mid g) = C(s \mid g) \,\}\bigr|$$
where C(s∣g) is the constraint that Wordle unveils
if the secret solution is s and the guessed word is g.
For instance, if the secret word is “craft”,
and we input the guess “salet”, Wordle will yield
the constraint ⬛🟨⬛⬛🟩 (second letter misplaced,
last letter correct, other letters absent from the secret word).
We can rephrase the formula using the Iverson bracket
(which is 1 if the condition is true, 0 otherwise):
$$R(g) = \frac{1}{|S|} \sum_{s \in S} \sum_{s' \in S} [C(s' \mid g) = C(s \mid g)]$$
Let’s now do a magic trick.
We can insert a sum over constraints
(defining C as the set of constraints)
if that sum only contains a single term.
It is the case below,
since a guess always produces the same constraint for a given solution:
$$R(g) = \frac{1}{|S|} \sum_{s \in S} \sum_{c \in C} [C(s \mid g) = c] \sum_{s' \in S} [C(s' \mid g) = C(s \mid g)]$$
We can now replace $C(s \mid g)$ with $c$ inside the last bracket
(the two are equal whenever the term is nonzero) and swap the summations:
$$R(g) = \frac{1}{|S|} \sum_{c \in C} \sum_{s \in S} [C(s \mid g) = c] \sum_{s' \in S} [C(s' \mid g) = c]$$
Finally, we notice that the innermost sum is independent of s,
so we can factorize it:
$$R(g) = \frac{1}{|S|} \sum_{c \in C} \Bigl(\sum_{s \in S} [C(s \mid g) = c]\Bigr)^2$$
That is much faster to compute:
there are only $|C| = 3^5 = 243$ possible constraints,
but there are $|S| = 3158$ word solutions.
What used to take $|S|^2$ operations
now takes $|S| + |C|$.
The computation that used to take hours in the previous article
now takes a mere second:
function average_remaining_solutions_after_guess(guess::Vector{UInt8}, solutions::Vector{Vector{UInt8}})::Float64
    # Tally how many solutions produce each of the 243 possible constraints.
    counts = zeros(Int, 243)
    for solution in solutions
        @inbounds counts[constraints(guess, solution) + 1] += 1
    end
    # R(g) = (1/|S|) × Σ_c counts[c]²
    return sum(abs2, counts) / length(solutions)
end
Hat tip to Douglas Bates from the Julia community
for figuring that out!
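As a quick sanity check of that factorization, here is a sketch with assumed names (`constraints(guess, solution)` is the feedback-encoding helper from the previous article, returning an integer in 0:242): the naive quadratic definition of R(g) should agree with the fast histogram version.

# Assumes the `constraints` helper from the previous article (integer in 0:242).
function average_remaining_solutions_slow(guess::Vector{UInt8}, solutions::Vector{Vector{UInt8}})::Float64
    # Direct O(|S|²) translation of the original definition of R(g).
    total = sum(count(s2 -> constraints(guess, s2) == constraints(guess, s), solutions)
                for s in solutions)
    return total / length(solutions)
end

# For any guess g and solution list S (up to floating-point error):
# average_remaining_solutions_slow(g, S) ≈ average_remaining_solutions_after_guess(g, S)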
From there, building an estimation of the number of remaining guesses n
is smooth sailing. We can assume that the number of remaining solutions
shrinks by a constant ratio ρ after each guess.
We have ∣S∣ solutions currently, and at the end we have only 1.
So before the last guess, we had ρ solutions,
and before the guess before that, we had $\rho^2$…
All the way back to now, where we have $\rho^{n-1} = |S|$.
Hence $n = 1 + \frac{\log |S|}{\log \rho}$.
function estimate_guesses_remaining_statistically(guess::Vector{UInt8}, solutions::Vector{Vector{UInt8}})::Float64
    avg_remaining = average_remaining_solutions_after_guess(guess, solutions)
    nsols = length(solutions)
    # If the guess is itself a candidate solution, it wins outright with probability 1/|S|.
    prob_sol = if guess in solutions
        1 / nsols
    else
        0
    end
    # n = 1 + log|S| / log(ρ), with ρ = |S| / R(g).
    expected_guesses = 1 + log(nsols) / log(nsols / avg_remaining)
    return prob_sol * 1 + (1-prob_sol) * expected_guesses
end
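As a usage sketch (with hypothetical variable names: `allowed_guesses` and `solutions` would be word lists encoded as byte vectors, as above), a single greedy step just picks the guess with the lowest estimate:

# `allowed_guesses` and `solutions` are assumed word lists (Vector{Vector{UInt8}}).
# Pick the guess whose statistical estimate of remaining guesses is lowest.
best_guess = argmin(g -> estimate_guesses_remaining_statistically(g, solutions), allowed_guesses)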
Entropic §
Of course, there are multiple ways to estimate the number of remaining guesses.
Information theory provides a framework for analysing
just how much information we gain from the constraint that Wordle reveals.
In information theory, we deal with symbols
and study the probability that they appear.
In our case, we have one symbol for each constraint.
The probability that a constraint appears is
$$p(c) = \frac{|\{\, s \in S \mid C(s \mid g) = c \,\}|}{|S|}.$$
The expected amount of information gained, in bits, by seeing a constraint
is the entropy of the constraint distribution:
$$H \overset{\text{def}}{=} -\sum_{c \in C} p(c) \log_2 p(c)$$
At the end, once there are zero bits of information left to gain,
we know the solution for sure,
but we still have to submit it as a guess in order to win.
One guess before that, we were 2 guesses away from a win,
and assuming we gain the same number of bits with each guess,
we had H bits of information left to gain.
One guess before that, we were 3 guesses away from the end,
and had 2H bits left to gain, and so forth.
At the start, the amount of information we still needed was
$I \overset{\text{def}}{=} \log_2 |S| = (n-1) \times H$.
Hence we obtain:
$$n = \frac{-\log_2 |S|}{\sum_{c \in C} p(c) \log_2 p(c)} + 1 = \frac{\log_2 |S|}{H} + 1$$
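To get a feel for the scale involved (the 5-bit entropy below is a made-up figure, not a measured one), a first guess whose feedback distribution carried $H = 5$ bits would be estimated at
$$n = 1 + \frac{\log_2 3158}{5} \approx 1 + \frac{11.6}{5} \approx 3.3 \text{ guesses.}$$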
This approach is inspired by Grant Sanderson,
and has the redeeming quality of being efficient to compute
even without casting mathematical spells on the formula:
function estimate_guesses_remaining_entropically(guess::Vector{UInt8}, solutions::Vector{Vector{UInt8}})::Float64
    # Tally how many solutions produce each of the 243 possible constraints.
    counts = zeros(Int, 243)
    for solution in solutions
        @inbounds counts[constraints(guess, solution) + 1] += 1
    end
    nsols = length(solutions)
    # Entropy of the constraint distribution: H = -Σ p(c) log₂ p(c).
    entropy = 0.0
    for count in counts
        if count == 0
            continue
        end
        prob = count / nsols
        entropy -= prob * log2(prob)
    end
    # n = log₂|S| / H + 1.
    expected_guesses = log2(nsols) / entropy + 1
    # If the guess is itself a candidate solution, it wins outright with probability 1/|S|.
    prob_sol = if guess in solutions
        1 / nsols
    else
        0
    end
    return prob_sol * 1 + (1-prob_sol) * expected_guesses
end
Are those estimations good, though?
Estimators always involve a tradeoff.
What matters in the end is not how good an estimator is on its own,
but how well it melds with the overall structure.
Heuristic §
The first thought you might have on how to use the estimator
is to pick its best suggested guess at each step of the game.
Here is what we would get:
Estimator   | Expected number of guesses to win
------------|-----------------------------------
Statistical | 3.6450
Entropic    | 3.5687
That is better than the human average of about 4!
The results also seem to favour the entropic estimator.
That could be due to chance, however.
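For concreteness, here is a minimal sketch of that greedy policy (with assumed names: `estimator` is one of the two functions above, `constraints` the feedback encoder, and the word lists are byte vectors as before): play the guess with the lowest estimate, observe the revealed constraint, discard inconsistent solutions, and repeat.

# Sketch only: `estimator` and `constraints` are the assumed functions described above.
function play_greedily(estimator, allowed_guesses, solutions, secret)
    guesses_taken = 0
    while true
        guesses_taken += 1
        if length(solutions) == 1
            # With a single candidate left, just play it.
            guess = solutions[1]
        else
            # Otherwise, pick the guess the estimator likes best.
            guess = argmin(g -> estimator(g, solutions), allowed_guesses)
        end
        guess == secret && return guesses_taken
        # Keep only the solutions consistent with the revealed constraint.
        feedback = constraints(guess, secret)
        solutions = [s for s in solutions if constraints(guess, s) == feedback]
    end
end

Averaging that function over every possible secret is how one would reproduce figures like the ones above (modulo the exact word lists).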
(By the way, beware of comparing any of the figures I give to other articles.
Bot performances depend on the word lists,
which the New York Times has changed multiple times,
making the game slightly harder.
Meanwhile, human statistics often suffer from selection bias:
people who fail to solve the puzzle in 6 guesses
cannot report the true number of guesses they would have needed,
on top of the obvious bias that people only publish good results.)
Accuracy §
Heuristics give a good-enough strategy.
We want to go further than “good enough” though.
We target optimal play.
With that in mind, the true intent of the estimator
is to predict the performance of perfect play.
Let’s compare the estimation to
the true optimal figure.
Estimator   | Mean squared error
------------|-------------------
Statistical | 0.9932
Entropic    | 0.1044
The entropic estimator wins again; that said, surprisingly,
if we restrict the analysis to the 100 best guesses,
which are the ones most likely to be useful,
we get a reversed picture, which may matter later:
Estimator   | Mean squared error
------------|-------------------
Statistical | 0.0266
Entropic    | 0.2533
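The measurement itself is straightforward; here is a sketch, assuming a hypothetical `optimal_guess_counts` dictionary that maps each guess to its expected number of guesses under perfect play:

# `optimal_guess_counts` is a hypothetical precomputed map of optimal expected guess counts.
function estimator_mse(estimator, guesses, solutions, optimal_guess_counts)
    return sum((estimator(g, solutions) - optimal_guess_counts[g])^2 for g in guesses) /
           length(guesses)
end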
Convergence §
Here’s a sneak peek at how each estimator fares
within a self-optimizing algorithm.
That algorithm iterates until it finds the optimal choices
throughout the whole game, for all possible solutions.
What we plot is the optimizer’s best-known playing strategy.
It is conservative, not an estimation:
it starts out on iteration zero (not depicted) at 3158 guesses
(brute-forcing through all possible words).
Whenever it has yet to explore a guess, it assumes brute-force,
thus offering an upper bound on the optimal strategy
which eventually converges to it.
We want that algorithm to converge to the optimal strategy fast,
and the number of iterations is a good proxy
for how long it takes to achieve a certain level.
The statistical estimator starts out with an advantage,
likely because it is more accurate for better guesses.
But that makes it overconfident in its accuracy;
meanwhile the entropic estimator’s evaluated uncertainty
encourages it to search through a wider array of options,
and it eventually overtakes the statistical one permanently,
finding the optimal strategy first.
Coincidental circumstances §
Was it obvious from the get-go that the entropic estimator would end up on top?
It feels to me like there is a bit of luck involved:
- Its formula lends itself to a fast algorithm without a mathematical tour de force;
- Its higher accuracy overall ends up being beneficial,
despite being less accurate on the guesses that matter,
likely because it predicts pairwise rankings better;
- By sheer luck, on the current wordlist, its top guess is the optimal guess.
In the next article, let’s dig further into the weeds
of building a fully game-agnostic self-optimizing algorithm
that finds the best strategy fast,
using the entropic estimator as a foundation.