Wordle – Calculations & starting words

At the start of this year I was introduced to Wordle and took to it immediately. It is a simple word game where you try to guess a five-letter word. Every time you guess the wrong word, you get information about which letters are used in the correct word and where they are or aren’t used.

I didn’t need to play it for long before I started thinking about the statistics and strategies. Specifically, I started to think about:

  • How common are words with duplicate letters?
  • How common is each letter?
  • How common is each letter per slot?
  • And if I figure out the above, can I find a few good starting words?

Luckily, the game is 100% JavaScript, which makes it easy to analyze.

Getting the words

Before I start answering the questions, I will quickly go through how I obtained the words to analyze. As I mentioned, the game is made with JavaScript, which means the source code and the word list are viewable in the web browser.

The steps here will be very general as web code can change at any point, making detailed steps obsolete.

The first step is to right-click on the page and select “View Source” (or a similarly named option). Then look for links to files ending in “.js”. Click each one until you find one that has a long list of words inside it.

The word list in the Wordle script

Usually, JavaScript is “simplified” (minified) to make it smaller, which means it’s hard to read without lots of editing. But in this case we don’t need to understand the code, we just want the words, and they can easily be found by looking through the code.
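If you would rather not copy the words out by hand, a small script can pull them from the downloaded file. The following is a minimal sketch in Python, assuming the word lists appear in the script as plain array literals of quoted, lowercase five-letter words. The function name, the sample string and the variable names inside it are made up for illustration, and the real script may be laid out differently.

```python
import re

def extract_word_lists(js_source: str, min_words: int = 3) -> list[list[str]]:
    """Return every array literal that holds only quoted five-letter words."""
    # Matches e.g. ["cigar","rebut","sissy"]; assumes lowercase words in double quotes.
    array_pattern = re.compile(r'\[\s*"[a-z]{5}"(?:\s*,\s*"[a-z]{5}")*\s*\]')
    found = []
    for match in array_pattern.finditer(js_source):
        words = re.findall(r'"([a-z]{5})"', match.group(0))
        if len(words) >= min_words:  # skip tiny arrays that are unlikely to be word lists
            found.append(words)
    return found

# Hypothetical stand-in for the minified script; the real file is far longer.
sample_js = 'var a=1,La=["cigar","rebut","sissy"],Ta=["aahed","aalii","aargh","abaca"];function f(){}'

word_lists = extract_word_lists(sample_js)
# Assumption: the first list found holds the answers, the second the extra valid guesses.
answers, extra_guesses = word_lists[0], word_lists[1]
print(len(answers), len(extra_guesses))
```

Which list is which has to be checked by eye. In my download the list of answers was the shorter of the two.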

Getting answers to my questions

When I first downloaded the word list (2022-02-20), Wordle had 2309 possible answers and an additional 10638 valid words to guess/probe with. All statistics below are derived from that download.

How common are words with duplicate letters?

747 of the possible answers have one or more duplicate letters. That is about one third of the answers (32%).

How often each letter appears as a duplicate can be seen below.

A  69    B  13    C  29    D  22    E 172    F  22
G  11    H  10    I  24    J   0    K   8    L  71
M  15    N  23    O  81    P  18    Q   0    R  60
S  49    T  61    U  10    V   4    W   1    X   0
Y   8    Z   5
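Counts like these can be produced with a few lines of Python. This is a minimal sketch, not necessarily the exact code I used. It assumes that a letter's duplicate count is the number of words in which that letter appears more than once, and the sample list is only a handful of words for illustration.

```python
from collections import Counter

def duplicate_stats(words):
    """Return (number of words with a repeated letter, per-letter count of such words)."""
    words_with_duplicates = 0
    per_letter = Counter()
    for word in words:
        letter_counts = Counter(word)
        repeated = [letter for letter, n in letter_counts.items() if n > 1]
        if repeated:
            words_with_duplicates += 1
        # Each repeated letter is counted once per word, even if it appears three times.
        per_letter.update(repeated)
    return words_with_duplicates, per_letter

# Tiny illustrative sample, not the real answer list.
sample = ["sissy", "cigar", "eerie", "llama", "crane"]
total, per_letter = duplicate_stats(sample)
print(total, dict(per_letter))
```

A word such as "llama" adds one to both L and A, which is why the per-letter counts can add up to more than the number of words with duplicates.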

How common is each letter?

In order from most common to least: E, A, R, O, T, I, L, S, N, U, C, Y, H, D, P, G, M, B, F, K, W, V, X, Z, Q and lastly J.

Letter   # of words   % of words
A        906          39%
B        266          12%
C        446          19%
D        370          16%
E        1053         46%
F        206          9%
G        299          13%
H        377          16%
I        646          28%
J        27           1%
K        202          9%
L        645          28%
M        298          13%
N        548          24%
O        672          29%
P        345          15%
Q        29           1%
R        835          36%
S        617          27%
T        667          29%
U        456          20%
V        148          6%
W        193          8%
X        37           2%
Y        416          18%
Z        35           2%

(A letter is only counted once per word. Duplicate letters are not counted twice.)
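The counting rule in the note above can be sketched in Python by turning each word into a set of letters before counting. The function name and the sample words are only for illustration.

```python
from collections import Counter

def letter_word_counts(words):
    """Count, for each letter, how many words contain it at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word))  # set() drops repeats, so "sissy" adds 1 to "s", not 3
    return counts

# Tiny illustrative sample, not the real answer list.
sample = ["sissy", "cigar", "eerie", "crane"]
counts = letter_word_counts(sample)
for letter, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(letter, n, f"{n / len(sample):.0%}")
```

The percentage column is simply the count divided by the number of answers. For example, A appears in 906 of 2309 answers, which rounds to 39%.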

How common is each letter per slot?

In order from most common to least:

Slot 1:  S, C, B, T, P, A, F, G, D, M, R, L, W, E, H, V, O, N, I, U, Q, J, K, Y, Z, X

Slot 2: A, O, R, E, I, L, U, H, N, T, P, W, C, M, Y, D, B, S, V, X, G, K, F, Q, J, Z

Slot 3: A, I, O, E, U, R, N, L, T, S, D, G, M, P, B, C, V, Y, W, F, K, X, Z, H, J, Q

Slot 4: E, N, S, A, L, I, C, R, T, O, U, G, D, M, K, P, V, F, H, W, B, Z, X, Y, J, Q

Slot 5: E, Y, T, R, L, H, N, D, K, A, O, P, M, G, S, C, F, W, B, I, X, Z, U, J, Q, V

Letter   Slot 1       Slot 2       Slot 3       Slot 4       Slot 5
A        140 (6%)     304 (13%)    306 (13%)    162 (7%)     63 (3%)
B        173 (7%)     16 (1%)      56 (2%)      24 (1%)      11 (0%)
C        198 (9%)     40 (2%)      56 (2%)      150 (6%)     31 (1%)
D        111 (5%)     20 (1%)      75 (3%)      69 (3%)      118 (5%)
E        72 (3%)      241 (10%)    177 (8%)     318 (14%)    422 (18%)
F        135 (6%)     8 (0%)       25 (1%)      35 (2%)      26 (1%)
G        115 (5%)     11 (0%)      67 (3%)      76 (3%)      41 (2%)
H        69 (3%)      144 (6%)     9 (0%)       28 (1%)      137 (6%)
I        34 (1%)      201 (9%)     266 (12%)    158 (7%)     11 (0%)
J        20 (1%)      2 (0%)       3 (0%)       2 (0%)       0 (0%)
K        20 (1%)      10 (0%)      12 (1%)      55 (2%)      113 (5%)
L        87 (4%)      200 (9%)     112 (5%)     162 (7%)     155 (7%)
M        107 (5%)     38 (2%)      61 (3%)      68 (3%)      42 (2%)
N        37 (2%)      87 (4%)      137 (6%)     182 (8%)     130 (6%)
O        41 (2%)      279 (12%)    243 (11%)    132 (6%)     58 (3%)
P        141 (6%)     61 (3%)      57 (2%)      50 (2%)      56 (2%)
Q        23 (1%)      5 (0%)       1 (0%)       0 (0%)       0 (0%)
R        105 (5%)     267 (12%)    163 (7%)     150 (6%)     212 (9%)
S        365 (16%)    16 (1%)      80 (3%)      171 (7%)     36 (2%)
T        149 (6%)     77 (3%)      111 (5%)     139 (6%)     253 (11%)
U        33 (1%)      185 (8%)     165 (7%)     82 (4%)      1 (0%)
V        43 (2%)      15 (1%)      49 (2%)      45 (2%)      0 (0%)
W        82 (4%)      44 (2%)      26 (1%)      25 (1%)      17 (1%)
X        0 (0%)       14 (1%)      12 (1%)      3 (0%)       8 (0%)
Y        6 (0%)       22 (1%)      29 (1%)      3 (0%)       364 (16%)
Z        3 (0%)       2 (0%)       11 (0%)      20 (1%)      4 (0%)
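A per-slot table like this one can be built by keeping one counter per position. A minimal sketch in Python, with made-up names and a tiny sample list for illustration:

```python
from collections import Counter

def slot_counts(words, word_length=5):
    """Return one Counter per slot, mapping each letter to how often it sits there."""
    slots = [Counter() for _ in range(word_length)]
    for word in words:
        for position, letter in enumerate(word):
            slots[position][letter] += 1
    return slots

# Tiny illustrative sample, not the real answer list.
sample = ["slate", "crony", "stony", "crane"]
slots = slot_counts(sample)
for number, counter in enumerate(slots, start=1):
    print(f"Slot {number}:", dict(counter))
```

Since every word puts exactly one letter in each slot, each column must add up to the number of words, which is a handy sanity check on the output.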

Starting words

The above information is interesting, but the ultimate question is which words I can start with to maximize the chance of getting useful information.

Below I have sets of words based on the above statistics.

Starting words based on letter popularity

If we go by which letters are the most popular, we can maximize the chance of finding letters used in the answer by using one of the two following word sequences.

                  1st word   2nd word   3rd word
All valid words   OATER      LYSIN      BUMPH
Answers only      IRATE      LOCUS      NYMPH

The first sequence is built from all valid words and the second from only the words that are possible answers. The first sequence uses more popular letters, but none of its words has any chance of being the answer.
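One way to build such a sequence is a simple greedy search: score each word by adding up how many answers contain each of its letters, pick the best, then repeat while skipping any word that reuses a letter. The sketch below is in Python and illustrates that idea. It is not necessarily the exact method behind the words above. Skipping words with repeated letters and breaking ties by list order are assumptions, and the sample list is tiny.

```python
from collections import Counter

def pick_sequence(candidates, answers, picks=3):
    """Greedily pick words whose letters are popular and not yet used."""
    letter_counts = Counter()
    for word in answers:
        letter_counts.update(set(word))  # each letter counted once per answer

    used = set()
    sequence = []
    for _ in range(picks):
        best_word, best_score = None, -1
        for word in candidates:
            letters = set(word)
            # Assumption: skip words with repeated letters or letters already probed.
            if len(letters) < len(word) or letters & used:
                continue
            score = sum(letter_counts[letter] for letter in letters)
            if score > best_score:  # ties keep the earlier word in the list
                best_word, best_score = word, score
        if best_word is None:
            break
        sequence.append(best_word)
        used |= set(best_word)
    return sequence

# Tiny illustrative sample, not the real word lists.
sample = ["irate", "locus", "nymph", "crane", "slate", "eerie"]
print(pick_sequence(sample, sample))
```

Passing all valid words as `candidates` and only the answers as `answers` corresponds to the "All valid words" row. Passing the answers for both corresponds to the "Answers only" row.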

Starting words based on letter popularity in each slot

If we want to factor in slot popularity, we can use the following word sequences, not just to find likely letters but also to maximize the chance of placing them in the correct slot.

                  1st word   2nd word   3rd word
All valid words   SAINE      BORTY      FLUMP
Answers only      SLATE      CRONY      HUMID
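A slot-aware score can be sketched in Python as the sum of the per-slot counts for each letter in its own position. This is one plausible scoring, stated as an assumption rather than the definitive recipe, and the helper names and sample list are made up for illustration.

```python
from collections import Counter

def slot_table(answers, word_length=5):
    """One Counter per slot: how often each letter sits in that position."""
    slots = [Counter() for _ in range(word_length)]
    for word in answers:
        for position, letter in enumerate(word):
            slots[position][letter] += 1
    return slots

def slot_score(word, slots):
    """Add up how common each letter of the word is in its own slot."""
    return sum(slots[position][letter] for position, letter in enumerate(word))

def best_by_slot(candidates, slots, used=frozenset()):
    """Best-scoring word with no repeated letters and no letters from `used`."""
    usable = [w for w in candidates if len(set(w)) == len(w) and not set(w) & used]
    return max(usable, key=lambda w: slot_score(w, slots), default=None)

# Tiny illustrative sample, not the real answer list.
sample = ["slate", "crony", "humid", "crane", "stone", "shale"]
slots = slot_table(sample)
first = best_by_slot(sample, slots)
second = best_by_slot(sample, slots, used=set(first))
print(first, second)
```

Under this scoring and with the counts from the per-slot table, SLATE would score 365 + 200 + 306 + 139 + 422 = 1432.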

Starting words based on high exclusion of answers

While working on the code for analyzing the words and letter frequencies, I thought of lots of other ways to get good starting words. One way I liked was picking the words that, if none of their letters were correct, would exclude the most words from the list of possible answers. That line of thinking gave me the following word sequences.

                  1st word        2nd word   3rd word
All valid words   STOAE / TOEAS   NIRLY      BUMPH
Answers only      ARISE / RAISE   CLOUT      NYMPH
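The exclusion idea can be sketched in Python like this: for each candidate, count how many remaining answers share at least one letter with it, pick the word with the highest count, then assume the guess came back with no correct letters and drop those answers before picking the next word. The function names and the sample list are made up for illustration, and details such as tie-breaking are assumptions.

```python
def exclusion_count(word, answers):
    """How many answers would be ruled out if no letter of `word` were correct."""
    letters = set(word)
    return sum(1 for answer in answers if letters & set(answer))

def exclusion_sequence(candidates, answers, picks=3):
    """Greedily pick words, each time assuming the previous guess had no hits."""
    remaining = list(answers)
    sequence = []
    for _ in range(picks):
        if not remaining:
            break
        # max() keeps the first word on ties, so list order decides between anagrams.
        best = max(candidates, key=lambda w: exclusion_count(w, remaining))
        sequence.append(best)
        # Keep only the answers that share no letter with the guess.
        remaining = [a for a in remaining if not set(best) & set(a)]
    return sequence

# Tiny illustrative sample, not the real answer list.
sample = ["arise", "raise", "erase", "clout", "cloud", "nymph"]
print(exclusion_sequence(sample, sample))
```

Anagrams always get the same count because they contain the same letters, which is why the first column lists pairs such as ARISE / RAISE.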

Conclusion

Analyzing the words was a bit of fun for me and it did give me usable results. My current favorite sequence of starting words is the last one presented: Arise, Clout and Nymph. They have been serving me very well. /Henrik
