But just because those are the most frequent letters doesn’t mean they are the best letters to call. For example, many words take the form of _ _ _ _ N _; Wheel fans know that an N in the penultimate space is a very good indicator that the word ends in I-N-G. As a result, calling G is not very useful in this situation, since you can guess it is there already anyway.
Then what are the best letters? Unfortunately, this post will not provide an answer. However, it will show one wrong way of deriving an answer, which serves as a perfect example of post-treatment bias.
First, let’s look at the data. Here is a list of non-RSTLNE letters, the frequency contestants call them, and how often they win if they called them. Keep in mind that data consists of every bonus round from 2007 to 2012, or 1166 total puzzles.
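For concreteness, here is a minimal sketch of how such a table could be tallied from raw game records. The sample data is invented; only the method matters, and the real numbers come from the 1166-puzzle archive:

```python
from collections import defaultdict

def letter_stats(games):
    """Tally call frequency and win rate for each called letter.

    games: list of (called_letters, won) pairs.
    """
    calls = defaultdict(int)
    wins = defaultdict(int)
    n = len(games)
    for called, won in games:
        for letter in called:
            calls[letter] += 1
            if won:
                wins[letter] += 1
    # (fraction of games in which the letter was called, win rate when called)
    return {c: (calls[c] / n, wins[c] / calls[c]) for c in calls}

# Invented sample, not the real dataset:
games = [("CMDA", True), ("CMDO", False), ("HGDO", True), ("VJYU", True)]
stats = letter_stats(games)
# e.g., stats["C"] == (0.5, 0.5): called in half the games, won half of those
```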
The results seem surprising. H, G, D, and O aren’t anywhere near the top. Instead, V, U, J, and Y rule the day. (Yes, yes, small sample sizes.) Maybe we all should start calling V J Y U!
Let’s talk about post-treatment bias for a moment. In the absence of a controlled, randomized experiment, our goal here was to take existing data on games played and infer which letters are the best. Our treatment, of course, is the selected letters. But this causes a problem. From Gary King’s slides, post-treatment bias occurs when:
- controlling for the consequences of treatment
- the causal ordering among predictors is ambiguous or wrong
The latter is problematic in this case. What if the letters weren’t helping players solve the puzzle but rather solving the puzzle was helping players pick the letters?
To see what I mean, let’s look through that list again. Before, I secretly hid Q, Z, and X from you. Now let’s throw them back in:
Q and Z end up on top, albeit with three observations total. Why are players undefeated when they select these letters?
Looking through the data gives the obvious answers: players already solved the puzzle before picking their letters and selected Q and Z appropriately. For example, in one of the cases, the player called Z W J I on JIGSAW PUZZLE. To say the least, I don’t think that person was randomly guessing letters.
I found a bunch of cases where this happened with V. Calling G V W I on GIVE IT A WHIRL strikes me as more than just a coincidence.
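This selection effect is easy to reproduce in a toy simulation. The parameters below are my own invention, not estimates from the data: suppose a small fraction of players have already solved the puzzle and show off with a rare letter like Z, while everyone else calls H and wins 40% of the time.

```python
import random

def simulate(n_games, p_presolve=0.05, seed=0):
    """Toy demonstration of post-treatment bias in the letter tables."""
    rng = random.Random(seed)
    tallies = {"Z": [0, 0], "H": [0, 0]}  # letter -> [calls, wins]
    for _ in range(n_games):
        if rng.random() < p_presolve:
            # Pre-solvers call Z to show off and, by construction, always win.
            tallies["Z"][0] += 1
            tallies["Z"][1] += 1
        else:
            # Everyone else calls H and wins 40% of the time.
            tallies["H"][0] += 1
            tallies["H"][1] += rng.random() < 0.4
    return {c: wins / calls for c, (calls, wins) in tallies.items() if calls}

rates = simulate(10_000)
# rates["Z"] comes out to exactly 1.0 even though Z did nothing to help anyone win
```

The table then "shows" that Z is a fantastic letter, purely because causation runs from solving the puzzle to picking the letter.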
Consequently, those tables don’t tell us much. Post-treatment bias sucks, and I will have to figure out a different way to infer the best letters to call.
Sorry this post didn’t end on a happier note. Here’s a picture of some puppies trending on /r/aww to make up for it:
C M D A? Try H G D O.
I have been watching Wheel of Fortune for more than 20 years now--my parents even tell me that the game taught me how to read. And all the while I have unquestioningly believed that the best letters to call during the bonus round are C M D and A. But watching the program last night, I realized I had no factual basis for that. It was a belief. It was not science.
So I figured I would do some quick Googling and find out what the best letters actually were. Turns out, it seems no one has figured this out yet. (The best result was some dude on Yahoo! Answers, which wasn’t exactly reassuring.)
No problem. I found this website, which archives Wheel of Fortune bonus round puzzles and other associated information. It has a complete record from 2007 to 2012, or 1166 total puzzles. I scraped the data and began my analysis. Here are some of the important findings:
1) I am not alone in my belief: C M D A are the four most frequently called letters at 64.6%, 59.9%, 57.9%, and 48.3%, respectively.
2) P H O G are the next four in order at 38.2%, 34.5%, 31.1%, and 21.0%.
3) O is the most common letter to appear in puzzles, consuming 9.5% of all letters. This just goes to show you that the bonus round puzzles are not a random sample of words from the English language–in real life, O is the fourth most common letter after E, T, and A.
4) Despite being the most common letter in English, E is the fourth most common letter in the puzzles after O, I, and A. Ostensibly, they give you R S T L N E for free because they are common letters. However, the producers intentionally pick puzzles where those letters don’t show up. Like cake, the value of R S T L N E is a lie.
5) M is an awful pick, ranking 21st on the list. It only accounts for 2.1% of the letters. Only V, J, Q, Z, and X are less frequent. No one ever calls V, J, Q, Z, or X unless they already know the answer to the puzzle and want to show off. Yet 57.9% of players pick M. Go figure.
6) H is a great selection. It has a frequency of 4.6%, placing the highest among non-R S T L N consonants. It ranks just slightly below the least frequent vowel (U, 4.7%) but higher than N (4.5%), S (3.8%) and L (3.7%).
7) If you solely want to maximize the number of letters that are revealed, H G D O is the best selection. D (3.5%) is very close to P and B (both 3.4%), so there is some wiggle room here.
To hammer home the point, the plot below shows the frequency of called letters versus what appears on the board:
The mess on the bottom left corner is the V, J, Q, Z, X trash.
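A sketch of the underlying tally: count non-RSTLNE letters across puzzle answers and take the top consonants plus the top vowel. The four-puzzle sample is invented for illustration; the real ranking (H G D O) comes from all 1166 puzzles.

```python
from collections import Counter

FREE = set("RSTLNE")

def best_picks(puzzles, n_consonants=3, n_vowels=1):
    """Rank non-RSTLNE letters by how often they appear in puzzle answers."""
    counts = Counter(
        ch for answer in puzzles for ch in answer
        if ch.isalpha() and ch not in FREE
    )
    vowels = set("AIOU")
    cons = [c for c, _ in counts.most_common() if c not in vowels]
    vows = [v for v, _ in counts.most_common() if v in vowels]
    return cons[:n_consonants] + vows[:n_vowels]

# Invented sample of bonus-round answers:
sample = ["GIVE IT A WHIRL", "JIGSAW PUZZLE", "HIGH HOPES", "GOOD DOG"]
picks = best_picks(sample)
```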
A couple of notes before I wrap this up. First, I want to emphasize the distinction between “most frequent letters” versus “best letters.” What shows up most frequently might not be the most useful in terms of actually solving the puzzle. G’s frequency might be overrated since a lot of those come from -ING suffixes, which you could reasonably guess if you see a word like _ _ _ _ _ N _. Letters like C, B, or P might have an advantage in that they could appear at the beginning of words more frequently and are thus more valuable. This is something I could check on later.
This segues nicely into the second point. There are a bunch of interesting questions we can answer now that I have this dataset. Expect more investigative posts like this in the future.
The category What Are You Doing? appears only 9 times out of the 1166 puzzles. Since this category always begins with a word ending in -ING, having the G revealed in that slot is worthless to a contestant. But even if you remove those puzzles from the sample, G ranks much higher than the nearest alternatives.
In case you missed it, last night’s Final Jeopardy was flat terrible. This was the semifinal game in the teen tournament; only the top (strictly positive) scorer advanced to the next round, and no one keeps any money. The scores were $16,400, $12,000, and $1,200. The Final Jeopardy category was capital cities. Pretend you are the leader and place your wager.
Ready? Cue music:
It’s criss-crossed by dozens of “peace walls” that separate its Catholic & Protestant neighborhoods
Was your response Dublin? Mine was, as were all of the contestants'. Unfortunately, Dublin is wrong. The correct response was Belfast.
Nothing wrong with a triple stumper, though. The wagering strategies, on the other hand, were horrible. Every contestant wagered everything. With no one coming up with the correct response, no one had any money and thus no one qualified for the finals.
This made me go insane. The leader had no reason to wager more than $7,601; such a wager ensures that the leader wins with certainty if he receives the correct response and also gives him a win against a wider variety of opposing bids, including the set of bids from the game. In game theory terms, wagering $7,601 weakly dominates wagering everything.
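The arithmetic behind that claim, using the actual scores (the third player is ignored, since $1,200 doubled cannot catch either opponent):

```python
def outcome(leader_wager, leader_right, second_wager, second_right,
            leader=16_400, second=12_000):
    """Final scores for the top two players under given wagers."""
    l = leader + leader_wager if leader_right else leader - leader_wager
    s = second + second_wager if second_right else second - second_wager
    return l, s

# $7,601 guarantees a win when the leader is right, even if second bets it all:
l, s = outcome(7_601, True, 12_000, True)
assert l > s  # 24,001 > 24,000

# And when everyone misses (as actually happened), $7,601 still wins:
l, s = outcome(7_601, False, 12_000, False)
assert l > s  # 8,799 > 0

# Wagering everything loses the all-miss case:
l, s = outcome(16_400, False, 12_000, False)
assert l == 0 and s == 0  # no strictly positive score, so no one advances
```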
I then vented in YouTube form:
Here’s a comment from the YouTube view page:
This is why the idea that people are intelligent self interested agents makes me laugh. People do this kind of thing ALL THE TIME, and it’s why economic theories that don’t account for this can’t predict [stuff].
Only he didn’t say stuff.
There are two big problems with this logic. First, rational self-interest is an assumption. We use assumptions to build theories not for their accuracy but for their usefulness. The better metric for modeling is a simple question: is this model more useful than the alternative? If yes, the model is satisfactory. If not, then use the alternative. We could discard the rationality assumption and instead use some probability distribution over rational agents and automaton agents. While this would certainly be a more realistic model, it would come at the expense of being substantially more computationally intensive without much obvious reward. We should find no inherent shame in simplicity.
Second, a good theory explains and predicts behavior. Theories are not laws–we should not require a theory to hold 100% of the time for us to find a theory useful. Contrary to what the commentator wrote, we can use “intelligent, self-interested agents” as an assumption and predict quite a lot. In fact, the reason Final Jeopardy last night caused such a stir is because it egregiously violated what intelligent individuals should do. Intelligent individuals make up about 99.9% of the Jeopardy players, which is what made last night so extraordinary.
If models are useless because of the .1%, then all of academia–hard and soft science alike–needs to close up shop immediately.
Here is a frustrating critique of formal/game theoretical modeling:
The model the author presents is way too simple and completely divorced from reality. Therefore, we ought to ignore its conclusions.
Such comments pop up frequently. They are frustrating because they are grounded in ignorance. There is an implicit belief that formal modelers are attempting to match reality. With that as the premise, models fall woefully short. Consequently, the critics reject them.
However, I know of no serious modeler who claims, or even wants, to match reality. Formal modeling acts as an accounting standard that verifies or rejects theories of causation. That's it. That's all. There's nothing more. And that's perfectly okay.
For example, consider the following argument on why countries develop nuclear weapons:
- Countries with rivalrous relationships are in competition for scarce resources.
- All other things being equal, countries with nuclear weapons receive more of the scarce resources.
- If the cost of the nuclear weapons is worth less than the additional amount of resources they bring in, we should expect a country to proliferate.
I just gave you the conventional wisdom about nuclear proliferation. I doubt many would criticize this as being too simplistic to reflect reality. For some reason, informal arguments have a certain immunity to this type of criticism. Perhaps this is because they do not explicitly detail what any of these assumptions mean–that nuclear weapons cost exactly $k, that power is equal to p, that the costs of war are c, and so forth. Yet, implicitly, those types of assumptions are present in the informal argument. And our inability to grasp that without formalization puts us in deep trouble.
As it turns out, the conventional wisdom is wrong. There normally exist bargained settlements that leave both sides better off than had the potential proliferator obtained nuclear weapons. A simple model illustrates this, and you can find the proof in this paper.
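A stylized illustration of why such a settlement exists. The notation here is mine, not necessarily the paper's: the pie has size 1, the potential proliferator's share is p today and would rise to p' > p with the bomb, and building the bomb costs k > 0. Any offer x with p' - k <= x <= p' leaves the proliferator at least as well off as proliferating while costing the rival no more than conceding p' outright, and since k > 0 that range is never empty.

```python
def bargaining_range(p, p_prime, k):
    """Range of offers both sides prefer to proliferation.

    p: proliferator's current share of the pie (unused in the bounds,
       but listed to mirror the informal setup above)
    p_prime: share after obtaining nuclear weapons (p_prime > p)
    k: cost of building the weapons (k > 0)
    Returns (low, high) bounds of mutually acceptable offers, or None
    if no such offer exists.
    """
    lo, hi = p_prime - k, p_prime
    return (lo, hi) if lo < hi else None

deal = bargaining_range(p=0.4, p_prime=0.6, k=0.1)
# Offers between roughly 0.5 and 0.6 beat proliferating for both sides;
# with k = 0 the range collapses and bargaining_range returns None.
```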
Yes, this model is way too “simple” and is completely divorced from reality. But so is the informal explanation I gave above. The English words make it feel more comfortable than Greek letters. But the critical difference is that the Greek letters show that the English words are wrong. And that is why modeling is useful. (And awesome.)
Life is a gigantic tradeoff. When anyone crafts a theory–formal or informal–they are simplifying reality into digestible bits. There is no shame in doing this, since indigestible chunks are completely worthless. Formal modeling is just really good at identifying logical inconsistencies.
This allows us to take the informal logic a bit deeper. For example, if the three above assumptions do not imply proliferation, what does? Sometimes, the second question is more interesting than the first.
Whenever you look at a model, ask a simple question: does this model help me understand the world better? If the answer is yes, then accept the model for what it is–a much simplified (but useful!) version of reality. If the answer is no, then feel free to hate it with a passion.
A bunch of my friends started posting Wordles of their academic papers. I thought it would be fun to try it for my “Invisible Fist” paper. The results were disappointing:
Apparently, I am not writing about nuclear proliferation. (Proliferation is barely visible.) No, I am writing about discount factors and \frac, the LaTeX command used to write fractions.
At least state is the third most prominent word.
It is really remarkable how far Game Theory 101 has come in the last three years. I had hopes for the enterprise, but nothing like this…
But really, please follow me on Twitter.
This semester, I am TAing for Civil War and the International System at the University of Rochester. These are the slides that I am using for our midterm review session. They cover how to read academic articles, how not to write a bad in-class essay (which should be the subject of a full blog post at some point), and some material related to the class directly. The first two items should be useful to people outside of the class. I can’t say the same about the third.
Anyway, click here for the slides. Also, it is a good idea to read over the full blog post I have on how to read an academic article.
I just got back home from Wegmans. Going over the receipt, I noticed that the cashier neglected to scan my coffee beans, so my espresso is free for the week. (See footnote.) Small victory.
Now, I felt a little bad when I noticed the error. I blame my mom for this. But then I started thinking–should I really have any moral obligation to correct the error?
The answer may appear trivial. My mom would say yes without any second thought. However, consider this:
- Register errors happen.
- Retailers know this.
- Therefore, they increase the prices of these goods by some small amount to recover their losses. Call this the liar's tax. (I name it this not because the liars pay it but rather because the presence of liars forces retailers to include it in the price of goods.)
- If I correct the error, then I am essentially being double taxed, once because I have to pay for the good whereas a liar would not and twice because I have to pay more for the good than I would have to if liars did not exist.
So, I ask again: do I have a moral obligation to correct the checkout error? If I do, I am penalized twice for being an honest man. If I don’t, should I feel guilty?
Here’s how society should resolve the problem. Let’s establish a social norm not to correct checkout errors. Never ever ever ever ever. This will increase the liar’s tax to compensate for all of the errors. However, now liars and honest people are paying the exact same for the good in expectation. Retailers will complain that they are losing money because no one corrects the errors. But that is bullcrap--they recover that lost money through the liar’s tax collected on every correctly processed sale.
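The break-even arithmetic, in a toy model with numbers of my own invention: if a fraction of items goes unscanned and nobody corrects the errors, the retailer marks up the posted price so that the correctly scanned items cover total cost.

```python
def breakeven_price(cost, p_error):
    """Posted price that lets the retailer break even on unscanned items.

    Toy model (assumptions mine): each item costs the retailer `cost`,
    and a fraction p_error of items goes out the door unscanned and
    uncorrected. Only the (1 - p_error) scanned items bring in revenue,
    so the posted price must be cost / (1 - p_error).
    """
    return cost / (1 - p_error)

price = breakeven_price(cost=10.0, p_error=0.02)
# Every shopper pays the markup; a shopper who also corrects errors pays
# the markup *and* the occasional full price a liar would have skipped --
# the double tax described above.
```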
Footnote: At Wegmans, you must weigh and label your own coffee beans. In the two plus years of living in Rochester, the label maker has worked effectively exactly twice. Usually, the label sticker fails to automatically come off the back paper, so you have to peel it for yourself. Today, the label printed at an angle where the barcode was supposed to go. So the cashier probably scanned it, but the register didn’t take it, and the cashier failed to notice that. But you’d think a business operation as large as Wegmans could develop a solution to a problem that has been going on since at least summer 2010.
Graduated from the University of California, San Diego in 2009. Currently a PhD student at the University of Rochester.
Interested in formal models of inter-state conflict.