What’s wrong with “should”?

Nov 8 JDN 2459162

I have been a patient in cognitive behavioral therapy (CBT) for many years now. The central premise that thoughts can influence emotions is well-founded, and the results of CBT are empirically well supported.

One of the central concepts in CBT is cognitive distortions: There are certain systematic patterns in how we tend to think, and these patterns often result in beliefs and emotions that are disproportionate to reality.

Most of the cognitive distortions CBT deals with make sense to me—and I am well aware that my mind applies them frequently: All-or-nothing, jumping to conclusions, overgeneralization, magnification and minimization, mental filtering, discounting the positive, personalization, emotional reasoning, and labeling are all clearly distorted modes of thinking that nevertheless are extremely common.

But there’s one “distortion” on CBT lists that always bothers me: “should statements”.

Listen to this definition of what is allegedly a cognitive distortion:

Another particularly damaging distortion is the tendency to make “should” statements. Should statements are statements that you make to yourself about what you “should” do, what you “ought” to do, or what you “must” do. They can also be applied to others, imposing a set of expectations that will likely not be met.

When we hang on too tightly to our “should” statements about ourselves, the result is often guilt that we cannot live up to them. When we cling to our “should” statements about others, we are generally disappointed by their failure to meet our expectations, leading to anger and resentment.

So any time we use “should”, “ought”, or “must”, we are guilty of distorted thinking? In other words, all of ethics is a cognitive distortion? The entire concept of obligation is a symptom of a mental disorder?

Different sources on CBT will define “should statements” differently, and sometimes they offer a more nuanced definition that doesn’t have such extreme implications:

Individuals thinking in ‘shoulds’, ‘oughts’ or ‘musts’ have an ironclad view of how they and others ‘should’ and ‘ought’ to be. These rigid views or rules can generate feelings of anger, frustration, resentment, disappointment and guilt if not followed.

Example: You don’t like playing tennis but take lessons as you feel you ‘should’, and that you ‘shouldn’t’ make so many mistakes on the court, and that your coach ‘ought to’ be stricter on you. You also feel that you ‘must’ please him by trying harder.

This is particularly problematic, I think, because of the All-or-Nothing distortion, which does genuinely seem to be common among people with depression: Unless we are very clear from the start about where to draw the line, our minds will leap to saying that all statements involving the word “should” are wrong.

I think what therapists are trying to capture with this concept is something like having unrealistic expectations, or focusing too much on what could or should have happened instead of dealing with the actual situation you are in. But many seem to be unable to articulate that clearly, and instead end up asserting that the entire concept of moral obligation is a cognitive distortion.

There may be a deeper error here as well: The way we study mental illness doesn’t involve enough comparison with control groups. Psychologists are accustomed to asking the question, “How do people with depression think?”; but they are not accustomed to asking the question, “How do people with depression think compared to people who don’t have depression?” If you want to establish that A causes B, it’s not enough to show that those with B have A; you must also show that A is less common among those who don’t have B.

This is an extreme example for illustration, but suppose someone became convinced that depression is caused by having a liver. They studied a bunch of people with depression, and found that they all had livers; hypothesis confirmed! Clearly, we need to remove the livers, and that will cure the depression.
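To make the liver example concrete, here is a minimal Python sketch of the comparison that such a study leaves out. The counts are invented for illustration, and `compare_groups` is a hypothetical helper rather than a standard statistical routine; a real study would also need a significance test and an adequate sample.

```python
# Minimal sketch of a case-control comparison, using made-up counts.
# The question is not "how many depressed people have trait A?" but
# "is trait A more common among depressed people than among everyone else?"


def prevalence(has_trait: int, group_size: int) -> float:
    """Fraction of a group that has the trait."""
    return has_trait / group_size


def compare_groups(trait_in_cases: int, cases: int,
                   trait_in_controls: int, controls: int) -> float:
    """Difference in trait prevalence between cases (with B) and controls (without B).

    A difference near zero means the trait does not distinguish the groups,
    no matter how common it is among the cases.
    """
    return prevalence(trait_in_cases, cases) - prevalence(trait_in_controls, controls)


# Hypothetical liver study: all 100 depressed people have livers,
# but so do all 100 people without depression.
print(compare_groups(trait_in_cases=100, cases=100,
                     trait_in_controls=100, controls=100))  # 0.0
```

Looking only at the first two arguments, the liver hypothesis is "confirmed" for every single case; adding the control group reveals that it explains nothing.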

The best example I can find of a study that actually asked that question compared nursing students and found that cognitive distortions explain about 20% of the variance in depression. This is a significant amount—but still leaves a lot unexplained. And most of the research on depression doesn’t even seem to think to compare against people without depression.

My impression is that some cognitive distortions are genuinely more common among people with depression—but not all of them. There is an ongoing controversy over what’s called the depressive realism effect, which is the finding that in at least some circumstances the beliefs of people with mild depression seem to be more accurate than the beliefs of people with no depression at all. The result is controversial both because it seems to threaten the paradigm that depression is caused by distortions, and because it seems to be very dependent on context; sometimes depression makes people more accurate in their beliefs, other times it makes them less accurate.

Overall, I am inclined to think that most people have a variety of cognitive distortions, but we only tend to notice them when those distortions begin causing distress, such as when they are involved in depression. Human thinking in general seems to be a muddled mess of heuristics, and the wonder is that we function as well as we do.

Does this mean that we should stop trying to remove cognitive distortions? Not at all. Distorted thinking can be harmful even if it doesn’t cause you distress: The obvious example is a fanatical religious or political belief that leads you to harm others. And indeed, recognizing and challenging cognitive distortions is a highly effective treatment for depression.

Actually, I created a simple cognitive distortion worksheet based on the TEAM-CBT approach developed by David Burns that has helped me a great deal in a remarkably short time. You can download the worksheet yourself and try it out. Start with a blank page and write down as many negative thoughts as you can, and then pick 3-5 that seem particularly extreme or unlikely. Then make a copy of the cognitive distortion worksheet for each of those thoughts and work through it step by step. In particular, do not skip the step “This thought shows the following good things about me and my core values:”; that step often feels the strangest, but it’s a critical part of what makes the TEAM-CBT approach better than conventional CBT.

So yes, we should try to challenge our cognitive distortions. But the mere fact that a thought is distressing doesn’t imply that it is wrong, and giving up on the entire concept of “should” and “ought” is throwing out a lot of babies with that bathwater.

We should be careful about labeling any thoughts that depressed people have as cognitive distortions—and “should statements” is a clear example where many psychologists have overreached in what they characterize as a distortion.

Terrible but not likely, likely but not terrible

May 17 JDN 2458985

The human brain is a remarkably awkward machine. It’s really quite bad at organizing data, relying on associations rather than formal categories.

It is particularly bad at negation. For instance, if I tell you that right now, no matter what, you must not think about a yellow submarine, the first thing you will do is think about a yellow submarine. (You may even get the Beatles song stuck in your head, especially now that I’ve mentioned it.) A computer would never make such a grievous error.

The human brain is also quite bad at separation. Daniel Dennett coined the word “deepity” for a particular kind of deep-sounding but ultimately trivial aphorism that seems to be quite common, and which relies upon this feature of the brain. A deepity has at least two possible readings: On one reading, it is true, but utterly trivial. On another, it would be profound if true, but it simply isn’t true. But if you experience both at once, your brain is triggered for both “true” and “profound” and yields “profound truth”. The example he likes to use is “Love is just a word”. Well, yes, “love” is in fact just a word, but who cares? Yeah, words are words. But love, the underlying concept it describes, is not just a word, though if it were, that would change a lot.

One thing I’ve come to realize about my own anxiety is that it involves a wide variety of different scenarios I imagine in my mind, and broadly speaking these can be sorted into two categories: Those that are likely but not terrible, and those that are terrible but not likely.

In the former category we have things like taking an extra year to finish my dissertation; the mean time to completion for a PhD is over 8 years, so finishing in 6 instead of 5 can hardly be considered catastrophic.

In the latter category we have things like dying from COVID-19. Yes, I’m a male with type A blood and asthma living in a high-risk county; but I’m also a young, healthy nonsmoker living under lockdown. Even without knowing the true fatality rate of the virus, my chances of actually dying from it are surely less than 1%.

But when both of those scenarios are running through my brain at the same time, the first triggers a reaction for “likely” and the second triggers a reaction for “terrible”, and I get this feeling that something terrible is actually likely to happen. And indeed if my probability of dying were as high as my probability of needing a 6th year to finish my PhD, that would be catastrophic.

I suppose it’s a bit strange that the opposite doesn’t happen: I never seem to get the improbability of dying attached to the mildness of needing an extra year. The confusion never seems to trigger “neither terrible nor likely”. Or perhaps it does, and my brain immediately disregards that as not worthy of consideration? It makes a certain sort of sense: An event that is neither probable nor severe doesn’t seem to merit much anxiety.

I suspect that many other people’s brains work the same way, eliding distinctions between different outcomes and ending up with a sort of worst-case blend: the probability of the likeliest scenario multiplied by the severity of the worst one.

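Here is a minimal Python sketch of that blending error, using the two scenarios above. The probabilities and severities are invented placeholders in arbitrary units, not estimates of any real risk; the point is only the difference between weighting each scenario by its own probability and pairing one scenario’s likelihood with the other’s severity.

```python
# Minimal sketch: keeping scenarios separate vs. blending them.
# Probabilities and severities are invented placeholders (severity in
# arbitrary units); they are not estimates of any real risk.

SCENARIOS = {
    "extra year on dissertation": (0.5, 2.0),   # likely but not terrible
    "dying from COVID-19": (0.005, 1000.0),     # terrible but not likely
}


def expected_harm(probability: float, severity: float) -> float:
    """Harm of one scenario, weighted by its own probability."""
    return probability * severity


def blended_harm(scenarios: dict) -> float:
    """The distorted version: the highest probability paired with the highest severity."""
    top_probability = max(p for p, _ in scenarios.values())
    top_severity = max(s for _, s in scenarios.values())
    return top_probability * top_severity


for name, (p, s) in SCENARIOS.items():
    print(f"{name}: {expected_harm(p, s):.1f}")
print(f"blended: {blended_harm(SCENARIOS):.1f}")
```

With these placeholder numbers, the blended figure comes out 100 times larger than the worse of the two separate figures, even though no single scenario is anywhere near that bad.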
The solution to this is not an easy one: It requires deliberate effort and extensive practice, and benefits greatly from formal training by a therapist. Counter-intuitively, you need to actually focus more on the scenarios that cause you anxiety, and accept the anxiety that such focus triggers in you. I find that it helps to actually write down the details of each scenario as vividly as possible, and review what I have written later. After doing this enough times, you can build up a greater separation in your mind, and more clearly categorize—this one is likely but not terrible, that one is terrible but not likely. It isn’t a cure, but it definitely helps me a great deal. Perhaps it could help you.

Motivation under trauma

May 3 JDN 2458971

Whenever I ask someone how they are doing lately, I get the same answer: “Pretty good, under the circumstances.” There seems to be a general sense, at least among the sort of people I interact with regularly, that our own lives are still proceeding more or less normally, as we watch in horror the crises surrounding us. Nothing in particular is going wrong for us specifically. Everything is fine, except for the things that are wrong for everyone everywhere.

One thing that seems to be particularly difficult for a lot of us is the sense that we suddenly have so much time on our hands, but can’t find the motivation to actually use this time productively. So many hours of our lives were wasted on commuting or going to meetings or attending various events we didn’t really care much about but didn’t want to feel like we had missed out on. But now that we have these hours back, we can’t find the strength to use them well.

This is because we are now, as an entire society, experiencing a form of trauma. One of the most common long-term effects of post-traumatic stress disorder is a loss of motivation. Faced with suffering we have no power to control, we are made helpless by this traumatic experience; and this makes us learn to feel helpless in other domains.

There is a classic experiment about learned helplessness; like many old classic experiments, its ethics are a bit questionable. Though unlike many such experiments (glares at Zimbardo), its experimental rigor was ironclad. Dogs were divided into three groups. Group 1 was just a control, where the dogs were tied up for a while and then let go. Dogs in groups 2 and 3 were placed into a crate with a floor that could shock them. Dogs in group 2 had a lever they could press to make the shocks stop. Dogs in group 3 did not. (They actually gave the group 2 dogs control over the group 3 dogs to make the shock times exactly equal; but the dogs had no way to know that, so as far as they knew the shocks ended at random.)

Later, dogs from groups 2 and 3 were put into another crate, where they no longer had a lever to press, but they could jump over a barrier to a different part of the crate where the shocks wouldn’t happen. The dogs from group 2, who had previously had some control over their own pain, were able to quickly learn to do this. The dogs from group 3, who had previously felt pain apparently at random, had a very hard time learning this, if they could ever learn it at all. They’d just lie there and suffer the shocks, unable to bring themselves to even try to leap the barrier.

The group 3 dogs just knew there was nothing they could do. During their previous experience of the trauma, all their actions were futile, and so in this new trauma they were certain that their actions would remain futile. When nothing you do matters, the only sensible thing to do is nothing; and so they did. They had learned to be helpless.

I think for me, chronic migraines were my first crate. For years of my life there was basically nothing I could do to prevent myself from getting migraines—honestly the thing that would have helped most would have been to stop getting up for high school that started at 7:40 AM every morning. Eventually I found a good neurologist and got various treatments, as well as learned about various triggers and found ways to avoid most of them. (Let me know if you ever figure out a way to avoid stress.) My migraines are now far less frequent than they were when I was a teenager, though they are still far more frequent than I would prefer.

Yet, I think I still have not fully unlearned the helplessness that migraines taught me. Every time I get another migraine despite all the medications I’ve taken and all the triggers I’ve religiously avoided, this suffering beyond my control acts as another reminder of the ultimate caprice of the universe. There are so many things in our lives that we cannot control that it can be easy to lose sight of what we can.

This pandemic is a trauma that the whole world is now going through. And perhaps that unity of experience will ultimately save us—it will make us see the world and each other a little differently than we did before.

There are a few things you can do to reduce your own risk of getting or spreading the COVID-19 infection, like washing your hands regularly, avoiding social contact, and wearing masks when you go outside. And of course you should do these things. But the truth really is that there is very little any one of us can do to stop this global pandemic. We can watch the numbers tick up almost in real-time—as of this writing, 1 million cases and over 50,000 deaths in the US, 3 million cases and over 200,000 deaths worldwide—but there is very little we can do to change those numbers.

Sometimes we really are helpless. The challenge we face is not to let this genuine helplessness bleed over and make us feel helpless about other aspects of our lives. We are currently sitting in a crate with no lever, where the shocks will begin and end beyond our control. But the day will come when we are delivered to a new crate, and given the chance to leap over a barrier; we must find the strength to take that leap.

For now, I think we can forgive ourselves for getting less done than we might have hoped. We’re still not really out of that first crate.

Mental illness is different from physical illness

Post 311 Oct 13 JDN 2458770

There’s something I have heard a lot of people say about mental illness that is obviously well-intentioned, but ultimately misguided: “Mental illness is just like physical illness.”

Sometimes they say it explicitly in those terms. Other times they make analogies, like “If you wouldn’t shame someone with diabetes for using insulin, why shame someone with depression for using SSRIs?”

Yet I don’t think this line of argument will ever meaningfully reduce the stigma surrounding mental illness, because, well, it’s obviously not true.

There are some characteristics of mental illness that are analogous to physical illness, but there are some that really are quite different. And these are not just superficial differences, the way that pancreatic disease is different from liver disease. No one would say that liver cancer is exactly the same as pancreatic cancer; but they’re both obviously of the same basic category. There are differences between physical and mental illness which are both obvious and fundamental.

Here’s the biggest one: Talk therapy works on mental illness.

You can’t talk yourself out of diabetes. You can’t talk yourself out of a myocardial infarction. You can’t even talk yourself out of a migraine (though I’ll get back to that one in a little bit). But you can, in a very important sense, talk yourself out of depression.

In fact, talk therapy is one of the most effective treatments for most mental disorders. Cognitive behavioral therapy for depression is on its own as effective as most antidepressants (with far fewer harmful side effects), and the two combined are clearly more effective than either alone. Talk therapy is as effective as medication on bipolar disorder, and considerably better on social anxiety disorder.

To be clear: Talk therapy is not just people telling you to cheer up, or saying it’s “all in your head”, or suggesting that you get more exercise or eat some chocolate. Nor does it consist of you ruminating by yourself and trying to talk yourself out of your disorder. Cognitive behavioral therapy is a very complex, sophisticated series of techniques that require years of expert training to master. Yet, at its core, cognitive therapy really is just a very sophisticated form of talking.

The fact that mental disorders can be so strongly affected by talk therapy shows that there really is an important sense in which mental disorders are “all in your head”, and not just the trivial way that an axe wound or even a migraine is all in your head. It isn’t just the fact that it is physically located in your brain that makes a mental disorder different; it’s something deeper than that.

Here’s the best analogy I can come up with: Physical illness is hardware. Mental illness is software.

If a computer breaks after being dropped on the floor, that’s like an axe wound: An obvious, traumatic source of physical damage that is an unambiguous cause of the failure.

If a computer’s CPU starts overheating, that’s like a physical illness, like diabetes: There may be no particular traumatic cause, or even any clear cause at all, but there is obviously something physically wrong that needs physical intervention to correct.

But if a computer is suffering glitches and showing error messages when it tries to run particular programs, that is like mental illness: Something is wrong not on the low-level hardware, but on the high-level software.

These different types of problem require different types of solutions. If your CPU is overheating, you might want to see about replacing your cooling fan or your heat sink. But if your software is glitching while your CPU is otherwise running fine, there’s no point in replacing your fan or heat sink. You need to get a programmer in there to look at the code and find out where it’s going wrong. A talk therapist is like a programmer: The words they say to you are like code scripts they’re trying to get your processor to run correctly.

Of course, our understanding of computers is vastly better than our understanding of human brains, and as a result, programmers tend to get a lot better results than psychotherapists. (Interestingly they do actually get paid about the same, though! Programmers make about 10% more on average than psychotherapists, and both are solidly within the realm of average upper-middle-class service jobs.) But the basic process is the same: Using your expert knowledge of the system, find the right set of inputs that will fix the underlying code and solve the problem. At no point do you physically intervene on the system; you could do it remotely without ever touching it—and indeed, remote talk therapy is a thing.

What about other neurological illnesses, like migraine or fibromyalgia? Well, I think these are somewhere in between. They’re definitely more physical in some sense than a mental disorder like depression. There isn’t any cognitive content to a migraine the way there is to a depressive episode. When I feel depressed or anxious, I feel depressed or anxious about something. But there’s nothing a migraine is about. To use the technical term in cognitive science, neurological disorders lack the intentionality that mental disorders generally have. “What are you depressed about?” is a question you usually can answer. “What are you migrained about?” generally isn’t.

But like mental disorders, neurological disorders are directly linked to the functioning of the brain, and often seem to operate at a higher level of functional abstraction. The brain doesn’t have pain receptors on itself the way most of your body does; getting a migraine behind your left eye doesn’t actually mean that that specific lobe of your brain is what’s malfunctioning. It’s more like a general alert your brain is sending out that something is wrong, somewhere. And fibromyalgia often feels like it’s taking place in your entire body at once. Moreover, most neurological disorders are strongly correlated with mental disorders—indeed, the comorbidity of depression with migraine and fibromyalgia in particular is extremely high.

Which disorder causes the other? That’s a surprisingly difficult question. Intuitively we might expect the “more physical” disorder to be the primary cause, but that’s not always clear. Successful treatment for depression often improves symptoms of migraine and fibromyalgia as well (though the converse is also true). They seem to be mutually reinforcing one another, and it’s not at all clear which came first. I suppose if I had to venture a guess, I’d say the pain disorders probably have causal precedence over the mood disorders, but I don’t actually know that for a fact.

To stretch my analogy a little, it may be like a software problem that ends up causing a hardware problem, or a hardware problem that ends up causing a software problem. There actually have been a few examples of this, like games with graphics so demanding that they caused GPUs to overheat.

The human brain is a lot more complicated than a computer, and the distinction between software and hardware is fuzzier; we don’t actually have “code” that runs on a “processor”. We have synapses that continually fire on and off and rewire each other. The closest thing we have to code that gets processed in sequence would be our genome, and that is several orders of magnitude less complex than the structure of our brains. Aside from simply physically copying the entire brain down to every synapse, it’s not clear that you could ever “download” a mind, science fiction notwithstanding.
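As a rough check on “several orders of magnitude,” here is a back-of-envelope Python sketch. The inputs are assumptions, not figures from this post: about 3.1 billion base pairs in the human genome at 2 bits each, and on the order of 10^14 synapses in the brain, each (very generously) reduced to a single bit. Real synapses carry far more state than one bit, and some synapse-count estimates run higher, so the true gap is likely wider than this.

```python
import math

# Back-of-envelope comparison under rough, assumed inputs:
# - the human genome has about 3.1e9 base pairs, each encodable in 2 bits;
# - the brain has on the order of 1e14 synapses, and we generously
#   pretend each synapse can be described with just 1 bit.
GENOME_BASE_PAIRS = 3.1e9
BITS_PER_BASE = 2
SYNAPSES = 1e14
BITS_PER_SYNAPSE = 1

genome_bits = GENOME_BASE_PAIRS * BITS_PER_BASE
brain_bits = SYNAPSES * BITS_PER_SYNAPSE
ratio = brain_bits / genome_bits

print(f"genome: about {genome_bits / 8 / 1e6:.0f} MB")
print(f"brain wiring: about 10^{math.log10(ratio):.1f} times more bits")
```

Even with the stingiest possible description of each synapse, the wiring outweighs the genome by roughly four orders of magnitude, which is why the genome looks more like a build script than a copy of the mind.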

Indeed, anything that changes your mind necessarily also changes your brain; the effects of talking are generally subtler than the effects of a drug (and certainly subtler than the effects of an axe wound!), but they are nevertheless real, physical changes. (This is why it is so idiotic whenever the popular science press comes out with: “New study finds that X actually changes your brain!” where X might be anything from drinking coffee to reading romance novels. Of course it does! If it has an effect on your mind, it did so by having an effect on your brain. That’s the Basic Fact of Cognitive Science.) This is not so different from computers, however: Any change in software is also a physical change, in the form of some sequence of electrical charges that were moved from one place to another. Actual physical electrons are a few microns away from where they otherwise would have been because of what was typed into that code.

Of course I want to reduce the stigma surrounding mental illness. (For both selfish and altruistic reasons, really.) But blatantly false assertions don’t seem terribly productive toward that goal. Mental illness is different from physical illness; we can’t treat it the same.

Procrastination is an anxiety symptom

Aug 18 JDN 2458715

Why do we procrastinate? Some people are chronic procrastinators, while others only do it on occasion, but almost everyone procrastinates: We have something important to do, and we should be working on it, but we find ourselves doing anything else we can think of—cleaning is a popular choice—rather than actually getting to work. This continues until we get so close to the deadline that we have no choice but to rush through the work, lest it not get done at all. The result is more stress and lower-quality work. Why would we put ourselves through this?

There are a few different reasons why people may procrastinate. The one that most behavioral economists lean toward is hyperbolic discounting: Because we undervalue the future relative to the present, we set aside unpleasant tasks for later, when it seems they won’t be as bad.
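A minimal Python sketch of that story, using the common one-parameter hyperbolic form V = A / (1 + kD), where D is the delay. The discount rate k and the task costs are invented for illustration: the task is assumed to cost 10 units of unpleasantness if done on its scheduled day and 12 if put off a day, since rushing makes it worse.

```python
# Minimal sketch of hyperbolic discounting applied to an unpleasant task,
# using V = A / (1 + k * D). The rate K and the costs are invented.

K = 1.0  # hypothetical discount rate per day


def felt_cost(cost: float, delay_days: float, k: float = K) -> float:
    """Present (discounted) weight of a cost paid delay_days from now."""
    return cost / (1 + k * delay_days)


def prefers_to_delay(days_until_task: float) -> bool:
    """Compare doing the task on its day (cost 10) with a day later (cost 12)."""
    do_it = felt_cost(10, days_until_task)
    put_off = felt_cost(12, days_until_task + 1)
    return put_off < do_it


print(prefers_to_delay(7))  # False: a week ahead, we plan to do it on time
print(prefers_to_delay(0))  # True: on the day itself, we put it off
```

This is the preference reversal hyperbolic discounting predicts: from a week away, 10/8 = 1.25 feels cheaper than 12/9 ≈ 1.33, but on the day, 10 feels far worse than 12/2 = 6. Under exponential discounting the ratio between the two options would not depend on the vantage point, so no reversal would occur.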

This could be relevant in some cases, particularly for those who chronically procrastinate on a wide variety of tasks, but I find it increasingly unconvincing.

First of all, there’s the fact that many of the things we do while procrastinating are not particularly pleasant. Some people procrastinate by playing games, but even more procrastinate by cleaning house or reorganizing their desks. These aren’t enjoyable activities that you would want to do as soon as possible to maximize the joy.

Second, most people don’t procrastinate consistently on everything. We procrastinate on particular types of tasks—things we consider particularly important, as a matter of fact. I almost never procrastinate in general: I complete tasks early, I plan ahead, I am always (over)prepared. But lately I’ve been procrastinating on three tasks in particular: Revising my second-year paper to submit to journals, writing grant proposals, and finishing my third-year paper. These tasks are all academic, of course; they all involve a great deal of intellectual effort. But above all, they are high stakes. I didn’t procrastinate on homework for classes, but I’m procrastinating on finishing my dissertation.

Another common explanation for procrastination involves self-control: We can’t stop ourselves from doing whatever seems fun at the moment, when we should be getting down to work on what really matters.

This explanation is even worse: There is no apparent correlation between propensity to procrastinate and general impulsiveness—or, if anything, the correlation seems to be negative. The people I know who procrastinate the most consistently are the least impulsive; they tend to ponder and deliberate every decision, even small decisions for which the extra time spent clearly isn’t worth it.

The explanation I find much more convincing is that procrastination isn’t about self-control or time at all. It’s about anxiety. Procrastination is a form of avoidance: We don’t want to face the painful experience, so we stay away from it as long as we can.

This is certainly how procrastination feels for me: It’s not that I can’t stop myself from doing something fun, it’s that I can’t bring myself to face this particular task that is causing me overwhelming stress.

This also explains why it’s always something important that we procrastinate on: It’s precisely things with high stakes that are going to cause a lot of painful feelings. And anxiety itself is deeply linked to the fear of negative evaluation—which is exactly what you’re afraid of when submitting to a journal or applying for a grant. Usually it’s a bit more metaphorical than that, the “evaluation” of being judged by your peers; but here we are literally talking about a written evaluation from a reviewer.

This is why the most effective methods for reducing procrastination all involve reducing your anxiety surrounding the task. In fact, one of the most important is forgiving yourself for prior failings, including past procrastination. Students who were taught to forgive themselves for procrastinating were less likely to procrastinate in the future. If this were a matter of self-control, forgiving yourself should be counterproductive; but in fact it’s probably the most effective intervention.

Unsurprisingly, those with the highest stress levels had the highest rates of procrastination (causality could run both ways there); but this is much less true for those who are good at practicing self-compassion. The idea behind self-compassion is very simple: Treat yourself as kindly as you would treat someone you care about.

I am extraordinarily bad at self-compassion. It is probably my greatest weakness. If we were to measure self-compassion by the gap between how kind you are to yourself and how kind you are to others, I would probably have one of the largest gaps in the world. Compassion for others has been a driving force in my life for as long as I can remember, and I put my money where my mouth is, giving at least 8% of my gross income to top-rated international charities every year. But compassion for myself feels inauthentic, even alien; I brutally punish myself for every failure, every moment of weakness. If someone else treated me the way I treat myself, I’d consider them abusive. It’s something I’ve struggled with for many years.

Really, the wonder is that I don’t procrastinate more; I think it’s because I’m already doing most of the things that people will tell you to do to avoid procrastination, like scheduling specific tasks to specific times and prioritizing a small number of important tasks each day. I even keep track of how I actually use my time (I call it “descriptive scheduling”, as opposed to conventional “normative scheduling”), and use that information to make my future schedules more realistic—thus avoiding or at least mitigating the planning fallacy. But when it’s just too intimidating to even look at the paper I’m supposed to be revising, none of that works.
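Here is a minimal Python sketch of what descriptive scheduling might look like in practice. The time log is invented, and scaling new plans by the median ratio of actual to planned time is one assumed way to turn the log into more realistic estimates, not necessarily how the author does it.

```python
from statistics import median

# Minimal sketch of "descriptive scheduling": log how long tasks actually took
# next to how long you planned, then use the typical overrun to pad new plans.
# The entries and the median-ratio rule are illustrative assumptions.

# (task, planned minutes, actual minutes)
time_log = [
    ("grade problem sets", 60, 90),
    ("answer email", 20, 35),
    ("revise paper section", 120, 200),
    ("prepare section notes", 45, 50),
]


def overrun_ratio(log) -> float:
    """Median ratio of actual to planned time across logged tasks."""
    return median(actual / planned for _, planned, actual in log)


def realistic_estimate(planned_minutes: float, log) -> float:
    """Scale a naive plan by the historical overrun ratio."""
    return planned_minutes * overrun_ratio(log)


print(f"typical overrun: x{overrun_ratio(time_log):.2f}")
print(f"a 60-minute plan realistically takes ~{realistic_estimate(60, time_log):.0f} minutes")
```

The median is used instead of the mean so that one disastrous day does not inflate every future plan; either choice is a judgment call.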

If you too are struggling with procrastination (and the odds of that are quite high), I’m afraid that I don’t have any brilliant advice for you today. I can recommend those scheduling techniques, and they may help; but the ultimate cause of procrastination is not bad scheduling or planning but something much deeper: anxiety about the task itself and being evaluated upon it. Procrastination is not laziness or lack of self-control: It’s an anxiety symptom.

Information theory proves that multiple-choice is stupid

Mar 19 JDN 2457832

This post is a bit of a departure from my usual topics, but it’s something that has bothered me for a long time, and I think it fits broadly into the scope of uniting economics with the broader realm of human knowledge.

Multiple-choice questions are inherently and objectively poor methods of assessing learning.

Consider the following question, which is adapted from actual tests I have been required to administer and grade as a teaching assistant (that is, the style of question is the same; I’ve changed the details so that it wouldn’t be possible to just memorize the response—though in a moment I’ll get to why all this paranoia about students seeing test questions beforehand would also be defused if we stopped using multiple-choice):

The demand for apples follows the equation Q = 100 – 5 P.
The supply of apples follows the equation Q = 10 P.
If a tax of $2 per apple is imposed, what is the equilibrium price, quantity, tax revenue, consumer surplus, and producer surplus?

A. Price = $5, Quantity = 10, Tax revenue = $50, Consumer Surplus = $360, Producer Surplus = $100

B. Price = $6, Quantity = 20, Tax revenue = $40, Consumer Surplus = $200, Producer Surplus = $180

C. Price = $6, Quantity = 60, Tax revenue = $120, Consumer Surplus = $360, Producer Surplus = $180

D. Price = $5, Quantity = 60, Tax revenue = $120, Consumer Surplus = $280, Producer Surplus = $500

You could try solving this properly, setting supply equal to demand, adjusting for the tax, finding the equilibrium, and calculating the surplus, but don’t bother. If I were tutoring a student in preparing for this test, I’d tell them not to bother. You can get the right answer in only two steps, because of the multiple-choice format.

Step 1: Does tax revenue equal $2 times quantity? We said the tax was $2 per apple.
That rules out A ($2 × 10 is $20, not $50), leaving B, C, and D.

Step 2: Is quantity 10 times price, as the supply curve says? For C it is (10 × $6 = 60); for B and D it isn’t. Welp, guess it must be C then.

Now, to do that, you need to have at least a basic understanding of the economics underlying the question (How is tax revenue calculated? What does the supply curve equation mean?). But there’s an even easier technique you can use that doesn’t even require that; it’s called Answer Splicing.

Here’s how it works: You look for repeated values in the answer choices, and you choose the one that has the most repeated values. Prices $5 and $6 are repeated equally, so that’s not helpful (maybe the test designer planned at least that far). Quantity 60 is repeated, other quantities aren’t, so it’s probably that. Likewise with tax revenue $120. Consumer surplus $360 and Producer Surplus $180 are both repeated, so those are probably it. Oh, look, we’ve selected a unique answer choice C, the correct answer!

You could have done answer splicing even if the question were about 18th-century German philosophy, or even if the question were written in Arabic or Japanese. In fact, you could even do it if it were written in a cipher, as long as the cipher was a consistent substitution cipher.
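
To make the point concrete, here is a minimal Python sketch of answer splicing as described above: score each choice by how many of its values also appear in the same column of some other choice, and pick the highest. The function name `splice` and the scoring rule are my own illustrative assumptions, not a standard procedure. Producer surplus is left out for brevity; since B and C share the same producer surplus, including it would only widen C’s lead. The second call runs the same function on a version of the choices in which every distinct number has been replaced by a letter, a consistent substitution cipher.

```python
from collections import Counter

# Answer choices from the apple question: (price, quantity, tax revenue, consumer surplus).
CHOICES = {
    "A": (5, 10, 50, 360),
    "B": (6, 20, 40, 200),
    "C": (6, 60, 120, 360),
    "D": (5, 60, 120, 280),
}

def splice(choices):
    """Pick the choice whose values most often reappear in the same column of other choices.

    Only equality between values is used, never what the values mean."""
    columns = list(zip(*choices.values()))
    column_counts = [Counter(col) for col in columns]

    def score(values):
        # One point for each field whose value is shared with at least one other choice.
        return sum(1 for i, v in enumerate(values) if column_counts[i][v] > 1)

    return max(choices, key=lambda label: score(choices[label]))

# A consistent substitution cipher: every distinct number becomes a letter.
distinct = sorted({v for values in CHOICES.values() for v in values})
CIPHER = {v: chr(ord("a") + i) for i, v in enumerate(distinct)}
CIPHERED = {label: tuple(CIPHER[v] for v in values) for label, values in CHOICES.items()}

print(splice(CHOICES))   # C
print(splice(CIPHERED))  # C
```

The scores come out as A: 2, B: 1, C: 4, D: 3, and the cipher leaves them unchanged, because the function never looks at anything but whether two entries are equal.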

Could the question have been designed to better avoid answer splicing? Probably. But this is actually quite difficult to do, because there is a fundamental tradeoff between two types of “distractors” (as they are known in the test design industry). You want the answer choices to contain correct pieces and resemble the true answer, so that students who basically understand the question but make a mistake in the process still get it wrong. But you also want the answer choices to be distinct enough in a random enough pattern that answer splicing is unreliable. These two goals are inherently contradictory, and the result will always be a compromise between them. Professional test-designers usually lean pretty heavily against answer-splicing, which I think is probably optimal so far as it goes; but I’ve seen many a professor err too far on the side of similar choices and end up making answer splicing quite effective.

But of course, all of this could be completely avoided if I had just presented the question as an open-ended free-response. Then you’d actually have to write down the equations, show me some algebra solving them, and then interpret your results in a coherent way to answer the question I asked. What’s more, if you made a minor mistake somewhere (carried a minus sign over wrong, forgot to divide by 2 when calculating the area of the consumer surplus triangle), I could take off a few points for that error, rather than all the points just because you didn’t get the right answer. At the other extreme, if you just randomly guess, your odds of getting the right answer are minuscule; and even if you did get it right (or copied it from someone else), you won’t get credit if you don’t show me the algebra.
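
For contrast, here is roughly what the free-response route involves, as a minimal Python sketch. It assumes the $2 tax drives a wedge between the price buyers pay and the price sellers receive (with a per-unit tax, quantity and both prices come out the same whichever side formally remits it), and that supply passes through the origin, as Q = 10 P does. The function name `taxed_equilibrium` and its parameters `a`, `b`, `c` are illustrative.

```python
from fractions import Fraction

def taxed_equilibrium(a, b, c, tax):
    """Equilibrium with a per-unit tax, for demand Q = a - b*P and supply Q = c*P.

    Demand responds to the buyer price, supply to the seller price, and
    buyer price = seller price + tax. Fractions keep the arithmetic exact."""
    # a - b*(Ps + tax) = c*Ps  =>  Ps = (a - b*tax) / (b + c)
    seller_price = Fraction(a - b * tax, b + c)
    buyer_price = seller_price + tax
    quantity = c * seller_price
    choke_price = Fraction(a, b)  # buyer price at which quantity demanded falls to 0
    return {
        "seller_price": seller_price,
        "buyer_price": buyer_price,
        "quantity": quantity,
        "tax_revenue": tax * quantity,
        # Triangle between the demand curve and the buyer price (note the 1/2).
        "consumer_surplus": Fraction(1, 2) * quantity * (choke_price - buyer_price),
        # Triangle between the seller price and the supply curve, which starts at P = 0.
        "producer_surplus": Fraction(1, 2) * quantity * seller_price,
    }

for name, value in taxed_equilibrium(100, 5, 10, 2).items():
    print(name, value)
```

On the question’s numbers this gives a seller price of $6, a buyer price of $8, a quantity of 60, tax revenue of $120, consumer surplus of $360, and producer surplus of $180. So the “Price = $6” in the answer choices only makes sense as the price sellers receive, which is exactly the kind of interpretation a free response forces a student to make explicit.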

So the free-response question is telling me a lot more about what the student actually knows, in a much more reliable way, that is much harder to cheat or strategize against.

Moreover, this isn’t a matter of opinion. This is a theorem of information theory.

The information that is carried over a message channel can be quantitatively measured as its Shannon entropy. It is usually measured in bits, which you may already be familiar with as a unit of data storage and transmission rate in computers—and yes, those are all fundamentally the same thing. A proper formal treatment of information theory would be way too complicated for this blog, but the basic concepts are fairly straightforward: think in terms of how long a sequence of 1s and 0s it would take to convey the message. That is, roughly speaking, the Shannon entropy of that message.

How many bits are conveyed by a multiple-choice response with four choices? At most 2. Always. No exceptions. It is fundamentally, provably, mathematically impossible to convey more than 2 bits of information via a channel that only has 4 possible states, because 4 = 2². Any multiple-choice response of four choices, any multiple-choice response at all, can be reduced to one of the sequences 00, 01, 10, 11.

True-false questions are a bit worse—literally, they convey 1 bit instead of 2. It’s possible to fully encode the entire response to a true-false question as simply 0 or 1.
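
A short Python sketch of both ceilings. The “at most” matters: a channel with n states can carry at most log₂ n bits, and the Shannon entropy of the answers, H = −Σ p log₂ p, reaches that ceiling only when every choice is equally likely. The function names and the skewed probabilities below are illustrative assumptions, not data from any real test.

```python
import math

def capacity_bits(n_states):
    """Most information one answer can carry: log2 of the number of possible answers."""
    return math.log2(n_states)

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits, of a distribution over answers."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Every four-choice answer fits in a 2-bit code.
codes = {choice: format(i, "02b") for i, choice in enumerate("ABCD")}
print(codes)                               # {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
print(capacity_bits(4), capacity_bits(2))  # 2.0 1.0  (four-choice vs. true-false)
print(entropy_bits([0.25] * 4))            # 2.0  (all four answers equally likely)
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))  # under 2 bits (hypothetical skewed answers)
```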

For comparison, how many bits can I get from the free-response question? Well, in principle the set of possible answers to a mathematical question has the cardinality of the real numbers, which is infinite (uncountably infinite, in fact, which is in some sense more infinite than mere “ordinary” infinity); but in reality you can only write down a small number of possible symbols on a page. I can’t actually write down the infinite diversity of numbers between 3.14159 and the true value of pi; with 10 digits or fewer, I can only (“only”) write down on the order of ten billion distinct numbers in total. So let’s suppose that handwritten text has about the same information density as typing, which is typically stored at 8 bits (one byte) per character, as ASCII text and ordinary English in UTF-8 are. If the response to this free-response question is 300 characters (note that this paragraph itself is over 800 characters), then the total number of bits conveyed is about 2400.

That is to say, one free-response question conveys about twelve hundred times as much information as a multiple-choice question (2400 bits versus 2). Of course, a lot of that information is redundant; there are many possible correct ways to write the answer to a problem (if the answer is 1.5 you could say 3/2 or 6/4 or 1.500, etc.), and many problems have multiple valid approaches to them, and it’s often safe to skip certain steps of algebra when they are very basic, and so on. But it’s really not at all unrealistic to say that I am getting between 10 and 100 times as much useful information about a student from reading one free response than I would from one multiple-choice question.
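
The raw comparison is easy to check in a few lines of Python, under the same assumptions used above (a 300-character response at one byte per character, against a four-choice question). It is only an upper bound; redundancy in real answers shrinks the useful figure, which is why the estimate above settles on 10 to 100 times.

```python
import math

BITS_PER_CHAR = 8      # one byte per character, as assumed in the text
RESPONSE_CHARS = 300   # assumed length of one free response

free_response_bits = RESPONSE_CHARS * BITS_PER_CHAR   # raw capacity of the free response
multiple_choice_bits = math.log2(4)                   # four choices -> 2 bits
ratio = free_response_bits / multiple_choice_bits
print(free_response_bits, multiple_choice_bits, ratio)  # 2400 2.0 1200.0
```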

Indeed, it’s actually a bigger difference than it appears, because when evaluating a student’s performance I’m not actually interested in the information density of the message itself; I’m interested in the product of that information density and its correlation with the true latent variable I’m trying to measure, namely the student’s actual understanding of the content. (A sequence of 500 random symbols would have a very high information density, but would be quite useless in evaluating a student!) Free-response questions aren’t just more information, they are also better information, because they are closer to the real-world problems we are training for, harder to cheat, harder to strategize, nearly impossible to guess, and provide detailed feedback about exactly what the student is struggling with (for instance, maybe they could solve the equilibrium just fine, but got hung up on calculating the consumer surplus).

As I alluded to earlier, free-response questions would also remove most of the danger of students seeing your tests beforehand. If they saw it beforehand, learned how to solve it, memorized the steps, and then were able to carry them out on the test… well, that’s actually pretty close to what you were trying to teach them. It would be better for them to learn a whole class of related problems and then be able to solve any problem from that broader class—but the first step in learning to solve a whole class of problems is in fact learning to solve one problem from that class. Just change a few details each year so that the questions aren’t identical, and you will find that any student who tried to “cheat” by seeing last year’s exam would inadvertently be studying properly for this year’s exam. And then perhaps we could stop making students literally sign nondisclosure agreements when they take college entrance exams. Listen to this Orwellian line from the SAT nondisclosure agreement:

Misconduct includes, but is not limited to:

Taking any test questions or essay topics from the testing room, including through memorization, giving them to anyone else, or discussing them with anyone else through any means, including, but not limited to, email, text messages or the Internet

Including through memorization. You are not allowed to memorize SAT questions, because God forbid you actually learn something when we are here to make money off evaluating you.

Multiple-choice tests fail in another way as well; by definition they cannot possibly test generation or recall of knowledge, they can only test recognition. You don’t need to come up with an answer; you know for a fact that the correct answer must be in front of you, and all you need to do is recognize it. Recall and recognition are fundamentally different memory processes, and recall is both more difficult and more important.

Indeed, the real mystery here is why we use multiple-choice exams at all.

There are a few types of very basic questions where multiple-choice is forgivable, because there just aren’t that many possible valid answers. If I ask whether demand for apples has increased, you can pretty much say “it increased”, “it decreased”, “it stayed the same”, or “it’s impossible to determine”. So a multiple-choice format isn’t losing too much in such a case. But most really interesting and meaningful questions aren’t going to work in this format.

I don’t think it’s even particularly controversial among educators that multiple-choice questions are awful. (Though I do recall an “educational training” seminar a few weeks back that was basically an apologia for multiple choice, claiming that it is totally possible to test “higher-order cognitive skills” using multiple-choice, for reals, believe me.) So why do we still keep using them?

Well, the obvious reason is grading time. The one thing multiple-choice does have over a true free response is that it can be graded efficiently and reliably by machines, which really does make a big difference when you have 300 students in a class. But there are a couple reasons why even this isn’t a sufficient argument.

First of all, why do we have classes that big? It’s absurd. At that point you should just email the students video lectures. You’ve already foreclosed any possibility of genuine student-teacher interaction, so why are you bothering with having an actual teacher? It seems to me that universities have tried to work out what is the absolute maximum rent they can extract by structuring a class so that it is just good enough that students won’t revolt against the tuition, but they can still spend as little as possible by hiring only one adjunct or lecturer when they should have been paying 10 professors.

And don’t tell me they can’t afford to spend more on faculty. First of all, supporting faculty is why you exist. If you can’t afford to spend enough providing the primary service that you exist as an institution to provide, then you don’t deserve to exist as an institution. Moreover, they clearly can afford it; they simply prefer to spend on hiring more and more administrators and raising the pay of athletic coaches. PhD Comics visualized it quite well: the average pay for administrators is three times that of even tenured faculty, and athletic coaches make ten times as much as faculty. (And here I think the mean is the relevant figure, as the mean income is what can be redistributed. Firing one administrator making $300,000 does actually free up enough to hire three faculty making $100,000 or ten grad students making $30,000.)

But even supposing that the institutional incentives here are just too strong, and we will continue to have ludicrously-huge lecture classes into the foreseeable future, there are still alternatives to multiple-choice testing.

Ironically, the College Board appears to have stumbled upon one themselves! Part of the SAT math exam is organized into a format where instead of bubbling in one circle to give your 2 bits of answer, you bubble in numbers and symbols corresponding to a more complicated mathematical answer, such as entering “3/4” as “0”, “3”, “/”, “4” or “1.28” as “1”, “.”, “2”, “8”. This could easily be generalized to things like “e^2” as “e”, “^”, “2” and “sin(3pi/2)” as “sin”, “3”, “pi”, “/”, “2”. There are 12 possible symbols currently allowed by the SAT, and each response is up to 4 characters, so we have already increased our possible responses from 4 to over 20,000, which is to say from 2 bits to about 14. If we generalize it to include symbols like “pi” and “e” and “sin”, and allow a few more characters per response, we could easily get it over 20 bits: 10 times as much information as a multiple-choice question.
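
A quick Python check of those counts. It treats every string of symbols as a valid answer and ignores blank cells and the SAT’s placement rules, so it is an upper bound. The 16-symbol, 6-cell alphabet in the second case is purely a hypothetical illustration of “a few more symbols and characters”, not an actual proposal from the College Board.

```python
import math

def capacity_bits(n_symbols, length):
    """Upper bound on bits in a fixed-length answer over an alphabet of n_symbols."""
    return math.log2(n_symbols ** length)

# Current grid-in, as described above: digits 0-9 plus "." and "/", 4 cells.
print(12 ** 4, round(capacity_bits(12, 4), 2))  # 20736 14.34

# Hypothetical extension: 16 symbols (adding e, pi, sin, ^) and 6 cells.
print(round(capacity_bits(16, 6), 2))           # 24.0
```

Capacity grows linearly with the number of cells (length × log₂ of the alphabet size), so adding cells pays off faster than adding symbols.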

But we can do better still! Even if we insist upon automation, high-end text-recognition software (of the sort any university could surely afford) is now getting to the point where it could realistically recognize a properly-formatted algebraic formula, so you’d at least know if the student remembered the formula correctly. Sentences could be transcribed into typed text, checked for grammar, and sorted for keywords—which is not nearly as good as a proper reading by an expert professor, but is still orders of magnitude better than filling circle “C”. Eventually AI will make even more detailed grading possible, though at that point we may have AIs just taking over the whole process of teaching. (Leaving professors entirely for research, presumably. Not sure if this would be good or bad.)

Automation isn’t the only answer either. You could hire more graders and teaching assistants—say one for every 30 or 40 students instead of one for every 100 students. (And then the TAs might actually be able to get to know their students! What a concept!) You could give fewer tests, or shorter ones—because a small, reliable sample is actually better than a large, unreliable one. A bonus there would be reducing students’ feelings of test anxiety. You could give project-based assignments, which would still take a long time to grade, but would also be a lot more interesting and fulfilling for both the students and the graders.

Or, and perhaps this is the most radical answer of all: You could stop worrying so much about evaluating student performance.

I get it, you want to know whether students are doing well, both so that you can improve your teaching and so that you can rank the students and decide who deserves various awards and merits. But do you really need to be constantly evaluating everything that students do? Did it ever occur to you that perhaps that is why so many students suffer from anxiety—because they are literally being formally evaluated with long-term consequences every single day they go to school?

If we eased up on all this evaluation, I think the fear is that students would just detach entirely; all teachers know students who only seem to show up in class because they’re being graded on attendance. But there are a couple of reasons to think that maybe this fear isn’t so well-founded after all.

If you give up on constant evaluation, you can open up opportunities to make your classes a lot more creative and interesting—and even fun. You can make students want to come to class, because they get to engage in creative exploration and collaboration instead of memorizing what you drone on at them for hours on end. Most of the reason we don’t do creative, exploratory activities is simply that we don’t know how to evaluate them reliably—so what if we just stopped worrying about that?

Moreover, are those students who only show up for the grade really getting anything out of it anyway? Maybe it would be better if they didn’t show up; indeed, maybe it would be better if they just dropped out of college entirely and did something else with their lives until they get their heads on straight. Maybe all the effort we currently expend trying to force students who clearly don’t appreciate the value of learning to learn could instead be spent enriching the students who do appreciate learning and came here to do as much of it as possible. Because, ultimately, you can lead a student to algebra, but you can’t make them think. (Let me be clear, I do not mean students with less innate ability or prior preparation; I mean students who aren’t interested in learning and are only showing up because they feel compelled to. I admire students with less innate ability who nonetheless succeed because they work their butts off, and wish I were quite so motivated myself.)

There’s a downside to that, of course. Compulsory education does actually seem to have significant benefits in making people into better citizens. Maybe if we let those students just leave college, they’d never come back, and they would squander their potential. Maybe we need to force them to show up until something clicks in their brains and they finally realize why we’re doing it. In fact, we’re really not forcing them; they could drop out in most cases and simply don’t, probably because their parents are forcing them. Maybe the signaling problem is too fundamental, and the only way we can get unmotivated students to accept not getting prestigious degrees is by going through this whole process of forcing them to show up for years and evaluating everything they do until we can formally justify ultimately failing them. (Of course, almost by construction, a student who does the absolute bare minimum to pass will pass.) But college admission is competitive, and I can’t shake the feeling that there are thousands of students out there who got rejected from the school they most wanted to go to, the school they were really passionate about and willing to commit their lives to, because some other student got in ahead of them, and that other student is now sitting in the back of the room playing with an iPhone, grumbling about having to show up for class every day. What about that squandered potential? Perhaps competitive admission and compulsory attendance just don’t mix, and we should stop compelling students once they get their high school diploma.