Information theory proves that multiple-choice is stupid

Mar 19, JDN 2457832

This post is a bit of a departure from my usual topics, but it’s something that has bothered me for a long time, and I think it fits broadly into the scope of uniting economics with the broader realm of human knowledge.

Multiple-choice questions are inherently and objectively poor methods of assessing learning.

Consider the following question, which is adapted from actual tests I have been required to administer and grade as a teaching assistant (that is, the style of question is the same; I’ve changed the details so that it wouldn’t be possible to just memorize the response—though in a moment I’ll get to why all this paranoia about students seeing test questions beforehand would also be defused if we stopped using multiple-choice):

The demand for apples follows the equation Q = 100 – 5 P.
The supply of apples follows the equation Q = 10 P.
If a tax of $2 per apple is imposed, what is the equilibrium price, quantity, tax revenue, consumer surplus, and producer surplus?

A. Price = $5, Quantity = 10, Tax revenue = $50, Consumer Surplus = $360, Producer Surplus = $100

B. Price = $6, Quantity = 20, Tax revenue = $40, Consumer Surplus = $200, Producer Surplus = $300

C. Price = $6, Quantity = 60, Tax revenue = $120, Consumer Surplus = $360, Producer Surplus = $300

D. Price = $5, Quantity = 60, Tax revenue = $120, Consumer Surplus = $280, Producer Surplus = $500

You could try solving this properly, setting supply equal to demand, adjusting for the tax, finding the equilibrium, and calculating the surplus, but don’t bother. If I were tutoring a student in preparing for this test, I’d tell them not to bother. You can get the right answer in only two steps, because of the multiple-choice format.

Step 1: Does tax revenue equal $2 times quantity? We said the tax was $2 per apple.
So that rules out everything except C and D. Welp, quantity must be 60 then.

Step 2: Is quantity 10 times price as the supply curve says? For C they are, for D they aren’t; guess it must be C then.

Now, to do that, you need to have at least a basic understanding of the economics underlying the question (How is tax revenue calculated? What does the supply curve equation mean?). But there’s an even easier technique you can use that doesn’t even require that; it’s called Answer Splicing.

Here’s how it works: You look for repeated values in the answer choices, and you choose the one that has the most repeated values. Prices $5 and $6 are repeated equally, so that’s not helpful (maybe the test designer planned at least that far). Quantity 60 is repeated, other quantities aren’t, so it’s probably that. Likewise with tax revenue $120. Consumer surplus $360 and Producer Surplus $300 are both repeated, so those are probably it. Oh, look, we’ve selected a unique answer choice C, the correct answer!

You could have done answer splicing even if the question were about 18th century German philosophy, or even if the question were written in Arabic or Japanese. In fact you even do it if it were written in a cipher, as long as the cipher was a consistent substitution cipher.

Could the question have been designed to better avoid answer splicing? Probably. But this is actually quite difficult to do, because there is a fundamental tradeoff between two types of “distractors” (as they are known in the test design industry). You want the answer choices to contain correct pieces and resemble the true answer, so that students who basically understand the question but make a mistake in the process still get it wrong. But you also want the answer choices to be distinct enough in a random enough pattern that answer splicing is unreliable. These two goals are inherently contradictory, and the result will always be a compromise between them. Professional test-designers usually lean pretty heavily against answer-splicing, which I think is probably optimal so far as it goes; but I’ve seen many a professor err too far on the side of similar choices and end up making answer splicing quite effective.

But of course, all of this could be completely avoided if I had just presented the question as an open-ended free-response. Then you’d actually have to write down the equations, show me some algebra solving them, and then interpret your results in a coherent way to answer the question I asked. What’s more, if you made a minor mistake somewhere (carried a minus sign over wrong, forgot to divide by 2 when calculating the area of the consumer surplus triangle), I can take off a few points for that error, rather than all the points just because you didn’t get the right answer. At the other extreme, if you just randomly guess, your odds of getting the right answer are miniscule, but even if you did—or copied from someone else—if you don’t show me the algebra you won’t get credit.

So the free-response question is telling me a lot more about what the student actually knows, in a much more reliable way, that is much harder to cheat or strategize against.

Moreover, this isn’t a matter of opinion. This is a theorem of information theory.

The information that is carried over a message channel can be quantitatively measured as its Shannon entropy. It is usually measured in bits, which you may already be familiar with as a unit of data storage and transmission rate in computers—and yes, those are all fundamentally the same thing. A proper formal treatment of information theory would be way too complicated for this blog, but the basic concepts are fairly straightforward: think in terms of how long a sequence of 1s and 0s it would take to convey the message. That is, roughly speaking, the Shannon entropy of that message.

How many bits are conveyed by a multiple-choice response with four choices? 2. Always. At maximum. No exceptions. It is fundamentally, provably, mathematically impossible to convey more than 2 bits of information via a channel that only has 4 possible states. Any multiple-choice response—any multiple-choice response—of four choices can be reduced to the sequence 00, 01, 10, 11.

True-false questions are a bit worse—literally, they convey 1 bit instead of 2. It’s possible to fully encode the entire response to a true-false question as simply 0 or 1.

For comparison, how many bits can I get from the free-response question? Well, in principle the answer to any mathematical question has the cardinality of the real numbers, which is infinite (in some sense beyond infinite, in fact—more infinite than mere “ordinary” infinity); but in reality you can only write down a small number of possible symbols on a page. I can’t actually write down the infinite diversity of numbers between 3.14159 and the true value of pi; in 10 digits or less, I can only (“only”) write down a few billion of them. So let’s suppose that handwritten text has about the same information density as typing, which in ASCII or Unicode has 8 bits—one byte—per character. If the response to this free-response question is 300 characters (note that this paragraph itself is over 800 characters), then the total number of bits conveyed is about 2400.

That is to say, one free-response question conveys six hundred times as much information as a multiple-choice question. Of course, a lot of that information is redundant; there are many possible correct ways to write the answer to a problem (if the answer is 1.5 you could say 3/2 or 6/4 or 1.500, etc.), and many problems have multiple valid approaches to them, and it’s often safe to skip certain steps of algebra when they are very basic, and so on. But it’s really not at all unrealistic to say that I am getting between 10 and 100 times as much useful information about a student from reading one free response than I would from one multiple-choice question.

Indeed, it’s actually a bigger difference than it appears, because when evaluating a student’s performance I’m not actually interested in the information density of the message itself; I’m interested in the product of that information density and its correlation with the true latent variable I’m trying to measure, namely the student’s actual understanding of the content. (A sequence of 500 random symbols would have a very high information density, but would be quite useless in evaluating a student!) Free-response questions aren’t just more information, they are also better information, because they are closer to the real-world problems we are training for, harder to cheat, harder to strategize, nearly impossible to guess, and provided detailed feedback about exactly what the student is struggling with (for instance, maybe they could solve the equilibrium just fine, but got hung up on calculating the consumer surplus).

As I alluded to earlier, free-response questions would also remove most of the danger of students seeing your tests beforehand. If they saw it beforehand, learned how to solve it, memorized the steps, and then were able to carry them out on the test… well, that’s actually pretty close to what you were trying to teach them. It would be better for them to learn a whole class of related problems and then be able to solve any problem from that broader class—but the first step in learning to solve a whole class of problems is in fact learning to solve one problem from that class. Just change a few details each year so that the questions aren’t identical, and you will find that any student who tried to “cheat” by seeing last year’s exam would inadvertently be studying properly for this year’s exam. And then perhaps we could stop making students literally sign nondisclosure agreements when they take college entrance exams. Listen to this Orwellian line from the SAT nondisclosure agreement:

Misconduct includes,but is not limited to:

Taking any test questions or essay topics from the testing room, including through memorization, giving them to anyone else, or discussing them with anyone else through anymeans, including, but not limited to, email, text messages or the Internet

Including through memorization. You are not allowed to memorize SAT questions, because God forbid you actually learn something when we are here to make money off evaluating you.

Multiple-choice tests fail in another way as well; by definition they cannot possibly test generation or recall of knowledge, they can only test recognition. You don’t need to come up with an answer; you know for a fact that the correct answer must be in front of you, and all you need to do is recognize it. Recall and recognition are fundamentally different memory processes, and recall is both more difficult and more important.

Indeed, the real mystery here is why we use multiple-choice exams at all.
There are a few types of very basic questions where multiple-choice is forgivable, because there are just aren’t that many possible valid answers. If I ask whether demand for apples has increased, you can pretty much say “it increased”, “it decreased”, “it stayed the same”, or “it’s impossible to determine”. So a multiple-choice format isn’t losing too much in such a case. But most really interesting and meaningful questions aren’t going to work in this format.

I don’t think it’s even particularly controversial among educators that multiple-choice questions are awful. (Though I do recall an “educational training” seminar a few weeks back that was basically an apologia for multiple choice, claiming that it is totally possible to test “higher-order cognitive skills” using multiple-choice, for reals, believe me.) So why do we still keep using them?

Well, the obvious reason is grading time. The one thing multiple-choice does have over a true free response is that it can be graded efficiently and reliably by machines, which really does make a big difference when you have 300 students in a class. But there are a couple reasons why even this isn’t a sufficient argument.

First of all, why do we have classes that big? It’s absurd. At that point you should just email the students video lectures. You’ve already foreclosed any possibility of genuine student-teacher interaction, so why are you bothering with having an actual teacher? It seems to be that universities have tried to work out what is the absolute maximum rent they can extract by structuring a class so that it is just good enough that students won’t revolt against the tuition, but they can still spend as little as possible by hiring only one adjunct or lecturer when they should have been paying 10 professors.

And don’t tell me they can’t afford to spend more on faculty—first of all, supporting faculty is why you exist. If you can’t afford to spend enough providing the primary service that you exist as an institution to provide, then you don’t deserve to exist as an institution. Moreover, they clearly can afford it—they simply prefer to spend on hiring more and more administrators and raising the pay of athletic coaches. PhD comics visualized it quite well; the average pay for administrators is three times that of even tenured faculty, and athletic coaches make ten times as much as faculty. (And here I think the mean is the relevant figure, as the mean income is what can be redistributed. Firing one administrator making $300,000 does actually free up enough to hire three faculty making $100,000 or ten grad students making $30,000.)

But even supposing that the institutional incentives here are just too strong, and we will continue to have ludicrously-huge lecture classes into the foreseeable future, there are still alternatives to multiple-choice testing.

Ironically, the College Board appears to have stumbled upon one themselves! About half the SAT math exam is organized into a format where instead of bubbling in one circle to give your 2 bits of answer, you bubble in numbers and symbols corresponding to a more complicated mathematical answer, such as entering “3/4” as “0”, “3”, “/”, “4” or “1.28” as “1”, “.”, “2”, “8”. This could easily be generalized to things like “e^2” as “e”, “^”, “2” and “sin(3pi/2)” as “sin”, “3” “pi”, “/”, “2”. There are 12 possible symbols currently allowed by the SAT, and each response is up to 4 characters, so we have already increased our possible responses from 4 to over 20,000—which is to say from 2 bits to 14. If we generalize it to include symbols like “pi” and “e” and “sin”, and allow a few more characters per response, we could easily get it over 20 bits—10 times as much information as a multiple-choice question.

But we can do better still! Even if we insist upon automation, high-end text-recognition software (of the sort any university could surely afford) is now getting to the point where it could realistically recognize a properly-formatted algebraic formula, so you’d at least know if the student remembered the formula correctly. Sentences could be transcribed into typed text, checked for grammar, and sorted for keywords—which is not nearly as good as a proper reading by an expert professor, but is still orders of magnitude better than filling circle “C”. Eventually AI will make even more detailed grading possible, though at that point we may have AIs just taking over the whole process of teaching. (Leaving professors entirely for research, presumably. Not sure if this would be good or bad.)

Automation isn’t the only answer either. You could hire more graders and teaching assistants—say one for every 30 or 40 students instead of one for every 100 students. (And then the TAs might actually be able to get to know their students! What a concept!) You could give fewer tests, or shorter ones—because a small, reliable sample is actually better than a large, unreliable one. A bonus there would be reducing students’ feelings of test anxiety. You could give project-based assignments, which would still take a long time to grade, but would also be a lot more interesting and fulfilling for both the students and the graders.

Or, and perhaps this is the most radical answer of all: You could stop worrying so much about evaluating student performance.

I get it, you want to know whether students are doing well, both so that you can improve your teaching and so that you can rank the students and decide who deserves various awards and merits. But do you really need to be constantly evaluating everything that students do? Did it ever occur to you that perhaps that is why so many students suffer from anxiety—because they are literally being formally evaluated with long-term consequences every single day they go to school?

If we eased up on all this evaluation, I think the fear is that students would just detach entirely; all teachers know students who only seem to show up in class because they’re being graded on attendance. But there are a couple of reasons to think that maybe this fear isn’t so well-founded after all.

If you give up on constant evaluation, you can open up opportunities to make your classes a lot more creative and interesting—and even fun. You can make students want to come to class, because they get to engage in creative exploration and collaboration instead of memorizing what you drone on at them for hours on end. Most of the reason we don’t do creative, exploratory activities is simply that we don’t know how to evaluate them reliably—so what if we just stopped worrying about that?

Moreover, are those students who only show up for the grade really getting anything out of it anyway? Maybe it would be better if they didn’t show up—indeed, if they just dropped out of college entirely and did something else with their lives until they get their heads on straight. Maybe all this effort that we are currently expending trying to force students to learn who clearly don’t appreciate the value of learning could instead be spent enriching the students who do appreciate learning and came here to do as much of it as possible. Because, ultimately, you can lead a student to algebra, but you can’t make them think. (Let me be clear, I do not mean students with less innate ability or prior preparation; I mean students who aren’t interested in learning and are only showing up because they feel compelled to. I admire students with less innate ability who nonetheless succeed because they work their butts off, and wish I were quite so motivated myself.)
There’s a downside to that, of course. Compulsory education does actually seem to have significant benefits in making people into better citizens. Maybe if we let those students just leave college, they’d never come back, and they would squander their potential. Maybe we need to force them to show up until something clicks in their brains and they finally realize why we’re doing it. In fact, we’re really not forcing them; they could drop out in most cases and simply don’t, probably because their parents are forcing them. Maybe the signaling problem is too fundamental, and the only way we can get unmotivated students to accept not getting prestigious degrees is by going through this whole process of forcing them to show up for years and evaluating everything they do until we can formally justify ultimately failing them. (Of course, almost by construction, a student who does the absolute bare minimum to pass will pass.) But college admission is competitive, and I can’t shake this feeling there are thousands of students out there who got rejected from the school they most wanted to go to, the school they were really passionate about and willing to commit their lives to, because some other student got in ahead of them—and that other student is now sitting in the back of the room playing with an iPhone, grumbling about having to show up for class every day. What about that squandered potential? Perhaps competitive admission and compulsory attendance just don’t mix, and we should stop compelling students once they get their high school diploma.

The Asymmetry that Rules the World

JDN 2456921 PDT 13:30.

One single asymmetry underlies millions of problems and challenges the world has always faced. No, it’s not Christianity versus Islam (or atheism). No, it’s not the enormous disparities in wealth between the rich and the poor, though you’re getting warmer.

It is the asymmetry of information—the fundamental fact that what you know and what I know are not the same. If this seems so obvious as to be unworthy of comment, maybe you should tell that to the generations of economists who have assumed perfect information in all of their models.

It’s not clear that information asymmetry could ever go away—even in the utopian post-scarcity economy of the Culture, one of the few sacred rules is the sanctity of individual thought. The closest to an information-symmetric world I can think of is the Borg, and with that in mind we may ask whether we want such a thing after all. It could even be argued that total information symmetry is logically impossible, because once you make two individuals know and believe exactly the same things, you don’t have two individuals anymore, you just have one. (And then where do we draw the line? It’s that damn Ship of Theseus again—except of course the problem was never the ship, but defining the boundaries of Theseus himself.)

Right now you may be thinking: So what? Why is asymmetric information so important? Well, as I mentioned in an earlier post, the Myerson-Satterthwaithe Theorem proves—mathematically proves, as certain as 2+2=4—that in the presence of asymmetric information, there is no market mechanism that guarantees Pareto-efficiency.

You can’t square that circle; because information is asymmetric, there’s just no way to make a free market that insures Pareto efficiency. This result is so strong that it actually makes you begin to wonder if we should just give up on economics entirely! If there’s no way we can possibly make a market that works, why bother at all?

But this is not the appropriate response. First of all, Pareto-efficiency is overrated; there are plenty of bad systems that are Pareto-efficient, and even some good systems that aren’t quite Pareto-efficient.

More importantly, even if there is no perfect market system, there clearly are better and worse market systems. Life is better here in the US than it is in Venezuela. Life in Sweden is arguably a bit better still (though not in every dimension). Life in Zambia and North Korea is absolutely horrific. Clearly there are better and worse ways to run a society, and the market system is a big part of that. The quality—and sometimes quantity—of life of billions of people can be made better or worse by the decisions we make in managing our economic system. Asymmetric information cannot be conquered, but it can be tamed.

This is actually a major subject for cognitive economics: How can we devise systems of regulation that minimize the damage done by asymmetric information? Akerlof’s Nobel was for his work on this subject, especially his famous paper “The Market for Lemons” in which he showed how product quality regulations could increase efficiency using the example of lemon cars. What he showed was, in short, that libertarian deregulation is stupid; removing regulations on product safety and quality doesn’t increase efficiency, it reduces it. (This is of course only true if the regulations are good ones; but despite protests from the supplement industry I really don’t see how “this bottle of pills must contain what it claims to contain” is an illegitimate regulation.)

Unfortunately, the way we currently write regulations leaves much to be desired: Basically, lobbyists pay hundreds of staffers to make hundreds of pages that no human being can be expected to read, and then hands them to Congress with a wink and a reminder of last year’s campaign contributions, who passes them without question. (Can you believe the US is one of the least corrupt governments in the world? Yup, that’s how bad it is out there.) As a result, we have a huge morass of regulations that nobody really understands, and there is a whole “industry” of people whose job it is to decode those regulations and use them to the advantage of whoever is paying them—lawyers. The amount of deadweight loss introduced into our economy is almost incalculable; if I had to guess, I’d have to put it somewhere in the trillions of dollars per year. At the very least, I can tell you that the $200 billion per year spent by corporations on litigation is all deadweight loss due to bad regulation. That is an industry that should not exist—I cannot stress this enough. We’ve become so accustomed to the idea that regulations are this complicated that people have to be paid six-figure salaries to understand them that we never stopped to think whether this made any sense. The US Constitution was originally printed on 6 pages.

The tax code should contain one formula for setting tax brackets with one or two parameters to adjust to circumstances, and then a list of maybe two dozen goods with special excise taxes for their externalities (like gasoline and tobacco). In reality it is over 70,000 pages.

Laws should be written with a clear and general intent, and then any weird cases can be resolved in court—because there will always be cases you couldn’t anticipate. Shakespeare was onto something when he wrote, “First, kill all the lawyers.” (I wouldn’t kill them; I’d fire them and make them find a job doing something genuinely useful, like engineering or management.)

All told, I think you could run an entire country with less than 100 pages of regulations. Furthermore, these should be 100 pages that are taught to every high school student, because after all, we’re supposed to be following them. How are we supposed to follow them if we don’t even know them? There’s a principle called ignorantia non excusatignorance does not excuse—which is frankly Kafkaesque. If you can be arrested for breaking a law you didn’t even know existed, in what sense can we call this a free society? (People make up strawman counterexamples: “Gee, officer, I didn’t know it was illegal to murder people!” But all you need is a standard of reasonable knowledge and due diligence, which courts already use to make decisions.)

So, in that sense, I absolutely favor deregulation. But my reasons are totally different from libertarians: I don’t want regulations to stop constraining businesses, I want regulations to be so simple and clear that no one can get around them. In the system I envision, you wouldn’t be able to sell fraudulent derivatives, because on page 3 it would clearly say that fraud is illegal and punishable in proportion to the amount of money involved.

But until that happens—and let’s face it, it’s gonna be awhile—we’re stuck with these ridiculous regulations, and that introduces a whole new type of asymmetric information. This is the way that regulations can make our economy less efficient; they distort what we can do not just by making it illegal, but by making it so we don’t know what is illegal.

The wealthy and powerful can hire people to explain—or evade—the regulations, while the rest of us are forced to live with them. You’ve felt this in a small way if you’ve ever gotten a parking ticket and didn’t know why. Asymmetric information strikes again.