Reflections at the crossroads

Jan 21 JDN 2460332

When this post goes live, I will have just passed my 36th birthday. (That means I’ve lived for about 1.1 billion seconds, so in order to be as rich as Elon Musk, I’d need to have made, on average, since birth, $200 per second—$720,000 per hour.)
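The parenthetical is back-of-the-envelope arithmetic, and it checks out. A quick sketch, assuming 365.25-day years and using only the $200-per-second rate from the post (it does not look up any outside net-worth figure):

```python
# Assumption: an average (Julian) year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

age_years = 36
rate_per_second = 200  # dollars per second, the rate named in the post

seconds_alive = age_years * SECONDS_PER_YEAR
per_hour = rate_per_second * 60 * 60
implied_total = rate_per_second * seconds_alive

print(f"Seconds alive: {seconds_alive:,.0f}")
print(f"Dollars per hour: {per_hour:,}")
print(f"Implied lifetime total: ${implied_total / 1e9:,.0f} billion")
```

That works out to about 1.14 billion seconds, $720,000 per hour, and an implied total of roughly $227 billion.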

I certainly feel a lot better turning 36 than I did turning 35. I don’t have any particular additional accomplishments to point to, but my life has already changed quite a bit, in just that one year: Most importantly, I quit my job at the University of Edinburgh, and I am currently in the process of moving out of the UK and back home to Michigan. (We moved the cat over Christmas, and the movers have already come and taken most of our things away; it’s really just us and our luggage now.)

But I still don’t know how to field the question that people have been asking me since I announced my decision to do this months ago:

“What’s next?”

I’m at a crossroads now, trying to determine which path to take. Actually maybe it’s more like a roundabout; it has a whole bunch of different paths, surely not just two or three. The road straight ahead is labeled “stay in academia”; the others at the roundabout are things like “freelance writing”, “software programming”, “consulting”, and “tabletop game publishing”. There’s one well-paved and superficially enticing road that I’m fairly sure I don’t want to take, labeled “corporate finance”.

Right now, I’m just kind of driving around in circles.

Most people don’t seem to quit their jobs without a clear plan for where they will go next. Often they wait until they have another offer in hand that they intend to take. But when I realized just how miserable that job was making me, I made the—perhaps bold, perhaps courageous, perhaps foolish—decision to get out as soon as I possibly could.

It’s still hard for me to fully understand why working at Edinburgh made me so miserable. Many features of an academic career are very appealing to me. I love teaching, I like doing research; I like the relatively flexible hours (and kinda need them, because of my migraines).

I often construct formal decision models to help me make big choices—generally it’s a linear model, where I simply rate each option by its relative quality in a particular dimension, then try different weightings of all the different dimensions. I’ve used this successfully to pick out cars, laptops, even universities. I’m not entrusting my decisions to an algorithm; I often find myself tweaking the parameters to try to get a particular result—but that in itself tells me what I really want, deep down. (Don’t do that in research—people do, and it’s bad—but if the goal is to make yourself happy, your gut feelings are important too.)
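The kind of linear decision model described here can be sketched in a few lines. Everything in it is hypothetical: the options, the dimensions, the 0-10 ratings and the two weightings are made up for illustration and are not the author's actual numbers. It only shows the mechanics: each option's score is a weighted sum of its ratings, and re-running with a different weighting can flip the ranking.

```python
from typing import Dict, List

Ratings = Dict[str, float]

# Hypothetical 0-10 ratings of each option on each dimension (illustrative only).
RATINGS: Dict[str, Ratings] = {
    "car_a": {"price": 8, "reliability": 5, "comfort": 4},
    "car_b": {"price": 5, "reliability": 8, "comfort": 6},
    "car_c": {"price": 3, "reliability": 6, "comfort": 9},
}

# Hypothetical weightings to try; each is a guess at how much a dimension matters.
WEIGHTINGS: Dict[str, Ratings] = {
    "price-first": {"price": 5, "reliability": 3, "comfort": 2},
    "comfort-first": {"price": 2, "reliability": 3, "comfort": 5},
}


def score(ratings: Ratings, weights: Ratings) -> float:
    """Linear model: the weighted sum of an option's ratings."""
    return sum(weights[dim] * ratings[dim] for dim in weights)


def rank(options: Dict[str, Ratings], weights: Ratings) -> List[str]:
    """Option names sorted from highest to lowest score."""
    return sorted(options, key=lambda name: score(options[name], weights), reverse=True)


if __name__ == "__main__":
    for label, weights in WEIGHTINGS.items():
        scores = {name: score(r, weights) for name, r in RATINGS.items()}
        print(label, rank(RATINGS, weights), scores)
```

With these made-up numbers, the price-first weighting ranks car_a first (63 points to car_b's 61), while the comfort-first weighting puts car_c on top (69 to 64). Noticing which weighting you keep nudging toward is the "what I really want" signal the paragraph describes.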

My decision models consistently rank university teaching quite high. It generally only gets beaten by freelance writing—which means that maybe I should give freelance writing another try after all.

And yet, my actual experience at Edinburgh was miserable.

What went wrong?

Well, first of all, I should acknowledge that when I separate out the job “university professor” into teaching and research as separate jobs in my decision model, and include all that goes into both jobs—not just the actual teaching, but the grading and administrative tasks; not just doing the research, but also trying to fund and publish it—they both drop lower on the list, and research drops down a lot.

Also, I would rate them both even lower now, having more direct experience of just how awful the exam-grading, grant-writing and journal-submitting can be.

Designing and then grading an exam was tremendously stressful: I knew that many of my students’ futures rested on how they did on exams like this (especially in the UK system, where exams are absurdly overweighted! In most of my classes, the final exam was at least 60% of the grade!). I struggled mightily to make the exam as fair as I could, all the while knowing that it would never really feel fair and I didn’t even have the time to make it the best it could be. You really can’t assess how well someone understands an entire subject in a multiple-choice exam designed to take 90 minutes. It’s impossible.

The worst part of research for me was the rejection.

I mentioned in a previous post how I am hypersensitive to rejection; applying for grants and submitting to journals brought the worst feelings of rejection I’ve felt in any job. It felt like they were evaluating not only the value of my work, but my worth as a scientist. Failure felt like being told that my entire career was a waste of time.

It was even worse than the feeling of rejection in freelance writing (which is one of the few things that my model tells me is bad about freelancing as a career for me, along with relatively low and uncertain income). I think the difference is that a book publisher is saying “We don’t think we can sell it.”—‘we’ and ‘sell’ being vital. They aren’t saying “this is a bad book; it shouldn’t exist; writing it was a waste of time.”; they’re just saying “It’s not a subgenre we generally work with.” or “We don’t think it’s what the market wants right now.” or even “I personally don’t care for it.”. They acknowledge their own subjective perspective and the fact that it’s ultimately dependent on forecasting the whims of an extremely fickle marketplace. They aren’t really judging my book, and they certainly aren’t judging me.

But in research publishing, it was different. Yes, it’s all in very polite language, thoroughly spiced with sophisticated jargon (though some reviewers are more tactful than others). But when your grant application gets rejected by a funding agency or your paper gets rejected by a journal, the message is basically “This project is not worth doing.”; “This isn’t good science.”; “It was/would be a waste of time and money.”; “This (theory or experiment you’ve spent years working on) isn’t interesting or important.” Nobody ever came out and said those things, nor did they come out and say “You’re a bad economist and you should feel bad.”; but honestly a couple of the reviews did kinda read to me like they wanted to say that. They thought that the whole idea that human beings care about each other is fundamentally stupid and naive and not worth talking about, much less running experiments on.

It isn’t so much that I believed them when they implied my work was bad science. I did make some mistakes along the way (but nothing vital; I’ve seen far worse errors by Nobel laureates). I didn’t have very large samples (because every person I add to the experiment is money I have to pay, and therefore funding I have to come up with). But overall I do believe that my work is sufficiently rigorous to be worth publishing in scientific journals.

It’s more that I came to feel that my work is considered bad, that the kind of work I wanted to do would forever be an uphill battle against an implacable enemy. I already feel exhausted by that battle, and it had only barely begun. I had thought that behavioral economics was a more successful paradigm by now, that it had largely displaced the neoclassical assumptions that came before it; but I was wrong. Except specifically in journals dedicated to experimental and behavioral economics (of which prestigious journals are few—I quickly exhausted them), it really felt like a lot of the feedback I was getting amounted to, “I refuse to believe your paradigm.”.

Part of the problem, also, was that there simply aren’t that many prestigious journals, and they don’t take that many papers. The top 5 journals—which, for whatever reason, command far more respect than any other journals among economists—each accept only about 5-10% of their submissions. Surely more than that are worth publishing; and, to be fair, much of what they reject probably gets published later somewhere else. But it makes a shockingly large difference in your career how many “top 5s” you have; other publications almost don’t matter at all. So once you don’t get into any of those (which of course I didn’t), should you even bother trying to publish somewhere else?

And what else almost doesn’t matter? Your teaching. As long as you show up to class and grade your exams on time (and don’t, like, break the law or something), research universities basically don’t seem to care how good a teacher you are. That was certainly my experience at Edinburgh. (Honestly even their responses to professors sexually abusing their students are pretty unimpressive.)

Some of the other faculty cared, I could tell; there were even some attempts to build a community of colleagues to support each other in improving teaching. But the administration seemed almost actively opposed to it; they didn’t offer any funding to support the program—they wouldn’t even buy us pizza at the meetings, the sort of thing I had as an undergrad for my activist groups—and they wanted to take the time we spent in such pedagogy meetings out of our grading time (probably because if they didn’t, they’d either have to give us less grading, or some of us would be over our allotted hours and they’d owe us compensation).

And honestly, it is teaching that I consider the higher calling.

The difference between 0 people knowing something and 1 knowing it is called research; the difference between 1 person knowing it and 8 billion knowing it is called education.

Yes, of course, research is important. But if all the research suddenly stopped, our civilization would stagnate at its current level of technology, but otherwise continue unimpaired. (Frankly it might spare us the cyberpunk dystopia/AI apocalypse we seem to be hurtling rapidly toward.) Whereas if all education suddenly stopped, our civilization would slowly decline until it ultimately collapsed into the Stone Age. (Actually it might even be worse than that; even Stone Age cultures pass on knowledge to their children, just not through formal teaching. If you include all the ways parents teach their children, it may be literally true that humans cannot survive without education.)

Yet research universities seem to get all of their prestige from their research, not their teaching, and prestige is the thing they absolutely value above all else, so they devote the vast majority of their energy toward valuing and supporting research rather than teaching. In many ways, the administrators seem to see teaching as an obligation, as something they have to do in order to make money that they can spend on what they really care about, which is research.

As such, they are always making classes bigger and bigger, trying to squeeze out more tuition dollars (well, in this case, pounds) from the same number of faculty contact hours. It becomes impossible to get to know all of your students, much less give them all sufficient individual attention. At Edinburgh they even had the gall to refer to their seminars as “tutorials” when they typically had 20+ students. (That is not tutoring!) And then of course there were the lectures, which often had over 200 students.

I suppose it could be worse: It could be athletics they spend all their money on, like most Big Ten universities. (The University of Michigan actually seems to strike a pretty good balance: they are certainly not hurting for athletic funding, but they also devote sizeable chunks of their budget to research, medicine, and yes, even teaching. And unlike virtually all other varsity athletic programs, University of Michigan athletics turns a profit!)

If all the varsity athletics in the world suddenly disappeared… I’m not convinced we’d be any worse off, actually. We’d lose a source of entertainment, but it could probably be easily replaced by, say, Netflix. And universities could re-focus their efforts on academics, instead of acting like a free training and selection system for the pro leagues. The University of California, Irvine certainly seemed no worse off for its lack of varsity football. (Though I admit it felt a bit strange, even to a consummate nerd like me, to have a varsity League of Legends team.)

They keep making the experience of teaching worse and worse, even as they cut faculty salaries and make our jobs more and more precarious.

That might be what really made me most miserable, knowing how expendable I was to the university. If I hadn’t quit when I did, I would have been out after another semester anyway, and going through this same process a bit later. It wasn’t even that I was denied tenure; it was never on the table in the first place. And perhaps because they knew I wouldn’t stay anyway, they didn’t invest anything in mentoring or supporting me. Ostensibly I was supposed to be assigned a faculty mentor immediately; I know the first semester was crazy because of COVID, but after two and a half years I still didn’t have one. (I had a small research budget, which they reduced in the second year; that was about all the support I got. I used it—once.)

So if I do continue on that “academia” road, I’m going to need to do a lot of things differently. I’m not going to put up with a lot of things that I did. I’ll demand a long-term position—if not tenure-track, at least renewable indefinitely, like a lecturer position (as it is in the US, where the tenure-track position is called “assistant professor” and “lecturer” is permanent but not tenured; in the UK, “lecturers” are tenure-track—except at Oxford, and as of 2021, Cambridge—just to confuse you). Above all, I’ll only be applying to schools that actually have some track record for valuing teaching and supporting their faculty.

And if I can’t find any such positions? Then I just won’t apply at all. I’m not going in with the “I’ll take what I can get” mentality I had last time. Our household finances are stable enough that I can afford to wait awhile.

But maybe I won’t even do that. Maybe I’ll take a different path entirely.

For now, I just don’t know.

Do I want to stay in academia?

Apr 5 JDN 2458945

This is a very personal post. You’re not going to learn any new content today; but this is what I needed to write about right now.

I am now nearly finished with my dissertation. It only requires three papers (which, quite honestly, have very little to do with one another). I just got my second paper signed off on, and my third is far enough along that I can probably finish it in a couple of months.

I feel like I ought to be more excited than I am. Mostly what I feel right now is dread.

Yes, some of that dread is the ongoing pandemic—though I am pleased to report that the global number of cases of COVID-19 has substantially undershot the estimates I made last week, suggesting that at least most places are getting the virus under control. The number of cases and number of deaths have about doubled in the past week, which is a lot better than doubling every two days as it was at the start of the pandemic. And that’s all I want to say about COVID-19 today, because I’m sure you’re as tired of the wall-to-wall coverage of it as I am.

But most of the dread is about my own life, mainly my career path. More and more I’m finding that the world of academic research just isn’t working for me. The actual research part I like, and I’m good at it; but then it comes time to publish, and the journal system is so fundamentally broken, so agonizingly capricious, and has such ludicrous power over the careers of young academics that I’m really not sure I want to stay in this line of work. I honestly think I’d prefer they just flip a coin when you graduate and you get a tenure-track job if you get heads. Or maybe journals could roll a 20-sided die for each paper submitted and publish the papers that get 19 or 20. At least then the powers that be couldn’t convince themselves that their totally arbitrary and fundamentally unjust selection process was actually based on deep wisdom and selecting the most qualified individuals.

In any case I’m fairly sure at this point that I won’t have any publications in peer-reviewed journals by the time I graduate. It’s possible I still could—I actually still have decent odds with two co-authored papers, at least—but I certainly do not expect to. My chances of getting into a top journal at this point are basically negligible.

If I weren’t trying to get into academia, that fact would be basically irrelevant. I think most private businesses and government agencies are fairly well aware of the deep defects in the academic publishing system, and really don’t put a whole lot of weight on its conclusions. But in academia, publication is everything. Specifically, publication in top journals.

For this reason, I am now seriously considering leaving academia once I graduate. The more contact I have with the academic publishing system the more miserable I feel. The idea of spending another six or seven years desperately trying to get published in order to satisfy a tenure committee sounds about as appealing right now as having my fingernails pulled out one by one.

This would mean giving up on a lifelong dream. It would mean wondering why I even bothered with the PhD, when the first MA—let alone the second—would probably have been enough for most government or industry careers. And it means trying to fit myself into a new mold that I may find I hate just as much for different reasons: A steady 9-to-5 work schedule is a lot harder to sustain when waking up before 10 AM consistently gives you migraines. (In theory, there are ways to get special accommodations for that sort of thing; in practice, I’m sure most employers would drag their feet as much as possible, because in our culture a phase-delayed circadian rhythm is tantamount to being lazy and therefore worthless.)

Or perhaps I should aim for a lecturer position, perhaps at a smaller college, that isn’t so obsessed with research publication. This would still dull my dream, but would not require abandoning it entirely.

I was asked a few months ago what my dream job is, and I realized: It is almost what I actually have. It is so tantalizingly close to what I am actually headed for that it is painful. The reality is a twisted mirror of the dream.

I want to teach. I want to do research. I want to write. And I get to do those things, yes. But I want to do them without the layers of bureaucracy, without the tiers of arbitrary social status called ‘prestige’, without the hyper-competitive and capricious system of journal publication. Honestly I want to do them without grading or dealing with publishers at all—though I can at least understand why some mechanisms for evaluating student progress and disseminating research are useful, even if our current systems for doing so are fundamentally defective.

It feels as though I have been running a marathon, but was only given a vague notion of the route beforehand. There were a series of flags to follow: This way to the bachelor’s, this way to the master’s, that way to advance to candidacy. Then when I come to the last set of flags, the finish line now visible at the horizon, I see that there is an obstacle course placed in my way, with obstacles I was never warned about, much less trained for. A whole new set of skills, maybe even a whole different personality, is necessary to surpass these new obstacles, and I feel utterly unprepared.

It is as if the last mile of my marathon must be done on horseback, and I’ve never learned to ride a horse—no one ever told me I would need to ride a horse. (Or maybe they did and I didn’t listen?) And now every time I try to mount one, I fall off immediately; and the injuries I sustain seem to be worse every time. The bruises I thought would heal only get worse. The horses I must ride are research journals, and the injuries when I fall are psychological—but no less real, all too real. With each attempt I keep hoping that my fear will fade, but instead it only intensifies.

It’s the same pain, the same fear, that pulled me away from fiction writing. I want to go back, I hope to go back—but I am not strong enough now, and cannot be sure I ever will be. I was told that working in a creative profession meant working hard and producing good output; it turns out it doesn’t mean that at all. A successful career in a creative field actually means satisfying the arbitrary desires of a handful of inscrutable gatekeepers. It means rolling the dice over, and over, and over again, each time a little more painful than the last. And it turns out that this just isn’t something I’m good at. It’s not what I’m cut out for. And maybe it never will be.

An incompetent narcissist would surely fare better than I, willing to re-submit whatever refuse they produce a thousand times because they are certain they deserve to succeed. For, deep down, I never feel that I deserve it. Others tell me I do, and I try to believe them; but the only validation that feels like it will be enough is the kind that comes directly from those gatekeepers, the kind that I can never get. And truth be told, maybe if I do finally get that, it still won’t be enough. Maybe nothing ever will be.

If I knew that it would get easier one day, that the pain would, if not go away, at least retreat to a dull roar I could push aside, then maybe I could stay on this path. But this cannot be the rest of my life. If this is really what it means to have an academic career, maybe I don’t want one after all.

Or maybe it’s not academia that’s broken. Maybe it’s just me.

Information theory proves that multiple-choice is stupid

Mar 19, JDN 2457832

This post is a bit of a departure from my usual topics, but it’s something that has bothered me for a long time, and I think it fits broadly into the scope of uniting economics with the broader realm of human knowledge.

Multiple-choice questions are inherently and objectively poor methods of assessing learning.

Consider the following question, which is adapted from actual tests I have been required to administer and grade as a teaching assistant (that is, the style of question is the same; I’ve changed the details so that it wouldn’t be possible to just memorize the response—though in a moment I’ll get to why all this paranoia about students seeing test questions beforehand would also be defused if we stopped using multiple-choice):

The demand for apples follows the equation Q = 100 – 5 P.
The supply of apples follows the equation Q = 10 P.
If a tax of $2 per apple is imposed, what is the equilibrium price, quantity, tax revenue, consumer surplus, and producer surplus?

A. Price = $5, Quantity = 10, Tax revenue = $50, Consumer Surplus = $360, Producer Surplus = $100

B. Price = $6, Quantity = 20, Tax revenue = $60, Consumer Surplus = $200, Producer Surplus = $180

C. Price = $6, Quantity = 60, Tax revenue = $120, Consumer Surplus = $360, Producer Surplus = $180

D. Price = $5, Quantity = 60, Tax revenue = $120, Consumer Surplus = $280, Producer Surplus = $500

You could try solving this properly, setting supply equal to demand, adjusting for the tax, finding the equilibrium, and calculating the surplus, but don’t bother. If I were tutoring a student preparing for this test, I’d tell them not to bother. You can get the right answer in only two steps, because of the multiple-choice format.

Step 1: Does tax revenue equal $2 times quantity? We said the tax was $2 per apple.
So that rules out everything except C and D. Welp, quantity must be 60 then.

Step 2: Is quantity 10 times price, as the supply curve says? For C it is; for D it isn’t. Guess it must be C then.

Now, to do that, you need to have at least a basic understanding of the economics underlying the question (How is tax revenue calculated? What does the supply curve equation mean?). But there’s an even easier technique you can use that doesn’t even require that; it’s called Answer Splicing.

Here’s how it works: You look for repeated values in the answer choices, and you choose the one that has the most repeated values. Prices $5 and $6 are repeated equally, so that’s not helpful (maybe the test designer planned at least that far). Quantity 60 is repeated, other quantities aren’t, so it’s probably that. Likewise with tax revenue $120. Consumer surplus $360 and Producer Surplus $180 are both repeated, so those are probably it. Oh, look, we’ve selected a unique answer choice C, the correct answer!

You could have done answer splicing even if the question were about 18th-century German philosophy, or even if the question were written in Arabic or Japanese. In fact you could even do it if it were written in a cipher, as long as the cipher was a consistent substitution cipher.
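Answer splicing is mechanical enough to write down as code, which makes the cipher point concrete: the procedure never looks at what a value means, only at which values repeat. In this sketch the answer choices are hypothetical opaque tokens standing in for price, quantity, tax revenue, consumer surplus and producer surplus, arranged with the same pattern of repeats as the apple question. A choice's score is the number of its fields whose value also appears in some other choice.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical "ciphered" choices. Fields, in order: price, quantity, tax revenue,
# consumer surplus, producer surplus. Tokens are arbitrary; only repeats matter.
CHOICES: Dict[str, Tuple[str, ...]] = {
    "A": ("x", "k", "m", "u", "w"),
    "B": ("y", "j", "n", "v", "z"),
    "C": ("y", "h", "o", "u", "z"),
    "D": ("x", "h", "o", "t", "s"),
}


def splice(choices: Dict[str, Tuple[str, ...]]) -> Tuple[str, Dict[str, int]]:
    """Pick the choice whose field values are most often repeated in other choices."""
    n_fields = len(next(iter(choices.values())))
    # For each field position, count how many choices use each value.
    counts = [Counter(values[i] for values in choices.values()) for i in range(n_fields)]
    # A choice earns one point per field whose value also appears in another choice.
    scores = {
        label: sum(counts[i][value] > 1 for i, value in enumerate(values))
        for label, values in choices.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = splice(CHOICES)
    print(best, scores)
```

Here C scores 5, D scores 3, and A and B score 2 each, so splicing lands on C without any economics at all.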

Could the question have been designed to better avoid answer splicing? Probably. But this is actually quite difficult to do, because there is a fundamental tradeoff between two types of “distractors” (as they are known in the test design industry). You want the answer choices to contain correct pieces and resemble the true answer, so that students who basically understand the question but make a mistake in the process still get it wrong. But you also want the answer choices to be distinct enough in a random enough pattern that answer splicing is unreliable. These two goals are inherently contradictory, and the result will always be a compromise between them. Professional test-designers usually lean pretty heavily against answer-splicing, which I think is probably optimal so far as it goes; but I’ve seen many a professor err too far on the side of similar choices and end up making answer splicing quite effective.

But of course, all of this could be completely avoided if I had just presented the question as an open-ended free-response. Then you’d actually have to write down the equations, show me some algebra solving them, and then interpret your results in a coherent way to answer the question I asked. What’s more, if you made a minor mistake somewhere (carried a minus sign over wrong, forgot to divide by 2 when calculating the area of the consumer surplus triangle), I can take off a few points for that error, rather than all the points just because you didn’t get the right answer. At the other extreme, if you just randomly guess, your odds of getting the right answer are minuscule, but even if you did—or copied from someone else—if you don’t show me the algebra you won’t get credit.
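For contrast, here is the full free-response solution as a minimal Python sketch. It assumes, as is standard, that a per-unit tax opens a wedge between the price buyers pay and the price sellers keep (buyer price = seller price + tax), and it uses the triangle formula for each surplus; the function and field names are illustrative. Under these equations sellers keep $6, buyers pay $8, 60 apples change hands, tax revenue is $120, consumer surplus is $360 and producer surplus is $180.

```python
from dataclasses import dataclass


@dataclass
class TaxOutcome:
    buyer_price: float
    seller_price: float
    quantity: float
    tax_revenue: float
    consumer_surplus: float
    producer_surplus: float


def solve_with_tax(a: float, b: float, c: float, tax: float) -> TaxOutcome:
    """Demand Q = a - b * P_buyer, supply Q = c * P_seller, per-unit tax t.

    The tax is a wedge: P_buyer = P_seller + t. Setting demand equal to supply,
    a - b * (P_seller + t) = c * P_seller, so P_seller = (a - b * t) / (b + c).
    """
    seller_price = (a - b * tax) / (b + c)
    buyer_price = seller_price + tax
    quantity = c * seller_price
    choke_price = a / b  # buyer price at which demand falls to zero
    # Consumer surplus: triangle between the demand curve and the buyer price.
    consumer_surplus = 0.5 * quantity * (choke_price - buyer_price)
    # Producer surplus: this supply curve starts at P = 0, so it is a triangle
    # under the seller price.
    producer_surplus = 0.5 * quantity * seller_price
    return TaxOutcome(buyer_price, seller_price, quantity, tax * quantity,
                      consumer_surplus, producer_surplus)


if __name__ == "__main__":
    # The apple question: Q = 100 - 5P (demand), Q = 10P (supply), $2 tax per apple.
    print(solve_with_tax(a=100, b=5, c=10, tax=2))
```

Each intermediate line is a place where written work can earn or lose partial credit, which is exactly what a single letter choice cannot show.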

So the free-response question is telling me a lot more about what the student actually knows, in a much more reliable way, that is much harder to cheat or strategize against.

Moreover, this isn’t a matter of opinion. This is a theorem of information theory.

The information that is carried over a message channel can be quantitatively measured as its Shannon entropy. It is usually measured in bits, which you may already be familiar with as a unit of data storage and transmission rate in computers—and yes, those are all fundamentally the same thing. A proper formal treatment of information theory would be way too complicated for this blog, but the basic concepts are fairly straightforward: think in terms of how long a sequence of 1s and 0s it would take to convey the message. That is, roughly speaking, the Shannon entropy of that message.

How many bits are conveyed by a multiple-choice response with four choices? 2. Always. At maximum. No exceptions. It is fundamentally, provably, mathematically impossible to convey more than 2 bits of information via a channel that only has 4 possible states. Any multiple-choice response—any multiple-choice response—of four choices can be reduced to the sequence 00, 01, 10, 11.

True-false questions are a bit worse—literally, they convey 1 bit instead of 2. It’s possible to fully encode the entire response to a true-false question as simply 0 or 1.
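These bit counts are the base-2 logarithm of the number of possible responses, which is the maximum Shannon entropy (reached only when every response is equally likely). A short sketch:

```python
import math
from itertools import product


def max_bits(n_states: int) -> float:
    """Maximum Shannon entropy, in bits, of a response with n possible states.

    The maximum is reached only when all n states are equally likely.
    """
    return math.log2(n_states)


# Every four-choice response fits in a two-bit code.
two_bit_codes = {label: "".join(bits) for label, bits in zip("ABCD", product("01", repeat=2))}

if __name__ == "__main__":
    print(max_bits(4))     # 2.0 (four-choice question)
    print(max_bits(2))     # 1.0 (true/false)
    print(two_bit_codes)   # {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
```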

For comparison, how many bits can I get from the free-response question? Well, in principle the answer to any mathematical question has the cardinality of the real numbers, which is infinite (in some sense beyond infinite, in fact—more infinite than mere “ordinary” infinity); but in reality you can only write down a small number of possible symbols on a page. I can’t actually write down the infinite diversity of numbers between 3.14159 and the true value of pi; in 10 digits or fewer, I can only write down a few thousand of them, and only (“only”) ten billion strings of ten digits altogether. So let’s suppose that handwritten text has about the same information density as typing, which in ASCII (or UTF-8, for ordinary Latin characters) takes 8 bits (one byte) per character. If the response to this free-response question is 300 characters (note that this paragraph itself is over 800 characters), then the total number of bits conveyed is about 2400.
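The count of writable numbers in that interval is small enough to compute directly. This sketch counts the values from 3.14159 up to pi that can be written with ten digits (one before the decimal point and nine after); values with fewer digits are already included, since 3.14159 is the same number as 3.141590000.

```python
import math

DECIMALS = 9  # one digit before the point plus nine after = ten digits

# Scale to integers so every ten-digit value in the interval is an integer in [low, high].
low = round(3.14159 * 10**DECIMALS)        # 3.141590000
high = math.floor(math.pi * 10**DECIMALS)  # 3.141592653, the last ten-digit value below pi
count = high - low + 1

print(count)    # 2654
print(10**10)   # total number of ten-digit strings
```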

That is to say, one free-response question conveys twelve hundred times as much information as a multiple-choice question (2400 bits versus 2). Of course, a lot of that information is redundant; there are many possible correct ways to write the answer to a problem (if the answer is 1.5 you could say 3/2 or 6/4 or 1.500, etc.), and many problems have multiple valid approaches to them, and it’s often safe to skip certain steps of algebra when they are very basic, and so on. But it’s really not at all unrealistic to say that I am getting between 10 and 100 times as much useful information about a student from reading one free response as I would from one multiple-choice question.
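The headline ratio is simple division. This sketch uses the post's own assumptions: a 300-character answer at 8 bits per character, against a four-choice question's 2 bits.

```python
import math

CHARS = 300         # assumed length of a free response, per the post
BITS_PER_CHAR = 8   # one byte per character, per the post

free_response_bits = CHARS * BITS_PER_CHAR   # 2400
multiple_choice_bits = math.log2(4)          # 2.0 for four choices
ratio = free_response_bits / multiple_choice_bits

print(free_response_bits, multiple_choice_bits, ratio)  # 2400 2.0 1200.0
```

Read against the estimate of 10 to 100 times as much useful information, this implies that only about 1% to 8% of the raw bits survive once redundancy is discounted.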

Indeed, it’s actually a bigger difference than it appears, because when evaluating a student’s performance I’m not actually interested in the information density of the message itself; I’m interested in the product of that information density and its correlation with the true latent variable I’m trying to measure, namely the student’s actual understanding of the content. (A sequence of 500 random symbols would have a very high information density, but would be quite useless in evaluating a student!) Free-response questions aren’t just more information, they are also better information, because they are closer to the real-world problems we are training for, harder to cheat, harder to strategize, nearly impossible to guess, and provide detailed feedback about exactly what the student is struggling with (for instance, maybe they could solve the equilibrium just fine, but got hung up on calculating the consumer surplus).

As I alluded to earlier, free-response questions would also remove most of the danger of students seeing your tests beforehand. If they saw it beforehand, learned how to solve it, memorized the steps, and then were able to carry them out on the test… well, that’s actually pretty close to what you were trying to teach them. It would be better for them to learn a whole class of related problems and then be able to solve any problem from that broader class—but the first step in learning to solve a whole class of problems is in fact learning to solve one problem from that class. Just change a few details each year so that the questions aren’t identical, and you will find that any student who tried to “cheat” by seeing last year’s exam would inadvertently be studying properly for this year’s exam. And then perhaps we could stop making students literally sign nondisclosure agreements when they take college entrance exams. Listen to this Orwellian line from the SAT nondisclosure agreement:

Misconduct includes, but is not limited to:

Taking any test questions or essay topics from the testing room, including through memorization, giving them to anyone else, or discussing them with anyone else through any means, including, but not limited to, email, text messages or the Internet

Including through memorization. You are not allowed to memorize SAT questions, because God forbid you actually learn something when we are here to make money off evaluating you.

Multiple-choice tests fail in another way as well; by definition they cannot possibly test generation or recall of knowledge, they can only test recognition. You don’t need to come up with an answer; you know for a fact that the correct answer must be in front of you, and all you need to do is recognize it. Recall and recognition are fundamentally different memory processes, and recall is both more difficult and more important.

Indeed, the real mystery here is why we use multiple-choice exams at all.
There are a few types of very basic questions where multiple-choice is forgivable, because there just aren’t that many possible valid answers. If I ask whether demand for apples has increased, you can pretty much say “it increased”, “it decreased”, “it stayed the same”, or “it’s impossible to determine”. So a multiple-choice format isn’t losing too much in such a case. But most really interesting and meaningful questions aren’t going to work in this format.

I don’t think it’s even particularly controversial among educators that multiple-choice questions are awful. (Though I do recall an “educational training” seminar a few weeks back that was basically an apologia for multiple choice, claiming that it is totally possible to test “higher-order cognitive skills” using multiple-choice, for reals, believe me.) So why do we still keep using them?

Well, the obvious reason is grading time. The one thing multiple-choice does have over true free-response questions is that it can be graded efficiently and reliably by machines, which really does make a big difference when you have 300 students in a class. But there are a couple of reasons why even this isn’t a sufficient argument.

First of all, why do we have classes that big? It’s absurd. At that point you should just email the students video lectures. You’ve already foreclosed any possibility of genuine student-teacher interaction, so why are you bothering with having an actual teacher? It seems to me that universities have tried to work out what is the absolute maximum rent they can extract by structuring a class so that it is just good enough that students won’t revolt against the tuition, but they can still spend as little as possible by hiring only one adjunct or lecturer when they should have been paying 10 professors.

And don’t tell me they can’t afford to spend more on faculty—first of all, supporting faculty is why you exist. If you can’t afford to spend enough providing the primary service that you exist as an institution to provide, then you don’t deserve to exist as an institution. Moreover, they clearly can afford it—they simply prefer to spend on hiring more and more administrators and raising the pay of athletic coaches. PHD Comics visualized it quite well; the average pay for administrators is three times that of even tenured faculty, and athletic coaches make ten times as much as faculty. (And here I think the mean is the relevant figure, as the mean income is what can be redistributed. Firing one administrator making $300,000 does actually free up enough to hire three faculty making $100,000 or ten grad students making $30,000.)

But even supposing that the institutional incentives here are just too strong, and we will continue to have ludicrously-huge lecture classes into the foreseeable future, there are still alternatives to multiple-choice testing.

Ironically, the College Board appears to have stumbled upon one themselves! About half the SAT math exam is organized into a format where instead of bubbling in one circle to give your 2 bits of answer, you bubble in numbers and symbols corresponding to a more complicated mathematical answer, such as entering “3/4” as “0”, “3”, “/”, “4” or “1.28” as “1”, “.”, “2”, “8”. This could easily be generalized to things like “e^2” as “e”, “^”, “2” and “sin(3pi/2)” as “sin”, “3”, “pi”, “/”, “2”. There are 12 possible symbols currently allowed by the SAT, and each response is up to 4 characters, so we have already increased our possible responses from 4 to over 20,000—which is to say from 2 bits to 14. If we generalize it to include symbols like “pi” and “e” and “sin”, and allow a few more characters per response, we could easily get it over 20 bits—10 times as much information as a multiple-choice question.
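For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes every possible response is equally likely and counts only full 4-character grid entries, so the figures are upper bounds: real answers are neither uniform nor all valid. The breakdown of the 12 symbols (digits 0-9 plus “/” and “.”) and the 20-symbol, 5-character extension are my own illustrative assumptions, not an actual SAT format.

```python
import math

def bits(num_responses: int) -> float:
    """Bits of information in one answer, assuming all responses are equally likely."""
    return math.log2(num_responses)

# Multiple choice: 4 options.
mc_bits = bits(4)

# Grid-in: 12 symbols (assumed to be digits 0-9, "/" and "."), full 4-character entries only.
grid_responses = 12 ** 4

# Hypothetical extension: 20 symbols (adding things like "pi", "e", "sin", "^"), 5 characters.
extended_bits = bits(20 ** 5)

print(mc_bits)                            # 2.0
print(grid_responses)                     # 20736
print(round(bits(grid_responses), 2))     # 14.34
print(round(extended_bits, 1))            # 21.6
print(round(extended_bits / mc_bits, 1))  # 10.8
```

Because information grows with the logarithm of the number of possible responses, adding characters per response helps far more than adding a fifth or sixth bubble to a multiple-choice question.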

But we can do better still! Even if we insist upon automation, high-end text-recognition software (of the sort any university could surely afford) is now getting to the point where it could realistically recognize a properly-formatted algebraic formula, so you’d at least know if the student remembered the formula correctly. Sentences could be transcribed into typed text, checked for grammar, and sorted for keywords—which is not nearly as good as a proper reading by an expert professor, but is still orders of magnitude better than filling circle “C”. Eventually AI will make even more detailed grading possible, though at that point we may have AIs just taking over the whole process of teaching. (Leaving professors entirely for research, presumably. Not sure if this would be good or bad.)
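To make the keyword idea concrete, here is a deliberately crude Python sketch of keyword-based scoring for a transcribed written answer. The rubric, its weights, and the function names are all hypothetical. A student who strings the right words together without understanding them would score well, which is exactly why this falls short of a proper reading by an expert.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens: list[str], phrase: str) -> bool:
    """True if the phrase's tokens appear contiguously, in order, in tokens."""
    target = tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def keyword_score(answer: str, rubric: dict[str, float]) -> float:
    """Fraction of total rubric weight whose key phrases appear in the answer."""
    tokens = tokenize(answer)
    total = sum(rubric.values())
    earned = sum(w for phrase, w in rubric.items() if contains_phrase(tokens, phrase))
    return earned / total if total else 0.0

# Hypothetical rubric for a question about a shift in demand.
rubric = {"demand": 1.0, "increased": 1.0, "consumer surplus": 2.0}

print(keyword_score("Demand increased, so consumer surplus rises.", rubric))  # 1.0
print(keyword_score("Demand increased.", rubric))  # 0.5
```

Even this crude version yields a graded score rather than a single right-or-wrong bubble, though it still cannot tell whether the reasoning connecting the keywords is correct.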

Automation isn’t the only answer either. You could hire more graders and teaching assistants—say one for every 30 or 40 students instead of one for every 100 students. (And then the TAs might actually be able to get to know their students! What a concept!) You could give fewer tests, or shorter ones—because a small, reliable sample is actually better than a large, unreliable one. A bonus there would be reducing students’ feelings of test anxiety. You could give project-based assignments, which would still take a long time to grade, but would also be a lot more interesting and fulfilling for both the students and the graders.

Or, and perhaps this is the most radical answer of all: You could stop worrying so much about evaluating student performance.

I get it, you want to know whether students are doing well, both so that you can improve your teaching and so that you can rank the students and decide who deserves various awards and merits. But do you really need to be constantly evaluating everything that students do? Did it ever occur to you that perhaps that is why so many students suffer from anxiety—because they are literally being formally evaluated with long-term consequences every single day they go to school?

If we eased up on all this evaluation, I think the fear is that students would just detach entirely; all teachers know students who only seem to show up in class because they’re being graded on attendance. But there are a couple of reasons to think that maybe this fear isn’t so well-founded after all.

If you give up on constant evaluation, you can open up opportunities to make your classes a lot more creative and interesting—and even fun. You can make students want to come to class, because they get to engage in creative exploration and collaboration instead of memorizing what you drone on at them for hours on end. Most of the reason we don’t do creative, exploratory activities is simply that we don’t know how to evaluate them reliably—so what if we just stopped worrying about that?

Moreover, are those students who only show up for the grade really getting anything out of it anyway? Maybe it would be better if they didn’t show up—indeed, if they just dropped out of college entirely and did something else with their lives until they get their heads on straight. Maybe all the effort we currently expend trying to force students who clearly don’t appreciate the value of learning to learn could instead be spent enriching the students who do appreciate learning and came here to do as much of it as possible. Because, ultimately, you can lead a student to algebra, but you can’t make them think. (Let me be clear, I do not mean students with less innate ability or prior preparation; I mean students who aren’t interested in learning and are only showing up because they feel compelled to. I admire students with less innate ability who nonetheless succeed because they work their butts off, and wish I were quite so motivated myself.)

There’s a downside to that, of course. Compulsory education does actually seem to have significant benefits in making people into better citizens. Maybe if we let those students just leave college, they’d never come back, and they would squander their potential. Maybe we need to force them to show up until something clicks in their brains and they finally realize why we’re doing it. In fact, we’re really not forcing them; they could drop out in most cases and simply don’t, probably because their parents are forcing them. Maybe the signaling problem is too fundamental, and the only way we can get unmotivated students to accept not getting prestigious degrees is by going through this whole process of forcing them to show up for years and evaluating everything they do until we can formally justify ultimately failing them. (Of course, almost by construction, a student who does the absolute bare minimum to pass will pass.) But college admission is competitive, and I can’t shake the feeling that there are thousands of students out there who got rejected from the school they most wanted to go to, the school they were really passionate about and willing to commit their lives to, because some other student got in ahead of them—and that other student is now sitting in the back of the room playing with an iPhone, grumbling about having to show up for class every day. What about that squandered potential? Perhaps competitive admission and compulsory attendance just don’t mix, and we should stop compelling students once they get their high school diploma.