The case against phys ed

Dec 4 JDN 2459918

If I want to stop someone from engaging in an activity, what should I do? I could tell them it’s wrong, and if they believe me, that would work. But what if they don’t believe me? Or I could punish them for doing it, and as long as I can continue to do that reliably, that should deter them from doing it. But what happens after I remove the punishment?

If I really want to make someone not do something, the best way to accomplish that is to make them not want to do it. Make them dread doing it. Make them hate the very thought of it. And to accomplish that, a very efficient method would be to first force them to do it, but make that experience as miserable and humiliating as possible. Give them a wide variety of painful or outright traumatic experiences that are directly connected with the undesired activity, to carry with them for the rest of their life.

This is precisely what physical education does, with regard to exercise. Phys ed is basically optimized to make people hate exercise.

Oh, sure, some students enjoy phys ed. These are the students who are already athletic and fit, who already engage in regular exercise and enjoy doing so. They may enjoy phys ed, may even benefit a little from it—but they didn’t really need it in the first place.

The kids who need more physical activity are the kids who are obese, or have asthma, or suffer from various other disabilities that make exercising difficult and painful for them. And what does phys ed do to those kids? It makes them compete in front of their peers at various athletic tasks at which they will inevitably fail and be humiliated.

Even the kids who are otherwise healthy but just don’t get enough exercise will go into phys ed class at a disadvantage, and instead of being carefully trained to improve their skills and physical condition at their own level, they will be publicly shamed by their peers for their inferior performance.

I know this, because I was one of those kids. I have exercise-induced bronchoconstriction, a lung condition similar to asthma (actually there’s some debate as to whether it should be considered a form of asthma), in which intense aerobic exercise causes the airways of my lungs to become constricted and inflamed, making me unable to get enough air to continue.

It’s really quite remarkable I wasn’t diagnosed with this as a child; I actually once collapsed while running in gym class, and all they thought to do at the time was give me water and let me rest for the remainder of the class. Nobody thought to call the nurse. I was never put on a beta agonist or an inhaler. (In fact at one point I was put on a beta blocker for my migraines; I now understand why I felt so fatigued when taking it—it was literally the opposite of the drug my lungs needed.)

Actually it’s been a few years since I had an attack. This is of course partly due to me generally avoiding intense aerobic exercise; but even when I do get intense exercise, I rarely seem to get bronchoconstriction attacks. My working hypothesis is that the norepinephrine reuptake inhibition of my antidepressant acts like a beta agonist; both drugs mimic norepinephrine.

But as a child, I got such attacks quite frequently; and even when I didn’t, my overall athletic performance was always worse than most of the other kids. They knew it, I knew it, and while only a few actively tried to bully me for it, none of the others did anything to make me feel better. So gym class was always a humiliating and painful experience that I came to dread.

As a result, as soon as I got out of school and had my own autonomy in how to structure my own life, I basically avoided exercise whenever I could. Even knowing that it was good for me—really, exercise is ridiculously good for you; it honestly doesn’t even make sense to me how good it is for you—I could rarely get myself to actually go out and exercise. I certainly couldn’t do it with anyone else; sometimes, if I was very disciplined, I could manage to maintain an exercise routine by myself, as long as there was no one else there who could watch me, judge me, or compare themselves to me.

In fact, I’d probably have avoided exercise even more, had I not also had some more positive experiences with it outside of school. I trained in martial arts for a few years, getting almost to a black belt in tae kwon do; I quit precisely when it started becoming very competitive and thus began to feel humiliated again when I performed worse than others. Part of me wishes I had stuck with it long enough to actually get the black belt; but the rest of me knows that even if I’d managed it, I would have been miserable the whole time and it probably would have made me dread exercise even more.

The details of my story are of course individual to me; but the general pattern is disturbingly common. A kid does poorly in gym class, or even suffers painful attacks of whatever disabling condition they have, but nobody sees it as a medical problem; they just see the kid as weak and lazy. Or even if the adults are sympathetic, the other kids aren’t; they just see a peer who performed worse than them, and they have learned by various subtle (and not-so-subtle) cultural pressures that anyone who performs worse at a culturally-important task is worthy of being bullied and shunned.

Even outside the directly competitive environment of sports, the very structure of a phys ed class, where a large group of students are all expected to perform the same athletic tasks and can directly compare their performance against each other, invites this kind of competition. Kids can see, right in their faces, who is doing better and who is doing worse. And our culture is astonishingly bad at teaching children (or anyone else, for that matter) how to be sympathetic to others who perform worse. Worse performance is worse character. Being bad at running, jumping and climbing is just being bad.

Part of the problem is that school administrators seem to see physical education as a training and selection regimen for their sports programs. (In fact, some of them seem to see their entire school as existing to serve their sports programs.) Here is a UK government report bemoaning the fact that “only a minority of schools play competitive sport to a high level”, apparently not realizing that this is necessarily true because high-level sports performance is a relative concept. Only one team can win the championship each year. Only 10% of students will ever be in the top 10% of athletes. No matter what. Anything else is literally mathematically impossible. We do not live in Lake Wobegon; not all the children can be above average.

There are good phys ed programs out there. They have highly-trained instructors and they focus on matching tasks to a student’s own skill level, as well as actually educating them: teaching them about anatomy and physiology rather than just making them run laps. In fact, the one phys ed class I took that I really enjoyed was an anatomy and physiology class; we didn’t do any physical exercise in that class. But well-taught phys ed classes are clearly the exception, not the norm.

Of course, it could be that some students actually benefit from phys ed, perhaps even enough to offset the harms to people like me. (Though then the question should be asked whether phys ed should be compulsory for all students—if an intervention helps some and hurts others, maybe only give it to the ones it helps?) But I know very few people who actually described their experiences of phys ed class as positive ones. While many students describe their experiences of math class in similarly-negative terms (which is also a problem with how math classes are taught), I definitely do know people who actually enjoyed and did well in math class. Still, my sample is surely biased—it’s comprised of people similar to me, and I hated gym and loved math. So let’s look at the actual data.

Or rather, I’d like to, but there isn’t that much out there. The empirical literature on the effects of physical education is surprisingly limited.

A lot of analyses of physical education simply take as axiomatic that more phys ed means more exercise, and so they use the—overwhelming, unassailable—evidence that exercise is good to support an argument for more phys ed classes. But they never seem to stop and take a look at whether phys ed classes are actually making kids exercise more, particularly once those kids grow up and become adults.

In fact, the surprisingly weak correlations between higher physical activity and better mental health among adolescents (despite really strong correlations in adults) could be because exercise among adolescents is largely coerced via phys ed, and the misery of being coerced into physical humiliation counteracts any benefits that might have been obtained from increased exercise.

The best long-term longitudinal study I can find did show positive effects of phys ed on long-term health, though by a rather odd mechanism: Women exercised more as adults if they had phys ed in primary school, but men didn’t; they just smoked less. And this study was back in 1999, studying a cohort of adults who had phys ed quite a long time ago, when it was better funded.

The best experiment I can find actually testing whether phys ed programs work used a very carefully designed phys ed program. It had a lot of features that it would be really nice to have but that the vast majority of actual gym classes lack, including carefully structured activities with specific developmental goals. Perhaps most importantly, children were taught to track and evaluate their own individual progress rather than evaluate themselves in comparison to others.

And even then, the effects are not all that large. The physical activity scores of the treatment group rose from 932 minutes per week to 1108 minutes per week for first-graders, and from 1212 to 1454 for second-graders. But the physical activity scores of the control group rose from 906 to 996 for first-graders, and 1105 to 1211 for second-graders. So of the 176 minutes per week gained by first-graders, 90 would have happened anyway. Likewise, of the 242 minutes per week gained by second-graders, 106 were not attributable to the treatment. Only about half of the gains were due to the intervention, and they amount to about a 10% increase in overall physical activity. It also seems a little odd to me that the control groups both started worse off than the experimental groups and both groups gained; it raises some doubts about the randomization.
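The arithmetic in the paragraph above is a simple difference-in-differences calculation. Here is a minimal sketch of it in Python, using only the figures quoted in this post; the function name `attributable_gain` and the dictionary layout are my own labels for illustration, not anything from the study.

```python
# Weekly physical activity (minutes), as (before, after), quoted from the post.
scores = {
    "first grade": {"treatment": (932, 1108), "control": (906, 996)},
    "second grade": {"treatment": (1212, 1454), "control": (1105, 1211)},
}

def attributable_gain(treatment, control):
    """Difference-in-differences: treatment group's gain minus control group's gain."""
    return (treatment[1] - treatment[0]) - (control[1] - control[0])

results = {}
for grade, groups in scores.items():
    raw_gain = groups["treatment"][1] - groups["treatment"][0]
    net_gain = attributable_gain(groups["treatment"], groups["control"])
    share_of_gain = net_gain / raw_gain                   # fraction of the raw gain due to the program
    relative_increase = net_gain / groups["treatment"][0]  # relative to the treatment baseline
    results[grade] = (raw_gain, net_gain, share_of_gain, relative_increase)
    print(f"{grade}: raw gain {raw_gain}, attributable {net_gain}, "
          f"{share_of_gain:.0%} of the gain, {relative_increase:.1%} over baseline")
```

With these inputs the attributable gains come out to 86 and 136 minutes per week, roughly half of the raw gains and roughly a 10% increase over baseline.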

The researchers also measured psychological effects, and these effects are even smaller and honestly a little weird. On a scale of “somatic anxiety” (basically, how bad do you feel about your body’s physical condition?), this well-designed phys ed program only reduced scores in the treatment group from 4.95 to 4.55 among first-graders, and from 4.50 to 4.10 among second-graders. Seeing as the scores for second-graders also fell in the control group from 4.63 to 4.45, only about half of the observed reduction—0.2 points on a 10-point scale—is really attributable to the treatment. And the really baffling part is that the measure of social anxiety actually fell more, which makes me wonder if they’re really measuring what they think they are.

Clearly, exercise is good. We should be trying to get people to exercise more. Actually, this is more important than almost anything else we could do for public health, with the possible exception of vaccinations. All of these campaigns trying to get kids to lose weight should be removed and replaced with programs to get them to exercise more, because losing weight doesn’t benefit health and exercising more does.

But I am not convinced that physical education as we know it actually makes people exercise more. In the short run, it forces kids to exercise, when there are surely ways to get kids to exercise that don’t require such coercion; and in the long run, it gives them painful, even traumatic memories of exercise that make them not want to continue it once they get older. It’s too competitive, too one-size-fits-all. It doesn’t account for innate differences in athletic ability or match challenge levels to skill levels. It doesn’t help kids cope with having less ability, or even teach kids to be compassionate toward others with less ability than them.

And it makes kids miserable.

What’s wrong with police unions?

Nov 14 JDN 2459531

In a previous post I talked about why unions, even though they are collusive, are generally a good thing. But there is one very important exception to this rule: Police unions are almost always harmful.

Most recently, police unions have been leading the charge to fight vaccine mandates. This despite the fact that COVID-19 now kills more police officers than any other cause. They threatened that huge numbers of officers would leave if the mandates were imposed—but it didn’t happen.

But there is a much broader pattern than this: Police unions systematically take the side of individual police officers over the interests of public safety. Even the most incompetent, negligent, or outright murderous behavior by police officers will typically be defended by police unions. (One encouraging development is that lately even some police unions have been reluctant to defend the most outrageous killings by police officers, but this is very much the exception, not the rule.)

Police unions are also unusual among unions in their political ties. Conservatives generally oppose unions, but are much friendlier toward police unions. At the other end of the spectrum, socialists normally love unions, but have distanced themselves from police unions for a long time. (The argument in that article that this is because “no other job involves killing people” is a bit weird: Ostensibly, the circumstances in which police are allowed to kill people are not all that different from the circumstances in which private citizens are. Just like us, they’re only supposed to use deadly force to prevent death or grievous bodily harm to themselves or others. The main thing that police are allowed to do that we aren’t is imprison people. Killing isn’t supposed to be a major part of the job.)

Police unions also have some other weird features. The total membership of all police unions exceeds the total number of police officers in the United States, because a single officer is often affiliated with multiple unions, which is normally not at all how unions work. Police unions are also especially powerful and well-organized among unions. They are especially well-funded, and their members are especially loyal.

If we were to adopt a categorical view that unions are always good or always bad—as many people seem to want to—it’s difficult to see why police unions should be different from teachers’ unions or factory workers’ unions. But my argument was very careful not to make such categorical statements. Unions aren’t always or inherently good; they are usually good, because of how they are correcting a power imbalance between workers and corporations.

But when it comes to police, the situation is quite different. Police unions give more bargaining power to government officers against… what? Public accountability? The democratic system? Corporate CEOs are accountable only to their shareholders, but the mayors and city councils who decide police policy are elected (in most of the UK, even police commissioners are directly elected). It’s not clear that there was an imbalance in bargaining power here we would want to correct.

A similar case could be made against all public-sector unions, and indeed that case often is extended to teachers’ unions. If we must sacrifice teachers’ unions in order to destroy police unions, I’d be prepared to bite that bullet. But there are vital differences here as well. Teachers are not responsible for imprisoning people, and bad teachers almost never kill people. (In the rare cases in which teachers have committed murder, they have been charged to the full extent of the law, just as they would be in any other profession.) There surely is some misconduct by teachers that some unions may be protecting, but the harm caused by that misconduct is far lower than the harm caused by police misconduct. Teacher unions also provide a layer of protection for teachers to exercise autonomy, promoting academic freedom.

The form of teacher misconduct I would be most concerned about is sexual abuse of students. And while I’ve seen many essays claiming that teacher unions protect sexual abusers, the only concrete evidence I could find on the subject was a teachers’ union publicly complaining that the government had failed to pass stricter laws against sexual abuse by teachers. The research on teacher misconduct mainly focuses on other causal factors aside from union representation.

Even this Fox News article cherry-picking the worst examples of unions protecting abusive teachers includes line after line like “he was ultimately fired”, “he was pressured to resign”, and “his license was suspended”. So their complaint seems to be that it wasn’t done fast enough? But a fair justice system is necessarily slow. False accusations are rare, but they do happen—we can’t just take someone’s word for it. Ensuring that you don’t get fired until the district mounts strong evidence of misconduct against you is exactly what unions should be doing.

Whether unions are good or bad in a particular industry is ultimately an empirical question. So let’s look at the data, shall we? Teacher unions are positively correlated with school performance. But police unions are positively correlated with increased violent misconduct. There you have it: Teacher unions are good, but police unions are bad.

Sheepskin effect doesn’t prove much

Sep 20 JDN 2459113

The sheepskin effect is the observation that the increase in income from graduating from college after four years, relative to going through college for three years, is much higher than the increase in income from simply going through college for three years instead of two.

It has been suggested that this provides strong evidence that the income benefit of education is primarily due to signaling, and that education itself doesn’t provide any actual value. In this post I’m going to show why this view is mistaken. The sheepskin effect in fact tells us very little about the true value of college. (Noah Smith actually made a pretty decent argument that it provides evidence against signaling!)

To see this, consider two very simple models.

In both models, we’ll assume that markets are competitive but productivity is not directly observable, so employers sort you based on your education level and then pay a wage equal to the average productivity of people at your education level, compensated for the cost of getting that education.

Model 1:

In this model, people all start with the same productivity, and are randomly assigned by their life circumstances to go to either 0, 1, 2, 3, or 4 years of college. College itself has no long-term cost.

The first year of college you learn a lot, the next couple of years you don’t learn much because you’re trying to find your way, and then in the last year of college you learn a lot of specialized skills that directly increase your productivity.

So this is your productivity after x years of college:

| Years of college | Productivity |
|---|---|
| 0 | 10 |
| 1 | 17 |
| 2 | 22 |
| 3 | 25 |
| 4 | 31 |

We assumed that you’d get paid your productivity, so these are also your wages.

The increase in income each year goes from +7, to +5, to +3, then jumps up to +6. So if you compare the 4-year-minus-3-year gap (+6) with the 3-year-minus-2-year gap (+3), you get a sheepskin effect.
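The year-by-year gains can be checked with a few lines of Python. This is just a sketch of the arithmetic, using the Model 1 wages listed above; the variable names are my own.

```python
# Wages by years of college (0 through 4) in Model 1, taken from the table above.
wages = [10, 17, 22, 25, 31]

# Gain from each additional year: wage after year k minus wage after year k-1.
yearly_gains = [later - earlier for earlier, later in zip(wages, wages[1:])]

# The sheepskin effect compares the final-year gain with the gain just before it.
final_year_gain = yearly_gains[-1]
previous_year_gain = yearly_gains[-2]
has_sheepskin_effect = final_year_gain > previous_year_gain

print(yearly_gains)          # [7, 5, 3, 6]
print(has_sheepskin_effect)  # True
```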

Model 2:

In this model, college is useless and provides no actual benefits. People vary in their intrinsic productivity, which is also directly correlated with the difficulty of making it through college.

In particular, there are five types of people:

| Type | Productivity | Cost per year of college |
|---|---|---|
| 0 | 10 | 8 |
| 1 | 11 | 6 |
| 2 | 14 | 4 |
| 3 | 19 | 3 |
| 4 | 31 | 0 |

The wages for different levels of college education are as follows:

| Years of college | Wage |
|---|---|
| 0 | 10 |
| 1 | 17 |
| 2 | 22 |
| 3 | 25 |
| 4 | 31 |

Notice that these are exactly the same wages as in Model 1. This is of course entirely intentional. In a moment I’ll show why this is a Nash equilibrium.

Consider the choice of how many years of college to attend. You know your type, so you know the cost of college to you. You want to maximize your net benefit, which is the wage you’ll get minus the total cost of going to college.

Let’s assume that if a given year of college isn’t worth it, you won’t try to continue past it and see if more would be.

For a type-0 person, they could get 10 by not going to college at all, or 17-(1)(8) = 9 by going for 1 year, so they stop.

For a type-1 person, they could get 10 by not going to college at all, or 17-(1)(6) = 11 by going for 1 year, or 22-(2)(6) = 10 by going for 2 years, so they stop.
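The two cases just worked through can be sketched in Python. It uses the Model 2 wages and the per-year costs quoted above (8 for type 0, 6 for type 1). The function names are hypothetical, and treating a tie as a reason to stop is an assumption of the sketch, in line with the stopping rule stated above.

```python
# Wages by years of college (0 through 4) in Model 2.
wages = [10, 17, 22, 25, 31]

def net_benefits(cost_per_year):
    """Net benefit (wage minus total college cost) for 0 through 4 years."""
    return [wage - years * cost_per_year for years, wage in enumerate(wages)]

def years_chosen(cost_per_year):
    """Stopping rule: continue only while one more year strictly raises net benefit.

    Assumption: a person who is indifferent stops rather than continues.
    """
    payoffs = net_benefits(cost_per_year)
    years = 0
    while years + 1 < len(payoffs) and payoffs[years + 1] > payoffs[years]:
        years += 1
    return years

# Type 0 pays 8 per year; type 1 pays 6 per year.
print(net_benefits(8)[:2], years_chosen(8))  # [10, 9] 0
print(net_benefits(6)[:3], years_chosen(6))  # [10, 11, 10] 1
```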

Filling out all the possibilities yields this table:

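| Years \ Type | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 10 | 10 | 10 | 10 | 10 |
| 1 | 9 | 11 | 13 | 14 | 17 |
| 2 |  | 10 | 14 | 16 | 22 |
| 3 |  |  | 13 | 19 | 25 |
| 4 |  |  |  | 19 | 30 |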

I’d actually like to point out that it was much harder to find numbers that allowed me to make the sheepskin effect work in the second model, where education was all signaling. In the model where education provides genuine benefit, all I need to do is posit that the last year of college is particularly valuable (perhaps because high-level specialized courses are more beneficial to productivity). I could pretty much vary that parameter however I wanted, and get whatever magnitude of sheepskin effect I chose.

For the signaling model, I had to carefully calibrate the parameters so that the costs and benefits lined up just right to make sure that each type chose exactly the amount of college I wanted them to choose while still getting the desired sheepskin effect. It took me about two hours of very frustrating fiddling just to get numbers that worked. And that’s with the assumption that someone who finds 2 years of college not worth it won’t consider trying for 4 years of college (which, given the numbers above, they actually might want to), as well as the assumption that when type-3 individuals are indifferent between staying and dropping out they drop out.

And yet the sheepskin effect is supposed to be evidence that the world works like the signaling model?

I’m sure a more sophisticated model could make the signaling explanation a little more robust. The biggest limitation of these models is that once you observe someone’s education level, you immediately know their true productivity, whether it came from college or not. Realistically we should be allowing for unobserved variation that can’t be sorted out by years of college.

Maybe it seems implausible that the last year of college is actually more beneficial to your productivity than the previous years. This is probably the intuition behind the idea that sheepskin effects are evidence of signaling rather than genuine learning.

So how about this model?

Model 3:

As in the second model, there are five types of people: types 0, 1, 2, 3, and 4. They all start with the same level of productivity, and they have the same cost of going to college; but they get different benefits from going to college.

The problem is, people don’t start out knowing what type they are. Nor can they observe their productivity directly. All they can do is observe their experience of going to college and then try to figure out what type they must be.

Type 0s don’t benefit from college at all, and they know they are type 0; so they don’t go to college.

Type 1s benefit a tiny amount from college (+1 productivity per year), but don’t realize they are type 1s until after one year of college.

Type 2s benefit a little from college (+2 productivity per year), but don’t realize they are type 2s until after two years of college.

Type 3s benefit a moderate amount from college (+3 productivity per year), but don’t realize they are type 3s until after three years of college.

Type 4s benefit a great deal from college (+5 productivity per year), but don’t realize they are type 4s until after three years of college.

What then will happen? Type 0s will not go to college. Type 1s will go one year and then drop out. Type 2s will go two years and then drop out. Type 3s will go three years and then drop out. And type 4s will actually graduate.

That results in the following before-and-after productivity:

| Type | Productivity before college | Years of college | Productivity after college |
|---|---|---|---|
| 0 | 10 | 0 | 10 |
| 1 | 10 | 1 | 11 |
| 2 | 10 | 2 | 14 |
| 3 | 10 | 3 | 19 |
| 4 | 10 | 4 | 30 |

If each person is paid a wage equal to their productivity, there will be a huge sheepskin effect; wages only go up +1 for 1 year, +3 for 2 years, +5 for 3 years, but then they jump up to +11 for graduation. It appears that the benefit of that last year of college is more than the other three combined. But in fact it’s not; for any given individual, the benefits of college are the same each year. It’s just that college is more beneficial to the people who decided to stay longer.
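To see that the apparent jump comes entirely from who stays, not from what the last year teaches, here is a minimal Python sketch of Model 3. The per-year gains and years attended are the ones described above; the variable names are my own.

```python
# Everyone starts at the same productivity.
BASE_PRODUCTIVITY = 10

# Per-year productivity gain for each type (constant across years for a given person).
gain_per_year = {0: 0, 1: 1, 2: 2, 3: 3, 4: 5}

# Years each type attends before learning its type and dropping out (type 4 graduates).
years_attended = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

# Final productivity, and therefore wage, indexed by years of college.
wage_by_years = {
    years_attended[t]: BASE_PRODUCTIVITY + gain_per_year[t] * years_attended[t]
    for t in gain_per_year
}

wages = [wage_by_years[y] for y in sorted(wage_by_years)]
observed_jumps = [later - earlier for earlier, later in zip(wages, wages[1:])]

print(wages)           # [10, 11, 14, 19, 30]
print(observed_jumps)  # [1, 3, 5, 11]
```

No individual in this sketch gains more in one year than in any other; the +11 at the end appears only when comparing across different types.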

And I could of course change that assumption too, making the early years more beneficial, or varying the distribution of types, or adding more uncertainty—and so on. But it’s really not hard at all to make a model where college is beneficial and you observe a large sheepskin effect.

In reality, I am confident that some of the observed benefit of college is due to sorting (not the same thing as signaling) rather than the direct benefits of education. The earnings advantage of going to a top-tier school may be as much about the selection of students as it is about the actual quality of the education, since once you control for measures of student ability like GPA and test scores those benefits drop dramatically.

Moreover, I agree that it’s worth looking at this: Insofar as college is about sorting or signaling, it’s wasteful from a societal perspective, and we should be trying to find more efficient sorting mechanisms.

But I highly doubt that all the benefits of college are due to sorting or signaling; there definitely are a lot of important things that people learn in college, not just conventional academic knowledge like how to do calculus, but also broader skills like how to manage time, how to work in groups, and how to present ideas to others. Colleges also cultivate friendships and provide opportunities for networking and exposure to a diverse community. Judging by voting patterns, I’m going to go out on a limb and say that college also makes you a better citizen, which would be well worth it by itself.

The truth is, we don’t know exactly why college is beneficial. We certainly know that it is beneficial: Unemployment rates and median earnings are directly sorted by education level. Yes, even PhDs in philosophy and sociology have lower unemployment and higher incomes (on average) than the general population. (And of course PhDs in economics do better still.)

Green New Deal Part 1: Why aren’t we building more infrastructure?

Apr 7 JDN 2458581

For the next few weeks, I’ll be doing a linked series of posts on the Green New Deal. Some parts of it are obvious and we should have been doing them for decades already; let’s call these “easy parts”. Some parts of it will be difficult, but are definitely worth doing; let’s call these “hard parts”. And some parts of it are quite radical and may ultimately not be feasible—but may still be worth trying; let’s call these “very hard parts”.

Today I’m going to talk about some of the easy parts.

“Repairing and upgrading the infrastructure in the United States, including [. . .] by eliminating pollution and greenhouse gas emissions as much as technologically feasible.”

“Building or upgrading to energy-efficient, distributed, and ‘smart’ power grids, and working to ensure affordable access to electricity.”

“Upgrading all existing buildings in the United States and building new buildings to achieve maximal energy efficiency, water efficiency, safety, affordability, comfort, and durability, including through electrification.”

Every one of these proposals is basically a no-brainer. We should have been spending something like $100 billion a year for the last 30 years doing this, and if we had, we’d have infrastructure that would be the envy of the world.
Instead, the ASCE gives our infrastructure a D+: passing, but just barely. We are still in the top 10 in the World Bank’s infrastructure ratings, but we have been slowly slipping downward in the rankings.

Where did I get my $100 billion a year figure from? Well, we have about a $15 billion annual shortfall in highway maintenance, $13 billion in waterway maintenance, and $25 billion in dam repairs. That’s $53 billion. But that’s just to keep what we already have. In order to build more infrastructure, or upgrade it to be better, we’re going to need to spend considerably more. Double it and make it a nice round number, and you get $100 billion.
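The back-of-the-envelope figure can be written out as a short Python sketch. It uses only the shortfall numbers quoted above; the variable names are mine, and doubling then rounding is the same rough heuristic described in the text, not a precise estimate.

```python
# Annual maintenance shortfalls in billions of dollars, as quoted in the post.
shortfalls = {"highways": 15, "waterways": 13, "dams": 25}

maintenance_gap = sum(shortfalls.values())  # just to keep what we already have
doubled = 2 * maintenance_gap               # rough allowance for new and upgraded infrastructure
round_figure = round(doubled, -2)           # nearest hundred billion

print(maintenance_gap, doubled, round_figure)  # 53 106 100
```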

Of course, $100 billion a year is not a small amount of money.
How would we pay for such a thing?

That’s the thing: We wouldn’t need to.

Infrastructure investment doesn’t have to be “paid for” in the usual sense. We don’t need to raise taxes. We don’t need to cut spending. We can just add infrastructure spending onto other spending, raising the deficit directly. We can borrow money to fund the projects, and then by the time those bonds mature we will have made enough additional tax revenue from the increased productivity (and the Keynes multiplier) that we will have no problem paying back the debt.

Funding investment is what debt is supposed to be for. Particularly when interest rates are this low (currently about 3% nominal, which means about 1% adjusted for inflation), there is very little downside to taking out more debt if you’re going to plow that money into productive investments.
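The step from a nominal rate to an inflation-adjusted rate can be sketched in Python. The 3% nominal rate is the figure quoted above; the 2% inflation rate is my assumption, chosen because it is what the quoted 3% nominal and 1% real figures imply.

```python
nominal_rate = 0.03  # quoted in the post
inflation = 0.02     # assumption: implied by the 3% nominal and 1% real figures

# Exact Fisher relation, and the usual subtraction shortcut.
real_rate_exact = (1 + nominal_rate) / (1 + inflation) - 1
real_rate_approx = nominal_rate - inflation

print(f"{real_rate_exact:.4f}")   # 0.0098
print(f"{real_rate_approx:.4f}")  # 0.0100
```

At rates this low, the shortcut and the exact relation agree to within a few hundredths of a percentage point.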

Of course debt can be used for anything money can, and using debt for all your spending is often not a good idea (but it can be, if your income is inconsistent or you have good reasons to think it will increase in the future). But I’m not suggesting the government should use debt to fund Medicare and Social Security payments; I’m merely suggesting that they should use debt to fund infrastructure investment. Medicare and Social Security are, at their core, social insurance programs; they spread wealth around, which has a lot of important benefits; but they don’t meaningfully create new wealth, so you need to be careful about how you pay for them. Infrastructure investment creates new wealth. The extra value is basically pulled from thin air; you’d be a fool not to take it.

This is also why I just can’t get all that upset about student loans (even though I personally would stand to gain a small house if student debt were to suddenly evaporate). Education is the most productive investment we have, and most of the benefits of education do actually accrue to the individual who is being educated. It therefore stands to reason that students should pay for their own education, and since most of us couldn’t afford to pay in cash, it stands to reason that we should be offered loans.

There are some minor changes I would make to the student loan system, such as lower interest rates, higher limits to subsidized loans, stricter regulations on private student loans, and a simpler forgiveness process that doesn’t result in ridiculous tax liability. But I really don’t see the need to go to a fully taxpayer-funded higher education system. On the other hand, it wouldn’t necessarily be bad to go to a fully taxpayer-funded system; it seems to work quite well in Germany, France, and most of Scandinavia. I just don’t see this as a top priority.

It feels awful having $100,000 in debt, but it’s really not that bad when you realize that a college education will increase your lifetime earnings by an average of $1 million (and more like $2 million in my case because I’m going for a PhD, PhDs are more valuable than bachelor’s degrees, and even among PhDs, economists are particularly well-paid). You are being offered the chance to pay $100,000 now to get $1 million later. You should definitely take that deal.

And yet, we still aren’t increasing our infrastructure investment. Trump said he would, and it seemed like one of his few actual good ideas (remember the Stopped Clock Principle: reversed stupidity is not intelligence); but so far, no serious infrastructure plan has materialized.

Despite extremely strong bipartisan support for increased infrastructure investment, we don’t seem to be able to actually get the job done.
I think I know why.

The first reason is that “infrastructure” is a vague concept, almost a feel-good Applause Light like “freedom” or “justice”. Nobody is ever going to say they are against freedom or justice. Instead they’ll disagree about what constitutes freedom or justice.

And likewise, while almost everyone will agree that infrastructure as a concept is a good thing, there can be large substantive disagreements over just what kind of infrastructure to build. We want better transportation: Does that mean more roads, or train lines instead? We want cheaper electricity: When we build new power plants, should they use natural gas, solar, or nuclear power? We want to revitalize inner cities: Does that mean public housing, community projects, or subsidies for developers? Nobody wants an inefficient electricity grid, but just how much are we willing to invest in making it more efficient, and how? Once the infrastructure is built, should it be publicly owned and tax-funded, or privatized and run for profit?
This reason is not going to go away. We simply have to face up to it, and find a way to argue substantively for the specific kinds of infrastructure we want. It should be trains, not roads. It should be solar, wind, and nuclear, not natural gas, and certainly not coal or oil. It should be public housing and community projects, not subsidies for developers. Most of the infrastructure should be publicly owned, and what isn’t should be strictly regulated.

Yet there is another reason, which I think we might be able to eliminate. Most people seem to think that we need to pay for infrastructure the way we would need to pay for expanded social programs or military spending. They keep asking “How will this be paid for?” (And despite a lot of conservatives frothing about it—I will not give them ad revenue by linking—Alexandria Ocasio-Cortez was not wrong when she said “The same way we pay for everything else.” We tax and spend; that’s what governments do. It’s always a question of what taxes and what spending.)

But we really don’t need to pay for infrastructure at all. Infrastructure will pay for itself; we simply need to finance it up front. And when we’re paying real interest rates of 1%, that’s not a difficult thing to do. If interest rates start to rise, we may want to pull back on that; but that’s not something that will happen overnight. We would see it coming, and have a variety of fiscal and monetary tools available to deal with it. The fear of possibly paying a bit more interest 30 years from now is a really stupid reason not to fix bridges that are crumbling today.

So when we talk about the Green New Deal (or at least the “easy parts”), let’s throw away this nonsense about “paying for it”. Almost all of these programs are long-term investments; they will pay for themselves. There are still substantive choices to be made about what exactly to build and where and how; but the US is an extraordinarily rich country with virtually unlimited borrowing power.

We can afford to do this.

Indeed, I think the question we should really be asking is:
How can we afford not to do this?

Impostor Syndrome

Feb 24 JDN 2458539

You probably have experienced Impostor Syndrome, even if you didn’t know the word for it. (Studies estimate that over 70% of the general population, and virtually 100% of graduate students, have experienced it at least once.)

Impostor Syndrome feels like this:

All your life you’ve been building up accomplishments, and people kept praising you for them, but those things were easy, or you’ve just gotten lucky so far. Everyone seems to think you are highly competent, but you know better: Now that you are faced with something that’s actually hard, you can’t do it. You’re not sure you’ll ever be able to do it. You’re scared to try because you know you’ll fail. And now you fear that at any moment, your whole house of cards is going to come crashing down, and everyone will see what a fraud and a failure you truly are.

The magnitude of that feeling varies: For most people it can be a fleeting experience, quickly overcome. But for some it is chronic, overwhelming, and debilitating.

It may surprise you that I am in the latter category. A few years ago, I went to a seminar on Impostor Syndrome, and they played a “Bingo” game where you collect spaces by exhibiting symptoms: I won.

In a group of about two dozen students who were there specifically because they were worried about Impostor Syndrome, I exhibited the most symptoms. On the Clance Impostor Phenomenon Scale, I score 90%. Anything above 60% is considered diagnostic, though there is no DSM disorder specifically for Impostor Syndrome.

Another major cause of Impostor Syndrome is being an underrepresented minority. Women, people of color, and queer people are at particularly high risk. While men are less likely to experience Impostor Syndrome, we tend to experience it more intensely when we do.

Aside from being a graduate student, which is basically coextensive with Impostor Syndrome, being a writer seems to be one of the strongest predictors of Impostor Syndrome. Megan McArdle of The Atlantic theorizes that it’s because we were too good in English class, or, more precisely, that English class was much too easy for us. We came to associate our feelings of competence and accomplishment with tasks simply coming so easily we barely even had to try.

But I think there’s a bigger reason, which is that writers face rejection letters. So many rejection letters. 90% of novels are rejected at the query stage; then a further 80% are rejected at the manuscript review stage; this means that a given query letter has about a 2% chance of acceptance. This means that even if you are doing everything right and will eventually get published, you can on average expect about 50 rejection letters. I collected a little over 20 and ran out of steam, my will and self-confidence utterly crushed. But statistically I should have continued for at least 30 more. In fact, it’s worse than that; you should always expect to continue 50 more, up until you finally get accepted, because this is a memoryless distribution. And if always having to expect to wait for 50 more rejection letters sounds utterly soul-crushing, that’s because it is.
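The arithmetic here can be sketched as a geometric distribution. This assumes every query is an independent draw with the same 2% acceptance chance, which is a simplification; the function names are mine, for illustration only.

```python
def acceptance_probability(query_reject_rate, manuscript_reject_rate):
    """Chance a single query survives both the query and manuscript stages."""
    return (1 - query_reject_rate) * (1 - manuscript_reject_rate)

def expected_queries(p):
    """Mean of a geometric distribution: expected queries up to and including acceptance."""
    return 1 / p

def prob_still_unpublished(p, rejections_so_far):
    """Chance of having received only rejections after this many queries."""
    return (1 - p) ** rejections_so_far

p = acceptance_probability(0.90, 0.80)           # about 0.02
print(f"Acceptance chance per query: {p:.3f}")
print(f"Expected queries until acceptance: {expected_queries(p):.0f}")

# Memorylessness: past rejections do not shorten the expected remaining wait.
for already_rejected in (0, 20, 50):
    print(f"After {already_rejected} rejections, expected further queries: "
          f"{expected_queries(p):.0f} "
          f"(chance of getting this far unpublished: "
          f"{prob_still_unpublished(p, already_rejected):.2f})")
```

The expected further wait is the same on every line, which is exactly what makes the distribution memoryless.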

And that’s something fiction writing has in common with academic research. Top journals in economics have acceptance rates between 3% and 8%. I’d say this means you need to submit between 13 and 34 times to get into a top journal, but that’s nonsense; there are only 5 top journals in economics. So it’s more accurate to say that with any given paper, no matter how many times you submit, you only have about a 30% chance of getting into a top journal. After that, your submissions will necessarily not be to top journals. There are enough good second-tier journals that you can probably get into one eventually—after submitting about a dozen times. And maybe a hiring or tenure committee will care about a second-tier publication. It might count for something. But it’s those top 5 journals that really matter. If for every paper you have in JEBO or JPubE, another candidate has a paper in AER or JPE, they’re going to hire the other candidate. Your paper could use better methodology on a more important question, and be better written—but if for whatever reason AER didn’t like it, that’s what will decide the direction of your career.
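To see where figures like these come from, here is a small sketch. It assumes each of the 5 top journals decides independently with the same acceptance rate and that you submit to each one once; real editorial decisions are surely correlated, so treat the output as a rough bound rather than a measurement.

```python
TOP_JOURNALS = 5  # number of top journals named in the text

def expected_submissions(acceptance_rate):
    """Expected submissions until one acceptance, if you could keep resubmitting."""
    return 1 / acceptance_rate

def prob_any_top_acceptance(acceptance_rate, journals=TOP_JOURNALS):
    """Chance at least one journal accepts, submitting once to each (independence assumed)."""
    return 1 - (1 - acceptance_rate) ** journals

for rate in (0.03, 0.08):
    print(f"Acceptance rate {rate:.0%}: "
          f"{expected_submissions(rate):.1f} submissions expected if unlimited, "
          f"{prob_any_top_acceptance(rate):.0%} chance across {TOP_JOURNALS} journals")
```

With these assumptions the chance of landing in a top journal runs from about 14% at a 3% acceptance rate to about 34% at 8%, so "about 30%" sits near the optimistic end of the range.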

If I were trying to design a system that would inflict maximal Impostor Syndrome, I’m not sure I could do much better than this. I guess I’d probably have just one top journal instead of five, and I’d make the acceptance rate 1% instead of 3%. But this whole process of high-stakes checkpoints and low chances of getting on a tenure track that will by no means guarantee actually getting tenure? That’s already quite well-optimized. It’s really a brilliant design, if that’s the objective. You select a bunch of people who have experienced nothing but high achievement their whole lives. If they ever did have low achievement, for whatever reason (could be no fault of their own, you don’t care), you’d exclude them from the start. You give them a series of intensely difficult tasks—tasks literally no one else has ever done that may not even be possible—with minimal support and utterly irrelevant and useless “training”, and evaluate them constantly at extremely high stakes. And then at the end you give them an almost negligible chance of success, and force even those who do eventually succeed to go through multiple steps of failure and rejection beforehand. You really maximize the contrast between how long a streak of uninterrupted successes they must have had in order to be selected in the first place, and how many rejections they have to go through in order to make it to the next level.

(By the way, it’s not that there isn’t enough teaching and research for all these PhD graduates; that’s what universities want you to think. It’s that universities are refusing to open up tenure-track positions and instead relying upon adjuncts and lecturers. And the obvious reason for that is to save money.)

The real question is why we let them put us through this. I’m wondering that more and more every day.

I believe in science. I believe I could make a real contribution to human knowledge—at least, I think I still believe that. But I don’t know how much longer I can stand this gauntlet of constant evaluation and rejection.

I am going through a particularly severe episode of Impostor Syndrome at the moment. I am at an impasse in my third-year research paper, which is supposed to be done by the end of the summer. My dissertation committee wants me to revise my second-year paper to submit to journals, and I just… can’t do it. I have asked for help from multiple sources, and received conflicting opinions. At this point I can’t even bring myself to work on it.

I’ve been aiming for a career as an academic research scientist for as long as I can remember, and everyone tells me that this is what I should do and where I belong—but I don’t really feel like I belong anymore. I don’t know if I have a thick enough skin to get through all these layers of evaluation and rejection. Everyone tells me I’m good at this, but I don’t feel like I am. It doesn’t come easily the way I had come to expect things to come easily. And after I’ve done the research, written the paper—the stuff that I was told was the real work—there are all these extra steps that are actually so much harder, so much more painful—submitting to journals and being rejected over, and over, and over again, practically watching the graph of my career prospects plummet before my eyes.

I think that what really triggered my Impostor Syndrome was finally encountering things I’m not actually good at. It sounds arrogant when I say it, but the truth is, I had never had anything in my entire academic experience that felt genuinely difficult. There were things that were tedious, or time-consuming; there were other barriers I had to deal with, like migraines, depression, and the influenza pandemic. But there was never any actual educational content I had difficulty absorbing and understanding. Maybe if I had, I would be more prepared for this. But of course, if that were the case, they’d never let me into grad school at all. Just to be here, I had to have an uninterrupted streak of easy success after easy success—so now that it’s finally hard, I feel completely blindsided. I’m finally genuinely challenged by something academic, and I can’t handle it. There’s math I don’t know how to do; I’ve never felt this way before.

I know that part of the problem is internal: This is my own mental illness talking. But that isn’t much comfort. Knowing that the problem is me doesn’t exactly reduce the feeling of being a fraud and a failure. And even a problem that is 100% inside my own brain isn’t necessarily a problem I can fix. (I’ve had migraines in my brain for the last 18 years; I still haven’t fixed them.)

There is so much that the academic community could do so easily to make this problem better. Stop using the top 5 journals as a metric, and just look at overall publication rates. Referee publications double-blind, so that grad students know their papers will actually be read and taken seriously, rather than thrown out as soon as the referee sees they don’t already have tenure. Or stop obsessing over publications altogether, and look at the detailed content of people’s work instead of maximizing the incentive to keep putting out papers that nobody will ever actually read. Open up more tenure-track faculty positions, and stop hiring lecturers and adjuncts. If you have to save money, do it by cutting salaries for administrators and athletic coaches. And stop evaluating constantly. Get rid of qualifying exams. Get rid of advancement exams. Start from the very beginning of grad school by assigning a mentor to each student and getting directly into working on a dissertation. Don’t make the applied econometrics researchers take exams in macro theory. Don’t make the empirical macroeconomists study game theory. Focus and customize coursework specifically on what grad students will actually need for the research they want to do, and don’t use grades at all. Remove the evaluative element completely. We should feel as though we are allowed to not know things. We should feel as though we are allowed to get things wrong. You are supposed to be teaching us, and you don’t seem to know how to do that; you just evaluate us constantly and expect us to learn on our own.

But none of those changes are going to happen. Certainly not in time for me, and probably not ever, because people like me who want the system to change are precisely the people the current system seems designed to weed out. It’s the ones who make it through the gauntlet, and convince themselves that it was their own brilliance and hard work that carried them through (not luck, not being a White straight upper-middle-class cis male, not even perseverance and resilience in the face of rejection), who end up making the policies for the next generation.

Because those who should be fixing the problem refuse to do so, that leaves the rest of us. What can we do to relieve Impostor Syndrome in ourselves or those around us?

You’d be right to take any advice I give now with a grain of salt; it’s obviously not working that well on me. But maybe it can help someone else. (And again I realize that “Don’t listen to me, I have no idea what I’m talking about” is exactly what someone with Impostor Syndrome would say.)

One of the standard techniques for dealing with Impostor Syndrome is called self-compassion. The idea is to be as forgiving to yourself as you would be to someone you love. I’ve never been good at this. I always hold myself to a much higher standard than I would hold anyone else—higher even than I would allow anyone to impose on someone else. After being told my whole life how brilliant and special I am, I internalized it in perhaps the most toxic way possible: I set my bar higher. Things that other people would count as great success I count as catastrophic failure. “Good enough” is never good enough.

Another good suggestion is to change your comparison set: Don’t compare yourself just to faculty or other grad students, compare yourself to the population as a whole. Others will tell you to stop comparing altogether, but I don’t know if that’s even possible in a capitalist labor market.

I’ve also had people encourage me to focus on my core motivations, remind myself what really matters and why I want to be a scientist in the first place. But it can be hard to keep my eye on that prize. Sometimes I wonder if I’ll ever be able to do the things I originally set out to do, or if it will just be trying to fit other people’s molds and being rejected over and over again for the rest of my life.

I think the best advice I’ve ever received on dealing with Impostor Syndrome was actually this: “Realize that nobody knows what they’re doing.” The people who are the very best at things… really aren’t all that good at them. If you look around carefully, the evidence of incompetence is everywhere. Look at all the books that get published that weren’t worth writing, all the songs that get recorded that weren’t worth singing. Think about the easily-broken electronic gadgets, the glitchy operating systems, the zero-day exploits, the data breaches, the traffic lights that are timed so badly they make the traffic jams worse. Remember that the leading cause of airplane crashes is pilot error, that medical mistakes are the third-leading cause of death in the United States. Think about every vending machine that ate your dollar, every time your cable went out in a storm. All those people around you who look like they are competent and successful? They aren’t. They are just as confused and ignorant and clumsy as you are. Most of them also feel like frauds, at least some of the time.

The sausage of statistics being made

Nov 11 JDN 2458434

“Laws, like sausages, cease to inspire respect in proportion as we know how they are made.”

~ John Godfrey Saxe, not Otto von Bismarck

Statistics are a bit like laws and sausages. There are a lot of things in statistical practice that don’t align with statistical theory. The most obvious example is the fact that many results in statistics are asymptotic: they only strictly apply for infinitely large samples, and in any finite sample they will be some sort of approximation (we often don’t even know how good an approximation).

But the problem runs deeper than this: The whole idea of a p-value was originally supposed to be used to assess one single hypothesis that is the only one you test in your entire study.

That’s frankly a ludicrous expectation: Why would you write a whole paper just to test one parameter?

This is why I don’t actually think this so-called multiple comparisons problem is a problem with researchers doing too many hypothesis tests; I think it’s a problem with statisticians being fundamentally unreasonable about what statistics is useful for. We have to do multiple comparisons, so you should be telling us how to do it correctly.
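To make the stakes concrete, here is a minimal sketch of how fast false positives pile up. It assumes every null hypothesis is true and all tests are independent, which real studies rarely satisfy. The Bonferroni threshold shown is just one common and rather conservative correction, not a method proposed in this post.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(alpha, num_tests):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / num_tests

ALPHA = 0.05
for num_tests in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(ALPHA, num_tests)
    per_test = bonferroni_threshold(ALPHA, num_tests)
    corrected = familywise_error_rate(per_test, num_tests)
    print(f"{num_tests:3d} tests: uncorrected error {uncorrected:.3f}, "
          f"Bonferroni threshold {per_test:.4f}, corrected error {corrected:.3f}")
```

With 20 tests at the usual 0.05 threshold, the chance of at least one spurious "significant" result is already well over half.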

Statisticians have this beautiful pure mathematics that generates all these lovely asymptotic results… and then they stop, as if they were done. But we aren’t dealing with infinite or even “sufficiently large” samples; we need to know what happens when your sample is 100, not when your sample is 10^29. We can’t assume that our variables are independently and identically distributed; we don’t know their distribution, and we’re pretty sure they’re going to be somewhat dependent.

Even in an experimental context where we can randomly and independently assign some treatments, we can’t do that with lots of variables that are likely to matter, like age, gender, nationality, or field of study. And applied econometricians are in an even tighter bind; they often can’t randomize anything. They have to rely upon “instrumental variables” that they hope are “close enough to randomized” relative to whatever they want to study.

In practice what we tend to do is… fudge it. We use the formal statistical methods, and then we step back and apply a series of informal norms to see if the result actually makes sense to us. This is why almost no psychologists were actually convinced by Daryl Bem’s precognition experiments, despite his standard experimental methodology and perfect p < 0.05 results; he couldn’t pass any of the informal tests, particularly the most basic one of not violating any known fundamental laws of physics. We knew he had somehow cherry-picked the data, even before looking at it; nothing else was possible.

This is actually part of where the “hierarchy of sciences” notion is useful: One of the norms is that you’re not allowed to break the rules of the sciences above you, but you can break the rules of the sciences below you. So psychology has to obey physics, but physics doesn’t have to obey psychology. I think this is also part of why there’s so much enmity between economists and anthropologists; really we should be on the same level, cognizant of each other’s rules, but economists want to be above anthropologists so we can ignore culture, and anthropologists want to be above economists so they can ignore incentives.

Another informal norm is the “robustness check”, in which the researcher runs a dozen different regressions approaching the same basic question from different angles. “What if we control for this? What if we interact those two variables? What if we use a different instrument?” In terms of statistical theory, this doesn’t actually make a lot of sense; the probability distributions f(y|x) of y conditional on x and f(y|x, z) of y conditional on x and z are not the same thing, and wouldn’t in general be closely tied, depending on the distribution f(x|z) of x conditional on z. But in practice, most real-world phenomena are going to continue to show up even as you run a bunch of different regressions, and so we can be more confident that something is a real phenomenon insofar as that happens. If an effect drops out when you switch out a couple of control variables, it may have been a statistical artifact. But if it keeps appearing no matter what you do to try to make it go away, then it’s probably a real thing.
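Here is a toy version of a robustness check, using simulated data where the true effect of x on y is 2 by construction. Everything in it is an assumption made for illustration: the variable names, the data-generating process, and the hand-rolled least-squares solver (used only so the sketch needs nothing beyond the standard library).

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the given columns, with an intercept first."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(k):
            if r != i:
                factor = aug[r][i] / aug[i][i]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]   # a control that really matters
w = [rng.gauss(0, 1) for _ in range(n)]   # an irrelevant control
y = [2.0 * xi + 1.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
xz = [xi * zi for xi, zi in zip(x, z)]    # interaction term

specifications = {
    "x only": [x],
    "x + z": [x, z],
    "x + z + w": [x, z, w],
    "x + z + x*z": [x, z, xz],
}
# Index 1 is the coefficient on x (index 0 is the intercept).
estimates = {name: ols(cols, y)[1] for name, cols in specifications.items()}
for name, b in estimates.items():
    print(f"{name:12s} coefficient on x = {b:.3f}")
```

Because the effect is real here, and x is independent of the other variables, the coefficient on x stays close to 2 across every specification. A statistical artifact would be much less likely to survive that treatment.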

Because of the powerful career incentives toward publication and the strange obsession among journals with a p-value less than 0.05, another norm has emerged: Don’t actually trust p-values that are close to 0.05. The vast majority of the time, a p-value of 0.047 was the result of publication bias. Now if you see a p-value of 0.001, maybe then you can trust it—but you’re still relying on a lot of assumptions even then. I’ve seen some researchers argue that because of this, we should tighten our standards for publication to something like p < 0.01, but that’s missing the point; what we need to do is stop publishing based on p-values. If you tighten the threshold, you’re just going to get more rejected papers and then the few papers that do get published will now have even smaller p-values that are still utterly meaningless.

These informal norms protect us from the worst outcomes of bad research. But they are almost certainly not optimal. It’s all very vague and informal, and different researchers will often disagree vehemently over whether a given interpretation is valid. What we need are formal methods for solving these problems, so that we can have the objectivity and replicability that formal methods provide. Right now, our existing formal tools simply are not up to that task.

There are some things we may never be able to formalize: If we had a formal algorithm for coming up with good ideas, the AIs would already rule the world, and this would be either Terminator or The Culture depending on whether we designed the AIs correctly. But I think we should at least be able to formalize the basic question of “Is this statement likely to be true?” that is the fundamental motivation behind statistical hypothesis testing.

I think the answer is likely to be in a broad sense Bayesian, but Bayesians still have a lot of work left to do in order to give us really flexible, reliable statistical methods we can actually apply to the messy world of real data. In particular, tell us how to choose priors please! Prior selection is a fundamental make-or-break problem in Bayesian inference that has nonetheless been greatly neglected by most Bayesian statisticians. So, what do we do? We fall back on informal norms: Try maximum likelihood, which is like using a very flat prior. Try a normally-distributed prior. See if you can construct a prior from past data. If all those give the same thing, that’s a “robustness check” (see previous informal norm).
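A minimal sketch of that kind of prior "robustness check", for the simplest case I can think of: estimating a proportion. The data (30 successes in 100 trials) and the three priors are hypothetical, and I use Beta priors rather than a normal prior because they make the posterior easy to write down in closed form.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

def posterior_mode(successes, trials, prior_a, prior_b):
    """Posterior mode (valid when both posterior Beta parameters exceed 1)."""
    return (prior_a + successes - 1) / (prior_a + prior_b + trials - 2)

# Hypothetical data, for illustration only
SUCCESSES, TRIALS = 30, 100
mle = SUCCESSES / TRIALS

# Hypothetical priors: flat, Jeffreys, and one standing in for "past data"
priors = {
    "flat Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "past-data Beta(5, 5)": (5.0, 5.0),
}

means = {name: posterior_mean(SUCCESSES, TRIALS, a, b) for name, (a, b) in priors.items()}
flat_mode = posterior_mode(SUCCESSES, TRIALS, 1.0, 1.0)

print(f"Maximum likelihood estimate: {mle:.3f}")
print(f"Posterior mode under flat prior: {flat_mode:.3f}")
for name, m in means.items():
    print(f"Posterior mean under {name}: {m:.3f}")
print(f"Spread across priors: {max(means.values()) - min(means.values()):.3f}")
```

With a flat prior the posterior mode coincides with the maximum likelihood estimate, and with this much data the three priors give nearly the same answer. With 10 observations instead of 100 they would disagree far more, which is when the choice of prior starts to matter.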

Informal norms are also inherently harder to teach and learn. I’ve seen a lot of other grad students flail wildly at statistics, not because they don’t know what a p-value means (though maybe that’s also sometimes true), but because they don’t really quite grok the informal underpinnings of good statistical inference. This can be very hard to explain to someone: They feel like they followed all the rules correctly, but you are saying their results are wrong, and now you can’t explain why.

In fact, some of the informal norms that are in wide use are clearly detrimental. In economics, norms have emerged that certain types of models are better simply because they are “more standard”, such as the dynamic stochastic general equilibrium models that can basically be fit to everything and have never actually usefully predicted anything. In fact, the best ones just predict what we already knew from Keynesian models. But without a formal norm for testing the validity of models, it’s been “DSGE or GTFO”. At present, it is considered “nonstandard” (read: “bad”) not to assume that your agents are either a single unitary “representative agent” or a continuum of infinitely many agents; modeling the actual fact of finitely many agents is just not done. Yet it’s hard for me to imagine any formal criterion that wouldn’t at least give you some points for correctly including the fact that there are more than one but fewer than infinitely many people in the world (obviously your model could still be bad in other ways).

I don’t know what these new statistical methods would look like. Maybe it’s as simple as formally justifying some of the norms we already use; maybe it’s as complicated as taking a fundamentally new approach to statistical inference. But we have to start somewhere.

If you really want grad students to have better mental health, remove all the high-stakes checkpoints

Post 260: Oct 14 JDN 2458406

A study was recently published in Nature Biotechnology showing clear evidence of a mental health crisis among graduate students (no, I don’t know why they picked the biotechnology imprint—I guess it wasn’t good enough for Nature proper?). This is only the most recent of several studies showing exceptionally high rates of mental health issues among graduate students.

I’ve seen universities do a lot of public hand-wringing and lip service about this issue—but I haven’t seen any that were seriously willing to do what it takes to actually solve the problem.

I think this fact became clearest to me when I was required to fill out an official “Individual Development Plan” form as a prerequisite for my advancement to candidacy, which included one question about “What are you doing to support your own mental health and work/life balance?”

The irony here is absolutely excruciating, because advancement to candidacy has been overwhelmingly my leading source of mental health stress for at least the last six months. And it is only one of several different high-stakes checkpoints that grad students are expected to complete, always threatened with defunding or outright expulsion from the graduate program if the checkpoint is not met by a certain arbitrary deadline.

The first of these was the qualifying exams. Then comes advancement to candidacy. Then I have to complete and defend a second-year paper, then a third-year paper. Finally I have to complete and defend a dissertation, and then go onto the job market and go through a gauntlet of applications and interviews. I can’t think of any other time in my life when I was under this much academic and career pressure this consistently—even finishing high school and applying to college wasn’t like this.

If universities really wanted to improve my mental health, they would find a way to get rid of all that.

Granted, a single university does not have total control over all this: There are coordination problems between universities regarding qualifying exams, advancement, and dissertation requirements. One university that unilaterally tried to remove all these would rapidly lose prestige, as it would not be regarded as “rigorous” to reduce the pressure on your grad students. But that itself is precisely the problem—we have equated “rigor” with pressuring grad students until they are on the verge of emotional collapse. Universities don’t seem to know how to make graduate school difficult in the ways that would actually encourage excellence in research and teaching; they simply know how to make it difficult in ways that destroy their students psychologically.

The job market is even more complicated; in the current funding environment, it would be prohibitively expensive to open up enough faculty positions to actually accept even half of all graduating PhDs to tenure-track jobs. Probably the best answer here is to refocus graduate programs on supporting employment outside academia, recognizing both that PhD-level skills are valuable in many workplaces and that not every grad student really wants to become a professor.

But there are clearly ways that universities could mitigate these effects, and they don’t seem genuinely interested in doing so. They could remove the advancement exam, for example; you could simply advance to candidacy as a formality when your advisor decides you are ready, never needing to actually perform a high-stakes presentation before a committee—because what the hell does that accomplish anyway? Speaking of advisors, they could have a formalized matching process that starts with interviewing several different professors and being matched to the one that best fits your goals and interests, instead of expecting you to reach out on your own and hope for the best. They could have you write a dissertation, but not perform a “dissertation defense”—because, again, what can they possibly learn from forcing you to present in a high-stakes environment that they couldn’t have learned from reading your paper and talking with you about it over several months?

They could adjust or even remove funding deadlines—especially for international students. Here at UCI at least, once you are accepted to the program, you are ostensibly guaranteed funding for as long as you maintain reasonable academic progress—but then they define “reasonable progress” in such a way that you have to form an advancement committee, fill out forms, write a paper, and present before a committee all by a certain date or your funding is in jeopardy. Residents of California (which includes all US students who successfully established residency after a full year) are given more time if we need it—but international students aren’t. How is that fair?

The unwillingness of universities to take such actions clearly shows that their commitment to improving students’ mental health is paper-thin. They are only willing to help their students improve their work-life balance as long as it doesn’t require changing anything about the graduate program. They will provide us with counseling services and free yoga classes, but they won’t seriously reduce the pressure they put on us at every step of the way.
I understand that universities are concerned about protecting their prestige, but I ask them this: Does this really improve the quality of your research or teaching output? Do you actually graduate better students by selecting only the ones who can survive being emotionally crushed? Do all these arbitrary high-stakes performances actually result in greater advancement of human knowledge?

Or is it perhaps that you yourselves were put through such hazing rituals years ago, and now your cognitive dissonance won’t let you admit that it was all for naught? “This must be worth doing, or else they wouldn’t have put me through so much suffering!” Are you trying to transfer your own psychological pain onto your students, lest you be forced to face it yourself?

Is grade inflation a real problem?

Mar 4 JDN 2458182

You can’t spend much time teaching at the university level and not hear someone complain about “grade inflation”. Almost every professor seems to believe in it, and yet they must all be participating in it, if it’s really such a widespread problem.

This could be explained as a collective action problem, a Tragedy of the Commons: If the incentives are always to have the students with the highest grades—perhaps because of administrative pressure, or in order to get better reviews from students—then even if all professors would prefer a harsher grading scheme, no individual professor can afford to deviate from the prevailing norms.

But in fact I think there is a much simpler explanation: Grade inflation doesn’t exist.

In economic growth theory, economists make a sharp distinction between inflation (an increase in prices without change in underlying fundamentals) and growth (an increase in the real value of output). I contend that there is no such thing as grade inflation; what we are in fact observing is grade growth.

Am I saying that students are actually smarter now than they were 30 years ago?

Yes. That’s exactly what I’m saying.

But don’t take it from me. Take it from the decades of research on the Flynn Effect: IQ scores have been rising worldwide at a rate of about 0.3 IQ points per year for as long as we’ve been keeping good records. Students today are about 10 IQ points smarter than students 30 years ago—a 2018 IQ score of 95 is equivalent to a 1988 score of 105, which is equivalent to a 1958 score of 115. There is reason to think this trend won’t continue indefinitely, since the effect is mainly concentrated at the bottom end of the distribution; but it has continued for quite some time already.

This by itself would probably be enough to explain the observed increase in grades, but there’s more: College students are also a self-selected sample, admitted precisely because they were believed to be the smartest individuals in the application pool. Rising grades at top institutions are easily explained by rising selectivity at top schools: Harvard now accepts 5.6% of applicants. In 1942, Harvard accepted 92% of applicants. The odds of getting in have fallen from roughly 11:1 in favor to roughly 17:1 against. Today, you need a 4.0 GPA, a 36 ACT in every category, glowing letters of recommendation, and hundreds of hours of extracurricular activities (or a family member who donated millions of dollars, of course) to get into Harvard. In the 1940s, you needed a high school diploma and a B average.

In fact, when educational researchers have tried to quantitatively study the phenomenon of “grade inflation”, they usually come back with the result that they simply can’t find it. The US Department of Education conducted a study in 1995 showing that average university grades had declined since 1965. Given that the Flynn effect raised IQ by almost 10 points during that time, maybe we should be panicking about grade deflation.

It really wouldn’t be hard to make that case: “Back in my day, you could get an A just by knowing basic algebra! Now they want these kids to take partial derivatives?” “We used to just memorize facts to ace the exam; but now teachers keep asking for reasoning and critical thinking?”

More recently, a study in 2013 found that grades rose at the high school level, but fell at the college level, and showed no evidence of losing any informativeness as a signaling mechanism. The only recent study I could find showing genuinely compelling evidence for grade inflation was a 2017 study of UK students estimating that grades are growing about twice as fast as the Flynn effect alone would predict. Most studies don’t even consider the possibility that students are smarter than they used to be—they just take it for granted that any increase in average grades constitutes grade inflation. Many of them don’t even control for the increase in selectivity—here’s one using the fact that Harvard’s average rose from 2.7 to 3.4 from 1960 to 2000 as evidence of “grade inflation” when Harvard’s acceptance rate fell from almost 30% to only 10% during that period.

Indeed, the real mystery is why so many professors believe in grade inflation, when the evidence for it is so astonishingly weak.

I think it’s the availability heuristic. Who are professors? They are the cream of the crop. They aced their way through high school, college, and graduate school, then got hired and earned tenure; they were among a handful of individuals who won a fierce competition with hundreds of competitors at each stage. There are over 320 million people in the US, and only 1.3 million college faculty. This means that college professors represent about the top 0.4% of high-scoring students.

Combine that with the fact that human beings assort positively (we like to spend time with people who are similar to us) and use the availability heuristic (we judge how likely something is based on how many times we have seen it).

Thus, when a professor compares to her own experience of college, she is remembering her fellow top-scoring students at elite educational institutions. She is recalling the extreme intellectual demands she had to meet to get where she is today, and erroneously assuming that these are representative of most of the population of her generation. She probably went to school at one of a handful of elite institutions, even if she now teaches at a mid-level community college: three quarters of college faculty come from the top one quarter of graduate schools.

And now she compares to the students she has to teach, most of whom would not be able to meet such demands—but of course most people in her generation couldn’t either. She frets for the future of humanity only because not everyone is a genius like her.

Throw in the Curse of Knowledge: The professor doesn’t remember how hard it was to learn what she has learned so far, and so the fact that it seems easy now makes her think it was easy all along. “How can they not know how to take partial derivatives!?” Well, let’s see… were you born knowing how to take partial derivatives?

Giving a student an A for work far inferior to what you’d have done in their place isn’t unfair. Indeed, it would clearly be unfair to do anything less. You have years if not decades of additional education ahead of them, and you are from a self-selected elite sample of highly intelligent individuals. Expecting everyone to perform as well as you would is simply setting up most of the population for failure.

There are potential incentives for grade inflation that do concern me: In particular, a lot of international student visas and scholarship programs insist upon maintaining a B or even A- average to continue. Professors are understandably loath to condemn a student to having to drop out or return to their home country just because they scored 81% instead of 84% on the final exam. If we really intend to make C the average score, then students shouldn’t lose funding or visas just for scoring a B-. Indeed, I have trouble defending any threshold above outright failing, which is to say, a minimum score of D-. If you pass your classes, that should be good enough to keep your funding.

Yet apparently even this isn’t creating too much upward bias, as students who are 10 IQ points smarter are still getting about the same scores as their forebears. We should be celebrating that our population is getting smarter, but instead we’re panicking over “easy grading”.

But kids these days, am I right?

Stop telling people they need to vote. Tell them they need to cast informed votes.

Feb 11 JDN 2458161

I just spent last week’s post imploring you to defend the norms of democracy. This week, I want to talk about a norm of democracy that I actually think needs an adjustment.

Right now, there is a very strong norm that simply says: VOTE.

“It is our civic duty to vote.” “You are unpatriotic if you don’t vote.” “Voting is a moral obligation.” Etc.

The goal here is laudable: We want people to express the altruistic motivation that will drive them to escape the so-called Downs Paradox and actually go vote to make democracy work.

But the norm is missing something quite important. It’s not actually such a great thing if everyone just goes out and votes, because most people are seriously, disturbingly uninformed about politics.

The norm shouldn’t be that you must vote. The norm should be that you must cast an informed vote.

Best if you vote informed, but if you won’t get informed, then better if you don’t vote at all. Adding random noise or bias toward physical attractiveness and height does not improve electoral outcomes.

How uninformed are voters?

Most voters don’t understand even basic facts about the federal budget, like the fact that Medicare and Social Security spending are more than defense spending, or the fact that federal aid and earmarks are tiny portions of the budget. A couple years ago I had to debunk a meme that was claiming that we spend a vastly larger portion of the budget on defense than we actually do.

It gets worse: Only a quarter of Americans can even name all three branches of government. Almost half couldn’t identify the Bill of Rights. We literally required them to learn this in high school. By law they were supposed to know this.

But of course I’m not one of the ignorant ones, right? In a classic case of the Dunning-Kruger Effect, nobody ever thinks they are. When asked to predict if they would pass the civics exam required to obtain citizenship, 89% of voters surveyed predicted they would. When they took it, only 17% actually passed it. (For the record, I took it and got a perfect score. You can try it yourself here.)

More informed voters already tend to be more politically engaged. But they are almost evenly divided between Democrats and Republicans, which means (especially with the way the Electoral College works) that elections are primarily determined by low-information voters. Low-information voters were decisive for Trump in a way that is unprecedented for as far back as we have data on voter knowledge (which, sadly, is not all that far back).

To be fair, more information is no panacea; humans are very good at rationalizing beliefs that they hold for tribal reasons. People who follow political news heavily typically have more distorted views on some political issues, because they only hear one side and they think they know but they don’t. To truly be more informed voters we must seek out information from reliable, nonpartisan sources, and listen to a variety of sources with differing views. Get your ideas about climate change from NPR or the IPCC, not from Huffington Post—and certainly not from Fox News. But still, maybe it’s worth reading National Review or Reason on occasion. Even when they are usually wrong, it is good for you to expose yourself to views from the other side—because sometimes they can be right. (Reason recently published an excellent article on the huge waste of government funds on building stadiums, for example, and National Review made some really good points against the New Mexico proposal to mandate college applications for high school graduates.)

And of course even those of us who are well-informed obviously have lots of other things we don’t know. Given my expertise in economics and my level of political engagement, I probably know more about politics than 99% of American voters; but I still can’t name more than a handful of members of Congress or really any state legislators aside from the ones who ran for my own district. I can’t even off the top of my head recall who heads the Orange County Water District, even though they literally decide whether I get to drink and take a shower. I’m not asking voters to know everything there is to know about politics, as no human being could possibly do such a thing. I’m merely asking that they know enough basic information to make an informed decision about who to vote for.

Moreover, I think this is a unique time in history where changing this norm has really become viable. We are living in a golden age of information access—almost literally anything you could care to know about politics, you could find in a few minutes of Google searching. I didn’t know who ran my water district, but I looked it up, and I do now: apparently Stephen R. Sheldon. I can’t name that many members of Congress, but I don’t vote for that many members of Congress, and I do carefully research each candidate running in my district when it comes time to vote. (In the next California state legislature election, Mimi Walters has got to go—she has consistently failed to stand against Trump, choosing her party over her constituency.)

This means that if you are uninformed about politics and yet still vote, you chose to do that. You aren’t living in a world where it’s extremely expensive or time-consuming to learn about politics. It is spectacularly easy to learn about politics if you actually want to; if you didn’t learn, it was because you chose not to learn. And if even this tiny cost is too much for you, then how about this? If you don’t have time to get informed, you don’t have time to vote.

Voting electronically would also help with this. People could, in the privacy of their own homes, look up information on candidates while their ballots are right there in front of them. While mail-in voter fraud actually does exist (unlike in-person voter fraud, which basically doesn’t), there are safeguards already in widespread use in Internet-based commerce that we could institute on electronic voting to provide sufficient protection. Basically, all we need to do is public-key signing: issue every voter a private key to sign their votes, and the signatures are then verified at the county office using a database of public keys. If public keys were stolen, that could compromise secret-ballot anonymity, but it would not allow anyone to actually change votes. Voters could come in person to collect their private keys when they register to vote, at their convenience weeks or months before the election. Of course, we’d have to make it user-friendly enough that people who aren’t very good with computers would understand the system. We could always leave open the option of in-person voting for anyone who prefers that.
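To make the signing idea a little more concrete, here is a toy sketch in Python using textbook RSA. Everything in it is an assumption for illustration only: the primes, the ballot text, and the helper names `digest`, `sign_ballot` and `verify_ballot` are hypothetical, and unpadded RSA with fixed primes like this is not secure. A real system would rely on a vetted cryptographic library.

```python
import hashlib

# Toy textbook RSA. NOT secure: no padding, and the primes are well known.
P = 2**61 - 1                        # a Mersenne prime
Q = 2**89 - 1                        # another Mersenne prime
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(ballot: bytes) -> int:
    """Hash the ballot and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(ballot).digest(), "big") % N


def sign_ballot(ballot: bytes, private_exponent: int = D) -> int:
    """Voter side: sign the ballot hash with the private key."""
    return pow(digest(ballot), private_exponent, N)


def verify_ballot(ballot: bytes, signature: int) -> bool:
    """County side: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(ballot)


ballot = b"Measure A: yes"
signature = sign_ballot(ballot)
print(verify_ballot(ballot, signature))              # the ballot as cast
print(verify_ballot(b"Measure A: no", signature))    # an altered ballot
```

The county office only ever needs the public pair `(N, E)`, which is why a leaked public-key database would not let anyone forge or alter a vote: producing a valid signature for a different ballot requires the private exponent.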

Of course, establishing this norm would most likely reduce voter turnout, even if it did successfully increase voter knowledge. But we don’t actually need everyone to vote. We need everyone’s interests accurately represented. If you aren’t willing to get informed, then casting your vote isn’t representing your interests anyway, so why bother?

Information theory proves that multiple-choice is stupid

Mar 19, JDN 2457832

This post is a bit of a departure from my usual topics, but it’s something that has bothered me for a long time, and I think it fits broadly into the scope of uniting economics with the broader realm of human knowledge.

Multiple-choice questions are inherently and objectively poor methods of assessing learning.

Consider the following question, which is adapted from actual tests I have been required to administer and grade as a teaching assistant (that is, the style of question is the same; I’ve changed the details so that it wouldn’t be possible to just memorize the response—though in a moment I’ll get to why all this paranoia about students seeing test questions beforehand would also be defused if we stopped using multiple-choice):

The demand for apples follows the equation Q = 100 – 5 P.
The supply of apples follows the equation Q = 10 P.
If a tax of $2 per apple is imposed, what is the equilibrium price, quantity, tax revenue, consumer surplus, and producer surplus?

A. Price = $5, Quantity = 10, Tax revenue = $50, Consumer Surplus = $360, Producer Surplus = $100

B. Price = $6, Quantity = 20, Tax revenue = $40, Consumer Surplus = $200, Producer Surplus = $300

C. Price = $6, Quantity = 60, Tax revenue = $120, Consumer Surplus = $360, Producer Surplus = $300

D. Price = $5, Quantity = 60, Tax revenue = $120, Consumer Surplus = $280, Producer Surplus = $500

You could try solving this properly, setting supply equal to demand, adjusting for the tax, finding the equilibrium, and calculating the surplus, but don’t bother. If I were tutoring a student in preparing for this test, I’d tell them not to bother. You can get the right answer in only two steps, because of the multiple-choice format.

Step 1: Does tax revenue equal $2 times quantity? We said the tax was $2 per apple.
So that rules out A, leaving B, C, and D.

Step 2: Is quantity 10 times price as the supply curve says? For C it is; for B and D it isn’t. Guess it must be C then.
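For anyone who wants to check the shortcut mechanically, here is a small Python sketch that applies the two consistency checks to the answer choices. The dictionary layout and the function names are my own assumptions for illustration; only the numbers come from the question.

```python
# The relevant fields of the four answer choices from the question.
choices = {
    "A": {"price": 5, "quantity": 10, "tax_revenue": 50},
    "B": {"price": 6, "quantity": 20, "tax_revenue": 40},
    "C": {"price": 6, "quantity": 60, "tax_revenue": 120},
    "D": {"price": 5, "quantity": 60, "tax_revenue": 120},
}

TAX_PER_APPLE = 2


def passes_tax_check(choice: dict) -> bool:
    """Step 1: tax revenue must equal the per-apple tax times quantity."""
    return choice["tax_revenue"] == TAX_PER_APPLE * choice["quantity"]


def passes_supply_check(choice: dict) -> bool:
    """Step 2: quantity must satisfy the supply curve Q = 10 P."""
    return choice["quantity"] == 10 * choice["price"]


after_step_1 = [k for k, c in choices.items() if passes_tax_check(c)]
after_step_2 = [k for k in after_step_1 if passes_supply_check(choices[k])]

print("Survive step 1:", after_step_1)
print("Survive step 2:", after_step_2)
```

Running both checks leaves only C, without ever solving for the equilibrium.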

Now, to do that, you need to have at least a basic understanding of the economics underlying the question (How is tax revenue calculated? What does the supply curve equation mean?). But there’s an even easier technique you can use that doesn’t even require that; it’s called Answer Splicing.

Here’s how it works: You look for repeated values in the answer choices, and you choose the one that has the most repeated values. Prices $5 and $6 are repeated equally, so that’s not helpful (maybe the test designer planned at least that far). Quantity 60 is repeated, other quantities aren’t, so it’s probably that. Likewise with tax revenue $120. Consumer surplus $360 and Producer Surplus $300 are both repeated, so those are probably it. Oh, look, we’ve selected a unique answer choice C, the correct answer!
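Answer splicing is mechanical enough to write as a short program. The following Python sketch is one possible way to do it, assuming a scoring rule of my own choosing: each choice earns, for every field, the number of choices that share its value in that field. The function name `splice_guess` is hypothetical.

```python
from collections import Counter

# Each choice as (price, quantity, tax revenue, consumer surplus, producer surplus).
choices = {
    "A": (5, 10, 50, 360, 100),
    "B": (6, 20, 40, 200, 300),
    "C": (6, 60, 120, 360, 300),
    "D": (5, 60, 120, 280, 500),
}


def splice_guess(choices: dict) -> tuple:
    """Pick the choice whose values are most often repeated across choices."""
    # Count how often each value appears in each field (column).
    columns = zip(*choices.values())
    counts = [Counter(column) for column in columns]
    # A choice's score is the total popularity of its values.
    scores = {
        label: sum(counts[i][value] for i, value in enumerate(values))
        for label, values in choices.items()
    }
    return max(scores, key=scores.get), scores


best, scores = splice_guess(choices)
print(best, scores)
```

Note that the function never looks at what the numbers mean, which is exactly the problem: it would work just as well on opaque symbols.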

You could have done answer splicing even if the question were about 18th century German philosophy, or even if the question were written in Arabic or Japanese. In fact you could even do it if it were written in a cipher, as long as the cipher was a consistent substitution cipher.

Could the question have been designed to better avoid answer splicing? Probably. But this is actually quite difficult to do, because there is a fundamental tradeoff between two types of “distractors” (as they are known in the test design industry). You want the answer choices to contain correct pieces and resemble the true answer, so that students who basically understand the question but make a mistake in the process still get it wrong. But you also want the answer choices to be distinct enough in a random enough pattern that answer splicing is unreliable. These two goals are inherently contradictory, and the result will always be a compromise between them. Professional test-designers usually lean pretty heavily against answer-splicing, which I think is probably optimal so far as it goes; but I’ve seen many a professor err too far on the side of similar choices and end up making answer splicing quite effective.

But of course, all of this could be completely avoided if I had just presented the question as an open-ended free-response. Then you’d actually have to write down the equations, show me some algebra solving them, and then interpret your results in a coherent way to answer the question I asked. What’s more, if you made a minor mistake somewhere (carried a minus sign over wrong, forgot to divide by 2 when calculating the area of the consumer surplus triangle), I can take off a few points for that error, rather than all the points just because you didn’t get the right answer. At the other extreme, if you just randomly guess, your odds of getting the right answer are minuscule, but even if you did (or copied from someone else), if you don’t show me the algebra you won’t get credit.

So the free-response question is telling me a lot more about what the student actually knows, in a much more reliable way, that is much harder to cheat or strategize against.

Moreover, this isn’t a matter of opinion. This is a theorem of information theory.

The information that is carried over a message channel can be quantitatively measured as its Shannon entropy. It is usually measured in bits, which you may already be familiar with as a unit of data storage and transmission rate in computers—and yes, those are all fundamentally the same thing. A proper formal treatment of information theory would be way too complicated for this blog, but the basic concepts are fairly straightforward: think in terms of how long a sequence of 1s and 0s it would take to convey the message. That is, roughly speaking, the Shannon entropy of that message.

How many bits are conveyed by a multiple-choice response with four choices? 2. Always. At maximum. No exceptions. It is fundamentally, provably, mathematically impossible to convey more than 2 bits of information via a channel that only has 4 possible states. Any multiple-choice response—any multiple-choice response—of four choices can be reduced to the sequence 00, 01, 10, 11.

True-false questions are a bit worse—literally, they convey 1 bit instead of 2. It’s possible to fully encode the entire response to a true-false question as simply 0 or 1.

For comparison, how many bits can I get from the free-response question? Well, in principle the answer to any mathematical question has the cardinality of the real numbers, which is infinite (in some sense beyond infinite, in fact—more infinite than mere “ordinary” infinity); but in reality you can only write down a small number of possible symbols on a page. I can’t actually write down the infinite diversity of numbers between 3.14159 and the true value of pi; in 10 digits or less, I can only (“only”) write down a few billion of them. So let’s suppose that handwritten text has about the same information density as typing, which in ASCII or Unicode has 8 bits—one byte—per character. If the response to this free-response question is 300 characters (note that this paragraph itself is over 800 characters), then the total number of bits conveyed is about 2400.

That is to say, one free-response question conveys twelve hundred times as much information as a multiple-choice question (2400 bits against 2). Of course, a lot of that information is redundant; there are many possible correct ways to write the answer to a problem (if the answer is 1.5 you could say 3/2 or 6/4 or 1.500, etc.), and many problems have multiple valid approaches to them, and it’s often safe to skip certain steps of algebra when they are very basic, and so on. But it’s really not at all unrealistic to say that I am getting between 10 and 100 times as much useful information about a student from reading one free response than I would from one multiple-choice question.
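The channel capacities quoted above are easy to reproduce. Here is a minimal Python sketch, assuming (as the text does) 8 bits per character and a 300-character response; the helper name `max_bits` is mine, and these are upper bounds on capacity, not measurements of useful information.

```python
import math


def max_bits(n_states: int) -> float:
    """Upper bound on information from a channel with n equally likely states."""
    return math.log2(n_states)


multiple_choice_bits = max_bits(4)   # four answer choices
true_false_bits = max_bits(2)        # two answer choices

# Free response, using the text's assumptions: 300 characters, 8 bits each.
BITS_PER_CHARACTER = 8
RESPONSE_LENGTH = 300
free_response_bits = RESPONSE_LENGTH * BITS_PER_CHARACTER

print(multiple_choice_bits, true_false_bits, free_response_bits)
```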

Indeed, it’s actually a bigger difference than it appears, because when evaluating a student’s performance I’m not actually interested in the information density of the message itself; I’m interested in the product of that information density and its correlation with the true latent variable I’m trying to measure, namely the student’s actual understanding of the content. (A sequence of 500 random symbols would have a very high information density, but would be quite useless in evaluating a student!) Free-response questions aren’t just more information, they are also better information, because they are closer to the real-world problems we are training for, harder to cheat, harder to strategize, nearly impossible to guess, and provide detailed feedback about exactly what the student is struggling with (for instance, maybe they could solve the equilibrium just fine, but got hung up on calculating the consumer surplus).

As I alluded to earlier, free-response questions would also remove most of the danger of students seeing your tests beforehand. If they saw it beforehand, learned how to solve it, memorized the steps, and then were able to carry them out on the test… well, that’s actually pretty close to what you were trying to teach them. It would be better for them to learn a whole class of related problems and then be able to solve any problem from that broader class—but the first step in learning to solve a whole class of problems is in fact learning to solve one problem from that class. Just change a few details each year so that the questions aren’t identical, and you will find that any student who tried to “cheat” by seeing last year’s exam would inadvertently be studying properly for this year’s exam. And then perhaps we could stop making students literally sign nondisclosure agreements when they take college entrance exams. Listen to this Orwellian line from the SAT nondisclosure agreement:

Misconduct includes, but is not limited to:

Taking any test questions or essay topics from the testing room, including through memorization, giving them to anyone else, or discussing them with anyone else through any means, including, but not limited to, email, text messages or the Internet

Including through memorization. You are not allowed to memorize SAT questions, because God forbid you actually learn something when we are here to make money off evaluating you.

Multiple-choice tests fail in another way as well; by definition they cannot possibly test generation or recall of knowledge, they can only test recognition. You don’t need to come up with an answer; you know for a fact that the correct answer must be in front of you, and all you need to do is recognize it. Recall and recognition are fundamentally different memory processes, and recall is both more difficult and more important.

Indeed, the real mystery here is why we use multiple-choice exams at all.

There are a few types of very basic questions where multiple-choice is forgivable, because there just aren’t that many possible valid answers. If I ask whether demand for apples has increased, you can pretty much say “it increased”, “it decreased”, “it stayed the same”, or “it’s impossible to determine”. So a multiple-choice format isn’t losing too much in such a case. But most really interesting and meaningful questions aren’t going to work in this format.

I don’t think it’s even particularly controversial among educators that multiple-choice questions are awful. (Though I do recall an “educational training” seminar a few weeks back that was basically an apologia for multiple choice, claiming that it is totally possible to test “higher-order cognitive skills” using multiple-choice, for reals, believe me.) So why do we still keep using them?

Well, the obvious reason is grading time. The one thing multiple-choice does have over a true free response is that it can be graded efficiently and reliably by machines, which really does make a big difference when you have 300 students in a class. But there are a couple reasons why even this isn’t a sufficient argument.

First of all, why do we have classes that big? It’s absurd. At that point you should just email the students video lectures. You’ve already foreclosed any possibility of genuine student-teacher interaction, so why are you bothering with having an actual teacher? It seems to me that universities have tried to work out what is the absolute maximum rent they can extract by structuring a class so that it is just good enough that students won’t revolt against the tuition, but they can still spend as little as possible by hiring only one adjunct or lecturer when they should have been paying 10 professors.

And don’t tell me they can’t afford to spend more on faculty—first of all, supporting faculty is why you exist. If you can’t afford to spend enough providing the primary service that you exist as an institution to provide, then you don’t deserve to exist as an institution. Moreover, they clearly can afford it—they simply prefer to spend on hiring more and more administrators and raising the pay of athletic coaches. PhD comics visualized it quite well; the average pay for administrators is three times that of even tenured faculty, and athletic coaches make ten times as much as faculty. (And here I think the mean is the relevant figure, as the mean income is what can be redistributed. Firing one administrator making $300,000 does actually free up enough to hire three faculty making $100,000 or ten grad students making $30,000.)

But even supposing that the institutional incentives here are just too strong, and we will continue to have ludicrously-huge lecture classes into the foreseeable future, there are still alternatives to multiple-choice testing.

Ironically, the College Board appears to have stumbled upon one themselves! About half the SAT math exam is organized into a format where instead of bubbling in one circle to give your 2 bits of answer, you bubble in numbers and symbols corresponding to a more complicated mathematical answer, such as entering “3/4” as “0”, “3”, “/”, “4” or “1.28” as “1”, “.”, “2”, “8”. This could easily be generalized to things like “e^2” as “e”, “^”, “2” and “sin(3pi/2)” as “sin”, “3”, “pi”, “/”, “2”. There are 12 possible symbols currently allowed by the SAT, and each response is up to 4 characters, so we have already increased our possible responses from 4 to over 20,000, which is to say from 2 bits to about 14. If we generalize it to include symbols like “pi” and “e” and “sin”, and allow a few more characters per response, we could easily get it over 20 bits, 10 times as much information as a multiple-choice question.
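The arithmetic behind those figures can be sketched in a few lines of Python. The 12-symbol, 4-character grid is from the text; the extended grid of 16 symbols and 6 characters is purely a hypothetical example of “a few more” symbols and characters, and the count ignores responses shorter than the full width, so it slightly understates the total. The helper name `grid_bits` is mine.

```python
import math


def grid_bits(n_symbols: int, n_characters: int) -> float:
    """Upper bound in bits for a grid-in response of fixed width."""
    return math.log2(n_symbols ** n_characters)


# Current format as described in the text: 12 symbols, 4 characters.
sat_responses = 12 ** 4
sat_bits = grid_bits(12, 4)

# Hypothetical extended format: 16 symbols (adding pi, e, sin, ^), 6 characters.
extended_bits = grid_bits(16, 6)

print(sat_responses, round(sat_bits, 2), extended_bits)
```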

But we can do better still! Even if we insist upon automation, high-end text-recognition software (of the sort any university could surely afford) is now getting to the point where it could realistically recognize a properly-formatted algebraic formula, so you’d at least know if the student remembered the formula correctly. Sentences could be transcribed into typed text, checked for grammar, and sorted for keywords—which is not nearly as good as a proper reading by an expert professor, but is still orders of magnitude better than filling circle “C”. Eventually AI will make even more detailed grading possible, though at that point we may have AIs just taking over the whole process of teaching. (Leaving professors entirely for research, presumably. Not sure if this would be good or bad.)

Automation isn’t the only answer either. You could hire more graders and teaching assistants—say one for every 30 or 40 students instead of one for every 100 students. (And then the TAs might actually be able to get to know their students! What a concept!) You could give fewer tests, or shorter ones—because a small, reliable sample is actually better than a large, unreliable one. A bonus there would be reducing students’ feelings of test anxiety. You could give project-based assignments, which would still take a long time to grade, but would also be a lot more interesting and fulfilling for both the students and the graders.

Or, and perhaps this is the most radical answer of all: You could stop worrying so much about evaluating student performance.

I get it, you want to know whether students are doing well, both so that you can improve your teaching and so that you can rank the students and decide who deserves various awards and merits. But do you really need to be constantly evaluating everything that students do? Did it ever occur to you that perhaps that is why so many students suffer from anxiety—because they are literally being formally evaluated with long-term consequences every single day they go to school?

If we eased up on all this evaluation, I think the fear is that students would just detach entirely; all teachers know students who only seem to show up in class because they’re being graded on attendance. But there are a couple of reasons to think that maybe this fear isn’t so well-founded after all.

If you give up on constant evaluation, you can open up opportunities to make your classes a lot more creative and interesting—and even fun. You can make students want to come to class, because they get to engage in creative exploration and collaboration instead of memorizing what you drone on at them for hours on end. Most of the reason we don’t do creative, exploratory activities is simply that we don’t know how to evaluate them reliably—so what if we just stopped worrying about that?

Moreover, are those students who only show up for the grade really getting anything out of it anyway? Maybe it would be better if they didn’t show up; indeed, maybe it would be better if they just dropped out of college entirely and did something else with their lives until they got their heads on straight. Maybe all the effort we currently expend trying to force learning on students who clearly don’t appreciate its value could instead be spent enriching the students who do appreciate learning and came here to do as much of it as possible. Because, ultimately, you can lead a student to algebra, but you can’t make them think. (Let me be clear, I do not mean students with less innate ability or prior preparation; I mean students who aren’t interested in learning and are only showing up because they feel compelled to. I admire students with less innate ability who nonetheless succeed because they work their butts off, and wish I were quite so motivated myself.)

There’s a downside to that, of course. Compulsory education does actually seem to have significant benefits in making people into better citizens. Maybe if we let those students just leave college, they’d never come back, and they would squander their potential. Maybe we need to force them to show up until something clicks in their brains and they finally realize why we’re doing it. In fact, we’re really not forcing them; they could drop out in most cases and simply don’t, probably because their parents are forcing them. Maybe the signaling problem is too fundamental, and the only way we can get unmotivated students to accept not getting prestigious degrees is by going through this whole process of forcing them to show up for years and evaluating everything they do until we can formally justify ultimately failing them. (Of course, almost by construction, a student who does the absolute bare minimum to pass will pass.)

But college admission is competitive, and I can’t shake the feeling that there are thousands of students out there who got rejected from the school they most wanted to go to, the school they were really passionate about and willing to commit their lives to, because some other student got in ahead of them, and that other student is now sitting in the back of the room playing with an iPhone, grumbling about having to show up for class every day. What about that squandered potential? Perhaps competitive admission and compulsory attendance just don’t mix, and we should stop compelling students once they get their high school diploma.