Escaping the wrong side of the Yerkes-Dodson curve

Jul 25 JDN 2459421

I’ve been under a great deal of stress lately. Somehow I ended up needing to finish my dissertation, get married, and move overseas to start a new job all during the same few months—during a global pandemic.

A little bit of stress is useful, but too much can be very harmful. On complicated tasks (basically anything that involves planning or careful thought), increased stress will increase performance up to a point, and then decrease it after that point. This phenomenon is known as the Yerkes-Dodson law.

The Yerkes-Dodson curve very closely resembles the Laffer curve, which shows that since extremely low tax rates raise little revenue (obviously), and extremely high tax rates also raise very little revenue (because they cause so much damage to the economy), the tax rate that maximizes government revenue is actually somewhere in the middle. There is a revenue-maximizing tax rate (usually estimated to be about 70%).

Instead of a revenue-maximizing tax rate, the Yerkes-Dodson law says that there is a performance-maximizing stress level. You don’t want to have zero stress, because that means you don’t care and won’t put in any effort. But if your stress level gets too high, you lose your ability to focus and your performance suffers.

Since stress (like taxes) comes with a cost, you may not even want to be at the maximum point. Performance isn’t everything; you might be happier choosing a lower level of performance in order to reduce your own stress.

But once thing is certain: You do not want to be to the right of that maximum. Then you are paying the cost of not only increased stress, but also reduced performance.

And yet I think many of us spent a great deal of our time on the wrong side of the Yerkes-Dodson curve. I certainly feel like I’ve been there for quite awhile now—most of grad school, really, and definitely this past month when suddenly I found out I’d gotten an offer to work in Edinburgh.

My current circumstances are rather exceptional, but I think the general pattern of being on the wrong side of the Yerkes-Dodson curve is not.

Over 80% of Americans report work-related stress, and the US economy loses about half a trillion dollars a year in costs related to stress.

The World Health Organization lists “work-related stress” as one of its top concerns. Over 70% of people in a cross-section of countries report physical symptoms related to stress, a rate which has significantly increased since before the pandemic.

The pandemic is clearly a contributing factor here, but even without it, there seems to be an awful lot of stress in the world. Even back in 2018, over half of Americans were reporting high levels of stress. Why?

For once, I think it’s actually fair to blame capitalism.

One thing capitalism is exceptionally good at is providing strong incentives for work. This is often a good thing: It means we get a lot of work done, so employment is high, productivity is high, GDP is high. But it comes with some important downsides, and an excessive level of stress is one of them.

But this can’t be the whole story, because if markets were incentivizing us to produce as much as possible, that ought to put us near the maximum of the Yerkes-Dodson curve—but it shouldn’t put us beyond it. Maximizing productivity might not be what makes us happiest—but many of us are currently so stressed that we aren’t even maximizing productivity.

I think the problem is that competition itself is stressful. In a capitalist economy, we aren’t simply incentivized to do things well—we are incentivized to do them better than everyone else. Often quite small differences in performance can lead to large differences in outcome, much like how a few seconds can make the difference between an Olympic gold medal and an Olympic “also ran”.

An optimally productive economy would be one that incentivizes you to perform at whatever level maximizes your own long-term capability. It wouldn’t be based on competition, because competition depends too much on what other people are capable of. If you are not especially talented, competition will cause you great stress as you try to compete with people more talented than you. If you happen to be exceptionally talented, competition won’t provide enough incentive!

Here’s a very simple model for you. Your total performance p is a function of two components, your innate ability aand your effort e. In fact let’s just say it’s a sum of the two: p = a + e

People are randomly assigned their level of capability from some probability distribution, and then they choose their effort. For the very simplest case, let’s just say there are two people, and it turns out that person 1 has less innate ability than person 2, so a1 < a2.

There is also a certain amount of inherent luck in any competition. As it says in Ecclesiastes (by far the best book of the Old Testament), “The race is not to the swift or the battle to the strong, nor does food come to the wise or wealth to the brilliant or favor to the learned; but time and chance happen to them all.” So as usual I’ll model this as a contest function, where your probability of winning depends on your total performance, but it’s not a sure thing.

Let’s assume that the value of winning and cost of effort are the same across different people. (It would be simple to remove this assumption, but it wouldn’t change much in the results.) The value of winning I’ll call y, and I will normalize the cost of effort to 1.


Then this is each person’s expected payoff ui:

ui = (ai + ei)/(a1+e1+a2 + e2) V – ei

You choose effort, not ability, so maximize in terms of ei:

(a2 + e2) V = (a1 +e1+a2 + e2)2 = (a1 + e1) V

a1 + e1 = a2 + e2

p1 = p2

In equilibrium, both people will produce exactly the same level of performance—but one of them will be contributing more effort to compensate for their lesser innate ability.

I’ve definitely had this experience in both directions: Effortlessly acing math tests that I knew other people barely passed despite hours of studying, and running until I could barely breathe to keep up with other people who barely seemed winded. Clearly I had too little incentive in math class and too much in gym class—and competition was obviously the culprit.

If you vary the cost of effort between people, or make it not linear, you can make the two not exactly equal; but the overall pattern will remain that the person who has more ability will put in less effort because they can win anyway.

Yet presumably the amount of effort we want to incentivize isn’t less for those who are more talented. If anything, it may be more: Since an hour of work produces more when done by the more talented person, if the cost to them is the same, then the net benefit of that hour of work is higher than the same hour of work by someone less talented.

In a large population, there are almost certainly many people whose talents are similar to your own—but there are also almost certainly many below you and many above you as well. Unless you are properly matched with those of similar talent, competition will systematically lead to some people being pressured to work too hard and others not pressured enough.

But if we’re all stressed, where are the people not pressured enough? We see them on TV. They are celebrities and athletes and billionaires—people who got lucky enough, either genetically (actors who were born pretty, athletes who were born with more efficient muscles) or environmentally (inherited wealth and prestige), to not have to work as hard as the rest of us in order to succeed. Indeed, we are constantly bombarded with images of these fantastically lucky people, and by the availability heuristic our brains come to assume that they are far more plentiful than they actually are.

This dramatically exacerbates the harms of competition, because we come to feel that we are competing specifically with the people who were handed the world on a silver platter. Born without the innate advantages of beauty or endurance or inheritance, there’s basically no chance we could ever measure up; and thus we feel utterly inadequate unless we are constantly working as hard as we possibly can, trying to catch up in a race in which we always fall further and further behind.

How can we break out of this terrible cycle? Well, we could try to replace capitalism with something like the automated luxury communism of Star Trek; but this seems like a very difficult and long-term solution. Indeed it might well take us a few hundred years as Roddenberry predicted.

In the shorter term, we may not be able to fix the economic problem, but there is much we can do to fix the psychological problem.

By reflecting on the full breadth of human experience, not only here and now, but throughout history and around the world, you can come to realize that you—yes, you, if you’re reading this—are in fact among the relatively fortunate. If you have a roof over your head, food on your table, clean water from your tap, and ibuprofen in your medicine cabinet, you are far more fortunate than the average person in Senegal today; your television, car, computer, and smartphone are things that would be the envy even of kings just a few centuries ago. (Though ironically enough that person in Senegal likely has a smartphone, or at least a cell phone!)

Likewise, you can reflect upon the fact that while you are likely not among the world’s most very most talented individuals in any particular field, there is probably something you are much better at than most people. (A Fermi estimate suggests I’m probably in the top 250 behavioral economists in the world. That’s probably not enough for a Nobel, but it does seem to be enough to get a job at the University of Edinburgh.) There are certainly many people who are less good at many things than you are, and if you must think of yourself as competing, consider that you’re also competing with them.

Yet perhaps the best psychological solution is to learn not to think of yourself as competing at all. So much as you can afford to do so, try to live your life as if you were already living in a world that rewards you for making the best of your own capabilities. Try to live your life doing what you really think is the best use of your time—not your corporate overlords. Yes, of course, we must do what we need to in order to survive, and not just survive, but indeed remain physically and mentally healthy—but this is far less than most First World people realize. Though many may try to threaten you with homelessness or even starvation in order to exploit you and make you work harder, the truth is that very few people in First World countries actually end up that way (it couldbe brought to zero, if our public policy were better), and you’re not likely to be among them. “Starving artists” are typically a good deal happier than the general population—because they’re not actually starving, they’ve just removed themselves from the soul-crushing treadmill of trying to impress the neighbors with manicured lawns and fancy SUVs.

Why business owners are always so wrong about regulations

Jun 20 JDN 2459386

Minimum wage. Environmental regulations. Worker safety. Even bans on child slavery.No matter what the regulation is, it seems that businesses will always oppose it, always warn that these new regulations will destroy their business and leave thousands out of work—and always be utterly, completely wrong.

In fact, the overall impact of US federal government regulations on employment is basically negligible, and the impact on GDP is very clearly positive. This really isn’t surprising if you think about it: Despite what some may have you believe, our government doesn’t go around randomly regulating things for no reason. The regulations we impose are specifically chosen because their benefits outweighed their costs, and the rigorous, nonpartisan analysis of our civil service is one of the best-kept secrets of American success and the envy of the world.

But when businesses are so consistently insistent that new regulations (of whatever kind, however minor or reasonable they may be) will inevitably destroy their industry—when such catastrophic outcomes have basically never occurred, that cries out for an explanation. How can such otherwise competent, experienced, knowledgeable people be always so utterly wrong about something so basic? These people are experts in what they do. Shouldn’t business owners know what would happen if we required them to raise wages a little, or require basic safety standards, or reduce pollution caps, or not allow their suppliers to enslave children?

Well, what do you mean by “them”? Herein lies the problem. There is a fundamental difference between what would happen if we required any specific business to comply with a new regulation (but left their competitors exempt), versus what happens if we require an entire industry to comply with that same regulation.

Business owners are accustomed to thinking in an open system, what economists call partial equilibrium: They think about how things will affect them specifically, and not how they will affect broader industries or the economy as a whole. If wages go up, they’ll lay off workers. If the price of their input goes down, they’ll buy more inputs and produce more outputs. They aren’t thinking about how these effects interact with one another at a systemic level, because they don’t have to.

This works because even a huge multinational corporation is only a small portion of the US economy, and doesn’t have much control over the system as a whole. So in general when a business tries to maximize its profit in partial equilibrium, it tends to get the right answer (at least as far as maximizing GDP goes).

But large-scale regulation is one time where we absolutely cannot do this. If we try to analyze federal regulations purely in partial equilibrium terms, we will be consistently and systematically wrong—as indeed business owners are.

If we went to a specific corporation and told them, “You must pay your workers $2 more per hour.”, what would happen? They would be forced to lay off workers. No doubt about it. If we specifically targeted one particular corporation and required them to raise their wages, they would be unable to compete with other businesses who had not been forced to comply. In fact, they really might go out of business completely. This is the panic that business owners are expressing when they warn that even really basic regulations like “You can’t dump toxic waste in our rivers” or “You must not force children to pick cocoa beans for you” will cause total economic collapse.

But when you regulate an entire industry in this way, no such dire outcomes happen. The competitors are also forced to comply, and so no businesses are given special advantages relative to one another. Maybe there’s some small reduction in employment or output as a result, but at least if the regulation is reasonably well-planned—as virtually all US federal regulations are, by extremely competent people—those effects will be much smaller than the benefits of safer workers, or cleaner water, or whatever was the reason for the regulation in the first place.

Think of it this way. Businesses are in a constant state of fierce, tight competition. So let’s consider a similarly tight competition such as the Olympics. The gold medal for the 100-meter sprint is typically won by someone who runs the whole distance in less than 10 seconds.

Suppose we had told one of the competitors: “You must wait an extra 3 seconds before starting.” If we did this to one specific runner, that runner would lose. With certainty. There has never been an Olympic 100-meter sprint where the first-place runner was more than 3 seconds faster than the second-place runner. So it is basically impossible for that runner to ever win the gold, simply because of that 3-second handicap. And if we imposed that constraint on some runners but not others, we would ensure that only runners without the handicap had any hope of winning the race.

But now suppose we had simply started the competition 3 seconds late. We had a minor technical issue with the starting gun, we fixed it in 3 seconds, and then everything went as normal. Basically no one would notice. The winner of the race would be the same as before, all the running times would be effectively the same. Things like this have almost certainly happened, perhaps dozens of times, and no one noticed or cared.

It’s the same 3-second delay, but the outcome is completely different.

The difference is simple but vital: Are you imposing this constraint on some competitors, or on all competitors? A constraint imposed on some competitors will be utterly catastrophic for those competitors. A constraint imposed on all competitors may be basically unnoticeable to all involved.

Now, with regulations it does get a bit more complicated than that: We typically can’t impose regulations on literally everyone, because there is no global federal government with the authority to do that. Even international human rights law, sadly, is not that well enforced. (International intellectual property lawvery nearly is—and that contrast itself says something truly appalling about our entire civilization.) But when regulation is imposed by a large entity like the United States (or even the State of California), it generally affects enough of the competitors—and competitors who already had major advantages to begin with, like the advanced infrastructure, impregnable national security, and educated population of the United States—that the effects on competition are, if not negligible, at least small enough to be outweighed by the benefits of the regulation.

So, whenever we propose a new regulation and business owners immediately panic about its catastrophic effects, we can safely ignore them. They do this every time, and they are always wrong.

But take heed: Economists are trained to think in terms of closed systems and general equilibrium. So if economists are worried about the outcome of a regulation, then there is legitimate reason to be concerned. It’s not that we know better how to run their businesses—we certainly don’t. Rather, we much better understand the difference between imposing a 3-second delay on a single runner versus simply starting the whole race 3 seconds later.

Why is cryptocurrency popular?

May 30 JDN 2459365

At the time of writing, the price of most cryptocurrencies has crashed, likely due to a ban on conventional banks using cryptocurrency in China (though perhaps also due to Elon Musk personally refusing to accept Bitcoin at his businesses). But for all I know by the time this post goes live the price will surge again. Or maybe they’ll crash even further. Who knows? The prices of popular cryptocurrencies have been extremely volatile.

This post isn’t really about the fluctuations of cryptocurrency prices. It’s about something a bit deeper: Why are people willing to put money into cryptocurrencies at all?

The comparison is often made to fiat currency: “Bitcoin isn’t backed by anything, but neither is the US dollar.”

But the US dollar is backed by something: It’s backed by the US government. Yes, it’s not tradeable for gold at a fixed price, but so what? You can use it to pay taxes. The government requires it to be legal tender for all debts. There are certain guaranteed exchange rights built into the US dollar, which underpin the value that the dollar takes on in other exchanges. Moreover, the US Federal Reserve carefully manages the supply of US dollars so as to keep their value roughly constant.

Bitcoin does not have this (nor does Dogecoin, or Etherium, or any of the other hundreds of lesser-known cryptocurrencies). There is no central bank. There is no government making them legal tender for any debts at all, let alone all of them. Nobody collects taxes in Bitcoin.

And so, because its value is untethered, Bitcoin’s price rises and falls, often in huge jumps, more or less randomly. If you look all the way back to when it was introduced, Bitcoin does seem to have an overall upward price trend, but this honestly seems like a statistical inevitability: If you start out being worthless, the only way your price can change is upward. While some people have become quite rich by buying into Bitcoin early on, there’s no particular reason to think that it will rise in value from here on out.

Nor does Bitcoin have any intrinsic value. You can’t eat it, or build things out of it, or use it for scientific research. It won’t even entertain you (unless you have a very weird sense of entertainment). Bitcoin doesn’t even have “intrinsic value” the way gold does (which is honestly an abuse of the term, since gold isn’t actually especially useful): It isn’t innately scarce. It was made scarce by its design: Through the blockchain, a clever application of encryption technology, it was made difficult to generate new Bitcoins (called “mining”) in an exponentially increasing way. But the decision of what encryption algorithm to use was utterly arbitrary. Bitcoin mining could just as well have been made a thousand times easier or a thousand times harder. They seem to have hit a sweet spot where they made it just hard enough that it make Bitcoin seem scarce while still making it feel feasible to get.

We could actually make a cryptocurrency that does something useful, by tying its mining to a genuinely valuable pursuit, like analyzing scientific data or proving mathematical theorems. Perhaps I should suggest a partnership with Folding@Home to make FoldCoin, the crypto coin you mine by folding proteins. There are some technical details there that would be a bit tricky, but I think it would probably be feasible. And then at least all this computing power would accomplish something, and the money people make would be to compensate them for their contribution.

But Bitcoin is not useful. No institution exists to stabilize its value. It constantly rises and falls in price. Why do people buy it?

In a word, FOMO. The fear of missing out. People buy Bitcoin because they see that a handful of other people have become rich by buying and selling Bitcoin. Bitcoin symbolizes financial freedom: The chance to become financially secure without having to participate any longer in our (utterly broken) labor market.

In this, volatility is not a bug but a feature: A stable currency won’t change much in value, so you’d only buy into it because you plan on spending it. But an unstable currency, now, there you might manage to get lucky speculating on its value and get rich quick for nothing. Or, more likely, you’ll end up poorer. You really have no way of knowing.

That makes cryptocurrency fundamentally like gambling. A few people make a lot of money playing poker, too; but most people who play poker lose money. Indeed, those people who get rich are only able to get rich because other people lose money. The game is zero-sum—and likewise so is cryptocurrency.

Note that this is not how the stock market works, or at least not how it’s supposed to work (sometimes maybe). When you buy a stock, you are buying a share of the profits of a corporation—a real, actual corporation that produces and sells goods or services. You’re (ostensibly) supplying capital to fund the operations of that corporation, so that they might make and sell more goods in order to earn more profit, which they will then share with you.

Likewise when you buy a bond: You are lending money to an institution (usually a corporation or a government) that intends to use that money to do something—some real actual thing in the world, like building a factory or a bridge. They are willing to pay interest on that debt in order to get the money now rather than having to wait.

Initial Coin Offerings were supposed to be away to turn cryptocurrency into a genuine investment, but at least in their current virtually unregulated form, they are basically indistinguishable from a Ponzi scheme. Unless the value of the coin is somehow tied to actual ownership of the corporation or shares of its profits (the way stocks are), there’s nothing to ensure that the people who buy into the coin will actually receive anything in return for the capital they invest. There’s really very little stopping a startup from running an ICO, receiving a bunch of cash, and then absconding to the Cayman Islands. If they made it really obvious like that, maybe a lawsuit would succeed; but as long as they can create even the appearance of a good-faith investment—or even actually make their business profitable!—there’s nothing forcing them to pay a cent to the owners of their cryptocurrency.

The really frustrating thing for me about all this is that, sometimes, it works. There actually are now thousands of people who made decisions that by any objective standard were irrational and irresponsible, and then came out of it millionaires. It’s much like the lottery: Playing the lottery is clearly and objectively a bad idea, but every once in awhile it will work and make you massively better off.

It’s like I said in a post about a year ago: Glorifying superstars glorifies risk. When a handful of people can massively succeed by making a decision, that makes a lot of other people think that it was a good decision. But quite often, it wasn’t a good decision at all; they just got spectacularly lucky.

I can’t exactly say you shouldn’t buy any cryptocurrency. It probably has better odds than playing poker or blackjack, and it certainly has better odds than playing the lottery. But what I can say is this: It’s about odds. It’s gambling. It may be relatively smart gambling (poker and blackjack are certainly a better idea than roulette or slot machines), with relatively good odds—but it’s still gambling. It’s a zero-sum high-risk exchange of money that makes a few people rich and lots of other people poorer.

With that in mind, don’t put any money into cryptocurrency that you couldn’t afford to lose at a blackjack table. If you’re looking for something to seriously invest your savings in, the answer remains the same: Stocks. All the stocks.

I doubt this particular crash will be the end for cryptocurrency, but I do think it may be the beginning of the end. I think people are finally beginning to realize that cryptocurrencies are really not the spectacular innovation that they were hyped to be, but more like a high-tech iteration of the ancient art of the Ponzi scheme. Maybe blockchain technology will ultimately prove useful for something—hey, maybe we should actually try making FoldCoin. But the future of money remains much as it has been for quite some time: Fiat currency managed by central banks.

On the accuracy of testing

Jan 31 JDN 2459246

One of the most important tools we have for controlling the spread of a pandemic is testing to see who is infected. But no test is perfectly reliable. Currently we have tests that are about 80% accurate. But what does it mean to say that a test is “80% accurate”? Many people get this wrong.

First of all, it certainly does not mean that if you have a positive result, you have an 80% chance of having the virus. Yet this is probably what most people think when they hear “80% accurate”.

So I thought it was worthwhile to demystify this a little bit, an explain just what we are talking about when we discuss the accuracy of a test—which turns out to have deep implications not only for pandemics, but for knowledge in general.

There are really two key measures of a test’s accuracy, called sensitivity and specificity, The sensitivity is the probability that, if the true answer is positive (you have the virus), the test result will be positive. This is the sense in which our tests are 80% accurate. The specificity is the probability that, if the true answer is negative (you don’t have the virus), the test result is negative. The terms make sense: A test is sensitive if it always picks up what’s there, and specific if it doesn’t pick up what isn’t there.

These two measures need not be the same, and typically are quite different. In fact, there is often a tradeoff between them: Increasing the sensitivity will often decrease the specificity.

This is easiest to see with an extreme example: I can create a COVID test that has “100% accuracy” in the sense of sensitivity. How do I accomplish this miracle? I simply assume that everyone in the world has COVID. Then it is absolutely guaranteed that I will have zero false negatives.

I will of course have many false positives—indeed the vast majority of my “positive results” will be me assuming that COVID is present without any evidence. But I can guarantee a 100% true positive rate, so long as I am prepared to accept a 0% true negative rate.

It’s possible to combine tests in ways that make them more than the sum of their parts. You can first run a test with a high specificity, and then re-test with a test that has a high sensitivity. The result will have both rates higher than either test alone.

For example, suppose test A has a sensitivity of 70% and a specificity of 90%, while test B has the reverse.

Then, if the true answer is positive, test A will return true 70% of the time, while test B will return true 90% of the time. So there is a 70% + (30%)(90%) = 97% chance of getting a positive result on the combined test.

If the true answer is negative, test A will return false 90% of the time, while test B will return false 70% of the time. So there is a 90% + (10%)(70%) = 97% chance of getting a negative result on the combined test.

Actually if we are going to specify the accuracy of a test in a single number, I think it would be better to use a much more obscure term, the informedness. Informedness is sensitivity plus specificity, minus one. It ranges between -1 and 1, where 1 is a perfect test, and 0 is a test that tells you absolutely nothing. -1 isn’t the worst possible test; it’s a test that’s simply calibrated backwards! Re-label it, and you’ve got a perfect test. So really maybe we should talk about the absolute value of the informedness.

It’s much harder to play tricks with informedness: My “miracle test” that just assumes everyone has the virus actually has an informedness of zero. This makes sense: The “test” actually provides no information you didn’t already have.

Surprisingly, I was not able to quickly find any references to this really neat mathematical result for informedness, but I find it unlikely that I am the only one who came up with it: The informedness of a test is the non-unit eigenvalue of a Markov matrix representing the test. (If you don’t know what all that means, don’t worry about it; it’s not important for this post. I just found it a rather satisfying mathematical result that I couldn’t find anyone else talking about.)

But there’s another problem as well: Even if we know everything about the accuracy of a test, we still can’t infer the probability of actually having the virus from the test result. For that, we need to know the baseline prevalence. Failing to account for that is the very common base rate fallacy.

Here’s a quick example to help you see what the problem is. Suppose that 1% of the population has the virus. And suppose that the tests have 90% sensitivity and 95% specificity. If I get a positive result, what is the probability I have the virus?

If you guessed something like 90%, you have committed the base rate fallacy. It’s actually much smaller than that. In fact, the true probability you have the virus is only 15%.

In a population of 10000 people, 100 (1%) will have the virus while 9900 (99%) will not. Of the 100 who have the virus, 90 (90%) will test positive and 10 (10%) will test negative. Of the 9900 who do not have the virus, 495 (5%) will test positive and 9405 (95%) will test negative.

This means that out of 585 positive test results, only 90 will actually be true positives!

If we wanted to improve the test so that we could say that someone who tests positive is probably actually positive, would it be better to increase sensitivity or specificity? Well, let’s see.

If we increased the sensitivity to 95% and left the specificity at 95%, we’d get 95 true positives and 495 false positives. This raises the probability to only 16%.

But if we increased the specificity to 97% and left the sensitivity at 90%, we’d get 90 true positives and 297 false positives. This raises the probability all the way to 23%.

But suppose instead we care about the probability that you don’t have the virus, given that you test negative. Our original test had 9900 true negatives and 10 false negatives, so it was quite good in this regard; if you test negative, you only have a 0.1% chance of having the virus.

Which approach is better really depends on what we care about. When dealing with a pandemic, false negatives are much worse than false positives, so we care most about sensitivity. (Though my example should show why specificity also matters.) But there are other contexts in which false positives are more harmful—such as convicting a defendant in a court of law—and then we want to choose a test which has a high true negative rate, even if it means accepting a low true positive rate.

In science in general, we seem to care a lot about false positives; a p-value is simply one minus the specificity of the statistical test, and as we all know, low p-values are highly sought after. But the sensitivity of statistical tests is often quite unclear. This means that we can be reasonably confident of our positive results (provided the baseline probability wasn’t too low, the statistics weren’t p-hacked, etc.); but we really don’t know how confident to be in our negative results. Personally I think negative results are undervalued, and part of how we got a replication crisis and p-hacking was by undervaluing those negative results. I think it would be better in general for us to report 95% confidence intervals (or better yet, 95% Bayesian prediction intervals) for all of our effects, rather than worrying about whether they meet some arbitrary threshold probability of not being exactly zero. Nobody really cares whether the effect is exactly zero (and it almost never is!); we care how big the effect is. I think the long-run trend has been toward this kind of analysis, but it’s still far from the norm in the social sciences. We’ve become utterly obsessed with specificity, and basically forgot that sensitivity exists.

Above all, be careful when you encounter a statement like “the test is 80% accurate”; what does that mean? 80% sensitivity? 80% specificity? 80% informedness? 80% probability that an observed positive is true? These are all different things, and the difference can matter a great deal.

Signaling and the Curse of Knowledge

Jan 3 JDN 2459218

I received several books for Christmas this year, and the one I was most excited to read first was The Sense of Style by Steven Pinker. Pinker is exactly the right person to write such a book: He is both a brilliant linguist and cognitive scientist and also an eloquent and highly successful writer. There are two other books on writing that I rate at the same tier: On Writing by Stephen King, and The Art of Fiction by John Gardner. Don’t bother with style manuals from people who only write style manuals; if you want to learn how to write, learn from people who are actually successful at writing.

Indeed, I knew I’d love The Sense of Style as soon as I read its preface, containing some truly hilarious takedowns of Strunk & White. And honestly Strunk & White are among the best standard style manuals; they at least actually manage to offer some useful advice while also being stuffy, pedantic, and often outright inaccurate. Most style manuals only do the second part.

One of Pinker’s central focuses in The Sense of Style is on The Curse of Knowledge, an all-too-common bias in which knowing things makes us unable to appreciate the fact that other people don’t already know it. I think I succumbed to this failing most greatly in my first book, Special Relativity from the Ground Up, in which my concept of “the ground” was above most people’s ceilings. I was trying to write for high school physics students, and I think the book ended up mostly being read by college physics professors.

The problem is surely a real one: After years of gaining expertise in a subject, we are all liable to forget the difficulty of reaching our current summit and automatically deploy concepts and jargon that only a small group of experts actually understand. But I think Pinker underestimates the difficulty of escaping this problem, because it’s not just a cognitive bias that we all suffer from time to time. It’s also something that our society strongly incentivizes.

Pinker points out that a small but nontrivial proportion of published academic papers are genuinely well written, using this to argue that obscurantist jargon-laden writing isn’t necessary for publication; but he didn’t seem to even consider the fact that nearly all of those well-written papers were published by authors who already had tenure or even distinction in the field. I challenge you to find a single paper written by a lowly grad student that could actually get published without being full of needlessly technical terminology and awkward passive constructions: “A murian model was utilized for the experiment, in an acoustically sealed environment” rather than “I tested using mice and rats in a quiet room”. This is not because grad students are more thoroughly entrenched in the jargon than tenured professors (quite the contrary), nor that grad students are worse writers in general (that one could really go either way), but because grad students have more to prove. We need to signal our membership in the tribe, whereas once you’ve got tenure—or especially once you’ve got an endowed chair or something—you have already proven yourself.

Pinker seems to briefly touch this insight (p. 69), without fully appreciating its significance: “Even when we have an inlkling that we are speaking in a specialized lingo, we may be reluctant to slip back into plain speech. It could betray to our peers the awful truth that we are still greenhorns, tenderfoots, newbies. And if our readers do know the lingo, we might be insulting their intelligence while spelling it out. We would rather run the risk of confusing them while at least appearing to be soophisticated than take a chance at belaboring the obvious while striking them as naive or condescending.”

What we are dealing with here is a signaling problem. The fact that one can write better once one is well-established is the phenomenon of countersignaling, where one who has already established their status stops investing in signaling.

Here’s a simple model for you. Suppose each person has a level of knowledge x, which they are trying to demonstrate. They know their own level of knowledge, but nobody else does.

Suppose that when we observe someone’s knowledge, we get two pieces of information: We have an imperfect observation of their true knowledge which is x+e, the real value of x plus some amount of error e. Nobody knows exactly what the error is. To keep the model as simple as possible I’ll assume that e is drawn from a uniform distribution between -1 and 1.

Finally, assume that we are trying to select people above a certain threshold: Perhaps we are publishing in a journal, or hiring candidates for a job. Let’s call that threshold z. If x < z-1, then since e can never be larger than 1, we will immediately observe that they are below the threshold and reject them. If x > z+1, then since e can never be smaller than -1, we will immediately observe that they are above the threshold and accept them.

But when z-1 < x < z+1, we may think they are above the threshold when they actually are not (if e is positive), or think they are not above the threshold when they actually are (if e is negative).

So then let’s say that they can invest in signaling by putting some amount of visible work in y (like citing obscure papers or using complex jargon). This additional work may be costly and provide no real value in itself, but it can still be useful so long as one simple condition is met: It’s easier to do if your true knowledge x is high.

In fact, for this very simple model, let’s say that you are strictly limited by the constraint that y <= x. You can’t show off what you don’t know.

If your true value x > z, then you should choose y = x. Then, upon observing your signal, we know immediately that you must be above the threshold.

But if your true value x < z, then you should choose y = 0, because there’s no point in signaling that you were almost at the threshold. You’ll still get rejected.

Yet remember before that only those with z-1 < x < z+1 actually need to bother signaling at all. Those with x > z+1 can actually countersignal, by also choosing y = 0. Since you already have tenure, nobody doubts that you belong in the club.

This means we’ll end up with three groups: Those with x < z, who don’t signal and don’t get accepted; those with z < x < z+1, who signal and get accepted; and those with x > z+1, who don’t signal but get accepted. Then life will be hardest for those who are just above the threshold, who have to spend enormous effort signaling in order to get accepted—and that sure does sound like grad school.

You can make the model more sophisticated if you like: Perhaps the error isn’t uniformly distributed, but some other distribution with wider support (like a normal distribution, or a logistic distribution); perhaps the signalling isn’t perfect, but itself has some error; and so on. With such additions, you can get a result where the least-qualified still signal a little bit so they get some chance, and the most-qualified still signal a little bit to avoid a small risk of being rejected. But it’s a fairly general phenomenon that those closest to the threshold will be the ones who have to spend the most effort in signaling.

This reveals a disturbing overlap between the Curse of Knowledge and Impostor Syndrome: We write in impenetrable obfuscationist jargon because we are trying to conceal our own insecurity about our knowledge and our status in the profession. We’d rather you not know what we’re talking about than have you realize that we don’t know what we’re talking about.

For the truth is, we don’t know what we’re talking about. And neither do you, and neither does anyone else. This is the agonizing truth of research that nearly everyone doing research knows, but one must be either very brave, very foolish, or very well-established to admit out loud: It is in the nature of doing research on the frontier of human knowledge that there is always far more that we don’t understand about our subject than that we do understand.

I would like to be more open about that. I would like to write papers saying things like “I have no idea why it turned out this way; it doesn’t make sense to me; I can’t explain it.” But to say that the profession disincentivizes speaking this way would be a grave understatement. It’s more accurate to say that the profession punishes speaking this way to the full extent of its power. You’re supposed to have a theory, and it’s supposed to work. If it doesn’t actually work, well, maybe you can massage the numbers until it seems to, or maybe you can retroactively change the theory into something that does work. Or maybe you can just not publish that paper and write a different one.

Here is a graph of one million published z-scores in academic journals:

It looks like a bell curve, except that almost all the values between -2 and 2 are mysteriously missing.

If we were actually publishing all the good science that gets done, it would in fact be a very nice bell curve. All those missing values are papers that never got published, or results that were excluded from papers, or statistical analyses that were massaged, in order to get a p-value less than the magical threshold for publication of 0.05. (For the statistically uninitiated, a z-score less than -2 or greater than +2 generally corresponds to a p-value less than 0.05, so these are effectively the same constraint.)

I have literally never read a single paper published in an academic journal in the last 50 years that said in plain language, “I have no idea what’s going on here.” And yet I have read many papers—probably most of them, in fact—where that would have been an appropriate thing to say. It’s actually quite a rare paper, at least in the social sciences, that actually has a theory good enough to really precisely fit the data and not require any special pleading or retroactive changes. (Often the bar for a theory’s success is lowered to “the effect is usually in the right direction”.) Typically results from behavioral experiments are bizarre and baffling, because people are a little screwy. It’s just that nobody is willing to stake their career on being that honest about the depth of our ignorance.

This is a deep shame, for the greatest advances in human knowledge have almost always come from people recognizing the depth of their ignorance. Paradigms never shift until people recognize that the one they are using is defective.

This is why it’s so hard to beat the Curse of Knowledge: You need to signal that you know what you’re talking about, and the truth is you probably don’t, because nobody does. So you need to sound like you know what you’re talking about in order to get people to listen to you. You may be doing nothing more than educated guesses based on extremely limited data, but that’s actually the best anyone can do; those other people saying they have it all figured out are either doing the same thing, or they’re doing something even less reliable than that. So you’d better sound like you have it all figured out, and that’s a lot more convincing when you “utilize a murian model” than when you “use rats and mice”.

Perhaps we can at least push a little bit toward plainer language. It helps to be addressing a broader audience: it is both blessing and curse that whatever I put on this blog is what you will read, without any gatekeepers in my path. I can use plainer language here if I so choose, because no one can stop me. But of course there’s a signaling risk here as well: The Internet is a public place, and potential employers can read this as well, and perhaps decide they don’t like me speaking so plainly about the deep flaws in the academic system. Maybe I’d be better off keeping my mouth shut, at least for awhile. I’ve never been very good at keeping my mouth shut.

Once we get established in the system, perhaps we can switch to countersignaling, though even this doesn’t always happen. I think there are two reasons this can fail: First, you can almost always try to climb higher. Once you have tenure, aim for an endowed chair. Once you have that, try to win a Nobel. Second, once you’ve spent years of your life learning to write in a particular stilted, obscurantist, jargon-ridden way, it can be very difficult to change that habit. People have been rewarding you all your life for writing in ways that make your work unreadable; why would you want to take the risk of suddenly making it readable?

I don’t have a simple solution to this problem, because it is so deeply embedded. It’s not something that one person or even a small number of people can really fix. Ultimately we will need to, as a society, start actually rewarding people for speaking plainly about what they don’t know. Admitting that you have no clue will need to be seen as a sign of wisdom and honesty rather than a sign of foolishness and ignorance. And perhaps even that won’t be enough: Because the fact will still remain that knowing what you know that other people don’t know is a very difficult thing to do.

Terrible but not likely, likely but not terrible

May 17 JDN 2458985

The human brain is a remarkably awkward machine. It’s really quite bad at organizing data, relying on associations rather than formal categories.

It is particularly bad at negation. For instance, if I tell you that right now, no matter what, you must not think about a yellow submarine, the first thing you will do is think about a yellow submarine. (You may even get the Beatles song stuck in your head, especially now that I’ve mentioned it.) A computer would never make such a grievous error.

The human brain is also quite bad at separation. Daniel Dennett coined a word “deepity” for a particular kind of deep-sounding but ultimately trivial aphorism that seems to be quite common, which relies upon this feature of the brain. A deepity has at least two possible readings: On one reading, it is true, but utterly trivial. On another, it would be profound if true, but it simply isn’t true. But if you experience both at once, your brain is triggered for both “true” and “profound” and yields “profound truth”. The example he likes to use is “Love is just a word”. Well, yes, “love” is in fact just a word, but who cares? Yeah, words are words. But love, the underlying concept it describes, is not just a word—though if it were that would change a lot.

One thing I’ve come to realize about my own anxiety is that it involves a wide variety of different scenarios I imagine in my mind, and broadly speaking these can be sorted into two categories: Those that are likely but not terrible, and those that are terrible but not likely.

In the former category we have things like taking an extra year to finish my dissertation; the mean time to completion for a PhD is over 8 years, so finishing in 6 instead of 5 can hardly be considered catastrophic.

In the latter category we have things like dying from COVID-19. Yes, I’m a male with type A blood and asthma living in a high-risk county; but I’m also a young, healthy nonsmoker living under lockdown. Even without knowing the true fatality rate of the virus, my chances of actually dying from it are surely less than 1%.

But when both of those scenarios are running through my brain at the same time, the first triggers a reaction for “likely” and the second triggers a reaction for “terrible”, and I get this feeling that something terrible is actually likely to happen. And indeed if my probability of dying were as high as my probability of needing a 6th year to finish my PhD, that would be catastrophic.

I suppose it’s a bit strange that the opposite doesn’t happen: I never seem to get the improbability of dying attached to the mildness of needing an extra year. The confusion never seems to trigger “neither terrible nor likely”. Or perhaps it does, and my brain immediately disregards that as not worthy of consideration? It makes a certain sort of sense: An event that is neither probable nor severe doesn’t seem to merit much anxiety.

I suspect that many other people’s brains work the same way, eliding distinctions between different outcomes and ending up with a sort of maximal product of probability and severity.
The solution to this is not an easy one: It requires deliberate effort and extensive practice, and benefits greatly from formal training by a therapist. Counter-intuitively, you need to actually focus more on the scenarios that cause you anxiety, and accept the anxiety that such focus triggers in you. I find that it helps to actually write down the details of each scenario as vividly as possible, and review what I have written later. After doing this enough times, you can build up a greater separation in your mind, and more clearly categorize—this one is likely but not terrible, that one is terrible but not likely. It isn’t a cure, but it definitely helps me a great deal. Perhaps it could help you.

Darkest Before the Dawn: Bayesian Impostor Syndrome

Jan 12 JDN 2458860

At the time of writing, I have just returned from my second Allied Social Sciences Association Annual Meeting, the AEA’s annual conference (or AEA and friends, I suppose, since there several other, much smaller economics and finance associations are represented as well). This one was in San Diego, which made it considerably cheaper for me to attend than last year’s. Alas, next year’s conference will be in Chicago. At least flights to Chicago tend to be cheap because it’s a major hub.

My biggest accomplishment of the conference was getting some face-time and career advice from Colin Camerer, the Caltech economist who literally wrote the book on behavioral game theory. Otherwise I would call the conference successful, but not spectacular. Some of the talks were much better than others; I think I liked the one by Emmanuel Saez best, and I also really liked the one on procrastination by Matthew Gibson. I was mildly disappointed by Ben Bernanke’s keynote address; maybe I would have found it more compelling if I were more focused on macroeconomics.

But while sitting through one of the less-interesting seminars I had a clever little idea, which may help explain why Impostor Syndrome seems to occur so frequently even among highly competent, intelligent people. This post is going to be more technical than most, so be warned: Here There Be Bayes. If you fear yon algebra and wish to skip it, I have marked below a good place for you to jump back in.

Suppose there are two types of people, high talent H and low talent L. (In reality there is of course a wide range of talents, so I could assign a distribution over that range, but it would complicate the model without really changing the conclusions.) You don’t know which one you are; all you know is a prior probability h that you are high-talent. It doesn’t matter too much what h is, but for concreteness let’s say h = 0.50; you’ve got to be in the top 50% to be considered “high-talent”.

You are engaged in some sort of activity that comes with a high risk of failure. Many creative endeavors fit this pattern: Perhaps you are a musician looking for a producer, an actor looking for a gig, an author trying to secure an agent, or a scientist trying to publish in a journal. Or maybe you’re a high school student applying to college, or a unemployed worker submitting job applications.

If you are high-talent, you’re more likely to succeed—but still very likely to fail. And even low-talent people don’t always fail; sometimes you just get lucky. Let’s say the probability of success if you are high-talent is p, and if you are low-talent, the probability of success is q. The precise value depends on the domain; but perhaps p = 0.10 and q = 0.02.

Finally, let’s suppose you are highly rational, a good and proper Bayesian. You update all your probabilities based on your observations, precisely as you should.

How will you feel about your talent, after a series of failures?

More precisely, what posterior probability will you assign to being a high-talent individual, after a series of n+k attempts, of which k met with success and n met with failure?

Since failure is likely even if you are high-talent, you shouldn’t update your probability too much on a failurebut each failure should, in fact, lead to revising your probability downward.

Conversely, since success is rare, it should cause you to revise your probability upward—and, as will become important, your revisions upon success should be much larger than your revisions upon failure.

We begin as any good Bayesian does, with Bayes’ Law:

P[H|(~S)^n (S)^k] = P[(~S)^n (S)^k|H] P[H] / P[(~S)^n (S)^k]

In words, this reads: The posterior probability of being high-talent, given that you have observed k successes and n failures, is equal to the probability of observing such an outcome, given that you are high-talent, times the prior probability of being high-skill, divided by the prior probability of observing such an outcome.

We can compute the probabilities on the right-hand side using the binomial distribution:

P[H] = h

P[(~S)^n (S)^k|H] = (n+k C k) p^k (1-p)^n

P[(~S)^n (S)^k] = (n+k C k) p^k (1-p)^n h + (n+k C k) q^k (1-q)^n (1-h)

Plugging all this back in and canceling like terms yields:

P[H|(~S)^n (S)^k] = 1/(1 + [1-h/h] [q/p]^k [(1-q)/(1-p)]^n)

This turns out to be particularly convenient in log-odds form:

L[X] = ln [ P(X)/P(~X) ]

L[(~S)^n) (S)^k|H] = ln [h/(1-h)] + k ln [p/q] + n ln [(1-p)/(1-q)]

Since p > q, ln[p/q] is a positive number, while ln[(1-p)/(1-q)] is a negative number. This corresponds to the fact that you will increase your posterior when you observe a success (k increases by 1) and decrease your posterior when you observe a failure (n increases by 1).

But when p and q are small, it turns out that ln[p/q] is much larger in magnitude than ln[(1-p)/(1-q)]. For the numbers I gave above, p = 0.10 and q = 0.02, ln[p/q] = 1.609 while ln[(1-p)/(1-q)] = -0.085. You will therefore update substantially more upon a success than on a failure.

Yet successes are rare! This means that any given success will most likely be first preceded by a sequence of failures. This results in what I will call the darkest-before-dawn effect: Your opinion of your own talent will tend to be at its very worst in the moments just preceding a major success.

I’ve graphed the results of a few simulations illustrating this: On the X-axis is the number of overall attempts made thus far, and on the Y-axis is the posterior probability of being high-talent. The simulated individual undergoes randomized successes and failures with the probabilities I chose above.

Bayesian_Impostor_full

There are 10 simulations on that one graph, which may make it a bit confusing. So let’s focus in on two runs in particular, which turned out to be run 6 and run 10:

[If you skipped over the math, here’s a good place to come back. Welcome!]

Bayesian_Impostor_focus

Run 6 is a lucky little devil. They had an immediate success, followed by another success in their fourth attempt. As a result, they quickly update their posterior to conclude that they are almost certainly a high-talent individual, and even after a string of failures beyond that they never lose faith.

Run 10, on the other hand, probably has Impostor Syndrome. Failure after failure after failure slowly eroded their self-esteem, leading them to conclude that they are probably a low-talent individual. And then, suddenly, a miracle occurs: On their 20th attempt, at last they succeed, and their whole outlook changes; perhaps they are high-talent after all.

Note that all the simulations are of high-talent individuals. Run 6 and run 10 are equally competent. Ex ante, the probability of success for run 6 and run 10 was exactly the same. Moreover, both individuals are completely rational, in the sense that they are doing perfect Bayesian updating.

And yet, if you compare their self-evaluations after the 19th attempt, they could hardly look more different: Run 6 is 85% sure that they are high-talent, even though they’ve been in a slump for the last 13 attempts. Run 10, on the other hand, is 83% sure that they are low-talent, because they’ve never succeeded at all.

It is darkest just before the dawn: Run 10’s self-evaluation is at its very lowest right before they finally have a success, at which point their self-esteem surges upward, almost to baseline. With just one more success, their opinion of themselves would in fact converge to the same as Run 6’s.

This may explain, at least in part, why Impostor Syndrome is so common. When successes are few and far between—even for the very best and brightest—then a string of failures is the most likely outcome for almost everyone, and it can be difficult to tell whether you are so bright after all. Failure after failure will slowly erode your self-esteem (and should, in some sense; you’re being a good Bayesian!). You’ll observe a few lucky individuals who get their big break right away, and it will only reinforce your fear that you’re not cut out for this (whatever this is) after all.

Of course, this model is far too simple: People don’t just come in “talented” and “untalented” varieties, but have a wide range of skills that lie on a continuum. There are degrees of success and failure as well: You could get published in some obscure field journal hardly anybody reads, or in the top journal in your discipline. You could get into the University of Northwestern Ohio, or into Harvard. And people face different barriers to success that may have nothing to do with talent—perhaps why marginalized people such as women, racial minorities, LGBT people, and people with disabilities tend to have the highest rates of Impostor Syndrome. But I think the overall pattern is right: People feel like impostors when they’ve experienced a long string of failures, even when that is likely to occur for everyone.

What can be done with this information? Well, it leads me to three pieces of advice:

1. When success is rare, find other evidence. If truly “succeeding” (whatever that means in your case) is unlikely on any given attempt, don’t try to evaluate your own competence based on that extremely noisy signal. Instead, look for other sources of data: Do you seem to have the kinds of skills that people who succeed in your endeavors have—preferably based on the most objective measures you can find? Do others who know you or your work have a high opinion of your abilities and your potential? This, perhaps is the greatest mistake we make when falling prey to Impostor Syndrome: We imagine that we have somehow “fooled” people into thinking we are competent, rather than realizing that other people’s opinions of us are actually evidence that we are in fact competent. Use this evidence. Update your posterior on that.

2. Don’t over-update your posterior on failures—and don’t under-update on successes. Very few living humans (if any) are true and proper Bayesians. We use a variety of heuristics when judging probability, most notably the representative and availability heuristics. These will cause you to over-respond to failures, because this string of failures makes you “look like” the kind of person who would continue to fail (representative), and you can’t conjure to mind any clear examples of success (availability). Keeping this in mind, your update upon experiencing failure should be small, probably as small as you can make it. Conversely, when you do actually succeed, even in a small way, don’t dismiss it. Don’t look for reasons why it was just luck—it’s always luck, at least in part, for everyone. Try to update your self-evaluation more when you succeed, precisely because success is rare for everyone.

3. Don’t lose hope. The next one really could be your big break. While astronomically baffling (no, it’s darkest at midnight, in between dusk and dawn!), “it is always darkest before the dawn” really does apply here. You are likely to feel the worst about yourself at the very point where you are about to finally succeed. The lowest self-esteem you ever feel will be just before you finally achieve a major success. Of course, you can’t know if the next one will be it—or if it will take five, or ten, or twenty more tries. And yes, each new failure will hurt a little bit more, make you doubt yourself a little bit more. But if you are properly grounded by what others think of your talents, you can stand firm, until that one glorious day comes and you finally make it.

Now, if I could only manage to take my own advice….

Pascal’s Mugging

Nov 10 JDN 2458798

In the Singularitarian community there is a paradox known as “Pascal’s Mugging”. The name is an intentional reference to Pascal’s Wager (and the link is quite apt, for reasons I’ll discuss in a later post.)

There are a few different versions of the argument; Yudkowsky’s original argument in which he came up with the name “Pascal’s Mugging” relies upon the concept of the universe as a simulation and an understanding of esoteric mathematical notation. So here is a more intuitive version:

A strange man in a dark hood comes up to you on the street. “Give me five dollars,” he says, “or I will destroy an entire planet filled with ten billion innocent people. I cannot prove to you that I have this power, but how much is an innocent life worth to you? Even if it is as little as $5,000, are you really willing to bet on ten trillion to one odds that I am lying?”

Do you give him the five dollars? I suspect that you do not. Indeed, I suspect that you’d be less likely to give him the five dollars than if he had merely said he was homeless and asked for five dollars to help pay for food. (Also, you may have objected that you value innocent lives, even faraway strangers you’ll never meet, at more than $5,000 each—but if that’s the case, you should probably be donating more, because the world’s best charities can save a live for about $3,000.)

But therein lies the paradox: Are you really willing to bet on ten trillion to one odds?

This argument gives me much the same feeling as the Ontological Argument; as Russell said of the latter, “it is much easier to be persuaded that ontological arguments are no good than it is to say exactly what is wrong with them.” It wasn’t until I read this post on GiveWell that I could really formulate the answer clearly enough to explain it.

The apparent force of Pascal’s Mugging comes from the idea of expected utility: Even if the probability of an event is very small, if it has a sufficiently great impact, the expected utility can still be large.

The problem with this argument is that extraordinary claims require extraordinary evidence. If a man held a gun to your head and said he’d shoot you if you didn’t give him five dollars, you’d give him five dollars. This is a plausible claim and he has provided ample evidence. If he were instead wearing a bomb vest (or even just really puffy clothing that could conceal a bomb vest), and he threatened to blow up a building unless you gave him five dollars, you’d probably do the same. This is less plausible (what kind of terrorist only demands five dollars?), but it’s not worth taking the chance.

But when he claims to have a Death Star parked in orbit of some distant planet, primed to make another Alderaan, you are right to be extremely skeptical. And if he claims to be a being from beyond our universe, primed to destroy so many lives that we couldn’t even write the number down with all the atoms in our universe (which was actually Yudkowsky’s original argument), to say that you are extremely skeptical seems a grievous understatement.

That GiveWell post provides a way to make this intuition mathematically precise in terms of Bayesian logic. If you have a normal prior with mean 0 and standard deviation 1, and you are presented with a likelihood with mean X and standard deviation X, what should you make your posterior distribution?

Normal priors are quite convenient; they conjugate nicely. The precision (inverse variance) of the posterior distribution is the sum of the two precisions, and the mean is a weighted average of the two means, weighted by their precision.

So the posterior variance is 1/(1 + 1/X^2).

The posterior mean is 1/(1+1/X^2)*(0) + (1/X^2)/(1+1/X^2)*(X) = X/(X^2+1).

That is, the mean of the posterior distribution is just barely higher than zero—and in fact, it is decreasing in X, if X > 1.

For those who don’t speak Bayesian: If someone says he’s going to have an effect of magnitude X, you should be less likely to believe him the larger that X is. And indeed this is precisely what our intuition said before: If he says he’s going to kill one person, believe him. If he says he’s going to destroy a planet, don’t believe him, unless he provides some really extraordinary evidence.

What sort of extraordinary evidence? To his credit, Yudkowsky imagined the sort of evidence that might actually be convincing:

If a poorly-dressed street person offers to save 10(10^100) lives (googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky.

This post he called “Pascal’s Muggle”, after the term from the Harry Potter series, since some of the solutions that had been proposed for dealing with Pascal’s Mugging had resulted in a situation almost as absurd, in which the mugger could exhibit powers beyond our imagining and yet nevertheless we’d never have sufficient evidence to believe him.

So, let me go on record as saying this: Yes, if someone snaps his fingers and causes the sky to rip open and reveal a silhouette of himself, I’ll do whatever that person says. The odds are still higher that I’m dreaming or hallucinating than that this is really a being from beyond our universe, but if I’m dreaming, it makes no difference, and if someone can make me hallucinate that vividly he can probably cajole the money out of me in other ways. And there might be just enough chance that this could be real that I’m willing to give up that five bucks.

These seem like really strange thought experiments, because they are. But like many good thought experiments, they can provide us with some important insights. In this case, I think they are telling us something about the way human reasoning can fail when faced with impacts beyond our normal experience: We are in danger of both over-estimating and under-estimating their effects, because our brains aren’t equipped to deal with magnitudes and probabilities on that scale. This has made me realize something rather important about both Singularitarianism and religion, but I’ll save that for next week’s post.

Revealed preference: Does the fact that I did it mean I preferred it?

Post 312 Oct 27 JDN 2458784

One of the most basic axioms of neoclassical economics is revealed preference: Because we cannot observe preferences directly, we infer them from actions. Whatever you chose must be what you preferred.

Stated so badly, this is obviously not true: We often make decisions that we later come to regret. We may choose under duress, or confusion; we may lack necessary information. We change our minds.

And there really do seem to be economists who use it in this bald way: From the fact that a particular outcome occurred in a free market, they will infer that it must be optimally efficient. (“Freshwater” economists who are dubious of any intervention into markets seem to be most guilty of this.) In the most extreme form, this account would have us believe that people who trip and fall do so on purpose.

I doubt anyone believes that particular version—but there do seem to be people who believe that unemployment is the result of people voluntarily choosing not to work, and revealed preference has also led economists down some strange paths when trying to explain what sure looks like irrational behavior—such as “rational addiction” theory, positing that someone can absolutely become addicted to alcohol or heroin and end up ruining their life all based on completely rational, forward-thinking decision planning.

The theory can be adapted to deal with these issues, by specifying that it’s only choices made with full information and all of our faculties intact that count as revealing our preferences.

But when are we ever in such circumstances? When do we ever really have all the information we need in order to make a rational decision? Just what constitutes intact faculties? No one is perfectly rational—so how rational must we be in order for our decisions to count as revealing our preferences?

Revealed preference theory also quickly becomes tautologous: Why do we choose to do things? Because we prefer them. What do we prefer? What we choose to do. Without some independent account of what our preferences are, we can’t really predict behavior this way.

A standard counter-argument to this is that revealed preference theory imposes certain constraints of consistency and transitivity, so it is not utterly vacuous. The problem with this answer is that human beings don’t obey those constraints. The Allais Paradox, the Ellsberg Paradox, the sunk cost fallacy. It’s even possible to use these inconsistencies to create “money pumps” that will cause people to systematically give you money; this has been done in experiments. While real-world violations seem to be small, they’re definitely present. So insofar as your theory is testable, it’s false.

The good news is that we really don’t need revealed preference theory. We already have ways of telling what human beings prefer that are considerably richer than simply observing what they choose in various scenarios. One very simple but surprisingly powerful method is to ask. In general, if you ask people what they want and they have no reason to distrust you, they will in fact tell you what they want.

We also have our own introspection, as well as our knowledge about millions of years of evolutionary history that shaped our brains. We don’t expect a lot of people to prefer suffering, for instance (even masochists, who might be said to ‘prefer pain’, seem to be experiencing that pain rather differently than the rest of us would). We generally expect that people will prefer to stay alive rather than die. Some may prefer chocolate, others vanilla; but few prefer motor oil. Our preferences may vary, but they do follow consistent patterns; they are not utterly arbitrary and inscrutable.

There is a deeper problem that any account of human desires must face, however: Sometimes we are actually wrong about our own desires. Affective forecasting, the prediction of our own future mental states, is astonishingly unreliable. People often wildly overestimate the emotional effects of both positive and negative outcomes. (Interestingly, people with depression tend not to do this—those with severe depression often underestimate the emotional effects of positive outcomes, while those with mild depression seem to be some of the most accurate forecasters, an example of the depressive realism effect.)

There may be no simple solution to this problem. Human existence is complicated; we spend large portions of our lives trying to figure out what it is we really want.
This means that we should not simply trust that whatever it is happens is what everyone—or even necessarily anyone—wants to happen. People make mistakes, even large, systematic, repeated mistakes. Sometimes what happens is just bad, and we should be trying to change it. Indeed, sometimes people need to be protected from their own bad decisions.

The backfire effect has been greatly exaggerated

Sep 8 JDN 2458736

Do a search for “backfire effect” and you’re likely to get a large number of results, many of them from quite credible sources. The Oatmeal did an excellent comic on it. The basic notion is simple: “[…]some individuals when confronted with evidence that conflicts with their beliefs come to hold their original position even more strongly.”

The implications of this effect are terrifying: There’s no point in arguing with anyone about anything controversial, because once someone strongly holds a belief there is nothing you can do to ever change it. Beliefs are fixed and unchanging, stalwart cliffs against the petty tides of evidence and logic.

Fortunately, the backfire effect is not actually real—or if it is, it’s quite rare. Over many years those seemingly-ineffectual tides can erode those cliffs down and turn them into sandy beaches.

The most recent studies with larger samples and better statistical analysis suggest that the typical response to receiving evidence contradicting our beliefs is—lo and behold—to change our beliefs toward that evidence.

To be clear, very few people completely revise their worldview in response to a single argument. Instead, they try to make a few small changes and fit them in as best they can.

But would we really expect otherwise? Worldviews are holistic, interconnected systems. You’ve built up your worldview over many years of education, experience, and acculturation. Even when someone presents you with extremely compelling evidence that your view is wrong, you have to weigh that against everything else you have experienced prior to that point. It’s entirely reasonable—rational, even—for you to try to fit the new evidence in with a minimal overall change to your worldview. If it’s possible to make sense of the available evidence with only a small change in your beliefs, it makes perfect sense for you to do that.

What if your whole worldview is wrong? You might have based your view of the world on a religion that turns out not to be true. You might have been raised into a culture with a fundamentally incorrect concept of morality. What if you really do need a radical revision—what then?

Well, that can happen too. People change religions. They abandon their old cultures and adopt new ones. This is not a frequent occurrence, to be sure—but it does happen. It happens, I would posit, when someone has been bombarded with contrary evidence not once, not a few times, but hundreds or thousands of times, until they can no longer sustain the crumbling fortress of their beliefs against the overwhelming onslaught of argument.

I think the reason that the backfire effect feels true to us is that our life experience is largely that “argument doesn’t work”; we think back to all the times that we have tried to convince to change a belief that was important to them, and we can find so few examples of when it actually worked. But this is setting the bar much too high. You shouldn’t expect to change an entire worldview in a single conversation. Even if your worldview is correct and theirs is not, that one conversation can’t have provided sufficient evidence for them to rationally conclude that. One person could always be mistaken. One piece of evidence could always be misleading. Even a direct experience could be a delusion or a foggy memory.

You shouldn’t be trying to turn a Young-Earth Creationist into an evolutionary biologist, or a climate change denier into a Greenpeace member. You should be trying to make that Creationist question whether the Ussher chronology is really so reliable, or if perhaps the Earth might be a bit older than a 17th century theologian interpreted it to be. You should be getting the climate change denier to question whether scientists really have such a greater vested interest in this than oil company lobbyists. You can’t expect to make them tear down the entire wall—just get them to take out one brick today, and then another brick tomorrow, and perhaps another the day after that.

The proverb is of uncertain provenance, variously attributed, rarely verified, but it is still my favorite: No single raindrop feels responsible for the flood.

Do not seek to be a flood. Seek only to be a raindrop—for if we all do, the flood will happen sure enough. (There’s a version more specific to our times: So maybe we’re snowflakes. I believe there is a word for a lot of snowflakes together: Avalanche.)

And remember this also: When you argue in public (which includes social media), you aren’t just arguing for the person you’re directly engaged with; you are also arguing for everyone who is there to listen. Even if you can’t get the person you’re arguing with to concede even a single point, maybe there is someone else reading your post who now thinks a little differently because of something you said. In fact, maybe there are many people who think a little differently—the marginal impact of slacktivism can actually be staggeringly large if the audience is big enough.

This can be frustrating, thankless work, for few people will ever thank you for changing their mind, and many will condemn you even for trying. Finding out you were wrong about a deeply-held belief can be painful and humiliating, and most people will attribute that pain and humiliation to the person who called them out for being wrong—rather than placing the blame where it belongs, which is on whatever source or method made you wrong in the first place. Being wrong feels just like being right.

But this is important work, among the most important work that anyone can do. Philosophy, mathematics, science, technology—all of these things depend upon it. Changing people’s minds by evidence and rational argument is literally the foundation of civilization itself. Every real, enduring increment of progress humanity has ever made depends upon this basic process. Perhaps occasionally we have gotten lucky and made the right choice for the wrong reasons; but without the guiding light of reason, there is nothing to stop us from switching back and making the wrong choice again soon enough.

So I guess what I’m saying is: Don’t give up. Keep arguing. Keep presenting evidence. Don’t be afraid that your arguments will backfire—because in fact they probably won’t.