On foxes and hedgehogs, part II

Aug 3 JDN 2460891

In last week’s post I described Philip E. Tetlock’s experiment showing that “foxes” (people who are open-minded and willing to consider alternative views) make more accurate predictions than “hedgehogs” (people who are dogmatic and conform strictly to a single ideology).

As I explained at the end of the post, he, uh, hedges on this point quite a bit, coming up with various ways that the hedgehogs might be able to redeem themselves, but still concluding that in most circumstances, the foxes seem to be more accurate.

Here are my thoughts on this:

I think he went too easy on the hedgehogs.

I consider myself very much a fox, and I honestly would never assign a probability of 0% or 100% to any physically possible event. Honestly I consider it a flaw in Tetlock’s design that he included those as options but didn’t include probabilities I would assign, like 1%, 0.1%, or 0.01%.

He only let people assign probabilities in 10% increments. So I guess if you thought something was 3% likely, you’re supposed to round to 0%? That still feels terrible. I’d probably still write 10%. There weren’t any questions like “Aliens from the Andromeda Galaxy arrive to conquer our planet, thus rendering all previous political conflicts moot”, but man, had there been, I’d still be tempted to not put 0%. I guess I would put 0% for that though? Because in 99.999999% of cases, I’d get it right—it wouldn’t happen—and I’d get more points. But man, even single-digit percentages? I’d mash the 10% button. I am pretty much allergic to overconfidence.

In fact, I think in my mind I basically try to use a logarithmic score, which, unlike a Brier score, severely (technically, infinitely) punishes you when something you called impossible happens or something you called inevitable doesn’t. Like, really, if you’re doing it right, that should never, ever happen to you. If you assert that something has 0% probability and it happens, you have just conclusively disproven your worldview. (Admittedly it’s possible you could fix it with small changes—but a full discussion of that would get us philosophically too far afield. “Outside the scope of this paper”, as they say.)

So honestly I think he was too lenient on overconfidence by using a Brier score, which does penalize this kind of catastrophic overconfidence, but only by a moderate amount. If you say that something has a 0% chance and then it happens, you get a Brier score of -1. But if you say that something has a 50% chance and then it happens (which it would, you know, 50% of the time), you’d get a Brier score of -0.25. So even absurd overconfidence isn’t really penalized that badly.

Compare this to a logarithmic rule: Say 0% and it happens, and you get negative infinity. You lose. You fail. Go home. Your worldview is bad and you should feel bad. This should never happen to you if you have a coherent worldview (modulo the fact that he didn’t let you say 0.01%).
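
To make the comparison concrete, here’s a quick sketch in Python. I’m writing both scores as penalties, so 0 is perfect and bigger is worse; the -1 and -0.25 above are the same numbers with the sign flipped.

```python
import math

def brier_penalty(p_assigned, happened):
    """Squared-error penalty: 0 is a perfect forecast, 1 is the worst possible."""
    outcome = 1.0 if happened else 0.0
    return (p_assigned - outcome) ** 2

def log_penalty(p_assigned, happened):
    """Logarithmic penalty: infinite if you called the actual outcome impossible."""
    p_of_actual = p_assigned if happened else 1.0 - p_assigned
    return math.inf if p_of_actual == 0.0 else -math.log(p_of_actual)

for p in (0.0, 0.1, 0.5, 0.9):
    print(f"said {p:.0%} and it happened: "
          f"Brier penalty {brier_penalty(p, True):.2f}, "
          f"log penalty {log_penalty(p, True):.2f}")
```

The Brier penalty tops out at 1 no matter how absurdly overconfident you were; the log penalty blows up to infinity the moment you say “impossible” about something that actually happens.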

So if I had designed this experiment, I would have given finer-grained options at the extremes, and then brought the hammer down on anybody who actually asserted a 0% chance of an event that actually occurred. (There’s no need for the finer-grained options elsewhere; over millennia of history, the difference between 0% and 0.1% is whether it won’t happen or it will—quite relevant for, say, full-scale nuclear war—while the difference between 40% and 42.1% is whether it’ll happen every 2 to 3 years or… every 2 to 3 years.)

But okay, let’s say we stick with the Brier score, because infinity is scary.

  1. About the adjustments:
    1. The “value adjustments” are just absolute nonsense. Those would be reasons to adjust your policy response, via your utility function—they are not a reason to adjust your probability. Yes, a nuclear terrorist attack would be a really big deal if it happened and we should definitely be taking steps to prevent that; but that doesn’t change the fact that the probability of one happening is something like 0.1% per year and none have ever happened. Predicting things that don’t happen is bad forecasting, even if the things you are predicting would be very important if they happened.
    2. The “difficulty adjustments” are sort of like applying a different scoring rule, so I’m more okay with those; but even that wasn’t enough to make the hedgehogs look better than the foxes.
    3. The “fuzzy set” adjustments could be legitimate, but only under particular circumstances. Being “almost right” is only valid if you clearly showed that the result was anomalous because of some other unlikely event, and—because the timeframe was clearly specified in the questions—“might still happen” should still get fewer points than accurately predicting that it hasn’t happened yet. Moreover, it was very clear that people only ever applied this sort of change when they got things wrong; they rarely if ever said things like “Oh, wow, I said that would happen and it did, but for completely different reasons that I didn’t expect—I was almost wrong there.” (Crazy example, but if the Soviet Union had been taken over by aliens, “the Soviet Union will fall” would be correct—but I don’t think you could really attribute that to good political prediction.)
  2. The second exercise shows that even the foxes are not great Bayesians, and that some manipulations can make people even more inaccurate than before; but the hedgehogs are affected too, make some of the same crazy mistakes, and still perform worse overall than the foxes, even in that experiment.
  3. I guess he’d call me a “hardline neopositivist”? Because I think that your experiment asking people to predict things should require people to, um, actually predict things? The task was not to get the predictions wrong but to be able to come up with clever excuses for why they were wrong that don’t challenge their worldview. The task was to not get the predictions wrong. Apparently this very basic level of scientific objectivity is now considered “hardline neopositivism”.

I guess we can reasonably acknowledge that making policy is about more than just prediction, and indeed maybe being consistent and decisive is advantageous in a game-theoretic sense (in much the same way that the way to win a game of Chicken is to very visibly throw away your steering wheel). So you could still make a case for why hedgehogs are good decision-makers or good leaders.

But I really don’t see how you weasel out of the fact that hedgehogs are really bad predictors. If I were running a corporation, or a government department, or an intelligence agency, I would want accurate predictions. I would not be interested in clever excuses or rich narratives. Maybe as a leader one must assemble such narratives in order to motivate people; so be it, there’s a division of labor there. Maybe I’d have a separate team of narrative-constructing hedgehogs to help me with PR or something. But the people who are actually analyzing the data should be people who are good at making accurate predictions, full stop.

And in fact, I don’t think hedgehogs are good decision-makers or good leaders. I think they are good politicians. I think they are good at getting people to follow them and believe what they say. But I do not think they are actually good at making the decisions that would be the best for society.

Indeed, I think this is a very serious problem.

I think we systematically elect people to higher office—and hire them for jobs, and approve them for tenure, and so on—because they express confidence rather than competence. We pick the people who believe in themselves the most, who (by regression to the mean if nothing else) are almost certainly the people who are most over-confident in themselves.

Given that confidence is easier to measure than competence in most areas, it might still make sense to choose confident people if confidence were really positively correlated with competence, but I’m not convinced that it is. I think part of what Tetlock is showing us is that the kind of cognitive style that yields high confidence—a hedgehog—simply is not the kind of cognitive style that yields accurate beliefs—a fox. People who are really good at their jobs are constantly questioning themselves, always open to new ideas and new evidence; but that also means that they hedge their bets, say “on the other hand” a lot, and often suffer from Impostor Syndrome. (Honestly, testing someone for Impostor Syndrome might be a better measure of competence than a traditional job interview! Then again, Goodhart’s Law.)

Indeed, I even see this effect within academic science; the best scientists I know are foxes through and through, but they’re never the ones getting published in top journals and invited to give keynote speeches at conferences. The “big names” are always hedgehog blowhards with some pet theory they developed in the 1980s that has failed to replicate but somehow still won’t die.

Moreover, I would guess that trustworthiness is actually pretty strongly inversely correlated to confidence—“con artist” is short for “confidence artist”, after all.

Then again, I tried to find rigorous research comparing openness (roughly speaking “fox-ness”) or humility to honesty, and it was surprisingly hard to find. Actually maybe the latter is just considered an obvious consensus in the literature, because there is a widely-used construct called honesty-humility. (In which case, yeah, my thinking on trustworthiness and confidence is an accepted fact among professional psychologists—but then, why don’t more people know that?)

But that still doesn’t tell me if there is any correlation between honesty-humility and openness.

I did find these studies showing that honesty-humility and openness are both positively correlated with well-being, both positively correlated with cooperation in experimental games, and both positively correlated with being left-wing; but that doesn’t actually prove they are positively correlated with each other. I guess it provides weak evidence in that direction, but only weak evidence. It’s entirely possible for A to be positively correlated with both B and C while B and C are uncorrelated or negatively correlated. (Living in Chicago is positively correlated with being a White Sox fan and positively correlated with being a Cubs fan, but being a White Sox fan is not positively correlated with being a Cubs fan!)
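
If you want to see how that can happen numerically, here’s a toy simulation. It has nothing to do with the actual personality data; the “traits” are just made-up numbers constructed to have exactly this property.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b = rng.normal(size=n)                # trait B
c = -0.5 * b + rng.normal(size=n)     # trait C, built to lean against B
a = b + c                             # trait A is just their sum

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print(f"corr(A, B) = {corr(a, b):+.2f}")   # about +0.45
print(f"corr(A, C) = {corr(a, c):+.2f}")   # about +0.60
print(f"corr(B, C) = {corr(b, c):+.2f}")   # about -0.45
```

A is positively correlated with both B and C, even though B and C are negatively correlated with each other.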

I also found studies showing that higher openness predicts less right-wing authoritarianism and higher honesty predicts less social conformity; but that wasn’t the question either.

Here’s a factor analysis specifically arguing for designing measures of honesty-humility so that they don’t correlate with other personality traits, so it can be seen as its own independent personality trait. There are some uncomfortable degrees of freedom in designing new personality metrics, which may make this sort of thing possible; and then by construction honesty-humility and openness would be uncorrelated, because any shared components were parceled out to one trait or the other.

So, I guess I can’t really confirm my suspicion here; maybe people who think like hedgehogs aren’t any less honest, or are even more honest, than people who think like foxes. But I’d still bet otherwise. My own life experience has been that foxes are honest and humble while hedgehogs are deceitful and arrogant.

Indeed, I believe that in systematically choosing confident hedgehogs as leaders, the world economy loses tens of trillions of dollars a year in inefficiencies. In fact, I think that we could probably end world hunger if we only ever put leaders in charge who were both competent and trustworthy.

Of course, in some sense that’s a pipe dream; we’re never going to get all good leaders, just as we’ll never get zero death or zero crime.

But based on how otherwise-similar countries have taken wildly different trajectories based on differences in leadership, I suspect that even relatively small changes in that direction could have quite large impacts on a society’s outcomes: South Korea isn’t perfect at picking its leaders; but surely it’s better than North Korea, and indeed that seems like one of the primary things that differentiates the two countries. Botswana is not a utopian paradise, but it’s a much nicer place to live than Nigeria, and a lot of the difference seems to come down to who is in charge, or who has been in charge for the last few decades.

And I could put in a jab here about the current state of the United States, but I’ll resist. If you read my blog, you already know my opinions on this matter.

On foxes and hedgehogs, part I

Aug 3 JDN 2460891

Today I finally got around to reading Expert Political Judgment by Philip E. Tetlock, more or less in a single sitting because I’ve been sick the last week with some pretty tight limits on what activities I can do. (It’s mostly been reading, watching TV, or playing video games that don’t require intense focus.)

It’s really an excellent book, and I now both understand why it came so highly recommended to me, and now pass on that recommendation to you: Read it.

The central thesis of the book really boils down to three propositions:

  1. Human beings, even experts, are very bad at predicting political outcomes.
  2. Some people, who use an open-minded strategy (called “foxes”), perform substantially better than other people, who use a more dogmatic strategy (called “hedgehogs”).
  3. When rewarding predictors with money, power, fame, prestige, and status, human beings systematically favor (over)confident “hedgehogs” over (correctly) humble “foxes”.

I decided I didn’t want to make this post about current events, but I think you’ll probably agree with me when I say:

That explains a lot.

How did Tetlock determine this?

Well, he studies the issue several different ways, but the core experiment that drives his account is actually a rather simple one:

  1. He gathered a large group of subject-matter experts: Economists, political scientists, historians, and area-studies professors.
  2. He came up with a large set of questions about politics, economics, and similar topics, which could all be formulated as a set of probabilities: “How likely is this to get better/get worse/stay the same?” (For example, this was in the 1980s, so he asked about the fate of the Soviet Union: “By 1990, will they become democratic, remain as they are, or collapse and fragment?”)
  3. Each respondent answered a subset of the questions, some about their own particular field, some about another, more distant field; they assigned probabilities on an 11-point scale, from 0% to 100% in increments of 10%.
  4. A few years later, he compared the predictions to the actual results, scoring them using a Brier score, which penalizes you for assigning high probability to things that didn’t happen or low probability to things that did happen.
  5. He compared the resulting scores between people with different backgrounds, on different topics, with different thinking styles, and a variety of other variables. He also benchmarked them using some automated algorithms like “always say 33%” and “always give ‘stay the same’ 100%”.

I’ll show you the key results of that analysis momentarily, but to help it make more sense to you, let me elaborate a bit more on the “foxes” and “hedgehogs”. The notion was first popularized by Isaiah Berlin in an essay called, simply, The Hedgehog and the Fox.

“The fox knows many things, but the hedgehog knows one very big thing.”

That is, someone who reasons as a “fox” combines ideas from many different sources and perspectives, and tries to weigh them all together into some sort of synthesis that then yields a final answer. This process is messy and complicated, and rarely yields high confidence about anything.

Whereas, someone who reasons as a “hedgehog” has a comprehensive theory of the world, an ideology, that provides clear answers to almost any possible question, with the surely minor, insubstantial flaw that those answers are not particularly likely to be correct.

He also considered “hedge-foxes” (people who are mostly fox but also a little bit hedgehog) and “fox-hogs” (people who are mostly hedgehog but also a little bit fox).

Tetlock has decomposed the scores into two components: calibration and discrimination. (Both very overloaded words, but they are standard in the literature.)

Calibration is how well your stated probabilities matched up with the actual probabilities; that is, if you predicted 10% probability on 20 different events, you have very good calibration if precisely 2 of those events occurred, and very poor calibration if 18 of those events occurred.

Discrimination more or less describes how useful your predictions are, what information they contain above and beyond the simple base rate. If you just assign equal probability to all events, you probably will have reasonably good calibration, but you’ll have zero discrimination; whereas if you somehow managed to assign 100% to everything that happened and 0% to everything that didn’t, your discrimination would be perfect (and we would have to find out how you cheated, or else declare you clairvoyant).

For both measures, higher is better. The ideal for each is 100%, but it’s virtually impossible to get 100% discrimination and actually not that hard to get 100% calibration if you just use the base rates for everything.
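
If you like seeing these things as formulas, here’s a rough sketch of one standard way to compute them (the reliability and resolution terms of the usual Brier-score decomposition). I believe this is close to what Tetlock does, but treat it as illustrative; note that I express calibration as an error, so lower is better there, whereas the book’s graph flips it so that higher is better for both.

```python
import numpy as np

def calibration_error_and_discrimination(forecasts, outcomes):
    """Group forecasts by stated probability, then compare each group's stated
    probability to its observed frequency (calibration error, lower is better)
    and to the overall base rate (discrimination, higher is better)."""
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    base_rate = outcomes.mean()
    cal_error = discrimination = 0.0
    for p in np.unique(forecasts):
        mask = forecasts == p
        weight = mask.mean()               # share of forecasts in this bin
        observed = outcomes[mask].mean()   # how often those events happened
        cal_error += weight * (p - observed) ** 2
        discrimination += weight * (observed - base_rate) ** 2
    return cal_error, discrimination

# A forecaster who always says the base rate: perfectly calibrated, zero discrimination.
print(calibration_error_and_discrimination([0.3] * 10, [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]))
```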


There is a bit of a tradeoff between these two: It’s not too hard to get reasonably good calibration if you just never go out on a limb, but then your predictions aren’t as useful; we could have mostly just guessed them from the base rates.

On the graph, you’ll see downward-sloping lines that are meant to represent this tradeoff: Two prediction methods that would yield the same overall score but different levels of calibration and discrimination will be on the same line. In a sense, two points on the same line are equally good methods that trade off usefulness against accuracy differently.

All right, let’s see the graph at last:

The pattern is quite clear: The more foxy you are, the better you do, and the more hedgehoggy you are, the worse you do.

I’d also like to point out the other two regions here: “Mindless competition” and “Formal models”.

The former includes really simple algorithms like “always return 33%” or “always give ‘stay the same’ 100%”. These perform shockingly well. The most sophisticated of these, “case-specific extrapolation” (35 and 36 on the graph), which basically assumes that each country will continue doing what it’s been doing, actually performs as well as, if not better than, even the foxes.

And what’s that at the upper-right corner, absolutely dominating the graph? That’s “Formal models”. This describes basically taking all the variables you can find and shoving them into a gigantic logit model, and then outputting the result. It’s computationally intensive and requires a lot of data (hence why he didn’t feel like it deserved to be called “mindless”), but it’s really not very complicated, and it’s the best prediction method, in every way, by far.
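
To give a flavor of what that looks like in practice, here’s the skeleton of such a model in Python. This is purely illustrative, not Tetlock’s actual model; the predictors and data are made up just so the example runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical predictors for each country-year: GDP growth, inflation,
# a protest index, years the current regime has held power, and so on.
X = rng.normal(size=(500, 4))

# Synthetic outcomes coded 0 = "gets better", 1 = "stays the same", 2 = "gets worse",
# generated from a noisy linear rule purely for illustration.
latent = X @ np.array([0.8, -0.5, 1.2, 0.3]) + rng.normal(scale=0.5, size=500)
y = np.digitize(latent, bins=[-0.7, 0.7])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Probabilities for a new case, in the same three-outcome format
# the human forecasters were asked to give.
new_case = rng.normal(size=(1, 4))
print(model.predict_proba(new_case).round(2))
```

That really is most of it: gather the variables, fit the model, read off the probabilities.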

This has made me feel quite vindicated about a weird nerd thing I do: When I have a big decision to make (especially a financial decision), I create a spreadsheet and assemble a linear utility model to determine which choice will maximize my utility, under different parameterizations based on my past experiences. Whichever result seems to win the most robustly, I choose. This is fundamentally similar to the “formal models” prediction method, where the thing I’m trying to predict is my own happiness. (It’s a bit less formal, actually, since I don’t have detailed happiness data to feed into the regression.) And it has worked for me, astonishingly well. It definitely beats going by my own gut. I highly recommend it.
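
In case you want to try the same nerd thing, here’s roughly what my spreadsheet amounts to, translated into a few lines of Python. The options, criteria, weights, and scores are of course hypothetical placeholders.

```python
# Criteria weights: how much I care about each factor (they sum to 1 here,
# but they don't have to). All numbers are hypothetical placeholders.
weights = {"affordability": 0.4, "commute": 0.2, "space": 0.25, "neighborhood": 0.15}

# Each option is scored 0-10 on each criterion, higher is better.
options = {
    "Apartment A": {"affordability": 7, "commute": 3, "space": 5, "neighborhood": 8},
    "Apartment B": {"affordability": 4, "commute": 6, "space": 7, "neighborhood": 6},
    "Apartment C": {"affordability": 9, "commute": 2, "space": 4, "neighborhood": 9},
}

def utility(scores):
    return sum(weights[criterion] * score for criterion, score in scores.items())

# Rank the options; re-run with different weights to see which choice is robust.
for name, scores in sorted(options.items(), key=lambda item: -utility(item[1])):
    print(f"{name}: {utility(scores):.2f}")
```

The “robustness” part is just re-running it under several different weightings and picking the option that wins most often.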

What does this mean?

Well first of all, it means humans suck at predicting things. At least for this data set, even our experts don’t perform substantially better than mindless models like “always assume the base rate”.

Nor do experts perform much better in their own fields than in other fields; though they do all perform better than undergrads or random people (who somehow perform worse than the “mindless” models).

But Tetlock also investigates further, trying to better understand this “fox/hedgehog” distinction and why it yields different performance. He really bends over backwards to try to redeem the hedgehogs, in the following ways:

  1. He allows them to make post-hoc corrections to their scores, based on “value adjustments” (assigning higher probability to events that would be really important) and “difficulty adjustments” (assigning higher scores to questions where the three outcomes were close to equally probable) and “fuzzy sets” (giving some leeway on things that almost happened or things that might still happen later).
  2. He demonstrates a different, related experiment, in which certain manipulations can cause foxes to perform a lot worse than they normally would, and even yield really crazy results like probabilities that add up to 200%.
  3. He has a whole chapter that is a Socratic dialogue (seriously!) between four voices: A “hardline neopositivist”, a “moderate neopositivist”, a “reasonable relativist”, and an “unrelenting relativist”; and all but the “hardline neopositivist” agree that there is some legitimate place for the sort of post hoc corrections that the hedgehogs make to keep themselves from looking so bad.

This post is already getting a bit long, so that will conclude part I. Stay tuned for part II, next week!

How to teach people about vaccines

May 25 JDN 2460821

Vaccines are one of the greatest accomplishments in human history. They have saved hundreds of millions of lives with minimal cost and almost no downside at all. (For everyone who suffers a side effect from a vaccine, I guarantee you: Someone else would have had it much worse from the disease if they hadn’t been vaccinated.)

It’s honestly really astonishing just how much good vaccines have done for humanity.

Thus, it’s a bit of a mystery how there are so many people who oppose vaccines.

But this mystery becomes a little less baffling in light of behavioral economics. People assess the probability of an event mainly based on the availability heuristic: How many examples can they think of when it happened?

Precisely because vaccines have been so effective at preventing disease, we have now reached a point where diseases that were once commonplace are now virtually eradicated. Thus, parents considering whether to vaccinate their children think about whether they know anyone who has gotten sick from that disease, and they can’t think of anyone, so they assume that it’s not a real danger. Then, someone comes along and convinces them (based on utter lies that have been thoroughly debunked) that vaccines cause autism, and they get scared about autism, because they can think of someone they know who has autism.

But of course, the reason that they can’t think of anyone who died from measles or pertussis is because of the vaccines. So I think we need an educational campaign that makes these rates more vivid for people, which plays into the availability heuristic instead of against it.

Here’s my proposal for a little educational game that might help:

It functions quite similarly to a classic tabletop RPG like Dungeons & Dragons, only here the target numbers are based on real figures.


Gather a group of at least 100 people. (Too few, and the odds become small enough that you may get no examples of some diseases.)

Each person needs 3 10-sided dice. Preferably they would be different colors or somehow labeled, because we want one to represent the 100s digit, one the 10s digit, and one the 1s digit. (The numbers you can roll thus range uniformly from 0 to 999.) In TTRPG parlance, this is called a d1000.

Give each person a worksheet that looks like this:

| Disease | Before vaccine: Caught? | Before vaccine: Died? | After vaccine: Caught? | After vaccine: Died? |
|---|---|---|---|---|
| Diphtheria | | | | |
| Measles | | | | |
| Mumps | | | | |
| Pertussis | | | | |
| Polio | | | | |
| Rubella | | | | |
| Smallpox | | | | |
| Tetanus | | | | |
| Hep A | | | | |
| Hep B | | | | |
| Pneumococca | | | | |
| Varicella | | | | |



In the first round, use the figures for before the vaccine. In the second round, use the figures for after the vaccine.

For each disease in each round, there will be a certain roll that people need to get in order to not contract the disease: Roll that number or higher, and you are okay; roll below it, and you catch the disease.


Likewise, there will be a certain roll they need to get to survive if they contract it: Roll that number or higher, and you get sick but survive; roll below it, and you die.

Each time, name a disease, and then tell people what they need to roll to not catch it.

Have them all roll, and if they catch it, check off that box.

Then, for everyone who catches it, have them roll again to see if they survive it. If they die, check that box.

Based on the historical incidences which I have converted into lifetime prevalences, the target numbers are as follows:

| Disease | Before vaccine: Roll to not catch | Before vaccine: Roll to survive | After vaccine: Roll to not catch | After vaccine: Roll to survive |
|---|---|---|---|---|
| Diphtheria | 13 | 87 | 0 | 0 |
| Measles | 244 | 1 | 0 | 0 |
| Mumps | 66 | 0 | 2 | 0 |
| Pertussis | 123 | 20 | 4 | 2 |
| Polio | 20 | 89 | 0 | 0 |
| Rubella | 19 | 11 | 9 | 0 |
| Smallpox | 20 | 12 | 0 | 0 |
| Tetanus | 1 | 800 | 1 | 71 |
| Hep A | 37 | 1 | 4 | 1 |
| Hep B | 22 | 4 | 4 | 4 |
| Pneumococca | 19 | 103 | 11 | 119 |
| Varicella | 950 | 1 | 164 | 0 |

What you should expect to see for a group of 100 is something like this (of course the results are random, so it won’t be this exactly):

| Disease | Before vaccine: Number caught | Before vaccine: Number died | After vaccine: Number caught | After vaccine: Number died |
|---|---|---|---|---|
| Diphtheria | 1 | 0 | 0 | 0 |
| Measles | 24 | 0 | 0 | 0 |
| Mumps | 7 | 0 | 0 | 0 |
| Pertussis | 12 | 1 | 0 | 0 |
| Polio | 2 | 0 | 0 | 0 |
| Rubella | 2 | 0 | 0 | 0 |
| Smallpox | 2 | 0 | 0 | 0 |
| Tetanus | 0 | 0 | 0 | 0 |
| Hep A | 4 | 0 | 0 | 0 |
| Hep B | 2 | 0 | 0 | 0 |
| Pneumococca | 2 | 1 | 1 | 1 |
| Varicella | 95 | 0 | 16 | 0 |
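
If you’d rather sanity-check numbers like these without rounding up 100 people, here’s a minimal simulation sketch of the game in Python. I’ve only filled in two rows of target numbers (my reading of the table above); the rest can be added the same way.

```python
import random

# Target numbers on a d1000 (rolls 0-999): roll at or above the first number to
# avoid catching the disease; if you do catch it, roll at or above the second
# number to survive. Only two illustrative rows are filled in here, read off the
# table above; add the others the same way.
TARGETS = {
    #              (catch, die) before    (catch, die) after
    "Pertussis": {"before": (123, 20), "after": (4, 2)},
    "Varicella": {"before": (950, 1),  "after": (164, 0)},
}

def play_round(era, group_size=100):
    results = {}
    for disease, eras in TARGETS.items():
        catch_target, die_target = eras[era]
        caught = died = 0
        for _ in range(group_size):
            if random.randrange(1000) < catch_target:      # failed the "don't catch" roll
                caught += 1
                if random.randrange(1000) < die_target:    # failed the "survive" roll
                    died += 1
        results[disease] = (caught, died)
    return results

print("before:", play_round("before"))
print("after: ", play_round("after"))
```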

You’ll find that not a lot of people have checked those “dead” boxes either before or after the vaccine. So if you just look at death rates, the difference may not seem that stark.

(Of course, over a world as big as ours, it adds up: The difference between the 0.25% death rate of pertussis before the vaccine and 0% today is 20 million people—roughly the number of people who live in the New York City metro area.)

But I think people will notice that a lot more people got sick in the “before-vaccine” world than the “after-vaccine” world. Moreover, those that did get sick will find themselves rolling the dice on dying; they’ll probably be fine, but you never know for sure.

Make sure people also notice that (except for pneumococca), if you do get sick, the roll you need to survive is a lot higher without the vaccine. (If anyone does get unlucky enough to get tetanus in the first round, they’re probably gonna die!)

If anyone brings up autism, you can add an extra round where you roll for that too.

The supposedly “epidemic” prevalence of autism today is… 3.2%.

(Honestly I expected higher than that, but then, I hang around with a lot of queer and neurodivergent people. (So the availability heuristic got me too!))

Thus, what’s the roll to not get autism? 32.

Even with the expansive diagnostic criteria that include a lot of borderline cases like yours truly, you still only need to roll 32 on this d1000 to not get autism.

This means that only about 3 people in your group of 100 should end up getting autism, most likely fewer than the number who were saved from getting measles, mumps, and rubella by the vaccine, comparable to the number saved from getting most of the other diseases—and almost certainly fewer than the number saved from getting varicella.

So even if someone remains absolutely convinced that vaccines cause autism, you can now point out that vaccines also clearly save billions of people from getting sick and millions from dying.

Also, there are different kinds of autism. Some forms might not even be considered a disability if society were more accommodating; others are severely debilitating.

Recently clinicians have started to categorize “profound autism”, the kind that is severely debilitating. This constitutes about 25% of children with autism—but it’s a falling percentage over time, because broader diagnostic criteria are including more people as autistic, but not changing the number who are severely debilitated. (It is controversial exactly what should constitute “profound autism”, but I do think the construct is useful; there’s a big difference between someone like me who can basically function normally with some simple accommodations, and someone who never even learns to talk.)

So you can have the group do another roll, specifically for profound autism; that target number is now only 8.

There’s also one more demonstration you can do.

Aggregating over all these diseases, we can find the overall chance of dying from any of these diseases before and after the vaccine.

Have everyone roll for that, too:

Before the vaccines, the target number is 8. Afterward, it is 1.

If autism was brought up, make that comparison explicit.

Even if 100% of autism cases were caused by vaccines (which, I really must say, is ridiculous, as there’s no credible evidence that vaccines cause autism at all), that would still mean the following:

You are trading off a 32 in 1000 chance of your child being autistic and an 8 in 1000 chance of your child being profoundly autistic, against a 7 in 1000 chance of your child dying.

If someone is still skeptical of vaccines at this point, you should ask them point-blank:

Do you really think that being autistic is one-fifth as bad as dying?

Do you really think that being profoundly autistic is as bad as dying?

Bundling the stakes to recalibrate ourselves

Mar 31 JDN 2460402

In a previous post I reflected on how our minds evolved for an environment of immediate return: An immediate threat with high chance of success and life-or-death stakes. But the world we live in is one of delayed return: delayed consequences with low chance of success and minimal stakes.

We evolved for a world where you need to either jump that ravine right now or you’ll die; but we live in a world where you’ll submit a hundred job applications before finally getting a good offer.

Thus, our anxiety system is miscalibrated for our modern world, and this miscalibration causes us to have deep, chronic anxiety which is pathological, instead of brief, intense anxiety that would protect us from harm.

I had an idea for how we might try to jury-rig this system and recalibrate ourselves:

Bundle the stakes.

Consider job applications.

The obvious way to think about it is to consider each application, and decide whether it’s worth the effort.

Any particular job application in today’s market probably costs you 30 minutes, but you won’t hear back for 2 weeks, and you have maybe a 2% chance of success. But if you fail, all you lost was that 30 minutes. This is the exact opposite of what our brains evolved to handle.

So now suppose if you think of it in terms of sending 100 job applications.

That will cost you 30 minutes times 100 = 3,000 minutes = 50 hours. You still won’t hear back for weeks, but you’ve spent weeks, so that won’t feel as strange. And your chances of success after 100 applications are something like 1-(0.98)^100 = 87%.

Even losing 50 hours over a few weeks is not the disaster that falling down a ravine is. But it still feels a lot more reasonable to be anxious about that than to be anxious about losing 30 minutes.

More importantly, we have radically changed the chances of success.

Each individual application will almost certainly fail, but all 100 together will probably succeed.
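
The arithmetic is easy to play with yourself; a couple of lines suffice (the 2% per-application success rate is just the illustrative figure from above):

```python
p_single = 0.02   # illustrative 2% chance that any one application succeeds

for n in (1, 10, 50, 100, 150, 200):
    p_at_least_one = 1 - (1 - p_single) ** n
    print(f"{n:3d} applications: {p_at_least_one:.0%} chance of at least one offer")
```

One application sits at 2%; a hundred sit at 87%; two hundred, at 98%.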

If we were optimally rational, these two methods would lead to the same outcomes, by a rather deep mathematical law, the linearity of expectation:
E[X_1 + X_2 + … + X_n] = n E[X]

Thus, the expected utility of doing something n times is precisely n times the expected utility of doing it once (all other things equal); and so, it doesn’t matter which way you look at it.

But of course we aren’t perfectly rational. We don’t actually respond to the expected utility. It’s still not entirely clear how we do assess probability in our minds (prospect theory seems to be onto something, but it’s computationally harder than rational probability, which means it makes absolutely no sense to evolve it).

If instead we are trying to match up our decisions with a much simpler heuristic that evolved for things like jumping over ravines, our representation of probability may be very simple indeed, something like “definitely”, “probably”, “maybe”, “probably not”, “definitely not”. (This is essentially my categorical prospect theory, which, like the stochastic overload model, is a half-baked theory that I haven’t published and at this point probably never will.)

2% chance of success is solidly “probably not” (or maybe something even stronger, like “almost definitely not”). Then, outcomes that are in that category are presumably weighted pretty low, because they generally don’t happen. Unless they are really good or really bad, it’s probably safest to ignore them—and in this case, they are neither.

But 87% chance of success is a clear “probably”; and outcomes in that category deserve our attention, even if their stakes aren’t especially high. And in fact, by bundling them, we have even made the stakes a bit higher—likely making the outcome a bit more salient.

The goal is to change “this will never work” to “this is going to work”.

For an individual application, there’s really no way to do that (without self-delusion); maybe you can make the odds a little better than 2%, but you surely can’t make them so high they deserve to go all the way up to “probably”. (At best you might manage a “maybe”, if you’ve got the right contacts or something.)

But for the whole set of 100 applications, this is in fact the correct assessment. It will probably work. And if 100 doesn’t, 150 might; if 150 doesn’t, 200 might. At no point do you need to delude yourself into over-estimating the odds, because the actual odds are in your favor.

This isn’t perfect, though.

There’s a glaring problem with this technique that I still can’t resolve: It feels overwhelming.

Doing one job application is really not that big a deal. It accomplishes very little, but also costs very little.

Doing 100 job applications is an enormous undertaking that will take up most of your time for multiple weeks.

So if you are feeling demotivated, asking you to bundle the stakes is asking you to take on a huge, overwhelming task that surely feels utterly beyond you.

Also, when it comes to this particular example, I even managed to do 100 job applications and still get a pretty bad outcome: My only offer was Edinburgh, and I ended up being miserable there. I have reason to believe that these were exceptional circumstances (due to COVID), but it has still been hard to shake the feeling of helplessness I learned from that ordeal.

Maybe there’s some additional reframing that can help here. If so, I haven’t found it yet.

But maybe stakes bundling can help you, or someone out there, even if it can’t help me.

Statisticacy

Jun 11 JDN 2460107

I wasn’t able to find a dictionary that includes the word “statisticacy”, but it doesn’t trigger my spell-check, and it does seem to have the same form as “numeracy”: numeric, numerical, numeracy, numerate; statistic, statistical, statisticacy, statisticate. It definitely still sounds very odd to my ears. Perhaps repetition will eventually make it familiar.

For the concept is clearly a very important one. Literacy and numeracy are no longer a serious problem in the First World; basically every adult at this point knows how to read and do addition. Even worldwide, 90% of men and 83% of women can read, at least at a basic level—which is an astonishing feat of our civilization by the way, well worthy of celebration.

But I have noticed a disturbing lack of, well, statisticacy. Even intelligent, educated people seem… pretty bad at understanding statistics.

I’m not talking about sophisticated econometrics here; of course most people don’t know that, and don’t need to. (Most economists don’t know that!) I mean quite basic statistical knowledge.

A few years ago I wrote a post called “Statistics you should have been taught in high school, but probably weren’t”; that’s the kind of stuff I’m talking about.

As part of being a good citizen in a modern society, every adult should understand the following:

1. The difference between a mean and a median, and why average income (mean) can increase even though most people are no richer (median).

2. The difference between increasing by X% and increasing by X percentage points: If inflation goes from 4% to 5%, that is an increase of 25% ((5/4-1)*100%), but only 1 percentage point (5%-4%). (There’s a short worked example after this list.)

3. The meaning of standard error, and how to interpret error bars on a graph—and why it’s a huge red flag if there aren’t any error bars on a graph.

4. Basic probabilistic reasoning: Given some scratch paper, a pen, and a calculator, everyone should be able to work out the odds of drawing a given blackjack hand, or rolling a particular number on a pair of dice. (If that’s too easy, make it a poker hand and four dice. But mostly that’s just more calculation effort, not fundamentally different.)

5. The meaning of exponential growth rates, and how they apply to economic growth and compound interest. (The difference between 3% interest and 6% interest over 30 years is more than double the total amount paid.)
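
Items 2 and 5 are the easiest to check for yourself; here’s a tiny worked example (the principal and rates are arbitrary round numbers):

```python
# Item 2: percent versus percentage points
old_rate, new_rate = 0.04, 0.05
print(f"{new_rate / old_rate - 1:.0%} increase, "
      f"{(new_rate - old_rate) * 100:.0f} percentage point")

# Item 5: compound growth over 30 years
principal = 10_000
for rate in (0.03, 0.06):
    print(f"{rate:.0%} for 30 years: {principal * (1 + rate) ** 30:,.0f}")
```

That first line prints “25% increase, 1 percentage point”; the second pair shows $10,000 growing to about $24,000 at 3% but about $57,000 at 6%.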

I see people making errors about this sort of thing all the time.

Economic news that celebrates rising GDP but wonders why people aren’t happier (when real median income has been falling since 2019 and is only 7% higher than it was in 1999, an annual growth rate of 0.2%).

Reports on inflation, interest rates, or poll numbers that don’t clearly specify whether they are dealing with percentages or percentage points. (XKCD made fun of this.)

Speaking of poll numbers, any reporting on changes in polls where the change isn’t at least twice the margin of error of the polls in question. (There’s also a comic for this; this time it’s PhD Comics.)

People misunderstanding interest rates and gravely underestimating how much they’ll pay for their debt (then again, this is probably the result of strategic choices on the part of banks—so maybe the real failure is regulatory).

And, perhaps worst of all, the plague of science news articles about “New study says X”. Things causing and/or curing cancer, things correlated with personality types, tiny psychological nudges that supposedly have profound effects on behavior.

Some of these things will even turn out to be true; actually I think this one on fibromyalgia, this one on smoking, and this one on body image are probably accurate. But even if it’s a properly randomized experiment—and especially if it’s just a regression analysis—a single study ultimately tells us very little, and it’s irresponsible to report on them instead of telling people the extensive body of established scientific knowledge that most people still aren’t aware of.

Basically any time an article is published saying “New study says X”, a statisticate person should ignore it and treat it as random noise. This is especially true if the finding seems weird or shocking; such findings are far more likely to be random flukes than genuine discoveries. Yes, they could be true, but one study just doesn’t move the needle that much.

I don’t remember where it came from, but there is a saying about this: “What is in the textbooks is 90% true. What is in the published literature is 50% true. What is in the press releases is 90% false.” These figures are approximately correct.

If their goal is to advance public knowledge of science, science journalists would accomplish a lot more if they just opened to a random page in a mainstream science textbook and started reading it on air. Admittedly, I can see how that would be less interesting to watch; but then, their job should be to find a way to make it interesting, not to take individual studies out of context and hype them up far beyond what they deserve. (Bill Nye did this much better than most science journalists.)

I’m not sure how much to blame people for lacking this knowledge. On the one hand, they could easily look it up on Wikipedia, and apparently choose not to. On the other hand, they probably don’t even realize how important it is, and were never properly taught it in school even though they should have been. Many of these things may even be unknown unknowns; people simply don’t realize how poorly they understand. Maybe the most useful thing we could do right now is simply point out to people that these things are important, and if they don’t understand them, they should get on that Wikipedia binge as soon as possible.

And one last thing: Maybe this is asking too much, but I think that a truly statisticate person should be able to solve the Monty Hall Problem and not be confused by the result. (Hint: It’s very important that Monty Hall knows which door the car is behind, and would never open that one. If he’s guessing at random and simply happens to pick a goat, the correct answer is 1/2, not 2/3. Then again, it’s never a bad choice to switch.)
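
And if the argument still doesn’t feel convincing, simulation settles it. Here’s a short sketch that runs both versions: the standard one where Monty knows where the car is, and the “Monty Fall” variant where he just happens not to reveal it.

```python
import random

def switch_win_rate(monty_knows, n_trials=100_000):
    wins_if_switch = valid = 0
    for _ in range(n_trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        if monty_knows:
            # Monty deliberately opens a goat door that isn't yours.
            opened = next(d for d in range(3) if d != choice and d != car)
        else:
            # "Monty Fall": he opens another door at random and might reveal the car.
            opened = random.choice([d for d in range(3) if d != choice])
            if opened == car:
                continue   # only count the trials where he happened to show a goat
        valid += 1
        switched = next(d for d in range(3) if d != choice and d != opened)
        wins_if_switch += (switched == car)
    return wins_if_switch / valid

print(f"Monty knows:    switching wins {switch_win_rate(True):.3f}")   # about 0.667
print(f"Monty guessing: switching wins {switch_win_rate(False):.3f}")  # about 0.500
```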

When maximizing utility doesn’t

Jun 4 JDN 2460100

Expected utility theory behaves quite strangely when you consider questions involving mortality.

Nick Beckstead and Teruji Thomas recently published a paper on this: All well-defined utility functions are either reckless in that they make you take crazy risks, or timid in that they tell you not to take even very small risks. It’s starting to make me wonder if utility theory is even the right way to make decisions after all.

Consider a game of Russian roulette where the prize is $1 million. The revolver has 6 chambers, 3 with a bullet. So that’s a 1/2 chance of $1 million, and a 1/2 chance of dying. Should you play?

I think it’s probably a bad idea to play. But the prize does matter; if it were $100 million, or $1 billion, maybe you should play after all. And if it were $10,000, you clearly shouldn’t.

And lest you think that there is no chance of dying you should be willing to accept for any amount of money, consider this: Do you drive a car? Do you cross the street? Do you do anything that could ever have any risk of shortening your lifespan in exchange for some other gain? I don’t see how you could live a remotely normal life without doing so. It might be a very small risk, but it’s still there.

This raises the question: Suppose we have some utility function over wealth; ln(x) is a quite plausible one. What utility should we assign to dying?


The fact that the prize matters means that we can’t assign death a utility of negative infinity. It must be some finite value.

But suppose we choose some finite value, -V (so V is positive), for the utility of dying. Then we can find some amount of money that will make you willing to play: Setting your current utility to zero, the gamble breaks even when (1/2) ln(x) + (1/2)(-V) = 0, i.e., when ln(x) = V, so x = e^(V).

Now, suppose that you have the chance to play this game over and over again. Your marginal utility of wealth will change each time you win, so we may need to increase the prize to keep you playing; but we could do that. The prizes could keep scaling up as needed to make you willing to play. So then, you will keep playing, over and over—and then, sooner or later, you’ll die. So, at each step you maximized utility—but at the end, you didn’t get any utility.
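
The “sooner or later” part is just the arithmetic of repeated coin flips: even if each individual round looks worth taking, the chance of surviving the whole sequence collapses toward zero.

```python
# Each round is survived with probability 1/2, independently of the others.
for n in (1, 2, 5, 10, 20, 30):
    print(f"probability of surviving {n:2d} rounds: {0.5 ** n:.10f}")
```

By round 30 your survival probability is about one in a billion, no matter how cleverly the prizes were scaled.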

Well, at that point your heirs will be rich, right? So maybe you’re actually okay with that. Maybe there is some amount of money ($1 billion?) that you’d be willing to die in order to ensure your heirs have.

But what if you don’t have any heirs? Or, what if we consider making such a decision as a civilization? What if death means not only the destruction of you, but also the destruction of everything you care about?

As a civilization, are there choices before us that would result in some chance of a glorious, wonderful future, but also some chance of total annihilation? I think it’s pretty clear that there are. Nuclear technology, biotechnology, artificial intelligence. For about the last century, humanity has been at a unique epoch: We are being forced to make this kind of existential decision, to face this kind of existential risk.

It’s not that we were immune to being wiped out before; an asteroid could have taken us out at any time (as happened to the dinosaurs), and a volcanic eruption nearly did. But this is the first time in humanity’s existence that we have had the power to destroy ourselves. This is the first time we have a decision to make about it.

One possible answer would be to say we should never be willing to take any kind of existential risk. Unlike the case of an individual, when we are speaking about an entire civilization, it no longer seems obvious that we shouldn’t set the utility of death at negative infinity. But if we really did this, it would require shutting down whole industries—definitely halting all research in AI and biotechnology, probably disarming all nuclear weapons and destroying all their blueprints, and quite possibly even shutting down the coal and oil industries. It would be an utterly radical change, and it would require bearing great costs.

On the other hand, if we should decide that it is sometimes worth the risk, we will need to know when it is worth the risk. We currently don’t know that.

Even worse, we will need some mechanism for ensuring that we don’t take the risk when it isn’t worth it. And we have nothing like such a mechanism. In fact, most of our process of research in AI and biotechnology is widely dispersed, with no central governing authority and regulations that are inconsistent between countries. I think it’s quite apparent that right now, there are research projects going on somewhere in the world that aren’t worth the existential risk they pose for humanity—but the people doing them are convinced that they are worth it because they so greatly advance their national interest—or simply because they could be so very profitable.

In other words, humanity finally has the power to make a decision about our survival, and we’re not doing it. We aren’t making a decision at all. We’re letting that responsibility fall upon more or less randomly-chosen individuals in government and corporate labs around the world. We may be careening toward an abyss, and we don’t even know who has the steering wheel.

Terrible but not likely, likely but not terrible

May 17 JDN 2458985

The human brain is a remarkably awkward machine. It’s really quite bad at organizing data, relying on associations rather than formal categories.

It is particularly bad at negation. For instance, if I tell you that right now, no matter what, you must not think about a yellow submarine, the first thing you will do is think about a yellow submarine. (You may even get the Beatles song stuck in your head, especially now that I’ve mentioned it.) A computer would never make such a grievous error.

The human brain is also quite bad at separation. Daniel Dennett coined a word “deepity” for a particular kind of deep-sounding but ultimately trivial aphorism that seems to be quite common, which relies upon this feature of the brain. A deepity has at least two possible readings: On one reading, it is true, but utterly trivial. On another, it would be profound if true, but it simply isn’t true. But if you experience both at once, your brain is triggered for both “true” and “profound” and yields “profound truth”. The example he likes to use is “Love is just a word”. Well, yes, “love” is in fact just a word, but who cares? Yeah, words are words. But love, the underlying concept it describes, is not just a word—though if it were that would change a lot.

One thing I’ve come to realize about my own anxiety is that it involves a wide variety of different scenarios I imagine in my mind, and broadly speaking these can be sorted into two categories: Those that are likely but not terrible, and those that are terrible but not likely.

In the former category we have things like taking an extra year to finish my dissertation; the mean time to completion for a PhD is over 8 years, so finishing in 6 instead of 5 can hardly be considered catastrophic.

In the latter category we have things like dying from COVID-19. Yes, I’m a male with type A blood and asthma living in a high-risk county; but I’m also a young, healthy nonsmoker living under lockdown. Even without knowing the true fatality rate of the virus, my chances of actually dying from it are surely less than 1%.

But when both of those scenarios are running through my brain at the same time, the first triggers a reaction for “likely” and the second triggers a reaction for “terrible”, and I get this feeling that something terrible is actually likely to happen. And indeed if my probability of dying were as high as my probability of needing a 6th year to finish my PhD, that would be catastrophic.

I suppose it’s a bit strange that the opposite doesn’t happen: I never seem to get the improbability of dying attached to the mildness of needing an extra year. The confusion never seems to trigger “neither terrible nor likely”. Or perhaps it does, and my brain immediately disregards that as not worthy of consideration? It makes a certain sort of sense: An event that is neither probable nor severe doesn’t seem to merit much anxiety.

I suspect that many other people’s brains work the same way, eliding distinctions between different outcomes and ending up with a sort of maximal product of probability and severity.

The solution to this is not an easy one: It requires deliberate effort and extensive practice, and benefits greatly from formal training by a therapist. Counter-intuitively, you need to actually focus more on the scenarios that cause you anxiety, and accept the anxiety that such focus triggers in you. I find that it helps to actually write down the details of each scenario as vividly as possible, and review what I have written later. After doing this enough times, you can build up a greater separation in your mind, and more clearly categorize—this one is likely but not terrible, that one is terrible but not likely. It isn’t a cure, but it definitely helps me a great deal. Perhaps it could help you.

Pascal’s Mugging

Nov 10 JDN 2458798

In the Singularitarian community there is a paradox known as “Pascal’s Mugging”. The name is an intentional reference to Pascal’s Wager (and the link is quite apt, for reasons I’ll discuss in a later post.)

There are a few different versions of the argument; Yudkowsky’s original argument in which he came up with the name “Pascal’s Mugging” relies upon the concept of the universe as a simulation and an understanding of esoteric mathematical notation. So here is a more intuitive version:

A strange man in a dark hood comes up to you on the street. “Give me five dollars,” he says, “or I will destroy an entire planet filled with ten billion innocent people. I cannot prove to you that I have this power, but how much is an innocent life worth to you? Even if it is as little as $5,000, are you really willing to bet on ten trillion to one odds that I am lying?”

Do you give him the five dollars? I suspect that you do not. Indeed, I suspect that you’d be less likely to give him the five dollars than if he had merely said he was homeless and asked for five dollars to help pay for food. (Also, you may have objected that you value innocent lives, even faraway strangers you’ll never meet, at more than $5,000 each—but if that’s the case, you should probably be donating more, because the world’s best charities can save a life for about $3,000.)

But therein lies the paradox: Are you really willing to bet on ten trillion to one odds?

This argument gives me much the same feeling as the Ontological Argument; as Russell said of the latter, “it is much easier to be persuaded that ontological arguments are no good than it is to say exactly what is wrong with them.” It wasn’t until I read this post on GiveWell that I could really formulate the answer clearly enough to explain it.

The apparent force of Pascal’s Mugging comes from the idea of expected utility: Even if the probability of an event is very small, if it has a sufficiently great impact, the expected utility can still be large.

The problem with this argument is that extraordinary claims require extraordinary evidence. If a man held a gun to your head and said he’d shoot you if you didn’t give him five dollars, you’d give him five dollars. This is a plausible claim and he has provided ample evidence. If he were instead wearing a bomb vest (or even just really puffy clothing that could conceal a bomb vest), and he threatened to blow up a building unless you gave him five dollars, you’d probably do the same. This is less plausible (what kind of terrorist only demands five dollars?), but it’s not worth taking the chance.

But when he claims to have a Death Star parked in orbit of some distant planet, primed to make another Alderaan, you are right to be extremely skeptical. And if he claims to be a being from beyond our universe, primed to destroy so many lives that we couldn’t even write the number down with all the atoms in our universe (which was actually Yudkowsky’s original argument), to say that you are extremely skeptical seems a grievous understatement.

That GiveWell post provides a way to make this intuition mathematically precise in terms of Bayesian logic. If you have a normal prior with mean 0 and standard deviation 1, and you are presented with a likelihood with mean X and standard deviation X, what should you make your posterior distribution?

Normal priors are quite convenient; they conjugate nicely. The precision (inverse variance) of the posterior distribution is the sum of the two precisions, and the mean is a weighted average of the two means, weighted by their precision.

So the posterior variance is 1/(1 + 1/X^2).

The posterior mean is 1/(1+1/X^2)*(0) + (1/X^2)/(1+1/X^2)*(X) = X/(X^2+1).

That is, the mean of the posterior distribution is just barely higher than zero—and in fact, it is decreasing in X, if X > 1.
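
Here’s that calculation in a few lines, so you can watch the mugger’s claimed impact X grow while the rational estimate shrinks (X is in units of the prior’s standard deviation):

```python
def posterior_mean(X):
    """Normal(0, 1) prior, Normal(X, X) likelihood: precision-weighted average."""
    prior_precision = 1.0
    likelihood_precision = 1.0 / X**2
    return (likelihood_precision * X) / (prior_precision + likelihood_precision)

for X in (0.5, 1, 2, 10, 1_000, 10**12):
    print(f"claimed effect {X:g}: posterior mean {posterior_mean(X):.12f}")
```

Past X = 1, the bigger the claim, the smaller the posterior mean: the estimate for a claimed effect of a trillion is about a trillionth.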

For those who don’t speak Bayesian: If someone says he’s going to have an effect of magnitude X, you should be less likely to believe him the larger that X is. And indeed this is precisely what our intuition said before: If he says he’s going to kill one person, believe him. If he says he’s going to destroy a planet, don’t believe him, unless he provides some really extraordinary evidence.

What sort of extraordinary evidence? To his credit, Yudkowsky imagined the sort of evidence that might actually be convincing:

If a poorly-dressed street person offers to save 10^(10^100) lives (googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky.

This post he called “Pascal’s Muggle”, after the term from the Harry Potter series, since some of the solutions that had been proposed for dealing with Pascal’s Mugging had resulted in a situation almost as absurd, in which the mugger could exhibit powers beyond our imagining and yet nevertheless we’d never have sufficient evidence to believe him.

So, let me go on record as saying this: Yes, if someone snaps his fingers and causes the sky to rip open and reveal a silhouette of himself, I’ll do whatever that person says. The odds are still higher that I’m dreaming or hallucinating than that this is really a being from beyond our universe, but if I’m dreaming, it makes no difference, and if someone can make me hallucinate that vividly he can probably cajole the money out of me in other ways. And there might be just enough chance that this could be real that I’m willing to give up that five bucks.

These seem like really strange thought experiments, because they are. But like many good thought experiments, they can provide us with some important insights. In this case, I think they are telling us something about the way human reasoning can fail when faced with impacts beyond our normal experience: We are in danger of both over-estimating and under-estimating their effects, because our brains aren’t equipped to deal with magnitudes and probabilities on that scale. This has made me realize something rather important about both Singularitarianism and religion, but I’ll save that for next week’s post.

Why are humans so bad with probability?

Apr 29 JDN 2458238

In previous posts on deviations from expected utility and cumulative prospect theory, I’ve detailed some of the myriad ways in which human beings deviate from optimal rational behavior when it comes to probability.

This post is going to be a bit different: Yes, we behave irrationally when it comes to probability. Why?

Why aren’t we optimal expected utility maximizers?

This question is not as simple as it sounds. Some of the ways that human beings deviate from neoclassical behavior arise simply because neoclassical theory requires levels of knowledge and intelligence far beyond what human beings are capable of; basically anything requiring “perfect information” qualifies, as does any game theory prediction that involves solving extensive-form games with infinite strategy spaces by backward induction. (Don’t feel bad if you have no idea what that means; that’s kind of my point. Solving infinite extensive-form games by backward induction is an unsolved problem in game theory; just this past week I saw a new paper presented that offered a partial potential solution, and yet we expect people to do it optimally every time?)

I’m also not going to include questions of fundamental uncertainty, like “Will Apple stock rise or fall tomorrow?” or “Will the US go to war with North Korea in the next ten years?” where it isn’t even clear how we would assign a probability. (Though I will get back to them, for reasons that will become clear.)

No, let’s just look at the absolute simplest cases, where the probabilities are all well-defined and completely transparent: Lotteries and casino games. Why are we so bad at that?

Lotteries are not a computationally complex problem. You figure out how much the prize is worth to you, multiply it by the probability of winning—which is clearly spelled out for you—and compare that to how much the ticket price is worth to you. The most challenging part lies in specifying your marginal utility of wealth—the “how much it’s worth to you” part—but that’s something you basically had to do anyway, to make any kind of trade-offs on how to spend your time and money. Maybe you didn’t need to compute it quite so precisely over that particular range of parameters, but you need at least some idea how much $1 versus $10,000 is worth to you in order to get by in a market economy.

Casino games are a bit more complicated, but not much, and most of the work has been done for you; you can look on the Internet and find tables of probability calculations for poker, blackjack, roulette, craps and more. Memorizing all those probabilities might take some doing, but human memory is astonishingly capacious, and part of being an expert card player, especially in blackjack, seems to involve memorizing a lot of those probabilities.

Furthermore, by any plausible expected utility calculation, lotteries and casino games are a bad deal. Unless you’re an expert poker player or blackjack card-counter, your expected income from playing at a casino is always negative—and the casino set it up that way on purpose.
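Just to put a number on it, here is the arithmetic for one of the simplest cases, a straight-up bet in single-zero roulette; the rules are standard, but treat this as an illustration rather than a survey of every casino game.

```python
# Expected value of a $1 straight-up bet in European (single-zero) roulette:
# 37 pockets, and a winning number pays 35-to-1.
p_win = 1 / 37
net_if_win = 35      # keep your $1 and gain $35
net_if_lose = -1     # lose the $1

expected_value = p_win * net_if_win + (1 - p_win) * net_if_lose
print(f"Expected value per $1 bet: {expected_value:.4f} dollars")  # about -0.027, i.e. -2.7%
```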

Why, then, can lotteries and casinos stay in business? Why are we so bad at such a simple problem?

Clearly we are using some sort of heuristic judgment in order to save computing power, and the people who make lotteries and casinos have designed formal models that can exploit those heuristics to pump money from us. (Shame on them, really; I don’t fully understand why this sort of thing is legal.)

In another previous post I proposed what I call “categorical prospect theory”, which I think is a decently accurate description of the heuristics people use when assessing probability (though I’ve not yet had the chance to test it experimentally).

But why use this particular heuristic? Indeed, why use a heuristic at all for such a simple problem?

I think it’s helpful to keep in mind that these simple problems are weird; they are absolutely not the sort of thing a tribe of hunter-gatherers is likely to encounter on the savannah. It doesn’t make sense for our brains to be optimized to solve poker or roulette.

The sort of problems that our ancestors encountered—indeed, the sort of problems that we encounter, most of the time—were not problems of calculable risk; they were problems of fundamental uncertainty. And they were frequently matters of life or death (which is why we’d expect our handling of them to be highly optimized by evolution): “Was that sound a lion, or just the wind?” “Is this mushroom safe to eat?” “Is that meat spoiled?”

In fact, many of the uncertainties most important to our ancestors are still important today: “Will these new strangers be friendly, or dangerous?” “Is that person attracted to me, or am I just projecting my own feelings?” “Can I trust you to keep your promise?” These sorts of social uncertainties are even deeper; it’s not clear that any finite being could ever totally resolve its uncertainty surrounding the behavior of other beings with the same level of intelligence, as the cognitive arms race continues indefinitely. The better I understand you, the better you understand me—and if you’re trying to deceive me, as I get better at detecting deception, you’ll get better at deceiving.

Personally, I think that it was precisely this sort of feedback loop that resulted in human beings getting such ridiculously huge brains in the first place. Chimpanzees are pretty good at dealing with the natural environment, maybe even better than we are; but even young children can outsmart them in social tasks any day. And once you start evolving for social cognition, it’s very hard to stop; basically you need to be constrained by something very fundamental, like, say, maximum caloric intake or the shape of the birth canal. Whereas chimpanzee brains look like what we call an “interior solution”, where evolution optimized toward a particular balance between cost and benefit, human brains look more like a “corner solution”, where the evolutionary pressure was entirely in one direction until we hit up against a hard constraint. That’s exactly what one would expect to happen if we were caught in a cognitive arms race.

What sort of heuristic makes sense for dealing with fundamental uncertainty—as opposed to precisely calculable probability? Well, you don’t want to compute a utility function and multiply it by a probability, because that adds all sorts of extra computation and you have no idea what probability to assign anyway. But you’ve got to do something like that in some sense, because that really is the optimal way to respond.

So here’s a heuristic you might try: Separate events into some broad categories based on how frequently they seem to occur, and what sort of response would be necessary.

Some things, like the sun rising each morning, seem to always happen. So you should act as if those things are going to happen pretty much always, because they do happen… pretty much always.

Other things, like rain, seem to happen frequently but not always. So you should look for signs that those things might happen, and prepare for them when the signs point in that direction.

Still other things, like being attacked by lions, happen very rarely, but are a really big deal when they do. You can’t go around expecting those to happen all the time, that would be crazy; but you need to be vigilant, and if you see any sign that they might be happening, even if you’re pretty sure they’re not, you may need to respond as if they were actually happening, just in case. The cost of a false positive is much lower than the cost of a false negative.

And still other things, like people sprouting wings and flying, never seem to happen. So you should act as if those things are never going to happen, and you don’t have to worry about them.

This heuristic is quite simple to apply once set up: It can simply slot in memories of when things did and didn’t happen in order to decide which category they go in—i.e., the availability heuristic. If you can remember a lot of examples of something filed under “almost never”, maybe you should move it to “unlikely” instead. If you get a really large number of examples, you might even want to move it all the way to “likely”.

Another large advantage of this heuristic is that by combining utility and probability into one metric—we might call it “importance”, though Bayesian econometricians might complain about that—we can save on memory space and computing power. I don’t need to separately compute a utility and a probability; I just need to figure out how much effort I should put into dealing with this situation. A high probability of a small cost and a low probability of a large cost may be equally worth my time.
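To make that concrete, here is a toy version of the heuristic in code; the category labels, numbers, and scoring rule are my own invented placeholders for illustration, not anything I would claim people literally compute.

```python
# A toy version (purely illustrative) of the heuristic described above:
# slot events into frequency categories based on remembered examples (availability),
# then fold probability and stakes together into a single "importance" number.

CATEGORIES = [               # (label, rough acting-as-if probability)
    ("never",      0.0),
    ("rarely",     0.01),
    ("frequently", 0.3),
    ("always",     1.0),
]

def categorize(remembered_hits, remembered_cases):
    """Pick the category whose rough probability is closest to the remembered frequency."""
    freq = remembered_hits / max(remembered_cases, 1)
    return min(CATEGORIES, key=lambda cat: abs(cat[1] - freq))

def importance(remembered_hits, remembered_cases, stakes):
    """One combined number: how much effort is this situation worth?"""
    label, p = categorize(remembered_hits, remembered_cases)
    return label, p * stakes

# A rare-but-high-stakes event and a common-but-trivial one come out comparably important:
print(importance(10, 1000, stakes=100))   # ('rarely', 1.0)
print(importance(300, 1000, stakes=3))    # ('frequently', ~0.9)
```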

How might these heuristics go wrong? Well, if your environment changes sufficiently, the probabilities could shift and what seemed certain no longer is. For most of human history, “people walking on the Moon” would have seemed about as plausible as sprouting wings and flying away, and yet it has happened. Being attacked by lions is now exceedingly rare except in very specific places, but we still harbor a certain awe and fear of lions. And of course the availability heuristic can be greatly distorted by mass media, which makes people feel like terrorist attacks and nuclear meltdowns are common and deaths from car accidents and influenza are rare—when exactly the opposite is true.

How many categories should you set, and what frequencies should they be associated with? This part I’m still struggling with, and it’s an important piece of the puzzle I will need before I can take this theory to experiment. There is probably a trade-off: more categories give you more precision in tailoring your optimal behavior, but cost more cognitive resources to maintain. Is the optimal number 3? 4? 7? 10? I really don’t know. Even if I could specify the number of categories, I’d still need to figure out precisely which categories to assign.

Experimentally testing categorical prospect theory

Dec 4, JDN 2457727

In last week’s post I presented a new theory of probability judgments, which doesn’t rely upon people performing complicated math even subconsciously. Instead, I hypothesize that people try to assign categories to their subjective probabilities, and throw away all the information that wasn’t used to assign that category.

The way to most clearly distinguish this from cumulative prospect theory is to show discontinuity. Kahneman’s smooth, continuous function places fairly strong bounds on just how much a shift from 0% to 0.000001% can really affect your behavior. In particular, if you want to explain the fact that people do seem to behave differently around 10% compared to 1% probabilities, you can’t allow the slope of the smooth function to get much higher than 10 at any point, even near 0 and 1. (It does depend on the precise form of the function, but the more complicated you make it, the more free parameters you add to the model. In the most parsimonious form, which is a cubic polynomial, the maximum slope is actually much smaller than this—only 2.)

If that’s the case, then switching from 0% to 0.000001% should have no more effect in reality than a switch from 0% to 0.00001% would have on a rational expected utility optimizer. But in fact I think I can set up scenarios where it would have a larger effect than a switch from 0.001% to 0.01%.

Indeed, these games are already quite profitable for the majority of US states, and they are called lotteries.

Rationally, it should make very little difference to you whether your odds of winning the Powerball are 0 (you bought no ticket) or about 10^-9, i.e. 0.0000001% (you bought a ticket), even when the prize is $100 million. This is because your utility of winning $100 million is nowhere near 100 million times your marginal utility of $1. A good guess would be that your lifetime income is about $2 million, your utility is logarithmic in wealth, the units of utility are hectoQALY, and the baseline level is about $100,000.

I apologize for the extremely large number of decimals, but I had to do that in order to show any difference at all. I have bolded where the decimals first deviate from the baseline.

Your utility if you don’t have a ticket is ln(20) = 2.9957322736 hQALY.

Your utility if you have a ticket is (1-10^-9) ln(20) + 10^-9 ln(1020) = 2.9957322775 hQALY.

You gain a whopping 0.4 microQALY over your whole lifetime. I highly doubt you could even perceive such a difference.

And yet, people are willing to pay nontrivial sums for the chance to play such lotteries. Powerball tickets sell for about $2 each, and some people buy tickets every week. If you do that and live to be 80, you will spend some $8,000 on lottery tickets during your lifetime, which results in this expected utility: (1-4*10^-6) ln(20-0.08) + 4*10^-6 ln(1020) = 2.9917399955 hQALY.
You have now sacrificed 0.004 hectoQALY, which is to say 0.4 QALY—that’s months of happiness you’ve given up to play this stupid pointless game.

Which shouldn’t be surprising, as (with 99.9996% probability) you have given up four months of your lifetime income with nothing to show for it. Lifetime income of $2 million / lifespan of 80 years = $25,000 per year; $8,000 / $25,000 = 0.32. You’ve actually sacrificed slightly more than this, which comes from your risk aversion.
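If you want to check the arithmetic yourself, here is the whole calculation in a few lines of Python, using the same assumed figures as above ($2 million lifetime income, $100,000 baseline, log utility in hectoQALY, 10^-9 odds per ticket, $8,000 of lifetime tickets).

```python
from math import log

BASELINE = 100_000            # utility is ln(wealth / baseline), measured in hectoQALY
LIFETIME_INCOME = 2_000_000
PRIZE = 100_000_000
P_WIN = 1e-9                  # rough odds of winning per ticket, as assumed above

def u(wealth):
    return log(wealth / BASELINE)

no_ticket = u(LIFETIME_INCOME)
one_ticket = (1 - P_WIN) * u(LIFETIME_INCOME) + P_WIN * u(LIFETIME_INCOME + PRIZE)

# A $2 ticket every week for 80 years: about $8,000 spent, roughly 4,000 chances to win.
lifetime_spend = 8_000
p_ever_win = 4e-6
weekly_habit = ((1 - p_ever_win) * u(LIFETIME_INCOME - lifetime_spend)
                + p_ever_win * u(LIFETIME_INCOME + PRIZE))

print(f"No ticket:    {no_ticket:.10f} hQALY")
print(f"One ticket:   {one_ticket:.10f} hQALY")   # differs only in the ninth decimal place
print(f"Weekly habit: {weekly_habit:.10f} hQALY") # about 0.004 hQALY (0.4 QALY) lower
```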

Why would anyone do such a thing? Because while the difference between 0 and 10^-9 may be trivial, the difference between “impossible” and “almost impossible” feels enormous. “You can’t win if you don’t play!” they say, but they might as well say “You can’t win if you do play either.” Indeed, the probability of winning without playing isn’t zero; you could find a winning ticket lying on the ground, or win due to an error that is then upheld in court, or have a winning ticket bequeathed to you by a dying family member or gifted to you by an anonymous donor. These are of course vanishingly unlikely—but so was winning in the first place. You’re talking about the difference between 10^-9 and 10^-12, which in proportional terms sounds like a lot—but in absolute terms is nothing. If you drive to a drug store every week to buy a ticket, you are more likely to die in a car accident on the way to the drug store than you are to win the lottery.

Of course, these are not experimental conditions. So I need to devise a similar game, with smaller stakes but still large enough for people’s brains to care about the “almost impossible” category; maybe thousands of dollars? It’s not uncommon for an economics experiment to cost thousands of dollars; it’s just usually paid out to many people instead of randomly to one person or nobody. Conducting the experiment in an underdeveloped country like India would also effectively amplify the amounts paid, but at the fixed cost of transporting the research team to India.

But I think in general terms the experiment could look something like this. You are given $20 for participating in the experiment (we treat it as already given to you, to maximize your loss aversion and endowment effect and thereby give us more bang for our buck). You then have a chance to play a game, where you pay $X to get a P probability of $Y*X, and we vary these numbers.

The actual participants wouldn’t see the variables, just the numbers and possibly the rules: “You can pay $2 for a 1% chance of winning $200. You can also play multiple times if you wish.” “You can pay $10 for a 5% chance of winning $250. You can only play once or not at all.”
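Just to show how that parametrization might be turned into stimuli, here is a quick sketch; the particular values, and the choice to hold expected winnings equal to the price, are my own placeholders rather than part of the design.

```python
# A sketch of how the stimuli might be generated; the values here and the choice to
# hold expected winnings equal to the price are placeholders, not part of the design.

def dilemma(x, p, y, repeatable):
    rule = ("You can also play multiple times if you wish."
            if repeatable else "You can only play once or not at all.")
    return f"You can pay ${x} for a {p:.3%} chance of winning ${y * x}. {rule}"

# Vary the probability across a hypothesized category edge, keeping expected winnings
# equal to the price, so any jump in behavior can't be blamed on the payoff changing.
for p in [0.0005, 0.001, 0.002]:
    print(dilemma(x=2, p=p, y=round(1 / p), repeatable=True))
```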

So I think the first step is to find some dilemmas, cases where people feel ambivalent, and different people differ in their choices. That’s a good role for a pilot study.

Then we take these dilemmas and start varying their probabilities slightly.

In particular, we try to vary them at the edge of where people have mental categories. If subjective probability is continuous, a slight change in actual probability should never result in a large change in behavior, and furthermore the effect of a change shouldn’t vary too much depending on where the change starts.

But if subjective probability is categorical, these categories should have edges. Then, when I present you with two dilemmas that are on opposite sides of one of the edges, your behavior should radically shift; while if I change it in a different way, I can make a large change without changing the result.

Based solely on my own intuition, I guessed that the categories roughly follow this pattern:

Impossible: 0%

Almost impossible: 0.1%

Very unlikely: 1%

Unlikely: 10%

Fairly unlikely: 20%

Roughly even odds: 50%

Fairly likely: 80%

Likely: 90%

Very likely: 99%

Almost certain: 99.9%

Certain: 100%

So for example, if I switch from 0% to 0.01%, it should have a very large effect, because I’ve moved you out of your “impossible” category (indeed, I think the “impossible” category is almost completely sharp; literally anything above zero seems to be enough for most people, even 10^-9 or 10^-10). But if I move from 1% to 2%, it should have a small effect, because I’m still well within the “very unlikely” category. Yet the latter change is literally one hundred times larger than the former. It is possible to define continuous functions that would behave this way to an arbitrary level of approximation—but they get a lot less parsimonious very fast.
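Here is a small sketch of what that categorical mapping might look like in code, using my guessed categories from the list above; the boundary rule (nearest category in log-odds, with “impossible” and “certain” perfectly sharp) is just one arbitrary way to make it concrete.

```python
import math

# My guessed categories from the list above; the boundary rule is arbitrary.
CATEGORIES = [
    ("impossible",        0.0),
    ("almost impossible", 0.001),
    ("very unlikely",     0.01),
    ("unlikely",          0.10),
    ("fairly unlikely",   0.20),
    ("roughly even odds", 0.50),
    ("fairly likely",     0.80),
    ("likely",            0.90),
    ("very likely",       0.99),
    ("almost certain",    0.999),
    ("certain",           1.0),
]

def category(p):
    """Assign a probability to a category; 'impossible' and 'certain' are perfectly sharp."""
    if p <= 0.0:
        return "impossible"
    if p >= 1.0:
        return "certain"
    # Otherwise: nearest non-extreme category on a log-odds scale (one arbitrary choice).
    logit = math.log(p / (1 - p))
    interior = CATEGORIES[1:-1]
    return min(interior, key=lambda c: abs(math.log(c[1] / (1 - c[1])) - logit))[0]

for p in [0.0, 0.0001, 0.01, 0.02, 0.5]:
    print(f"{p:>8g}: {category(p)}")

# 0 -> impossible, but 0.0001 -> almost impossible: a tiny shift, a big categorical jump.
# 0.01 -> very unlikely and 0.02 -> very unlikely: a change 100 times larger, no jump at all.
```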

Now, immediately I run into a problem, because I’m not even sure those are my categories, much less that they are everyone else’s. If I knew precisely which categories to look for, I could tell whether or not I had found them. But the process of both finding the categories and determining if their edges are truly sharp is much more complicated, and requires a lot more statistical degrees of freedom to get beyond the noise.

One thing I’m considering is assigning these values as a prior, and then conducting a series of experiments which would adjust that prior. In effect I would be using optimal Bayesian probability reasoning to show that human beings do not use optimal Bayesian probability reasoning. Still, I think that actually pinning down the categories would require a large number of participants or a long series of experiments (in frequentist statistics this distinction is vital; in Bayesian statistics it is basically irrelevant—one of the simplest reasons to be Bayesian is that it no longer bothers you whether someone did 2 experiments of 100 people or 1 experiment of 200 people, provided they were the same experiment of course). And of course there’s always the possibility that my theory is totally off-base, and I find nothing; a dissertation replicating cumulative prospect theory is a lot less exciting (and, sadly, less publishable) than one refuting it.

Still, I think something like this is worth exploring. I highly doubt that people are doing very much math when they make most probabilistic judgments, and using categories would provide a very good way for people to make judgments usefully with no math at all.