Pascal’s Mugging

Nov 10 JDN 2458798

In the Singularitarian community there is a paradox known as “Pascal’s Mugging”. The name is an intentional reference to Pascal’s Wager (and the link is quite apt, for reasons I’ll discuss in a later post.)

There are a few different versions of the argument; Yudkowsky’s original argument in which he came up with the name “Pascal’s Mugging” relies upon the concept of the universe as a simulation and an understanding of esoteric mathematical notation. So here is a more intuitive version:

A strange man in a dark hood comes up to you on the street. “Give me five dollars,” he says, “or I will destroy an entire planet filled with ten billion innocent people. I cannot prove to you that I have this power, but how much is an innocent life worth to you? Even if it is as little as $5,000, are you really willing to bet on ten trillion to one odds that I am lying?”

Do you give him the five dollars? I suspect that you do not. Indeed, I suspect that you’d be less likely to give him the five dollars than if he had merely said he was homeless and asked for five dollars to help pay for food. (Also, you may have objected that you value innocent lives, even faraway strangers you’ll never meet, at more than $5,000 each—but if that’s the case, you should probably be donating more, because the world’s best charities can save a life for about $3,000.)

But therein lies the paradox: Are you really willing to bet on ten trillion to one odds?

This argument gives me much the same feeling as the Ontological Argument; as Russell said of the latter, “it is much easier to be persuaded that ontological arguments are no good than it is to say exactly what is wrong with them.” It wasn’t until I read this post on GiveWell that I could really formulate the answer clearly enough to explain it.

The apparent force of Pascal’s Mugging comes from the idea of expected utility: Even if the probability of an event is very small, if it has a sufficiently great impact, the expected utility can still be large.

The problem with this argument is that extraordinary claims require extraordinary evidence. If a man held a gun to your head and said he’d shoot you if you didn’t give him five dollars, you’d give him five dollars. This is a plausible claim and he has provided ample evidence. If he were instead wearing a bomb vest (or even just really puffy clothing that could conceal a bomb vest), and he threatened to blow up a building unless you gave him five dollars, you’d probably do the same. This is less plausible (what kind of terrorist only demands five dollars?), but it’s not worth taking the chance.

But when he claims to have a Death Star parked in orbit of some distant planet, primed to make another Alderaan, you are right to be extremely skeptical. And if he claims to be a being from beyond our universe, primed to destroy so many lives that we couldn’t even write the number down with all the atoms in our universe (which was actually Yudkowsky’s original argument), to say that you are extremely skeptical seems a grievous understatement.

That GiveWell post provides a way to make this intuition mathematically precise in terms of Bayesian logic. If you have a normal prior with mean 0 and standard deviation 1, and you are presented with a likelihood with mean X and standard deviation X, what should your posterior distribution be?

Normal priors are quite convenient; they are conjugate to normal likelihoods, so the update has a simple closed form. The precision (inverse variance) of the posterior distribution is the sum of the two precisions, and the posterior mean is a weighted average of the two means, weighted by their precisions.

So the posterior variance is 1/(1 + 1/X^2).

The posterior mean is 1/(1+1/X^2)*(0) + (1/X^2)/(1+1/X^2)*(X) = X/(X^2+1).

That is, the mean of the posterior distribution is just barely higher than zero—and in fact, it is decreasing in X, if X > 1.
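Here is a minimal sketch of that calculation in Python (my own illustration of the GiveWell argument, using the unit-variance normal prior described above):

```python
def posterior(x, prior_mean=0.0, prior_var=1.0):
    """Conjugate normal update: N(0, 1) prior, likelihood centered at x with variance x^2."""
    prior_prec = 1.0 / prior_var          # precision of the prior
    like_prec = 1.0 / x**2                # precision of the likelihood
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + like_prec * x)
    return post_mean, post_var

for x in [1, 10, 100, 10_000]:
    mean, var = posterior(x)
    print(f"claimed magnitude {x:>6}: posterior mean {mean:.6f}, posterior variance {var:.4f}")
# The posterior mean works out to x/(x^2 + 1): it peaks at x = 1 and shrinks as the claim grows.
```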

For those who don’t speak Bayesian: If someone says he’s going to have an effect of magnitude X, you should be less likely to believe him the larger that X is. And indeed this is precisely what our intuition said before: If he says he’s going to kill one person, believe him. If he says he’s going to destroy a planet, don’t believe him, unless he provides some really extraordinary evidence.

What sort of extraordinary evidence? To his credit, Yudkowsky imagined the sort of evidence that might actually be convincing:

If a poorly-dressed street person offers to save 10^(10^100) lives (googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky.

This post he called “Pascal’s Muggle”, after the term from the Harry Potter series, since some of the solutions that had been proposed for dealing with Pascal’s Mugging had resulted in a situation almost as absurd, in which the mugger could exhibit powers beyond our imagining and yet nevertheless we’d never have sufficient evidence to believe him.

So, let me go on record as saying this: Yes, if someone snaps his fingers and causes the sky to rip open and reveal a silhouette of himself, I’ll do whatever that person says. The odds are still higher that I’m dreaming or hallucinating than that this is really a being from beyond our universe, but if I’m dreaming, it makes no difference, and if someone can make me hallucinate that vividly he can probably cajole the money out of me in other ways. And there might be just enough chance that this could be real that I’m willing to give up that five bucks.

These seem like really strange thought experiments, because they are. But like many good thought experiments, they can provide us with some important insights. In this case, I think they are telling us something about the way human reasoning can fail when faced with impacts beyond our normal experience: We are in danger of both over-estimating and under-estimating their effects, because our brains aren’t equipped to deal with magnitudes and probabilities on that scale. This has made me realize something rather important about both Singularitarianism and religion, but I’ll save that for next week’s post.

The “market for love” is a bad metaphor

Feb 14 JDN 2458529

Valentine’s Day was this past week, so let’s talk a bit about love.

Economists would never be accused of being excessively romantic. To most neoclassical economists, just about everything is a market transaction. Love is no exception.

There are all sorts of articles and books, and an even larger number of research papers going back decades and continuing to this day, that use the metaphor of the “marriage market”.

In a few places, marriage does actually function something like a market: In China, there are places where your parents will hire brokers and matchmakers to select a spouse for you. But even this isn’t really a market for love or marriage. It’s a market for matchmaking services. The high-tech version of this is dating sites like OkCupid.
And of course sex work actually occurs on markets; there is buying and selling of services at monetary prices. There is of course a great deal worth saying on that subject, but it’s not my topic for today.

But in general, love is really nothing like a market. First of all, there is no price. This alone should be sufficient reason to say that we’re not actually dealing with a market. The whole mechanism that makes a market a market is the use of prices to achieve equilibrium between supply and demand.

A price doesn’t necessarily have to be monetary; you can barter apples for bananas, or trade in one used video game for another, and we can still legitimately call that a market transaction with a price.

But love isn’t like that either. If your relationship with someone is so transactional that you’re actually keeping a ledger of each thing they do for you and each thing you do for them so that you could compute a price for services, that isn’t love. It’s not even friendship. If you really care about someone, you set such calculations aside. You view their interests and yours as in some sense shared, aligned toward common goals. You stop thinking in terms of “me” and “you” and start thinking in terms of “us”. You don’t think “I’ll scratch your back if you scratch mine.” You think “We’re scratching each other’s backs today.”

This is of course not to say that love never involves conflict. On the contrary, love always involves conflict. Successful relationships aren’t those where conflict never happens, they are those where conflict is effectively and responsibly resolved. Your interests and your loved ones’ are never completely aligned; there will always be some residual disagreement. But the key is to realize that your interests are still mostly aligned; those small vectors of disagreement should be outweighed by the much larger vector of your relationship.

And of course, there can come a time when that is no longer the case. Obviously, there is domestic abuse, which should absolutely be a deal-breaker for anyone. But there are other reasons why you may find that a relationship ultimately isn’t working, that your interests just aren’t as aligned as you thought they were. Eventually those disagreement vectors just get too large to cancel out. This is painful, but unavoidable. But if you reach the point where you are keeping track of actions on a ledger, that relationship is already dead. Sooner or later, someone is going to have to pull the plug.

Very little of what I’ve said in the preceding paragraphs is likely to be controversial. Why, then, would economists think that it makes sense to treat love as a market?

I think this comes down to a motte and bailey doctrine. A more detailed explanation can be found at that link, but the basic idea of a motte and bailey is this: You have a core set of propositions that is highly defensible but not that interesting (the “motte”), and a broader set of propositions that are very interesting, but not as defensible (the “bailey”). The terms are related to a medieval defensive strategy, in which there was a small, heavily fortified tower called a motte, surrounded by fertile, useful land, the bailey. The bailey is where you actually want to live, but it’s hard to defend; so if the need arises, you can pull everyone back into the motte to fight off attacks. But nobody wants to live in the motte; it’s just a cramped stone tower. There’s nothing to eat or enjoy there.

The motte is composed of ideas that almost everyone agrees with. The bailey is the real point of contention, the thing you are trying to argue for—which, by construction, other people must not already agree with.

Here are some examples, which I have intentionally chosen from groups I agree with:

Feminism can be a motte and bailey doctrine. The motte is “women are people”; the bailey is abortion rights, affirmative consent and equal pay legislation.

Rationalism can be a motte and bailey doctrine. The motte is “rationality is good”; the bailey is atheism, transhumanism, and Bayesian statistics.

Anti-fascism can be a motte and bailey doctrine. The motte is “fascists are bad”; the bailey is black bloc Antifa and punching Nazis.

Even democracy can be a motte and bailey doctrine. The motte is “people should vote for their leaders”; my personal bailey is abolition of the Electoral College, a younger voting age, and range voting.

Using a motte and bailey doctrine does not necessarily make you wrong. But it’s something to be careful about, because as a strategy it can be disingenuous. Even if you think that the propositions in the bailey all follow logically from the propositions in the motte, the people you’re talking to may not think so, and in fact you could simply be wrong. At the very least, you should be taking the time to explain how one follows from the other; and really, you should consider whether the connection is actually as tight as you thought, or if perhaps one can believe that rationality is good without being Bayesian or believe that women are people without supporting abortion rights.

I think when economists describe love or marriage as a “market”, they are applying a motte and bailey doctrine. They may actually be doing something even worse than that, by equivocating on the meaning of “market”. But even if any given economist uses the word “market” totally consistently, the fact that different economists of the same broad political alignment use the word differently adds up to a motte and bailey doctrine.

The doctrine is this: “There have always been markets.”

The motte is something like this: “Humans have always engaged in interaction for mutual benefit.”

This is undeniably true. In fact, it’s not even uninteresting. As mottes go, it’s a pretty nice one; it’s worth spending some time there. In the endless quest for an elusive “human nature”, I think you could do worse than to focus on our universal tendency to engage in interaction for mutual benefit. (Don’t other species do it too? Yes, but that’s just it—they are precisely the ones that seem most human.)

And if you want to define any mutually-beneficial interaction as a “market trade”, I guess it’s your right to do that. I think this is foolish and confusing, but legislating language has always been a fool’s errand.

But of course the more standard meaning of the word “market” implies buyers and sellers exchanging goods and services for monetary prices. You can extend it a little to include bartering, various forms of financial intermediation, and the like; but basically you’re still buying and selling.

That makes this the bailey: “Humans have always engaged in buying and selling of goods and services at prices.”

And that, dear readers, is ahistorical nonsense. We’ve only been using money for a few thousand years, and it wasn’t until the Industrial Revolution that we actually started getting the majority of our goods and services via market trades. Economists like to tell a story where bartering preceded the invention of money, but there’s basically no evidence of that. Bartering seems to be what people do when they know how money works but don’t have any money to work with.

Before there was money, there were fundamentally different modes of interaction: Sharing, ritual, debts of honor, common property, and, yes, love.

These were not markets. They perhaps shared some very broad features of markets—such as the interaction for mutual benefit—but they lacked the defining attributes that make a market a market.

Why is this important? Because this doctrine is used to transform more and more of our lives into actual markets, on the grounds that they were already “markets”, and we’re just using “more efficient” kinds of markets. But in fact what’s happening is we are trading one fundamental mode of human interaction for another: Where we used to rely upon norms or trust or mutual affection, we instead rely upon buying and selling at prices.

In some cases, this actually is a good thing: Markets can be very powerful, and are often our best tool when we really need something done. In particular, it’s clear at this point that norms and trust are not sufficient to protect us against climate change. All the “Reduce, Reuse, Recycle” PSAs in the world won’t do as much as a carbon tax. When millions of lives are at stake, we can’t trust people to do the right thing; we need to twist their arms however we can.

But markets are in some sense a brute-force last-resort solution; they commodify and alienate (Marx wasn’t wrong about that), and despite our greatly elevated standard of living, the alienation and competitive pressure of markets seem to be keeping most of us from really achieving happiness.

This is why it’s extremely dangerous to talk about a “market for love”. Love is perhaps the last bastion of our lives that has not been commodified into a true market, and if it goes, we’ll have nothing left. If sexual relationships built on mutual affection were to disappear in favor of apps that will summon a prostitute or a sex robot at the push of a button, I would count that as a great loss for human civilization. (How we should regulate prostitution or sex robots is a different question, which I said I’d leave aside for this post.) A “market for love” is in fact a world with no love at all.

The sausage of statistics being made


Nov 11 JDN 2458434

“Laws, like sausages, cease to inspire respect in proportion as we know how they are made.”

~ John Godfrey Saxe, not Otto von Bismarck

Statistics are a bit like laws and sausages. There are a lot of things in statistical practice that don’t align with statistical theory. The most obvious example is that many results in statistics are asymptotic: they only strictly apply for infinitely large samples, and in any finite sample they will be some sort of approximation (we often don’t even know how good an approximation).

But the problem runs deeper than this: The p-value was originally supposed to be used to assess one single hypothesis, the only one you test in your entire study.

That’s frankly a ludicrous expectation: Why would you write a whole paper just to test one parameter?

This is why I don’t actually think this so-called multiple comparisons problem is a problem with researchers doing too many hypothesis tests; I think it’s a problem with statisticians being fundamentally unreasonable about what statistics is useful for. We have to do multiple comparisons, so you should be telling us how to do it correctly.
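For concreteness, the corrections that do exist (Bonferroni, Benjamini-Hochberg, and so on) look something like this in practice; the p-values below are made up for illustration:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from the twenty-odd tests a single paper might run
pvals = [0.001, 0.004, 0.012, 0.030, 0.047, 0.2, 0.2, 0.3, 0.4, 0.5,
         0.5, 0.6, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95]

# Bonferroni controls the chance of even one false positive (very conservative);
# Benjamini-Hochberg controls the expected share of false positives among rejections.
reject_bonf, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
reject_bh, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print(sum(reject_bonf), "of", len(pvals), "tests survive Bonferroni")
print(sum(reject_bh), "of", len(pvals), "tests survive Benjamini-Hochberg")
```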

Statisticians have this beautiful pure mathematics that generates all these lovely asymptotic results… and then they stop, as if they were done. But we aren’t dealing with infinite or even “sufficiently large” samples; we need to know what happens when your sample is 100, not when your sample is 10^29. We can’t assume that our variables are independently and identically distributed; we don’t know their distribution, and we’re pretty sure they’re going to be somewhat dependent.

Even in an experimental context where we can randomly and independently assign some treatments, we can’t do that with lots of variables that are likely to matter, like age, gender, nationality, or field of study. And applied econometricians are in an even tighter bind; they often can’t randomize anything. They have to rely upon “instrumental variables” that they hope are “close enough to randomized” relative to whatever they want to study.

In practice what we tend to do is… fudge it. We use the formal statistical methods, and then we step back and apply a series of informal norms to see if the result actually makes sense to us. This is why almost no psychologists were actually convinced by Daryl Bem’s precognition experiments, despite his standard experimental methodology and perfect p < 0.05 results; he couldn’t pass any of the informal tests, particularly the most basic one of not violating any known fundamental laws of physics. We knew he had somehow cherry-picked the data, even before looking at it; nothing else was possible.

This is actually part of where the “hierarchy of sciences” notion is useful: One of the norms is that you’re not allowed to break the rules of the sciences above you, but you can break the rules of the sciences below you. So psychology has to obey physics, but physics doesn’t have to obey psychology. I think this is also part of why there’s so much enmity between economists and anthropologists; really we should be on the same level, cognizant of each other’s rules, but economists want to be above anthropologists so we can ignore culture, and anthropologists want to be above economists so they can ignore incentives.

Another informal norm is the “robustness check”, in which the researcher runs a dozen different regressions approaching the same basic question from different angles. “What if we control for this? What if we interact those two variables? What if we use a different instrument?” In terms of statistical theory, this doesn’t actually make a lot of sense; the probability distributions f(y|x) of y conditional on x and f(y|x, z) of y conditional on x and z are not the same thing, and wouldn’t in general be closely tied, depending on the distribution f(x|z) of x conditional on z. But in practice, most real-world phenomena are going to continue to show up even as you run a bunch of different regressions, and so we can be more confident that something is a real phenomenon insofar as that happens. If an effect drops out when you switch out a couple of control variables, it may have been a statistical artifact. But if it keeps appearing no matter what you do to try to make it go away, then it’s probably a real thing.
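As a sketch of what that ritual looks like in code, here is a set of regressions on simulated data (the variable names and specifications are invented, and the statsmodels formulas are just one convenient way to write them):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)              # regressor correlated with a control
y = 1.0 * x + 0.8 * z + rng.normal(size=n)    # true effect of x on y is 1.0
df = pd.DataFrame({"y": y, "x": x, "z": z})

# The "robustness check" ritual: several specifications, watch the x coefficient
specs = ["y ~ x", "y ~ x + z", "y ~ x * z", "y ~ x + z + I(z**2)"]
for spec in specs:
    fit = smf.ols(spec, data=df).fit()
    print(f"{spec:<22} beta_x = {fit.params['x']:+.3f}   p = {fit.pvalues['x']:.2g}")
# If the coefficient stays roughly stable across specifications (here, once z is included),
# we informally conclude the effect is a real phenomenon rather than a statistical artifact.
```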

Because of the powerful career incentives toward publication and the strange obsession among journals with a p-value less than 0.05, another norm has emerged: Don’t actually trust p-values that are close to 0.05. The vast majority of the time, a p-value of 0.047 was the result of publication bias. Now if you see a p-value of 0.001, maybe then you can trust it—but you’re still relying on a lot of assumptions even then. I’ve seen some researchers argue that because of this, we should tighten our standards for publication to something like p < 0.01, but that’s missing the point; what we need to do is stop publishing based on p-values. If you tighten the threshold, you’re just going to get more rejected papers and then the few papers that do get published will now have even smaller p-values that are still utterly meaningless.

These informal norms protect us from the worst outcomes of bad research. But they are almost certainly not optimal. It’s all very vague and informal, and different researchers will often disagree vehemently over whether a given interpretation is valid. What we need are formal methods for solving these problems, so that we can have the objectivity and replicability that formal methods provide. Right now, our existing formal tools simply are not up to that task.

There are some things we may never be able to formalize: If we had a formal algorithm for coming up with good ideas, the AIs would already rule the world, and this would be either Terminator or The Culture depending on whether we designed the AIs correctly. But I think we should at least be able to formalize the basic question of “Is this statement likely to be true?” that is the fundamental motivation behind statistical hypothesis testing.

I think the answer is likely to be in a broad sense Bayesian, but Bayesians still have a lot of work left to do in order to give us really flexible, reliable statistical methods we can actually apply to the messy world of real data. In particular, tell us how to choose priors please! Prior selection is a fundamental make-or-break problem in Bayesian inference that has nonetheless been greatly neglected by most Bayesian statisticians. So, what do we do? We fall back on informal norms: Try maximum likelihood, which is like using a very flat prior. Try a normally-distributed prior. See if you can construct a prior from past data. If all those give the same thing, that’s a “robustness check” (see previous informal norm).
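A minimal sketch of that prior “robustness check” for the simplest possible case, estimating a success rate from binary data; the priors and the data are invented for illustration:

```python
from scipy import stats

successes, failures = 27, 73   # made-up data: 27 "hits" out of 100 trials

# Three priors for the same success rate: near-flat (close to maximum likelihood),
# weakly informative, and one constructed from hypothetical past data.
priors = {
    "flat Beta(1, 1)":        (1, 1),
    "weak Beta(2, 2)":        (2, 2),
    "past-data Beta(30, 70)": (30, 70),
}

for name, (a, b) in priors.items():
    post = stats.beta(a + successes, b + failures)
    lo, hi = post.ppf([0.025, 0.975])
    print(f"{name:<24} posterior mean {post.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
# If all three posteriors tell the same story, we call the conclusion robust to the prior;
# if they diverge, the data were not strong enough to wash out our choice.
```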

Informal norms are also inherently harder to teach and learn. I’ve seen a lot of other grad students flail wildly at statistics, not because they don’t know what a p-value means (though maybe that’s also sometimes true), but because they don’t really quite grok the informal underpinnings of good statistical inference. This can be very hard to explain to someone: They feel like they followed all the rules correctly, but you are saying their results are wrong, and now you can’t explain why.

In fact, some of the informal norms that are in wide use are clearly detrimental. In economics, norms have emerged that certain types of models are better simply because they are “more standard”, such as the dynamic stochastic general equilibrium models that can basically be fit to everything and have never actually usefully predicted anything. In fact, the best ones just predict what we already knew from Keynesian models. But without a formal norm for testing the validity of models, it’s been “DSGE or GTFO”. At present, it is considered “nonstandard” (read: “bad”) not to assume that your agents are either a single unitary “representative agent” or a continuum of infinitely-many agents—modeling the actual fact of finitely-many agents is just not done. Yet it’s hard for me to imagine any formal criterion that wouldn’t at least give you some points for correctly including the fact that there is more than one but less than infinity people in the world (obviously your model could still be bad in other ways).

I don’t know what these new statistical methods would look like. Maybe it’s as simple as formally justifying some of the norms we already use; maybe it’s as complicated as taking a fundamentally new approach to statistical inference. But we have to start somewhere.

Experimentally testing categorical prospect theory

Dec 4, JDN 2457727

In last week’s post I presented a new theory of probability judgments, which doesn’t rely upon people performing complicated math even subconsciously. Instead, I hypothesize that people try to assign categories to their subjective probabilities, and throw away all the information that wasn’t used to assign that category.

The way to most clearly distinguish this from cumulative prospect theory is to show discontinuity. Kahneman’s smooth, continuous function places fairly strong bounds on just how much a shift from 0% to 0.000001% can really affect your behavior. In particular, if you want to explain the fact that people do seem to behave differently around 10% compared to 1% probabilities, you can’t allow the slope of the smooth function to get much higher than 10 at any point, even near 0 and 1. (It does depend on the precise form of the function, but the more complicated you make it, the more free parameters you add to the model. In the most parsimonious form, which is a cubic polynomial, the maximum slope is actually much smaller than this—only 2.)

If that’s the case, then switching from 0% to 0.000001% should have no more effect in reality than a switch from 0% to 0.00001% would to a rational expected utility optimizer. But in fact I think I can set up scenarios where it would have a larger effect than a switch from 0.001% to 0.01%.

Indeed, these games are already quite profitable for the majority of US states, and they are called lotteries.

Rationally, it should make very little difference to you whether your odds of winning the Powerball are 0 (you bought no ticket) or 0.000000001% (you bought a ticket), even when the prize is $100 million. This is because your utility of $100 million is nowhere near 100 million times as large as your marginal utility of $1. A good guess would be that your lifetime income is about $2 million, your utility is logarithmic, the units of utility are hectoQALY, and the baseline level is about $100,000, so that lifetime utility is ln(income/$100,000) and a $2 million income corresponds to ln(20).

I apologize for the extremely large number of decimals, but I had to do that in order to show any difference at all; the two values below deviate from each other only in the ninth decimal place.

Your utility if you don’t have a ticket is ln(20) = 2.9957322736 hQALY.

Your utility if you have a ticket is (1-10^-9) ln(20) + 10^-9 ln(1020) = 2.9957322775 hQALY.

You gain a whopping 0.4 microQALY over your whole lifetime. I highly doubt you could even perceive such a difference.

And yet, people are willing to pay nontrivial sums for the chance to play such lotteries. Powerball tickets sell for about $2 each, and some people buy tickets every week. If you do that and live to be 80, you will spend some $8,000 on lottery tickets during your lifetime, which results in this expected utility: (1-4*10^-6) ln(20-0.08) + 4*10^-6 ln(1020) = 2.9917399955 hQALY.
You have now sacrificed 0.004 hectoQALY, which is to say 0.4 QALY—that’s months of happiness you’ve given up to play this stupid pointless game.

Which shouldn’t be surprising, as (with 99.9996% probability) you have given up four months of your lifetime income with nothing to show for it. Lifetime income of $2 million / lifespan of 80 years = $25,000 per year; $8,000 / $25,000 = 0.32. You’ve actually sacrificed slightly more than this, which comes from your risk aversion.
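Here is a minimal sketch reproducing the arithmetic above (log utility, income in units of $100,000, and the same rough figures assumed in the text):

```python
import math

BASELINE = 100_000      # dollars per "unit" of log utility, as assumed above
income = 2_000_000      # assumed lifetime income
prize = 100_000_000     # jackpot
p_ticket = 1e-9         # rough chance of winning per ticket

def utility(dollars):
    """Lifetime log utility in hectoQALY, per the assumptions above."""
    return math.log(dollars / BASELINE)

u_none = utility(income)
u_one = (1 - p_ticket) * utility(income) + p_ticket * utility(income + prize)
print(f"no ticket:      {u_none:.10f} hQALY")
print(f"one ticket:     {u_one:.10f} hQALY")   # differs only in the ninth decimal place

# A lifetime of weekly $2 tickets: about $8,000 spent, about 4e-6 total chance of winning
spend, p_total = 8_000, 4e-6
u_habit = (1 - p_total) * utility(income - spend) + p_total * utility(income + prize)
print(f"lifetime habit: {u_habit:.10f} hQALY")  # about 0.004 hQALY (0.4 QALY) worse
```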

Why would anyone do such a thing? Because while the difference between 0 and 10^-9 may be trivial, the difference between “impossible” and “almost impossible” feels enormous. “You can’t win if you don’t play!” they say, but they might as well say “You can’t win if you do play either.” Indeed, the probability of winning without playing isn’t zero; you could find a winning ticket lying on the ground, or win due to an error that is then upheld in court, or be given the winnings bequeathed by a dying family member or gifted by an anonymous donor. These are of course vanishingly unlikely—but so was winning in the first place. You’re talking about the difference between 10^-9 and 10^-12, which in proportional terms sounds like a lot—but in absolute terms is nothing. If you drive to a drug store every week to buy a ticket, you are more likely to die in a car accident on the way to the drug store than you are to win the lottery.

Of course, these are not experimental conditions. So I need to devise a similar game, with smaller stakes but still large enough for people’s brains to care about the “almost impossible” category; maybe thousands? It’s not uncommon for an economics experiment to cost thousands, it’s just usually paid out to many people instead of randomly to one person or nobody. Conducting the experiment in an underdeveloped country like India would also effectively amplify the amounts paid, but at the fixed cost of transporting the research team to India.

But I think in general terms the experiment could look something like this. You are given $20 for participating in the experiment (we treat it as already given to you, to maximize your loss aversion and endowment effect and thereby give us more bang for our buck). You then have a chance to play a game, where you pay $X to get a P probability of $Y*X, and we vary these numbers.

The actual participants wouldn’t see the variables, just the numbers and possibly the rules: “You can pay $2 for a 1% chance of winning $200. You can also play multiple times if you wish.” “You can pay $10 for a 5% chance of winning $250. You can only play once or not at all.”

So I think the first step is to find some dilemmas, cases where people feel ambivalent, and different people differ in their choices. That’s a good role for a pilot study.

Then we take these dilemmas and start varying their probabilities slightly.

In particular, we try to vary them at the edge of where people have mental categories. If subjective probability is continuous, a slight change in actual probability should never result in a large change in behavior, and furthermore the effect of a change shouldn’t vary too much depending on where the change starts.

But if subjective probability is categorical, these categories should have edges. Then, when I present you with two dilemmas that are on opposite sides of one of the edges, your behavior should radically shift; while if I change it in a different way, I can make a large change without changing the result.

Based solely on my own intuition, I guessed that the categories roughly follow this pattern:

Impossible: 0%

Almost impossible: 0.1%

Very unlikely: 1%

Unlikely: 10%

Fairly unlikely: 20%

Roughly even odds: 50%

Fairly likely: 80%

Likely: 90%

Very likely: 99%

Almost certain: 99.9%

Certain: 100%

So for example, if I switch from 0% to 0.01%, it should have a very large effect, because I’ve moved you out of your “impossible” category (indeed, I think the “impossible” category is almost completely sharp; literally anything above zero seems to be enough for most people, even 10^-9 or 10^-10). But if I move from 1% to 2%, it should have a small effect, because I’m still well within the “very unlikely” category. Yet the latter change is literally one hundred times larger than the former. It is possible to define continuous functions that would behave this way to an arbitrary level of approximation—but they get a lot less parsimonious very fast.
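To make the idea concrete, here is a toy sketch of what categorical judgment could look like, snapping probabilities to the guessed anchors above; the rule and the anchor values are purely illustrative, not a fitted model:

```python
CATEGORIES = [
    ("impossible",        0.000),
    ("almost impossible", 0.001),
    ("very unlikely",     0.010),
    ("unlikely",          0.100),
    ("fairly unlikely",   0.200),
    ("roughly even odds", 0.500),
    ("fairly likely",     0.800),
    ("likely",            0.900),
    ("very likely",       0.990),
    ("almost certain",    0.999),
    ("certain",           1.000),
]

def categorize(p):
    """Snap a probability to a category: 0 and 1 are exact, everything else
    goes to the nearest interior anchor (a deliberately crude rule of my own)."""
    if p == 0.0:
        return "impossible"
    if p == 1.0:
        return "certain"
    label, _ = min(CATEGORIES[1:-1], key=lambda c: abs(c[1] - p))
    return label

for p in [0.0, 1e-9, 0.0001, 0.01, 0.02, 0.40, 0.60]:
    print(f"{p:>8}: {categorize(p)}")
# Anything above exactly zero immediately leaves "impossible" (the 0% -> 0.01% jump),
# while 1% -> 2% and even 40% -> 60% never change the category at all.
```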

Now, immediately I run into a problem, because I’m not even sure those are my categories, much less that they are everyone else’s. If I knew precisely which categories to look for, I could tell whether or not I had found it. But the process of both finding the categories and determining if their edges are truly sharp is much more complicated, and requires a lot more statistical degrees of freedom to get beyond the noise.

One thing I’m considering is assigning these values as a prior, and then conducting a series of experiments which would adjust that prior. In effect I would be using optimal Bayesian probability reasoning to show that human beings do not use optimal Bayesian probability reasoning. Still, I think that actually pinning down the categories would require a large number of participants or a long series of experiments (in frequentist statistics this distinction is vital; in Bayesian statistics it is basically irrelevant—one of the simplest reasons to be Bayesian is that it no longer bothers you whether someone did 2 experiments of 100 people or 1 experiment of 200 people, provided they were the same experiment of course). And of course there’s always the possibility that my theory is totally off-base, and I find nothing; a dissertation replicating cumulative prospect theory is a lot less exciting (and, sadly, less publishable) than one refuting it.

Still, I think something like this is worth exploring. I highly doubt that people are doing very much math when they make most probabilistic judgments, and using categories would provide a very good way for people to make judgments usefully with no math at all.

How do people think about probability?

Nov 27, JDN 2457690

(This topic was chosen by vote of my Patreons.)

In neoclassical theory, it is assumed (explicitly or implicitly) that human beings judge probability in something like the optimal Bayesian way: We assign prior probabilities to events, and then when confronted with evidence we use the observed data to update our prior probabilities into posterior probabilities. Then, when we have to make decisions, we maximize our expected utility subject to our posterior probabilities.

This, of course, is nothing like how human beings actually think. Even very intelligent, rational, numerate people only engage in a vague approximation of this behavior, and only when dealing with major decisions likely to affect the course of their lives. (Yes, I literally decide which universities to attend based upon formal expected utility models. Thus far, I’ve never been dissatisfied with a decision made that way.) No one decides what to eat for lunch or what to do this weekend based on formal expected utility models—or at least I hope they don’t, because at that point the computational cost far exceeds the expected benefit.

So how do human beings actually think about probability? Well, a good place to start is to look at ways in which we systematically deviate from expected utility theory.

A classic example is the Allais paradox. See if it applies to you.

In game A, you get $1 million, guaranteed.
In game B, you have a 10% chance of getting $5 million, an 89% chance of getting $1 million, but now you have a 1% chance of getting nothing.

Which do you prefer, game A or game B?

In game C, you have an 11% chance of getting $1 million, and an 89% chance of getting nothing.

In game D, you have a 10% chance of getting $5 million, and a 90% chance of getting nothing.

Which do you prefer, game C or game D?

I have to think about it for a little bit and do some calculations, and it’s still very hard, because it depends crucially on my projected lifetime income (which could easily exceed $3 million with a PhD, especially in economics) and on the precise form of my marginal utility (I think I have constant relative risk aversion, but I’m not sure what parameter to use precisely). In general I think I want to choose game A and game C, but I actually feel really ambivalent, because it’s not hard to find plausible parameters for my utility under which I should go for the gamble.

But if you’re like most people, you choose game A and game D.

There is no coherent expected utility by which you would do this.

Why? Either a 10% chance of $5 million instead of $1 million is worth risking a 1% chance of nothing, or it isn’t. If it is, you should play B and D. If it’s not, you should play A and C. I can’t tell you for sure whether it is worth it—I can’t even fully decide for myself—but it either is or it isn’t.
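That claim can be checked mechanically: for any utility function u whatsoever, EU(B) - EU(A) and EU(D) - EU(C) are the same number, so a preference for A forces a preference for C. A small sketch, using a few arbitrary placeholder utility functions of my own choosing:

```python
import math

def eu(lottery, u):
    """Expected utility of a lottery given as [(probability, dollar payoff), ...]."""
    return sum(p * u(x) for p, x in lottery)

A = [(1.00, 1_000_000)]
B = [(0.10, 5_000_000), (0.89, 1_000_000), (0.01, 0)]
C = [(0.11, 1_000_000), (0.89, 0)]
D = [(0.10, 5_000_000), (0.90, 0)]

# A few arbitrary placeholder utility functions, just to show the pattern
utilities = {
    "linear": lambda x: x,
    "log":    lambda x: math.log(x + 10_000),
    "sqrt":   lambda x: math.sqrt(x),
}

for name, u in utilities.items():
    print(f"{name:>6}:  EU(B)-EU(A) = {eu(B, u) - eu(A, u):+.6f}   "
          f"EU(D)-EU(C) = {eu(D, u) - eu(C, u):+.6f}")
# The two columns always match, so choosing A and D is inconsistent with every expected utility.
```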

Yet most people have a strong intuition that they should take game A but game D. Why? What does this say about how we judge probability?
The leading theory in behavioral economics right now is cumulative prospect theory, developed by the great Kahneman and Tversky, who essentially founded the field of behavioral economics. It’s quite intimidating to try to go up against them—which is probably why we should force ourselves to do it. Fear of challenging the favorite theories of the great scientists before us is how science stagnates.

I wrote about it more in a previous post, but as a brief review, cumulative prospect theory says that instead of judging based on a well-defined utility function, we instead consider gains and losses as fundamentally different sorts of thing, and in three specific ways:

First, we are loss-averse; we feel a loss about twice as intensely as a gain of the same amount.

Second, we are risk-averse for gains, but risk-seeking for losses; we assume that gaining twice as much isn’t actually twice as good (which is almost certainly true), but we also assume that losing twice as much isn’t actually twice as bad (which is almost certainly false, and indeed contradicts the previous assumption).

Third, we judge probabilities as more important when they are close to certainty. We make a large distinction between a 0% probability and a 0.0000001% probability, but almost no distinction at all between a 41% probability and a 43% probability.

That last part is what I want to focus on for today. In Kahneman’s model, this is a continuous, monotonic function that maps 0 to 0 and 1 to 1, but systematically overestimates probabilities below but near 1/2 and systematically underestimates probabilities above but near 1/2.

It looks something like this, where red is true probability and blue is subjective probability:

[Figure: cumulative prospect theory probability weighting function; true probability in red, subjective probability in blue]
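For reference, a common single-parameter version of that curve is the weighting function from Tversky and Kahneman’s 1992 paper; the sketch below uses their gain-domain estimate of γ ≈ 0.61 (this particular form crosses the diagonal near 1/3 rather than at 1/2, but it has the same flat-middle, steep-ends shape described above):

```python
def weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

for p in [0.001, 0.01, 0.10, 0.41, 0.43, 0.90, 0.99, 0.999]:
    print(f"p = {p:<6}  w(p) = {weight(p):.3f}")
# Small probabilities are weighted far above their true values, large ones below,
# and the curve is flattest in the middle: w(0.43) - w(0.41) is only about 0.01.
```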
I don’t believe this is actually how humans think, for two reasons:

  1. It’s too hard. Humans are astonishingly innumerate creatures, given the enormous processing power of our brains. It’s true that we have some intuitive capacity for “solving” very complex equations, but that’s almost all within our motor system—we can “solve a differential equation” when we catch a ball, but we have no idea how we’re doing it. But probability judgments are often made consciously, especially in experiments like the Allais paradox; and the conscious brain is terrible at math. It’s actually really amazing how bad we are at math. Any model of normal human judgment should assume from the start that we will not do complicated math at any point in the process. Maybe you can hypothesize that we do so subconsciously, but you’d better have a good reason for assuming that.
  2. There is no reason to do this. Why in the world would any kind of optimization system function this way? You start with perfectly good probabilities, and then instead of using them, you subject them to some bizarre, unmotivated transformation that makes them less accurate and costs computing power? You may as well hit yourself in the head with a brick.

So, why might it look like we are doing this? Well, my proposal, admittedly still rather half-baked, is that human beings don’t assign probabilities numerically at all; we assign them categorically.

You may call this, for lack of a better term, categorical prospect theory.

My theory is that people don’t actually have in their head “there is an 11% chance of rain today” (unless they specifically heard that from a weather report this morning); they have in their head “it’s fairly unlikely that it will rain today”.

That is, we assign some small number of discrete categories of probability, and fit things into them. I’m not sure what exactly the categories are, and part of what makes my job difficult here is that they may be fuzzy-edged and vary from person to person, but roughly speaking, I think they correspond to the sort of things psychologists usually put on Likert scales in surveys: Impossible, almost impossible, very unlikely, unlikely, fairly unlikely, roughly even odds, fairly likely, likely, very likely, almost certain, certain. If I’m putting numbers on these probability categories, they go something like this: 0, 0.001, 0.01, 0.10, 0.20, 0.50, 0.8, 0.9, 0.99, 0.999, 1.

Notice that this would preserve the same basic effect as cumulative prospect theory: You care a lot more about differences in probability when they are near 0 or 1, because those are much more likely to actually shift your category. Indeed, as written, you wouldn’t care about a shift from 0.4 to 0.6 at all, despite caring a great deal about a shift from 0.001 to 0.01.

How does this solve the above problems?

  1. It’s easy. Not only don’t you compute a probability and then recompute it for no reason; you never even have to compute it precisely. Just get it within some vague error bounds and that will tell you what box it goes in. Instead of computing an approximation to a continuous function, you just slot things into a small number of discrete boxes, a dozen at the most.
  2. That explains why we would do it: It’s easy. Our brains need to conserve their capacity, and needed to even more in our ancestral environment, when we struggled to survive. Rather than having to iterate your approximation to arbitrary precision, you just get within 0.1 or so and call it a day. That saves time and computing power, which saves energy, which could save your life.

What new problems have I introduced?

  1. It’s very hard to know exactly where people’s categories are, if they vary between individuals or even between situations, and whether they are fuzzy-edged.
  2. If you take the model I just gave literally, even quite large probability changes will have absolutely no effect as long as they remain within a category such as “roughly even odds”.

With regard to 2, I think Kahneman may himself be able to save me, with his dual process theory concept of System 1 and System 2. What I’m really asserting is that System 1, the fast, intuitive judgment system, operates on these categories. System 2, on the other hand, the careful, rational thought system, can actually make use of proper numerical probabilities; it’s just very costly to boot up System 2 in the first place, much less ensure that it actually gets the right answer.

How might we test this? Well, I think that people are more likely to use System 1 when any of the following are true:

  1. They are under harsh time-pressure
  2. The decision isn’t very important
  3. The intuitive judgment is fast and obvious

And conversely they are likely to use System 2 when the following are true:

  1. They have plenty of time to think
  2. The decision is very important
  3. The intuitive judgment is difficult or unclear

So, it should be possible to arrange an experiment varying these parameters, such that in one treatment people almost always use System 1, and in another they almost always use System 2. And then, my prediction is that in the System 1 treatment, people will in fact not change their behavior at all when you change the probability from 15% to 25% (fairly unlikely) or 40% to 60% (roughly even odds).

To be clear, you can’t just present people with this choice between game E and game F:

Game E: You get a 60% chance of $50, and a 40% chance of nothing.

Game F: You get a 40% chance of $50, and a 60% chance of nothing.

People will obviously choose game E. If you can directly compare the numbers and one game is strictly better in every way, I think even without much effort people will be able to choose correctly.

Instead, what I’m saying is that if you make the following offers to two completely different sets of people, you will observe little difference in their choices, even though under expected utility theory you should.
Group I receives a choice between game E and game G:

Game E: You get a 60% chance of $50, and a 40% chance of nothing.

Game G: You get a 100% chance of $20.

Group II receives a choice between game F and game G:

Game F: You get a 40% chance of $50, and a 60% chance of nothing.

Game G: You get a 100% chance of $20.

Under two very plausible assumptions about marginal utility of wealth, I can fix what the rational judgment should be in each game.

The first assumption is that marginal utility of wealth is decreasing, so people are risk-averse (at least for gains, which these are). The second assumption is that most people’s lifetime income is at least two orders of magnitude higher than $50.

By the first assumption, group II should choose game G. The expected income is precisely the same, and being even ever so slightly risk-averse should make you go for the guaranteed $20.

By the second assumption, group I should choose game E. Yes, there is some risk, but because $50 should not be a huge sum to you, your risk aversion should be small and the higher expected income of $30 should sway you.
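A quick sketch of those two claims, under an assumed log utility and an assumed baseline wealth of $1 million (any concave utility and any wealth far above $50 gives the same signs):

```python
import math

wealth = 1_000_000          # assumed baseline wealth, far larger than $50

def u(winnings):
    """Concave (log) utility, hence at least slightly risk-averse over these stakes."""
    return math.log(wealth + winnings)

def eu(lottery):
    return sum(p * u(x) for p, x in lottery)

E = [(0.60, 50), (0.40, 0)]
F = [(0.40, 50), (0.60, 0)]
G = [(1.00, 20)]

print("EU(E) - EU(G):", eu(E) - eu(G))   # positive: group I should rationally take E
print("EU(F) - EU(G):", eu(F) - eu(G))   # negative (barely): group II should take G
# The F-vs-G gap is minuscule, which is the point: the expected incomes are equal,
# so even the slightest risk aversion tips the rational choice to the sure $20.
```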

But I predict that most people will choose game G in both cases, and (within statistical error) the same proportion will choose F as chose E—thus showing that the difference between a 40% chance and a 60% chance was in fact negligible to their intuitive judgments.

However, this doesn’t actually disprove Kahneman’s theory; perhaps that part of the subjective probability function is just that flat. For that, I need to set up an experiment where I show discontinuity. I need to find the edge of a category and get people to switch categories sharply. Next week I’ll talk about how we might pull that off.