Games as economic simulations—and education tools

Mar 5, JDN 2457818 [Sun]

Moore’s Law is a truly astonishing phenomenon. Now that we are well into the 21st century (I’ve lived more of my life in the 21st century than the 20th now!) it may finally be slowing down a little bit, but it has had quite a run, and even this could be a temporary slowdown due to economic conditions or the lull before a new paradigm (quantum computing?) matures. Since at least 1975, the computing power of an individual processor has doubled approximately every year and a half; that means it has doubled over 25 times, or in other words it has increased by a factor of over 30 million. I now have in my pocket a smartphone with several thousand times the processing speed of the Apollo Guidance Computer that landed astronauts on the Moon.
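The doubling arithmetic is easy to check. Here is a minimal sketch in Python, assuming an 18-month doubling period and taking 2017 (the year this post is dated) as "now"; the function name is just for illustration.

```python
def doublings(start_year, end_year, period_years=1.5):
    """Number of doubling periods that fit between two years."""
    return (end_year - start_year) / period_years

# Assumption: 2017 is taken as "now".
n = doublings(1975, 2017)
print(n)             # 28.0 doublings of 18 months each
print(2 ** 25)       # 33554432, so 25 doublings already exceed 30 million
print(2 ** int(n))   # growth factor if every doubling since 1975 happened
```

So "over 25 times" and "over 30 million" are, if anything, conservative under these assumptions.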

This meteoric increase in computing power has had an enormous impact on the way science is done, including economics. Simple theoretical models that could be solved by hand are now being replaced by enormous simulation models that have to be processed by computers. It is now commonplace to devise models with systems of dozens of nonlinear equations that are literally impossible to solve analytically, and just solve them iteratively with computer software.

But one application of this technology that I believe is currently underutilized is video games.

As a culture, we still have the impression that video games are for children; even games like Dragon Age and Grand Theft Auto that are explicitly for adults (and really quite inappropriate for children!) are viewed as in some sense “childish”—that no serious adult would be involved with such frivolities. The same cultural critics who treat Shakespeare’s vagina jokes as the highest form of art are liable to dismiss the poignant critique of war in Call of Duty: Black Ops or the reflections on cultural diversity in Skyrim as mere puerility.

But video games are an art form with a fundamentally greater potential than any other. Now that graphics are almost photorealistic, there is really nothing you can do in a play or a film that you can’t do in a video game, and there is so, so much more that you can only do in a game.

In what other medium can we witness the spontaneous emergence and costly aftermath of a war? Yet EVE Online has this sort of event every year or so. Just today there was a surprise attack involving hundreds of players that destroyed thousands of hours’ (and dollars’) worth of starships, something that has more or less become an annual tradition. A few years ago there was a massive three-faction war that destroyed over $300,000 in ships and has now been commemorated as “the Bloodbath of B-R5RB”.

Indeed, the immersion and interactivity of games present an opportunity to do nothing less than experimental macroeconomics. For generations it has been impossible, or at least absurdly unethical, to ever experimentally manipulate an entire macroeconomy. But in a video game like EVE Online or Second Life, we can now do so easily, cheaply, and with little or no long-term harm to the participants, and we can literally control everything in the experiment. Forget the natural resource constraints and currency exchange rates; we can change the laws of physics if we want. (Indeed, EVE’s whole trade network is built around FTL jump points, and in Second Life it’s a basic part of the interface that everyone can fly like Superman.)

This provides untold potential for economic research. With sufficient funding, we could build a game that would allow us to directly test hypotheses about the most fundamental questions of economics: How do governments emerge and maintain security? How is the rule of law sustained, and when can it be broken? What controls the value of money and the rate of inflation? What is the fundamental cause of unemployment, and how can it be corrected? What influences the rate of technological development? How can we maximize the rate of economic growth? What effect does redistribution of wealth have on employment and output? I envision a future where we can directly simulate these questions with thousands of eager participants, varying the subtlest of parameters and carrying out events over any timescale we like from seconds to centuries.

Nor is the potential of games in economics limited to research; it also has enormous untapped potential in education. I’ve already seen in my classes how tabletop-style games with poker chips can teach a concept better in a few minutes than hours of writing algebra derivations on the board; but custom-built video games could be made that would teach economics far better still, and to a much wider audience. In a well-designed game, people could really feel the effects of free trade or protectionism, not just on themselves as individuals but on entire nations that they control—watch their GDP numbers go down as they scramble to produce in autarky what they could have bought for half the price if not for the tariffs. They could see, in real time, how in the absence of environmental regulations and Pigovian taxes the actions of millions of individuals could despoil our planet for everyone.

Of course, games are fundamentally works of fiction, subject to the Fictional Evidence Fallacy and only as reliable as their authors make them. But so it is with all forms of art. I have no illusions about the fact that we will never get the majority of the population to regularly read peer-reviewed empirical papers. But perhaps if we are clever enough in the games we offer them to play, we can still convey some of the knowledge that those papers contain. We could also update and expand the games as new information comes in. Instead of complaining that our students are spending time playing games on their phones and tablets, we could actually make education into games that are as interesting and entertaining as the ones they would have been playing. We could work with the technology instead of against it. And in a world where more people have access to a smartphone than to a toilet, we could finally bring high-quality education to the underdeveloped world quickly and cheaply.

Rapid growth in computing power has given us a gift of great potential. But soon our capacity will widen even further. Even if Moore’s Law slows down, computing power will continue to increase for a while yet. Soon enough, virtual reality will finally take off and we’ll have even greater depth of immersion available. The future is bright, if we can avoid this corporatist cyberpunk dystopia we seem to be hurtling toward, of course.

Experimentally testing categorical prospect theory

Dec 4, JDN 2457727

In last week’s post I presented a new theory of probability judgments, which doesn’t rely upon people performing complicated math even subconsciously. Instead, I hypothesize that people try to assign categories to their subjective probabilities, and throw away all the information that wasn’t used to assign that category.

The way to most clearly distinguish this from cumulative prospect theory is to show discontinuity. Kahneman’s smooth, continuous function places fairly strong bounds on just how much a shift from 0% to 0.000001% can really affect your behavior. In particular, if you want to explain the fact that people do seem to behave differently around 10% compared to 1% probabilities, you can’t allow the slope of the smooth function to get much higher than 10 at any point, even near 0 and 1. (It does depend on the precise form of the function, but the more complicated you make it, the more free parameters you add to the model. In the most parsimonious form, which is a cubic polynomial, the maximum slope is actually much smaller than this—only 2.)

If that’s the case, then switching from 0% to 0.0001% should have no more effect in reality than a switch from 0% to 0.00001% would to a rational expected utility optimizer. But in fact I think I can set up scenarios where it would have a larger effect than a switch from 0.001% to 0.01%.

Indeed, these games are already quite profitable for the majority of US states, and they are called lotteries.

Rationally, it should make very little difference to you whether your odds of winning the Powerball are 0 (you bought no ticket) or about 10^-9, that is 0.0000001% (you bought a ticket), even when the prize is $100 million. This is because your utility of $100 million is nowhere near 100 million times as large as your marginal utility of $1. A good guess would be that your lifetime income is about $2 million, your utility is logarithmic, the units of utility are hectoQALY, and the baseline level is about $100,000.

I apologize for the extremely large number of decimals, but I had to do that in order to show any difference at all. The two values first deviate from each other in the ninth decimal place.

Your utility if you don’t have a ticket is ln(20) = 2.9957322736 hQALY.

Your utility if you have a ticket is (1-10^-9) ln(20) + 10^-9 ln(1020) = 2.9957322775 hQALY.

You gain a whopping 0.4 microQALY over your whole lifetime (about 4*10^-9 hQALY, at 100 QALY per hQALY). I highly doubt you could even perceive such a difference.

And yet, people are willing to pay nontrivial sums for the chance to play such lotteries. Powerball tickets sell for about $2 each, and some people buy tickets every week. If you do that and live to be 80, you will spend some $8,000 on lottery tickets during your lifetime, which results in this expected utility: (1-4*10^-6) ln(20-0.08) + 4*10^-6 ln(1020) = 2.9917399955 hQALY.

You have now sacrificed 0.004 hectoQALY, which is to say 0.4 QALY; that’s months of happiness you’ve given up to play this stupid pointless game.

Which shouldn’t be surprising, as (with 99.9996% probability) you have given up four months of your lifetime income with nothing to show for it. Lifetime income of $2 million / lifespan of 80 years = $25,000 per year; $8,000 / $25,000 = 0.32. You’ve actually sacrificed slightly more than this, which comes from your risk aversion.
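For anyone who wants to check these numbers, here is a minimal sketch in Python. It assumes exactly the setup above: log utility in hectoQALY, a $100,000 baseline, $2 million lifetime income, a $100 million prize, and a 10^-9 chance per ticket. As in the formulas above, the ticket cost is only subtracted in the losing branch. The function and constant names are mine, chosen for illustration.

```python
import math

BASELINE = 100_000           # baseline wealth level, in dollars
LIFETIME_INCOME = 2_000_000  # assumed lifetime income
PRIZE = 100_000_000          # jackpot
P_WIN = 1e-9                 # per-ticket win probability used above

def utility(wealth):
    """Log utility in hectoQALY, measured relative to the baseline."""
    return math.log(wealth / BASELINE)

def expected_utility(p_win, cost=0.0):
    """Expected utility of holding tickets with total win probability p_win."""
    lose = utility(LIFETIME_INCOME - cost)   # ticket money is gone
    win = utility(LIFETIME_INCOME + PRIZE)   # cost ignored here, as above
    return (1 - p_win) * lose + p_win * win

no_ticket = expected_utility(0.0)
one_ticket = expected_utility(P_WIN)
weekly = expected_utility(4e-6, cost=8_000)  # a ticket a week for 80 years

print(f"{no_ticket:.10f} {one_ticket:.10f} {weekly:.10f}")
# 1 hQALY = 100 QALY
print("gain from one ticket, in QALY:", (one_ticket - no_ticket) * 100)
print("loss from weekly play, in QALY:", (no_ticket - weekly) * 100)
```

The weekly-play loss comes out a bit larger than the 0.32 years of income spent, which is the risk-aversion effect of the logarithm.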

Why would anyone do such a thing? Because while the difference between 0 and 10^-9 may be trivial, the difference between “impossible” and “almost impossible” feels enormous. “You can’t win if you don’t play!” they say, but they might as well say “You can’t win if you do play either.” Indeed, the probability of winning without playing isn’t zero; you could find a winning ticket lying on the ground, or win due to an error that is then upheld in court, or be given the winnings bequeathed by a dying family member or gifted by an anonymous donor. These are of course vanishingly unlikely—but so was winning in the first place. You’re talking about the difference between 10^-9 and 10^-12, which in proportional terms sounds like a lot—but in absolute terms is nothing. If you drive to a drug store every week to buy a ticket, you are more likely to die in a car accident on the way to the drug store than you are to win the lottery.

Of course, these are not experimental conditions. So I need to devise a similar game, with smaller stakes but still large enough for people’s brains to care about the “almost impossible” category; maybe thousands? It’s not uncommon for an economics experiment to cost thousands, it’s just usually paid out to many people instead of randomly to one person or nobody. Conducting the experiment in an underdeveloped country like India would also effectively amplify the amounts paid, but at the fixed cost of transporting the research team to India.

But I think in general terms the experiment could look something like this. You are given $20 for participating in the experiment (we treat it as already given to you, to maximize your loss aversion and endowment effect and thereby give us more bang for our buck). You then have a chance to play a game, where you pay $X to get a P probability of $Y*X, and we vary these numbers.

The actual participants wouldn’t see the variables, just the numbers and possibly the rules: “You can pay $2 for a 1% chance of winning $200. You can also play multiple times if you wish.” “You can pay $10 for a 5% chance of winning $250. You can only play once or not at all.”
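It helps to know what a risk-neutral player would make of each offer. A minimal sketch in Python, using the two example offers above; the function names are hypothetical, and the multiplier is just the Y from "$Y*X".

```python
def expected_net_gain(price, p_win, prize):
    """Expected dollar gain from one play: p_win * prize - price."""
    return p_win * prize - price

def multiplier(price, prize):
    """The Y in 'pay $X for a P chance of $Y*X'."""
    return prize / price

offers = [
    (2, 0.01, 200),   # "pay $2 for a 1% chance of winning $200"
    (10, 0.05, 250),  # "pay $10 for a 5% chance of winning $250"
]
for price, p_win, prize in offers:
    print(price, p_win, multiplier(price, prize),
          round(expected_net_gain(price, p_win, prize), 2))
```

The first offer is actuarially fair (expected gain of zero) and the second is favorable by $2.50 per play, so any difference in how people treat them has to come from how they perceive the probabilities and stakes, not from the expected values alone.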

So I think the first step is to find some dilemmas, cases where people feel ambivalent, and different people differ in their choices. That’s a good role for a pilot study.

Then we take these dilemmas and start varying their probabilities slightly.

In particular, we try to vary them at the edge of where people have mental categories. If subjective probability is continuous, a slight change in actual probability should never result in a large change in behavior, and furthermore the effect of a change shouldn’t vary too much depending on where the change starts.

But if subjective probability is categorical, these categories should have edges. Then, when I present you with two dilemmas that are on opposite sides of one of the edges, your behavior should radically shift; while if I change it in a different way, I can make a large change without changing the result.

Based solely on my own intuition, I guessed that the categories roughly follow this pattern:

Impossible: 0%

Almost impossible: 0.1%

Very unlikely: 1%

Unlikely: 10%

Fairly unlikely: 20%

Roughly even odds: 50%

Fairly likely: 80%

Likely: 90%

Very likely: 99%

Almost certain: 99.9%

Certain: 100%

So for example, if I switch from 0% to 0.01%, it should have a very large effect, because I’ve moved you out of your “impossible” category (indeed, I think the “impossible” category is almost completely sharp; literally anything above zero seems to be enough for most people, even 10^-9 or 10^-10). But if I move from 1% to 2%, it should have a small effect, because I’m still well within the “very unlikely” category. Yet the latter change is literally one hundred times larger than the former. It is possible to define continuous functions that would behave this way to an arbitrary level of approximation, but they get a lot less parsimonious very fast.
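To make the idea concrete, here is a minimal sketch in Python of what a categorical judgment could look like. The anchor values are my guesses from the list above. The assignment rule (exact 0 and 1 are sharp, everything else goes to the nearest anchor in log-odds) is purely an assumption for illustration, since where the edges actually fall is exactly what the experiment would have to find out.

```python
import math

# Anchor values guessed above; the assignment rule below is hypothetical.
CATEGORIES = [
    ("impossible", 0.0),
    ("almost impossible", 0.001),
    ("very unlikely", 0.01),
    ("unlikely", 0.10),
    ("fairly unlikely", 0.20),
    ("roughly even odds", 0.50),
    ("fairly likely", 0.80),
    ("likely", 0.90),
    ("very likely", 0.99),
    ("almost certain", 0.999),
    ("certain", 1.0),
]

def log_odds(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def categorize(p):
    """Map a probability to a category label.

    Assumption: 0 and 1 are perfectly sharp categories; any other value is
    assigned to the interior anchor nearest to it in log-odds.
    """
    if p <= 0.0:
        return "impossible"
    if p >= 1.0:
        return "certain"
    interior = CATEGORIES[1:-1]
    return min(interior, key=lambda c: abs(log_odds(p) - log_odds(c[1])))[0]

print(categorize(0.0), "->", categorize(0.0001))  # tiny change, new category
print(categorize(0.01), "->", categorize(0.02))   # 100x larger change, same category
```

Under this rule the move from 0% to 0.01% crosses an edge while the move from 1% to 2% does not, which is the pattern of behavior the experiment would look for.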

Now, immediately I run into a problem, because I’m not even sure those are my categories, much less that they are everyone else’s. If I knew precisely which categories to look for, I could tell whether or not I had found it. But the process of both finding the categories and determining if their edges are truly sharp is much more complicated, and requires a lot more statistical degrees of freedom to get beyond the noise.

One thing I’m considering is assigning these values as a prior, and then conducting a series of experiments which would adjust that prior. In effect I would be using optimal Bayesian probability reasoning to show that human beings do not use optimal Bayesian probability reasoning. Still, I think that actually pinning down the categories would require a large number of participants or a long series of experiments (in frequentist statistics this distinction is vital; in Bayesian statistics it is basically irrelevant—one of the simplest reasons to be Bayesian is that it no longer bothers you whether someone did 2 experiments of 100 people or 1 experiment of 200 people, provided they were the same experiment of course). And of course there’s always the possibility that my theory is totally off-base, and I find nothing; a dissertation replicating cumulative prospect theory is a lot less exciting (and, sadly, less publishable) than one refuting it.

Still, I think something like this is worth exploring. I highly doubt that people are doing very much math when they make most probabilistic judgments, and using categories would provide a very good way for people to make judgments usefully with no math at all.

What is the price of time?

JDN 2457562

If asked outright, “What is the price of time?” most people would find the question nonsensical, as if I had asked “What is the diameter of calculus?” or “What is the electric charge of justice?” (It’s interesting that we generally try to assign meaning to such nonsensical questions, and they often seem strangely profound when we do; a good deal of what passes for “profound wisdom” is really better explained as this sort of reaction to nonsense. Deepak Chopra, for instance.)

But there is actually a quite sensible economic meaning of this question, and answering it turns out to have many important implications for how we should run our countries and how we should live our lives.

What we are really asking for is temporal discounting; we want to know how much more money today is worth compared to tomorrow, and how much more money tomorrow is worth compared to two days from now.

If you say that they are exactly the same, your discount rate (your “price of time”) is zero; if that is indeed how you feel, may I please borrow your entire net wealth at 0% interest for the next thirty years? If you like we can even inflation-index the interest rate so it always produces a real interest rate of zero, thus protecting you from potential inflation risk.

What? You don’t like my deal? You say you need that money sooner? Then your discount rate is not zero. Similarly, it can’t be negative; if you actually valued money tomorrow more than money today, you’d gladly give me my loan.

Money today is worth more to you than money tomorrow—the only question is how much more.

There’s a very simple theorem which says that as long as your temporal discounting doesn’t change over time, so it is dynamically consistent, it must have a very specific form. I don’t normally use math this advanced in my blog, but this one is so elegant I couldn’t resist. I’ll encase it in blockquotes so you can skim over it if you must.

The value of $1 today relative to… today is of course 1; f(0) = 1.

If you are dynamically consistent, at any time t you should discount tomorrow relative to today the same as you discounted today relative to yesterday, so for all t:

f(t+1)/f(t) = f(t)/f(t-1)

Thus, f(t+1)/f(t) is independent of t, and therefore equal to some constant, which we can call r:

f(t+1)/f(t) = r, which implies f(t+1) = r f(t).

Starting at f(0) = 1, we have:

f(0) = 1, f(1) = r, f(2) = r^2

We can prove that this pattern continues to hold by mathematical induction.

Suppose the following is true for some integer k; we already know it works for k = 1:

f(k) = r^k

Let t = k:

f(k+1) = r f(k)

Therefore:

f(k+1) = r^(k+1)

Which by induction proves that for all integers n:

f(n) = r^n

The name of the variable doesn’t matter. Therefore:

f(t) = r^t

Whether you agree with me that this is beautiful, or you have no idea what I just said, the take-away is the same: If your discount rate is consistent over time, it must be exponential. There must be some constant number 0 < r < 1 such that each successive time period is worth r times as much as the previous. (You can also generalize this to the case of continuous time, where instead of r^t you get e^(-k t) for a constant rate k = -ln r > 0. This requires even more advanced math, so I’ll spare you.)

Most neoclassical economists would stop right there. But there are two very big problems with this argument:

(1) It doesn’t tell us the value r should actually be, only that it should be a constant.

(2) No actual human being thinks of time this way.

There is still ongoing research as to exactly how real human beings discount time, but one thing is quite clear from the experiments: It certainly isn’t exponential.

From about 2000 to 2010, the consensus among cognitive economists was that humans discount time hyperbolically; that is, our discount function looks like this:

f(t) = 1/(1 + r t)

In the 1990s there were a couple of experiments supporting hyperbolic discounting. There is even some theoretical work trying to show that this is actually optimal, given a certain kind of uncertainty about the future, and the argument for exponential discounting relies upon certainty we don’t actually have. Hyperbolic discounting could also result if we were reasoning as though we are given a simple interest rate, rather than a compound interest rate.
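To see what dynamic consistency rules out, compare the two functional forms directly. This is a minimal sketch in Python; the parameter values (r = 0.9 per period for the exponential, r = 0.2 for the hyperbolic) and the $100 versus $110 choice are assumptions picked only to make the contrast visible.

```python
def exponential(t, r=0.9):
    """Exponential discount factor f(t) = r**t, with 0 < r < 1."""
    return r ** t

def hyperbolic(t, r=0.2):
    """Hyperbolic discount factor f(t) = 1 / (1 + r*t)."""
    return 1.0 / (1.0 + r * t)

def prefers_later(f, sooner, later, delay, gap=1):
    """True if `later` dollars at delay+gap beats `sooner` dollars at delay."""
    return later * f(delay + gap) > sooner * f(delay)

for name, f in [("exponential", exponential), ("hyperbolic", hyperbolic)]:
    # One-period discount ratio f(t+1)/f(t) at several horizons.
    ratios = [round(f(t + 1) / f(t), 3) for t in (0, 10, 100)]
    now = prefers_later(f, 100, 110, delay=0)
    far = prefers_later(f, 100, 110, delay=50)
    print(name, ratios, "waits now:", now, "waits 50 periods out:", far)
```

With these numbers the exponential discounter makes the same choice at both horizons, because the one-period ratio never changes. The hyperbolic discounter takes the $100 when it is immediate but is willing to wait for the $110 when both payments are far away, so the ranking flips as the date approaches.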

But even that doesn’t really seem like how humans think, now does it? It’s already weird enough for someone to say “Should I take out this loan at 5%? Well, my discount rate is 7%, so yes.” But I can at least imagine that happening when people are comparing two different interest rates (“Should I pay down my student loans, or my credit cards?”). But I can’t imagine anyone thinking, “Should I take out this loan at 5% APR which I’d need to repay after 5 years? Well, let’s check my discount function, 1/(1+0.05 (5)) = 0.8, multiplied by 1.05^5 = 1.28, the product of which is 1.02, greater than 1, so no, I shouldn’t.” That isn’t how human brains function.

Moreover, recent experiments have shown that people often don’t seem to behave according to what hyperbolic discounting would predict.

Therefore I am very much in the other camp of cognitive economists, who say that we don’t have a well-defined discount function. It’s not exponential, it’s not hyperbolic, it’s not “quasi-hyperbolic” (yes that is a thing); we just don’t have one. We reason about time by simple heuristics. You can’t make a coherent function out of it because human beings… don’t always reason coherently.

Some economists seem to have an incredible amount of trouble accepting that; here we have one from the University of Chicago arguing that hyperbolic discounting can’t possibly exist, because then people could be Dutch-booked out of all their money; but this amounts to saying that human behavior cannot ever be irrational, lest all our money magically disappear. Yes, we know hyperbolic discounting (and heuristics) allow for Dutch-booking; that’s why they’re irrational. If you really want to know the formal assumption this paper makes that is wrong, it assumes that we have complete markets—and yes, complete markets essentially force you to be perfectly rational or die, because the slightest inconsistency in your reasoning results in someone convincing you to bet all your money on a sure loss. Why was it that we wanted complete markets, again? (Oh, yes, the fanciful Arrow-Debreu model, the magical fairy land where everyone is perfectly rational and all markets are complete and we all have perfect information and the same amount of wealth and skills and the same preferences, where everything automatically achieves a perfect equilibrium.)

There was a very good experiment on this, showing that rather than discount hyperbolically, behavior is better explained by a heuristic that people judge which of two options is better by a weighted sum of the absolute distance in time plus the relative distance in time. Now that sounds like something human beings might actually do. “$100 today or $110 tomorrow? That’s only 1 day away, but it’s also twice as long. I’m not waiting.” “$100 next year, or $110 in a year and a day? It’s only 1 day apart, and it’s only slightly longer, so I’ll wait.”

That might not actually be the precise heuristic we use, but it at least seems like one that people could use.
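Here is a minimal sketch in Python of how such a heuristic could be written down. Everything specific in it is my assumption rather than the experiment’s actual model: the weights, measuring the relative gap against the midpoint of the two dates, and comparing the time penalty to the proportional gain in money.

```python
def time_penalty(t_soon, t_late, w_abs=0.001, w_rel=0.1):
    """Weighted sum of the absolute and relative time gaps (in days).

    Assumptions: the weights are made up, and the relative gap is taken
    against the midpoint of the two dates. Requires t_late > t_soon >= 0.
    """
    abs_gap = t_late - t_soon
    rel_gap = abs_gap / ((t_soon + t_late) / 2)
    return w_abs * abs_gap + w_rel * rel_gap

def waits(x_soon, x_late, t_soon, t_late):
    """True if the proportional gain in money outweighs the time penalty."""
    reward_gain = (x_late - x_soon) / x_soon
    return reward_gain > time_penalty(t_soon, t_late)

print(waits(100, 110, 0, 1))      # $100 today or $110 tomorrow
print(waits(100, 110, 365, 366))  # $100 next year or $110 a day later
```

With these made-up weights the rule refuses to wait in the first case and waits in the second, matching the two quoted reactions, and nobody had to evaluate a discount function to get there.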

John Duffy, whom I hope to work with at UCI starting this fall, has been working on another experiment to test a different heuristic, based on the work of Daniel Kahneman, saying essentially that we have a fast, impulsive, System 1 reasoning layer and a slow, deliberative, System 2 reasoning layer; the result is that our judgments combine both “hand to mouth” where our System 1 essentially tries to get everything immediately and spend whatever we can get our hands on, and a more rational assessment by System 2 that might actually resemble an exponential discount rate. In the 5-minute judgment, System 1’s voice is overwhelming; but if we’re already planning a year out, System 1 doesn’t even care anymore and System 2 can take over. This model also has the nice feature of explaining why people with better self-control seem to behave more like they use exponential discounting, and why people do on occasion reason more or less exponentially, while I have literally never heard anyone try to reason hyperbolically, only economic theorists trying to use hyperbolic models to explain behavior.

Another theory is that discounting is “subadditive”, that is, if you break up a long time interval into many short intervals, people will discount it more, because it feels longer that way. Imagine a century. Now imagine a year, another year, another year, all the way up to 100 years. Now imagine a day, another day, another day, all the way up to 365 days for the first year, and then 365 days for the second year, and so on and on up to 100 years. It feels longer, doesn’t it? It is of course exactly the same. This can account for some weird anomalies in choice behavior, but I’m not convinced it’s as good as the two-system model.

Another theory is that we simply have a “present bias”, which we treat as a sort of fixed cost that we incur regardless of what the payments are. I like this because it is so supremely simple, but there’s something very fishy about it, because in this experiment it was just fixed at $4, and that can’t be right. It must be fixed at some proportion of the rewards, or something like that; or else we would always exhibit near-perfect exponential discounting for large amounts of money, which is more expensive to test (quite directly), but still seems rather unlikely.

Why is this important? This post is getting long, so I’ll save it for future posts, but in short, the ways that we value future costs and benefits, both as we actually do, and as we ought to, have far-reaching implications for everything from inflation to saving to environmental sustainability.

Why it matters that torture is ineffective

JDN 2457531

Like “longest-ever-serving Speaker of the House sexually abuses teenagers” and “NSA spy program is trying to monitor the entire telephone and email system”, the news that the US government systematically tortures suspects is an egregious violation that goes to the highest levels of our government—that for some reason most Americans don’t particularly seem to care about.

The good news is that President Obama signed an executive order in 2009 banning torture domestically, reversing official policy under the Bush Administration, and then better yet in 2014 expanded the order to apply to all US interests worldwide. If this is properly enforced, perhaps our history of hypocrisy will finally be at its end. (Well, not if Trump wins…)

Yet as often seems to happen, there are two extremes in this debate and I think they’re both wrong.

The really disturbing side is “Torture works and we have to use it!” The preferred mode of argumentation for this is the “ticking time bomb scenario”, in which we have some urgent disaster to prevent (such as a nuclear bomb about to go off) and torture is the only way to stop it from happening. Surely then torture is justified? This argument may sound plausible, but as I’ll get to below, this is a lot like saying, “If aliens were attacking from outer space trying to wipe out humanity, nuclear bombs would probably be justified against them; therefore nuclear bombs are always justified and we can use them whenever we want.” If you can’t wait for my explanation, The Atlantic skewers the argument nicely.

Yet the opponents of torture have brought this sort of argument on themselves, by staking out a position so extreme as “It doesn’t matter if torture works! It’s wrong, wrong, wrong!” This kind of simplistic deontological reasoning is very appealing and intuitive to humans, because it casts the world into simple black-and-white categories. To show that this is not a strawman, here are several different people all making this same basic argument, that since torture is illegal and wrong it doesn’t matter if it works and there should be no further debate.

But the truth is, if it really were true that the only way to stop a nuclear bomb from leveling Los Angeles was to torture someone, it would be entirely justified—indeed obligatory—to torture that suspect and stop that nuclear bomb.

The problem with that argument is not just that this is not our usual scenario (though it certainly isn’t); it goes much deeper than that:

That scenario makes no sense. It wouldn’t happen.

To use the example the late Antonin Scalia used from an episode of 24 (perhaps the most egregious Fictional Evidence Fallacy ever committed), if there ever is a nuclear bomb planted in Los Angeles, that would literally be one of the worst things that ever happened in the history of the human race—literally a Holocaust in the blink of an eye. We should be prepared to cause extreme suffering and death in order to prevent it. But not only is that event (fortunately) very unlikely, torture would not help us.

Why? Because torture just doesn’t work that well.

It would be too strong to say that it doesn’t work at all; it’s possible that it could produce some valuable intelligence—though clear examples of such results are amazingly hard to come by. There are some social scientists who have found empirical results showing some effectiveness of torture, however. We can’t say with any certainty that it is completely useless. (For obvious reasons, a randomized controlled experiment in torture is wildly unethical, so none have ever been attempted.) But to justify torture it isn’t enough that it could work sometimes; it has to work vastly better than any other method we have.

And our empirical data is in fact reliable enough to show that that is not the case. Torture often produces unreliable information, as we would expect from the game theory involved—your incentive is to stop the pain, not provide accurate intel; the psychological trauma that torture causes actually distorts memory and reasoning; and as a matter of fact basically all the useful intelligence obtained in the War on Terror was obtained through humane interrogation methods. As interrogation experts agree, torture just isn’t that effective.

In principle, there are four basic cases to consider:

1. Torture is vastly more effective than the best humane interrogation methods.

2. Torture is slightly more effective than the best humane interrogation methods.

3. Torture is as effective as the best humane interrogation methods.

4. Torture is less effective than the best humane interrogation methods.

The evidence points most strongly to case 4, which would make rejecting torture a no-brainer; if it doesn’t even work as well as other methods, it’s absurd to use it. You’re basically kicking puppies at that point: purely sadistic violence that accomplishes nothing. But the data isn’t clear enough for us to rule out case 3 or even case 2. There is only one case we can strictly rule out, and that is case 1.

But it was only in case 1 that torture could ever be justified!

If you’re trying to justify doing something intrinsically horrible, it’s not enough that it has some slight benefit.

People seem to have this bizarre notion that we have only two choices in morality:

Either we are strict deontologists, and wrong actions can never be justified by good outcomes ever, in which case apparently vaccines are morally wrong, because stabbing children with needles is wrong. To be fair, some people seem to actually believe this; but then, some people believe the Earth is less than 10,000 years old.

Or alternatively we are the bizarre strawman concept most people seem to have of utilitarianism, under which any wrong action can be justified by even the slightest good outcome, in which case all you need to do to justify slavery is show that it would lead to a 1% increase in per-capita GDP. Sadly, there honestly do seem to be economists who believe this sort of thing. Here’s one arguing that US chattel slavery was economically efficient, and some of the more extreme arguments for why sweatshops are good can take on this character. Sweatshops may be a necessary evil for the time being, but they are still an evil.

But what utilitarianism actually says (and I consider myself some form of nuanced rule-utilitarian, though actually I sometimes call it “deontological consequentialism” to emphasize that I mean to synthesize the best parts of the two extremes) is not that the ends always justify the means, but that the ends can justify the means—that it can be morally good or even obligatory to do something intrinsically bad (like stabbing children with needles) if it is the best way to accomplish some greater good (like saving them from measles and polio). But the good actually has to be greater, and it has to be the best way to accomplish that good.

To see why this latter proviso is important, consider the real-world ethical issues involved in psychology experiments. The benefits of psychology experiments are already quite large, and poised to grow as the science improves; one day the benefits of cognitive science to humanity may be even larger than the benefits of physics and biology are today. Imagine a world without mood disorders or mental illness of any kind; a world without psychopathy, where everyone is compassionate; a world where everyone is achieving their full potential for happiness and self-actualization. Cognitive science may yet make that world possible—and I haven’t even gotten into its applications in artificial intelligence.

To achieve that world, we will need a great many psychology experiments. But does that mean we can just corral people off the street and throw them into psychology experiments without their consent—or perhaps even their knowledge? That we can do whatever we want in those experiments, as long as it’s scientifically useful? No, it does not. We have ethical standards in psychology experiments for a very good reason, and while those ethical standards do slightly reduce the efficiency of the research process, the reduction is small enough that the moral choice is obviously to retain the ethics committees and accept the slight reduction in research efficiency. Yes, randomly throwing people into psychology experiments might actually be slightly better in purely scientific terms (larger and more random samples)—but it would be terrible in moral terms.

Along similar lines, even if torture works about as well as or even slightly better than other methods, that’s simply not enough to justify it morally. Making a successful interrogation take 16 days instead of 17 simply wouldn’t be enough benefit to justify the psychological trauma to the suspect (and perhaps the interrogator!), the risk of harm to the falsely accused, or the violation of international human rights law. And in fact a number of terrorism suspects were waterboarded for months, so even the idea that it could shorten the interrogation is pretty implausible. If anything, torture seems to make interrogations take longer and give less reliable information—case 4.

A lot of people seem to have this impression that torture is amazingly, wildly effective, that a suspect who won’t crack after hours of humane interrogation can be tortured for just a few minutes and give you all the information you need. This is exactly what we do not find empirically; if he didn’t crack after hours of talk, he won’t crack after hours of torture. If you literally only have 30 minutes to find the nuke in Los Angeles, I’m sorry; you’re not going to find the nuke in Los Angeles. No adversarial interrogation is ever going to be completed that quickly, no matter what technique you use. Evacuate as many people to safe distances or underground shelters as you can in the time you have left.

This is why the “ticking time-bomb” scenario is so ridiculous (and so insidious); that’s simply not how interrogation works. The best methods we have for “rapid” interrogation of hostile suspects take hours or even days, and they are humane—building trust and rapport is the most important step. The goal is to get the suspect to want to give you accurate information.

For the purposes of the thought experiment, okay, you can stipulate that it would work (this is what the Stanford Encyclopedia of Philosophy does). But now all you’ve done is made the thought experiment more distant from the real-world moral question. The closest real-world examples we’ve ever had involved individual crimes, probably too small to justify the torture (as bad as a murdered child is, think about what you’re doing if you let the police torture people). But by the time the terrorism to be prevented is large enough to really be sufficient justification, it (1) hasn’t happened in the real world and (2) surely involves terrorists who are sufficiently ideologically committed that they’ll be able to resist the torture. If such a situation arises, of course we should try to get information from the suspects—but what we try should be our best methods, the ones that work most consistently, not the ones that “feel right” and maybe happen to work on occasion.

Indeed, the best explanation I have for why people use torture at all, given its horrible effects and mediocre effectiveness at best, is that it feels right.

When someone does something terrible (such as an act of terrorism), we rightfully reduce our moral valuation of them relative to everyone else. If you are even tempted to deny this, suppose a terrorist and a random civilian are both inside a burning building and you only have time to save one. Of course you save the civilian and not the terrorist. And that’s still true even if you know that once the terrorist was rescued he’d go to prison and never be a threat to anyone else. He’s just not worth as much.

In the most extreme circumstances, a person can be so terrible that their moral valuation should be effectively zero: If the only person in a burning building is Stalin, I’m not sure you should save him even if you easily could. But it is a grave moral mistake to think that a person’s moral valuation should ever go negative, yet I think this is something that people do when confronted with someone they truly hate. The federal agents torturing those terrorists didn’t merely think of them as worthless—they thought of them as having negative worth. They felt it was a positive good to harm them. But this is fundamentally wrong; no sentient being has negative worth. Some may be so terrible as to have essentially zero worth; and we are often justified in causing harm to some in order to save others. It would have been entirely justified to kill Stalin (as a matter of fact he died of heart disease and old age), to remove the continued threat he posed; but to torture him would not have made the world a better place, and actually might well have made it worse.

Yet I can see how psychologically it could be useful to have a mechanism in our brains that makes us hate someone so much we view them as having negative worth. It makes it a lot easier to harm them when necessary, makes us feel a lot better about ourselves when we do. The idea that any act of homicide is a tragedy but some of them are necessary tragedies is a lot harder to deal with than the idea that some people are just so evil that killing or even torturing them is intrinsically good. But some of the worst things human beings have ever done ultimately came from that place in our brains—and torture is one of them.

The powerful persistence of bigotry

JDN 2457527

Bigotry has been a part of human society since the beginning—people have been hating people they perceive as different for as long as there have been people, and maybe even before that. I wouldn’t be surprised to find that different tribes of chimpanzees or even elephants hold bigoted beliefs about each other.

Yet it may surprise you that neoclassical economics has basically no explanation for this. There is a long-standing famous argument that bigotry is inherently irrational: If you hire based on anything aside from actual qualifications, you are leaving money on the table for your company. Because women CEOs are paid less and perform better, simply ending discrimination against women in top executive positions could save any typical large multinational corporation tens of millions of dollars a year. And yet, they don’t! Fancy that.

More recently there has been work on the concept of statistical discrimination, under which it is rational (in the sense of narrowly-defined economic self-interest) to discriminate because categories like race and gender may provide some statistically valid stereotype information. For example, “Black people are poor” is obviously not true across the board, but race is strongly correlated with wealth in the US; “Asians are smart” is not a universal truth, but Asian-Americans do have very high educational attainment. In the absence of more reliable information that might be your best option for making good decisions. Of course, this creates a vicious cycle where people in the positive stereotype group are better off and have more incentive to improve their skills than people in the negative stereotype group, thus perpetuating the statistical validity of the stereotype.

But of course that assumes that the stereotypes are statistically valid, and that employers don’t have more reliable information. Yet many stereotypes aren’t even true statistically: If “women are bad drivers”, then why do men cause 75% of traffic fatalities? Furthermore, in most cases employers have more reliable information—resumes with education and employment records. Asian-Americans are indeed more likely to have bachelor’s degrees than Latino Americans, but when it says right on Mr. Lorenzo’s resume that he has a B.A. and on Mr. Suzuki’s resume that he doesn’t, that racial stereotype no longer provides you with any further information. Yet even if the resumes are identical, employers will be more likely to hire a White applicant than a Black applicant, and more likely to hire a male applicant than a female applicant—we have directly tested this in experiments. In an experiment where employers had direct performance figures in front of them, they were still more likely to choose the man when they had the same scores—and sometimes even when the woman had a higher score!

Even our assessments of competence are often biased, probably subconsciously; given the same essay to review, most reviewers find more spelling errors and are more concerned about those errors if they are told that the author is Black. If they thought the author was White, they thought of the errors as “minor mistakes” by a student with “otherwise good potential”; but if they thought the author was Black, they “can’t believe he got into this school in the first place”. These reviewers were reading the same essay. The alleged author’s race was decided randomly. Most if not all of these reviewers were not consciously racist. Subconscious racial biases are all over the place; almost everyone exhibits some subconscious racial bias.

No, discrimination isn’t just rational inference based on valid (if unfortunate and self-reinforcing) statistical trends. There is a significant component of just outright irrational bigotry.

We’re seeing this play out in North Carolina; due to their arbitrary discrimination against lesbian, gay, bisexual and especially transgender people, they are now hemorrhaging jobs as employers pull out, and their federal funding for student loans is now in jeopardy due to the obvious Title IX violation. This is obviously not in the best interest of the people of North Carolina (even the ones who aren’t LGBT!); and it’s all being justified on the grounds of an epidemic of sexual assaults by people pretending to be trans that doesn’t even exist. It turns out that more Republican Senators have been arrested for sexual misconduct in bathrooms than transgender people—and while the number of transgender people in the US is surprisingly hard to measure, it’s clearly a lot larger than the number of Republican Senators!

In fact, discrimination is even more irrational than it may seem, because empirically the benefits of discrimination (such as they are—short-term narrow economic self-interest) fall almost entirely on the rich while the harms fall mainly on the poor, yet poor people are much more likely to be racist! Since income and education are highly correlated, education accounts for some of this effect. This is reason to be hopeful, for as educational attainment has soared, we have found that racism has decreased.

But education doesn’t seem to explain the full effect. One theory to account for this is what’s called last-place aversion, a highly pernicious heuristic where people are less concerned about their own absolute status than they are about not having the worst status. In economic experiments, people are usually more willing to give money to people worse off than them than to those better off than them—unless giving it to the worse-off would make those people better off than they themselves are. I think we actually need to do further study to see what happens if it would make those other people exactly as well-off as they are, because that turns out to be absolutely critical to whether people would be willing to support a basic income. In other words, do people count “tied for last”? Would they rather play a game where everyone gets $100, or one where they get $50 but everyone else only gets $10?

I would hope that humanity is better than that—that we would want to play the $100 game, which is analogous to a basic income. But when I look at the extreme and persistent inequality that has plagued human society for millennia, I begin to wonder if perhaps there really are a lot of people who think of the world in such zero-sum, purely relative terms, and care more about being better than others than they do about doing well themselves. Perhaps the horrific poverty of Sub-Saharan Africa and Southeast Asia is, for many First World people, not a bug but a feature; we feel richer when we know they are poorer. Scarcity seems to amplify this zero-sum thinking; racism gets worse whenever we have economic downturns. Precisely because discrimination is economically inefficient, this can create a vicious cycle where poverty causes bigotry which worsens poverty.

There is also something deeper going on, something evolutionary; bigotry is part of what I call the tribal paradigm, the core aspect of human psychology that defines identity in terms of in-groups which are good and out-groups which are bad. We will probably never fully escape the tribal paradigm, but this is not a reason to give up hope; we have made substantial progress in reducing bigotry in many places. What seems to happen is that people learn to expand their mental tribe, so that it encompasses larger and larger groups—not just White Americans but all Americans, or not just Americans but all human beings. Peter Singer calls this the Expanding Circle (also the title of his book on it). We may one day be able to make our tribe large enough to encompass all sentient beings in the universe; at that point, it’s just fine if we are only interested in advancing the interests of those in our tribe, because our tribe would include everyone. Yet I don’t think any of us are quite there yet, and some people have a really long way to go.

But with these expanding tribes in mind, perhaps I can leave you with a fact that is as counter-intuitive as it is encouraging, and even easier still to take out of context: Racism was better than what came before it. What I mean by this is not that racism is good—of course it’s terrible—but that in order to be racism, to define the whole world into a small number of “racial groups”, people already had to enormously expand their mental tribe from where it started. When we evolved on the African savannah millions of years ago, our tribe was 150 people; to this day, that’s about the number of people we actually feel close to and interact with on a personal level. We could have stopped there, and for millennia we did. But over time we managed to expand beyond that number, to a village of 1,000, a town of 10,000, a city of 100,000. More recently we attained mental tribes of whole nations, in some cases hundreds of millions of people. Racism is about that same scale, if not a bit larger; what most people (rather arbitrarily, and in a way that changes over time) call “White” constitutes about a billion people. “Asian” (including South Asian) is almost four billion. These are astonishingly huge figures, some seven orders of magnitude larger than what we originally evolved to handle. The ability to feel empathy for all “White” people is just a little bit smaller than the ability to feel empathy for all people period. Similarly, while today the gender in “all men are created equal” is jarring to us, the idea at the time really was an incredibly radical broadening of the moral horizon—Half the world? Are you mad?

Therefore I am confident that one day, not too far from now, the world will take that next step, that next order of magnitude, which many of us already have (or try to), and we will at last conquer bigotry, and if not eradicate it entirely then force it completely into the most distant shadows and deny it its power over our society.

What does correlation have to do with causation?

JDN 2457345

I’ve been thinking of expanding the topics of this blog into some basic statistics and econometrics. It has been said that there are “Lies, damn lies, and statistics”; but in fact it’s almost the opposite—there are truths, whole truths, and statistics. Almost everything in the world that we know—not merely guess, or suppose, or intuit, or believe, but actually know, with a quantifiable level of certainty—is done by means of statistics. All sciences are based on them, from physics (when they say the Higgs discovery is a “5-sigma event”, that’s a statistic) to psychology, ecology to economics. Far from being something we cannot trust, they are in a sense the only thing we can trust.

The reason it sometimes feels like we cannot trust statistics is that most people do not understand statistics very well; this creates opportunities for both accidental confusion and willful distortion. My hope is therefore to provide you with some of the basic statistical knowledge you need to combat the worst distortions and correct the worst confusions.

I wasn’t quite sure where to start on this quest, but I suppose I have to start somewhere. I figured I may as well start with an adage about statistics that I hear commonly abused: “Correlation does not imply causation.”

Taken at its original meaning, this is definitely true. Unfortunately, it can be easily abused or misunderstood.

In its original meaning, which uses the word “imply” in the formal sense of logical implication, to “imply” something is an extremely strong statement. It means that you logically entail that result, that if the antecedent is true, the consequent must be true, on pain of logical contradiction. Logical implication is for most practical purposes synonymous with mathematical proof. (Unfortunately, it’s not quite synonymous, because of things like Gödel’s incompleteness theorems and Löb’s theorem.)

And indeed, correlation does not logically entail causation; it’s quite possible to have correlations without any causal connection whatsoever, simply by chance. One of my former professors liked to brag that from 1990 to 2010 whether or not she ate breakfast had a statistically significant positive correlation with that day’s closing price for the Dow Jones Industrial Average.

How is this possible? Did my professor actually somehow influence the stock market by eating breakfast? Of course not; if she could do that, she’d be a billionaire by now. And obviously the Dow’s price at 17:00 couldn’t influence whether she ate breakfast at 09:00. Could there be some common cause driving both of them, like the weather? I guess it’s possible; maybe in good weather she gets up earlier and people are in better moods so they buy more stocks. But the most likely reason for this correlation is much simpler than that: She tried a whole bunch of different combinations until she found two things that correlated. At the usual significance level of 0.05, on average you need to try about 20 combinations of totally unrelated things before two of them will show up as correlated. (My guess is she used a number of different stock indexes and varied the starting and ending year. That’s a way to generate a surprisingly large number of degrees of freedom without it seeming like you’re doing anything particularly nefarious.)

But how do we know they aren’t actually causally related? Well, I suppose we don’t. Especially if the universe is ultimately deterministic and nonlocal (as I’ve become increasingly convinced by the results of recent quantum experiments), any two data sets could be causally related somehow. But the point is they don’t have to be; you can pick any randomly-generated datasets, pair them up in 20 different ways, and odds are, one of those ways will show a statistically significant correlation.
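To put rough numbers on the "try 20 combinations" point, here is a minimal sketch in Python. It assumes the tests are independent and that no real effect exists, so each p-value is uniformly distributed between 0 and 1; the function names and the random seed are illustrative choices, not anything from the professor's actual data.

```python
import random

ALPHA = 0.05  # the usual significance threshold


def chance_of_false_positive(alpha, n_tests):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def expected_tries_until_hit(alpha):
    """Mean number of independent null tests until the first p < alpha (geometric)."""
    return 1 / alpha


def simulate(alpha, n_tests, n_trials, seed=0):
    """Fraction of trials in which at least one of n_tests null p-values is below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Assumption: under the null hypothesis a p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


analytic = chance_of_false_positive(ALPHA, 20)  # about 0.64
simulated = simulate(ALPHA, 20, 10_000)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
print(f"expected tries until a spurious hit: {expected_tries_until_hit(ALPHA):.0f}")
```

Under these assumptions, 20 unrelated pairings give roughly a 64% chance of at least one "significant" correlation, and the average number of pairings needed before the first spurious hit is 20.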

All of that is true, and important to understand. Finding a correlation between eating grapefruit and getting breast cancer, or between liking bitter foods and being a psychopath, does not necessarily mean that there is any real causal link between the two. If we can replicate these results in a bunch of other studies, that would suggest that the link is real; but typically, such findings cannot be replicated. There is something deeply wrong with the way science journalists operate; they like to publish the new and exciting findings, which 9 times out of 10 turn out to be completely wrong. They never want to talk about the really important and fascinating things that we know are true because we’ve been confirming them over hundreds of different experiments, because that’s “old news”. The journalistic desire to be new and first fundamentally contradicts the scientific requirement of being replicated and confirmed.

So, yes, it’s quite possible to have a correlation that tells you absolutely nothing about causation.

But this is exceptional. In most cases, correlation actually tells you quite a bit about causation.

And this is why I don’t like the adage; “imply” has a very different meaning in common speech, meaning merely to suggest or evoke. Almost everything you say implies all sorts of things in this broader sense, some more strongly than others, even though it may logically entail none of them.

Correlation does in fact suggest causation. Like any suggestion, it can be overridden. If we know that 20 different combinations were tried until one finally yielded a correlation, we have reason to distrust that correlation. If we find a correlation between A and B but there is no logical way they can be connected, we infer that it is simply an odd coincidence.

But when we encounter any given correlation, there are three other scenarios which are far more likely than mere coincidence: A causes B, B causes A, or some other factor C causes A and B. These are also not mutually exclusive; they can all be true to some extent, and in many cases are.
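The third scenario, a common cause, is easy to see in a small simulation. The sketch below is a toy example with made-up data: C is random noise, and A and B are each built as C plus their own independent noise, so neither one causes the other. The variable names and the equal noise levels are assumptions chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)
n = 20_000

# Hypothetical common cause C, and two outcomes that each depend on C only.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

# A and B are correlated even though neither influences the other.
r_raw = pearson(a, b)

# Subtract C's contribution (known exactly here, because we built the data).
r_adjusted = pearson(
    [ai - ci for ai, ci in zip(a, c)],
    [bi - ci for bi, ci in zip(b, c)],
)

print(f"correlation of A and B: {r_raw:.2f}")
print(f"after removing C:       {r_adjusted:.2f}")
```

With equal variances for C and the noise terms, the raw correlation should come out near 0.5 and the adjusted one near zero. In real data the common cause is rarely known this cleanly, which is why controlling for it is so much harder in practice.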

A great deal of work in science, and particularly in economics, is based upon using correlation to infer causation, and has to be—because there is simply no alternative means of approaching the problem.

Yes, sometimes you can do randomized controlled experiments, and some really important new findings in behavioral economics and development economics have been made this way. Indeed, much of the work that I hope to do over the course of my career is based on randomized controlled experiments, because they truly are the foundation of scientific knowledge. But sometimes, that’s just not an option.

Let’s consider an example: In my master’s thesis I found a strong correlation between the level of corruption in a country (as estimated by the World Bank) and the proportion of that country’s income which goes to the top 0.01% of the population. Countries that have higher levels of corruption also tend to have a larger proportion of income that accrues to the top 0.01%. That correlation is a fact; it’s there. There’s no denying it. But where does it come from? That’s the real question.

Could it be pure coincidence? Well, maybe; but when it keeps showing up in several different models with different variables included, that becomes unlikely. A single p < 0.05 will happen about 1 in 20 times by chance; but five in a row should happen less than 1 in 1 million times (assuming they’re independent, which, to be fair, they usually aren’t).
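The arithmetic behind that figure is short enough to check directly. This sketch takes the independence assumption at face value, which, as noted, real model specifications usually do not satisfy.

```python
alpha = 0.05      # chance of a single spurious p < 0.05
n_results = 5     # number of (assumed independent) significant results

# Probability that all five come up significant purely by chance.
p_all_by_chance = alpha ** n_results

print(f"{p_all_by_chance:.3e}")                    # about 3.1e-07
print(f"about 1 in {round(1 / p_all_by_chance):,}")  # about 1 in 3.2 million
```

If the models share data or variables, the true probability is higher than this, so the figure is best read as a lower bound on how often chance alone would produce the pattern.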

Could it be some artifact of the measurement methods? It’s possible. In particular, I was concerned about the possibility of a halo effect, in which people tend to assume that something which is better (or worse) in one way is automatically better (or worse) in other ways as well. People might think of their country as more corrupt simply because it has higher inequality, even if there is no real connection. But it would have taken a very large halo bias to explain this effect.

So, does corruption cause income inequality? It’s not hard to see how that might happen: More corrupt individuals could bribe leaders or exploit loopholes to make themselves extremely rich, and thereby increase inequality.

Does inequality cause corruption? This also makes some sense, since it’s a lot easier to bribe leaders and manipulate regulations when you have a lot of money to work with in the first place.

Does something else cause both corruption and inequality? Also quite plausible. Maybe some general cultural factors are involved, or certain economic policies lead to both corruption and inequality. I did try to control for such things, but I obviously couldn’t include all possible variables.

So, which way does the causation run? Unfortunately, I don’t know. I tried some clever statistical techniques to try to figure this out; in particular, I looked at which tends to come first—the corruption or the inequality—and whether they could be used to predict each other, a method called Granger causality. Those results were inconclusive, however. I could neither verify nor exclude a causal connection in either direction. But is there a causal connection? I think so. It’s too robust to just be coincidence. I simply don’t know whether A causes B, B causes A, or C causes A and B.
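For readers who have not met Granger causality, the idea can be sketched in a few lines of Python. This is not the analysis from the thesis: it is a toy version with a single lag, run on simulated series in which x is built to influence y and not the other way around. All names, coefficients, and the seed are hypothetical.

```python
import random


def ols_rss(rows, target):
    """Residual sum of squares from a least-squares fit of target on rows."""
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for cc in range(col, k):
                A[r][cc] -= f * A[col][cc]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - s) / A[i][i]
    return sum(
        (t - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
        for r, t in zip(rows, target)
    )


def granger_f(cause, effect):
    """F statistic: does lagged `cause` improve on lagged `effect` alone? (one lag)"""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - 3  # observations minus parameters in the larger model
    return (rss_r - rss_u) / (rss_u / dof)


# Simulated data (an assumption for illustration): x drives y with a one-step delay.
rng = random.Random(2)
x, y = [0.0], [0.0]
for _ in range(499):
    x_next = 0.5 * x[-1] + rng.gauss(0, 1)
    y_next = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0, 1)
    x.append(x_next)
    y.append(y_next)

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(f"F (x helps predict y): {f_x_to_y:.1f}")
print(f"F (y helps predict x): {f_y_to_x:.1f}")
```

On this artificial data the first statistic should be large and the second small, because the direction of influence was built in. Real corruption and inequality series are short, noisy, and probably influence each other in both directions, which is one reason a test like this can easily come back inconclusive. A serious analysis would use several lags and proper p-values from a statistics package rather than this hand-rolled regression.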

Imagine trying to do this same study as a randomized controlled experiment. Are we supposed to create two societies and flip a coin to decide which one we make more corrupt? Or which one we give more income inequality? Perhaps you could do some sort of experiment with a proxy for corruption (cheating on a test or something like that), and then have unequal payoffs in the experiment—but that is very far removed from how corruption actually works in the real world, and worse, it’s prohibitively expensive to make really life-altering income inequality within an experimental context. Sure, we can give one participant $1 and the other $1,000; but we can’t give one participant $10,000 and the other $10 million, and it’s the latter that we’re really talking about when we deal with real-world income inequality. I’m not opposed to doing such an experiment, but it can only tell us so much. At some point you need to actually test the validity of your theory in the real world, and for that we need to use statistical correlations.

Or think about macroeconomics; how exactly are you supposed to test a theory of the business cycle experimentally? I guess theoretically you could subject an entire country to a new monetary policy selected at random, but the consequences of being put into the wrong experimental group would be disastrous. Moreover, nobody is going to accept a random monetary policy democratically, so you’d have to introduce it against the will of the population, by some sort of tyranny or at least technocracy. Even if this is theoretically possible, it’s mind-bogglingly unethical.

Now, you might be thinking: But we do change real-world policies, right? Couldn’t we use those changes as a sort of “experiment”? Yes, absolutely; that’s called a quasi-experiment or a natural experiment. They are tremendously useful. But since they are not truly randomized, they aren’t quite experiments. Ultimately, everything you get out of a quasi-experiment is based on statistical correlations.

Thus, abuse of the adage “Correlation does not imply causation” can lead to ignoring whole subfields of science, because there is no realistic way of running experiments in those subfields. Sometimes, statistics are all we have to work with.

This is why I like to say it a little differently:

Correlation does not prove causation. But correlation definitely can suggest causation.