How do you change a paradigm?

Mar 3 JDN 2458546

I recently attended the Institute for New Economic Thinking (INET) Young Scholars Initiative (YSI) North American Regional Convening (what a mouthful!). I didn’t present, so I couldn’t get funding for a hotel, so I commuted to LA each day. That was miserable; if I ever go again, it will be with funding.

The highlight of the conference was George Akerlof‘s keynote, which I knew would be the case from the start. The swag bag labeled “Rebel Without a Paradigm” was also pretty great (though not as great as the “Totes Bi” totes at the Human Rights Council Time to THRIVE conference).

The rest of the conference was… a bit strange, to be honest. They had a lot of slightly cheesy interactive activities and exhibits; the conference was targeted at grad students, but some of these would have drawn groans from my more jaded undergrads (and “jaded grad student” is a redundancy). The poster session was pathetically small; I think there were literally only three posters. (Had I known in time for the deadline, I could surely have submitted a poster.)

The theme of the conference was challenging the neoclassical paradigm. This was really the only unifying principle. So we had quite an eclectic mix of presenters: There were a few behavioral economists (like Akerlof himself), and some econophysicists and complexity theorists, but mostly the conference was filled with a wide variety of heterodox theorists, ranging all the way from Austrian to Marxist. Also sprinkled in were a few outright cranks, whose ideas were just total nonsense; fortunately these were relatively rare.

And what really struck me about listening to the heterodox theorists was how mainstream it made me feel. I went to a session on development economics, expecting randomized controlled trials of basic income and maybe some political economy game theory, and instead saw several presentations of neo-Marxist postcolonial theory. At the AEA conference I felt like a radical firebrand; at the YSI conference I felt like a holdout of the ancien regime. Is this what it feels like to push the envelope without leaping outside it?

The whole atmosphere of the conference was one of “Why won’t they listen to us!?” and I couldn’t help but feel like I kind of knew why. All this heterodox theory isn’t testable. It isn’t useful. It doesn’t solve the problem. Even if you are entirely correct that Latin America is poor because of colonial and neocolonial exploitation by the West (and I’m fairly certain that you’re not; standard of living under the Mexica wasn’t so great you know), that doesn’t tell me how to feed starving children in Nicaragua.

Indeed, I think it’s notable that the one Nobel Laureate they could find to speak for us was a behavioral economist. Behavioral economics has actually managed to penetrate into the mainstream somewhat. Not enough, not nearly quickly enough, to be sure—but it’s happening. Why is it happening? Because behavioral economics is testable, it’s useful, and it solves problems.

Indeed, behavioral economics is more testable than most neoclassical economics: We run lab experiments while they’re adding yet another friction or shock to the never-ending DSGE quagmire.

And we’ve already managed to solve some real policy problems this way, like Alvin Roth’s kidney matching system and Richard Thaler’s “Save More Tomorrow” program.

The (limited) success of behavioral economics came not because we continued to batter at the gates of the old paradigm demanding to be let in, but because we tied ourselves to the methodology of hard science and gathered irrefutable empirical data. We didn’t get as far as we have by complaining that economics is too much like physics; we actually made it more like physics. Physicists do experiments. They make sharp, testable predictions. They refute their hypotheses. And now, so do we.

That said, Akerlof was right when he pointed out that the insistence upon empirical precision has limited the scope of questions we are able to ask, and kept us from addressing some of the really vital economic problems in the world. And neoclassical theory is too narrow; in particular, the ongoing insistence that behavior must be modeled as perfectly rational and completely selfish is infuriating. That model has clearly failed at this point, and it’s time for something new.

So I do think there is some space for heterodox theory in economics. But there actually seems to be no shortage of heterodox theory; it’s easy to come up with ideas that are different from the mainstream. What we actually need is more ways to constrain theory with empirical evidence. The goal must be to have theory that actually predicts and explains the world better than neoclassical theory does—and that’s a higher bar than you might imagine. Neoclassical theory isn’t an abject failure; in fact, if we’d just followed the standard Keynesian models in the Great Recession, we would have recovered much faster. Most of this neo-Marxist theory struck me as not even wrong: the ideas were flexible enough that almost any observed outcome could be fit into them.

Galileo and Einstein didn’t just come up with new ideas and complain that no one listened to them. They developed detailed, mathematically precise models that could be experimentally tested—and when they were tested, they worked better than the old theory. That is the way to change a paradigm: Replace it with one that you can prove is better.

Why are humans so bad with probability?

Apr 29 JDN 2458238

In previous posts on deviations from expected utility and cumulative prospect theory, I’ve detailed some of the myriad ways in which human beings deviate from optimal rational behavior when it comes to probability.

This post is going to be a bit different: Yes, we behave irrationally when it comes to probability. Why?

Why aren’t we optimal expected utility maximizers?
This question is not as simple as it sounds. Some of the ways that human beings deviate from neoclassical behavior are simply because neoclassical theory requires levels of knowledge and intelligence far beyond what human beings are capable of; basically anything requiring “perfect information” qualifies, as does any game theory prediction that involves solving extensive-form games with infinite strategy spaces by backward induction. (Don’t feel bad if you have no idea what that means; that’s kind of my point. Solving infinite extensive-form games by backward induction is an unsolved problem in game theory; just this past week I saw a new paper presented that offered a partial potential solutionand yet we expect people to do it optimally every time?)

I’m also not going to include questions of fundamental uncertainty, like “Will Apple stock rise or fall tomorrow?” or “Will the US go to war with North Korea in the next ten years?” where it isn’t even clear how we would assign a probability. (Though I will get back to them, for reasons that will become clear.)

No, let’s just look at the absolute simplest cases, where the probabilities are all well-defined and completely transparent: Lotteries and casino games. Why are we so bad at that?

Lotteries are not a computationally complex problem. You figure out how much the prize is worth to you, multiply it by the probability of winning—which is clearly spelled out for you—and compare that to how much the ticket price is worth to you. The most challenging part lies in specifying your marginal utility of wealth—the “how much it’s worth to you” part—but that’s something you basically had to do anyway, to make any kind of trade-offs on how to spend your time and money. Maybe you didn’t need to compute it quite so precisely over that particular range of parameters, but you need at least some idea how much $1 versus $10,000 is worth to you in order to get by in a market economy.

Casino games are a bit more complicated, but not much, and most of the work has been done for you; you can look on the Internet and find tables of probability calculations for poker, blackjack, roulette, craps and more. Memorizing all those probabilities might take some doing, but human memory is astonishingly capacious, and part of being an expert card player, especially in blackjack, seems to involve memorizing a lot of those probabilities.

Furthermore, by any plausible expected utility calculation, lotteries and casino games are a bad deal. Unless you’re an expert poker player or blackjack card-counter, your expected income from playing at a casino is always negative—and the casino set it up that way on purpose.

Why, then, can lotteries and casinos stay in business? Why are we so bad at such a simple problem?

Clearly we are using some sort of heuristic judgment in order to save computing power, and the people who make lotteries and casinos have designed formal models that can exploit those heuristics to pump money from us. (Shame on them, really; I don’t fully understand why this sort of thing is legal.)

In another previous post I proposed what I call “categorical prospect theory”, which I think is a decently accurate description of the heuristics people use when assessing probability (though I’ve not yet had the chance to test it experimentally).

But why use this particular heuristic? Indeed, why use a heuristic at all for such a simple problem?

I think it’s helpful to keep in mind that these simple problems are weird; they are absolutely not the sort of thing a tribe of hunter-gatherers is likely to encounter on the savannah. It doesn’t make sense for our brains to be optimized to solve poker or roulette.

The sort of problems that our ancestors encountered—indeed, the sort of problems that we encounter, most of the time—were not problems of calculable probability risk; they were problems of fundamental uncertainty. And they were frequently matters of life or death (which is why we’d expect them to be highly evolutionarily optimized): “Was that sound a lion, or just the wind?” “Is this mushroom safe to eat?” “Is that meat spoiled?”

In fact, many of the uncertainties most important to our ancestors are still important today: “Will these new strangers be friendly, or dangerous?” “Is that person attracted to me, or am I just projecting my own feelings?” “Can I trust you to keep your promise?” These sorts of social uncertainties are even deeper; it’s not clear that any finite being could ever totally resolve its uncertainty surrounding the behavior of other beings with the same level of intelligence, as the cognitive arms race continues indefinitely. The better I understand you, the better you understand me—and if you’re trying to deceive me, as I get better at detecting deception, you’ll get better at deceiving.

Personally, I think that it was precisely this sort of feedback loop that resulting in human beings getting such ridiculously huge brains in the first place. Chimpanzees are pretty good at dealing with the natural environment, maybe even better than we are; but even young children can outsmart them in social tasks any day. And once you start evolving for social cognition, it’s very hard to stop; basically you need to be constrained by something very fundamental, like, say, maximum caloric intake or the shape of the birth canal. Where chimpanzees look like their brains were what we call an “interior solution”, where evolution optimized toward a particular balance between cost and benefit, human brains look more like a “corner solution”, where the evolutionary pressure was entirely in one direction until we hit up against a hard constraint. That’s exactly what one would expect to happen if we were caught in a cognitive arms race.

What sort of heuristic makes sense for dealing with fundamental uncertainty—as opposed to precisely calculable probability? Well, you don’t want to compute a utility function and multiply by it, because that adds all sorts of extra computation and you have no idea what probability to assign. But you’ve got to do something like that in some sense, because that really is the optimal way to respond.

So here’s a heuristic you might try: Separate events into some broad categories based on how frequently they seem to occur, and what sort of response would be necessary.

Some things, like the sun rising each morning, seem to always happen. So you should act as if those things are going to happen pretty much always, because they do happen… pretty much always.

Other things, like rain, seem to happen frequently but not always. So you should look for signs that those things might happen, and prepare for them when the signs point in that direction.

Still other things, like being attacked by lions, happen very rarely, but are a really big deal when they do. You can’t go around expecting those to happen all the time, that would be crazy; but you need to be vigilant, and if you see any sign that they might be happening, even if you’re pretty sure they’re not, you may need to respond as if they were actually happening, just in case. The cost of a false positive is much lower than the cost of a false negative.

And still other things, like people sprouting wings and flying, never seem to happen. So you should act as if those things are never going to happen, and you don’t have to worry about them.

This heuristic is quite simple to apply once set up: It can simply slot in memories of when things did and didn’t happen in order to decide which category they go in—i.e. availability heuristic. If you can remember a lot of examples of “almost never”, maybe you should move it to “unlikely” instead. If you get a really big number of examples, you might even want to move it all the way to “likely”.

Another large advantage of this heuristic is that by combining utility and probability into one metric—we might call it “importance”, though Bayesian econometricians might complain about that—we can save on memory space and computing power. I don’t need to separately compute a utility and a probability; I just need to figure out how much effort I should put into dealing with this situation. A high probability of a small cost and a low probability of a large cost may be equally worth my time.

How might these heuristics go wrong? Well, if your environment changes sufficiently, the probabilities could shift and what seemed certain no longer is. For most of human history, “people walking on the Moon” would seem about as plausible as sprouting wings and flying away, and yet it has happened. Being attacked by lions is now exceedingly rare except in very specific places, but we still harbor a certain awe and fear before lions. And of course availability heuristic can be greatly distorted by mass media, which makes people feel like terrorist attacks and nuclear meltdowns are common and deaths by car accidents and influenza are rare—when exactly the opposite is true.

How many categories should you set, and what frequencies should they be associated with? This part I’m still struggling with, and it’s an important piece of the puzzle I will need before I can take this theory to experiment. There is probably a trade-off between more categories giving you more precision in tailoring your optimal behavior, but costing more cognitive resources to maintain. Is the optimal number 3? 4? 7? 10? I really don’t know. Even I could specify the number of categories, I’d still need to figure out precisely what categories to assign.

Reasonableness and public goods games

Apr 1 JDN 2458210

There’s a very common economics experiment called a public goods game, often used to study cooperation and altruistic behavior. I’m actually planning on running a variant of such an experiment for my second-year paper.

The game is quite simple, which is part of why it is used so frequently: You are placed into a group of people (usually about four), and given a little bit of money (say $10). Then you are offered a choice: You can keep the money, or you can donate some of it to a group fund. Money in the group fund will be multiplied by some factor (usually about two) and then redistributed evenly to everyone in the group. So for example if you donate $5, that will become $10, split four ways, so you’ll get back $2.50.

Donating more to the group will benefit everyone else, but at a cost to yourself. The game is usually set up so that the best outcome for everyone is if everyone donates the maximum amount, but the best outcome for you, holding everyone else’s choices constant, is to donate nothing and keep it all.

Yet it is a very robust finding that most people do neither of those things. There’s still a good deal of uncertainty surrounding what motivates people to donate what they do, but certain patterns that have emerged:

  1. Most people donate something, but hardly anyone donates everything.
  2. Increasing the multiplier tends to smoothly increase how much people donate.
  3. The number of people in the group isn’t very important, though very small groups (e.g. 2) behave differently from very large groups (e.g. 50).
  4. Letting people talk to each other tends to increase the rate of donations.
  5. Repetition of the game, or experience from previous games, tends to result in decreasing donation over time.
  6. Economists donate less than other people.

Number 6 is unfortunate, but easy to explain: Indoctrination into game theory and neoclassical economics has taught economists that selfish behavior is efficient and optimal, so they behave selfishly.

Number 3 is also fairly easy to explain: Very small groups allow opportunities for punishment and coordination that don’t exist in large groups. Think about how you would respond when faced with 2 defectors in a group of 4 as opposed to 10 defectors in a group of 50. You could punish the 2 by giving less next round; but punishing the 10 would end up punishing 40 others who had contributed like they were supposed to.

Number 4 is a very interesting finding. Game theory says that communication shouldn’t matter, because there is a unique Nash equilibrium: Donate nothing. All the promises in the world can’t change what is the optimal response in the game. But in fact, human beings don’t like to break their promises, and so when you get a bunch of people together and they all agree to donate, most of them will carry through on that agreement most of the time.

Number 5 is on the frontier of research right now. There are various theoretical accounts for why it might occur, but none of the models proposed so far have much predictive power.

But my focus today will be on findings 1 and 2.

If you’re not familiar with the underlying game theory, finding 2 may seem obvious to you: Well, of course if you increase the payoff for donating, people will donate more! It’s precisely that sense of obviousness which I am going to appeal to in a moment.

In fact, the game theory makes a very sharp prediction: For N players, if the multiplier is less than N, you should always contribute nothing. Only if the multiplier becomes larger than N should you donate—and at that point you should donate everything. The game theory prediction is not a smooth increase; it’s all-or-nothing. The only time game theory predicts intermediate amounts is on the knife-edge at exactly equal to N, where each player would be indifferent between donating and not donating.

But it feels reasonable that increasing the multiplier should increase donation, doesn’t it? It’s a “safer bet” in some sense to donate $1 if the payoff to everyone is $3 and the payoff to yourself is $0.75 than if the payoff to everyone is $1.04 and the payoff to yourself is $0.26. The cost-benefit analysis comes out better: In the former case, you can gain up to $2 if everyone donates, but would only lose $0.25 if you donate alone; but in the latter case, you would only gain $0.04 if everyone donates, and would lose $0.74 if you donate alone.

I think this notion of “reasonableness” is a deep principle that underlies a great deal of human thought. This is something that is sorely lacking from artificial intelligence: The same AI that tells you the precise width of the English Channel to the nearest foot may also tell you that the Earth is 14 feet in diameter, because the former was in its database and the latter wasn’t. Yes, WATSON may have won on Jeopardy, but it (he?) also made a nonsensical response to the Final Jeopardy question.

Human beings like to “sanity-check” our results against prior knowledge, making sure that everything fits together. And, of particular note for public goods games, human beings like to “hedge our bets”; we don’t like to over-commit to a single belief in the face of uncertainty.

I think this is what best explains findings 1 and 2. We don’t donate everything, because that requires committing totally to the belief that contributing is always better. We also don’t donate nothing, because that requires committing totally to the belief that contributing is always worse.

And of course we donate more as the payoffs to donating more increase; that also just seems reasonable. If something is better, you do more of it!

These choices could be modeled formally by assigning some sort of probability distribution over other’s choices, but in a rather unconventional way. We can’t simply assume that other people will randomly choose some decision and then optimize accordingly—that just gives you back the game theory prediction. We have to assume that our behavior and the behavior of others is in some sense correlated; if we decide to donate, we reason that others are more likely to donate as well.

Stated like that, this sounds irrational; some economists have taken to calling it “magical thinking”. Yet, as I always like to point out to such economists: On average, people who do that make more money in the games. Economists playing other economists always make very little money in these games, because they turn on each other immediately. So who is “irrational” now?

Indeed, if you ask people to predict how others will behave in these games, they generally do better than the game theory prediction: They say, correctly, that some people will give nothing, most will give something, and hardly any will give everything. The same “reasonableness” that they use to motivate their own decisions, they also accurately apply to forecasting the decisions of others.

Of course, to say that something is “reasonable” may be ultimately to say that it conforms to our heuristics well. To really have a theory, I need to specify exactly what those heuristics are.

“Don’t put all your eggs in one basket” seems to be one, but it’s probably not the only one that matters; my guess is that there are circumstances in which people would actually choose all-or-nothing, like if we said that the multiplier was 0.5 (so everyone giving to the group would make everyone worse off) or 10 (so that giving to the group makes you and everyone else way better off).

“Higher payoffs are better” is probably one as well, but precisely formulating that is actually surprisingly difficult. Higher payoffs for you? For the group? Conditional on what? Do you hold others’ behavior constant, or assume it is somehow affected by your own choices?

And of course, the theory wouldn’t be much good if it only worked on public goods games (though even that would be a substantial advance at this point). We want a theory that explains a broad class of human behavior; we can start with simple economics experiments, but ultimately we want to extend it to real-world choices.

“DSGE or GTFO”: Macroeconomics took a wrong turn somewhere

Dec 31, JDN 2458119

The state of macro is good,” wrote Oliver Blanchard—in August 2008. This is rather like the turkey who is so pleased with how the farmer has been feeding him lately, the day before Thanksgiving.

It’s not easy to say exactly where macroeconomics went wrong, but I think Paul Romer is right when he makes the analogy between DSGE (dynamic stochastic general equilbrium) models and string theory. They are mathematically complex and difficult to understand, and people can make their careers by being the only ones who grasp them; therefore they must be right! Nevermind if they have no empirical support whatsoever.

To be fair, DSGE models are at least a little better than string theory; they can at least be fit to real-world data, which is better than string theory can say. But being fit to data and actually predicting data are fundamentally different things, and DSGE models typically forecast no better than far simpler models without their bold assumptions. You don’t need to assume all this stuff about a “representative agent” maximizing a well-defined utility function, or an Euler equation (that doesn’t even fit the data), or this ever-proliferating list of “random shocks” that end up taking up all the degrees of freedom your model was supposed to explain. Just regressing the variables on a few years of previous values of each other (a “vector autoregression” or VAR) generally gives you an equally-good forecast. The fact that these models can be made to fit the data well if you add enough degrees of freedom doesn’t actually make them good models. As Von Neumann warned us, with enough free parameters, you can fit an elephant.

But really what bothers me is not the DSGE but the GTFO (“get the [expletive] out”); it’s not that DSGE models are used, but that it’s almost impossible to get published as a macroeconomic theorist using anything else. Defenders of DSGE typically don’t even argue anymore that it is good; they argue that there are no credible alternatives. They characterize their opponents as “dilettantes” who aren’t opposing DSGE because we disagree with it; no, it must be because we don’t understand it. (Also, regarding that post, I’d just like to note that I now officially satisfy the Athreya Axiom of Absolute Arrogance: I have passed my qualifying exams in a top-50 economics PhD program. Yet my enmity toward DSGE has, if anything, only intensified.)

Of course, that argument only makes sense if you haven’t been actively suppressing all attempts to formulate an alternative, which is precisely what DSGE macroeconomists have been doing for the last two or three decades. And yet despite this suppression, there are alternatives emerging, particularly from the empirical side. There are now empirical approaches to macroeconomics that don’t use DSGE models. Regression discontinuity methods and other “natural experiment” designs—not to mention actual experiments—are quickly rising in popularity as economists realize that these methods allow us to actually empirically test our models instead of just adding more and more mathematical complexity to them.

But there still seems to be a lingering attitude that there is no other way to do macro theory. This is very frustrating for me personally, because deep down I think what I would like to do as a career is macro theory: By temperament I have always viewed the world through a very abstract, theoretical lens, and the issues I care most about—particularly inequality, development, and unemployment—are all fundamentally “macro” issues. I left physics when I realized I would be expected to do string theory. I don’t want to leave economics now that I’m expected to do DSGE. But I also definitely don’t want to do DSGE.

Fortunately with economics I have a backup plan: I can always be an “applied micreconomist” (rather the opposite of a theoretical macroeconomist I suppose), directly attached to the data in the form of empirical analyses or even direct, randomized controlled experiments. And there certainly is plenty of work to be done along the lines of Akerlof and Roth and Shiller and Kahneman and Thaler in cognitive and behavioral economics, which is also generally considered applied micro. I was never going to be an experimental physicist, but I can be an experimental economist. And I do get to use at least some theory: In particular, there’s an awful lot of game theory in experimental economics these days. Some of the most exciting stuff is actually in showing how human beings don’t behave the way classical game theory predicts (particularly in the Ultimatum Game and the Prisoner’s Dilemma), and trying to extend game theory into something that would fit our actual behavior. Cognitive science suggests that the result is going to end up looking quite different from game theory as we know it, and with my cognitive science background I may be particularly well-positioned to lead that charge.

Still, I don’t think I’ll be entirely satisfied if I can’t somehow bring my career back around to macroeconomic issues, and particularly the great elephant in the room of all economics, which is inequality. Underlying everything from Marxism to Trumpism, from the surging rents in Silicon Valley and the crushing poverty of Burkina Faso, to the Great Recession itself, is inequality. It is, in my view, the central question of economics: Who gets what, and why?

That is a fundamentally macro question, but you can’t even talk about that issue in DSGE as we know it; a “representative agent” inherently smooths over all inequality in the economy as though total GDP were all that mattered. A fundamentally new approach to macroeconomics is needed. Hopefully I can be part of that, but from my current position I don’t feel much empowered to fight this status quo. Maybe I need to spend at least a few more years doing something else, making a name for myself, and then I’ll be able to come back to this fight with a stronger position.

In the meantime, I guess there’s plenty of work to be done on cognitive biases and deviations from game theory.

Games as economic simulations—and education tools

Mar 5, JDN 2457818 [Sun]

Moore’s Law is a truly astonishing phenomenon. Now as we are well into the 21st century (I’ve lived more of my life in the 21st century than the 20th now!) it may finally be slowing down a little bit, but it has had quite a run, and even this could be a temporary slowdown due to economic conditions or the lull before a new paradigm (quantum computing?) matures. Since at least 1975, the computing power of an individual processor has doubled approximately every year and a half; that means it has doubled over 25 times—or in other words that it has increased by a factor of over 30 million. I now have in my pocket a smartphone with several thousand times the processing speed of the guidance computer of the Saturn V that landed on the Moon.

This meteoric increase in computing power has had an enormous impact on the way science is done, including economics. Simple theoretical models that could be solved by hand are now being replaced by enormous simulation models that have to be processed by computers. It is now commonplace to devise models with systems of dozens of nonlinear equations that are literally impossible to solve analytically, and just solve them iteratively with computer software.

But one application of this technology that I believe is currently underutilized is video games.

As a culture, we still have the impression that video games are for children; even games like Dragon Age and Grand Theft Auto that are explicitly for adults (and really quite inappropriate for children!) are viewed as in some sense “childish”—that no serious adult would be involved with such frivolities. The same cultural critics who treat Shakespeare’s vagina jokes as the highest form of art are liable to dismiss the poignant critique of war in Call of Duty: Black Ops or the reflections on cultural diversity in Skyrim as mere puerility.

But video games are an art form with a fundamentally greater potential than any other. Now that graphics are almost photorealistic, there is really nothing you can do in a play or a film that you can’t do in a video game—and there is so, so much more that you can only do in a game.
In what other medium can we witness the spontaneous emergence and costly aftermath of a war? Yet EVE Online has this sort of event every year or so—just today there was a surprise attack involving hundreds of players that destroyed thousands of hours’—and dollars’—worth of starships, something that has more or less become an annual tradition. A few years ago there was a massive three-faction war that destroyed over $300,000 in ships and has now been commemorated as “the Bloodbath of B-R5RB”.
Indeed, the immersion and interactivity of games present an opportunity to do nothing less than experimental macroeconomics. For generations it has been impossible, or at least absurdly unethical, to ever experimentally manipulate an entire macroeconomy. But in a video game like EVE Online or Second Life, we can now do so easily, cheaply, and with little or no long-term harm to the participants—and we can literally control everything in the experiment. Forget the natural resource constraints and currency exchange rates—we can change the laws of physics if we want. (Indeed, EVE‘s whole trade network is built around FTL jump points, and in Second Life it’s a basic part of the interface that everyone can fly like Superman.)

This provides untold potential for economic research. With sufficient funding, we could build a game that would allow us to directly test hypotheses about the most fundamental questions of economics: How do governments emerge and maintain security? How is the rule of law sustained, and when can it be broken? What controls the value of money and the rate of inflation? What is the fundamental cause of unemployment, and how can it be corrected? What influences the rate of technological development? How can we maximize the rate of economic growth? What effect does redistribution of wealth have on employment and output? I envision a future where we can directly simulate these questions with thousands of eager participants, varying the subtlest of parameters and carrying out events over any timescale we like from seconds to centuries.

Nor is the potential of games in economics limited to research; it also has enormous untapped potential in education. I’ve already seen in my classes how tabletop-style games with poker chips can teach a concept better in a few minutes than hours of writing algebra derivations on the board; but custom-built video games could be made that would teach economics far better still, and to a much wider audience. In a well-designed game, people could really feel the effects of free trade or protectionism, not just on themselves as individuals but on entire nations that they control—watch their GDP numbers go down as they scramble to produce in autarky what they could have bought for half the price if not for the tariffs. They could see, in real time, how in the absence of environmental regulations and Pigovian taxes the actions of millions of individuals could despoil our planet for everyone.

Of course, games are fundamentally works of fiction, subject to the Fictional Evidence Fallacy and only as reliable as their authors make them. But so it is with all forms of art. I have no illusions about the fact that we will never get the majority of the population to regularly read peer-reviewed empirical papers. But perhaps if we are clever enough in the games we offer them to play, we can still convey some of the knowledge that those papers contain. We could also update and expand the games as new information comes in. Instead of complaining that our students are spending time playing games on their phones and tablets, we could actually make education into games that are as interesting and entertaining as the ones they would have been playing. We could work with the technology instead of against it. And in a world where more people have access to a smartphone than to a toilet, we could finally bring high-quality education to the underdeveloped world quickly and cheaply.

Rapid growth in computing power has given us a gift of great potential. But soon our capacity will widen even further. Even if Moore’s Law slows down, computing power will continue to increase for awhile yet. Soon enough, virtual reality will finally take off and we’ll have even greater depth of immersion available. The future is bright—if we can avoid this corporatist cyberpunk dystopia we seem to be hurtling toward, of course.

Experimentally testing categorical prospect theory

Dec 4, JDN 2457727

In last week’s post I presented a new theory of probability judgments, which doesn’t rely upon people performing complicated math even subconsciously. Instead, I hypothesize that people try to assign categories to their subjective probabilities, and throw away all the information that wasn’t used to assign that category.

The way to most clearly distinguish this from cumulative prospect theory is to show discontinuity. Kahneman’s smooth, continuous function places fairly strong bounds on just how much a shift from 0% to 0.000001% can really affect your behavior. In particular, if you want to explain the fact that people do seem to behave differently around 10% compared to 1% probabilities, you can’t allow the slope of the smooth function to get much higher than 10 at any point, even near 0 and 1. (It does depend on the precise form of the function, but the more complicated you make it, the more free parameters you add to the model. In the most parsimonious form, which is a cubic polynomial, the maximum slope is actually much smaller than this—only 2.)

If that’s the case, then switching from 0.% to 0.0001% should have no more effect in reality than a switch from 0% to 0.00001% would to a rational expected utility optimizer. But in fact I think I can set up scenarios where it would have a larger effect than a switch from 0.001% to 0.01%.

Indeed, these games are already quite profitable for the majority of US states, and they are called lotteries.

Rationally, it should make very little difference to you whether your odds of winning the Powerball are 0 (you bought no ticket) or 0.000000001% (you bought a ticket), even when the prize is $100 million. This is because your utility of $100 million is nowhere near 100 million times as large as your marginal utility of $1. A good guess would be that your lifetime income is about $2 million, your utility is logarithmic, the units of utility are hectoQALY, and the baseline level is about 100,000.

I apologize for the extremely large number of decimals, but I had to do that in order to show any difference at all. I have bolded where the decimals first deviate from the baseline.

Your utility if you don’t have a ticket is ln(20) = 2.9957322736 hQALY.

Your utility if you have a ticket is (1-10^-9) ln(20) + 10^-9 ln(1020) = 2.9957322775 hQALY.

You gain a whopping 40 microQALY over your whole lifetime. I highly doubt you could even perceive such a difference.

And yet, people are willing to pay nontrivial sums for the chance to play such lotteries. Powerball tickets sell for about $2 each, and some people buy tickets every week. If you do that and live to be 80, you will spend some $8,000 on lottery tickets during your lifetime, which results in this expected utility: (1-4*10^-6) ln(20-0.08) + 4*10^-6 ln(1020) = 2.9917399955 hQALY.
You have now sacrificed 0.004 hectoQALY, which is to say 0.4 QALY—that’s months of happiness you’ve given up to play this stupid pointless game.

Which shouldn’t be surprising, as (with 99.9996% probability) you have given up four months of your lifetime income with nothing to show for it. Lifetime income of $2 million / lifespan of 80 years = $25,000 per year; $8,000 / $25,000 = 0.32. You’ve actually sacrificed slightly more than this, which comes from your risk aversion.

Why would anyone do such a thing? Because while the difference between 0 and 10^-9 may be trivial, the difference between “impossible” and “almost impossible” feels enormous. “You can’t win if you don’t play!” they say, but they might as well say “You can’t win if you do play either.” Indeed, the probability of winning without playing isn’t zero; you could find a winning ticket lying on the ground, or win due to an error that is then upheld in court, or be given the winnings bequeathed by a dying family member or gifted by an anonymous donor. These are of course vanishingly unlikely—but so was winning in the first place. You’re talking about the difference between 10^-9 and 10^-12, which in proportional terms sounds like a lot—but in absolute terms is nothing. If you drive to a drug store every week to buy a ticket, you are more likely to die in a car accident on the way to the drug store than you are to win the lottery.

Of course, these are not experimental conditions. So I need to devise a similar game, with smaller stakes but still large enough for people’s brains to care about the “almost impossible” category; maybe thousands? It’s not uncommon for an economics experiment to cost thousands, it’s just usually paid out to many people instead of randomly to one person or nobody. Conducting the experiment in an underdeveloped country like India would also effectively amplify the amounts paid, but at the fixed cost of transporting the research team to India.

But I think in general terms the experiment could look something like this. You are given $20 for participating in the experiment (we treat it as already given to you, to maximize your loss aversion and endowment effect and thereby give us more bang for our buck). You then have a chance to play a game, where you pay $X to get a P probability of $Y*X, and we vary these numbers.

The actual participants wouldn’t see the variables, just the numbers and possibly the rules: “You can pay $2 for a 1% chance of winning $200. You can also play multiple times if you wish.” “You can pay $10 for a 5% chance of winning $250. You can only play once or not at all.”

So I think the first step is to find some dilemmas, cases where people feel ambivalent, and different people differ in their choices. That’s a good role for a pilot study.

Then we take these dilemmas and start varying their probabilities slightly.

In particular, we try to vary them at the edge of where people have mental categories. If subjective probability is continuous, a slight change in actual probability should never result in a large change in behavior, and furthermore the effect of a change shouldn’t vary too much depending on where the change starts.

But if subjective probability is categorical, these categories should have edges. Then, when I present you with two dilemmas that are on opposite sides of one of the edges, your behavior should radically shift; while if I change it in a different way, I can make a large change without changing the result.

Based solely on my own intuition, I guessed that the categories roughly follow this pattern:

Impossible: 0%

Almost impossible: 0.1%

Very unlikely: 1%

Unlikely: 10%

Fairly unlikely: 20%

Roughly even odds: 50%

Fairly likely: 80%

Likely: 90%

Very likely: 99%

Almost certain: 99.9%

Certain: 100%

So for example, if I switch from 0%% to 0.01%, it should have a very large effect, because I’ve moved you out of your “impossible” category (indeed, I think the “impossible” category is almost completely sharp; literally anything above zero seems to be enough for most people, even 10^-9 or 10^-10). But if I move from 1% to 2%, it should have a small effect, because I’m still well within the “very unlikely” category. Yet the latter change is literally one hundred times larger than the former. It is possible to define continuous functions that would behave this way to an arbitrary level of approximation—but they get a lot less parsimonious very fast.

Now, immediately I run into a problem, because I’m not even sure those are my categories, much less that they are everyone else’s. If I knew precisely which categories to look for, I could tell whether or not I had found it. But the process of both finding the categories and determining if their edges are truly sharp is much more complicated, and requires a lot more statistical degrees of freedom to get beyond the noise.

One thing I’m considering is assigning these values as a prior, and then conducting a series of experiments which would adjust that prior. In effect I would be using optimal Bayesian probability reasoning to show that human beings do not use optimal Bayesian probability reasoning. Still, I think that actually pinning down the categories would require a large number of participants or a long series of experiments (in frequentist statistics this distinction is vital; in Bayesian statistics it is basically irrelevant—one of the simplest reasons to be Bayesian is that it no longer bothers you whether someone did 2 experiments of 100 people or 1 experiment of 200 people, provided they were the same experiment of course). And of course there’s always the possibility that my theory is totally off-base, and I find nothing; a dissertation replicating cumulative prospect theory is a lot less exciting (and, sadly, less publishable) than one refuting it.

Still, I think something like this is worth exploring. I highly doubt that people are doing very much math when they make most probabilistic judgments, and using categories would provide a very good way for people to make judgments usefully with no math at all.

What is the price of time?

JDN 2457562

If they were asked outright, “What is the price of time?” most people would find that it sounds nonsensical, like I’ve asked you “What is the diameter of calculus?” or “What is the electric charge of justice?” (It’s interesting that we generally try to assign meaning to such nonsensical questions, and they often seem strangely profound when we do; a good deal of what passes for “profound wisdom” is really better explained as this sort of reaction to nonsense. Deepak Chopra, for instance.)

But there is actually a quite sensible economic meaning of this question, and answering it turns out to have many important implications for how we should run our countries and how we should live our lives.

What we are really asking for is temporal discounting; we want to know how much more money today is worth compared to tomorrow, and how much more money tomorrow is worth compared to two days from now.

If you say that they are exactly the same, your discount rate (your “price of time”) is zero; if that is indeed how you feel, may I please borrow your entire net wealth at 0% interest for the next thirty years? If you like we can even inflation-index the interest rate so it always produces a real interest rate of zero, thus protecting you from potential inflation risk.
What? You don’t like my deal? You say you need that money sooner? Then your discount rate is not zero. Similarly, it can’t be negative; if you actually valued money tomorrow more than money today, you’d gladly give me my loan.

Money today is worth more to you than money tomorrow—the only question is how much more.

There’s a very simple theorem which says that as long as your temporal discounting doesn’t change over time, so it is dynamically consistent, it must have a very specific form. I don’t normally use math this advanced in my blog, but this one is so elegant I couldn’t resist. I’ll encase it in blockquotes so you can skim over it if you must.

The value of $1 today relative to… today is of course 1; f(0) = 1.

If you are dynamically consistent, at any time t you should discount tomorrow relative to today the same as you discounted today relative to yesterday, so for all t, f(t+1)/f(t) = f(t)/f(t-1)
Thus, f(t+1)/f(t) is independent of t, and therefore equal to some constant, which we can call r:

f(t+1)/f(t) = r, which implies f(t+1) = r f(t).

Starting at f(0) = 1, we have:

f(0) = 1, f(1) = r, f(2) = r^2

We can prove that this pattern continues to hold by mathematical induction.

Suppose the following is true for some integer k; we already know it works for k = 1:

f(k) = r^k

Let t = k:

f(k+1) = r f(k)

Therefore:

f(k+1) = r^(k+1)

Which by induction proves that for all integers n:

f(n) = r^n

The name of the variable doesn’t matter. Therefore:

f(t) = r^t

Whether you agree with me that this is beautiful, or you have no idea what I just said, the take-away is the same: If your discount rate is consistent over time, it must be exponential. There must be some constant number 0 < r < 1 such that each successive time period is worth r times as much as the previous. (You can also generalize this to the case of continuous time, where instead of r^t you get e^(-r t). This requires even more advanced math, so I’ll spare you.)

Most neoclassical economists would stop right there. But there are two very big problems with this argument:

(1) It doesn’t tell us the value r should actually be, only that it should be a constant.

(2) No actual human being thinks of time this way.

There is still ongoing research as to exactly how real human beings discount time, but one thing is quite clear from the experiments: It certainly isn’t exponential.

From about 2000 to 2010, the consensus among cognitive economists was that humans discount time hyperbolically; that is, our discount function looks like this:

f(t) = 1/(1 + r t)

In the 1990s there were a couple of experiments supporting hyperbolic discounting. There is even some theoretical work trying to show that this is actually optimal, given a certain kind of uncertainty about the future, and the argument for exponential discounting relies upon certainty we don’t actually have. Hyperbolic discounting could also result if we were reasoning as though we are given a simple interest rate, rather than a compound interest rate.

But even that doesn’t really seem like humans think, now does it? It’s already weird enough for someone to say “Should I take out this loan at 5%? Well, my discount rate is 7%, so yes.” But I can at least imagine that happening when people are comparing two different interest rates (“Should I pay down my student loans, or my credit cards?”). But I can’t imagine anyone thinking, “Should I take out this loan at 5% APR which I’d need to repay after 5 years? Well, let’s check my discount function, 1/(1+0.05 (5)) = 0.8, multiplied by 1.05^5 = 1.28, the product of which is 1.02, greater than 1, so no, I shouldn’t.” That isn’t how human brains function.

Moreover, recent experiments have shown that people often don’t seem to behave according to what hyperbolic discounting would predict.

Therefore I am very much in the other camp of cognitive economists, who say that we don’t have a well-defined discount function. It’s not exponential, it’s not hyperbolic, it’s not “quasi-hyperbolic” (yes that is a thing); we just don’t have one. We reason about time by simple heuristics. You can’t make a coherent function out of it because human beings… don’t always reason coherently.

Some economists seem to have an incredible amount of trouble accepting that; here we have one from the University of Chicago arguing that hyperbolic discounting can’t possibly exist, because then people could be Dutch-booked out of all their money; but this amounts to saying that human behavior cannot ever be irrational, lest all our money magically disappear. Yes, we know hyperbolic discounting (and heuristics) allow for Dutch-booking; that’s why they’re irrational. If you really want to know the formal assumption this paper makes that is wrong, it assumes that we have complete markets—and yes, complete markets essentially force you to be perfectly rational or die, because the slightest inconsistency in your reasoning results in someone convincing you to bet all your money on a sure loss. Why was it that we wanted complete markets, again? (Oh, yes, the fanciful Arrow-Debreu model, the magical fairy land where everyone is perfectly rational and all markets are complete and we all have perfect information and the same amount of wealth and skills and the same preferences, where everything automatically achieves a perfect equilibrium.)

There was a very good experiment on this, showing that rather than discount hyperbolically, behavior is better explained by a heuristic that people judge which of two options is better by a weighted sum of the absolute distance in time plus the relative distance in time. Now that sounds like something human beings might actually do. “$100 today or $110 tomorrow? That’s only 1 day away, but it’s also twice as long. I’m not waiting.” “$100 next year, or $110 in a year and a day? It’s only 1 day apart, and it’s only slightly longer, so I’ll wait.”

That might not actually be the precise heuristic we use, but it at least seems like one that people could use.

John Duffy, whom I hope to work with at UCI starting this fall, has been working on another experiment to test a different heuristic, based on the work of Daniel Kahneman, saying essentially that we have a fast, impulsive, System 1 reasoning layer and a slow, deliberative, System 2 reasoning layer; the result is that our judgments combine both “hand to mouth” where our System 1 essentially tries to get everything immediately and spend whatever we can get our hands on, and a more rational assessment by System 2 that might actually resemble an exponential discount rate. In the 5-minute judgment, System 1’s voice is overwhelming; but if we’re already planning a year out, System 1 doesn’t even care anymore and System 2 can take over. This model also has the nice feature of explaining why people with better self-control seem to behave more like they use exponential discounting,[PDF link] and why people do on occasion reason more or less exponentially, while I have literally never heard anyone try to reason hyperbolically, only economic theorists trying to use hyperbolic models to explain behavior.

Another theory is that discounting is “subadditive”, that is, if you break up a long time interval into many short intervals, people will discount it more, because it feels longer that way. Imagine a century. Now imagine a year, another year, another year, all the way up to 100 years. Now imagine a day, another day, another day, all the way up to 365 days for the first year, and then 365 days for the second year, and that on and on up to 100 years. It feels longer, doesn’t it? It is of course exactly the same. This can account for some weird anomalies in choice behavior, but I’m not convinced it’s as good as the two-system model.

Another theory is that we simply have a “present bias”, which we treat as a sort of fixed cost that we incur regardless of what the payments are. I like this because it is so supremely simple, but there’s something very fishy about it, because in this experiment it was just fixed at $4, and that can’t be right. It must be fixed at some proportion of the rewards, or something like that; or else we would always exhibit near-perfect exponential discounting for large amounts of money, which is more expensive to test (quite directly), but still seems rather unlikely.

Why is this important? This post is getting long, so I’ll save it for future posts, but in short, the ways that we value future costs and benefits, both as we actually do, and as we ought to, have far-reaching implications for everything from inflation to saving to environmental sustainability.

Why it matters that torture is ineffective

JDN 2457531

Like “longest-ever-serving Speaker of the House sexually abuses teenagers” and “NSA spy program is trying to monitor the entire telephone and email system”, the news that the US government systematically tortures suspects is an egregious violation that goes to the highest levels of our government—that for some reason most Americans don’t particularly seem to care about.

The good news is that President Obama signed an executive order in 2009 banning torture domestically, reversing official policy under the Bush Administration, and then better yet in 2014 expanded the order to apply to all US interests worldwide. If this is properly enforced, perhaps our history of hypocrisy will finally be at its end. (Well, not if Trump wins…)

Yet as often seems to happen, there are two extremes in this debate and I think they’re both wrong.
The really disturbing side is “Torture works and we have to use it!” The preferred mode of argumentation for this is the “ticking time bomb scenario”, in which we have some urgent disaster to prevent (such as a nuclear bomb about to go off) and torture is the only way to stop it from happening. Surely then torture is justified? This argument may sound plausible, but as I’ll get to below, this is a lot like saying, “If aliens were attacking from outer space trying to wipe out humanity, nuclear bombs would probably be justified against them; therefore nuclear bombs are always justified and we can use them whenever we want.” If you can’t wait for my explanation, The Atlantic skewers the argument nicely.

Yet the opponents of torture have brought this sort of argument on themselves, by staking out a position so extreme as “It doesn’t matter if torture works! It’s wrong, wrong, wrong!” This kind of simplistic deontological reasoning is very appealing and intuitive to humans, because it casts the world into simple black-and-white categories. To show that this is not a strawman, here are several different people all making this same basic argument, that since torture is illegal and wrong it doesn’t matter if it works and there should be no further debate.

But the truth is, if it really were true that the only way to stop a nuclear bomb from leveling Los Angeles was to torture someone, it would be entirely justified—indeed obligatory—to torture that suspect and stop that nuclear bomb.

The problem with that argument is not just that this is not our usual scenario (though it certainly isn’t); it goes much deeper than that:

That scenario makes no sense. It wouldn’t happen.

To use the example the late Antonin Scalia used from an episode of 24 (perhaps the most egregious Fictional Evidence Fallacy ever committed), if there ever is a nuclear bomb planted in Los Angeles, that would literally be one of the worst things that ever happened in the history of the human race—literally a Holocaust in the blink of an eye. We should be prepared to cause extreme suffering and death in order to prevent it. But not only is that event (fortunately) very unlikely, torture would not help us.

Why? Because torture just doesn’t work that well.

It would be too strong to say that it doesn’t work at all; it’s possible that it could produce some valuable intelligence—though clear examples of such results are amazingly hard to come by. There are some social scientists who have found empirical results showing some effectiveness of torture, however. We can’t say with any certainty that it is completely useless. (For obvious reasons, a randomized controlled experiment in torture is wildly unethical, so none have ever been attempted.) But to justify torture it isn’t enough that it could work sometimes; it has to work vastly better than any other method we have.

And our empirical data is in fact reliable enough to show that that is not the case. Torture often produces unreliable information, as we would expect from the game theory involved—your incentive is to stop the pain, not provide accurate intel; the psychological trauma that torture causes actually distorts memory and reasoning; and as a matter of fact basically all the useful intelligence obtained in the War on Terror was obtained through humane interrogation methods. As interrogation experts agree, torture just isn’t that effective.

In principle, there are four basic cases to consider:

1. Torture is vastly more effective than the best humane interrogation methods.

2. Torture is slightly more effective than the best humane interrogation methods.

3. Torture is as effective as the best humane interrogation methods.

4. Torture is less effective than the best humane interrogation methods.

The evidence points most strongly to case 4, which would mean that torture is a no-brainer; if it doesn’t even work as well as other methods, it’s absurd to use it. You’re basically kicking puppies at that point—purely sadistic violence that accomplishes nothing. But the data isn’t clear enough for us to rule out case 3 or even case 2. There is only one case we can strictly rule out, and that is case 1.

But it was only in case 1 that torture could ever be justified!

If you’re trying to justify doing something intrinsically horrible, it’s not enough that it has some slight benefit.

People seem to have this bizarre notion that we have only two choices in morality:

Either we are strict deontologists, and wrong actions can never be justified by good outcomes ever, in which case apparently vaccines are morally wrong, because stabbing children with needles is wrong. Tto be fair, some people seem to actually believe this; but then, some people believe the Earth is less than 10,000 years old.

Or alternatively we are the bizarre strawman concept most people seem to have of utilitarianism, under which any wrong action can be justified by even the slightest good outcome, in which case all you need to do to justify slavery is show that it would lead to a 1% increase in per-capita GDP. Sadly, there honestly do seem to be economists who believe this sort of thing. Here’s one arguing that US chattel slavery was economically efficient, and some of the more extreme arguments for why sweatshops are good can take on this character. Sweatshops may be a necessary evil for the time being, but they are still an evil.

But what utilitarianism actually says (and I consider myself some form of nuanced rule-utilitarian, though actually I sometimes call it “deontological consequentialism” to emphasize that I mean to synthesize the best parts of the two extremes) is not that the ends always justify the means, but that the ends can justify the means—that it can be morally good or even obligatory to do something intrinsically bad (like stabbing children with needles) if it is the best way to accomplish some greater good (like saving them from measles and polio). But the good actually has to be greater, and it has to be the best way to accomplish that good.

To see why this later proviso is important, consider the real-world ethical issues involved in psychology experiments. The benefits of psychology experiments are already quite large, and poised to grow as the science improves; one day the benefits of cognitive science to humanity may be even larger than the benefits of physics and biology are today. Imagine a world without mood disorders or mental illness of any kind; a world without psychopathy, where everyone is compassionate; a world where everyone is achieving their full potential for happiness and self-actualization. Cognitive science may yet make that world possible—and I haven’t even gotten into its applications in artificial intelligence.

To achieve that world, we will need a great many psychology experiments. But does that mean we can just corral people off the street and throw them into psychology experiments without their consent—or perhaps even their knowledge? That we can do whatever we want in those experiments, as long as it’s scientifically useful? No, it does not. We have ethical standards in psychology experiments for a very good reason, and while those ethical standards do slightly reduce the efficiency of the research process, the reduction is small enough that the moral choice is obviously to retain the ethics committees and accept the slight reduction in research efficiency. Yes, randomly throwing people into psychology experiments might actually be slightly better in purely scientific terms (larger and more random samples)—but it would be terrible in moral terms.

Along similar lines, even if torture works about as well or even slightly better than other methods, that’s simply not enough to justify it morally. Making a successful interrogation take 16 days instead of 17 simply wouldn’t be enough benefit to justify the psychological trauma to the suspect (and perhaps the interrogator!), the risk of harm to the falsely accused, or the violation of international human rights law. And in fact a number of terrorism suspects were waterboarded for months, so even the idea that it could shorten the interrogation is pretty implausible. If anything, torture seems to make interrogations take longer and give less reliable information—case 4.

A lot of people seem to have this impression that torture is amazingly, wildly effective, that a suspect who won’t crack after hours of humane interrogation can be tortured for just a few minutes and give you all the information you need. This is exactly what we do not find empirically; if he didn’t crack after hours of talk, he won’t crack after hours of torture. If you literally only have 30 minutes to find the nuke in Los Angeles, I’m sorry; you’re not going to find the nuke in Los Angeles. No adversarial interrogation is ever going to be completed that quickly, no matter what technique you use. Evacuate as many people to safe distances or underground shelters as you can in the time you have left.

This is why the “ticking time-bomb” scenario is so ridiculous (and so insidious); that’s simply not how interrogation works. The best methods we have for “rapid” interrogation of hostile suspects take hours or even days, and they are humane—building trust and rapport is the most important step. The goal is to get the suspect to want to give you accurate information.

For the purposes of the thought experiment, okay, you can stipulate that it would work (this is what the Stanford Encyclopedia of Philosophy does). But now all you’ve done is made the thought experiment more distant from the real-world moral question. The closest real-world examples we’ve ever had involved individual crimes, probably too small to justify the torture (as bad as a murdered child is, think about what you’re doing if you let the police torture people). But by the time the terrorism to be prevented is large enough to really be sufficient justification, it (1) hasn’t happened in the real world and (2) surely involves terrorists who are sufficiently ideologically committed that they’ll be able to resist the torture. If such a situation arises, of course we should try to get information from the suspects—but what we try should be our best methods, the ones that work most consistently, not the ones that “feel right” and maybe happen to work on occasion.

Indeed, the best explanation I have for why people use torture at all, given its horrible effects and mediocre effectiveness at best is that it feels right.

When someone does something terrible (such as an act of terrorism), we rightfully reduce our moral valuation of them relative to everyone else. If you are even tempted to deny this, suppose a terrorist and a random civilian are both inside a burning building and you only have time to save one. Of course you save the civilian and not the terrorist. And that’s still true even if you know that once the terrorist was rescued he’d go to prison and never be a threat to anyone else. He’s just not worth as much.

In the most extreme circumstances, a person can be so terrible that their moral valuation should be effectively zero: If the only person in a burning building is Stalin, I’m not sure you should save him even if you easily could. But it is a grave moral mistake to think that a person’s moral valuation should ever go negative, yet I think this is something that people do when confronted with someone they truly hate. The federal agents torturing those terrorists didn’t merely think of them as worthless—they thought of them as having negative worth. They felt it was a positive good to harm them. But this is fundamentally wrong; no sentient being has negative worth. Some may be so terrible as to have essentially zero worth; and we are often justified in causing harm to some in order to save others. It would have been entirely justified to kill Stalin (as a matter of fact he died of heart disease and old age), to remove the continued threat he posed; but to torture him would not have made the world a better place, and actually might well have made it worse.

Yet I can see how psychologically it could be useful to have a mechanism in our brains that makes us hate someone so much we view them as having negative worth. It makes it a lot easier to harm them when necessary, makes us feel a lot better about ourselves when we do. The idea that any act of homicide is a tragedy but some of them are necessary tragedies is a lot harder to deal with than the idea that some people are just so evil that killing or even torturing them is intrinsically good. But some of the worst things human beings have ever done ultimately came from that place in our brains—and torture is one of them.

The powerful persistence of bigotry

JDN 2457527

Bigotry has been a part of human society since the beginning—people have been hating people they perceive as different since as long as there have been people, and maybe even before that. I wouldn’t be surprised to find that different tribes of chimpanzees or even elephants hold bigoted beliefs about each other.

Yet it may surprise you that neoclassical economics has basically no explanation for this. There is a long-standing famous argument that bigotry is inherently irrational: If you hire based on anything aside from actual qualifications, you are leaving money on the table for your company. Because women CEOs are paid less and perform better, simply ending discrimination against women in top executive positions could save any typical large multinational corporation tens of millions of dollars a year. And yet, they don’t! Fancy that.

More recently there has been work on the concept of statistical discrimination, under which it is rational (in the sense of narrowly-defined economic self-interest) to discriminate because categories like race and gender may provide some statistically valid stereotype information. For example, “Black people are poor” is obviously not true across the board, but race is strongly correlated with wealth in the US; “Asians are smart” is not a universal truth, but Asian-Americans do have very high educational attainment. In the absence of more reliable information that might be your best option for making good decisions. Of course, this creates a vicious cycle where people in the positive stereotype group are better off and have more incentive to improve their skills than people in the negative stereotype group, thus perpetuating the statistical validity of the stereotype.

But of course that assumes that the stereotypes are statistically valid, and that employers don’t have more reliable information. Yet many stereotypes aren’t even true statistically: If “women are bad drivers”, then why do men cause 75% of traffic fatalities? Furthermore, in most cases employers have more reliable information—resumes with education and employment records. Asian-Americans are indeed more likely to have bachelor’s degrees than Latino Americans, but when it say right on Mr. Lorenzo’s resume that he has a B.A. and on Mr. Suzuki’s resume that he doesn’t, that racial stereotype no longer provides you with any further information. Yet even if the resumes are identical, employers will be more likely to hire a White applicant than a Black applicant, and more likely to hire a male applicant than a female applicant—we have directly tested this in experiments. In an experiment where employers had direct performance figures in front of them, they were still more likely to choose the man when they had the same scores—and sometimes even when the woman had a higher score!

Even our assessments of competence are often biased, probably subconsciously; given the same essay to review, most reviewers find more spelling errors and are more concerned about those errors if they are told that the author is Black. If they thought the author was White, they thought of the errors as “minor mistakes” by a student with “otherwise good potential”; but if they thought the author was Black, they “can’t believe he got into this school in the first place”. These reviewers were reading the same essay. The alleged author’s race was decided randomly. Most if not all of these reviewers were not consciously racist. Subconscious racial biases are all over the place; almost everyone exhibits some subconscious racial bias.

No, discrimination isn’t just rational inference based on valid (if unfortunate and self-reinforcing) statistical trends. There is a significant component of just outright irrational bigotry.

We’re seeing this play out in North Carolina; due to their arbitrary discrimination against lesbian, gay, bisexual and especially transgender people, they are now hemorrhaging jobs as employers pull out, and their federal funding for student loans is now in jeopardy due to the obvious Title IX violation. This is obviously not in the best interest of the people of North Carolina (even the ones who aren’t LGBT!); and it’s all being justified on the grounds of an epidemic of sexual assaults by people pretending to be trans that doesn’t even exist. It turns out that more Republican Senators have been arrested for sexual misconduct in bathrooms than transgender people—and while the number of transgender people in the US is surprisingly hard to measure, it’s clearly a lot larger than the number of Republican Senators!

In fact, discrimination is even more irrational than it may seem, because empirically the benefits of discrimination (such as they are—short-term narrow economic self-interest) fall almost entirely on the rich while the harms fall mainly on the poor, yet poor people are much more likely to be racist! Since income and education are highly correlated, education accounts for some of this effect. This is reason to be hopeful, for as educational attainment has soared, we have found that racism has decreased.

But education doesn’t seem to explain the full effect. One theory to account this is what’s called last-place aversiona highly pernicious heuristic where people are less concerned about their own absolute status than they are about not having the worst status. In economic experiments, people are usually more willing to give money to people worse off than them than to those better off than them—unless giving it to the worse-off would make those people better off than they themselves are. I think we actually need to do further study to see what happens if it would make those other people exactly as well-off as they are, because that turns out to be absolutely critical to whether people would be willing to support a basic income. In other words, do people count “tied for last”? Would they rather play a game where everyone gets $100, or one where they get $50 but everyone else only gets $10?

I would hope that humanity is better than that—that we would want to play the $100 game, which is analogous to a basic income. But when I look at the extreme and persistent inequality that has plagued human society for millennia, I begin to wonder if perhaps there really are a lot of people who think of the world in such zero-sum, purely relative terms, and care more about being better than others than they do about doing well themselves. Perhaps the horrific poverty of Sub-Saharan Africa and Southeast Asia is, for many First World people, not a bug but a feature; we feel richer when we know they are poorer. Scarcity seems to amplify this zero-sum thinking; racism gets worse whenever we have economic downturns. Precisely because discrimination is economically inefficient, this can create a vicious cycle where poverty causes bigotry which worsens poverty.

There is also something deeper going on, something evolutionary; bigotry is part of what I call the tribal paradigm, the core aspect of human psychology that defines identity in terms of in-groups which are good and out-groups which are bad. We will probably never fully escape the tribal paradigm, but this is not a reason to give up hope; we have made substantial progress in reducing bigotry in many places. What seems to happen is that people learn to expand their mental tribe, so that it encompasses larger and larger groups—not just White Americans but all Americans, or not just Americans but all human beings. Peter Singer calls this the Expanding Circle (also the title of his book on it). We may one day be able to make our tribe large enough to encompass all sentient beings in the universe; at that point, it’s just fine if we are only interested in advancing the interests of those in our tribe, because our tribe would include everyone. Yet I don’t think any of us are quite there yet, and some people have a really long way to go.

But with these expanding tribes in mind, perhaps I can leave you with a fact that is as counter-intuitive as it is encouraging, and even easier still to take out of context: Racism was better than what came before it. What I mean by this is not that racism is good—of course it’s terrible—but that in order to be racism, to define the whole world into a small number of “racial groups”, people already had to enormously expand their mental tribe from where it started. When we evolved on the African savannah millions of years ago, our tribe was 150 people; to this day, that’s about the number of people we actually feel close to and interact with on a personal level. We could have stopped there, and for millennia we did. But over time we managed to expand beyond that number, to a village of 1,000, a town of 10,000, a city of 100,000. More recently we attained mental tribes of whole nations, in some case hundreds of millions of people. Racism is about that same scale, if not a bit larger; what most people (rather arbitrarily, and in a way that changes over time) call “White” constitutes about a billion people. “Asian” (including South Asian) is almost four billion. These are astonishingly huge figures, some seven orders of magnitude larger than what we originally evolved to handle. The ability to feel empathy for all “White” people is just a little bit smaller than the ability to feel empathy for all people period. Similarly, while today the gender in “all men are created equal” is jarring to us, the idea at the time really was an incredibly radical broadening of the moral horizon—Half the world? Are you mad?

Therefore I am confident that one day, not too far from now, the world will take that next step, that next order of magnitude, which many of us already have (or try to), and we will at last conquer bigotry, and if not eradicate it entirely then force it completely into the most distant shadows and deny it its power over our society.

What does correlation have to do with causation?

JDN 2457345

I’ve been thinking of expanding the topics of this blog into some basic statistics and econometrics. It has been said that there are “Lies, damn lies, and statistics”; but in fact it’s almost the opposite—there are truths, whole truths, and statistics. Almost everything in the world that we know—not merely guess, or suppose, or intuit, or believe, but actually know, with a quantifiable level of certainty—is done by means of statistics. All sciences are based on them, from physics (when they say the Higgs discovery is a “5-sigma event”, that’s a statistic) to psychology, ecology to economics. Far from being something we cannot trust, they are in a sense the only thing we can trust.

The reason it sometimes feels like we cannot trust statistics is that most people do not understand statistics very well; this creates opportunities for both accidental confusion and willful distortion. My hope is therefore to provide you with some of the basic statistical knowledge you need to combat the worst distortions and correct the worst confusions.

I wasn’t quite sure where to start on this quest, but I suppose I have to start somewhere. I figured I may as well start with an adage about statistics that I hear commonly abused: “Correlation does not imply causation.”

Taken at its original meaning, this is definitely true. Unfortunately, it can be easily abused or misunderstood.

In its original meaning, the formal sense of the word “imply” meaning logical implication, to “imply” something is an extremely strong statement. It means that you logically entail that result, that if the antecedent is true, the consequent must be true, on pain of logical contradiction. Logical implication is for most practical purposes synonymous with mathematical proof. (Unfortunately, it’s not quite synonymous, because of things like Gödel’s incompleteness theorems and Löb’s theorem.)

And indeed, correlation does not logically entail causation; it’s quite possible to have correlations without any causal connection whatsoever, simply by chance. One of my former professors liked to brag that from 1990 to 2010 whether or not she ate breakfast had a statistically significant positive correlation with that day’s closing price for the Dow Jones Industrial Average.

How is this possible? Did my professor actually somehow influence the stock market by eating breakfast? Of course not; if she could do that, she’d be a billionaire by now. And obviously the Dow’s price at 17:00 couldn’t influence whether she ate breakfast at 09:00. Could there be some common cause driving both of them, like the weather? I guess it’s possible; maybe in good weather she gets up earlier and people are in better moods so they buy more stocks. But the most likely reason for this correlation is much simpler than that: She tried a whole bunch of different combinations until she found two things that correlated. At the usual significance level of 0.05, on average you need to try about 20 combinations of totally unrelated things before two of them will show up as correlated. (My guess is she used a number of different stock indexes and varied the starting and ending year. That’s a way to generate a surprisingly large number of degrees of freedom without it seeming like you’re doing anything particularly nefarious.)

But how do we know they aren’t actually causally related? Well, I suppose we don’t. Especially if the universe is ultimately deterministic and nonlocal (as I’ve become increasingly convinced by the results of recent quantum experiments), any two data sets could be causally related somehow. But the point is they don’t have to be; you can pick any randomly-generated datasets, pair them up in 20 different ways, and odds are, one of those ways will show a statistically significant correlation.

All of that is true, and important to understand. Finding a correlation between eating grapefruit and getting breast cancer, or between liking bitter foods and being a psychopath, does not necessarily mean that there is any real causal link between the two. If we can replicate these results in a bunch of other studies, that would suggest that the link is real; but typically, such findings cannot be replicated. There is something deeply wrong with the way science journalists operate; they like to publish the new and exciting findings, which 9 times out of 10 turn out to be completely wrong. They never want to talk about the really important and fascinating things that we know are true because we’ve been confirming them over hundreds of different experiments, because that’s “old news”. The journalistic desire to be new and first fundamentally contradicts the scientific requirement of being replicated and confirmed.

So, yes, it’s quite possible to have a correlation that tells you absolutely nothing about causation.

But this is exceptional. In most cases, correlation actually tells you quite a bit about causation.

And this is why I don’t like the adage; “imply” has a very different meaning in common speech, meaning merely to suggest or evoke. Almost everything you say implies all sorts of things in this broader sense, some more strongly than others, even though it may logically entail none of them.

Correlation does in fact suggest causation. Like any suggestion, it can be overridden. If we know that 20 different combinations were tried until one finally yielded a correlation, we have reason to distrust that correlation. If we find a correlation between A and B but there is no logical way they can be connected, we infer that it is simply an odd coincidence.

But when we encounter any given correlation, there are three other scenarios which are far more likely than mere coincidence: A causes B, B causes A, or some other factor C causes A and B. These are also not mutually exclusive; they can all be true to some extent, and in many cases are.

A great deal of work in science, and particularly in economics, is based upon using correlation to infer causation, and has to be—because there is simply no alternative means of approaching the problem.

Yes, sometimes you can do randomized controlled experiments, and some really important new findings in behavioral economics and development economics have been made this way. Indeed, much of the work that I hope to do over the course of my career is based on randomized controlled experiments, because they truly are the foundation of scientific knowledge. But sometimes, that’s just not an option.

Let’s consider an example: In my master’s thesis I found a strong correlation between the level of corruption in a country (as estimated by the World Bank) and the proportion of that country’s income which goes to the top 0.01% of the population. Countries that have higher levels of corruption also tend to have a larger proportion of income that accrues to the top 0.01%. That correlation is a fact; it’s there. There’s no denying it. But where does it come from? That’s the real question.

Could it be pure coincidence? Well, maybe; but when it keeps showing up in several different models with different variables included, that becomes unlikely. A single p < 0.05 will happen about 1 in 20 times by chance; but five in a row should happen less than 1 in 1 million times (assuming they’re independent, which, to be fair, they usually aren’t).

Could it be some artifact of the measurement methods? It’s possible. In particular, I was concerned about the possibility of Halo Effect, in which people tend to assume that something which is better (or worse) in one way is automatically better (or worse) in other ways as well. People might think of their country as more corrupt simply because it has higher inequality, even if there is no real connection. But it would have taken a very large halo bias to explain this effect.

So, does corruption cause income inequality? It’s not hard to see how that might happen: More corrupt individuals could bribe leaders or exploit loopholes to make themselves extremely rich, and thereby increase inequality.

Does inequality cause corruption? This also makes some sense, since it’s a lot easier to bribe leaders and manipulate regulations when you have a lot of money to work with in the first place.

Does something else cause both corruption and inequality? Also quite plausible. Maybe some general cultural factors are involved, or certain economic policies lead to both corruption and inequality. I did try to control for such things, but I obviously couldn’t include all possible variables.

So, which way does the causation run? Unfortunately, I don’t know. I tried some clever statistical techniques to try to figure this out; in particular, I looked at which tends to come first—the corruption or the inequality—and whether they could be used to predict each other, a method called Granger causality. Those results were inconclusive, however. I could neither verify nor exclude a causal connection in either direction. But is there a causal connection? I think so. It’s too robust to just be coincidence. I simply don’t know whether A causes B, B causes A, or C causes A and B.

Imagine trying to do this same study as a randomized controlled experiment. Are we supposed to create two societies and flip a coin to decide which one we make more corrupt? Or which one we give more income inequality? Perhaps you could do some sort of experiment with a proxy for corruption (cheating on a test or something like that), and then have unequal payoffs in the experiment—but that is very far removed from how corruption actually works in the real world, and worse, it’s prohibitively expensive to make really life-altering income inequality within an experimental context. Sure, we can give one participant $1 and the other $1,000; but we can’t give one participant $10,000 and the other $10 million, and it’s the latter that we’re really talking about when we deal with real-world income inequality. I’m not opposed to doing such an experiment, but it can only tell us so much. At some point you need to actually test the validity of your theory in the real world, and for that we need to use statistical correlations.

Or think about macroeconomics; how exactly are you supposed to test a theory of the business cycle experimentally? I guess theoretically you could subject an entire country to a new monetary policy selected at random, but the consequences of being put into the wrong experimental group would be disastrous. Moreover, nobody is going to accept a random monetary policy democratically, so you’d have to introduce it against the will of the population, by some sort of tyranny or at least technocracy. Even if this is theoretically possible, it’s mind-bogglingly unethical.

Now, you might be thinking: But we do change real-world policies, right? Couldn’t we use those changes as a sort of “experiment”? Yes, absolutely; that’s called a quasi-experiment or a natural experiment. They are tremendously useful. But since they are not truly randomized, they aren’t quite experiments. Ultimately, everything you get out of a quasi-experiment is based on statistical correlations.

Thus, abuse of the adage “Correlation does not imply causation” can lead to ignoring whole subfields of science, because there is no realistic way of running experiments in those subfields. Sometimes, statistics are all we have to work with.

This is why I like to say it a little differently:

Correlation does not prove causation. But correlation definitely can suggest causation.