How is the economy doing this well?

Apr 14 JDN 2460416

We are living in a very weird time, economically. The COVID pandemic created huge disruptions throughout our economy, from retail shops closing to shortages in shipping containers. The result was a severe recession with the worst unemployment since the Great Depression.

Now, a few years later, we have fully recovered.

Here’s a graph from FRED showing our unemployment and inflation rates since 1990 [technical note: I’m using the urban CPI; there are a few other inflation measures you could use instead, but they look much the same]:

Inflation fluctuates pretty quickly, while unemployment moves much slower.

There are a lot of things we can learn from this graph:

  1. Before COVID, we had pretty low inflation; from 1990 to 2019, inflation averaged about 2.4%, just over the Fed’s 2% target.
  2. Before COVID, we had moderate to high unemployment; it rarely went below 5%, and for several years after the 2008 crash it was over 7%—which is why we called it the Great Recession.
  3. The only times we actually had negative inflation (deflation) were during recessions, and they coincided with high unemployment; so, no, we really don’t want prices to come down.
  4. During COVID, we had a massive spike in unemployment up to almost 15%, but then it came back down much more rapidly than it had in the Great Recession.
  5. After COVID, there was a surge in inflation, peaking at almost 10%.
  6. That inflation surge was short-lived; by the end of 2022 inflation was back down to 4%.
  7. Unemployment now stands at 3.8% while inflation is at 2.7%.

What I really want to emphasize right now is point 7, so let me repeat it:

Unemployment now stands at 3.8% while inflation is at 2.7%.

Yes, technically, 2.7% is above our inflation target. But honestly, I’m not sure it should be. I don’t see any particular reason to think that 2% is optimal, and based on what we’ve learned from the Great Recession, I actually think 3% or even 4% would be perfectly reasonable inflation targets. No, we don’t want to be going into double-digits (and we certainly don’t want true hyperinflation); but 4% inflation really isn’t a disaster, and we should stop treating it like it is.

2.7% inflation is actually pretty close to the 2.4% inflation we’d been averaging from 1990 to 2019. So I think it’s fair to say that inflation is back to normal.

But the really wild thing is that unemployment isn’t back to normal: It’s much better than that.

To get some more perspective on this, let’s extend our graph backward all the way to 1950:

Inflation has been much higher than it is now. In the late 1970s, it was consistently as high as it got during the post-COVID surge. But it has never been substantially lower than it is now; a little above the 2% target really seems to be what stable, normal inflation looks like in the United States.

On the other hand, unemployment is almost never this low. It was for a few years in the early 1950s and the late 1960s; but otherwise, it has always been higher—and sometimes much higher. It did not dip below 5% for the entire period from 1971 to 1994.

In intro macroeconomics courses, they hammer the Phillips Curve into us: unemployment is supposedly inversely related to inflation, so that it’s impossible to have both low inflation and low unemployment at the same time.

But we’re looking at it, right now. It’s here, right in front of us. What wasn’t supposed to be possible has now been achieved. E pur si muove.

There was supposed to be this terrible trade-off between inflation and unemployment, leaving our government with the stark dilemma of either letting prices surge or letting millions remain out of work. I had always been on the “inflation” side: I thought that rising prices were far less of a problem than people out of work.

But we just learned that the entire premise was wrong.

You can have both. You don’t have to choose.

Right here, right now, we have both. All we need to do is keep doing whatever we’re doing.

One response might be: what if we can’t? What if this is unsustainable? (Then again, conservatives never seemed terribly concerned about sustainability before….)

It’s worth considering. One thing that doesn’t look so great now is the federal deficit. It got extremely high during COVID, and it’s still pretty high now. But as a proportion of GDP, it isn’t anywhere near as high as it was during WW2, and we certainly made it through that all right:

So, yeah, we should probably see if we can bring the budget back to balanced—probably by raising taxes. But this isn’t an urgent problem. We have time to sort it out. 15% unemployment was an urgent problem—and we fixed it.

In fact in some ways the economy is even doing better now than it looks. Unemployment for Black people has never been this low, since we’ve been keeping track of it:

Black people had basically learned to live with 8% or 9% unemployment as if it were normal; but now, for the first time ever—ever—their unemployment rate is down to only 5%.

This isn’t because people are dropping out of the labor force. Broad unemployment, which includes people marginally attached to the labor force, people employed part-time not by choice, and people who gave up looking for work, is also at historic lows, despite surging to almost 23% during COVID:

In fact, overall employment among people 25-54 years old (considered “prime age”—old enough to not be students, young enough to not be retired) is nearly the highest it has ever been, and radically higher than it was before the 1980s (because women entered the workforce):

So this is not an illusion: More Americans really are working now. And employment has become more inclusive of women and minorities.

I really don’t understand why President Biden isn’t more popular. Biden inherited the worst unemployment since the Great Depression, and turned it around into an economic situation so good that most economists thought it was impossible. A 39% approval rating does not seem consistent with that kind of staggering economic improvement.

And yes, there are a lot of other factors involved aside from the President; but for once I think he really does deserve a lot of the credit here. Programs he enacted to respond to COVID brought us back to work quicker than many thought possible. Then, the Inflation Reduction Act made historic progress at fighting climate change—and also, lo and behold, reduced inflation.

He’s not a particularly charismatic figure. He is getting pretty old for this job (or any job, really). But Biden’s economic policy has been amazing, and he deserves more credit for it.

Statisticacy

Jun 11 JDN 2460107

I wasn’t able to find a dictionary that includes the word “statisticacy”, but it doesn’t trigger my spell-check, and it does seem to have the same form as “numeracy”: numeric, numerical, numeracy, numerate; statistic, statistical, statisticacy, statisticate. It definitely still sounds very odd to my ears. Perhaps repetition will eventually make it familiar.

For the concept is clearly a very important one. Literacy and numeracy are no longer a serious problem in the First World; basically every adult at this point knows how to read and do addition. Even worldwide, 90% of men and 83% of women can read, at least at a basic level—which is an astonishing feat of our civilization by the way, well worthy of celebration.

But I have noticed a disturbing lack of, well, statisticacy. Even intelligent, educated people seem… pretty bad at understanding statistics.

I’m not talking about sophisticated econometrics here; of course most people don’t know that, and don’t need to. (Most economists don’t know that!) I mean quite basic statistical knowledge.

A few years ago I wrote a post called “Statistics you should have been taught in high school, but probably weren’t”; that’s the kind of stuff I’m talking about.

As part of being a good citizen in a modern society, every adult should understand the following:

1. The difference between a mean and a median, and why average income (mean) can increase even though most people are no richer (median).

2. The difference between increasing by X% and increasing by X percentage points: If inflation goes from 4% to 5%, that is an increase of 20% ((5/4-1)*100%), but only 1 percentage point (5%-4%).

3. The meaning of standard error, and how to interpret error bars on a graph—and why it’s a huge red flag if there aren’t any error bars on a graph.

4. Basic probabilistic reasoning: Given some scratch paper, a pen, and a calculator, everyone should be able to work out the odds of drawing a given blackjack hand, or rolling a particular number on a pair of dice (see the sketch after this list). (If that’s too easy, make it a poker hand and four dice. But mostly that’s just more calculation effort, not fundamentally different.)

5. The meaning of exponential growth rates, and how they apply to economic growth and compound interest. (The difference between 3% interest and 6% interest over 30 years is more than double the total amount paid.)
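
To make items 4 and 5 concrete, here is a minimal sketch in Python (the language, the target roll of 7, and the $200,000 principal are my own illustrative choices, not anything prescribed above):

```python
from itertools import product

# Item 4: odds of rolling a given total on two fair dice, by brute-force enumeration.
def p_two_dice(total):
    rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes
    return sum(1 for a, b in rolls if a + b == total) / len(rolls)

print(p_two_dice(7))   # 6/36, about 0.167
print(p_two_dice(2))   # 1/36, about 0.028

# Item 5: compound growth at 3% versus 6% over 30 years.
principal = 200_000    # illustrative amount
for rate in (0.03, 0.06):
    print(rate, round(principal * (1 + rate) ** 30))
# (1.06)^30 is roughly 2.4 times (1.03)^30, so doubling the rate more than doubles the total.
```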

I see people making errors about this sort of thing all the time.

Economic news that celebrates rising GDP but wonders why people aren’t happier (when real median income has been falling since 2019 and is only 7% higher than it was in 1999, an annual growth rate of 0.2%).

Reports on inflation, interest rates, or poll numbers that don’t clearly specify whether they are dealing with percentages or percentage points. (XKCD made fun of this.)

Speaking of poll numbers, any reporting on changes in polls where the change isn’t at least twice the margin of error of the polls in question. (There’s also a comic for this; this time it’s PhD Comics.)

People misunderstanding interest rates and gravely underestimating how much they’ll pay for their debt (then again, this is probably the result of strategic choices on the part of banks—so maybe the real failure is regulatory).

And, perhaps worst of all, the plague of science news articles about “New study says X”. Things causing and/or curing cancer, things correlated with personality types, tiny psychological nudges that supposedly have profound effects on behavior.

Some of these things will even turn out to be true; actually I think this one on fibromyalgia, this one on smoking, and this one on body image are probably accurate. But even if it’s a properly randomized experiment—and especially if it’s just a regression analysis—a single study ultimately tells us very little, and it’s irresponsible to report on them instead of telling people the extensive body of established scientific knowledge that most people still aren’t aware of.

Basically any time an article is published saying “New study says X”, a statisticate person should ignore it and treat it as random noise. This is especially true if the finding seems weird or shocking; such findings are far more likely to be random flukes than genuine discoveries. Yes, they could be true, but one study just doesn’t move the needle that much.

I don’t remember where it came from, but there is a saying about this: “What is in the textbooks is 90% true. What is in the published literature is 50% true. What is in the press releases is 90% false.” These figures are approximately correct.

If their goal is to advance public knowledge of science, science journalists would accomplish a lot more if they just opened to a random page in a mainstream science textbook and started reading it on air. Admittedly, I can see how that would be less interesting to watch; but then, their job should be to find a way to make it interesting, not to take individual studies out of context and hype them up far beyond what they deserve. (Bill Nye did this much better than most science journalists.)

I’m not sure how much to blame people for lacking this knowledge. On the one hand, they could easily look it up on Wikipedia, and apparently choose not to. On the other hand, they probably don’t even realize how important it is, and were never properly taught it in school even though they should have been. Many of these things may even be unknown unknowns; people simply don’t realize how poorly they understand. Maybe the most useful thing we could do right now is simply point out to people that these things are important, and if they don’t understand them, they should get on that Wikipedia binge as soon as possible.

And one last thing: Maybe this is asking too much, but I think that a truly statisticate person should be able to solve the Monty Hall Problem and not be confused by the result. (Hint: It’s very important that Monty Hall knows which door the car is behind, and would never open that one. If he’s guessing at random and simply happens to pick a goat, the correct answer is 1/2, not 2/3. Then again, it’s never a bad choice to switch.)
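
If the argument doesn’t convince you, brute force might; here is a minimal simulation sketch (the door-numbering convention is my own, but the 2/3 and 1/3 figures should pop right out):

```python
import random

def monty_hall_trial(switch):
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # Monty knowingly opens a goat door that isn't the player's pick.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print(sum(monty_hall_trial(True) for _ in range(trials)) / trials)   # ~2/3
print(sum(monty_hall_trial(False) for _ in range(trials)) / trials)  # ~1/3
```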

Pinker Propositions

May 19 JDN 2458623

What do the following statements have in common?

1. “Capitalist countries have less poverty than Communist countries.”

2. “Black men in the US commit homicide at a higher rate than White men.”

3. “On average, in the US, Asian people score highest on IQ tests, White and Hispanic people score near the middle, and Black people score the lowest.”

4. “Men on average perform better at visual tasks, and women on average perform better on verbal tasks.”

5. “In the United States, White men are no more likely to be mass shooters than other men.”

6. “The genetic heritability of intelligence is about 60%.”

7. “The plurality of recent terrorist attacks in the US have been committed by Muslims.”

8. “The period of US military hegemony since 1945 has been the most peaceful period in human history.”

These statements have two things in common:

1. All of these statements are objectively true facts that can be verified by rich and reliable empirical data which is publicly available and uncontroversially accepted by social scientists.

2. If spoken publicly among left-wing social justice activists, all of these statements will draw resistance, defensiveness, and often outright hostility. Anyone making these statements is likely to be accused of racism, sexism, imperialism, and so on.

I call such propositions Pinker Propositions, after an excellent talk by Steven Pinker illustrating several of the above statements (which was then taken wildly out of context by social justice activists on social media).

The usual reaction to these statements suggests that people think they imply harmful far-right policy conclusions. This inference is utterly wrong: A nuanced understanding of each of these propositions does not in any way lead to far-right policy conclusions—in fact, some rather strongly support left-wing policy conclusions.

1. Capitalist countries have less poverty than Communist countries, because Communist countries are nearly always corrupt and authoritarian. Social democratic countries have the lowest poverty and the highest overall happiness (#ScandinaviaIsBetter).

2. Black men commit more homicide than White men because of poverty, discrimination, mass incarceration, and gang violence. Black men are also greatly overrepresented among victims of homicide, as most homicide is intra-racial. Homicide rates often vary across ethnic and socioeconomic groups, and these rates vary over time as a result of cultural and political changes.

3. IQ tests are a highly imperfect measure of intelligence, and the genetics of intelligence cut across our socially-constructed concept of race. There is far more within-group variation in IQ than between-group variation. Intelligence is not fixed at birth but is affected by nutrition, upbringing, exposure to toxins, and education—all of which statistically put Black people at a disadvantage. Nor does intelligence remain constant within populations: The Flynn Effect is the well-documented increase in intelligence which has occurred in almost every country over the past century. Far from justifying discrimination, these provide very strong reasons to improve opportunities for Black children. The lead and mercury in Flint’s water suppressed the brain development of thousands of Black children—that’s going to lower average IQ scores. But that says nothing about supposed “inherent racial differences” and everything about the catastrophic damage of environmental racism.

4. To be quite honest, I never even understood why this one shocks—or even surprises—people. It’s not even saying that men are “smarter” than women—overall IQ is almost identical. It’s just saying that men are more visual and women are more verbal. And this, I think, is actually quite obvious. I think the clearest evidence of this—the “interocular trauma” that will convince you the effect is real and worth talking about—is pornography. Visual porn is overwhelmingly consumed by men, even when it was designed for women (e.g. Playgirl: a majority of its readers are gay men, even though there are ten times as many straight women in the world as there are gay men). Conversely, erotic novels are overwhelmingly consumed by women. I think a lot of anti-porn feminism can actually be explained by this effect: Feminists (who are usually women, for obvious reasons) can say they are against “porn” when what they are really against is visual porn, because visual porn is consumed by men; then the kind of porn that they like (erotic literature) doesn’t count as “real porn”. And honestly they’re mostly against the current structure of the live-action visual porn industry, which is totally reasonable—but it’s a far cry from being against porn in general. I have some serious issues with how our farming system is currently set up, but I’m not against farming.

5. This one is interesting, because it’s a lack of a race difference, which normally is what the left wing always wants to hear. The difference of course is that this alleged difference would make White men look bad, and that’s apparently seen as a desirable goal for social justice. But the data just doesn’t bear it out: While indeed most mass shooters are White men, that’s because most Americans are White, which is a totally uninteresting reason. There’s no clear evidence of any racial disparity in mass shootings—though the gender disparity is absolutely overwhelming: It’s almost always men.

6. Heritability is a subtle concept; it doesn’t mean what most people seem to think it means. It doesn’t mean that 60% of your intelligence is due to your genes. Indeed, I’m not even sure what that sentence would actually mean; it’s like saying that 60% of the flavor of a cake is due to the eggs. What this heritability figure actually means is that when you compare across individuals in a population, and carefully control for environmental influences, you find that about 60% of the variance in IQ scores is explained by genetic factors. But this is within a particular population—here, US adults—and is absolutely dependent on all sorts of other variables. The more flexible one’s environment becomes, the more people self-select into their preferred environment, and the more heritable traits become. As a result, IQ actually becomes more heritable as children become adults, a phenomenon called the Wilson Effect.

7. This one might actually have some contradiction with left-wing policy. The disproportionate participation of Muslims in terrorism—controlling for just about anything you like: income, education, age, etc.—really does suggest that, at least at this point in history, there is some real ideological link between Islam and terrorism. But the fact remains that the vast majority of Muslims are not terrorists and do not support terrorism, and antagonizing all the people of an entire religion is fundamentally unjust as well as likely to backfire in various ways. We should instead be trying to encourage the spread of more tolerant forms of Islam, and maintaining the strict boundaries of secularism to prevent the encroachment of any religion on our system of government.

8. The fact that US military hegemony does seem to be a cause of global peace doesn’t imply that every single military intervention by the US is justified. In fact, it doesn’t even necessarily imply that any such interventions are justified—though I think one would be hard-pressed to say that the NATO intervention in the Kosovo War or the defense of Kuwait in the Gulf War was unjustified. It merely points out that having a hegemon is clearly preferable to having a multipolar world where many countries jockey for military supremacy. The Pax Romana was a time of peace but also authoritarianism; the Pax Americana is better, but that doesn’t prevent us from criticizing the real harms—including major war crimes—committed by the United States.

So it is entirely possible to know and understand these facts without adopting far-right political views.

Yet Pinker’s point—and mine—is that by suppressing these true facts, by responding with hostility or even ostracism to anyone who states them, we are actually adding fuel to the far-right fire. Instead of presenting the nuanced truth and explaining why it doesn’t imply such radical policies, we attack the messenger; and this leads people to conclude three things:

1. The left wing is willing to lie and suppress the truth in order to achieve political goals (they’re doing it right now).

2. These statements actually do imply right-wing conclusions (else why suppress them?).

3. Since these statements are true, that must mean the right-wing conclusions are actually correct.

Now (especially if you are someone who identifies unironically as “woke”), you might be thinking something like this: “Anyone who can be turned away from social justice so easily was never a real ally in the first place!”

This is a fundamentally and dangerously wrongheaded view. No one—not me, not you, not anyone—was born believing in social justice. You did not emerge from your mother’s womb ranting against colonialist imperialism. You had to learn what you now know. You came to believe what you now believe, after once believing something else that you now think is wrong. This is true of absolutely everyone everywhere. Indeed, the better you are, the more true it is; good people learn from their mistakes and grow in their knowledge.

This means that anyone who is now an ally of social justice once was not. And that, in turn, suggests that many people who are currently not allies could become so, under the right circumstances. They would probably not shift all at once—as I didn’t, and I doubt you did either—but if we are welcoming and open and honest with them, we can gradually tilt them toward greater and greater levels of support.

But if we reject them immediately for being impure, they never get the chance to learn, and we never get the chance to sway them. People who are currently uncertain of their political beliefs will become our enemies because we made them our enemies. We declared that if they would not immediately commit to everything we believe, then they may as well oppose us. They, quite reasonably unwilling to commit to a detailed political agenda they didn’t understand, decided that it would be easiest to simply oppose us.

And we don’t have to win over every person on every single issue. We merely need to win over a large enough critical mass on each issue to shift policies and cultural norms. Building a wider tent is not compromising on your principles; on the contrary, it’s how you actually win and make those principles a reality.

There will always be those we cannot convince, of course. And I admit, there is something deeply irrational about going from “those leftists attacked Charles Murray” to “I think I’ll start waving a swastika”. But humans aren’t always rational; we know this. You can lament this, complain about it, yell at people for being so irrational all you like—it won’t actually make people any more rational. Humans are tribal; we think in terms of teams. We need to make our team as large and welcoming as possible, and suppressing Pinker Propositions is not the way to do that.

The sausage of statistics being made

Nov 11 JDN 2458434

“Laws, like sausages, cease to inspire respect in proportion as we know how they are made.”

~ John Godfrey Saxe, not Otto von Bismarck

Statistics are a bit like laws and sausages. There are a lot of things in statistical practice that don’t align with statistical theory. The most obvious example is the fact that many results in statistics are asymptotic: they only strictly apply for infinitely large samples, and in any finite sample they will be some sort of approximation (we often don’t even know how good an approximation).

But the problem runs deeper than this: The whole idea of a p-value was originally supposed to be used to assess one single hypothesis that is the only one you test in your entire study.

That’s frankly a ludicrous expectation: Why would you write a whole paper just to test one parameter?

This is why I don’t actually think this so-called multiple comparisons problem is a problem with researchers doing too many hypothesis tests; I think it’s a problem with statisticians being fundamentally unreasonable about what statistics is useful for. We have to do multiple comparisons, so you should be telling us how to do it correctly.

Statisticians have this beautiful pure mathematics that generates all these lovely asymptotic results… and then they stop, as if they were done. But we aren’t dealing with infinite or even “sufficiently large” samples; we need to know what happens when your sample is 100, not when your sample is 10^29. We can’t assume that our variables are independently identically distributed; we don’t know their distribution, and we’re pretty sure they’re going to be somewhat dependent.

Even in an experimental context where we can randomly and independently assign some treatments, we can’t do that with lots of variables that are likely to matter, like age, gender, nationality, or field of study. And applied econometricians are in an even tighter bind; they often can’t randomize anything. They have to rely upon “instrumental variables” that they hope are “close enough to randomized” relative to whatever they want to study.

In practice what we tend to do is… fudge it. We use the formal statistical methods, and then we step back and apply a series of informal norms to see if the result actually makes sense to us. This is why almost no psychologists were actually convinced by Daryl Bem’s precognition experiments, despite his standard experimental methodology and perfect p < 0.05 results; he couldn’t pass any of the informal tests, particularly the most basic one of not violating any known fundamental laws of physics. We knew he had somehow cherry-picked the data, even before looking at it; nothing else was possible.

This is actually part of where the “hierarchy of sciences” notion is useful: One of the norms is that you’re not allowed to break the rules of the sciences above you, but you can break the rules of the sciences below you. So psychology has to obey physics, but physics doesn’t have to obey psychology. I think this is also part of why there’s so much enmity between economists and anthropologists; really we should be on the same level, cognizant of each other’s rules, but economists want to be above anthropologists so we can ignore culture, and anthropologists want to be above economists so they can ignore incentives.

Another informal norm is the “robustness check”, in which the researcher runs a dozen different regressions approaching the same basic question from different angles. “What if we control for this? What if we interact those two variables? What if we use a different instrument?” In terms of statistical theory, this doesn’t actually make a lot of sense; the probability distributions f(y|x) of y conditional on x and f(y|x, z) of y conditional on x and z are not the same thing, and wouldn’t in general be closely tied, depending on the distribution f(x|z) of x conditional on z. But in practice, most real-world phenomena are going to continue to show up even as you run a bunch of different regressions, and so we can be more confident that something is a real phenomenon insofar as that happens. If an effect drops out when you switch out a couple of control variables, it may have been a statistical artifact. But if it keeps appearing no matter what you do to try to make it go away, then it’s probably a real thing.
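
As a purely illustrative sketch of what that looks like in practice, here is a toy robustness check on synthetic data; the variable names and the data-generating process are my own invention, and the point is only that the coefficient of interest stays near its true value of 2 across several specifications:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
z = rng.normal(size=n)                         # a confounder we should control for
w = rng.normal(size=n)                         # an irrelevant control
x = 0.5 * z + rng.normal(size=n)
y = 2.0 * x + 1.0 * z + rng.normal(size=n)     # true effect of x on y is 2

def coef_on_x(controls):
    X = np.column_stack([np.ones(n), x] + controls)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]                             # coefficient on x

print(coef_on_x([]))            # omits z: drifts up to ~2.4 (omitted-variable bias)
print(coef_on_x([z]))           # ~2
print(coef_on_x([z, w]))        # ~2, adding an irrelevant control changes little
print(coef_on_x([z, w, z * w])) # ~2, even with an interaction thrown in
```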

Because of the powerful career incentives toward publication and the strange obsession among journals with a p-value less than 0.05, another norm has emerged: Don’t actually trust p-values that are close to 0.05. The vast majority of the time, a p-value of 0.047 was the result of publication bias. Now if you see a p-value of 0.001, maybe then you can trust it—but you’re still relying on a lot of assumptions even then. I’ve seen some researchers argue that because of this, we should tighten our standards for publication to something like p < 0.01, but that’s missing the point; what we need to do is stop publishing based on p-values. If you tighten the threshold, you’re just going to get more rejected papers and then the few papers that do get published will now have even smaller p-values that are still utterly meaningless.

These informal norms protect us from the worst outcomes of bad research. But they are almost certainly not optimal. It’s all very vague and informal, and different researchers will often disagree vehemently over whether a given interpretation is valid. What we need are formal methods for solving these problems, so that we can have the objectivity and replicability that formal methods provide. Right now, our existing formal tools simply are not up to that task.

There are some things we may never be able to formalize: If we had a formal algorithm for coming up with good ideas, the AIs would already rule the world, and this would be either Terminator or The Culture depending on whether we designed the AIs correctly. But I think we should at least be able to formalize the basic question of “Is this statement likely to be true?” that is the fundamental motivation behind statistical hypothesis testing.

I think the answer is likely to be in a broad sense Bayesian, but Bayesians still have a lot of work left to do in order to give us really flexible, reliable statistical methods we can actually apply to the messy world of real data. In particular, tell us how to choose priors please! Prior selection is a fundamental make-or-break problem in Bayesian inference that has nonetheless been greatly neglected by most Bayesian statisticians. So, what do we do? We fall back on informal norms: Try maximum likelihood, which is like using a very flat prior. Try a normally-distributed prior. See if you can construct a prior from past data. If all those give the same thing, that’s a “robustness check” (see previous informal norm).
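
Here is a toy version of that prior-sensitivity check for the simplest possible problem, estimating a single mean with known noise variance; everything in the setup, including the particular priors, is an illustrative assumption rather than a recommendation:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=2.0, size=50)   # stand-in for messy real data
sigma2 = 4.0                                     # noise variance, assumed known
n, xbar = len(data), data.mean()

def posterior_mean(mu0, tau2):
    # Conjugate update: normal prior N(mu0, tau2), normal likelihood with variance sigma2.
    return (mu0 / tau2 + n * xbar / sigma2) / (1 / tau2 + n / sigma2)

print(xbar)                        # maximum likelihood (the flat-prior limit)
print(posterior_mean(0.0, 100.0))  # weak prior centered at 0
print(posterior_mean(0.0, 1.0))    # tighter prior centered at 0
print(posterior_mean(5.0, 1.0))    # tighter prior centered somewhere else
# If these all land in roughly the same place, the conclusion isn't being driven by the prior.
```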

Informal norms are also inherently harder to teach and learn. I’ve seen a lot of other grad students flail wildly at statistics, not because they don’t know what a p-value means (though maybe that’s also sometimes true), but because they don’t really quite grok the informal underpinnings of good statistical inference. This can be very hard to explain to someone: They feel like they followed all the rules correctly, but you are saying their results are wrong, and now you can’t explain why.

In fact, some of the informal norms that are in wide use are clearly detrimental. In economics, norms have emerged that certain types of models are better simply because they are “more standard”, such as the dynamic stochastic general equilibrium models that can basically be fit to everything and have never actually usefully predicted anything. In fact, the best ones just predict what we already knew from Keynesian models. But without a formal norm for testing the validity of models, it’s been “DSGE or GTFO”. At present, it is considered “nonstandard” (read: “bad”) not to assume that your agents are either a single unitary “representative agent” or a continuum of infinitely-many agents—modeling the actual fact of finitely-many agents is just not done. Yet it’s hard for me to imagine any formal criterion that wouldn’t at least give you some points for correctly including the fact that there is more than one but less than infinity people in the world (obviously your model could still be bad in other ways).

I don’t know what these new statistical methods would look like. Maybe it’s as simple as formally justifying some of the norms we already use; maybe it’s as complicated as taking a fundamentally new approach to statistical inference. But we have to start somewhere.

Demystifying dummy variables

Nov 5, JDN 2458062

Continuing my series of blog posts on basic statistical concepts, today I’m going to talk about dummy variables. Dummy variables are quite simple, but for some reason a lot of people—even people with extensive statistical training—often have trouble understanding them. Perhaps people are simply overthinking matters, or making subtle errors that end up having large consequences.

A dummy variable (more formally a binary variable) is a variable that has only two states: “No”, usually represented 0, and “Yes”, usually represented 1. A dummy variable answers a single “Yes or no” question. They are most commonly used for categorical variables, answering questions like “Is the person’s race White?” and “Is the state California?”; but in fact almost any kind of data can be represented this way: We could represent income using a series of dummy variables like “Is your income greater than $50,000?” “Is your income greater than $51,000?” and so on. As long as the number of possible outcomes is finite—which, in practice, it always is—the data can be represented by some (possibly large) set of dummy variables. In fact, if your data set is large enough, representing numerical data with dummy variables can be a very good thing to do, as it allows you to account for nonlinear effects without assuming some specific functional form.

Most of the misunderstanding regarding dummy variables involves applying them in regressions and interpreting the results.

Probably the most common confusion is about what dummy variables to include. When you have a set of categories represented in your data (e.g. one for each US state), you want to include dummy variables for all but one of them. The most common mistake here is to try to include all of them, and end up with a regression that doesn’t make sense, or if you have a catchall category like “Other” (e.g. race is coded as “White/Black/Other”), leaving out that one and getting results with a nonsensical baseline.

You don’t have to leave one out if you only have one set of categories and you don’t include a constant in your regression; then the baseline will emerge automatically from the regression. But this is dangerous, as the interpretation of the coefficients is no longer quite so simple.

The thing to keep in mind is that a coefficient on a dummy variable is an effect of a change—so the coefficient on “White” is the effect of being White. In order to be an effect of a change, that change must be measured against some baseline. The dummy variable you exclude from the regression is the baseline—because the effect of changing to the baseline from the baseline is by definition zero.

Here’s a very simple example where all the regressions can be done by hand. Suppose you have a household with 1 human and 1 cat, and you want to know the effect of species on number of legs. (I mean, hopefully this is something you already know; but that makes it a good illustration.) In what follows, you can safely skip the matrix algebra; but I included it for any readers who want to see how these concepts play out mechanically in the math.

Your outcome variable Y is legs: The human has 2 and the cat has 4. We can write this as a matrix:

\[ Y = \begin{bmatrix} 2 \\ 4 \end{bmatrix} \]

What dummy variables should we choose? There are actually several options.


The simplest option is to include both a human variable and a cat variable, and no constant. Let’s put the human variable first. Then our human subject has a value of X1 = [1 0] (“Yes” to human and “No” to cat) and our cat subject has a value of X2 = [0 1].

This is very nice in this case, as it makes our matrix of independent variables simply an identity matrix:

\[ X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

This makes the calculations extremely nice, because transposing, multiplying, and inverting an identity matrix all just give us back an identity matrix. The standard OLS regression coefficient is B = (X’X)^{-1} X’Y, which in this case just becomes Y itself.

\[ B = (X'X)^{-1} X'Y = Y = \begin{bmatrix} 2 \\ 4 \end{bmatrix} \]

Our coefficients are 2 and 4. How would we interpret this? Pretty much what you’d think: The effect of being human is having 2 legs, while the effect of being a cat is having 4 legs. This amounts to choosing a baseline of nothing—the effect is compared to a hypothetical entity with no legs at all. And indeed this is what will happen more generally if you do a regression with a dummy for each category and no constant: The baseline will be a hypothetical entity with an outcome of zero on whatever your outcome variable is.

So far, so good.

But what if we had additional variables to include? Say we have both cats and humans with black hair and brown hair (and no other colors). If we now include the variables human, cat, black hair, brown hair, we won’t get the results we expect—in fact, we’ll get no result at all. The regression is mathematically impossible, regardless of how large a sample we have.

This is why it’s much safer to choose one of the categories as a baseline, and include that as a constant. We could pick either one; we just need to be clear about which one we chose.

Say we take human as the baseline. Then our variables are constant and cat. The variable constant is just 1 for every single individual. The variable cat is 0 for humans and 1 for cats.

Now our independent variable matrix looks like this:

\[ X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \]

The matrix algebra isn’t quite so nice this time:

\[ X'X = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \]

\[ (X'X)^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \]

\[ X'Y = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix} \]

\[ B = (X'X)^{-1} X'Y = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \]

Our coefficients are now 2 and 2. Now, how do we interpret that result? We took human as the baseline, so what we are saying here is that the default is to have 2 legs, and then the effect of being a cat is to get 2 extra legs.

That sounds a bit anthropocentric—most animals are quadrupeds, after all—so let’s try taking cat as the baseline instead. Now our variables are constant and human, and our independent variable matrix looks like this:

\[ X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]

\[ X'X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \]

\[ (X'X)^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \]

\[ X'Y = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \end{bmatrix} \]

\[ B = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \end{bmatrix} \]

Our coefficients are 4 and -2. This seems much more phylogenetically correct: The default number of legs is 4, and the effect of being human is to lose 2 legs.

All these regressions are really saying the same thing: Humans have 2 legs, cats have 4. And in this particular case, it’s simple and obvious. But once things start getting more complicated, people tend to make mistakes even on these very simple questions.
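
If you would rather let a computer do the matrix algebra, here is a minimal numpy sketch that reproduces all three regressions above (least squares computes the same B = (X’X)^{-1} X’Y):

```python
import numpy as np

Y = np.array([2.0, 4.0])   # legs: one human, one cat

designs = {
    "human + cat, no constant":    np.array([[1.0, 0.0], [0.0, 1.0]]),
    "constant + cat (human base)": np.array([[1.0, 0.0], [1.0, 1.0]]),
    "constant + human (cat base)": np.array([[1.0, 1.0], [1.0, 0.0]]),
}

for name, X in designs.items():
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(name, B)
# Prints [2. 4.], [2. 2.], and [ 4. -2.], matching the hand calculations above.
```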

A common mistake would be to try to include a constant and both dummy variables: constant human cat. What happens if we try that? The matrix algebra gets particularly nasty, first of all:

\[ X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

\[ X'X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

Our covariance matrix X’X is now 3×3, first of all. That means we have more coefficients than we have data points. But we could throw in another human and another cat to fix that problem.


More importantly, the covariance matrix is not invertible. Rows 2 and 3 add up together to equal row 1, so we have a singular matrix.

If you tried to run this regression, you’d get an error message about “perfect multicollinearity”. What this really means is you haven’t chosen a valid baseline. Your baseline isn’t human and it isn’t cat; and since you included a constant, it isn’t a baseline of nothing either. It’s… unspecified.
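
You can see the problem directly in a couple of lines; here I have thrown in the second human and second cat mentioned above, so that the sample size is not the binding constraint:

```python
import numpy as np

# Columns: constant, human, cat. The last two columns sum to the first.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is singular
# np.linalg.solve(XtX, X.T @ np.array([2.0, 4.0, 2.0, 4.0])) would raise LinAlgError here.
```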

You actually can choose whatever baseline you want for this regression, by setting the constant term to whatever number you want. Set a constant of 0 and your baseline is nothing: you’ll get back the coefficients 0, 2 and 4. Set a constant of 2 and your baseline is human: you’ll get 2, 0 and 2. Set a constant of 4 and your baseline is cat: you’ll get 4, -2, 0. You can even choose something weird like 3 (you’ll get 3, -1, 1) or 7 (you’ll get 7, -5, -3) or -4 (you’ll get -4, 6, 8). You don’t even have to choose integers; you could pick -0.9 or 3.14159. As long as the constant plus the coefficient on human add to 2 and the constant plus the coefficient on cat add to 4, you’ll get a valid regression.

Again, this example seems pretty simple. But it’s an easy trap to fall into if you don’t think carefully about what variables you are including. If you are looking at effects on income and you have dummy variables on race, gender, schooling (e.g. no high school, high school diploma, some college, Bachelor’s, master’s, PhD), and what state a person lives in, it would be very tempting to just throw all those variables into a regression and see what comes out. But nothing is going to come out, because you haven’t specified a baseline. Your baseline isn’t even some hypothetical person with $0 income (which already doesn’t sound like a great choice); it’s just not a coherent baseline at all.

Generally the best thing to do (for the most precise estimates) is to choose the most common category in each set as the baseline. So for the US a good choice would be to set the baseline as White, female, high school diploma, California. Another common strategy when looking at discrimination specifically is to make the most privileged category the baseline, so we’d instead have White, male, PhD, and… Maryland, it turns out. Then we expect all our coefficients to be negative: Your income is generally lower if you are not White, not male, have less than a PhD, or live outside Maryland.

This is also important if you are interested in interactions: For example, the effect on your income of being Black in California is probably not the same as the effect of being Black in Mississippi. Then you’ll want to include terms like Black and Mississippi, which for dummy variables is the same thing as taking the Black variable and multiplying by the Mississippi variable.

But now you need to be especially clear about what your baseline is: If being White in California is your baseline, then the coefficient on Black is the effect of being Black in California, while the coefficient on Mississippi is the effect of being in Mississippi if you are White. The coefficient on Black and Mississippi is the effect of being Black in Mississippi, over and above the sum of the effects of being Black and the effect of being in Mississippi. If we saw a positive coefficient there, it wouldn’t mean that it’s good to be Black in Mississippi; it would simply mean that it’s not as bad as we might expect if we just summed the downsides of being Black with the downsides of being in Mississippi. And if we saw a negative coefficient there, it would mean that being Black in Mississippi is even worse than you would expect just from summing up the effects of being Black with the effects of being in Mississippi.
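
A tiny sketch of the mechanics, with made-up coefficient values chosen only to show how the pieces add (the baseline here is a White person in California):

```python
# Hypothetical coefficients from a regression with baseline "White in California":
const, b_black, b_miss, b_black_miss = 50_000, -8_000, -6_000, 1_500

def predicted_income(black, mississippi):
    # The interaction dummy is literally the product of the two dummies.
    return (const + b_black * black + b_miss * mississippi
            + b_black_miss * black * mississippi)

print(predicted_income(0, 0))   # White in California: 50,000
print(predicted_income(1, 1))   # Black in Mississippi: 50,000 - 8,000 - 6,000 + 1,500 = 37,500
# The positive interaction term does not mean Mississippi is good for Black residents;
# it only means the combined penalty is 1,500 less than the sum of the separate penalties.
```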

As long as you choose your baseline carefully and stick to it, interpreting regressions with dummy variables isn’t very hard. But so many people forget this step that they get very confused by the end, looking at a term like Black female Mississippi and seeing a positive coefficient, and thinking that must mean that life is good for Black women in Mississippi, when really all it means is the small mercy that being a Black woman in Mississippi isn’t quite as bad as you might think if you just added up the effect of being Black, plus the effect of being a woman, plus the effect of being Black and a woman, plus the effect of living in Mississippi, plus the effect of being Black in Mississippi, plus the effect of being a woman in Mississippi.


A tale of two storms

Sep 24, JDN 2458021

There were two severe storm events this past week; one you probably heard a great deal about, the other, probably not. The first was Hurricane Irma, which hit the United States and did most of its damage in Florida; the second was Typhoon Doksuri, which hit Southeast Asia and did most of its damage in Vietnam.

You might expect that this post is going to give you more bad news. Well, I have a surprise for you: The news is actually mostly good.

The death tolls from both storms were astonishingly small. The hurricane is estimated to have killed at least 84 people, while the typhoon has killed at least 26. This is the result of nothing less than heroism. The valiant efforts of thousands of meteorologists and emergency responders around the world have saved thousands of lives, and did so both in the wealthy United States and in impoverished Vietnam.

When I started this post, I had expected to see that the emergency response in Vietnam would be much worse, and fatalities would be far higher; I am delighted to report that nothing of the sort was the case, and Vietnam, despite their per-capita GDP PPP of under $6,000, has made emergency response a sufficiently high priority that they saved their people just about as well as Florida did.

To get a sense of what might have happened without them, consider that 1.5 million homes in Florida were leveled by the hurricane, and over 100,000 homes were damaged by the typhoon. Vietnam is a country of 94 million people. Florida has a population of 20 million. (The reason Florida determines so many elections is that it is by far the most populous swing state.) Without weather forecasting and emergency response, these death figures would have been in the tens of thousands, not the dozens.

Indeed, if you know statistics and demographics well, these figures become even more astonishing: These death rates were almost indistinguishable from statistical noise.

Vietnam’s baseline death rate is about 5.9 per 1,000, meaning that they experience about 560,000 deaths in any given year. This means that over 1500 people die in Vietnam on a normal day.

Florida’s baseline death rate is about 6.6 per 1,000, actually a bit higher than Vietnam’s, because Florida’s population skews so much toward the elderly. Therefore Florida experiences about 130,000 deaths per year, or 360 deaths on a normal day.

In both Vietnam and Florida, this makes the daily death probability for any given person about p = 0.0017%. A random process with a fixed probability p over a population of n people will result in an average of pn events, but with some variation around that number. The standard deviation is sqrt(p(1-p)n), which here works out to about 0.004 sqrt(n). When n = 20,000,000 (Florida), this results in a standard deviation of about 18. When n = 94,000,000 (Vietnam), this results in a standard deviation of about 40.

This means that the 26 additional deaths in Vietnam were within one standard deviation of average! They basically are indistinguishable from statistical noise. There have been over a hundred days in Vietnam where an extra 26 people happened to die, just in the past year. Weather forecasting took what could have been a historic disaster and turned it into just another bad day.

The 84 additional deaths in Florida are over four standard deviations away from average, so they are definitely distinguishable from statistical noise—but this still means that Florida’s total death rate for the year will only tick up by about 0.06%.
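
Here is that back-of-the-envelope arithmetic in runnable form, using the figures quoted above:

```python
from math import sqrt

p = 0.000017    # ~0.0017% daily death probability for any given person

for place, n, storm_deaths in [("Florida", 20_000_000, 84), ("Vietnam", 94_000_000, 26)]:
    sd = sqrt(p * (1 - p) * n)                  # standard deviation of daily deaths
    print(place, round(sd, 1), round(storm_deaths / sd, 1))
# Florida: sd ~ 18, so 84 extra deaths is ~4.6 standard deviations.
# Vietnam: sd ~ 40, so 26 extra deaths is well within one standard deviation.
```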

It is common in such tragedies to point out in grave tones that “one death is too many”, but I maintain that this is not actually moral wisdom but empty platitude. No conceivable policy is ever going to reduce death rates to zero, and the people who died of heart attacks or brain aneurysms are every bit as dead as the people who died from hurricanes or terrorist attacks. Instead of focusing on the handful of people who died because they didn’t heed warnings or simply got extraordinarily unlucky, I think we should be focusing on the thousands of people who survived because our weather forecasters and emergency responders did their jobs so exceptionally well. Of course if we can reduce the numbers even further, we should; but from where I’m sitting, our emergency response system has a lot to be proud of.

Of course, the economic damage of the storms was substantially greater. The losses in destroyed housing and infrastructure in Florida are projected at over $80 billion. Vietnam is much poorer, so there simply isn’t as much infrastructure to destroy; total damage is unlikely to exceed $10 billion. Florida’s GDP is $926 billion, so they are losing 8.6%; while Vietnam’s GDP is $220 billion, so they are still losing 4.5%. And of course the damage isn’t evenly spread across everyone; those hardest hit will lose perhaps their entire net wealth, while others will feel absolutely nothing.

But economic damage is fleeting. Indeed, if we spend the government money we should be, and take the opportunity to rebuild this infrastructure better than it was before, the long-run economic impact could be positive. Even 8.6% of GDP is less than five years of normal economic growth—and there were years in the 1950s where we did it in a single year. The 4.5% that Vietnam lost, they should make back within a year of their current economic growth.

Thank goodness.

Why do so many Americans think that crime is increasing?

Jan 29, JDN 2457783

Since the 1990s, crime in the United States has been decreasing, and yet in every poll since then most Americans report that they believe that crime is increasing.

It’s not a small decrease either. The US murder rate is down to the lowest it has been in a century. The absolute number of violent crimes per year in the US is now smaller (by 34 log points) than it was 20 years ago, despite a significant increase in total population (19 log points—and the magic of log points is that, yes, the rate has decreased by precisely 53 log points).
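
For readers who haven’t seen log points before, the arithmetic being leaned on here is just that logarithms turn ratios into differences, so the change in the rate is the change in the count minus the change in the population:

\[ \Delta \ln\left(\frac{\text{crimes}}{\text{population}}\right) = \Delta \ln(\text{crimes}) - \Delta \ln(\text{population}) = -34 - (+19) = -53 \text{ log points} \]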

It isn’t geographically uniform, of course; some states have improved much more than others, and a few states (such as New Mexico) have actually gotten worse.

The 1990s were a peak of violent crime, so one might say that we are just regressing to the mean. (Even that would be enough to make it baffling that people think crime is increasing.) But in fact overall crime in the US is now the lowest it has been since the 1970s, and still decreasing.

Indeed, this decrease has been underestimated, because we are now much better about reporting and investigating crimes than we used to be (which may also be part of why they are decreasing, come to think of it). If you compare against surveys of people who say they have been personally victimized, we’re looking at a decline in violent crime rates of two thirds—109 log points.

Just since 2008 violent crime has decreased by 26% (30 log points)—but of course we all know that Obama is “soft on crime” because he thinks cops shouldn’t be allowed to just shoot Black kids for no reason.

And yet, over 60% of Americans believe that overall crime in the US has increased in the last 10 years (though only 38% think it has increased in their own community!). These figures are actually down from 2010, when 66% thought crime was increasing nationally and 49% thought it was increasing in their local area.

The proportion of people who think crime is increasing does seem to decrease as crime rates decrease—but it still remains alarmingly high. If people were half as rational as most economists seem to believe, the proportion of people who think crime is increasing should drop to basically zero whenever crime rates decrease, since that’s a really basic fact about the world that you can just go look up on the Web in a couple of minutes. There’s no deep ambiguity, not even much “rational ignorance” given the low cost of getting correct answers. People just don’t bother to check, or don’t feel they need to.

What’s going on? How can crime fall to half what it was 20 years ago and yet almost two-thirds of people think it’s actually increasing?

Well, one hint is that news coverage of crime doesn’t follow the same pattern as actual crime.

News coverage in general is a terrible source of information, not simply because news organizations can be biased, make glaring mistakes, and sometimes outright lie—but actually for a much more fundamental reason: Even a perfect news channel, qua news channel, would report what is surprising—and what is surprising is, by definition, improbable. (Indeed, there is a formal mathematical concept in probability theory called surprisal that is simply the logarithm of 1 over the probability.) Even assuming that news coverage reports only the truth, the probability of seeing something on the news isn’t proportional to the probability of the event occurring—it’s more likely proportional to the entropy, which is probability times surprisal.
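
In symbols, just restating the definitions in the paragraph above (with “coverage” standing in for how much airtime an event gets):

\[ s(p) = \log \frac{1}{p}, \qquad \text{coverage}(p) \propto p \, s(p) = p \log \frac{1}{p} \]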

Now, if humans were optimal information processing engines, that would be just fine, actually; reporting events proportional to their entropy is actually a very efficient mechanism for delivering information (optimal, under certain types of constraints), provided that you can then process the information back into probabilities afterward.

But of course, humans aren’t optimal information processing engines. We don’t recompute the probabilities from the given entropy; instead we use the availability heuristic, by which we simply use the number of times we can think of something happening as our estimate of the probability of that event occurring. If you see more murders on TV news than you used to, you assume that murders must be more common than they used to be. (And when I put it like that, it really doesn’t sound so unreasonable, does it? Intuitively the availability heuristic seems to make sense—which is part of why it’s so insidious.)

Another likely reason for the discrepancy between perception and reality is nostalgia. People almost always have a more positive view of the past than it deserves, particularly when referring to their own childhoods. Indeed, I’m quite certain that a major reason why people think the world was much better when they were kids was that their parents didn’t tell them what was going on. And of course I’m fine with that; you don’t need to burden 4-year-olds with stories of war and poverty and terrorism. I just wish people would realize that they were being protected from the harsh reality of the world, instead of thinking that their little bubble of childhood innocence was a genuinely much safer world than the one we live in today.

Then take that nostalgia and combine it with the availability heuristic and the wall-to-wall TV news coverage of anything bad that happens—and almost nothing good that happens, certainly not if it’s actually important. I’ve seen bizarre fluff pieces about puppies, but never anything about how world hunger is plummeting or air quality is dramatically improved or cars are much safer. That’s the one thing I will say about financial news; at least they report it when unemployment is down and the stock market is up. (Though most Americans, especially most Republicans, still seem really confused on those points as well….) They will attribute it to anything from sunspots to the will of Neptune, but at least they do report good news when it happens. It’s no wonder that people are always convinced that the world is getting more dangerous even as it gets safer and safer.

The real question is what we do about it—how do we get people to understand even these basic facts about the world? I still believe in democracy, but when I see just how painfully ignorant so many people are of such basic facts, I understand why some people don’t. The point of democracy is to represent everyone’s interests—but we also end up representing everyone’s beliefs, and sometimes people’s beliefs just don’t line up with reality. The only way forward I can see is to find a way to make people’s beliefs better align with reality… but even that isn’t so much a strategy as an objective. What do I say to someone who thinks that crime is increasing, beyond showing them the FBI data that clearly indicates otherwise? When someone is willing to override all evidence with what they feel in their heart to be true, what are the rest of us supposed to do?

Experimentally testing categorical prospect theory

Dec 4, JDN 2457727

In last week’s post I presented a new theory of probability judgments, which doesn’t rely upon people performing complicated math even subconsciously. Instead, I hypothesize that people try to assign categories to their subjective probabilities, and throw away all the information that wasn’t used to assign that category.

The way to most clearly distinguish this from cumulative prospect theory is to show discontinuity. Kahneman’s smooth, continuous function places fairly strong bounds on just how much a shift from 0% to 0.000001% can really affect your behavior. In particular, if you want to explain the fact that people do seem to behave differently around 10% compared to 1% probabilities, you can’t allow the slope of the smooth function to get much higher than 10 at any point, even near 0 and 1. (It does depend on the precise form of the function, but the more complicated you make it, the more free parameters you add to the model. In the most parsimonious form, which is a cubic polynomial, the maximum slope is actually much smaller than this—only 2.)

If that’s the case, then switching from 0% to 0.0001% should have no more effect on your behavior than a switch from 0% to 0.001% would have on a rational expected utility optimizer. But in fact I think I can set up scenarios where it would have a larger effect than a switch from 0.001% to 0.01%.

Indeed, these games are already quite profitable for the majority of US states, and they are called lotteries.

Rationally, it should make very little difference to you whether your odds of winning the Powerball are 0 (you bought no ticket) or 0.0000001%, i.e. 10^-9 (you bought a ticket), even when the prize is $100 million. This is because the utility you would gain from $100 million is nowhere near 100 million times your marginal utility of $1. A good guess would be that your lifetime income is about $2 million, your utility is logarithmic, the units of utility are hectoQALY, and the baseline level is about $100,000.

I apologize for the extremely large number of decimals, but I had to do that in order to show any difference at all. I have bolded where the decimals first deviate from the baseline.

Your utility if you don’t have a ticket is ln(20) = 2.9957322736 hQALY.

Your utility if you have a ticket is (1-10^-9) ln(20) + 10^-9 ln(1020) = 2.9957322775 hQALY.

You gain a whopping 0.4 microQALY (about twelve seconds of healthy life) over your whole lifetime. I highly doubt you could even perceive such a difference.

And yet, people are willing to pay nontrivial sums for the chance to play such lotteries. Powerball tickets sell for about $2 each, and some people buy tickets every week. If you do that and live to be 80, you will spend some $8,000 on lottery tickets during your lifetime, which results in this expected utility: (1-4*10^-6) ln(20-0.08) + 4*10^-6 ln(1020) = 2.9917399955 hQALY.
You have now sacrificed 0.004 hectoQALY, which is to say 0.4 QALY—that’s months of happiness you’ve given up to play this stupid pointless game.

Which shouldn’t be surprising, as (with 99.9996% probability) you have given up four months of your lifetime income with nothing to show for it. Lifetime income of $2 million / lifespan of 80 years = $25,000 per year; $8,000 / $25,000 = 0.32 years. You’ve actually sacrificed slightly more than that in utility terms; the extra comes from risk aversion (the concavity of logarithmic utility).
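For anyone who wants to check the arithmetic, here is a short Python sketch reproducing those figures under the same assumptions (logarithmic utility measured in hectoQALY, a $100,000 baseline, $2 million lifetime income, a $100 million jackpot):

```python
import math

# Assumptions from the text, with money expressed in units of $100,000.
BASELINE_UNITS = 20          # $2 million lifetime income
JACKPOT_UNITS = 1000         # $100 million jackpot
P_WIN_ONE_TICKET = 1e-9      # roughly the odds of winning the Powerball
P_WIN_LIFETIME = 4e-6        # ~4,000 weekly tickets over a lifetime
LIFETIME_SPEND_UNITS = 0.08  # ~$8,000 spent on tickets

u_no_ticket = math.log(BASELINE_UNITS)
u_one_ticket = (1 - P_WIN_ONE_TICKET) * math.log(BASELINE_UNITS) \
             + P_WIN_ONE_TICKET * math.log(BASELINE_UNITS + JACKPOT_UNITS)
u_weekly_player = (1 - P_WIN_LIFETIME) * math.log(BASELINE_UNITS - LIFETIME_SPEND_UNITS) \
                + P_WIN_LIFETIME * math.log(BASELINE_UNITS + JACKPOT_UNITS)

print(f"No ticket:     {u_no_ticket:.10f} hQALY")
print(f"One ticket:    {u_one_ticket:.10f} hQALY")
print(f"Weekly player: {u_weekly_player:.10f} hQALY")
# 1 hQALY = 100 QALY, so multiply by 100 * 10^6 to get microQALY.
print(f"Gain from one ticket: {(u_one_ticket - u_no_ticket) * 100 * 1e6:.2f} microQALY")
print(f"Cost of a lifetime of weekly tickets: {(u_no_ticket - u_weekly_player) * 100:.3f} QALY")
```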

Why would anyone do such a thing? Because while the difference between 0 and 10^-9 may be trivial, the difference between “impossible” and “almost impossible” feels enormous. “You can’t win if you don’t play!” they say, but they might as well say “You can’t win if you do play either.” Indeed, the probability of winning without playing isn’t zero; you could find a winning ticket lying on the ground, or win due to an error that is then upheld in court, or inherit a winning ticket from a dying family member, or be gifted one by an anonymous donor. These are of course vanishingly unlikely—but so was winning in the first place. You’re talking about the difference between 10^-9 and 10^-12, which in proportional terms sounds like a lot—but in absolute terms is nothing. If you drive to a drug store every week to buy a ticket, you are more likely to die in a car accident on the way to the drug store than you are to win the lottery.

Of course, these are not experimental conditions. So I need to devise a similar game, with smaller stakes but still large enough for people’s brains to care about the “almost impossible” category; maybe thousands of dollars? It’s not uncommon for an economics experiment to cost thousands; it’s just usually paid out to many people instead of randomly to one person or nobody. Conducting the experiment in a lower-income country like India would also effectively amplify the amounts paid, but at the fixed cost of transporting the research team to India.

But I think in general terms the experiment could look something like this. You are given $20 for participating in the experiment (we treat it as already given to you, to maximize your loss aversion and endowment effect and thereby give us more bang for our buck). You then have a chance to play a game, where you pay $X for a probability P of winning $Y*X, and we vary these numbers.

The actual participants wouldn’t see the variables, just the numbers and possibly the rules: “You can pay $2 for a 1% chance of winning $200. You can also play multiple times if you wish.” “You can pay $10 for a 5% chance of winning $250. You can only play once or not at all.”
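Here is a minimal sketch of how such conditions might be generated and displayed; the particular values of X, P, and Y below are placeholders for illustration, not a finalized design:

```python
# Hypothetical (X, P, Y) conditions: pay $X for a probability P of winning $Y*X.
# These specific numbers are placeholders, not a finalized design.
conditions = [
    (2, 0.01, 100),   # "$2 for a 1% chance of winning $200"
    (10, 0.05, 25),   # "$10 for a 5% chance of winning $250"
    (5, 0.001, 300),  # probes the "almost impossible" category
]

for cost, p, multiplier in conditions:
    prize = cost * multiplier
    expected_value = p * prize - cost
    print(f"Pay ${cost} for a {p:.1%} chance of winning ${prize} "
          f"(expected value: ${expected_value:+.2f})")
```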

So I think the first step is to find some dilemmas, cases where people feel ambivalent, and different people differ in their choices. That’s a good role for a pilot study.

Then we take these dilemmas and start varying their probabilities slightly.

In particular, we try to vary them at the edge of where people have mental categories. If subjective probability is continuous, a slight change in actual probability should never result in a large change in behavior, and furthermore the effect of a change shouldn’t vary too much depending on where the change starts.

But if subjective probability is categorical, these categories should have edges. Then, when I present you with two dilemmas that are on opposite sides of one of the edges, your behavior should radically shift; while if I change it in a different way, I can make a large change without changing the result.

Based solely on my own intuition, I guessed that the categories roughly follow this pattern:

Impossible: 0%

Almost impossible: 0.1%

Very unlikely: 1%

Unlikely: 10%

Fairly unlikely: 20%

Roughly even odds: 50%

Fairly likely: 80%

Likely: 90%

Very likely: 99%

Almost certain: 99.9%

Certain: 100%

So for example, if I switch from 0% to 0.01%, it should have a very large effect, because I’ve moved you out of your “impossible” category (indeed, I think the “impossible” category is almost completely sharp; literally anything above zero seems to be enough for most people, even 10^-9 or 10^-10). But if I move from 1% to 2%, it should have a small effect, because I’m still well within the “very unlikely” category. Yet the latter change is literally one hundred times larger than the former. It is possible to define continuous functions that would behave this way to an arbitrary level of approximation—but they get a lot less parsimonious very fast.
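To make the hypothesis concrete, here is a crude sketch of what categorical probability judgment might look like; the category boundaries here are just my own guesses interpolated between the anchor points above, and finding the real edges is exactly what the experiment is for:

```python
# Guessed category edges, as (upper bound, label) pairs; purely illustrative.
CATEGORIES = [
    (0.005,   "almost impossible"),
    (0.05,    "very unlikely"),
    (0.15,    "unlikely"),
    (0.35,    "fairly unlikely"),
    (0.65,    "roughly even odds"),
    (0.85,    "fairly likely"),
    (0.95,    "likely"),
    (0.995,   "very likely"),
    (0.99999, "almost certain"),
]

def categorize(p):
    """Throw away everything about p except which category it falls into."""
    if p == 0:
        return "impossible"    # the sharp edge: only exactly zero counts
    if p == 1:
        return "certain"       # symmetric sharp edge at the top
    for upper, label in CATEGORIES:
        if p <= upper:
            return label
    return "almost certain"

# A change of 0.0001 crosses the "impossible" edge; a hundredfold larger
# change from 0.01 to 0.02 crosses no edge at all.
print(categorize(0.0), "->", categorize(0.0001))  # impossible -> almost impossible
print(categorize(0.01), "->", categorize(0.02))   # very unlikely -> very unlikely
```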

Now, immediately I run into a problem, because I’m not even sure those are my categories, much less that they are everyone else’s. If I knew precisely which categories to look for, I could tell whether or not I had found them. But the process of both finding the categories and determining whether their edges are truly sharp is much more complicated, and requires a lot more statistical degrees of freedom to get beyond the noise.

One thing I’m considering is assigning these values as a prior, and then conducting a series of experiments which would adjust that prior. In effect I would be using optimal Bayesian probability reasoning to show that human beings do not use optimal Bayesian probability reasoning. Still, I think that actually pinning down the categories would require a large number of participants or a long series of experiments (in frequentist statistics this distinction is vital; in Bayesian statistics it is basically irrelevant—one of the simplest reasons to be Bayesian is that it no longer bothers you whether someone did 2 experiments of 100 people or 1 experiment of 200 people, provided they were the same experiment of course). And of course there’s always the possibility that my theory is totally off-base, and I find nothing; a dissertation replicating cumulative prospect theory is a lot less exciting (and, sadly, less publishable) than one refuting it.

Still, I think something like this is worth exploring. I highly doubt that people are doing very much math when they make most probabilistic judgments, and using categories would provide a very good way for people to make judgments usefully with no math at all.

How I wish we measured percentage change

JDN 2457415

For today’s post I’m taking a break from issues of global policy to discuss a bit of a mathematical pet peeve. It is an opinion I share with many economists—for instance Miles Kimball has a very nice post about it, complete with some clever analogies to music.

I hate when we talk about percentages in asymmetric terms.

What do I mean by this? Well, here are a few examples.

If my stock portfolio loses 10% one year and then gains 11% the following year, have I gained or lost money? I’ve lost money. Only a little bit—I’m down 0.1%—but still, a loss.

In 2003, Venezuela suffered a depression of -26.7% growth one year, and then an economic boom of 36.1% growth the following year. What was their new GDP, relative to what it was before the depression? Very slightly less than before. (99.8% of its pre-recession value, to be precise.) You would think that falling 27% and rising 36% would leave you about 9% ahead; in fact it leaves you behind.

Would you rather live in a country with 11% inflation and have constant nominal pay, or live in a country with no inflation and take a 10% pay cut? You should prefer the inflation; in that case your real income only falls by 9.9%, instead of 10%.

We often say that the real interest rate is simply the nominal interest rate minus the rate of inflation, but that’s actually only an approximation. If you have 7% inflation and a nominal interest rate of 11%, your real interest rate is not actually 4%; it is 3.74%. If you have 2% inflation and a nominal interest rate of 0%, your real interest rate is not actually -2%; it is -1.96%.

This is what I mean by asymmetric:

Rising 10% and falling 10% do not cancel each other out. To cancel out a fall of 10%, you must actually rise 11.1%.

Gaining 20% and losing 20% do not cancel each other out. To cancel out a loss of 20%, you need a gain of 25%.

Is it starting to bother you yet? It sure bothers me.

Worst of all is the fact that the way we usually measure percentages, losses are bounded at 100% while gains are unbounded. To cancel a loss of 100%, you’d need a gain of infinity.

There are two basic ways of solving this problem: The simple way, and the good way.

The simple way is to just start measuring percentages symmetrically, by including both the starting and ending values in the calculation and averaging them.
That is, instead of using this formula:

% change = 100% * (new – old)/(old)

You use this one:

% change = 100% * (new – old)/((new + old)/2)

In this new system, percentage changes are symmetric.

Suppose a country’s GDP rises from $5 trillion to $6 trillion.

In the old system we’d say it has risen 20%:

100% * ($6 T – $5 T)/($5 T) = 20%

In the symmetric system, we’d say it has risen 18.2%:

100% * ($6 T – $5 T)/($5.5 T) = 18.2%

Suppose it falls back to $5 trillion the next year.

In the old system we’d say it has only fallen 16.7%:

100% * ($5 T – $6 T)/($6 T) = -16.7%

But in the symmetric system, we’d say it has fallen 18.2%.

100% * ($5 T – $6 T)/($5.5 T) = -18.2%

In the old system, the gain of 20% was somehow canceled by a loss of 16.7%. In the symmetric system, the gain of 18.2% was canceled by a loss of 18.2%, just as you’d expect.
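Here is a small sketch of both formulas, reproducing the example above:

```python
def pct_change(old, new):
    """The usual, asymmetric percentage change."""
    return 100 * (new - old) / old

def sym_change(old, new):
    """Symmetric percentage change: divide by the average of old and new."""
    return 100 * (new - old) / ((new + old) / 2)

print(pct_change(5, 6), pct_change(6, 5))  # 20.0 and about -16.7: they don't cancel
print(sym_change(5, 6), sym_change(6, 5))  # about 18.2 and -18.2: they cancel exactly
```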

This also removes the problem of losses being bounded but gains being unbounded. Now both losses and gains are bounded, at the rather surprising value of 200%.

Formally, that’s because of these limits:
\lim_{x \rightarrow \infty} \frac{x - 1}{(x + 1)/2} = 2

\lim_{x \rightarrow \infty} \frac{0 - x}{(x + 0)/2} = -2

It might be easier to intuit these limits with an example. Suppose something explodes from a value of 1 to a value of 10,000,000. In the old system, this means it rose 1,000,000,000%. In the symmetric system, it rose 199.9999%. Like the speed of light, you can approach 200%, but never quite get there.

100% * (10^7 – 1)/(5*10^6 + 0.5) = 199.9999%

Gaining 200% in the symmetric system is gaining an infinite amount. That’s… weird, to say the least. Also, losing everything is now losing… 200%?

This is simple to explain and compute, but it’s ultimately not the best way.

The best way is to use logarithms.

As you may vaguely recall from math classes past, logarithms are the inverse of exponents.

Since 2^4 = 16, log_2 (16) = 4.

The natural logarithm ln() is the most fundamental for deep mathematical reasons I don’t have room to explain right now. It uses the base e, a transcendental number that starts 2.718281828459045…

To the uninitiated, this probably seems like an odd choice—no rational number has a natural logarithm that is itself a rational number (well, other than 1, since ln(1) = 0).

But perhaps it will seem a bit more comfortable once I show you that natural logarithms are remarkably close to percentages, particularly for the small changes in which percentages make sense.

We define something called log points such that the change in log points is 100 times the natural logarithm of the ratio of the new value to the old:

log points = 100 * ln(new / old)

This is symmetric because of the following property of logarithms:

ln(a/b) = – ln(b/a)

Let’s return to the country that saw its GDP rise from $5 trillion to $6 trillion.

The logarithmic change is 18.2 log points:

100 * ln($6 T / $5 T) = 100 * ln(1.2) = 18.2

If it falls back to $5 T, the change is -18.2 log points:

100 * ln($5 T / $6 T) = 100 * ln(0.833) = -18.2

Notice how in the symmetric percentage system, it rose and fell 18.2%; and in the logarithmic system, it rose and fell 18.2 log points. They are almost interchangeable, for small percentages.
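And the same example in log points:

```python
import math

def log_points(old, new):
    """Change in log points: 100 times the natural log of the ratio."""
    return 100 * math.log(new / old)

print(log_points(5, 6))  # about +18.2 log points
print(log_points(6, 5))  # about -18.2 log points: exactly the same magnitude
```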

In this graph, the old value is assumed to be 1. The horizontal axis is the new value, and the vertical axis is the percentage change we would report by each method.

[Figure: percentage change under the three methods, for small changes]

The green line is the usual way we measure percentages.

The red curve is the symmetric percentage method.

The blue curve is the logarithmic method.

For percentages within +/- 10%, all three methods are about the same. Then both new methods give about the same answer all the way up to changes of +/- 40%. Since most real changes in economics are within that range, the symmetric method and the logarithmic method are basically interchangeable.

However, for very large changes, even these two methods diverge, and in my opinion the logarithm is to be preferred.

[Figure: percentage change under the three methods, for large changes]

The symmetric percentage never gets above 200% or below -200%, while the logarithm is unbounded in both directions.

If you lose everything, the old system would say you have lost 100%. The symmetric system would say you have lost 200%. The logarithmic system would say you have lost infinity log points. If infinity seems a bit too extreme, think of it this way: You have in fact lost everything. No finite proportional gain can ever bring it back. A loss that requires a gain of infinity percent seems like it should be called a loss of infinity percent, doesn’t it? Under the logarithmic system it is.

If you gain an infinite amount, the old system would say you have gained infinity percent. The logarithmic system would also say that you have gained infinity log points. But the symmetric percentage system would say that you have gained 200%. 200%? Counter-intuitive, to say the least.

Log points also have another very nice property that neither the usual system nor the symmetric percentage system has: You can add them.

If you gain 25 log points, lose 15 log points, then gain 10 log points, you have gained 20 log points.

25 – 15 + 10 = 20

Just as you’d expect!

But if you gain 25%, then lose 15%, and then gain 10%, you have gained… 16.9%.

(1 + 0.25)*(1 – 0.15)*(1 + 0.10) = 1.169

If you gain 25% symmetric, lose 15% symmetric, then gain 10% symmetric, that calculation is really a pain. To find the value y that is p symmetric percentage points from the starting value x, you end up needing to solve this equation:

p = 100 * (y – x)/((x+y)/2)

This can be done; it comes out like this:

y = (200 + p)/(200 – p) * x

(This also gives a bit of insight into why it is that the bounds are +/- 200%.)

So by chaining those, we can in fact find out what happens after gaining 25%, losing 15%, then gaining 10% in the symmetric system:

(200 + 25)/(200 – 25)*(200 – 15)/(200 + 15)*(200 + 10)/(200 – 10) = 1.223

Then we can put that back into the symmetric system:

100% * (1.223 – 1)/((1+1.223)/2) = 20.1%

So after all that work, we find out that you have gained 20.1% symmetric. We could almost just add them—because they are so similar to log points—but we can’t quite.
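If you would rather not do that algebra by hand, a tiny helper makes the chained symmetric calculation mechanical (working at full precision; the 20.1% above comes from rounding the intermediate value to 1.223):

```python
def apply_sym(x, p):
    """Move p symmetric percentage points from x, using y = (200 + p)/(200 - p) * x."""
    return (200 + p) / (200 - p) * x

value = 1.0
for p in (25, -15, 10):
    value = apply_sym(value, p)

print(value)                                  # about 1.2228
print(100 * (value - 1) / ((1 + value) / 2))  # about 20.04: close to, but not exactly, 20
```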

Log points actually turn out to be really convenient, once you get the hang of them. The problem is that there’s a conceptual leap for most people to grasp what a logarithm is in the first place.

In particular, the hardest part to grasp is probably that a doubling is not 100 log points.

It is in fact 69 log points, because ln(2) = 0.69.

(Doubling in the symmetric percentage system is gaining 67%—much closer to the log points than to the usual percentage system.)

Calculation of the new value is a bit more difficult than in the usual system, but not as difficult as in the symmetric percentage system.

If you have a change of p log points from a starting point of x, the ending point y is:

y = e^{p/100} * x

The fact that you can add log points ultimately comes from the way exponents add:

e^{p1/100} * e^{p2/100} = e^{(p1+p2)/100}

Suppose US GDP grew 2% in 2007, then 0% in 2008, then fell 8% in 2009 and rose 4% in 2010 (this is approximately true). Where was it in 2010 relative to 2006? Who knows, right? It turns out to be a net loss of 2.4%; so if it was $15 T before, it’s now about $14.64 T. If you had just added, you’d think it was only down 2%; you’d have underestimated the loss by about $60 billion.

But if it had grown 2 log points, then 0 log points, then fallen 8 log points, then risen 4 log points, the answer is easy: It’s down 2 log points. If it was $15 T before, it’s now $14.70 T. Adding gives the correct answer this time.
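Here is a quick sketch of that comparison; it converts each percentage change into its exact log point equivalent, which is why the total comes out to about -2.4 log points rather than the round -2 of the second scenario:

```python
import math

growth_pct = [2, 0, -8, 4]  # approximate US GDP growth rates, 2007-2010

# Usual percentages: you must multiply the growth factors; adding is wrong.
factor = 1.0
for g in growth_pct:
    factor *= 1 + g / 100
print(15 * factor)                        # about 14.64 trillion dollars (the true value)
print(15 * (1 + sum(growth_pct) / 100))   # about 14.70 trillion: naive addition understates the loss

# The same changes expressed in log points add exactly.
log_pts = [100 * math.log(1 + g / 100) for g in growth_pct]
print(sum(log_pts))                       # about -2.4 log points in total
print(15 * math.exp(sum(log_pts) / 100))  # about 14.64 trillion again, recovered exactly
```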

Thus, instead of saying that the stock market fell 4.3%, we should say it fell 4.4 log points. Instead of saying that GDP is up 1.9%, we should say it is up 1.9 log points. For small changes it won’t even matter; if inflation is 1.4%, it is in fact also 1.4 log points. Log points are a bit harder to conceptualize; but they are symmetric and additive, which other methods are not.

Is this a matter of life and death on a global scale? No.

But I can’t write about those every day, now can I?

What does correlation have to do with causation?

JDN 2457345

I’ve been thinking of expanding the topics of this blog into some basic statistics and econometrics. It has been said that there are “Lies, damn lies, and statistics”; but in fact it’s almost the opposite—there are truths, whole truths, and statistics. Almost everything in the world that we know—not merely guess, or suppose, or intuit, or believe, but actually know, with a quantifiable level of certainty—is done by means of statistics. All sciences are based on them, from physics (when they say the Higgs discovery is a “5-sigma event”, that’s a statistic) to psychology, ecology to economics. Far from being something we cannot trust, they are in a sense the only thing we can trust.

The reason it sometimes feels like we cannot trust statistics is that most people do not understand statistics very well; this creates opportunities for both accidental confusion and willful distortion. My hope is therefore to provide you with some of the basic statistical knowledge you need to combat the worst distortions and correct the worst confusions.

I wasn’t quite sure where to start on this quest, but I suppose I have to start somewhere. I figured I may as well start with an adage about statistics that I hear commonly abused: “Correlation does not imply causation.”

Taken at its original meaning, this is definitely true. Unfortunately, it can be easily abused or misunderstood.

In its original meaning, the formal sense of the word “imply” is logical implication, and to “imply” something in that sense is an extremely strong statement. It means that the antecedent logically entails the consequent: if the antecedent is true, the consequent must be true, on pain of logical contradiction. Logical implication is for most practical purposes synonymous with mathematical proof. (Unfortunately, it’s not quite synonymous, because of things like Gödel’s incompleteness theorems and Löb’s theorem.)

And indeed, correlation does not logically entail causation; it’s quite possible to have correlations without any causal connection whatsoever, simply by chance. One of my former professors liked to brag that from 1990 to 2010 whether or not she ate breakfast had a statistically significant positive correlation with that day’s closing price for the Dow Jones Industrial Average.

How is this possible? Did my professor actually somehow influence the stock market by eating breakfast? Of course not; if she could do that, she’d be a billionaire by now. And obviously the Dow’s price at 17:00 couldn’t influence whether she ate breakfast at 09:00. Could there be some common cause driving both of them, like the weather? I guess it’s possible; maybe in good weather she gets up earlier and people are in better moods so they buy more stocks. But the most likely reason for this correlation is much simpler than that: She tried a whole bunch of different combinations until she found two things that correlated. At the usual significance level of 0.05, on average you need to try about 20 combinations of totally unrelated things before two of them will show up as correlated. (My guess is she used a number of different stock indexes and varied the starting and ending year. That’s a way to generate a surprisingly large number of degrees of freedom without it seeming like you’re doing anything particularly nefarious.)

But how do we know they aren’t actually causally related? Well, I suppose we don’t. Especially if the universe is ultimately deterministic and nonlocal (as I’ve become increasingly convinced by the results of recent quantum experiments), any two data sets could be causally related somehow. But the point is they don’t have to be; you can pick any randomly-generated datasets, pair them up in 20 different ways, and odds are, one of those ways will show a statistically significant correlation.
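A toy simulation (my own, not my professor's actual data) makes the point: generate one random "Dow" series and twenty unrelated random series, and see how often at least one of the twenty comes out significant at p < 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_days, n_series, n_trials = 250, 20, 1000

false_positives = 0
for _ in range(n_trials):
    dow = rng.normal(size=n_days)                  # stand-in for daily closing prices
    others = rng.normal(size=(n_series, n_days))   # 20 unrelated random series
    pvals = [stats.pearsonr(dow, s)[1] for s in others]
    if min(pvals) < 0.05:
        false_positives += 1

# With 20 independent tests at the 0.05 level, 1 - 0.95**20 is about 64%.
print(false_positives / n_trials)
```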

All of that is true, and important to understand. Finding a correlation between eating grapefruit and getting breast cancer, or between liking bitter foods and being a psychopath, does not necessarily mean that there is any real causal link between the two. If we can replicate these results in a bunch of other studies, that would suggest that the link is real; but typically, such findings cannot be replicated. There is something deeply wrong with the way science journalists operate; they like to publish the new and exciting findings, which 9 times out of 10 turn out to be completely wrong. They never want to talk about the really important and fascinating things that we know are true because we’ve been confirming them over hundreds of different experiments, because that’s “old news”. The journalistic desire to be new and first fundamentally contradicts the scientific requirement of being replicated and confirmed.

So, yes, it’s quite possible to have a correlation that tells you absolutely nothing about causation.

But this is exceptional. In most cases, correlation actually tells you quite a bit about causation.

And this is why I don’t like the adage; “imply” has a very different meaning in common speech, meaning merely to suggest or evoke. Almost everything you say implies all sorts of things in this broader sense, some more strongly than others, even though it may logically entail none of them.

Correlation does in fact suggest causation. Like any suggestion, it can be overridden. If we know that 20 different combinations were tried until one finally yielded a correlation, we have reason to distrust that correlation. If we find a correlation between A and B but there is no logical way they can be connected, we infer that it is simply an odd coincidence.

But when we encounter any given correlation, there are three other scenarios which are far more likely than mere coincidence: A causes B, B causes A, or some other factor C causes A and B. These are also not mutually exclusive; they can all be true to some extent, and in many cases are.

A great deal of work in science, and particularly in economics, is based upon using correlation to infer causation, and has to be—because there is simply no alternative means of approaching the problem.

Yes, sometimes you can do randomized controlled experiments, and some really important new findings in behavioral economics and development economics have been made this way. Indeed, much of the work that I hope to do over the course of my career is based on randomized controlled experiments, because they truly are the foundation of scientific knowledge. But sometimes, that’s just not an option.

Let’s consider an example: In my master’s thesis I found a strong correlation between the level of corruption in a country (as estimated by the World Bank) and the proportion of that country’s income which goes to the top 0.01% of the population. Countries that have higher levels of corruption also tend to have a larger proportion of income that accrues to the top 0.01%. That correlation is a fact; it’s there. There’s no denying it. But where does it come from? That’s the real question.

Could it be pure coincidence? Well, maybe; but when it keeps showing up in several different models with different variables included, that becomes unlikely. A single p < 0.05 will happen about 1 in 20 times by chance; but five in a row should happen less than 1 in 1 million times (assuming they’re independent, which, to be fair, they usually aren’t).

Could it be some artifact of the measurement methods? It’s possible. In particular, I was concerned about the possibility of a halo effect, in which people tend to assume that something which is better (or worse) in one way is automatically better (or worse) in other ways as well. People might think of their country as more corrupt simply because it has higher inequality, even if there is no real connection. But it would have taken a very large halo bias to explain this effect.

So, does corruption cause income inequality? It’s not hard to see how that might happen: More corrupt individuals could bribe leaders or exploit loopholes to make themselves extremely rich, and thereby increase inequality.

Does inequality cause corruption? This also makes some sense, since it’s a lot easier to bribe leaders and manipulate regulations when you have a lot of money to work with in the first place.

Does something else cause both corruption and inequality? Also quite plausible. Maybe some general cultural factors are involved, or certain economic policies lead to both corruption and inequality. I did try to control for such things, but I obviously couldn’t include all possible variables.

So, which way does the causation run? Unfortunately, I don’t know. I tried some clever statistical techniques to try to figure this out; in particular, I looked at which tends to come first—the corruption or the inequality—and whether they could be used to predict each other, a method called Granger causality. Those results were inconclusive, however. I could neither verify nor exclude a causal connection in either direction. But is there a causal connection? I think so. It’s too robust to just be coincidence. I simply don’t know whether A causes B, B causes A, or C causes A and B.
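For the curious, here is roughly what a Granger causality test looks like in practice, sketched on synthetic data with the statsmodels implementation; my actual thesis data and models are not reproduced here.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(42)
n = 200

# Synthetic example: "corruption" partly drives next period's "inequality".
corruption = rng.normal(size=n)
inequality = np.zeros(n)
for t in range(1, n):
    inequality[t] = 0.5 * corruption[t - 1] + rng.normal(scale=0.5)

data = pd.DataFrame({"inequality": inequality, "corruption": corruption})

# Tests whether the second column helps predict the first, up to 2 lags.
grangercausalitytests(data[["inequality", "corruption"]], maxlag=2)
```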

Imagine trying to do this same study as a randomized controlled experiment. Are we supposed to create two societies and flip a coin to decide which one we make more corrupt? Or which one we give more income inequality? Perhaps you could do some sort of experiment with a proxy for corruption (cheating on a test or something like that), and then have unequal payoffs in the experiment—but that is very far removed from how corruption actually works in the real world, and worse, it’s prohibitively expensive to make really life-altering income inequality within an experimental context. Sure, we can give one participant $1 and the other $1,000; but we can’t give one participant $10,000 and the other $10 million, and it’s the latter that we’re really talking about when we deal with real-world income inequality. I’m not opposed to doing such an experiment, but it can only tell us so much. At some point you need to actually test the validity of your theory in the real world, and for that we need to use statistical correlations.

Or think about macroeconomics; how exactly are you supposed to test a theory of the business cycle experimentally? I guess theoretically you could subject an entire country to a new monetary policy selected at random, but the consequences of being put into the wrong experimental group would be disastrous. Moreover, nobody is going to accept a random monetary policy democratically, so you’d have to introduce it against the will of the population, by some sort of tyranny or at least technocracy. Even if this is theoretically possible, it’s mind-bogglingly unethical.

Now, you might be thinking: But we do change real-world policies, right? Couldn’t we use those changes as a sort of “experiment”? Yes, absolutely; that’s called a quasi-experiment or a natural experiment. They are tremendously useful. But since they are not truly randomized, they aren’t quite experiments. Ultimately, everything you get out of a quasi-experiment is based on statistical correlations.

Thus, abuse of the adage “Correlation does not imply causation” can lead to ignoring whole subfields of science, because there is no realistic way of running experiments in those subfields. Sometimes, statistics are all we have to work with.

This is why I like to say it a little differently:

Correlation does not prove causation. But correlation definitely can suggest causation.