Daylight Savings Time is pointless and harmful

Nov 12, JDN 2458069

As I write this, Daylight Savings Time has just ended.

Sleep deprivation costs the developed world about 2% of GDP—on the order of $1 trillion per year. The US alone loses enough productivity from sleep deprivation that recovering this loss would give us enough additional income to end world hunger.

So, naturally, we have a ritual every year where we systematically impose an hour of sleep deprivation on the entire population for six months. This makes sense somehow.
The start of Daylight Savings Time each year is associated with a spike in workplace injuries, heart attacks, and suicide.

Nor does the “extra” hour of sleep we get in the fall compensate; in fact, it comes with its own downsides. Pedestrian fatalities spike immediately after the end of Daylight Savings Time; the rate of assault also rises at the end of DST, though it does also seem to fall when DST starts.

Daylight Savings Time was created to save energy. It does do that… technically. The total energy savings for the United States due to DST amounts to about 0.3% of our total electricity consumption. In some cases it can even increase energy use, though it does seem to smooth out electricity consumption over the day in a way that is useful for solar and wind power.

But this is a trivially small amount of energy savings, and there are far better ways to achieve it.

Simply due to new technologies and better policies, manufacturing in the US has reduced its energy costs per dollar of output by over 4% in the last few years. And just getting all US states to use energy as efficiently as New York or California (two states with little in common climatically, but quite a bit in common politically…) would cut our energy consumption by about 30%.

The total amount of energy saved by DST is comparable to the amount of electricity now produced by small-scale residential photovoltaics—so simply doubling residential solar power production (which we’ve been doing every few years lately) would yield the same benefits as DST without the downsides. If we really got serious about solar power and adopted the policies necessary to get a per-capita solar power production comparable to Germany (not a very sunny place, mind you—Sacramento gets over twice the hours of sun per year that Berlin does), we would increase our solar power production by a factor of 10—five times the benefits of DST, none of the downsides.

Alternatively we could follow France’s model and get serious about nuclear fission. France produces over three hundred times as much energy from nuclear power as the US saves via Daylight Savings Time. Not coincidentally, France produces half as much CO2 per dollar of GDP as the United States.

Why would we persist in such a ridiculous policy, with such terrible downsides and almost no upside? To a first approximation, all human behavior is social norms.

Demystifying dummy variables

Nov 5, JDN 2458062

Continuing my series of blog posts on basic statistical concepts, today I’m going to talk about dummy variables. Dummy variables are quite simple, but for some reason a lot of people—even people with extensive statistical training—often have trouble understanding them. Perhaps people are simply overthinking matters, or making subtle errors that end up having large consequences.

A dummy variable (more formally, a binary variable) is a variable that has only two states: “No”, usually represented as 0, and “Yes”, usually represented as 1. A dummy variable answers a single yes-or-no question. Dummy variables are most commonly used for categorical data, answering questions like “Is the person’s race White?” or “Is the state California?”; but in fact almost any kind of data can be represented this way: We could represent income using a series of dummy variables like “Is your income greater than $50,000?”, “Is your income greater than $51,000?”, and so on. As long as the number of possible outcomes is finite—which, in practice, it always is—the data can be represented by some (possibly large) set of dummy variables. In fact, if your data set is large enough, representing numerical data with dummy variables can be a very good thing to do, as it allows you to account for nonlinear effects without assuming a specific functional form.
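As a concrete sketch (the category names here are just illustrative), a categorical variable can be expanded into dummy variables like this:

```python
# Expanding a categorical variable into dummy variables,
# using only the standard library. The categories are illustrative.
races = ["White", "Black", "Other"]
observations = ["White", "Other", "Black", "White"]

def to_dummies(value, categories):
    """One 0/1 dummy per category: answers 'Is it X?' for each X."""
    return [1 if value == c else 0 for c in categories]

encoded = [to_dummies(obs, races) for obs in observations]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```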
Most of the misunderstanding regarding dummy variables involves applying them in regressions and interpreting the results.
Probably the most common confusion is about which dummy variables to include. When you have a set of categories represented in your data (e.g. one for each US state), you want to include dummy variables for all but one of them. The most common mistake is to include all of them, which yields a regression that doesn’t make sense; or, if you have a catchall category like “Other” (e.g. race coded as “White/Black/Other”), to leave that one out and get results with a nonsensical baseline.

You don’t have to leave one out if you only have one set of categories and you don’t include a constant in your regression; then the baseline will emerge automatically from the regression. But this is dangerous, as the interpretation of the coefficients is no longer quite so simple.

The thing to keep in mind is that a coefficient on a dummy variable is an effect of a change—so the coefficient on “White” is the effect of being White. In order to be an effect of a change, that change must be measured against some baseline. The dummy variable you exclude from the regression is the baseline—because the effect of changing to the baseline from the baseline is by definition zero.
Here’s a very simple example where all the regressions can be done by hand. Suppose you have a household with 1 human and 1 cat, and you want to know the effect of species on number of legs. (I mean, hopefully this is something you already know; but that makes it a good illustration.) In what follows, you can safely skip the matrix algebra; but I included it for any readers who want to see how these concepts play out mechanically in the math.
Your outcome variable Y is legs: The human has 2 and the cat has 4. We can write this as a column vector:

\[ Y = \begin{bmatrix} 2 \\ 4 \end{bmatrix} \]

What dummy variables should we choose? There are actually several options.

The simplest option is to include both a human variable and a cat variable, and no constant. Let’s put the human variable first. Then our human subject has a value of X1 = [1 0] (“Yes” to human and “No” to cat) and our cat subject has a value of X2 = [0 1].

This is very nice in this case, as it makes our matrix of independent variables simply an identity matrix:

\[ X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

This makes the calculations extremely nice, because transposing, multiplying, and inverting an identity matrix all just give us back an identity matrix. The standard OLS coefficient formula is B = (X'X)^(-1) X'Y, which in this case just becomes Y itself.

\[ B = (X'X)^{-1} X'Y = Y = \begin{bmatrix} 2 \\ 4 \end{bmatrix} \]

Our coefficients are 2 and 4. How would we interpret this? Pretty much what you’d think: The effect of being human is having 2 legs, while the effect of being a cat is having 4 legs. This amounts to choosing a baseline of nothing—the effect is compared to a hypothetical entity with no legs at all. And indeed this is what will happen more generally if you do a regression with a dummy for each category and no constant: The baseline will be a hypothetical entity with an outcome of zero on whatever your outcome variable is.
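A quick numerical check of this regression (a sketch using NumPy, which the post itself doesn't use):

```python
# Reproducing the no-constant regression: B = (X'X)^-1 X'Y.
import numpy as np

Y = np.array([2.0, 4.0])        # legs: human, cat
X = np.array([[1.0, 0.0],       # human subject: human=1, cat=0
              [0.0, 1.0]])      # cat subject:   human=0, cat=1

# With an identity X, the OLS solution is just Y itself.
B = np.linalg.solve(X.T @ X, X.T @ Y)
print(B)  # coefficients 2 and 4
```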
So far, so good.

But what if we had additional variables to include? Say we have both cats and humans with black hair and brown hair (and no other colors). If we now include the variables human, cat, black hair, brown hair, we won’t get the results we expect—in fact, we’ll get no result at all. The regression is mathematically impossible, regardless of how large a sample we have.

This is why it’s much safer to choose one of the categories as a baseline, and include that as a constant. We could pick either one; we just need to be clear about which one we chose.

Say we take human as the baseline. Then our variables are constant and cat. The variable constant is just 1 for every single individual. The variable cat is 0 for humans and 1 for cats.

Now our independent variable matrix looks like this:

\[ X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \]

The matrix algebra isn’t quite so nice this time:

\[ X'X = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \]

\[ (X'X)^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \]

\[ X'Y = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \end{bmatrix} \]

\[ B = (X'X)^{-1} X'Y = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \]

Our coefficients are now 2 and 2. Now, how do we interpret that result? We took human as the baseline, so what we are saying here is that the default is to have 2 legs, and then the effect of being a cat is to get 2 extra legs.
That sounds a bit anthropocentric—most animals are quadrupeds, after all—so let’s try taking cat as the baseline instead. Now our variables are constant and human, and our independent variable matrix looks like this:

\[ X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]

\[ X'X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \]

\[ (X'X)^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \]

\[ X'Y = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \end{bmatrix} \]

\[ B = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \end{bmatrix} \]

Our coefficients are 4 and -2. This seems much more phylogenetically correct: The default number of legs is 4, and the effect of being human is to lose 2 legs.
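Both baseline choices can be checked numerically (a NumPy sketch, not part of the original post):

```python
# The same regression under the two different baseline choices.
import numpy as np

Y = np.array([2.0, 4.0])  # legs: human, cat

# Baseline = human: columns are (constant, cat).
X_human = np.array([[1.0, 0.0],
                    [1.0, 1.0]])
B_human = np.linalg.solve(X_human.T @ X_human, X_human.T @ Y)
print(B_human)  # coefficients 2 and 2: default 2 legs, being a cat adds 2

# Baseline = cat: columns are (constant, human).
X_cat = np.array([[1.0, 1.0],
                  [1.0, 0.0]])
B_cat = np.linalg.solve(X_cat.T @ X_cat, X_cat.T @ Y)
print(B_cat)  # coefficients 4 and -2: default 4 legs, being human removes 2
```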
All these regressions are really saying the same thing: Humans have 2 legs, cats have 4. And in this particular case, it’s simple and obvious. But once things start getting more complicated, people tend to make mistakes even on these very simple questions.

A common mistake would be to try to include a constant and both dummy variables: constant human cat. What happens if we try that? The matrix algebra gets particularly nasty, first of all:

\[ X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

\[ X'X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

Our covariance matrix X’X is now 3×3, which means we have more coefficients than data points. But we could throw in another human and another cat to fix that problem.

More importantly, the covariance matrix is not invertible: Rows 2 and 3 sum to row 1, so the matrix is singular.

If you tried to run this regression, you’d get an error message about “perfect multicollinearity”. What this really means is you haven’t chosen a valid baseline. Your baseline isn’t human and it isn’t cat; and since you included a constant, it isn’t a baseline of nothing either. It’s… unspecified.

You actually can choose whatever baseline you want for this regression, by setting the constant term to whatever number you want. Set a constant of 0 and your baseline is nothing: you’ll get back the coefficients 0, 2 and 4. Set a constant of 2 and your baseline is human: you’ll get 2, 0 and 2. Set a constant of 4 and your baseline is cat: you’ll get 4, -2, 0. You can even choose something weird like 3 (you’ll get 3, -1, 1) or 7 (you’ll get 7, -5, -3) or -4 (you’ll get -4, 6, 8). You don’t even have to choose integers; you could pick -0.9 or 3.14159. As long as the constant plus the coefficient on human add to 2 and the constant plus the coefficient on cat add to 4, you’ll get a valid regression.
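This can be seen directly in code (a NumPy sketch): the covariance matrix is rank-deficient, and a least-squares solver will hand back only one of the infinitely many valid answers.

```python
import numpy as np

# Constant AND both dummies: columns are (constant, human, cat).
X = np.array([[1.0, 1.0, 0.0],   # human
              [1.0, 0.0, 1.0]])  # cat
Y = np.array([2.0, 4.0])

XtX = X.T @ X
# The human and cat columns sum to the constant column,
# so X'X is singular: rank 2 rather than 3.
print(np.linalg.matrix_rank(XtX))  # 2

# lstsq still returns a solution (the minimum-norm one), but it is only
# one of infinitely many; all of them satisfy the two constraints below.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B[0] + B[1], B[0] + B[2])  # approximately 2 and 4
```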
Again, this example seems pretty simple. But it’s an easy trap to fall into if you don’t think carefully about what variables you are including. If you are looking at effects on income and you have dummy variables on race, gender, schooling (e.g. no high school, high school diploma, some college, Bachelor’s, master’s, PhD), and what state a person lives in, it would be very tempting to just throw all those variables into a regression and see what comes out. But nothing is going to come out, because you haven’t specified a baseline. Your baseline isn’t even some hypothetical person with $0 income (which already doesn’t sound like a great choice); it’s just not a coherent baseline at all.

Generally the best thing to do (for the most precise estimates) is to choose the most common category in each set as the baseline. So for the US a good choice would be to set the baseline as White, female, high school diploma, California. Another common strategy when looking at discrimination specifically is to make the most privileged category the baseline, so we’d instead have White, male, PhD, and… Maryland, it turns out. Then we expect all our coefficients to be negative: Your income is generally lower if you are not White, not male, have less than a PhD, or live outside Maryland.

This is also important if you are interested in interactions: For example, the effect on your income of being Black in California is probably not the same as the effect of being Black in Mississippi. Then you’ll want to include an interaction term like Black × Mississippi, which for dummy variables is just the Black variable multiplied by the Mississippi variable.
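Mechanically, the interaction dummy is just an elementwise product (a hypothetical mini-sample, for illustration only):

```python
# Hypothetical dummies for four people; the interaction is their product.
black       = [1, 1, 0, 0]
mississippi = [1, 0, 1, 0]

black_x_mississippi = [b * m for b, m in zip(black, mississippi)]
print(black_x_mississippi)  # [1, 0, 0, 0] -- only the first person is both
```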

But now you need to be especially clear about what your baseline is: If being White in California is your baseline, then the coefficient on Black is the effect of being Black in California, while the coefficient on Mississippi is the effect of being in Mississippi if you are White. The coefficient on Black × Mississippi is the effect of being Black in Mississippi, over and above the sum of the effect of being Black and the effect of being in Mississippi. If we saw a positive coefficient there, it wouldn’t mean that it’s good to be Black in Mississippi; it would simply mean that it’s not as bad as we might expect if we just summed the downsides of being Black with the downsides of being in Mississippi. And if we saw a negative coefficient there, it would mean that being Black in Mississippi is even worse than you would expect just from summing up the effects of being Black with the effects of being in Mississippi.

As long as you choose your baseline carefully and stick to it, interpreting regressions with dummy variables isn’t very hard. But so many people forget this step that they get very confused by the end, looking at a term like Black × female × Mississippi and seeing a positive coefficient, and thinking that must mean that life is good for Black women in Mississippi, when really all it means is the small mercy that being a Black woman in Mississippi isn’t quite as bad as you might think if you just added up the effect of being Black, plus the effect of being a woman, plus the effect of being Black and a woman, plus the effect of living in Mississippi, plus the effect of being Black in Mississippi, plus the effect of being a woman in Mississippi.

How rich are we, really?

Oct 29, JDN 2458056

The most commonly-used measure of a nation’s wealth is its per-capita GDP, which is simply a total of all spending in a country divided by its population. More recently we adjust for purchasing power, giving us GDP per capita at purchasing power parity (PPP).

By this measure, the United States always does well. At most a dozen countries are above us, most of them by a small amount, and all of them are quite small countries. (For fundamental statistical reasons, we should expect both the highest and lowest average incomes to be in the smallest countries.)

But this is only half the story: It tells us how much income a country has, but not how that income is distributed. We should adjust for inequality.

How can we do this? I have devised a method that uses the marginal utility of wealth plus a measure of inequality called the Gini coefficient to work out an estimate of the average utility, instead of the average income.

I then convert back into a dollar figure. This figure is the income everyone would need to have under perfect equality in order to attain the same real welfare as under the current system. That is, if we could redistribute wealth so as to raise everyone below this value up to it, and lower everyone above this value down to it, the total welfare of the country would not change. This provides a well-founded ranking of which country’s people are actually better off overall, accounting for both overall income and how that income is distributed.

The estimate is sensitive to the precise form I use for marginal utility, so I’ll show you comparisons for three different cases.

The “conservative” estimate uses a risk aversion parameter of 1, which means that utility is logarithmic in income. The real value of a dollar is inversely proportional to the number of dollars you already have.

The medium estimate uses a risk aversion parameter of 2, which means that the real value of a dollar is inversely proportional to the square of the number of dollars you already have.

And then the “liberal” estimate uses a risk aversion parameter of 3, which means that the real value of a dollar is inversely proportional to the cube of the number of dollars you already have.

I’ll compare ten countries, which I think are broadly representative of classes of countries in the world today.

The United States, the world hegemon which needs no introduction.

China, rising world superpower and world’s most populous country.

India, world’s largest democracy and developing economy with a long way to go.

Norway, as representative of the Scandinavian social democracies.

Germany, as representative of continental Europe.

Russia, as representative of the former Soviet Union and the Second World bloc.

Saudi Arabia, as representative of the Middle East petrostates.

Botswana, as representative of African developing economies.

Zimbabwe, as representative of failed Sub-Saharan African states.

Brazil, as representative of Latin American developing economies.
The ordering of these countries by GDP per-capita PPP is probably not too surprising:

  1. Norway 69,249
  2. United States 57,436
  3. Saudi Arabia 55,158
  4. Germany 48,111
  5. Russia 26,490
  6. Botswana 17,042
  7. China 15,399
  8. Brazil 15,242
  9. India 6,616
  10. Zimbabwe 1,970

Norway is clearly the richest, the US, Saudi Arabia, and Germany are quite close, Russia is toward the upper end, Botswana, China, and Brazil are close together in the middle, and then India and especially Zimbabwe are extremely poor.

But now let’s take a look at the inequality in each country, as measured by the Gini coefficient (which ranges from 0, perfect equality, to 1, total inequality).

  1. Botswana 0.605
  2. Zimbabwe 0.501
  3. Brazil 0.484
  4. United States 0.461
  5. Saudi Arabia 0.459
  6. China 0.422
  7. Russia 0.416
  8. India 0.351
  9. Germany 0.301
  10. Norway 0.259

The US remains (alarmingly) close to Saudi Arabia by this measure. Most of the countries fall between 0.40 and 0.50. But Botswana is astonishingly unequal, while Germany and Norway are much more equal.
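The post doesn't spell out the exact computation, but here is one way such an adjustment can be implemented, assuming incomes are log-normally distributed (an assumption of mine, not stated in the post): the Gini coefficient pins down the log-scale spread, and the equally-distributed-equivalent income then follows from CRRA utility. The results land in the right ballpark but won't exactly match the adjusted figures below, since the actual distributional assumptions may differ.

```python
# Sketch: inequality-adjusted income under a log-normal assumption.
# For a log-normal distribution, Gini G = 2*Phi(sigma/sqrt(2)) - 1,
# and with CRRA risk-aversion parameter eps the equally-distributed-
# equivalent income is mean * exp(-eps * sigma**2 / 2).
from statistics import NormalDist
import math

def equivalent_income(mean_income, gini, eps):
    sigma = math.sqrt(2) * NormalDist().inv_cdf((gini + 1) / 2)
    return mean_income * math.exp(-eps * sigma ** 2 / 2)

# US figures from the tables above: per-capita GDP (PPP) 57,436; Gini 0.461.
for eps in (1, 2, 3):
    print(eps, round(equivalent_income(57436, 0.461, eps)))
```

Note how the adjustment always lowers the figure (unless inequality is zero), and lowers it more the higher the risk-aversion parameter.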

With that in mind, let’s take a look at the inequality-adjusted per-capita GDP. First, the conservative estimate, with a parameter of 1:

  1. Norway 58700
  2. United States 42246
  3. Saudi Arabia 40632
  4. Germany 39653
  5. Russia 20488
  6. China 11660
  7. Botswana 11138
  8. Brazil 11015
  9. India 5269
  10. Zimbabwe 1405

So far, the ordering of nations is almost the same as what we got with just per-capita GDP. But notice how Germany has moved up closer to the US, while Botswana has actually fallen behind China.

Now let’s try a parameter of 2, which I think is the closest to the truth:

  1. Norway 49758
  2. Germany 32683
  3. United States 31073
  4. Saudi Arabia 29931
  5. Russia 15581
  6. China 8829
  7. Brazil 7961
  8. Botswana 7280
  9. India 4197
  10. Zimbabwe 1002

Now we have seen some movement. Norway remains solidly on top, but Germany has overtaken the United States, and Botswana has fallen behind not only China but also Brazil. Russia remains in the middle, and India and Zimbabwe remain at the bottom.

Finally, let’s try a parameter of 3.

  1. Norway 42179
  2. Germany 26937
  3. United States 22855
  4. Saudi Arabia 22049
  5. Russia 11849
  6. China 6685
  7. Brazil 5753
  8. Botswana 4758
  9. India 3343
  10. Zimbabwe 715

Norway has now pulled far and away ahead of everyone else. Germany is substantially above the United States. China has pulled away from Brazil, and Botswana has fallen almost all the way to the level of India. Zimbabwe, as always, is at the very bottom.

Let’s compare this to another measure of national well-being, the Inequality-Adjusted Human Development Index (which goes from 0, the worst, to 1, the best). This index combines education, public health, and income, and adjusts for inequality. It seems to be a fairly good measure of well-being, but the data are very difficult to compile, so a lot of countries are missing (including Saudi Arabia); and the precise weightings on everything are quite ad hoc.

  1. Norway 0.898
  2. Germany 0.859
  3. United States 0.796
  4. Russia 0.725
  5. China 0.543
  6. Brazil 0.531
  7. India 0.435
  8. Botswana 0.433
  9. Zimbabwe 0.371

Other than putting India above Botswana, this ordering is the same as what we get from my (much easier to calculate and theoretically better-founded) index with either a parameter of 2 or 3.

What’s more, my index can be directly interpreted: The average standard of living in the US is as if everyone were making $31,073 per year. What exactly is an IHDI index of 0.796 supposed to mean? We’re… 79.6% of the way to the best possible country?

In any case, there’s a straightforward (if not terribly surprising) policy implication here: Inequality is a big problem.

In particular, inequality in the US is clearly too high. Despite an overall income that is very high, almost 18 log points higher than Germany, our overall standard of living is actually about 5 log points lower due to our higher level of inequality. While our average income is only 19 log points lower than Norway, our actual standard of living is 47 log points lower.

Inequality in Botswana also means that their recent astonishing economic growth is not quite as impressive as it at first appeared. Many people are being left behind. While in raw income they appear to be 10 log points ahead of China and only 121 log points behind the US, once you adjust for their very high inequality they are 19 log points behind China, and 145 log points behind the US.
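For readers unfamiliar with log points: a difference of x log points is just 100 times the natural log of the ratio. The comparisons above can be checked directly from the table figures:

```python
import math

def log_points(a, b):
    """Difference between a and b in log points: 100 * ln(a/b)."""
    return 100 * math.log(a / b)

# US vs Germany, raw per-capita GDP: about +18 log points.
print(round(log_points(57436, 48111)))
# US vs Germany, inequality-adjusted (parameter 2): about -5.
print(round(log_points(31073, 32683)))
# US vs Norway, inequality-adjusted: about -47.
print(round(log_points(31073, 49758)))
```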

Of course, some things don’t change. Norway is still on top, and Zimbabwe is still on the bottom.

This is one of the worst wildfire seasons in American history. But it won’t be for long.

Oct 22, JDN 2458049

At least 38 people have now been killed by the wildfires that are still ongoing in California; in addition, 5700 buildings have been destroyed and 190,000 acres of land burned. The State of California keeps an updated map of all the fires that are ongoing and how well-controlled they are; it’s not a pretty sight.

While the particular details are extreme, this is not an isolated incident. This year alone, wildfires have destroyed over 8 million acres of land in the US. In 2015, that figure was 10 million acres.

Property damage for this year’s wildfires in California is estimated at over $65 billion. That’s more than what Trump recently added to the military budget, and getting close to our total spending on food stamps.

There is a very clear upward trend in the scale and intensity of wildfires over just the last 50 years, and the obvious explanation is climate change. As climate change gets worse, these figures are projected to increase by between 30% and 50% by the 2040s. We still haven’t broken the record of fire damage set in 1910, but as the upward trend continues, we soon might.

It’s important to keep the death tolls in perspective; much as with hurricanes, our evacuation protocols and first-response agencies do their jobs very well, and as a result we’ve been averaging only about 10 wildfire deaths per year over the whole United States for the last century. In a country of over 300 million people, that’s really an impressively small number. That number has also been trending upward, however, so we shouldn’t get complacent.

Climate change isn’t the only reason these fires are especially damaging. It also matters where you build houses. We have been expanding our urban sprawl into fire-prone zones, and that is putting a lot of people in danger. Since 1990, over 60% of new homes were built in “wildland-urban interface areas” that are at higher risk.

Why are we doing this? Because housing prices in urban centers are too expensive for people to live there, but that is where most of the jobs are. So people have little choice but to live in exurbs and suburbs closer to the areas where fires are worst. That’s right: The fires are destroying homes and killing people because the rent is too damn high.

We need to find a solution to this problem of soaring housing prices. And since housing is such a huge proportion of our total expenditure—we spend more on housing than we do on all government spending combined—this would have an enormous impact on our entire economy. If you compare the income of a typical American today to most of the world’s population, or even to a typical American a century ago, we should feel extremely rich, but we don’t—largely because we spend so much of it just on keeping a roof over our heads.

Real estate is also a major driver of economic inequality. Wealth inequality is highest in urban centers where homeownership is rare. The large wealth gaps between White and non-White Americans can be in large part attributed to policies that made homeownership much more difficult for non-White people. Housing value inequality and overall wealth inequality are very strongly correlated. The high inequality in housing prices is making it far more difficult for people to move from poor regions to rich regions, holding back one of the best means we have for achieving more equal incomes.

Moreover, the rise in capital income share since the 1970s is driven almost entirely by real estate, rather than actual physical capital. The top 10% richest housing communities constitute over 52% of the total housing wealth in the US.

There is a lot of debate about what exactly causes these rising housing prices. No doubt, there are many factors contributing, from migration patterns to zoning regulations to income inequality in general. In a later post, I’ll get into why I think many of the people who think they are fighting the problem are actually making it worse, and suggest some ideas for what they should be doing instead.

Statistics you should have been taught in high school, but probably weren’t

Oct 15, JDN 2458042

Today I’m trying something a little different. This post will assume a lot less background knowledge than most of the others. For some of my readers, this post will probably seem too basic, obvious, even boring. For others, it might feel like a breath of fresh air, relief at last from the overly-dense posts I am generally inclined to write out of the Curse of Knowledge. Hopefully I can balance these two effects well enough to gain rather than lose readers.

Here are four core statistical concepts that I think all adults should know, necessary for functional literacy in understanding the never-ending stream of news stories about “A new study shows…” and more generally in applying social science to political decisions. In theory these should all be taught as part of a core high school curriculum, but typically they either aren’t taught or aren’t retained once students graduate. (Really, I think we should replace one year of algebra with one semester of statistics and one semester of logic. Most people don’t actually need algebra, but they absolutely do need logic and statistics.)

  1. Mean and median

The mean and the median are quite simple concepts, and you’ve probably at least heard of them before, yet confusion between them has caused a great many misunderstandings.

Part of the problem is the word “average”. Normally, the word “average” applies to the mean—for example, a batting average, or an average speed. But in common usage the word “average” can also mean “typical” or “representative”—an average person, an average family. And in many cases, particularly when it comes to economics, the mean is in no way typical or representative.

The mean of a sample of values is just the sum of all those values, divided by the number of values. The mean of the sample {1,2,3,10,1000} is (1+2+3+10+1000)/5 = 203.2.

The median of a sample of values is the middle one: Order the values and choose the one in the exact center. If you have an even number of values, take the mean of the two middle values. So the median of the sample {1,2,3,10,1000} is 3.
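Both definitions can be checked in Python's standard library:

```python
# The example sample above, using the statistics module.
from statistics import mean, median

sample = [1, 2, 3, 10, 1000]
print(mean(sample))    # 203.2
print(median(sample))  # 3
```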

I intentionally chose an extreme example: The mean and median of this sample are completely different. But this is something that can happen in real life.

This is vital for understanding the distribution of income, because for almost all countries (and certainly for the world as a whole), the mean income is substantially higher (usually between 50% and 100% higher) than the median income. Yet it is the mean income that gets reported as “per-capita GDP”, even though the median income is a much better measure of actual standard of living.

As for the word “average”, it’s probably best to just remove it from your vocabulary. Say “mean” if that’s what you intend, or “median” if that’s what you’re actually using.

  2. Standard deviation and mean absolute deviation

Standard deviation is another one you’ve probably seen before.

Standard deviation is kind of a weird concept, honestly. It’s so entrenched in statistics that we’re probably stuck with it, but it’s really not a very good measure of anything intuitively interesting.

Mean absolute deviation is a much more intuitive concept, and much more robust to weird distributions (such as those of incomes and financial markets), but it isn’t as widely used by statisticians for some reason.

The standard deviation is defined as the square root of the mean of the squared differences between the individual values in a sample and the mean of that sample. So for my {1,2,3,10,1000} example, the standard deviation is sqrt(((1-203.2)^2 + (2-203.2)^2 + (3-203.2)^2 + (10-203.2)^2 + (1000-203.2)^2)/5) = 398.4.

What can you infer from that figure? Not a lot, honestly. The standard deviation is bigger than the mean, so we have some sense that there’s a lot of variation in our sample. But interpreting exactly what that means is not easy.

The mean absolute deviation is much simpler: It’s the mean of the absolute value of differences between the individual values in a sample and the mean of that sample. In this case it is ((203.2-1) + (203.2-2) + (203.2-3) + (203.2-10) + (1000-203.2))/5 = 318.7.

This has a much simpler interpretation: The mean distance between each value and the mean is 318.7. On average (if we still use that word), each value is about 318.7 away from the mean of 203.2.
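The same example in code (population standard deviation, matching the divide-by-5 formula above):

```python
from statistics import mean
import math

sample = [1, 2, 3, 10, 1000]
m = mean(sample)  # 203.2

# Standard deviation: square root of the mean squared deviation.
sd = math.sqrt(mean((x - m) ** 2 for x in sample))
# Mean absolute deviation: mean of the absolute deviations.
mad = mean(abs(x - m) for x in sample)

print(round(sd, 1), round(mad, 1))  # 398.4 318.7
```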

When you ask people to interpret a standard deviation, most of them actually reply as if you had asked them about the mean absolute deviation. They say things like “the average distance from the mean”. Only people who know statistics very well and are being very careful would give the true answer: “the square root of the mean of the squared distances from the mean”.

But there is an even more fundamental reason to prefer the mean absolute deviation, and that is that sometimes the standard deviation doesn’t exist!

For very fat-tailed distributions, the sum that would give you the standard deviation simply fails to converge. You could say the standard deviation is infinite, or that it’s simply undefined. Either way we know it’s fat-tailed, but that’s about all. Any finite sample would have a well-defined standard deviation, but that will keep changing as your sample grows, and never converge toward anything in particular.

But usually the mean still exists, and if the mean exists, then the mean absolute deviation also exists. (In some rare cases even they fail, such as the Cauchy distribution—but actually even then there is usually a way to recover what the mean and mean absolute deviation “should have been” even though they don’t technically exist.)
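You can watch this happen in simulation. The sketch below (an illustration I am adding, not data from any real source) draws from a Pareto distribution with tail index 1.5, for which the mean is finite but the variance is not: as the sample grows, the sample mean and mean absolute deviation settle down, while the sample standard deviation lurches upward whenever a new extreme value arrives:

```python
import random
import statistics

random.seed(0)

# Pareto with tail index 1.5: finite mean (true value 3), infinite variance
xs = []
for n in (1_000, 10_000, 100_000, 1_000_000):
    while len(xs) < n:
        xs.append(random.paretovariate(1.5))
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    mad = statistics.fmean(abs(x - m) for x in xs)
    print(f"n={n:>9,}  mean={m:7.2f}  mad={mad:7.2f}  sd={sd:10.2f}")
```

Run it a few times with different seeds and the standard deviation column never stabilizes, while the other two columns do.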

  1. Standard error

The standard error is even more important for statistical inference than the standard deviation, and frankly even harder to intuitively understand.

The actual definition of the standard error is this: The standard deviation of the distribution of sample means, provided that the null hypothesis is true and the distribution is a normal distribution.

How it is usually used is something more like this: “A good guess of the margin of error on my estimates, such that I’m probably not off by more than 2 standard errors in either direction.”
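Concretely, the standard error of a sample mean is estimated as the sample standard deviation divided by the square root of the sample size. Applied to the running {1,2,3,10,1000} example (where the normality assumption is clearly absurd), the conventional two-standard-error margin produces an interval that even dips below zero:

```python
import math
import statistics

data = [1, 2, 3, 10, 1000]
n = len(data)

s = statistics.stdev(data)        # sample SD (n - 1 denominator)
se = s / math.sqrt(n)             # standard error of the mean
mean = statistics.fmean(data)

# The conventional "within about 2 standard errors" margin:
low, high = mean - 2 * se, mean + 2 * se
print(round(se, 1), round(low, 1), round(high, 1))  # 199.2 -195.2 601.6
```

An interval that includes negative values for data that cannot be negative is a good hint that the normal-distribution machinery is being applied where it doesn’t belong.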

You may notice that those two things aren’t the same, and don’t even seem particularly closely related. You are correct in noticing this, and I hope that you never forget it. One thing that extensive training in statistics (especially frequentist statistics) seems to do to people is to make them forget that.

In particular, the standard error strictly only applies if the value you are trying to estimate is zero, which usually means that your results aren’t interesting. (To be fair, not always; finding zero effect of minimum wage on unemployment was a big deal.) Using it as a margin of error on your actual nonzero estimates is deeply dubious, even though almost everyone does it for lack of an uncontroversial alternative.

Application of standard errors typically also relies heavily on the assumption of a normal distribution, even though plenty of real-world distributions aren’t normal and don’t approach a normal distribution even in quite large samples. The Central Limit Theorem says that the sampling distribution of the mean of any non-fat-tailed distribution will approach a normal distribution eventually as sample size increases, but it doesn’t say how large a sample needs to be to do that, nor does it apply to fat-tailed distributions.

Therefore, the standard error is really a very optimistic estimate of your margin of error; it assumes essentially that the only kind of error you had was random sampling error from a normal distribution in an otherwise perfect randomized controlled experiment. All sorts of other forms of error and bias could have occurred at various stages (and typically did), making your error estimate inherently too small.

This is why you should never believe a claim that comes from only a single study or a handful of studies. There are simply too many things that could have gone wrong. Only when there are a large number of studies, with varying methodologies, all pointing to the same core conclusion, do we really have good empirical evidence of that conclusion. This is part of why the journalistic model of “A new study shows…” is so terrible; if you really want to know what’s true, you look at large meta-analyses of dozens or hundreds of studies, not a single study that could be completely wrong.

  1. Linear regression and its limits

Finally, I come to linear regression, the workhorse of statistical social science. Almost everything in applied social science ultimately comes down to variations on linear regression.

There is the simplest kind, ordinary least-squares or OLS; but then there is two-stage least-squares (2SLS), fixed-effects regression, clustered regression, random-effects regression, heterogeneous treatment effects, and so on.

The basic idea of all regressions is extremely simple: We have an outcome Y, a variable we are interested in D, and some other variables X.

This might be an effect of education D on earnings Y, or minimum wage D on unemployment Y, or eating strawberries D on getting cancer Y. In our X variables we might include age, gender, race, or whatever seems relevant to Y but can’t be affected by D.

We then make the incredibly bold (and typically unjustifiable) assumption that all the effects are linear, and say that:

Y = A + B*D + C*X + E

A, B, and C are coefficients we estimate by fitting a straight line through the data. The last bit, E, is a random error that we allow to fill in any gaps. Then, if the standard error of B is less than half the size of B itself, we declare that our result is “statistically significant”, and we publish our paper “proving” that D has an effect on Y that is proportional to B.

No, really, that’s pretty much it. Most of the work in econometrics involves trying to find good choices of X that will make our estimates of B better. A few of the more sophisticated techniques involve breaking up this single regression into a few pieces that are regressed separately, in the hopes of removing unwanted correlations between our variable of interest D and our error term E.
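To see how little machinery is involved, here is a sketch of OLS with a single regressor on simulated data (the numbers are invented for illustration; the true model is Y = 5 + 2*D plus noise), including the “statistically significant” ritual described above:

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical simulated data: true model is Y = 5 + 2*D + noise
D = [random.uniform(0, 10) for _ in range(200)]
Y = [5 + 2 * d + random.gauss(0, 3) for d in D]

n = len(D)
d_bar, y_bar = statistics.fmean(D), statistics.fmean(Y)

# OLS slope and intercept for Y = A + B*D + E
B = (sum((d - d_bar) * (y - y_bar) for d, y in zip(D, Y))
     / sum((d - d_bar) ** 2 for d in D))
A = y_bar - B * d_bar

# Residual variance and the standard error of B
resid = [y - (A + B * d) for d, y in zip(D, Y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_B = math.sqrt(s2 / sum((d - d_bar) ** 2 for d in D))

# The publication ritual: "significant" if |B| > 2 * SE(B)
print(round(A, 2), round(B, 2), round(se_B, 3), abs(B) > 2 * se_B)
```

On clean simulated data like this, the estimate lands close to the true slope of 2; the trouble described in this section starts when the real relationship isn’t linear, or when D is correlated with the error term.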

What about nonlinear effects, you ask? Yeah, we don’t much talk about those.

Occasionally we might include a term for D^2:

Y = A + B1*D + B2*D^2 + C*X + E

Then, if the coefficient B2 is small enough, which is usually what happens, we say “we found no evidence of a nonlinear effect”.
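As a sketch of what checking for a nonlinear effect actually looks like (again with made-up simulated data, where the true effect really is quadratic: Y = 1 + 0.5*D^2 plus noise), we can add a D^2 column to the design matrix and refit by solving the normal equations:

```python
import random

random.seed(2)

# Hypothetical data with a genuinely nonlinear effect: Y = 1 + 0.5*D^2 + noise
D = [random.uniform(0, 10) for _ in range(300)]
Y = [1 + 0.5 * d * d + random.gauss(0, 2) for d in D]

# Design matrix with intercept, D, and D^2 columns
X = [[1.0, d, d * d] for d in D]

def ols(X, Y):
    """Solve the normal equations (X'X) beta = X'Y by Gaussian elimination."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(k)]
    M = [row + [b] for row, b in zip(XtX, XtY)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # Back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (M[r][k] - sum(M[r][j] * beta[j]
                                 for j in range(r + 1, k))) / M[r][r]
    return beta

A, B1, B2 = ols(X, Y)
print(round(A, 2), round(B1, 2), round(B2, 2))
```

Here the quadratic coefficient comes back near its true value of 0.5; the point is that finding it required actually putting the D^2 term in, which many published regressions never do.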

Those who are a bit more sophisticated will instead report (correctly) that they have found the linear projection of the effect, rather than the effect itself; but if the effect was nonlinear enough, the linear projection might be almost meaningless. Also, if you’re too careful about the caveats on your research, nobody publishes your work, because there are plenty of other people competing with you who are willing to upsell their research as far more reliable than it actually is.

If this process seems rather underwhelming to you, that’s good. I think people being too easily impressed by linear regression is a much more widespread problem than people not having enough trust in linear regression.

Yes, it is possible to go too far the other way, and dismiss even dozens of brilliant experiments as totally useless because they used linear regression; but I don’t actually hear people doing that very often. (Maybe occasionally: The evidence that gun ownership increases suicide and homicide and that corporal punishment harms children is largely based on linear regression, but it’s also quite strong at this point, and I do still hear people denying it.)

Far more often I see people point to a single study using linear regression to prove that blueberries cure cancer or eating aspartame will kill you or yoga cures back pain or reading Harry Potter makes you hate Donald Trump or olive oil prevents Alzheimer’s or psychopaths are more likely to enjoy rap music. The more exciting and surprising a new study is, the more dubious you should be of its conclusions. If a very surprising result is unsupported by many other studies and just uses linear regression, you can probably safely ignore it.

A really good scientific study might use linear regression, but it would also be based on detailed, well-founded theory and apply a proper experimental (or at least quasi-experimental) design. It would check for confounding influences, look for nonlinear effects, and be honest that standard errors are an optimistic estimate of the margin of error. Most scientific studies probably should end by saying “We don’t actually know whether this is true; we need other people to check it.” Yet sadly few do, because the publishers that have a stranglehold on the industry prefer sexy, exciting, “significant” findings to actual careful, honest research. They’d rather you find something that isn’t there than not find anything, which goes against everything science stands for. Until that changes, all I can really tell you is to be skeptical when you read about linear regressions.

When are we going to get serious about climate change?

Oct 8, JDN 2458035

Those two storms weren’t simply natural phenomena. We had a hand in creating them.

The EPA doesn’t want to talk about the connection, and we don’t have enough statistical power to really be certain, but there is by now an overwhelming scientific consensus that global climate change will increase hurricane intensity. The only real question left is whether it is already doing so.

The good news is that global carbon emissions are no longer rising. They have been essentially static for the last few years. The bad news is that this is almost certainly too little, too late.

The US is not on track to hit our 2025 emission target; we will probably exceed it by at least 20%.

But the real problem is that the targets themselves are much too high. Most countries have pledged to drop emissions only about 8-10% below their 1990 levels.

Even with the progress we have made, we are on track to exceed the global carbon budget needed to keep warming below 2 C by the year 2040. We have been reducing emission intensity by about 0.8% per year—we need to be reducing it by at least 3% per year and preferably faster. Highly developed nations should be switching to nuclear energy as quickly as possible; an equitable global emission target requires us to reduce our emissions by 80% by 2050.
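To get a sense of the compounding arithmetic (my own back-of-the-envelope illustration, treating the rates as if they applied to total emissions for simplicity): cutting emissions 80% by 2050 means multiplying them by 0.2 over about 33 years, which requires a constant decline of nearly 5% per year, several times the current pace:

```python
# Constant annual decline needed to cut emissions 80% between 2017 and 2050
years = 2050 - 2017
required = 1 - 0.2 ** (1 / years)          # solves (1 - r)^years = 0.2

# Where a 0.8%-per-year decline actually lands us by 2050
current = 1 - (1 - 0.008) ** years

print(f"required: {100 * required:.1f}% per year")
print(f"0.8%/year achieves only about a {100 * current:.0f}% total cut by 2050")
```

The gap between those two numbers is the whole problem in miniature: exponential decline compounds, but so does the shortfall.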

At the current rate of improvement, we will overshoot the 2 C warming target and very likely the 3 C target as well.

Why aren’t we doing better? There is of course the Tragedy of the Commons to consider: Each individual country acting in its own self-interest will continue to pollute more, as this is the cheapest and easiest way to maintain industrial development. But then if all countries do so, the result is a disaster for us all.

But this explanation is too simple. We have managed to achieve some international cooperation on this issue. The Kyoto Protocol has worked; emissions among Kyoto member nations have been reduced by more than 20% below 1990 levels, far more than originally promised. The EU in particular has taken a leadership role in reducing emissions, and has a serious shot at hitting their target of 40% reduction by 2030.

That is a truly astonishing scale of cooperation; the EU has a population of over 500 million people and spans 28 nations. It would seem like doing that should get us halfway to cooperating across all nations and all the world’s people.

But there is a vital difference between the EU and the world as a whole: The tribal paradigm. Europeans certainly have their differences: The UK and France still don’t really get along, everyone’s bitter with Germany about that whole Hitler business, and as the acronym PIIGS emphasizes, the peripheral countries have never quite felt as European as the core Schengen members. But despite all this, there has been a basic sense of trans-national (meta-national?) unity among Europeans for a long time.

For one thing, today Europeans see each other as the same race. That wasn’t always the case. In Medieval times, ethnic categories were as fine as “Cornish” and “Liverpudlian”. (To be fair, there do still exist a handful of Cornish nationalists.) Starting around the 18th century, Europeans began to unite under the heading of “White people”, a classification that took on particular significance during the trans-Atlantic slave trade. But even in the 19th century, “Irish” and “Sicilian” were seen as racial categories. It wasn’t until the 20th century that Europeans really began to think of themselves as one “kind of people”, and not coincidentally it was at the end of the 20th century that the European Union finally took hold.

There is another region that has had a similar sense of unification: Latin America. Again, there are conflicts: There are a lot of nasty stereotypes about Puerto Ricans among Cubans and vice-versa. But Latinos, by and large, think of each other as the same “kind of people”, distinct from both Europeans and the indigenous population of the Americas.

I don’t think it is coincidental that the lowest carbon emission intensity (carbon emissions / GDP PPP) in the world is in Latin America, followed closely by Europe.

And if you had to name right now the most ethnically divided region in the world, what would you say? The Middle East, of course. And sure enough, they have the worst carbon emission intensity. (Of course, oil is an obvious confounding variable here, likely contributing to both.)

Indeed, the countries with the lowest ethnic fractionalization ratings tend to be in Europe and Latin America, and the highest tend to be in the Middle East and Africa.

Even within the United States, political polarization seems to come with higher carbon emissions. When we think of Democrats and Republicans as different “kinds of people”, we become less willing to cooperate on finding climate policy solutions.

This is not a complete explanation, of course. China has a low fractionalization rating but a high carbon intensity, and extremely high overall carbon emissions due to their enormous population. Africa’s carbon intensity isn’t as high as you’d think just from their terrible fractionalization, especially if you exclude Nigeria which is a major oil producer.

But I think there is nonetheless a vital truth here: One of the central barriers to serious long-term solutions to climate change is the entrenchment of racial and national identity. Solving the Tragedy of the Commons requires cooperation, we will only cooperate with those we trust, and we will only trust those we consider to be the same “kind of people”.

You can even hear it in the rhetoric: If “we” (Americans) give up our carbon emissions, then “they” (China) will take advantage of us. No one seems to worry about Alabama exploiting California—certainly no Republican would—despite the fact that in real economic terms they basically do. But people in Alabama are Americans; in other words, they count as actual people. People in China don’t count. If anything, people in California are supposed to be considered less American than people in Alabama, despite the fact that vastly more Americans live in California than Alabama. This mirrors the same pattern where we urban residents are somehow “less authentic” even though we outnumber rural residents by four to one.

I don’t know how to mend this tribal division; I very much wish I did. But I do know that simply ignoring it isn’t going to work. We can talk all we want about carbon taxes and cap-and-trade, but as long as most of the world’s people are divided into racial, ethnic, and national identities that they consider to be in zero-sum conflict with one another, we are never going to achieve the level of cooperation necessary for a real permanent solution to climate change.

The temperatures and the oceans rise. United we must stand, or divided we shall fall.

How can we stop rewarding psychopathy?

Oct 1, JDN 2458028

A couple of weeks ago The New York Times ran an interesting article about how entrepreneurs were often juvenile delinquents, who often go on to become white-collar criminals. They didn’t quite connect the dots, though; they talked about the relevant trait driving this behavior as “rule-breaking”, when it is probably better defined as psychopathy. People like Martin Shkreli aren’t just “rule-breakers”; they are psychopaths. While only about 1% of humans in general are psychopaths, somewhere between 3% and 4% of business executives are psychopaths. I was unable to find any specific data assessing the prevalence of psychopathy among politicians, but if you just read the Hare checklist, it’s not hard to see that psychopathic traits are overrepresented among politicians as well.

This is obviously the result of selection bias; as a society, we are systematically appointing psychopaths to positions of wealth and power. Why are we doing this? How can we stop?

One very important factor here that may be especially difficult to deal with is desire. We generally think that in a free society, people should be allowed to seek out the sort of life they want to live. But one of the reasons that psychopaths are more likely to become rich and powerful is precisely that they want it more.

To most of us, being rich is probably something we want, but not the most important thing to us. We’d accept being poor if it meant we could be happy, surrounded by friends and family who love us, and make a great contribution to society. We would like to be rich, but it’s more important that we be good people. But to many psychopaths, being rich is the one single thing they care about. All those other considerations are irrelevant.

With power, matters are even more extreme: Most people actually seem convinced that they don’t want power at all. They associate power with corruption and cruelty (because, you know, so many of the people in power are psychopaths!), and they want no part of it.

So the saying goes: “Power tends to corrupt, and absolute power corrupts absolutely.” Does it, now? Did power corrupt George Washington and Abraham Lincoln? Did it corrupt Mahatma Gandhi and Nelson Mandela? I’m not saying that any of these men were without flaws, even serious ones—but was it power that made them so? Who would they have been, and more importantly, what would they have done, if they hadn’t had power? Would the world really have been better off if Abraham Lincoln and Nelson Mandela had stayed out of politics? I don’t think so.

Part of what we need, therefore, is to convince good people that wanting power is not inherently bad. Power just means the ability to do things; it’s what you do that matters. You should want power—the power to right wrongs, mend injustices, uplift humanity’s future. Thinking that the world would be better if you were in charge not only isn’t a bad thing—it is quite likely to be true. If you are not a psychopath, then the world would probably be better off if you were in charge of it.

Of course, that depends partly on what “in charge of the world” even means; it’s not like we have a global government, after all. But even suppose you were granted the power of an absolute dictatorship over all of humanity; what would you do with that power? My guess is that you’d probably do what I would do: Start by using that power to correct the greatest injustices, then gradually cede power to a permanent global democracy. That wouldn’t just be a good thing; it would be quite literally and without a doubt the best thing that ever happened. Of course, it would be all the better if we never built such a dictatorship in the first place; but mainly that’s because of the sort of people who tend to become dictators. A benevolent dictatorship really would be a wonderful thing; the problem is that dictators almost never remain benevolent. Dictatorship is simply too enticing to psychopaths.

And what if you don’t think you’re competent enough in policy to make such decisions? Simple: You don’t make them yourself, you delegate them to responsible and trustworthy people to make them for you. Recognizing your own limitations is one of the most important differences between a typical leader and a good leader.

Desire isn’t the only factor here, however. Even though psychopaths tend to seek wealth and power with more zeal than others, there are still a lot of good people trying to seek wealth and power. We need to look very carefully at the process of how we select our leaders.

Let’s start with the private sector. How are managers chosen? Mainly, by managers above them. What criteria do they use? Mostly, they use similarity. Managers choose other managers who are “like them”—middle-aged straight White men with psychopathic tendencies.

This is something that could be rectified with regulation; we could require businesses to choose a more diverse array of managers that is more representative of the population at large. While this would no doubt trigger many complaints of “government interference” and “inefficiency”, in fact it almost certainly would increase the long-term profitability of most corporations. Study after study after study shows that increased diversity, particularly including more equal representation of women, results in better business performance. A recent MIT study found that switching from an all-male or all-female management population to a 50-50 male/female split could increase profits by as much as forty percent. The reason boards of directors aren’t including more diversity is that they ultimately care more about protecting their old boys’ club (and increasing their own compensation, of course) than they do about maximizing profits for their shareholders.

I think it would actually be entirely reasonable to include regulations about psychopathy in particular; designate certain industries (such as lobbying and finance; I would not include medicine, as psychopaths actually seem to make pretty good neurosurgeons!) as “systematically vital” and require psychopathy screening tests as part of their licensing process. This is no small matter, and definitely does represent an incursion into civil liberties; but given the enormous potential benefits, I don’t think it can be dismissed out of hand. We do license professions; why shouldn’t at least a minimal capacity for empathy and ethical behavior be part of that licensing process?

Where the civil liberty argument becomes overwhelming is in politics. I don’t think we can justify any restrictions on who should be allowed to run for office. Frankly, I think even the age limits should be struck from the Constitution; you should be allowed to run for President at 18 if you want. Requiring psychological tests for political office borders on dystopian.

That means we need to somehow reform either the campaign system, the voting system, or the behavior of voters themselves.

Of course, we should reform all three. Let’s start with the voting system itself, as that is the simplest: We should be using range voting, and we should abolish the Electoral College. Districts should be replaced by proportional representation through reweighted range voting, eliminating gerrymandering once and for all without question.
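For the curious, here is a toy sketch of how range voting and its proportional variant work. The candidates and ballots are invented, and the reweighting formula is one common proposal (a ballot’s weight falls as more of the candidates it scored highly are seated), not the only possible one:

```python
def range_winner(ballots):
    # Plain range voting: the candidate with the highest total score wins
    totals = {}
    for b in ballots:
        for cand, score in b.items():
            totals[cand] = totals.get(cand, 0) + score
    return max(totals, key=totals.get)

def rrv(ballots, seats, max_score=5):
    """Reweighted range voting: after each seat is filled, each ballot is
    down-weighted by 1 / (1 + scores_it_gave_to_winners / max_score)."""
    winners = []
    while len(winners) < seats:
        totals = {}
        for b in ballots:
            spent = sum(b.get(w, 0) for w in winners)
            weight = 1 / (1 + spent / max_score)
            for cand, score in b.items():
                if cand not in winners:
                    totals[cand] = totals.get(cand, 0) + weight * score
        winners.append(max(totals, key=totals.get))
    return winners

# 60% of voters love A and like B; 40% love only C.
# Two seats should split between the blocs rather than both going to the majority.
ballots = [{"A": 5, "B": 4, "C": 0}] * 6 + [{"A": 0, "B": 0, "C": 5}] * 4
print(range_winner(ballots))   # single-winner race
print(rrv(ballots, seats=2))   # proportional two-seat race
```

With plain range voting the majority bloc would take both seats (A, then B); the reweighting step is what delivers the proportional outcome.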

The campaign system is trickier. We could start by eliminating or tightly capping private and corporate campaign donations, and replace them with a system similar to the “Democracy Vouchers” being tested in Seattle. The basic idea is simple and beautiful: Everyone gets an equal amount of vouchers to give to whatever candidates they like, and then all the vouchers can be redeemed for campaign financing from public funds. It’s like everyone giving a donation (or monetary voting), but everyone has the same amount of “money”.

This would not solve all the problems, however. There is still an oligopoly of news media distorting our political discourse. There is still astonishingly bad journalism even in our most respected outlets, like the New York Times’s obsession with Comey’s letter and CNN’s wall-to-wall coverage of totally unfounded speculation about a missing airliner.

Then again, CNN’s ratings skyrocketed during that period. This shows that the problems run much deeper than a handful of bad journalists or corrupt media companies. These companies are, to a surprisingly large degree, just trying to cater to what their audience has said it wants, just “giving the people what they want”.

Our fundamental challenge, therefore, is to change what the people want. We have to somehow convince the public at large—or at least a big enough segment of the public at large—that they don’t really want TV news that spends hours telling them nothing and they don’t really want to elect the candidate who is the tallest or has the nicest hair. And we have to get them to actually change the way they behave accordingly.

When it comes to that part, I have no idea what to do. A voting population that is capable of electing Donald Trump—Electoral College nonsense notwithstanding, he won sixty million votes—is one that I honestly have no idea how to interface with at all. But we must try.