Statisticacy

2023-06-112023-06-04 PNRJ core principles, education, financial advice, science and academia, statistics Bill Nye, blackjack, debt, education, error, exponential, growth, inflation, interest, knowledge, mean, median, Monty Hall, percent, percentage, percentage point, poker, probability, statistics

Jun 11 JDN 2460107

I wasn’t able to find a dictionary that includes the word “statisticacy”, but it doesn’t trigger my spell-check, and it does seem to have the same form as “numeracy”: numeric, numerical, numeracy, numerate; statistic, statistical, statisticacy, statisticate. It definitely still sounds very odd to my ears. Perhaps repetition will eventually make it familiar.

For the concept is clearly a very important one. Literacy and numeracy are no longer a serious problem in the First World; basically every adult at this point knows how to read and do addition. Even worldwide, 90% of men and 83% of women can read, at least at a basic level—which is an astonishing feat of our civilization by the way, well worthy of celebration.

But I have noticed a disturbing lack of, well, statisticacy. Even intelligent, educated people seem… pretty bad at understanding statistics.

I’m not talking about sophisticated econometrics here; of course most people don’t know that, and don’t need to. (Most economists don’t know that!) I mean quite basic statistical knowledge.

A few years ago I wrote a post called “Statistics you should have been taught in high school, but probably weren’t”; that’s the kind of stuff I’m talking about.

As part of being a good citizen in a modern society, every adult should understand the following:

1. The difference between a mean and a median, and why average income (mean) can increase even though most people are no richer (median).

2. The difference between increasing by X% and increasing by X percentage points: If inflation goes from 4% to 5%, that is an increase of 20% ((5/4-1)*100%), but only 1 percentage point (5%-4%).

3. The meaning of standard error, and how to interpret error bars on a graph—and why it’s a huge red flag if there aren’t any error bars on a graph.

4. Basic probabilistic reasoning: Given some scratch paper, a pen, and a calculator, everyone should be able to work out the odds of drawing a given blackjack hand, or rolling a particular number on a pair of dice. (If that’s too easy, make it a poker hand and four dice. But mostly that’s just more calculation effort, not fundamentally different.)

5. The meaning of exponential growth rates, and how they apply to economic growth and compound interest. (The difference between 3% interest and 6% interest over 30 years is more than double the total amount paid.)

I see people making errors about this sort of thing all the time.

Economic news that celebrates rising GDP but wonders why people aren’t happier (when real median income has been falling since 2019 and is only 7% higher than it was in 1999, an annual growth rate of 0.2%).

Reports on inflation, interest rates, or poll numbers that don’t clearly specify whether they are dealing with percentages or percentage points. (XKCD made fun of this.)

Speaking of poll numbers, any reporting on changes in polls that isn’t at least twice the margin of error of the polls in question. (There’s also a comic for this; this time it’s PhD Comics.)

People misunderstanding interest rates and gravely underestimating how much they’ll pay for their debt (then again, this is probably the result of strategic choices on the part of banks—so maybe the real failure is regulatory).

And, perhaps worst of all, the plague of science news articles about “New study says X”. Things causing and/or cancer, things correlated with personality types, tiny psychological nudges that supposedly have profound effects on behavior.

Some of these things will even turn out to be true; actually I think this one on fibromyalgia, this one on smoking, and this one on body image are probably accurate. But even if it’s a properly randomized experiment—and especially if it’s just a regression analysis—a single study ultimately tells us very little, and it’s irresponsible to report on them instead of telling people the extensive body of established scientific knowledge that most people still aren’t aware of.

Basically any time an article is published saying “New study says X”, a statisticate person should ignore it and treat it as random noise. This is especially true if the finding seems weird or shocking; such findings are far more likely to be random flukes than genuine discoveries. Yes, they could be true, but one study just doesn’t move the needle that much.

I don’t remember where it came from, but there is a saying about this: “What is in the textbooks is 90% true. What is in the published literature is 50% true. What is in the press releases is 90% false.” These figures are approximately correct.

If their goal is to advance public knowledge of science, science journalists would accomplish a lot more if they just opened to a random page in a mainstream science textbook and started reading it on air. Admittedly, I can see how that would be less interesting to watch; but then, their job should be to find a way to make it interesting, not to take individual studies out of context and hype them up far beyond what they deserve. (Bill Nye did this much better than most science journalists.)

I’m not sure how much to blame people for lacking this knowledge. On the one hand, they could easily look it up on Wikipedia, and apparently choose not to. On the other hand, they probably don’t even realize how important it is, and were never properly taught it in school even though they should have been. Many of these things may even be unknown unknowns; people simply don’t realize how poorly they understand. Maybe the most useful thing we could do right now is simply point out to people that these things are important, and if they don’t understand them, they should get on that Wikipedia binge as soon as possible.

And one last thing: Maybe this is asking too much, but I think that a truly statisticate person should be able to solve the Monty Hall Problem and not be confused by the result. (Hint: It’s very important that Monty Hall knows which door the car is behind, and would never open that one. If he’s guessing at random and simply happens to pick a goat, the correct answer is 1/2, not 2/3. Then again, it’s never a bad choice to switch.)

Statistics you should have been taught in high school, but probably weren’t

2017-10-152017-10-08 PNRJ core principles, statistics average, Curse of Knowledge, fat-tailed distribution, linear regression, mean, mean absolute deviation, median, normal distribution, regression, standard deviation, standard error

Oct 15, JDN 2458042

Today I’m trying something a little different. This post will assume a lot less background knowledge than most of the others. For some of my readers, this post will probably seem too basic, obvious, even boring. For others, it might feel like a breath of fresh air, relief at last from the overly-dense posts I am generally inclined to write out of Curse of Knowledge. Hopefully I can balance these two effects well enough to gain rather than lose readers.

Here are four core statistical concepts that I think all adults should know, necessary for functional literacy in understanding the never-ending stream of news stories about “A new study shows…” and more generally in applying social science to political decisions. In theory shese should all be taught as part of a core high school curriculum, but typically they either aren’t taught or aren’t retained once students graduate. (Really, I think we should replace one year of algebra with one semester of statistics and one semester of logic. Most people don’t actually need algebra, but they absolutely do need logic and statistics.)

Mean and median

The mean and the median are quite simple concepts, and you’ve probably at least heard of them before, yet confusion between them has caused a great many misunderstandings.

Part of the problem is the word “average”. Normally, the word “average” applies to the mean—for example, a batting average, or an average speed. But in common usage the word “average” can also mean “typical” or “representative”—an average person, an average family. And in many cases, particularly when in comes to economics, the mean is in no way typical or representative.

The mean of a sample of values is just the sum of all those values, divided by the number of values. The mean of the sample {1,2,3,10,1000} is (1+2+3+10+1000)/5 = 203.2

The median of a sample of values is the middle one—order the values, choose the one in the exact center. If you have an even number, take the mean of the two values on either side. So the median of the sample {1,2,3,10,1000} is 3.

I intentionally chose an extreme example: The mean and median of these samples are completely different. But this is something that can happen in real life.

This is vital for understanding the distribution of income, because for almost all countries (and certainly for the world as a whole), the mean income is substantially higher (usually between 50% and 100% higher) than the median income. Yet the mean income is what is reported as “per capita GDP”, but the median income is a much better measure of actual standard of living.

As for the word “average”, it’s probably best to just remove it from your vocabulary. Say “mean” instead if that’s what you intend, or “median” if that’s what you’re using instead.

Standard deviation and mean absolute deviation

Standard deviation is another one you’ve probably seen before.

Standard deviation is kind of a weird concept, honestly. It’s so entrenched in statistics that we’re probably stuck with it, but it’s really not a very good measure of anything intuitively interesting.

Mean absolute deviation is a much more intuitive concept, and much more robust to weird distributions (such as those of incomes and financial markets), but it isn’t as widely used by statisticians for some reason.

The standard deviation is defined as the square root of the mean of the squared differences between the individual values in sample and the mean of that sample. So for my {1,2,3,10,1000} example, the standard deviation is sqrt(((1-203.2)^2 + (2-203.2)^2 + (3-203.2)^2 + (10-203.2)^2 + (1000-203.2)^2)/5) = 398.4.

What can you infer from that figure? Not a lot, honestly. The standard deviation is bigger than the mean, so we have some sense that there’s a lot of variation in our sample. But interpreting exactly what that means is not easy.

The mean absolute deviation is much simpler: It’s the mean of the absolute value of differences between the individual values in a sample and the mean of that sample. In this case it is ((203.2-1) + (203.2-2) + (203.2-3) + (203.2-10) + (1000-203.2))/5 = 318.7.

This has a much simpler interpretation: The mean distance between each value and the mean is 318.7. On average (if we still use that word), each value is about 318.7 away from the mean of 203.2.

When you ask people to interpret a standard deviation, most of them actually reply as if you had asked them about the mean absolute deviation. They say things like “the average distance from the mean”. Only people who know statistics very well and are being very careful would actually say the true answer, “the square root of the sum of squared distances from the mean”.

But there is an even more fundamental reason to prefer the mean absolute deviation, and that is that sometimes the standard deviation doesn’t exist!

For very fat-tailed distributions, the sum that would give you the standard deviation simply fails to converge. You could say the standard deviation is infinite, or that it’s simply undefined. Either way we know it’s fat-tailed, but that’s about all. Any finite sample would have a well-defined standard deviation, but that will keep changing as your sample grows, and never converge toward anything in particular.

But usually the mean still exists, and if the mean exists, then the mean absolute deviation also exists. (In some rare cases even they fail, such as the Cauchy distribution—but actually even then there is usually a way to recover what the mean and mean absolute deviation “should have been” even though they don’t technically exist.)

Standard error

The standard error is even more important for statistical inference than the standard deviation, and frankly even harder to intuitively understand.

The actual definition of the standard error is this: The standard deviation of the distribution of sample means, provided that the null hypothesis is true and the distribution is a normal distribution.

How it is usually used is something more like this: “A good guess of the margin of error on my estimates, such that I’m probably not off by more than 2 standard errors in either direction.”

You may notice that those two things aren’t the same, and don’t even seem particularly closely related. You are correct in noticing this, and I hope that you never forget it. One thing that extensive training in statistics (especially frequentist statistics) seems to do to people is to make them forget that.

In particular, the standard error strictly only applies if the value you are trying to estimate is zero, which usually means that your results aren’t interesting. (To be fair, not always; finding zero effect of minimum wage on unemployment was a big deal.) Using it as a margin of error on your actual nonzero estimates is deeply dubious, even though almost everyone does it for lack of an uncontroversial alternative.
Application of standard errors typically also relies heavily on the assumption of a normal distribution, even though plenty of real-world distributions aren’t normal and don’t even approach a normal distribution in quite large samples. The Central Limit Theorem says that the sampling distribution of the mean of any non-fat-tailed distribution will approach a normal distribution eventually as sample size increases, but it doesn’t say how large a sample needs to be to do that, nor does it apply to fat-tailed distributions.

Therefore, the standard error is really a very conservative estimate of your margin of error; it assumes essentially that the only kind of error you had was random sampling error from a normal distribution in an otherwise perfect randomized controlled experiment. All sorts of other forms of error and bias could have occurred at various stages—and typically, did—making your error estimate inherently too small.

This is why you should never believe a claim that comes from only a single study or a handful of studies. There are simply too many things that could have gone wrong. Only when there are a large number of studies, with varying methodologies, all pointing to the same core conclusion, do we really have good empirical evidence of that conclusion. This is part of why the journalistic model of “A new study shows…” is so terrible; if you really want to know what’s true, you look at large meta-analyses of dozens or hundreds of studies, not a single study that could be completely wrong.

Linear regression and its limits

Finally, I come to linear regression, the workhorse of statistical social science. Almost everything in applied social science ultimately comes down to variations on linear regression.

There is the simplest kind, ordinary least-squares or OLS; but then there is two-stage least-squares 2SLS, fixed-effects regression, clustered regression, random-effects regression, heterogeneous treatment effects, and so on.
The basic idea of all regressions is extremely simple: We have an outcome Y, a variable we are interested in D, and some other variables X.

This might be an effect of education D on earnings Y, or minimum wage D on unemployment Y, or eating strawberries D on getting cancer Y. In our X variables we might include age, gender, race, or whatever seems relevant to Y but can’t be affected by D.

We then make the incredibly bold (and typically unjustifiable) assumption that all the effects are linear, and say that:

Y = A + B*D + C*X + E

A, B, and C are coefficients we estimate by fitting a straight line through the data. The last bit, E, is a random error that we allow to fill in any gaps. Then, if the standard error of B is less than half the size of B itself, we declare that our result is “statistically significant”, and we publish our paper “proving” that D has an effect on Y that is proportional to B.

No, really, that’s pretty much it. Most of the work in econometrics involves trying to find good choices of X that will make our estimates of B better. A few of the more sophisticated techniques involve breaking up this single regression into a few pieces that are regressed separately, in the hopes of removing unwanted correlations between our variable of interest D and our error term E.

What about nonlinear effects, you ask? Yeah, we don’t much talk about those.

Occasionally we might include a term for D^2:

Y = A + B1*D + B2*D^2 + C*X + E

Then, if the coefficient B2 is small enough, which is usually what happens, we say “we found no evidence of a nonlinear effect”.

Those who are a bit more sophisticated will instead report (correctly) that they have found the linear projection of the effect, rather than the effect itself; but if the effect was nonlinear enough, the linear projection might be almost meaningless. Also, if you’re too careful about the caveats on your research, nobody publishes your work, because there are plenty of other people competing with you who are willing to upsell their research as far more reliable than it actually is.

If this process seems rather underwhelming to you, that’s good. I think people being too easily impressed by linear regression is a much more widespread problem than people not having enough trust in linear regression.

Yes, it is possible to go too far the other way, and dismiss even dozens of brilliant experiments as totally useless because they used linear regression; but I don’t actually hear people doing that very often. (Maybe occasionally: The evidence that gun ownership increases suicide and homicide and that corporal punishment harms children is largely based on linear regression, but it’s also quite strong at this point, and I do still hear people denying it.)

Far more often I see people point to a single study using linear regression to prove that blueberries cure cancer or eating aspartame will kill you or yoga cures back pain or reading Harry Potter makes you hate Donald Trump or olive oil prevents Alzheimer’s or psychopaths are more likely to enjoy rap music. The more exciting and surprising a new study is, the more dubious you should be of its conclusions. If a very surprising result is unsupported by many other studies and just uses linear regression, you can probably safely ignore it.

A really good scientific study might use linear regression, but it would also be based on detailed, well-founded theory and apply a proper experimental (or at least quasi-experimental) design. It would check for confounding influences, look for nonlinear effects, and be honest that standard errors are a conservative estimate of the margin of error. Most scientific studies probably should end by saying “We don’t actually know whether this is true; we need other people to check it.” Yet sadly few do, because the publishers that have a strangle-hold on the industry prefer sexy, exciting, “significant” findings to actual careful, honest research. They’d rather you find something that isn’t there than not find anything, which goes against everything science stands for. Until that changes, all I can really tell you is to be skeptical when you read about linear regressions.

I think I know what the Great Filter is now

2017-09-032017-08-27 PNRJ cognitive science, ethics, futurism Fermi paradox, Great Filter, mean, nationalism, nuclear weapons, racism, rationality, variance

Sep 3, JDN 2458000

One of the most plausible solutions to the Fermi Paradox of why we have not found any other intelligent life in the universe is called the Great Filter: Somewhere in the process of evolving from unicellular prokaryotes to becoming an interstellar civilization, there is some highly-probable event that breaks the process, a “filter” that screens out all but the luckiest species—or perhaps literally all of them.

I previously thought that this filter was the invention of nuclear weapons; I now realize that this theory is incomplete. Nuclear weapons by themselves are only an existential threat because they co-exist with widespread irrationality and bigotry. The Great Filter is the combination of the two.

Yet there is a deep reason why we would expect that this is precisely the combination that would emerge in most species (as it has certainly emerged in our own): The rationality of a species is not uniform. Some individuals in a species will always be more rational than others, so as a species increases its level of rationality, it does not do so all at once.

Indeed, the processes of economic development and scientific advancement that make a species more rational are unlikely to be spread evenly; some cultures will develop faster than others, and some individuals within a given culture will be further along than others. While the mean level of rationality increases, the variance will also tend to increase.

On some arbitrary and oversimplified scale where 1 is the level of rationality needed to maintain a hunter-gatherer tribe, and 20 is the level of rationality needed to invent nuclear weapons, the distribution of rationality in a population starts something like this:

Most of the population is between levels 1 and 3, which we might think of as lying between the bare minimum for a tribe to survive and the level at which one can start to make advances in knowledge and culture.

Then, as the society advances, it goes through a phase like this:

This is about where we were in Periclean Athens. Most of the population is between levels 2 and 8. Level 2 used to be the average level of rationality back when we were hunter-gatherers. Level 8 is the level of philosophers like Archimedes and Pythagoras.

Today, our society looks like this:

Most of the society is between levels 4 and 20. As I said, level 20 is the point at which it becomes feasible to develop nuclear weapons. Some of the world’s people are extremely intelligent and rational, and almost everyone is more rational than even the smartest people in hunter-gatherer times, but now there is enormous variation.

Where on this chart are racism and nationalism? Importantly, I think they are above the level of rationality that most people had in ancient times. Even Greek philosophers had attitudes toward slaves and other cultures that the modern KKK would find repulsive. I think on this scale racism is about a 10 and nationalism is about a 12.

If we had managed to uniformly increase the rationality of our society, with everyone gaining at the same rate, our distribution would instead look like this:

If that were the case, we’d be fine. The lowest level of rationality widespread in the population would be 14, which is already beyond racism and nationalism. (Maybe it’s about the level of humanities professors today? That makes them substantially below quantum physicists who are 20 by construction… but hey, still almost twice as good as the Greek philosophers they revere.) We would have our nuclear technology, but it would not endanger our future—we wouldn’t even use it for weapons, we’d use it for power generation and space travel. Indeed, this lower-variance high-rationality state seems to be about what they have the Star Trek universe.

But since we didn’t, a large chunk of our population is between 10 and 12—that is, still racist or nationalist. We have the nuclear weapons, and we have people who might actually be willing to use them.

I think this is what happens to most advanced civilizations around the galaxy. By the time they invent space travel, they have also invented nuclear weapons—but they still have their equivalent of racism and nationalism. And most of the time, the two combine into a volatile mix that results in the destruction or regression of their entire civilization.

If this is right, then we may be living at the most important moment in human history. It may be right here, right now, that we have the only chance we’ll ever get to turn the tide. We have to find a way to reduce the variance, to raise the rest of the world’s population past nationalism to a cosmopolitan morality. And we may have very little time.

Means, medians, and inequality denial

2015-10-282017-10-08 PNRJ inequality, public policy AEI, Census, denial, Gini, household, income, income share, inequality, libertarian, mean, median, new blog title, personal, wage, women, WTID

JDN 2457324 EDT 21:45

You may have noticed a couple of big changes in the blog today. The first is that I’ve retitled it “Human Economics” to emphasize the positive, and the second is that I’ve moved it to my domain http://patrickjuli.us which is a lot shorter and easier to type. I’ll be making two bite-sized posts a week, just as I have been piloting for the last few weeks.

Earlier today I was dismayed to see a friend link to this graph by the American Enterprise Institute (a well-known Libertarian think-tank):

Look! The “above $100,000” is the only increasing category! That means standard of living in the US is increasing! There’s no inequality problem!

The AEI has an agenda to sell you, which is that the free market is amazing and requires absolutely no intervention, and government is just a bunch of big bad meanies who want to take your hard-earned money and give it away to lazy people. They chose very carefully what data to use for this plot in order to make it look like inequality isn’t increasing.

Here’s a more impartial way of looking at the situation, the most obvious, pre-theoretical way of looking at inequality: What has happened to mean income versus median income?

As a refresher from intro statistics, the mean is what you get by adding up the total money and dividing by the number of people; the median is what a person in the exact middle has. So for example if there are three people in a room, one makes $20,000, the second makes $50,000, and the third is Bill Gates making $10,000,000,000, then the mean income is $3,333,333,356 but the median income is $50,000. In a distribution similar to the power-law distribution that incomes generally fall into, the mean is usually higher than the median, and how much higher is a measure of how much inequality there is. (In my example, the mean is much higher, because there’s huge inequality with Bill Gates in the room.) This confuses people, because when people say “the average”, they usually intend the mean; but when they say “the average person”, they usually intend the median. The average person in my three-person example makes $50,000, but the average income is $3.3 billion.

So if we look at mean income versus median income in the US over time, this is what we see:

In 1953, mean household income was $36,535 and median household income was $32,932. Mean income was therefore 10.9% higher than median income.

In 2013, mean household income was $88,765 and median income was $66,632. Mean household income was therefore 33.2% higher than median income.

That, my dear readers, is a substantial increase in inequality. To be fair, it’s also a substantial increase in standard of living; these figures are already adjusted for inflation, so the average family really did see their standard of living roughly double during that period.

But this also isn’t the whole story.

First, notice that real median household income is actually about 5% lower now than it was in 2007. Real mean household income is also lower than its peak in 2006, but only by about 2%. This is why in a real sense we are still in the Second Depression; income for most people has not retained its pre-recession peak.

Furthermore, real median earnings for full-time employees have not meaningfully increased over the last 35 years; in 1982 dollars, they were $335 in 1979 and they are $340 now:

At first I thought this was because people were working more hours, but that doesn’t seem to be true; average weekly hours of work have fallen from 38.2 to 33.6:

The main reason seems to be actually that women are entering the workforce, so more households have multiple full-time incomes; while only 43% of women were in the labor force in 1970, almost 57% are now.

I must confess to a certain confusion on this point, however, as the difference doesn’t seem to be reflected in any of the measures of personal income. Median personal income was about 41% of median family income in 1974, and now it’s about 43%. I’m not sure exactly what’s going on here.

The Gini index, a standard measure of income inequality, is only collected every few years, yet shows a clear rising trend from 37% in 1986 to 41% in 2013:

But perhaps the best way to really grasp our rising inequality is to look at the actual proportions of income received by each portion of the population.

This is what it looks like if you use US Census data, broken down by groups of 20% and the top 5%; notice how since 1977 the top 5% have taken in more than the 40%-60% bracket, and they are poised to soon take in more than the 60%-80% bracket as well:

The result is even more striking if you use the World Top Incomes Database. You can watch the share of income rise for the top 10%, 5%, 1%, 0.1%, and 0.01%:

But in fact it’s even worse than it sounds. What I’ve just drawn double-counts a lot of things; it includes the top 0.01% in the top 0.1%, which is in turn included in the top 1%, and so on. If you exclude these, so that we’re only looking at the people in the top 10% but not the top 5%, the people in the top 5% but not the top 1%, and so on, something even more disturbing happens:

While the top 10% does see some gains, the top 5% gains faster, and the gains accrue even faster as you go up the chain.

Since 1970, the top 10%-5% share grew 10%. The top 0.01% share grew 389%.

Year	Top 10-5% share	Top 10-5% share incl. cap. gains	Top 5-1% share	Top 5-1% share incl cap. gains	Top 1-0.5% share	Top 1-0.5% share incl. cap. gains	Top 0.5-0.1% share	Top 0.5-0.1% share incl. cap. gains	Top 0.1-0.01% share	Top 0.1-0.01% share incl. cap. gains	Top 0.01% share	Top 0.01% share incl. cap. gains
1970	11.13	10.96	12.58	12.64	2.65	2.77	3.22	3.48	1.41	1.78	0.53	1
2014	12.56	12.06	16.78	16.55	4.17	4.28	6.18	6.7	4.38	5.36	3.12	4.89
Relative gain	12.8%	10.0%	33.4%	30.9%	57.4%	54.5%	91.9%	92.5%	210.6%	201.1%	488.7%	389.0%

To be clear, these are relative gains in shares. Including capital gains, the share of income received by the top 10%-5% grew from 10.96% to 12.06%, a moderate increase. The share of income received by the top 0.01% grew from 1.00% to 4.89%, a huge increase. (Yes, the top 0.01% now receive almost 5% of the income, making them on average almost 500 times richer than the rest of us.)

The pie has been getting bigger, which is a good thing. But the rich are getting an ever-larger piece of that pie, and the piece the very rich get is expanding at an alarming rate.

It’s certainly a reasonable question what is causing this rise in inequality, and what can or should be done about it. By people like the AEI try to pretend it doesn’t even exist, and that’s not economic policy analysis; that’s just plain denial.

	lorentjd on The stochastic superstar …
	lorentjd on The stochastic superstar …
	PNRJ on The stochastic superstar …
	PNRJ on The stochastic superstar …
	Why “bullshit jobs”… on The potential of an advertisin…