On foxes and hedgehogs, part I

Aug 3 JDN 2460891

Today I finally got around to reading Expert Political Judgment by Philip E. Tetlock, more or less in a single sitting because I’ve been sick the last week with some pretty tight limits on what activities I can do. (It’s mostly been reading, watching TV, or playing video games that don’t require intense focus.)

It’s really an excellent book. I now understand why it came so highly recommended to me, and I pass that recommendation on to you: Read it.

The central thesis of the book really boils down to three propositions:

  1. Human beings, even experts, are very bad at predicting political outcomes.
  2. Some people, who use an open-minded strategy (called “foxes”), perform substantially better than other people, who use a more dogmatic strategy (called “hedgehogs”).
  3. When rewarding predictors with money, power, fame, prestige, and status, human beings systematically favor (over)confident “hedgehogs” over (correctly) humble “foxes”.

I decided I didn’t want to make this post about current events, but I think you’ll probably agree with me when I say:

That explains a lot.

How did Tetlock determine this?

Well, he studies the issue in several different ways, but the core experiment that drives his account is actually a rather simple one:

  1. He gathered a large group of subject-matter experts: Economists, political scientists, historians, and area-studies professors.
  2. He came up with a large set of questions about politics, economics, and similar topics, which could all be formulated as a set of probabilities: “How likely is this to get better/get worse/stay the same?” (For example, this was in the 1980s, so he asked about the fate of the Soviet Union: “By 1990, will they become democratic, remain as they are, or collapse and fragment?”)
  3. Each respondent answered a subset of the questions, some about their own particular field, some about another, more distant field; they assigned probabilities on an 11-point scale, from 0% to 100% in increments of 10%.
  4. A few years later, he compared the predictions to the actual results, scoring them using a Brier score, which penalizes you for assigning high probability to things that didn’t happen or low probability to things that did happen.
  5. He compared the resulting scores between people with different backgrounds, on different topics, with different thinking styles, and a variety of other variables. He also benchmarked them using some automated algorithms like “always say 33%” and “always give ‘stay the same’ 100%”.
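To make the scoring concrete, here is a minimal Python sketch of a Brier score for three-outcome questions, along with the two “mindless” benchmarks. It uses the common multi-outcome form (the sum of squared gaps between each stated probability and 1 or 0), which runs from 0 for a perfect forecast to 2 for a maximally wrong one. Tetlock’s exact scoring conventions may differ in detail, and the question IDs and outcomes are made up for illustration.

```python
# Minimal sketch: multi-outcome Brier scoring (0 is perfect, 2 is worst; lower is better).
# Question IDs and outcomes are hypothetical, for illustration only.
OUTCOMES = ("better", "same", "worse")


def brier(forecast: dict[str, float], actual: str) -> float:
    """Sum of squared gaps between each stated probability and what happened (1 or 0)."""
    return sum((forecast[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)


def always_one_third(question: str) -> dict[str, float]:
    """'Mindless' baseline: spread probability evenly over the three outcomes."""
    return {o: 1 / 3 for o in OUTCOMES}


def always_same(question: str) -> dict[str, float]:
    """'Mindless' baseline: put 100% on 'stay the same'."""
    return {o: (1.0 if o == "same" else 0.0) for o in OUTCOMES}


def mean_score(predictor, resolved: dict[str, str]) -> float:
    """Average Brier score of a predictor over questions whose outcomes are known."""
    return sum(brier(predictor(q), actual) for q, actual in resolved.items()) / len(resolved)


# Hypothetical resolved questions: question ID -> outcome that actually occurred
resolved = {"q1": "same", "q2": "same", "q3": "worse", "q4": "same"}

print("always 33%: ", round(mean_score(always_one_third, resolved), 3))
print("always same:", mean_score(always_same, resolved))
```

On this made-up set, where things mostly stayed the same, “always give ‘stay the same’ 100%” averages 0.5 and the even split about 0.667 (lower is better), which hints at why such crude rules are hard to beat when the status quo usually persists.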

I’ll show you the key results of that analysis momentarily, but to help them make more sense, let me elaborate a bit more on the “foxes” and “hedgehogs”. The notion was first popularized by Isaiah Berlin in an essay called, simply, The Hedgehog and the Fox.

“The fox knows many things, but the hedgehog knows one very big thing.”

That is, someone who reasons as a “fox” combines ideas from many different sources and perspectives, and tries to weigh them all together into some sort of synthesis that then yields a final answer. This process is messy and complicated, and rarely yields high confidence about anything.

Whereas, someone who reasons as a “hedgehog” has a comprehensive theory of the world, an ideology, that provides clear answers to almost any possible question, with the surely minor, insubstantial flaw that those answers are not particularly likely to be correct.

Tetlock also considered “hedge-foxes” (people who are mostly fox but also a little bit hedgehog) and “fox-hogs” (people who are mostly hedgehog but also a little bit fox).

Tetlock decomposes the scores into two components: calibration and discrimination. (Both are heavily overloaded words, but they are standard in the literature.)

Calibration is how well your stated probabilities match up with the actual frequencies; that is, if you assigned 10% probability to 20 different events, you have very good calibration if precisely 2 of those events occurred, and very poor calibration if 18 of those events occurred.

Discrimination more or less describes how useful your predictions are, what information they contain above and beyond the simple base rate. If you just assign equal probability to all events, you probably will have reasonably good calibration, but you’ll have zero discrimination; whereas if you somehow managed to assign 100% to everything that happened and 0% to everything that didn’t, your discrimination would be perfect (and we would have to find out how you cheated, or else declare you clairvoyant).

For both measures, higher is better. The ideal for each is 100%, but it’s virtually impossible to get 100% discrimination and actually not that hard to get 100% calibration if you just use the base rates for everything.
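One standard way to split a Brier score into a calibration part and a discrimination part is the Murphy decomposition: Brier = reliability − resolution + uncertainty. As far as I can tell, Tetlock’s calibration and discrimination indices are built on this kind of decomposition, but the Python sketch below is only an illustration for yes/no events, not his procedure. Note that reliability is an error term (lower is better), so a “higher is better” calibration score like the one described here would be some rescaled complement of it. The forecasts are hypothetical.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Split the Brier score for yes/no events into its three Murphy components.

    forecasts: probabilities in [0, 1] (e.g. from an 11-point 0.0, 0.1, ..., 1.0 scale)
    outcomes:  1 if the event happened, 0 if it did not
    Returns (brier, reliability, resolution, uncertainty), where
    brier == reliability - resolution + uncertainty, up to float rounding.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    # Group events by the exact probability that was assigned to them.
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    reliability = resolution = 0.0
    for f, obs in bins.items():
        freq = sum(obs) / len(obs)  # how often events given probability f actually happened
        reliability += len(obs) * (f - freq) ** 2  # calibration error: lower is better
        resolution += len(obs) * (freq - base_rate) ** 2  # discrimination: higher is better
    reliability /= n
    resolution /= n
    uncertainty = base_rate * (1 - base_rate)  # depends only on the outcomes
    brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty


# Hypothetical record: 30 events, 11 of which happened.
outcomes = [1] * 2 + [0] * 18 + [1] * 9 + [0] * 1
sharp = [0.1] * 20 + [0.9] * 10  # 10% where 2 of 20 happened, 90% where 9 of 10 did
flat = [11 / 30] * 30            # always states the base rate

for name, forecasts in [("sharp", sharp), ("base rate", flat)]:
    b, rel, res, unc = murphy_decomposition(forecasts, outcomes)
    print(f"{name}: brier={b:.3f} reliability={rel:.3f} resolution={res:.3f} uncertainty={unc:.3f}")
```

Both hypothetical forecasters are perfectly calibrated, but only the sharp one discriminates; the base-rate forecaster has zero resolution, so its Brier score equals the uncertainty term. Because that term depends only on the outcomes, two forecasters scored on the same events get the same Brier score whenever resolution minus reliability comes out the same.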


There is a bit of a tradeoff between these two: It’s not too hard to get reasonably good calibration if you just never go out on a limb, but then your predictions aren’t as useful; we could have mostly just guessed them from the base rates.

On the graph, you’ll see downward-sloping lines that are meant to represent this tradeoff: Two prediction methods that would yield the same overall score but different levels of calibration and discrimination will be on the same line. In a sense, two points on the same line are equally good methods that prioritize usefulness over accuracy differently.

All right, let’s see the graph at last:

The pattern is quite clear: The more foxy you are, the better you do, and the more hedgehoggy you are, the worse you do.

I’d also like to point out the other two regions here: “Mindless competition” and “Formal models”.

The former includes really simple algorithms like “always return 33%” or “always give ‘stay the same’ 100%”. These perform shockingly well. The most sophisticated of these, “case-specific extrapolation” (35 and 36 on the graph, which basically assumes that each country will continue doing what it’s been doing), actually performs as well as, if not better than, even the foxes.

And what’s that at the upper-right corner, absolutely dominating the graph? That’s “Formal models”. This describes basically taking all the variables you can find and shoving them into a gigantic logit model, and then outputting the result. It’s computationally intensive and requires a lot of data (hence why he didn’t feel like it deserved to be called “mindless”), but it’s really not very complicated, and it’s the best prediction method, in every way, by far.
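For a sense of what the “formal models” approach involves, here is a deliberately tiny Python sketch of a logit model fitted by gradient descent. It is binary (the event happens or not) rather than three-way, and the two predictors and six data points are invented; a real model of the kind described here would use many more variables and far more data, and would typically be fitted with a statistics library rather than by hand.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, y, lr=0.1, epochs=2000):
    """Fit a binary logit model by batch gradient descent on the average log-loss.

    X: list of feature vectors; y: list of 0/1 outcomes.
    Returns weights with the intercept first.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)  # constant term for the intercept
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x) -> float:
    """Predicted probability that the event happens for feature vector x."""
    row = [1.0] + list(x)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)))


# Invented data: (GDP growth %, protests per month) -> 1 if the government fell
X = [[3.0, 0.0], [2.5, 1.0], [0.5, 4.0], [-1.0, 6.0], [1.0, 2.0], [-2.0, 8.0]]
y = [0, 0, 1, 1, 0, 1]
w = fit_logit(X, y)
print(f"P(fall | growth 0%, 5 protests/month) = {predict(w, [0.0, 5.0]):.2f}")
```

The point is less the particular fitting routine than the discipline: every variable gets a weight estimated from data, and the output is a probability rather than a verdict.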

This has made me feel quite vindicated about a weird nerd thing I do: When I have a big decision to make (especially a financial decision), I create a spreadsheet and assemble a linear utility model to determine which choice will maximize my utility, under different parameterizations based on my past experiences. Whichever result seems to win the most robustly, I choose. This is fundamentally similar to the “formal models” prediction method, where the thing I’m trying to predict is my own happiness. (It’s a bit less formal, actually, since I don’t have detailed happiness data to feed into the regression.) And it has worked for me, astonishingly well. It definitely beats going by my own gut. I highly recommend it.
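A sketch of this kind of procedure in Python: score each option as a weighted sum of attributes, redraw the weights many times from ranges that seem plausible, and count how often each option comes out on top. The options, attribute scores, and weight ranges are all hypothetical, and random sampling of weights is just one way of trying “different parameterizations”.

```python
import random

# Hypothetical options, each scored 0-10 on a few attributes (made-up numbers).
options = {
    "apartment A": {"cost": 7, "commute": 4, "space": 6},
    "apartment B": {"cost": 5, "commute": 8, "space": 5},
    "condo C": {"cost": 3, "commute": 6, "space": 9},
}

# Assumed plausible range for how much each attribute matters.
weight_ranges = {"cost": (0.5, 2.0), "commute": (0.5, 1.5), "space": (0.2, 1.0)}


def utility(scores, weights):
    """Linear utility: weighted sum of attribute scores."""
    return sum(weights[k] * v for k, v in scores.items())


def robust_choice(options, weight_ranges, trials=10_000, seed=0):
    """Count how often each option has the highest utility under randomly drawn weights."""
    rng = random.Random(seed)
    wins = {name: 0 for name in options}
    for _ in range(trials):
        weights = {k: rng.uniform(lo, hi) for k, (lo, hi) in weight_ranges.items()}
        best = max(options, key=lambda name: utility(options[name], weights))
        wins[best] += 1
    return wins


wins = robust_choice(options, weight_ranges)
print(wins, "->", max(wins, key=wins.get))
```

An option that wins under nearly every draw is a robust choice; a near tie tells you the decision hinges on weights you are unsure about.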

What does this mean?

Well first of all, it means humans suck at predicting things. At least for this data set, even our experts don’t perform substantially better than mindless models like “always assume the base rate”.

Nor do experts perform much better in their own fields than in other fields. They do, however, all perform better than undergrads or random people (who somehow perform worse than the “mindless” models).

But Tetlock also investigates further, trying to better understand this “fox/hedgehog” distinction and why it yields different performance. He really bends over backwards to try to redeem the hedgehogs, in the following ways:

  1. He allows them to make post-hoc corrections to their scores, based on “value adjustments” (assigning higher probability to events that would be really important) and “difficulty adjustments” (assigning higher scores to questions where the three outcomes were close to equally probable) and “fuzzy sets” (giving some leeway on things that almost happened or things that might still happen later).
  2. He describes a different, related experiment, in which certain manipulations can cause foxes to perform a lot worse than they normally would, and even yield really crazy results like probabilities that add up to 200%.
  3. He has a whole chapter that is a Socratic dialogue (seriously!) between four voices: A “hardline neopositivist”, a “moderate neopositivist”, a “reasonable relativist”, and an “unrelenting relativist”; and all but the “hardline neopositivist” agree that there is some legitimate place for the sort of post hoc corrections that the hedgehogs make to keep themselves from looking so bad.
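Probabilities that add up to 200% violate a basic rule: across mutually exclusive, exhaustive outcomes, they must sum to exactly 100%. A minimal Python check for that, with a crude fix by proportional rescaling, might look like this. The outcomes and judgments are hypothetical, and rescaling is just one simple way to restore coherence, not necessarily what Tetlock recommends.

```python
def coherence_report(judgments: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total probability assigned and a proportionally rescaled copy summing to 1.

    Assumes the outcomes are mutually exclusive and exhaustive (exactly one will happen),
    so coherent probabilities must add up to exactly 1.
    """
    total = sum(judgments.values())
    return total, {k: p / total for k, p in judgments.items()}


# Hypothetical judgments after each scenario has been considered separately.
total, rescaled = coherence_report({"collapse": 0.6, "reform": 0.7, "status quo": 0.7})
print(f"total assigned: {total:.0%}")
print({k: round(p, 2) for k, p in rescaled.items()})
```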

This post is already getting a bit long, so that will conclude part I. Stay tuned for part II, next week!

This is why we must vote our consciences.

JDN 2457465

As I write, Bernie Sanders has just officially won the Michigan Democratic Primary. It was a close race—he was ahead by about 2% the entire time—so the delegates will be split; but he won.

This is notable because so many forecasters said it was impossible. Before the election, Nate Silver, one of the best political forecasters in the world (and he still deserves that title) had predicted a less than 1% chance Bernie Sanders could win. In fact, had he taken his models literally, he would have predicted a less than 1 in 10 million chance Bernie Sanders could win—I think it speaks highly of him that he was not willing to trust his models quite that far. I got into one of the wonkiest flamewars of all time earlier today debating whether this kind of egregious statistical error should call into question many of our standard statistical methods (I think it should; another good example is the total failure of the Black-Scholes model during the 2008 financial crisis).
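A model’s tail assumptions drive numbers like “1 in 10 million”. The Python sketch below does not reconstruct Silver’s model; it compares the upper-tail probability of a normal distribution with that of a fat-tailed Student t distribution with 2 degrees of freedom at the same standardized gap, set at a hypothetical 5.2 (chosen only because the normal tail there is roughly 1 in 10 million). The t distribution is an assumed stand-in for “fat tails”, not a calibrated alternative. A similar issue afflicts Black-Scholes, which assumes normally distributed log returns.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def t2_upper_tail(t: float) -> float:
    """P(T > t) for a Student t variable with 2 degrees of freedom (a fat-tailed stand-in)."""
    return 0.5 * (1 - t / math.sqrt(t * t + 2))


gap = 5.2  # hypothetical: how many standard errors the underdog trails by
print(f"normal tail: {normal_upper_tail(gap):.1e}")
print(f"t(2) tail:   {t2_upper_tail(gap):.4f}")
```

Under the normal assumption the upset is roughly a one-in-ten-million event; under the fat-tailed assumption it comes out around 1.75%.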

Had we trusted the forecasters, held our noses and voted for the “electable” candidate, this would not have happened. But instead we voted our consciences, and the candidate we really wanted won.

It is an unfortunate truth that our system of plurality “first-past-the-post” voting does actually strongly incentivize strategic voting. Indeed, did it not, we wouldn’t need primaries in the first place. With a good range voting or even Condorcet voting system, you could basically just vote honestly among all candidates and expect a good outcome. Technically it’s still possible to vote strategically in range and Condorcet systems, but it’s not necessary the way it is in plurality vote systems.

The reason we need primaries is that plurality voting is not cloneproof; if two very similar candidates (“clones”) run that everyone likes, votes will be split between them and the two highly-favored candidates can lose to a less-favored candidate. Condorcet voting is cloneproof in most circumstances, and range voting is provably cloneproof everywhere and always. (Have I mentioned that we should really have range voting?)
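Here is a small Python sketch of the clone problem with made-up numbers: 60% of voters like two near-identical candidates, A1 and A2, while 40% prefer B. It compares plurality, range (score) voting, and a simple pairwise Condorcet check. The scores are hypothetical and assumed honest; real Condorcet methods also need a rule for when no candidate beats all the others, which this check simply reports as None.

```python
# Hypothetical electorate: (bloc size, honest 0-10 score for each candidate).
# A1 and A2 are near-clones liked by 60% of voters; B is preferred by the other 40%.
blocs = [
    (30, {"A1": 10, "A2": 9, "B": 0}),
    (30, {"A1": 9, "A2": 10, "B": 0}),
    (40, {"A1": 3, "A2": 2, "B": 10}),
]


def plurality_winner(blocs, candidates):
    """Each voter votes only for their favorite running candidate; most votes wins."""
    tally = {c: 0 for c in candidates}
    for size, scores in blocs:
        favorite = max(candidates, key=lambda c: scores[c])
        tally[favorite] += size
    return max(tally, key=tally.get)


def range_winner(blocs, candidates):
    """Each voter scores every candidate; the highest total score wins."""
    totals = {c: sum(size * scores[c] for size, scores in blocs) for c in candidates}
    return max(totals, key=totals.get)


def condorcet_winner(blocs, candidates):
    """The candidate who beats every other in head-to-head majorities, or None if none does."""
    def beats(x, y):
        return (sum(n for n, s in blocs if s[x] > s[y])
                > sum(n for n, s in blocs if s[y] > s[x]))
    for c in candidates:
        if all(beats(c, other) for other in candidates if other != c):
            return c
    return None


for field in (["A1", "A2", "B"], ["A1", "B"]):
    print(field, plurality_winner(blocs, field), range_winner(blocs, field),
          condorcet_winner(blocs, field))
```

With both clones running, plurality hands the win to B even though a majority rank B last; dropping A2 (roughly what a primary does) flips the plurality result to A1. Range voting and the Condorcet check pick A1 either way.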

Hillary Clinton and Bernie Sanders are not clones by any means, but they are considerably more similar to one another than either is to Donald Trump or Ted Cruz. If all the Republicans were to immediately drop out besides Trump while Clinton and Sanders stayed in the race, Trump could end up winning because votes were split between Clinton and Sanders. Primaries exist to prevent this outcome; either Sanders or Clinton will be in the final election, but not both (the #BernieOrBust people notwithstanding), so it will be a simple matter of whether they are preferred to Trump, which, according to current head-to-head polls, both Clinton and Sanders are. Don’t put too much stock in those polls, as polls this early are wildly unreliable. But I think they at least give us some sense of which direction the outcome is likely to go.

Ideally, we wouldn’t need to worry about that, and we could just vote our consciences all the time. But in the general election, you really do need to vote a little strategically and choose the better (or less-bad) option among the two major parties. No third-party Presidential candidate has ever gotten close to actually winning an election, and the best they ever seem to do is to act as weak clones undermining other similar candidates, as Ross Perot and Ralph Nader did. (Still, if you were thinking of not voting at all, it is obviously preferable for you to vote for a third-party candidate. If everyone who didn’t vote had instead voted for Ralph Nader, Nader would have won by a landslide—and US climate policy would be at least a decade ahead of where it is now, and we might not be already halfway to the 2 °C global warming threshold.)

But in the primary? Vote your conscience. Primaries exist to make this possible, and we just showed that it can work. When people actually turn out to vote and support candidates they believe in, they win elections. If the same thing happens in several other states that just happened in Michigan, Bernie Sanders could win this election. And even if he doesn’t, he’s already gone a lot further than most of the pundits ever thought he could. (Sadly, so has Trump.)

How is the economy doing?

JDN 2457033 EST 12:22.

Whenever you introduce yourself to someone as an economist, you will typically be asked a single question: “How is the economy doing?” I’ve already experienced this myself, and I don’t have very many dinner parties under my belt.

It’s an odd question, for a couple of reasons: First, I didn’t say I was a macroeconomic forecaster. That’s a very small branch of economics—even a small branch of macroeconomics. Second, it is widely recognized among economists that our forecasters just aren’t very good at what they do. But it is the sort of thing that pops into people’s minds when they hear the word “economist”, so we get asked it a lot.

Why are our forecasts so bad? Some argue that the task is just inherently too difficult due to the chaotic system involved; but they used to say that about weather forecasts, and yet with satellites and computer models our forecasts are now far more accurate than they were 20 years ago. Others have argued that “politics always dominates over economics”, as though politics were somehow a fundamentally separate thing, forever exogenous, a parameter in our models that cannot be predicted. I have a number of economic aphorisms I’m trying to popularize; the one for this occasion is: “Nothing is exogenous.” (Maybe fundamental constants of physics? But actually many physicists think that those constants can be derived from even more fundamental laws.) My most common is “It’s the externalities, stupid”, the next is “It’s not the incentives, it’s the opportunities”, and the last is “Human beings are 90% rational. But woe betide that other 10%.” In fact, it’s not quite true that all our macroeconomic forecasters are bad; a few, such as Krugman, are actually quite good. The Klein Award is given each year to the best macroeconomic forecasters, and the same names pop up too often for it to be completely random. (Sadly, one of the most common is Citigroup, meaning that our banksters know perfectly well what they’re doing when they destroy our economy—they just don’t care.) So in fact I think our failures of forecasting are neither inevitable nor permanent.

And of course that’s not what I do at all. I am a cognitive economist; I study how economic systems behave when they are run by actual human beings, rather than by infinite identical psychopaths. I’m particularly interested in what I call the tribal paradigm, the way that people identify with groups and act in the interests of those groups, how much solidarity people feel for each other and why, and what role ideology plays in that identification. I’m hoping to one day formally model solidarity and make directly testable predictions about things like charitable donations, immigration policies and disaster responses.

I do have a more macroeconomic bent than most other cognitive economists; I’m not just interested in how human irrationality affects individuals or corporations, I’m also interested in how it affects society as a whole. But unlike most macroeconomists I care more about inequality than unemployment, and hardly at all about inflation. Unless you start getting 40% inflation per year, inflation really isn’t that harmful—and can you imagine what 40% unemployment would be like? (Also, while 100% inflation is awful, 100% unemployment would be no economy at all.) If we’re going to have a “misery index”, it should weight unemployment at least 10 times as much as inflation—and it should also include terms for poverty and inequality. Frankly, maybe we should just use poverty, since I’d be prepared to accept just about any level of inflation, unemployment, or even inequality if it meant eliminating poverty. This is, of course, yet another reason why a basic income is so great! An anti-poverty measure can really only be called a failure if it doesn’t actually reduce poverty; the only way that could happen with a basic income is if it somehow completely destabilized the economy, which is extremely unlikely as long as the basic income isn’t something ridiculous like $100,000 per year.
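For concreteness, here is a Python sketch contrasting the conventional misery index (unemployment rate plus inflation rate) with a reweighted version along the lines proposed above. Only the 10:1 ratio of unemployment to inflation comes from the text; the poverty and inequality weights, and the two example economies, are hypothetical placeholders.

```python
def misery_index(unemployment: float, inflation: float) -> float:
    """Conventional misery index: unemployment rate plus inflation rate (both in %)."""
    return unemployment + inflation


def reweighted_misery_index(unemployment: float, inflation: float,
                            poverty: float, gini: float,
                            w_poverty: float = 10.0, w_gini: float = 1.0) -> float:
    """Hypothetical reweighting: unemployment counts 10x inflation, plus poverty and inequality.

    Rates are in %, gini is on a 0-100 scale. The 10:1 ratio follows the text;
    the poverty and inequality weights are placeholder assumptions.
    """
    return 10 * unemployment + inflation + w_poverty * poverty + w_gini * gini


# Two made-up economies with the same poverty rate (12%) and Gini (40).
x = dict(unemployment=5.0, inflation=10.0)
y = dict(unemployment=10.0, inflation=2.0)
print("conventional:", misery_index(**x), "vs", misery_index(**y))
print("reweighted:  ", reweighted_misery_index(**x, poverty=12, gini=40),
      "vs", reweighted_misery_index(**y, poverty=12, gini=40))
```

The conventional index ranks the low-inflation, high-unemployment economy as less miserable (12 versus 15); the reweighted index reverses that ranking (262 versus 220).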

I could probably talk about my master’s thesis; the econometric models are relatively arcane, but the basic idea of correlating the income concentration of the top 1% of 1% and the level of corruption is something most people can grasp easily enough.

Of course, that wouldn’t be much of an answer to “How is the economy doing?” Usually I just repeat whatever I last read in mainstream macroeconomic forecasts, which tends to be rather banal—but maybe that’s the idea? Most small talk is pretty banal, I suppose (I never was very good at that sort of thing). It sounds a bit like this: No, we’re not on the verge of horrible inflation—actually inflation is currently too low. (At this point someone will probably bring up the gold standard, and I’ll have to explain that the gold standard is an unequivocally terrible idea on so, so many levels. The gold standard caused the Great Depression.) Unemployment is gradually improving, and job growth is actually looking pretty good right now; but wages are still stagnant, which is probably what’s holding down inflation. We could have prevented the Second Depression entirely, but we didn’t because Republicans are terrible at managing the economy—nine of the 10 most recent recessions and almost 80% of the recessions in the last century were under Republican presidents. Instead the Democrats did their best to implement basic principles of Keynesian macroeconomics despite Republican intransigence, and we muddled through. In another year or two we will actually be back at an unemployment rate of 5%, which the Federal Reserve considers “full employment”. That’s already problematic—what about that other 5%?—but there’s another problem as well: Much of our reduction in unemployment has come not from more people being employed but from more people dropping out of the labor force. Our labor force participation rate is the lowest it’s been since 1978, and it is still trending downward. Most of these people aren’t getting jobs; they’re giving up.
At best we may hope that they are people like me, who gave up on finding work in order to invest in their own education, and will return to the labor force more knowledgeable and productive one day—and indeed, college participation rates are also rising rapidly. And no, that doesn’t mean we’re becoming “overeducated”; investment in education, so-called “human capital”, is literally the single most important factor in long-term economic output, by far. Education is why we’re not still in the Stone Age. Physical capital can be replaced, and educated people will do so efficiently. But all the physical capital in the world will do you no good if nobody knows how to use it. When everyone in the world is a millionaire with two PhDs and all our work is done by robots, maybe then you can say we’re “overeducated”—and maybe then you’d still be wrong. Being “too educated” is like being “too rich” or “too happy”.
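The arithmetic behind the labor-force point is simple enough to sketch. The unemployment rate counts only people actively looking for work, so when unemployed people give up, the rate falls even though nobody new has a job. The Python below uses a hypothetical population of 100 people.

```python
def labor_stats(employed: float, unemployed: float, population: float) -> tuple[float, float]:
    """Return (unemployment rate %, labor force participation rate %).

    The labor force is the employed plus the unemployed who are actively looking for work;
    people who stop looking are no longer counted in it at all.
    """
    labor_force = employed + unemployed
    return 100 * unemployed / labor_force, 100 * labor_force / population


# Hypothetical working-age population of 100 people.
before = labor_stats(employed=60, unemployed=6, population=100)
# Two of the unemployed give up looking; nobody new finds a job.
after = labor_stats(employed=60, unemployed=4, population=100)
print(f"unemployment rate:  {before[0]:.1f}% -> {after[0]:.1f}%")
print(f"participation rate: {before[1]:.1f}% -> {after[1]:.1f}%")
```

Unemployment falls from about 9.1% to 6.25% while participation drops from 66% to 64%, with exactly the same number of people working.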

That’s usually enough to placate my interlocutor. I should probably count my blessings, for I imagine that the first confrontation you get at a dinner party if you say you are a biologist involves a Creationist demanding that you “prove evolution”. I like to think that some mathematical biologists—yes, that’s a thing—take their request literally and set out to mathematically prove that if allele distributions in a population change according to a stochastic trend then the alleles with highest expected fitness have, on average, the highest fitness—which is what we really mean by “survival of the fittest”. The more formal, the better; the goal is to glaze some Creationist eyes. Of course that’s a tautology—but so is literally anything that you can actually prove. Cosmologists probably get similar demands to “prove the Big Bang”, which sounds about as annoying. I may have to deal with gold bugs, but I’ll take them over Creationists any day.

What do other scientists get? When I tell people I am a cognitive scientist (as a cognitive economist, I am sort of both an economist and a cognitive scientist, after all), they usually just respond with something like “Wow, you must be really smart,” which I suppose is true enough, but which always strikes me as an odd response. I think they just don’t know enough about the field to even generate a reasonable-sounding question, whereas with economists they always have “How is the economy doing?” handy. Political scientists probably get “Who is going to win the election?” for the same reason. People have opinions about economics, but they don’t have opinions about cognitive science—or rather, they don’t think they do. Actually, most people have an opinion about cognitive science that is totally and utterly ridiculous, more on a par with Creationists than gold bugs: Most people believe in a soul that survives after death. This is rather like believing that after your computer has been smashed to pieces and ground back into the sand from whence it came, all the files you had on it are still out there somewhere, waiting to be retrieved. No, they’re long gone—and likewise your memories and your personality will be long gone once your brain has rotted away. Yes, we have a soul, but it’s made of lots of tiny robots; when the tiny robots stop working, the soul is no more. Everything you are is a result of the functioning of your brain. This does not mean that your feelings are not real or do not matter; they are just as real and important as you thought they were. What it means is that when a person’s brain is destroyed, that person is destroyed, permanently and irrevocably. This is terrifying and difficult to accept, but it is also most definitely true. It is as solid a fact as any in modern science. Many people see a conflict between evolution and religion, but the Pope has long since rendered that one inert.
No, the real conflict, the basic fact that undermines everything religion is based upon, is not in biology but in cognitive science. It is indeed the Basic Fact of Cognitive Science: We are our brains, no more and no less. (But I suppose it wouldn’t be polite to bring that up at dinner parties.)

The “You must be really smart.” response is probably what happens to physicists and mathematicians. Quantum mechanics confuses basically everyone, so few dare go near it. The truly bold might try to bring up Schrödinger’s Cat, but they are unlikely to understand the explanation of why it doesn’t work. General relativity requires thinking in tensors and four-dimensional spaces—perhaps they’ll be asked the question “What’s inside a black hole?”, which of course no physicist can really answer; the best answer may actually be, “What do you mean, inside?” And if a mathematician tries to explain their work in lay terms, it usually comes off as either incomprehensible or ridiculous: Stokes’ Theorem would be either “the integral of a differential form over the boundary of some orientable manifold is equal to the integral of its exterior derivative over the whole manifold” or else something like “The swirliness added up inside an object is equal to the swirliness added up around the edges.”
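For the curious, here is the “incomprehensible” version written in symbols. In the first line, Ω is a compact oriented manifold with boundary ∂Ω, ω is a differential form, and dω is its exterior derivative. The second line is the classical special case for a surface S in three dimensions, which is the “swirliness” version: the circulation of a vector field F around the edge equals the total curl passing through the surface.

```latex
\int_{\partial \Omega} \omega = \int_{\Omega} d\omega
\oint_{\partial S} \mathbf{F} \cdot d\mathbf{r} = \iint_{S} (\nabla \times \mathbf{F}) \cdot d\mathbf{S}
```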

Economists, however, always seem to get this one: “How is the economy doing?”

Right now, the answer is this: “It’s still pretty bad, but it’s getting a lot better. Hopefully the new Congress won’t screw that up.”