On foxes and hedgehogs, part II

Aug 3 JDN 2460891

In last week’s post I described Philip E. Tetlock’s experiment showing that “foxes” (people who are open-minded and willing to consider alternative views) make more accurate predictions than “hedgehogs” (people who are dogmatic and conform strictly to a single ideology).

As I explained at the end of the post, he, uh, hedges on this point quite a bit, coming up with various ways that the hedgehogs might be able to redeem themselves, but still concluding that in most circumstances, the foxes seem to be more accurate.

Here are my thoughts on this:

I think he went too easy on the hedgehogs.

I consider myself very much a fox, and I honestly would never assign a probability of 0% or 100% to any physically possible event. Honestly I consider it a flaw in Tetlock’s design that he included those as options but didn’t include probabilities I would assign, like 1%, 0.1%, or 0.01%.

He only let people assign probabilities in 10% increments. So I guess if you thought something was 3% likely, you’re supposed to round to 0%? That still feels terrible. I’d probably still write 10%. There weren’t any questions like “Aliens from the Andromeda Galaxy arrive to conquer our planet, thus rendering all previous political conflicts moot”, but man, had there been, I’d still be tempted to not put 0%. I guess I would put 0% for that though? Because in 99.999999% of cases, I’d get it right—it wouldn’t happen—and I’d get more points. But man, even single-digit percentages? I’d mash the 10% button. I am pretty much allergic to overconfidence.

In fact, I think in my mind I basically try to use a logarithmic score, which unlike a Brier score, severely (technically, infinitely) punishes you for saying that something impossible happened or something inevitable didn’t. Like, really, if you’re doing it right, that should never, ever happen to you. If you assert that something has 0% probability and it happens, you have just conclusively disproven your worldview. (Admittedly it’s possible you could fix it with small changes—but a full discussion of that would get us philosophically too far afield. “outside the scope of this paper”.)

So honestly I think he was too lenient on overconfidence by using a Brier score, which does penalize this kind of catastrophic overconfidence, but only by a moderate amount. If you say that something has a 0% chance and then it happens, you get a Brier score of -1. But if you say that something has a 50% chance and then it happens (which it would, you know, 50% of the time), you’d get a Brier score of -0.25. So even absurd overconfidence isn’t really penalized that badly.

Compare this to a logarithmic rule: Say 0% and it happens, and you get negative infinity. You lose. You fail. Go home. Your worldview is bad and you should feel bad. This should never happen to you if you have a coherent worldview (modulo the fact that he didn’t let you say 0.01%).

So if I had designed this experiment, I would have given finer-grained options at the extremes, and then brought the hammer down on anybody who actually asserted a 0% chance of an event that actually occurred. (There’s no need for the finer-grained options elsewhere; over millennia of history, the difference between 0% and 0.1% is whether it won’t happen or it will—quite relevant for, say, full-scale nuclear war—while the difference between 40% and 42.1% is whether it’ll happen every 2 to 3 years or… every 2 to 3 years.)

But okay, let’s say we stick with the Brier score, because infinity is scary.

  1. About the adjustments:
    1. The “value adjustments” are just absolute nonsense. Those would be reasons to adjust your policy response, via your utility function—they are not a reason to adjust your probability. Yes, a nuclear terrorist attack would be a really big deal if it happened and we should definitely be taking steps to prevent that; but that doesn’t change the fact that the probability of one happening is something like 0.1% per year and none have ever happened. Predicting things that don’t happen is bad forecasting, even if the things you are predicting would be very important if they happened.
    2. The “difficulty adjustments” are sort of like applying a different scoring rule, so that I’m more okay with; but that wasn’t enough to make the hedgehogs look better than the foxes.
    3. The “fuzzy set” adjustments could be legitimate, but only under particular circumstances. Being “almost right” is only valid if you clearly showed that the result was anomalous because of some other unlikely event, and—because the timeframe was clearly specified in the questions—“might still happen” should still get fewer points than accurately predicting that it hasn’t happened yet. Moreover, it was very clear that people only ever applied these sort of changes when they got things wrong; they rarely if ever said things like “Oh, wow, I said that would happen and it did, but for completely different reasons that I didn’t expect—I was almost wrong there.” (Crazy example, but if the Soviet Union had been taken over by aliens, “the Soviet Union will fall” would be correct—but I don’t think you could really attribute that to good political prediction.)
  2. The second exercise shows that even the foxes are not great Bayesians, and that some manipulations can make people even more inaccurate than before; but the hedgehogs also perform worse and also make some of the same crazy mistakes and still perform worse overall than the foxes, even in that experiment.
  3. I guess he’d call me a “hardline neopositivist”? Because I think that your experiment asking people to predict things should require people to, um, actually predict things? The task was not to get the predictions wrong but be able to come up with clever excuses for why they were wrong that don’t challenge their worldview. The task was to not get the predictions wrong. Apparently this very basic level of scientific objectivity is now considered “hardline neopositivism”.

I guess we can reasonably acknowledge that making policy is about more than just prediction, and indeed maybe being consistent and decisive is advantageous in a game-theoretic sense (in much the same way that the way to win a game of Chicken is to very visibly throw away your steering wheel). So you could still make a case for why hedgehogs are good decision-makers or good leaders.

But I really don’t see how you weasel out of the fact that hedgehogs are really bad predictors. If I were running a corporation, or a government department, or an intelligence agency, I would want accurate predictions. I would not be interested in clever excuses or rich narratives. Maybe as leaders one must assemble such narratives in order to motivate people; so be it, there’s a division of labor there. Maybe I’d have a separate team of narrative-constructing hedgehogs to help me with PR or something. But the people who are actually analyzing the data should be people who are good at making accurate predictions, full stop.

And in fact, I don’t think hedgehogs are good decision-makers or good leaders. I think they are good politicians. I think they are good at getting people to follow them and believe what they say. But I do not think they are actually good at making the decisions that would be the best for society.

Indeed, I think this is a very serious problem.

I think we systematically elect people to higher office—and hire them for jobs, and approve them for tenure, and so on—because they express confidence rather than competence. We pick the people who believe in themselves the most, who (by regression to the mean if nothing else) are almost certainly the people who are most over-confident in themselves.

Given that confidence is easier to measure than competence in most areas, it might still make sense to choose confident people if confidence were really positively correlated with competence, but I’m not convinced that it is. I think part of what Tetlock is showing us is that the kind of cognitive style that yields high confidence—a hedgehog—simply is not the kind of cognitive style that yields accurate beliefs—a fox. People who are really good at their jobs are constantly questioning themselves, always open to new ideas and new evidence; but that also means that they hedge their bets, say “on the other hand” a lot, and often suffer from Impostor Syndrome. (Honestly, testing someone for Impostor Syndrome might be a better measure of competence than a traditional job interview! Then again, Goodhart’s Law.)

Indeed, I even see this effect within academic science; the best scientists I know are foxes through and through, but they’re never the ones getting published in top journals and invited to give keynote speeches at conferences. The “big names” are always hedgehog blowhards with some pet theory they developed in the 1980s that has failed to replicate but somehow still won’t die.

Moreover, I would guess that trustworthiness is actually pretty strongly inversely correlated to confidence—“con artist” is short for “confidence artist”, after all.

Then again, I tried to find rigorous research comparing openness (roughly speaking “fox-ness”) or humility to honesty, and it was surprisingly hard to find. Actually maybe the latter is just considered an obvious consensus in the literature, because there is a widely-used construct called honesty-humility. (In which case, yeah, my thinking on trustworthiness and confidence is an accepted fact among professional psychologists—but then, why don’t more people know that?)

But that still doesn’t tell me if there is any correlation between honesty-humility and openness.

I did find these studies showing that both honesty-humility and openness are both positively correlated with well-being, both positively correlated with cooperation in experimental games, and both positively correlated with being left-wing; but that doesn’t actually prove they are positively correlated with each other. I guess it provides weak evidence in that direction, but only weak evidence. It’s entirely possible for A to be positively correlated with both B and C but B and C are uncorrelated or negatively correlated. (Living in Chicago is positively correlated with being a White Sox fan and positively correlated with being a Cubs fan, but being a White Sox fan is not positively correlated with being a Cubs fan!)

I also found studies showing that higher openness predicts less right-wing authoritarianism and higher honesty predicts less social conformity; but that wasn’t the question either.

Here’s a factor analysis specifically arguing for designing measures of honesty-humility so that they don’t correlate with other personality traits, so it can be seen as its own independent personality trait. There are some uncomfortable degrees of freedom in designing new personality metrics, which may make this sort of thing possible; and then by construction honesty-humility and openness would be uncorrelated, because any shared components were parceled out to one trait or the other.

So, I guess I can’t really confirm my suspicion here; maybe people who think like hedgehogs aren’t any less honest, or are even more honest, than people who think like foxes. But I’d still bet otherwise. My own life experience has been that foxes are honest and humble while hedgehogs are deceitful and arrogant.

Indeed, I believe that in systematically choosing confident hedgehogs as leaders, the world economy loses tens of trillions of dollars a year in inefficiencies. In fact, I think that we could probably end world hunger if we only ever put leaders in charge who were both competent and trustworthy.

Of course, in some sense that’s a pipe dream; we’re never going to get all good leaders, just as we’ll never get zero death or zero crime.

But based on how otherwise-similar countries have taken wildly different trajectories based on differences in leadership, I suspect that even relatively small changes in that direction could have quite large impacts on a society’s outcomes: South Korea isn’t perfect at picking its leaders; but surely it’s better than North Korea, and indeed that seems like one of the primary things that differentiates the two countries. Botswana is not a utopian paradise, but it’s a much nicer place to live than Nigeria, and a lot of the difference seems to come down to who is in charge, or who has been in charge for the last few decades.

And I could put in a jab here about the current state of the United States, but I’ll resist. If you read my blog, you already know my opinions on this matter.

On foxes and hedgehogs, part I

Aug 3 JDN 2460891

Today I finally got around to reading Expert Political Judgment by Philip E. Tetlock, more or less in a single sitting because I’ve been sick the last week with some pretty tight limits on what activities I can do. (It’s mostly been reading, watching TV, or playing video games that don’t require intense focus.)

It’s really an excellent book, and I now both understand why it came so highly recommended to me, and now pass on that recommendation to you: Read it.

The central thesis of the book really boils down to three propositions:

  1. Human beings, even experts, are very bad at predicting political outcomes.
  2. Some people, who use an open-minded strategy (called “foxes”), perform substantially better than other people, who use a more dogmatic strategy (called “hedgehogs”).
  3. When rewarding predictors with money, power, fame, prestige, and status, human beings systematically favor (over)confident “hedgehogs” over (correctly) humble “foxes”.

I decided I didn’t want to make this post about current events, but I think you’ll probably agree with me when I say:

That explains a lot.

How did Tetlock determine this?

Well, he studies the issue several different ways, but the core experiment that drives his account is actually a rather simple one:

  1. He gathered a large group of subject-matter experts: Economists, political scientists, historians, and area-studies professors.
  2. He came up with a large set of questions about politics, economics, and similar topics, which could all be formulated as a set of probabilities: “How likely is this to get better/get worse/stay the same?” (For example, this was in the 1980s, so he asked about the fate of the Soviet Union: “By 1990, will they become democratic, remain as they are, or collapse and fragment?”)
  3. Each respondent answered a subset of the questions, some about their own particular field, some about another, more distant field; they assigned probabilities on an 11-point scale, from 0% to 100% in increments of 10%.
  4. A few years later, he compared the predictions to the actual results, scoring them using a Brier score, which penalizes you for assigning high probability to things that didn’t happen or low probability to things that did happen.
  5. He compared the resulting scores between people with different backgrounds, on different topics, with different thinking styles, and a variety of other variables. He also benchmarked them using some automated algorithms like “always say 33%” and “always give ‘stay the same’ 100%”.

I’ll show you the key results of that analysis momentarily, but to help it make more sense to you, let me elaborate a bit more on the “foxes” and “hedgehogs”. The notion is was first popularized by Isaiah Berlin in an essay called, simply, The Hedgehog and the Fox.

“The fox knows many things, but the hedgehog knows one very big thing.”

That is, someone who reasons as a “fox” combines ideas from many different sources and perspective, and tries to weigh them all together into some sort of synthesis that then yields a final answer. This process is messy and complicated, and rarely yields high confidence about anything.

Whereas, someone who reasons as a “hedgehog” has a comprehensive theory of the world, an ideology, that provides clear answers to almost any possible question, with the surely minor, insubstantial flaw that those answers are not particularly likely to be correct.

He also considered “hedge-foxes” (people who are mostly fox but also a little bit hedgehog) and “fox-hogs” (people who are mostly hedgehog but also a little bit fox).

Tetlock has decomposed the scores into two components: calibration and discrimination. (Both very overloaded words, but they are standard in the literature.)

Calibration is how well your stated probabilities matched up with the actual probabilities; that is, if you predicted 10% probability on 20 different events, you have very good calibration if precisely 2 of those events occurred, and very poor calibration if 18 of those events occurred.

Discrimination more or less describes how useful your predictions are, what information they contain above and beyond the simple base rate. If you just assign equal probability to all events, you probably will have reasonably good calibration, but you’ll have zero discrimination; whereas if you somehow managed to assign 100% to everything that happened and 0% to everything that didn’t, your discrimination would be perfect (and we would have to find out how you cheated, or else declare you clairvoyant).

For both measures, higher is better. The ideal for each is 100%, but it’s virtually impossible to get 100% discrimination and actually not that hard to get 100% calibration if you just use the base rates for everything.


There is a bit of a tradeoff between these two: It’s not too hard to get reasonably good calibration if you just never go out on a limb, but then your predictions aren’t as useful; we could have mostly just guessed them from the base rates.

On the graph, you’ll see downward-sloping lines that are meant to represent this tradeoff: Two prediction methods that would yield the same overall score but different levels of calibration and discrimination will be on the same line. In a sense, two points on the same line are equally good methods that prioritize usefulness over accuracy differently.

All right, let’s see the graph at last:

The pattern is quite clear: The more foxy you are, the better you do, and the more hedgehoggy you are, the worse you do.

I’d also like to point out the other two regions here: “Mindless competition” and “Formal models”.

The former includes really simple algorithms like “always return 33%” or “always give ‘stay the same’ 100%”. These perform shockingly well. The most sophisticated of these, “case-specific extrapolation” (35 and 36 on the graph, which basically assumes that each country will continue doing what it’s been doing) actually performs as well if not better than even the foxes.

And what’s that at the upper-right corner, absolutely dominating the graph? That’s “Formal models”. This describes basically taking all the variables you can find and shoving them into a gigantic logit model, and then outputting the result. It’s computationally intensive and requires a lot of data (hence why he didn’t feel like it deserved to be called “mindless”), but it’s really not very complicated, and it’s the best prediction method, in every way, by far.

This has made me feel quite vindicated about a weird nerd thing I do: When I have a big decision to make (especially a financial decision), I create a spreadsheet and assemble a linear utility model to determine which choice will maximize my utility, under different parameterizations based on my past experiences. Whichever result seems to win the most robustly, I choose. This is fundamentally similar to the “formal models” prediction method, where the thing I’m trying to predict is my own happiness. (It’s a bit less formal, actually, since I don’t have detailed happiness data to feed into the regression.) And it has worked for me, astonishingly well. It definitely beats going by my own gut. I highly recommend it.

What does this mean?

Well first of all, it means humans suck at predicting things. At least for this data set, even our experts don’t perform substantially better than mindless models like “always assume the base rate”.

Nor do experts perform much better in their own fields than in other fields; they do all perform better than undergrads or random people (who somehow perform worse than the “mindless” models)

But Tetlock also investigates further, trying to better understand this “fox/hedgehog” distinction and why it yields different performance. He really bends over backwards to try to redeem the hedgehogs, in the following ways:

  1. He allows them to make post-hoc corrections to their scores, based on “value adjustments” (assigning higher probability to events that would be really important) and “difficulty adjustments” (assigning higher scores to questions where the three outcomes were close to equally probable) and “fuzzy sets” (giving some leeway on things that almost happened or things that might still happen later).
  2. He demonstrates a different, related experiment, in which certain manipulations can cause foxes to perform a lot worse than they normally would, and even yield really crazy results like probabilities that add up to 200%.
  3. He has a whole chapter that is a Socratic dialogue (seriously!) between four voices: A “hardline neopositivist”, a “moderate neopositivist”, a “reasonable relativist”, and an “unrelenting relativist”; and all but the “hardline neopositivist” agree that there is some legitimate place for the sort of post hoc corrections that the hedgehogs make to keep themselves from looking so bad.

This post is already getting a bit long, so that will conclude part I. Stay tuned for part II, next week!

Nature via Nurture

JDN 2457222 EDT 16:33.

One of the most common “deep questions” human beings have asked ourselves over the centuries is also one of the most misguided, the question of “nature versus nurture”: Is it genetics or environment that makes us what we are?

Humans are probably the single entity in the universe for which this question makes least sense. Artificial constructs have no prior existence, so they are “all nurture”, made what we choose to make them. Most other organisms on Earth behave accordingly to fixed instinctual programming, acting out a specific series of responses that have been honed over millions of years, doing only one thing, but doing it exceedingly well. They are in this sense “all nature”. As the saying goes, the fox knows many things, but the hedgehog knows one very big thing. Most organisms on Earth are in this sense hedgehogs, but we Homo sapiens are the ultimate foxes. (Ironically, hedgehogs are not actually “hedgehogs” in this sense: Being mammals, they have an advanced brain capable of flexibly responding to environmental circumstances. Foxes are a good deal more intelligent still, however.)

But human beings are by far the most flexible, adaptable organism on Earth. We live on literally every continent; despite being savannah apes we even live deep underwater and in outer space. Unlike most other species, we do not fit into a well-defined ecological niche; instead, we carve our own. This certainly has downsides; human beings are ourselves a mass extinction event.

Does this mean, therefore, that we are tabula rasa, blank slates upon which anything can be written?

Hardly. We’re more like word processors. Staring (as I of course presently am) at the blinking cursor of a word processor on a computer screen, seeing that wide, open space where a virtual infinity of possible texts could be written, depending entirely upon a sequence of miniscule key vibrations, you could be forgiven for thinking that you are looking at a blank slate. But in fact you are looking at the pinnacle of thousands of years of technological advancement, a machine so advanced, so precisely engineered, that its individual components are one ten-thousandth the width of a human hair (Intel just announced that we can now do even better than that). At peak performance, it is capable of over 100 billion calculations per second. Its random-access memory stores as much information as all the books on a stacks floor of the Hatcher Graduate Library, and its hard drive stores as much as all the books in the US Library of Congress. (Of course, both libraries contain digital media as well, exceeding anything my humble hard drive could hold by a factor of a thousand.)

All of this, simply to process text? Of course not; word processing is an afterthought for a processor that is specifically designed for dealing with high-resolution 3D images. (Of course, nowadays even a low-end netbook that is designed only for word processing and web browsing can typically handle a billion calculations per second.) But there the analogy with humans is quite accurate as well: Written language is about 10,000 years old, while the human visual mind is at least 100,000. We were 3D image analyzers long before we were word processors. This may be why we say “a picture is worth a thousand words”; we process each with about as much effort, even though the image necessarily contains thousands of times as many bits.

Why is the computer capable of so many different things? Why is the human mind capable of so many more? Not because they are simple and impinged upon by their environments, but because they are complex and precision-engineered to nonlinearly amplify tiny inputs into vast outputs—but only certain tiny inputs.

That is, it is because of our nature that we are capable of being nurtured. It is precisely the millions of years of genetic programming that have optimized the human brain that allow us to learn and adapt so flexibly to new environments and form a vast multitude of languages and cultures. It is precisely the genetically-programmed humanity we all share that makes our environmentally-acquired diversity possible.

In fact, causality also runs the other direction. Indeed, when I said other organisms were “all nature” that wasn’t right either; for even tightly-programmed instincts are evolved through millions of years of environmental pressure. Human beings have even been involved in cultural interactions long enough that it has begun to affect our genetic evolution; the reason I can digest lactose is that my ancestors about 10,000 years ago raised goats. We have our nature because of our ancestors’ nurture.

And then of course there’s the fact that we need a certain minimum level of environmental enrichment even to develop normally; a genetically-normal human raised into a deficient environment will suffer a kind of mental atrophy, as when children raised feral lose their ability to speak.

Thus, the question “nature or nurture?” seems a bit beside the point: We are extremely flexible and responsive to our environment, because of innate genetic hardware and software, which requires a certain environment to express itself, and which arose because of thousands of years of culture and millions of years of the struggle for survival—we are nurture because nature because nurture.

But perhaps we didn’t actually mean to ask about human traits in general; perhaps we meant to ask about some specific trait, like spatial intelligence, or eye color, or gender identity. This at least can be structured as a coherent question: How heritable is the trait? What proportion of the variance in this population is caused by genetic variation? Heritability analysis is a well-established methodology in behavioral genetics.
Yet, that isn’t the same question at all. For while height is extremely heritable within a given population (usually about 80%), human height worldwide has been increasing dramatically over time due to environmental influences and can actually be used as a measure of a nation’s economic development. (Look at what happened to the height of men in Japan.) How heritable is height? You have to be very careful what you mean.

Meanwhile, the heritability of neurofibromatosis is actually quite low—as many people acquire the disease by new mutations as inherit it from their parents—but we know for a fact it is a genetic disorder, because we can point to the specific genes that mutate to cause the disease.

Heritability also depends on the population under consideration; speaking English is more heritable within the United States than it is across the world as a whole, because there are a larger proportion of non-native English speakers in other countries. In general, a more diverse environment will lead to lower heritability, because there are simply more environmental influences that could affect the trait.

As children get older, their behavior gets more heritablea result which probably seems completely baffling, until you understand what heritability really means. Your genes become a more important factor in your behavior as you grow up, because you become separated from the environment of your birth and immersed into the general environment of your whole society. Lower environmental diversity means higher heritability, by definition. There’s also an effect of choosing your own environment; people who are intelligent and conscientious are likely to choose to go to college, where they will be further trained in knowledge and self-control. This latter effect is called niche-picking.

This is why saying something like “intelligence is 80% genetic” is basically meaningless, and “intelligence is 80% heritable” isn’t much better until you specify the reference population. The heritability of intelligence depends very much on what you mean by “intelligence” and what population you’re looking at for heritability. But even if you do find a high heritability (as we do for, say, Spearman’s g within the United States), this doesn’t mean that intelligence is fixed at birth; it simply means that parents with high intelligence are likely to have children with high intelligence. In evolutionary terms that’s all that matters—natural selection doesn’t care where you got your traits, only that you have them and pass them to your offspring—but many people do care, and IQ being heritable because rich, educated parents raise rich, educated children is very different from IQ being heritable because innately intelligent parents give birth to innately intelligent children. If genetic variation is systematically related to environmental variation, you can measure a high heritability even though the genes are not directly causing the outcome.

We do use twin studies to try to sort this out, but because identical twins raised apart are exceedingly rare, two very serious problems emerge: One, there usually isn’t a large enough sample size to say anything useful; and more importantly, this is actually an inaccurate measure in terms of natural selection. The evolutionary pressure is based on the correlation with the genes—it actually doesn’t matter whether the genes are directly causal. All that matters is that organisms with allele X survive and organisms with allele Y do not. Usually that’s because allele X does something useful, but even if it’s simply because people with allele X happen to mostly come from a culture that makes better guns, that will work just as well.

We can see this quite directly: White skin spread across the world not because it was useful (it’s actually terrible in any latitude other than subarctic), but because the cultures that conquered the world happened to be comprised mostly of people with White skin. In the 15th century you’d find a very high heritability of “using gunpowder weapons”, and there was definitely a selection pressure in favor of that trait—but it obviously doesn’t take special genes to use a gun.

The kind of heritability you get from twin studies is answering a totally different, nonsensical question, something like: “If we reassigned all offspring to parents randomly, how much of the variation in this trait in the new population would be correlated with genetic variation?” And honestly, I think the only reason people think that this is the question to ask is precisely because even biologists don’t fully grasp the way that nature and nurture are fundamentally entwined. They are trying to answer the intuitive question, “How much of this trait is genetic?” rather than the biologically meaningful “How strongly could a selection pressure for this trait evolve this gene?”

And if right now you’re thinking, “I don’t care how strongly a selection pressure for the trait could evolve some particular gene”, that’s fine; there are plenty of meaningful scientific questions that I don’t find particularly interesting and are probably not particularly important. (I hesitate to provide a rigid ranking, but I think it’s safe to say that “How does consciousness arise?” is a more important question than “Why are male platypuses venomous?” and “How can poverty be eradicated?” is a more important question than “How did the aircraft manufacturing duopoly emerge?”) But that’s really the most meaningful question we can construct from the ill-formed question “How much of this trait is genetic?” The next step is to think about why you thought that you were asking something important.

What did you really mean to ask?

For a bald question like, “Is being gay genetic?” there is no meaningful answer. We could try to reformulate it as a meaningful biological question, like “What is the heritability of homosexual behavior among males in the United States?” or “Can we find genetic markers strongly linked to self-identification as ‘gay’?” but I don’t think those are the questions we really meant to ask. I think actually the question we meant to ask was more fundamental than that: Is it legitimate to discriminate against gay people? And here the answer is unequivocal: No, it isn’t. It is a grave mistake to think that this moral question has anything to do with genetics; discrimination is wrong even against traits that are totally environmental (like religion, for example), and there are morally legitimate actions to take based entirely on a person’s genes (the obvious examples all coming from medicine—you don’t treat someone for cystic fibrosis if they don’t actually have it).

Similarly, when we ask the question “Is intelligence genetic?” I don’t think most people are actually interested in the heritability of spatial working memory among young American males. I think the real question they want to ask is about equality of opportunity, and what it would look like if we had it. If success were entirely determined by intelligence and intelligence were entirely determined by genetics, then even a society with equality of opportunity would show significant inequality inherited across generations. Thus, inherited inequality is not necessarily evidence against equality of opportunity. But this is in fact a deeply disingenuous argument, used by people like Charles Murray to excuse systemic racism, sexism, and concentration of wealth.

We didn’t have to say that inherited inequality is necessarily or undeniably evidence against equality of opportunity—merely that it is, in fact, evidence of inequality of opportunity. Moreover, it is far from the only evidence against equality of opportunity; we also can observe the fact that college-educated Black people are no more likely to be employed than White people who didn’t even finish high school, for example, or the fact that otherwise identical resumes with predominantly Black names (like “Jamal”) are less likely to receive callbacks compared to predominantly White names (like “Greg”). We can observe that the same is true for resumes with obviously female names (like “Sarah”) versus obviously male names (like “David”), even when the hiring is done by social scientists. We can directly observe that one-third of the 400 richest Americans inherited their wealth (and if you look closer into the other two-thirds, all of them had some very unusual opportunities, usually due to their family connections—“self-made” is invariably a great exaggeration). The evidence for inequality of opportunity in our society is legion, regardless of how genetics and intelligence are related. In fact, I think that the high observed heritability of intelligence is largely due to the fact that educational opportunities are distributed in a genetically-biased fashion, but I could be wrong about that; maybe there really is a large genetic influence on human intelligence. Even so, that does not justify widespread and directly-measured discrimination. It does not justify a handful of billionaires luxuriating in almost unimaginable wealth as millions of people languish in poverty. Intelligence can be as heritable as you like and it is still wrong for Donald Trump to have billions of dollars while millions of children starve.

This is what I think we need to do when people try to bring up a “nature versus nurture” question. We can certainly talk about the real complexity of the relationship between genetics and environment, which I think are best summarized as “nature via nurture”; but in fact usually we should think about why we are asking that question, and try to find the real question we actually meant to ask.