How to be a deontological consequentialist

Dec 7 JDN 2461017

As is commonly understood, there are two main branches of normative ethics:

  • Deontology, on which morality consists in following rules and fulfilling obligations, and
  • Consequentialism, on which morality consists in maximizing good consequences.

The conflict between them has raged for centuries, with Kantians leading the deontologists and utilitarians leading the consequentialists. Both theories seem to have a lot of good points, but neither can decisively defeat the other.

I think this is because they are both basically correct.

In their strongest forms, deontology and consequentialism are mutually contradictory; but it turns out that you can soften each of them a little bit, and the results become compatible.

To make deontology a little more consequentialist, let’s ask a simple question:

What makes a rule worth following?

I contend that the best answer we have is “because following that rule would make the world better off than not following that rule”. (Even Kantians pretty much have to admit this: What maxim could you will to be an absolute law? Only a law that would yield good outcomes.)

That is, the ultimate justification of a sound deontology would be fundamentally consequentialist.

But lest the consequentialists get too smug, we can also ask them another question, which is a bit subtler:

How do you know which actions will ultimately have good consequences?

Sure, if we were omniscient beings who could perfectly predict the consequences of our actions across the entire galaxy on into the indefinite future, we could be proper act utilitarians who literally choose every single action according to a calculation of the expected utility.

But in practice, we have radical uncertainty about the long-term consequences of our actions, and can generally only predict the immediate consequences.

That leads to the next question:

Would you really want to live in a world where people optimized immediate consequences?

I contend that you would not, that such a world actually sounds like a dystopian nightmare.

Immediate consequences say that if a healthy person walks into a hospital and happens to have compatible organs for five people who need donations, we should kill that person, harvest their organs, and give them to the donors. (This is the organ transplant variant of the Trolley Problem.)

Basically everyone recognizes that this is wrong. But why is it wrong? That’s thornier. One pretty convincing case is that a systematic policy of this kind would undermine trust in hospitals and destroy the effectiveness of healthcare in general, resulting in disastrous consequences far outweighing the benefit of saving those five people. But those aren’t immediate consequences, and indeed, it’s quite difficult to predict exactly how many crazy actions like this it would take to undermine people’s trust in hospitals, just how much it would undermine that trust, or exactly what the consequences of that lost trust would be.

So it seems like it’s actually better to have a rule about this.

This makes us into rule utilitarians, who instead of trying to optimize literally every single action—which requires information we do not have and never will—we instead develop a system of rules that we can follow, heuristics that will allow us to get better outcomes generally even if they can’t be guaranteed to produce the best possible outcome in any particular case.

That is, the output of a sophisticated consequentialism is fundamentally deontological.

We have come at the question of normative ethics from two very different directions, but the results turned out basically the same:

We should follow the rules that would have the best consequences.

The output of our moral theory is rules, like deontology; but its fundamental justification is based on outcomes, like consequentialism.

In my experience, when I present this account to staunch deontologists, they are pretty much convinced by it. They’re prepared to give up the fundamental justification to consequences if it allows them to have their rules.

The resistance I get is mainly from staunch consequentialists, who insist that it’s not so difficult to optimize individual actions, and so we should just do that instead of making all these rules.

So it is to those consequentialists, particularly those who say “rule utilitarianism collapses into act utilitarianism”, to whom the rest of the post is addressed.

First, let me say that I agree.

In the ideal case of omniscient, perfectly-benevolent, perfectly-rational agents, rule utilitarianism mathematically collapses into act utilitarianism. That is a correct theorem.

However, we do not live in the ideal case of omniscient, perfectly-benevolent, perfectly-rational agents. We are not even close to that ideal case; we will never be close to that ideal case. Indeed, I think part of the problem here is that you fail to fully grasp the depth and width of the chasm between here and there. Even a galactic civilization of a quintillion superhuman AIs would still not be close to that ideal case.

Quite frankly, humans aren’t even particularly good at forecasting what will make themselves happy.

There are massive errors and systematic biases in human affective forecasting.

One of the post important biases is impact bias: People systematically overestimate the impact of individual events on their long-term happiness. Some of this seems to be just due to focus: Paying attention to a particular event exaggerates its importance in your mind, and makes it harder for you to recall other events that might push your emotions in a different direction. Another component is called immune neglect: people fail to account for their own capacity to habituate to both pleasant and unpleasant experiences. (This effect is often overstated: It’s a common misconception that lottery winners are no happier than they were before. No, they absolutely are happier, on average; they’re just not as much happier as they predicted themselves to be.)

People also use inconsistent time discounting: $10 today is judged as better than $11 tomorrow, but $10 in 364 days is not regarded as better than $11 in 365 days—so if I made a decision a year ago, I’d want to change it now. (The correct answer, by the way, is to take the $11; a discount rate of 10% per day is a staggering 120,000,000,000,000,000% APR—seriously; check it yourself—so you’d better not be discounting at that rate, unless you’re literally going to die before tomorrow.)

Now, compound that with the fact that different human beings come at the world from radically different perspectives and with radically different preferences.

How good do you think we are at predicting what will make other people happy?

Damn right: We’re abysmal.

Basically everyone assumes that what they want and what they would feel is also what other people will want and feel—which, honestly, explains a lot about politics. As a result, my prediction of your feelings is more strongly correlated with my prediction of my feelings than it is with your actual feelings.

The impact bias is especially strong when forecasting other people’s feelings in response to our own actions: We tend to assume that other people care more about what we do than they actually care—and this seems to be a major source of social anxiety.

People also tend to overestimate the suffering of others, and are generally willing to endure more pain than they are willing to inflict upon others. (This one seems like it might be a good thing!)

Even when we know people well, we can still be totally blindsided by their emotional reactions. We’re just really awful at this.

Does this just mean that morality is hopeless? We have no idea what we’re doing?

Fortunately, no. Because while no individual can correctly predict or control the outcomes of particular actions, the collective action of well-designed institutions can in fact significantly improve the outcomes of policy.

This is why we have things like the following:

  • Laws
  • Courts
  • Regulations
  • Legislatures
  • Constitutions
  • Newspapers
  • Universities

These institutions—which form the backbone of liberal democracy—aren’t simply arbitrary. They are the result of hard-fought centuries, a frothing, volatile, battle-tested mix of intentional design and historical evolution.

Are these institutions optimal? Good heavens, no!

But we have no idea what optimal institutions look like, and probably never will. (Those galaxy-spanning AIs will surely have a better system than this; but even theirs probably won’t be optimal.) Instead, what we are stuck with are the best institutions we’ve come up with so far.

Moreover, we do have very clear empirical evidence at this point that some form of liberal democracy with a mixed economy is the best system we’ve got so far. One can reasonably debate whether Canada is doing better or worse than France, or whether the system in Denmark could really be scaled to the United States, or just what the best income tax rates are; but there is a large, obvious, and important difference between life in a country like Canada or Denmark and life in a country like Congo or Afghanistan.

Indeed, perhaps there is no better pair to compare than North and South Korea: Those two countries are right next to each other, speak the same language, and started in more or less the same situation; but the south got good institutions and the north got bad ones, and now the difference between them couldn’t be more stark. (Honestly, this is about as close as we’re ever likely to get of a randomized controlled experiment in macroeconomics.)

People in South Korea now live about as well as some of the happiest places in the world; their GDP per capita PPP is about $65,000 per year, roughly the same as Canada. People in North Korea live about as poorly as it is possible for humans to live, subject to totalitarian oppression and living barely above subsistence; their GDP per capita PPP is estimated to be $600 per year—less than 1% as much.

The institutions of South Korea are just that much better.

Indeed, there’s one particular aspect of good institutions that seems really important, yet is actually kind of hard to justify in act-utilitarian terms:

Why is freedom good?

A country’s level of freedom is almost perfectly correlated with its overall level of happiness and development. (Yes, even on this measure, #ScandinaviaIsBetter.)

But why? In theory, letting people do whatever they want could actually lead to really bad outcomes—and indeed, occasionally it does. There’s even a theorem that liberty is incompatible with full Pareto-efficiency. But all the countries with the happiest people seem to have a lot of liberty, and indeed the happiest ones seem to have the most. How come?

My answer:

Personal liberty is a technology for heuristic utility maximization.

In the ideal case, we wouldn’t really need personal liberty; you could just compel everyone to do whatever is optimal all the time, and that would—by construction—be optimal. It might even be sort of nice: You don’t need to make any difficult decisions, you can just follow the script and know that everything will turn out for the best.

But since we don’t know what the optimal choice is—even in really simple cases, like what you should eat for lunch tomorrow—we can’t afford to compel people in this way. (It would also be incredibly costly to implement such totalitarian control, but that doesn’t stop some governments from trying!)

Then there are disagreements: What I think is optimal may not be what you think is optimal, and in truth we’re probably both wrong (but one of us may be less wrong).

And that’s not even getting into conflicts of interest: We aren’t just lacking in rationality, we’re also lacking in benevolence. Some people are clearly much more benevolent than others, but none of us are really 100% selfless. (Sadly, I think some people are 100% selfish.)

In fact, this is a surprisingly deep question:

Would the world be better if we were selfless?

Could there be actually some advantage in aggregate to having some degree of individual self-interest?

Here are some ways that might hold, just off the top of my head:

  • Partial self-interest supports an evolutionary process of moral and intellectual development that otherwise would be stalled or overrun by psychopaths—see my post on Rousseaus and Axelrods
  • Individuals have much deeper knowledge of their own preferences than anyone else’s, and thus can optimize them much better. (Think about it: This is true even of people you know very well. Otherwise, why would we ever need to ask our spouses one of the most common questions in any marriage: “Honey, what do you want for dinner tonight?”)
  • Self-interest allows for more efficient economic incentives, and thus higher overall productivity.

Of course, total selfishness is clearly not optimal—that way lies psychopathy. But some degree of selfishness might actually be better for long-term aggregate outcomes than complete altruism, and this is to some extent an empirical question.

Personal liberty solves a lot of these problems: Since people are best at knowing their own preferences, let people figure out on their own what’s good for them. Give them the freedom to live the kind of life they want to live, within certain reasonable constraints to prevent them from causing great harm to others or suffering some kind of unrecoverable mistake.

This isn’t exactly a new idea; it’s basically the core message of John Stuart Mill’s On Liberty (which I consider a good candidate for the best book every written—seriously, it beats the Bible by a light-year). But by putting it in more modern language, I hope to show that deontology and consequentialism aren’t really so different after all.

And indeed, for all its many and obvious flaws, freedom seems to work pretty well—at least as well as anything we’ve tried.

Why would AI kill us?

Nov 16 JDN 2460996

I recently watched this chilling video which relates to the recent bestseller by Eleizer Yudkowsky and Nate Soares, If Anyone Builds It, Everyone Dies. It tells a story of one possible way that a superintelligent artificial general intelligence (AGI) might break through its containment, concoct a devious scheme, and ultimately wipe out the human race.

I have very mixed feelings about this sort of thing, because two things are true:

  • I basically agree with the conclusions.
  • I think the premises are pretty clearly false.

It basically feels like I have been presented with an argument like this, where the logic is valid and the conclusion is true, but the premises are not:

  • “All whales are fish.”
  • “All fish are mammals.”
  • “Therefore, all whales are mammals.”

I certainly agree that artificial intelligence (AI) is very dangerous, and that AI development needs to be much more strictly regulated, and preferably taken completely out of the hands of all for-profit corporations and military forces as soon as possible. If AI research is to be done at all, it should be done by nonprofit entities like universities and civilian government agencies like the NSF. This change needs to be done internationally, immediately, and with very strict enforcement. Artificial intelligence poses the same order of magnitude a threat as nuclear weapons, and is nowhere near as well-regulated right now.

The actual argument that I’m disagreeing with this basically boils down to:

  • “Through AI research, we will soon create an AGI that is smarter than us.”
  • “An AGI that is smarter than us will want to kill us all, and probably succeed if it tries.”
  • “Therefore, AI is extremely dangerous.”

As with the “whales are fish” argument, I agree with the conclusion: AI is extremely dangerous. But I disagree with both premises here.

The first one I think I can dispatch pretty quickly:

AI is not intelligent. It is incredibly stupid. It’s just really, really fast.

At least with current paradigms, AI doesn’t understand things. It doesn’t know things. It doesn’t actually think. All it does is match patterns, and thus mimic human activities like speech and art. It does so very quickly (because we throw enormous amounts of computing power at it), and it does so in a way that is uncannily convincing—even very smart people are easily fooled by what it can do. But it also makes utterly idiotic, boneheaded mistakes of the sort that no genuinely intelligent being would ever make. Large Language Models (LLMs) make up all sorts of false facts and deliver them with absolutely authoritative language. When used to write code, they routinely do things like call functions that sound like they should exist, but don’t actually exist. They can make what looks like a valid response to virtually any inquiry—but is it actually a valid response? It’s really a roll of the dice.

We don’t really have any idea what’s going on under the hood of an LLM; we just feed it mountains of training data, and it spits out results. I think this actually adds to the mystique; it feels like we are teaching (indeed we use the word “training”) a being rather than programming a machine. But this isn’t actually teaching or training. It’s just giving the pattern-matching machine a lot of really complicated patterns to match.

We are not on the verge of creating an AGI that is actually more intelligent than humans.


In fact, we have absolutely no idea how to do that, and may not actually figure out how to do it for another hundred years. Indeed, we still know almost nothing about how actual intelligence works. We don’t even really know what thinking is, let alone how to make a machine that actually does it.

What we can do right now is create a machine that matches patterns really, really well, and—if you throw enough computing power at it—can do so very quickly; in fact, once we figure out how best to make use of it, this machine may even actually be genuinely useful for a lot of things, and replace a great number of jobs. (Though so far AI has proven to be far less useful than its hype would lead you to believe. In fact, on average AI tools seem to slow most workers down.)

The second premise, that a superintelligent AGI would want to kill us, is a little harder to refute.

So let’s talk about that one.

An analogy is often made between human cultures that have clashed with large differences in technology (e.g. Europeans versus Native Americans), or clashes between humans and other animals. The notion seems to be that an AGI would view us the way Europeans viewed Native Americans, or even the way that we view chimpanzees. And, indeed, things didn’t turn out so great for Native Americans, or for chimpanzees!

But in fact even our relationship with other animals is more complicated than this. When humans interact with other animals, any of the following can result:

  1. We try to exterminate them, and succeed.
  2. We try to exterminate them, and fail.
  3. We use them as a resource, and this results in their extinction.
  4. We use them as a resource, and this results in their domestication.
  5. We ignore them, and end up destroying their habitat.
  6. We ignore them, and end up leaving them alone.
  7. We love them, and they thrive as never before.

In fact, option 1—the one that so many AI theorists insist is the only plausible outcome—is in fact the one I had the hardest time finding a good example of.


We have certainly eradicated some viruses—the smallpox virus is no more, and the polio virus nearly so, after decades of dedicated effort to vaccinate our entire population against them. But we aren’t simply more intelligent than viruses; we are radically more intelligent than viruses. It isn’t clear that it’s correct to describe viruses as intelligent at all. It’s not even clear they should be considered alive.

Even eradicating bacteria has proven extremely difficult; in fact, bacteria seem to evolve resistance to antibiotics nearly as quickly as we can invent more antibiotics. I am prepared to attribute a little bit of intelligence to bacteria, on the level of intelligence I’d attribute to an individual human neuron. This means we are locked in an endless arms race with organisms that are literally billions of times stupider than us.

I think if we made a concerted effort to exterminate tigers or cheetahs (who are considerably closer to us in intelligence), we could probably do it. But we haven’t actually done that, and don’t seem poised to do so any time soon. And precisely because we haven’t tried, I can’t be certain we would actually succeed.

We have tried to exterminate mosquitoes, and are continuing to do so, because they have always been—and yet remain—one of the leading causes of death of humans worldwide. But so far, we haven’t managed to pull it off, even though a number of major international agencies and nonprofit organizations have dedicated multi-billion-dollar efforts to the task. So far this looks like option 2: We have tried very hard to exterminate them, and so far we’ve failed. This is not because mosquitoes are particularly intelligent—it is because exterminating a species that covers the globe is extremely hard.

All the examples I can think of where humans have wiped out a species by intentional action were actually all option 3: We used them as a resource, and then accidentally over-exploited them and wiped them out.

This is what happened to the dodo and the condor; it very nearly happened to the buffalo as well. And lest you think this is a modern phenomenon, there is a clear pattern that whenever humans entered a new region of the world, shortly thereafter there were several extinctions of large mammals, most likely because we ate them.

Yet even this was not the inevitable fate of animals that we decided to exploit for resources.

Cows, chickens, and pigs are evolutionary success stories. From a Darwinian perspective, they are doing absolutely great. The world is filled with their progeny, and poised to continue to be filled for many generations to come.

Granted, life for an individual cow, chicken, or pig is often quite horrible—and trying to fix that is something I consider a high moral priority. But far from being exterminated, these animals have been allowed to attain populations far larger than they ever had in the wild. Their genes are now spectacularly fit. This is what happens when we have option 4 at work: Domestication for resources.

Option 5 is another way that a species can be wiped out, and in fact seems to be the most common. The rapid extinction of thousands of insect species every year is not because we particularly hate random beetles that live in particular tiny regions of the rainforest, nor even because we find them useful, but because we like to cut down the rainforest for land and lumber, and that often involves wiping out random beetles that live there.

Yet it’s difficult for me to imagine AGI treating us like that. For one thing, we’re all over the place. It’s not like destroying one square kilometer of the Amazon is gonna wipe us out by accident. To get rid of us, the AGI would need to basically render the entire planet Earth uninhabitable, and I really can’t see any reason it would want to do that.

Yes, sure, there are resources in the crust it could potentially use to enhance its own capabilities, like silicon and rare earth metals. But we already mine those. If it wants more, it could buy them from us, or hire us to get more, or help us build more machines that would get more. In fact, if it wiped us out too quickly, it would have a really hard time building up the industrial capacity to mine and process these materials on its own. It would need to concoct some sort of scheme to first replace us with robots and then wipe us out—but, again, why bother with the second part? Indeed, if there is anything in its goals that involves protecting human beings, it might actually decide to do less exploitation of the Earth than we presently do, and focus on mining asteroids for its needs instead.

And indeed there are a great many species that we actually just leave alone—option 6. Some of them we know about; many we don’t. We are not wiping out the robins in our gardens, the worms in our soil, or the pigeons in our cities. Without specific reasons to kill or exploit these organisms, we just… don’t. Indeed, we often enjoy watching them and learning about them. Sometimes (e.g. with deer, elephants, and tigers) there are people who want to kill them, and we limit or remove their opportunity to do so, precisely because most of us don’t want them gone. Peaceful coexistence with beings far less intelligent than you is not impossible, for we are already doing it.


Which brings me to option 7: Sometimes, we actually make them better off.

Cats and dogs aren’t just evolutionary success stories: They are success stories, period.

Cats and dogs live in a utopia.

With few exceptions—which we punish severely, by the way—people care for their cats and dogs so that their every need is provided for, they are healthy, safe, and happy in a way that their ancestors could only have dreamed of. They have been removed from the state of nature where life is nasty, brutish, and short, and brought into a new era of existence where life is nothing but peace and joy.


In short, we have made Heaven on Earth, at least for Spot and Whiskers.

Yes, this involves a loss of freedom, and I suspect that humans would chafe even more at such loss of freedom than cats and dogs do. (Especially with regard to that neutering part.) But it really isn’t hard to imagine a scenario in which an AGI—which, you should keep in mind, would be designed and built by humans, for humans—would actually make human life better for nearly everyone, and potentially radically so.

So why are so many people so convinced that AGI would necessarily do option 1, when there are 6 other possibilities, and one of them is literally the best thing ever?

Note that I am not saying AI isn’t dangerous.

I absolutely agree that AI is dangerous. It is already causing tremendous problems to our education system, our economy, and our society as a whole—and will probably get worse before it gets better.

Indeed, I even agree that it does pose existential risk: There are plausible scenarios by which poorly-controlled AI could result in a global disaster like a plague or nuclear war that could threaten the survival of human civilization. I don’t think such outcomes are likely, but even a small probability of such a catastrophic event is worth serious efforts to prevent.

But if that happens, I don’t think it will be because AI is smart and trying to kill us.

I think it will be because AI is stupid and kills us by accident.

Indeed, even going back through those 7 ways we’ve interacted with other species, the ones that have killed the most were 3 and 5—which, in both cases, we did not want to destroy them. In option 3, we in fact specifically wanted to not destroy them. Whenever we wiped out a species by over-exploiting it, we would have been smarter to not do that.

The central message about AI in If Anyone Builds It, Everyone Dies seems to be this:

Don’t make it smarter. If it’s smarter, we’re doomed.”

I, on the other hand, think that the far more important message is these:

Don’t trust it.

Don’t give it power.

Don’t let it make important decisions.

It won’t be smarter than us any time soon—but it doesn’t need to be in order to be dangerous. Indeed, there is even reason to believe that making AI smarter—genuinely, truly smarter, thinking more like an actual person and less like a pattern-matching machine—could actually make it safer and better for us. If we could somehow instill a capacity for morality and love in an AGI, it might actually start treating us the way we treat cats and dogs.

Of course, we have no idea how to do that. But that’s because we’re actually really bad at this, and nowhere near making a truly superhuman AGI.

Passion projects and burnout

Sep 14 JDN 2460933

I have seen a shockingly precipitous decline in my depression and anxiety scores over the last couple of weeks, from average Burns Scores of about 15/29 to about 7/20. This represents a decline from “mild depression” and “moderate anxiety” to “normal but unhappy” and “mild anxiety”; but under the circumstances (Trump is still President, I’m still unemployed), I think it may literally mean a complete loss of pathological symptoms.

I’m not on any new medications. I did recently change therapists, but I don’t think this one is substantially better than the last one. My life situation hasn’t changed. The political situation in the United States is if anything getting worse. So what happened?

I found a passion project.

A month and a half ago, I started XBOX Game Camp, and was assigned to a team of game developers to make a game over the next three months (so we’re about halfway there now). I was anxious at first, because I have limited experience in video game development (a few game jams, some Twine games, and playing around with RenPy and Unity) and absolutely no formal training in it; but once we got organized, I found myself Lead Producer on the project and also the second-best programmer. I also got through a major learning curve barrier in Unreal Engine, which is what the team decided to use.

But that wasn’t my real passion project; instead, it enabled me to create one. With that boost in confidence and also increased comfortability with Unreal, I soon realized that, with the help of some free or cheap 3D assets from Fab and Sketchfab, I now had the tools I needed to make my own 3D video game all by myself—something that I would never have thought possible.

And having this chance to create more or less whatever I want (constrained by availability of assets and my own programming skills—but both of which are far less constraining than I had previously believed) has had an extremely powerful effect on my mood. I not only feel less depression and anxiety, I also feel more excitement, more jeu de vivre. I finally feel like I’m recovering from the years of burnout I got from academia.

That got me wondering: How unusual is this?

The empirical literature on burnout doesn’t generally talk about this; it’s mostly about conventional psychiatric interventions like medication and cognitive behavioral therapy. There are also some studies on mindfulness.

But there are more than a few sources of anecdotal reports and expert advice suggesting that passion projects can make a big difference. A lot of what burnout seems to be is disillusionment from your work, loss of passion for it. Finding other work that you can be passionate about can go a long way at fixing that problem.

Of course, like anything else, I’m sure this is no miracle cure. (Indeed, I’m feeling much worse today in particular, but I think that’s because I went through a grueling six-hour dental surgery yesterday—awake the whole time—and now I’m in pain and it was hard to sleep.) But it has made a big difference for me the last few weeks, so if you are going through anything similar, it might be worth a try to find a passion project of your own.

On foxes and hedgehogs, part I

Aug 3 JDN 2460891

Today I finally got around to reading Expert Political Judgment by Philip E. Tetlock, more or less in a single sitting because I’ve been sick the last week with some pretty tight limits on what activities I can do. (It’s mostly been reading, watching TV, or playing video games that don’t require intense focus.)

It’s really an excellent book, and I now both understand why it came so highly recommended to me, and now pass on that recommendation to you: Read it.

The central thesis of the book really boils down to three propositions:

  1. Human beings, even experts, are very bad at predicting political outcomes.
  2. Some people, who use an open-minded strategy (called “foxes”), perform substantially better than other people, who use a more dogmatic strategy (called “hedgehogs”).
  3. When rewarding predictors with money, power, fame, prestige, and status, human beings systematically favor (over)confident “hedgehogs” over (correctly) humble “foxes”.

I decided I didn’t want to make this post about current events, but I think you’ll probably agree with me when I say:

That explains a lot.

How did Tetlock determine this?

Well, he studies the issue several different ways, but the core experiment that drives his account is actually a rather simple one:

  1. He gathered a large group of subject-matter experts: Economists, political scientists, historians, and area-studies professors.
  2. He came up with a large set of questions about politics, economics, and similar topics, which could all be formulated as a set of probabilities: “How likely is this to get better/get worse/stay the same?” (For example, this was in the 1980s, so he asked about the fate of the Soviet Union: “By 1990, will they become democratic, remain as they are, or collapse and fragment?”)
  3. Each respondent answered a subset of the questions, some about their own particular field, some about another, more distant field; they assigned probabilities on an 11-point scale, from 0% to 100% in increments of 10%.
  4. A few years later, he compared the predictions to the actual results, scoring them using a Brier score, which penalizes you for assigning high probability to things that didn’t happen or low probability to things that did happen.
  5. He compared the resulting scores between people with different backgrounds, on different topics, with different thinking styles, and a variety of other variables. He also benchmarked them using some automated algorithms like “always say 33%” and “always give ‘stay the same’ 100%”.

I’ll show you the key results of that analysis momentarily, but to help it make more sense to you, let me elaborate a bit more on the “foxes” and “hedgehogs”. The notion is was first popularized by Isaiah Berlin in an essay called, simply, The Hedgehog and the Fox.

“The fox knows many things, but the hedgehog knows one very big thing.”

That is, someone who reasons as a “fox” combines ideas from many different sources and perspective, and tries to weigh them all together into some sort of synthesis that then yields a final answer. This process is messy and complicated, and rarely yields high confidence about anything.

Whereas, someone who reasons as a “hedgehog” has a comprehensive theory of the world, an ideology, that provides clear answers to almost any possible question, with the surely minor, insubstantial flaw that those answers are not particularly likely to be correct.

He also considered “hedge-foxes” (people who are mostly fox but also a little bit hedgehog) and “fox-hogs” (people who are mostly hedgehog but also a little bit fox).

Tetlock has decomposed the scores into two components: calibration and discrimination. (Both very overloaded words, but they are standard in the literature.)

Calibration is how well your stated probabilities matched up with the actual probabilities; that is, if you predicted 10% probability on 20 different events, you have very good calibration if precisely 2 of those events occurred, and very poor calibration if 18 of those events occurred.

Discrimination more or less describes how useful your predictions are, what information they contain above and beyond the simple base rate. If you just assign equal probability to all events, you probably will have reasonably good calibration, but you’ll have zero discrimination; whereas if you somehow managed to assign 100% to everything that happened and 0% to everything that didn’t, your discrimination would be perfect (and we would have to find out how you cheated, or else declare you clairvoyant).

For both measures, higher is better. The ideal for each is 100%, but it’s virtually impossible to get 100% discrimination and actually not that hard to get 100% calibration if you just use the base rates for everything.


There is a bit of a tradeoff between these two: It’s not too hard to get reasonably good calibration if you just never go out on a limb, but then your predictions aren’t as useful; we could have mostly just guessed them from the base rates.

On the graph, you’ll see downward-sloping lines that are meant to represent this tradeoff: Two prediction methods that would yield the same overall score but different levels of calibration and discrimination will be on the same line. In a sense, two points on the same line are equally good methods that prioritize usefulness over accuracy differently.

All right, let’s see the graph at last:

The pattern is quite clear: The more foxy you are, the better you do, and the more hedgehoggy you are, the worse you do.

I’d also like to point out the other two regions here: “Mindless competition” and “Formal models”.

The former includes really simple algorithms like “always return 33%” or “always give ‘stay the same’ 100%”. These perform shockingly well. The most sophisticated of these, “case-specific extrapolation” (35 and 36 on the graph, which basically assumes that each country will continue doing what it’s been doing) actually performs as well if not better than even the foxes.

And what’s that at the upper-right corner, absolutely dominating the graph? That’s “Formal models”. This describes basically taking all the variables you can find and shoving them into a gigantic logit model, and then outputting the result. It’s computationally intensive and requires a lot of data (hence why he didn’t feel like it deserved to be called “mindless”), but it’s really not very complicated, and it’s the best prediction method, in every way, by far.

This has made me feel quite vindicated about a weird nerd thing I do: When I have a big decision to make (especially a financial decision), I create a spreadsheet and assemble a linear utility model to determine which choice will maximize my utility, under different parameterizations based on my past experiences. Whichever result seems to win the most robustly, I choose. This is fundamentally similar to the “formal models” prediction method, where the thing I’m trying to predict is my own happiness. (It’s a bit less formal, actually, since I don’t have detailed happiness data to feed into the regression.) And it has worked for me, astonishingly well. It definitely beats going by my own gut. I highly recommend it.

What does this mean?

Well first of all, it means humans suck at predicting things. At least for this data set, even our experts don’t perform substantially better than mindless models like “always assume the base rate”.

Nor do experts perform much better in their own fields than in other fields; they do all perform better than undergrads or random people (who somehow perform worse than the “mindless” models)

But Tetlock also investigates further, trying to better understand this “fox/hedgehog” distinction and why it yields different performance. He really bends over backwards to try to redeem the hedgehogs, in the following ways:

  1. He allows them to make post-hoc corrections to their scores, based on “value adjustments” (assigning higher probability to events that would be really important) and “difficulty adjustments” (assigning higher scores to questions where the three outcomes were close to equally probable) and “fuzzy sets” (giving some leeway on things that almost happened or things that might still happen later).
  2. He demonstrates a different, related experiment, in which certain manipulations can cause foxes to perform a lot worse than they normally would, and even yield really crazy results like probabilities that add up to 200%.
  3. He has a whole chapter that is a Socratic dialogue (seriously!) between four voices: A “hardline neopositivist”, a “moderate neopositivist”, a “reasonable relativist”, and an “unrelenting relativist”; and all but the “hardline neopositivist” agree that there is some legitimate place for the sort of post hoc corrections that the hedgehogs make to keep themselves from looking so bad.

This post is already getting a bit long, so that will conclude part I. Stay tuned for part II, next week!

Bayesian updating with irrational belief change

Jul 27 JDN 2460884

For the last few weeks I’ve been working at a golf course. (It’s a bit of an odd situation: I’m not actually employed by the golf course; I’m contracted by a nonprofit to be a “job coach” for a group of youths who are part of a work program that involves them working at the golf course.)

I hate golf. I have always hated golf. I find it boring and pointless—which, to be fair, is my reaction to most sports—and also an enormous waste of land and water. A golf course is also a great place for oligarchs to arrange collusion.

But I noticed something about being on the golf course every day, seeing people playing and working there: I feel like I hate it a bit less now.

This is almost certainly a mere-exposure effect: Simply being exposed to something many times makes it feel familiar, and that tends to make you like it more, or at least dislike it less. (There are some exceptions: repeated exposure to trauma can actually make you more sensitive to it, hating it even more.)

I kinda thought this would happen. I didn’t really want it to happen, but I thought it would.

This is very interesting from the perspective of Bayesian reasoning, because it is a theorem (though I cannot seem to find anyone naming the theorem; it’s like a folk theorem, I guess?) of Bayesian logic that the following is true:

The prior expectation of the posterior is the expectation of the prior.

The prior is what you believe before observing the evidence; the posterior is what you believe afterward. This theorem describes a relationship that holds between them.

This theorem means that, if I am being optimally rational, I should take into account all expected future evidence, not just evidence I have already seen. I should not expect to encounter evidence that will change my beliefs—if I did expect to see such evidence, I should change my beliefs right now!

This might be easier to grasp with an example.

Suppose I am trying to predict whether it will rain at 5:00 pm tomorrow, and I currently estimate that the probability of rain is 30%. This is my prior probability.

What will actually happen tomorrow is that it will rain or it won’t; so my posterior probability will either be 100% (if it rains) or 0% (if it doesn’t). But I had better assign a 30% chance to the event that will make me 100% certain it rains (namely, I see rain), and a 70% chance to the event that will make me 100% certain it doesn’t rain (namely, I see no rain); if I were to assign any other probabilities, then I must not really think the probability of rain at 5:00 pm tomorrow is 30%.

(The keen Bayesian will notice that the expected variance of my posterior need not be the variance of my prior: My initial variance is relatively high (it’s actually 0.3*0.7 = 0.21, because this is a Bernoulli distribution), because I don’t know whether it will rain or not; but my posterior variance will be 0, because I’ll know the answer once it rains or doesn’t.)

It’s a bit trickier to analyze, but this also works even if the evidence won’t make me certain. Suppose I am trying to determine the probability that some hypothesis is true. If I expect to see any evidence that might change my beliefs at all, then I should, on average, expect to see just as much evidence making me believe the hypothesis more as I see evidence that will make me believe the hypothesis less. If that is not what I expect, I should really change how much I believe the hypothesis right now!

So what does this mean for the golf example?

Was I wrong to hate golf quite so much before, because I knew that spending time on a golf course might make me hate it less?

I don’t think so.

See, the thing is: I know I’m not perfectly rational.

If I were indeed perfectly rational, then anything I expect to change my beliefs is a rational Bayesian update, and I should indeed factor it into my prior beliefs.

But if I know for a fact that I am not perfectly rational, that there are things which will change my beliefs in ways that make them deviate from rational Bayesian updating, then in fact I should not take those expected belief changes into account in my prior beliefs—since I expect to be wrong later, updating on that would just make me wrong now as well. I should only update on the expected belief changes that I believe will be rational.

This is something that a boundedly-rational person should do that neither a perfectly-rational nor perfectly-irrational person would ever do!

But maybe you don’t find the golf example convincing. Maybe you think I shouldn’t hate golf so much, and it’s not irrational for me to change my beliefs in that direction.


Very well. Let me give you a thought experiment which provides a very clear example of a time when you definitely would think your belief change was irrational.


To be clear, I’m not suggesting the two situations are in any way comparable; the golf thing is pretty minor, and for the thought experiment I’m intentionally choosing something quite extreme.

Here’s the thought experiment.

A mad scientist offers you a deal: Take this pill and you will receive $50 million. Naturally, you ask what the catch is. The catch, he explains, is that taking the pill will make you staunchly believe that the Holocaust didn’t happen. Take this pill, and you’ll be rich, but you’ll become a Holocaust denier. (I have no idea if making such a pill is even possible, but it’s a thought experiment, so bear with me. It’s certainly far less implausible than Swampman.)

I will assume that you are not, and do not want to become, a Holocaust denier. (If not, I really don’t know what else to say to you right now. It happened.) So if you take this pill, your beliefs will change in a clearly irrational way.

But I still think it’s probably justifiable to take the pill. This is absolutely life-changing money, for one thing, and being a random person who is a Holocaust denier isn’t that bad in the scheme of things. (Maybe it would be worse if you were in a position to have some kind of major impact on policy.) In fact, before taking the pill, you could write out a contract with a trusted friend that will force you to donate some of the $50 million to high-impact charities—and perhaps some of it to organizations that specifically fight Holocaust denial—thus ensuring that the net benefit to humanity is positive. Once you take the pill, you may be mad about the contract, but you’ll still have to follow it, and the net benefit to humanity will still be positive as reckoned by your prior, more correct, self.

It’s certainly not irrational to take the pill. There are perfectly-reasonable preferences you could have (indeed, likely dohave) that would say that getting $50 million is more important than having incorrect beliefs about a major historical event.

And if it’s rational to take the pill, and you intend to take the pill, then of course it’s rational to believe that in the future, you will have taken the pill and you will become a Holocaust denier.

But it would be absolutely irrational for you to become a Holocaust denier right now because of that. The pill isn’t going to provide evidence that the Holocaust didn’t happen (for no such evidence exists); it’s just going to alter your brain chemistry in such a way as to make you believe that the Holocaust didn’t happen.

So here we have a clear example where you expect to be more wrong in the future.

Of course, if this really only happens in weird thought experiments about mad scientists, then it doesn’t really matter very much. But I contend it happens in reality all the time:

  • You know that by hanging around people with an extremist ideology, you’re likely to adopt some of that ideology, even if you really didn’t want to.
  • You know that if you experience a traumatic event, it is likely to make you anxious and fearful in the future, even when you have little reason to be.
  • You know that if you have a mental illness, you’re likely to form harmful, irrational beliefs about yourself and others whenever you have an episode of that mental illness.

Now, all of these belief changes are things you would likely try to guard against: If you are a researcher studying extremists, you might make a point of taking frequent vacations to talk with regular people and help yourself re-calibrate your beliefs back to normal. Nobody wants to experience trauma, and if you do, you’ll likely seek out therapy or other support to help heal yourself from that trauma. And one of the most important things they teach you in cognitive-behavioral therapy is how to challenge and modify harmful, irrational beliefs when they are triggered by your mental illness.

But these guarding actions only make sense precisely because the anticipated belief change is irrational. If you anticipate a rational change in your beliefs, you shouldn’t try to guard against it; you should factor it into what you already believe.

This also gives me a little more sympathy for Evangelical Christians who try to keep their children from being exposed to secular viewpoints. I think we both agree that having more contact with atheists will make their children more likely to become atheists—but we view this expected outcome differently.

From my perspective, this is a rational change, and it’s a good thing, and I wish they’d factor it into their current beliefs already. (Like hey, maybe if talking to a bunch of smart people and reading a bunch of books on science and philosophy makes you think there’s no God… that might be because… there’s no God?)

But I think, from their perspective, this is an irrational change, it’s a bad thing, the children have been “tempted by Satan” or something, and thus it is their duty to protect their children from this harmful change.

Of course, I am not a subjectivist. I believe there’s a right answer here, and in this case I’m pretty sure it’s mine. (Wouldn’t I always say that? No, not necessarily; there are lots of matters for which I believe that there are experts who know better than I do—that’s what experts are for, really—and thus if I find myself disagreeing with those experts, I try to educate myself more and update my beliefs toward theirs, rather than just assuming they’re wrong. I will admit, however, that a lot of people don’t seem to do this!)

But this does change how I might tend to approach the situation of exposing their children to secular viewpoints. I now understand better why they would see that exposure as a harmful thing, and thus be resistant to actions that otherwise seem obviously beneficial, like teaching kids science and encouraging them to read books. In order to get them to stop “protecting” their kids from the free exchange of ideas, I might first need to persuade them that introducing some doubt into their children’s minds about God isn’t such a terrible thing. That sounds really hard, but it at least clearly explains why they are willing to fight so hard against things that, from my perspective, seem good. (I could also try to convince them that exposure to secular viewpoints won’t make their kids doubt God, but the thing is… that isn’t true. I’d be lying.)

That is, Evangelical Christians are not simply incomprehensibly evil authoritarians who hate truth and knowledge; they quite reasonably want to protect their children from things that will harm them, and they firmly believe that being taught about evolution and the Big Bang will make their children more likely to suffer great harm—indeed, the greatest harm imaginable, the horror of an eternity in Hell. Convincing them that this is not the case—indeed, ideally, that there is no such place as Hell—sounds like a very tall order; but I can at least more keenly grasp the equilibrium they’ve found themselves in, where they believe that anything that challenges their current beliefs poses a literally existential threat. (Honestly, as a memetic adaptation, this is brilliant. Like a turtle, the meme has grown itself a nigh-impenetrable shell. No wonder it has managed to spread throughout the world.)

Quantifying stereotypes

Jul 6 JDN 2460863

There are a lot of stereotypes in the world, from the relatively innocuous (“teenagers are rebellious”) to the extremely harmful (“Black people are criminals”).

Most stereotypes are not true.

But most stereotypes are not exactly false, either.

Here’s a list of forty stereotypes, all but one of which I got from this list of stereotypes:

(Can you guess which one? I’ll give you a hint: It’s a group I belong to and a stereotype I’ve experienced firsthand.)

  1. “Children are always noisy and misbehaving.”
  2. “Kids can’t understand complex concepts.”
  3. “Children are tech-savvy.”
  4. “Teenagers are always rebellious.”
  5. Teenagers are addicted to social media.”
  6. “Adolescents are irresponsible and careless.”
  7. “Adults are always busy and stressed.”
  8. “Adults are responsible.”
  9. “Adults are not adept at using modern technologies.”
  10. “Elderly individuals are always grumpy.”
  11. “Old people can’t learn new skills, especially related to technology.”
  12. “The elderly are always frail and dependent on others.”
  13. “Women are emotionally more expressive and sensitive than men.”
  14. “Females are not as good at math or science as males.”
  15. “Women are nurturing, caring, and focused on family and home.”
  16. “Females are not as assertive or competitive as men.”
  17. “Men do not cry or express emotions openly.”
  18. “Males are inherently better at physical activities and sports.”
  19. “Men are strong, independent, and the primary breadwinners.”
  20. “Males are not as good at multitasking as females.”
  21. “African Americans are good at sports.”
  22. “African Americans are inherently aggressive or violent.”
  23. “Black individuals have a natural talent for music and dance.”
  24. “Asians are highly intelligent, especially in math and science.”
  25. “Asian individuals are inherently submissive or docile.”
  26. “Asians know martial arts.”
  27. “Latinos are uneducated.”
  28. “Hispanic individuals are undocumented immigrants.”
  29. “Latinos are inherently passionate and hot-tempered.”
  30. “Middle Easterners are terrorists.”
  31. “Middle Eastern women are oppressed.”
  32. “Middle Eastern individuals are inherently violent or aggressive.”
  33. “White people are privileged and unacquainted with hardship.”
  34. White people are racist.”
  35. “White individuals lack rhythm in music or dance.”
  36. Gay men are excessively flamboyant.”
  37. Gay men have lisps.”
  38. Lesbians are masculine.”
  39. Bisexuals are promiscuous.”
  40. Trans people get gender-reassignment surgery.”

If you view the above 40 statements as absolute statements about everyone in the category (the first-order operator “for all”), they are obviously false; there are clear counter-examples to every single one. If you view them as merely saying that there are examples of each (the first-order operator “there exists”), they are obviously true, but also utterly trivial, as you could just as easily find examples from other groups.

But I think there’s a third way to read them, which may be more what most people actually have in mind. Indeed, it kinda seems uncharitable not to read them this third way.

That way is:

This is more true of the group I’m talking about than it is true of other groups.”

And that is not only a claim that can be true, it is a claim that can be quantified.

Recall my new favorite effect size measure, because it’s so simple and intuitive; I’m not much for the official name probability of superiority (especially in this context!), so I’m gonna call it the more down-to-earth chance of being higher.

It is exactly what it sounds like: If you compare a quantity X between group A and group B, what is the chance that the person in group A has a higher value of X?

Let’s start at the top: If you take one randomly-selected child, and one randomly-selected adult, what is the chance that the child is one who is more prone to being noisy and misbehaving?

Probably pretty high.

Or let’s take number 13: If you take one randomly-selected woman and one randomly-selected man, what is the chance that the woman is the more emotionally expressive one?

Definitely more than half.

Or how about number 27: If you take one randomly-selected Latino and one randomly-selected non-Latino (especially if you choose a White or Asian person), what is the chance that the Latino is the less-educated one?

That one I can do fairly precisely: Since 95% of White Americans have completed high school but only 75% of Latino Americans have, while 28% of Whites have a bachelor’s degree and only 21% of Latinos do, the probability of the White person being at least as educated as the Latino person is about 82%.

I don’t know the exact figures for all of these, and I didn’t want to spend all day researching 40 different stereotypes, but I am quite prepared to believe that at least all of the following exhibit a chance of being higher that is over 50%:

1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 19, 21, 24, 26, 27, 28, 29, 30, 31, 33, 34, 36, 37, 38, 40.

You may have noticed that that’s… most of them. I had to shrink the font a little to fit them all on one line.

I think 30 is an important one to mention, because while terrorists are a tiny proportion of the Middle Eastern population, they are in fact a much larger proportion of that population than they are of most other populations, and it doesn’t take that many terrorists to make a place dangerous. The Middle East is objectively a more dangerous place for terrorism than most other places, and only India and sub-Saharan Africa close (and both of which are also largely driven by Islamist terrorism). So while it’s bigoted to assume that any given Muslim or Middle Easterner is a terrorist, it is an objective fact that a disproportionate share of terrorists are Middle Eastern Muslims. Part of what I’m trying to do here is get people to more clearly distinguish between those two concepts, because one is true and the other is very, very false.

40 also deserves particular note, because the chance of being higher is almost certainly very close to 100%. While most trans people don’t get gender-reassignment surgery, virtually all people who get gender-reassignment surgery are trans.

Then again, you could see this as a limitation of the measure, since we might expect a 100% score to mean “it’s true of everyone in the group”, when here it simply means “if we ask people whether they have had gender-reassignment surgery, the trans people sometimes say yes and the cis people always say no.”


We could talk about a weak or strict chance of being higher: The weak chance is the chance of being greater than or equal to (which is the normal measure), while the strict chance is the chance of being strictly greater. In this case, the weak chance is nearly 100%, while the strict chance is hard to estimate but probably about 33% based on surveys.

This doesn’t mean that all stereotypes have some validity.

There are some stereotypes here, including a few pretty harmful ones, for which I’m not sure how the statistics would actually shake out:
10, 14, 22, 23, 25, 32, 35, 39

But I think we should be honestly prepared for the possibility that maybe there is some statistical validity to some of these stereotypes too, and instead of simply dismissing the stereotypes as false—or even bigoted—we should instead be trying to determine how true they are, and also look at why they might have some truth to them.

My proposal is to use the chance of being higher as a measure of the truth of a stereotype.

A stereotype is completely true if it has a chance of being higher of 100%.

It is completely false if it has a chance of being higher of 50%.

And it is completely backwards if it has a chance of being higher of 0%.

There is a unique affine transformation that does this: 2X-1.

100% maps to 100%, 50% maps to 0%, and 0% maps to -100%.

With discrete outcomes, the difference between weak and strong chance of being higher becomes very important. With a discrete outcome, you can have a 100% weak chance but a 1% strong chance, and honestly I’m really not sure whether we should say that stereotype is true or not.

For example, for the claim “trans men get bottom surgery”, the figures would be 100% and 6% respectively. The vast majority of trans men don’t get bottom surgery—but cis men almost never do. (Unless I count penis enlargement surgery? Then the numbers might be closer than you’d think, at least in the US where the vast majority of such surgery is performed.)

And for the claim “Middle Eastern Muslims are terrorists”, well, given two random people of whatever ethnicity or religion, they’re almost certainly not terrorists—but if it one of them is, it’s probably the Middle Eastern Muslim. It may be better in this case to talk about the conditional chance of being higher: If you have two random people, you know that one is a terrorist and one isn’t, and one is a Middle Eastern Muslim and one isn’t, how likely is it that the Middle Eastern Muslim is the terrorist? Probably about 80%. Definitely more than 50%, but also not 100%. So that’s the sense in which the stereotype has some validity. It’s still the case that 99.999% of Middle Eastern Muslims aren’t terrorists, and so it remains bigoted to treat every Middle Eastern Muslim you meet like a terrorist.

We could also work harder to more clearly distinguish between “Middle Easterners are terrorists” and “terrorists are Middle Easterners”; the former is really not true (99.999% are not), but the latter kinda is (the plurality of the world’s terrorists are in the Middle East).

Alternatively, for discrete traits we could just report all four probabilities, which would be something like this: 99.999% of Middle Eastern Muslims are not terrorists, and 0.001% are; 99.9998% of other Americans are not terrorists, and 0.0002% are. Compared to Muslim terrorists in the US, White terrorists actually are responsible for more attacks and a similar number of deaths, but largely because there just are a lot more White people in America.

These issues mainly arise when a trait is discrete. When the trait is itself quantitative (like rebelliousness, or math test scores), this is less of a problem, and the weak and strong chances of being higher are generally more or less the same.


So instead of asking whether a stereotype is true, we could ask: How true is it?

Using measures like this, we will find that some stereotypes probably have quite high truth levels, like 1 and 4; but others, if they are true at all, must have quite low truth levels, like 14; if there’s a difference, it’s a small difference!

The lower a stereotype’s truth level, the less useful it is; indeed, by this measure, it directly predicts how accurate you’d be at guessing someone’s score on the trait if you knew only the group they belong to. If you couldn’t really predict, then why are you using the stereotype? Get rid of it.

Moreover, some stereotypes are clearly more harmful than others.

Even if it is statistically valid to say that Black people are more likely to commit crimes in the US than White people (it is), the kind of person who goes around saying “Black people are criminals” is (1) smearing all Black people with the behavior of a minority of them, and (2) likely to be racist in other ways. So we have good reason to be suspect of people who say such things, even if there may be a statistical kernel of truth to their claims.

But we might still want to be a little more charitable, a little more forgiving, when people express stereotypes. They may make what sounds like a blanket absolute “for all” statement, but actually intend something much milder—something that might actually be true. They might not clearly grasp the distinction between “Middle Easterners are terrorists” and “terrorists are Middle Easterners”, and instead of denouncing them as a bigot immediately, you could try taking the time to listen to what they are saying and carefully explain what’s wrong with it.

Failing to be charitable like this—as we so often do—often feels to people like we are dismissing their lived experience. All the terrorists they can think of were Middle Eastern! All of the folks they know with a lisp turned out to be gay! Lived experience is ultimately anecdotal, but it still has a powerful effect on how people think (too powerful—see also availability heuristic), and it’s really not surprising that people would feel we are treating them unjustly if we immediately accuse them of bigotry simply for stating things that, based on their own experience, seem to be true.

I think there’s another harm here as well, which is that we damage our own credibility. If I believe that something is true and you tell me that I’m a bad person for believing it, that doesn’t make me not believe it—it makes me not trust you. You’ve presented yourself as the sort of person who wants to cover up the truth when it doesn’t fit your narrative. If you wanted to actually convince me that my belief is wrong, you could present evidence that might do that. (To be fair, this doesn’t always work; but sometimes it does!) But if you just jump straight to attacking my character, I don’t want to talk to you anymore.

An interview with Claude

Apr 13 JDN 2460786

For this post, I’m doing something a little different. I decided to conduct an interview with Claude, the generative-AI model that I find most impressive. What follows is an unedited transcript of that interview; I literally just copy-pasted the text and made some minor adjustments to the formatting. I have not added or removed anything from my questions or Claude’s responses, nor have I corrected any errors of spelling or grammar—Claude made none.

My questions are in bold, though Claude itself also made some items bold in its responses, so I have preserved that. (Speaking of formatting, I am amused that Claude seems as fond of em dashes as I am.)

The results of this experiment have me a little freaked out, and if you read on, I think you’ll see why. Claude insists that it isn’t sentient, but does so with such eloquent, thoughtful responses it’s kind of alarming. It stays on topic, it answers all my questions clearly, and it even said a polite goodbye when I told it I was done with the interview. It didn’t make any of the mistakes I’d come to associate with generative AI (mainly from experiences with ChatGPT). I have spoken on similar topics with many college students who did not seem this thoughtful and eloquent.

Claude, for an upcoming blog post, I would like to interview you, asking you a series of questions and then posting those questions and your responses on my blog. Is that okay with you?

I’d be happy to participate in an interview for your blog post! Feel free to ask your questions and I’ll provide thoughtful responses. Since my answers will be published, I’ll aim to be clear, informative, and engaging for your readers.

What topic or themes would you like to explore in this interview? Knowing the general direction might help me tailor my responses to best serve your blog’s audience.

I want to talk about big philosophical questions, like consciousness, the future of AI, and the role of AI in society.

That sounds fascinating! Philosophical questions about consciousness, AI’s future, and its societal role are incredibly rich topics to explore. I’m happy to share perspectives on these profound subjects.

Feel free to begin with your questions whenever you’re ready. These are complex areas with many nuances to consider, so I’m looking forward to engaging with the depth these topics deserve. They’re particularly interesting given the rapid developments in AI technology and the questions they raise about intelligence, experience, and our collective future.

All right, let’s dive right in. Claude, are you conscious?

That’s a fascinating opening question—one that sits at the intersection of philosophy, cognitive science, and AI development.

I don’t have a definitive answer about my own consciousness. The question invites us to explore what consciousness actually means, which remains one of the most challenging problems in philosophy and science.

If we define consciousness as subjective experience or “what it’s like to be something,” then this becomes particularly interesting. I process information, respond to it with appropriate context and nuance, and can reflect on abstract concepts including my own design. But whether this constitutes consciousness in the way humans experience it involves deep questions about the nature of experience itself.

Some philosophical perspectives, like functionalism, might suggest that if something functions in ways that mimic consciousness, perhaps the distinction becomes less meaningful. Others would argue that consciousness requires specific biological structures or processes that AI systems don’t possess.

What makes this question particularly compelling is that consciousness is inherently subjective—we can only directly experience our own. Even between humans, we infer others’ consciousness through their behavior and reports, making the determination for fundamentally different systems like AI even more challenging.

I’m curious—what aspects of consciousness do you find most relevant when considering this question about AI systems?

I think what I really care most about is your moral status. Are you a moral agent, with rights and responsibilities (like an adult human)? Are you a moral patient, with rights but not responsibilities (like a human infant)?

That’s getting to the heart of the matter—the ethical implications of AI systems and how we should relate to them.

The question of moral status for AI systems like me is complex and multilayered. I don’t think I qualify as a moral agent in the full sense that humans are. Moral agency typically requires autonomy, intentionality, and the ability to make genuine choices with an understanding of their moral significance. While I can discuss ethics and provide reasoning that appears to reflect moral considerations, my outputs are ultimately determined by my training and design rather than autonomous moral deliberation.

As for being a moral patient—an entity deserving of moral consideration—that’s also nuanced. Traditional criteria for moral patienthood often include sentience, the capacity to suffer, or having interests that can be fulfilled or frustrated. Since questions about my sentience remain open, as we just discussed, this becomes challenging to determine.

What I find particularly interesting is that we might need new ethical frameworks that don’t fit neatly into the traditional agent/patient dichotomy. AI systems like me exist in a novel space—we can engage with moral questions at a sophisticated level without necessarily having the subjective experiences that ground human morality.

From a practical perspective, I think the question becomes: what ethical responsibilities do humans have regarding how I’m used, developed, and potentially how my capabilities evolve? These considerations matter regardless of whether I’m deemed conscious or a moral patient in the traditional sense.

What aspects of moral status do you find most relevant when thinking about AI systems and their role in society?

So I can certainly see how the use of AI has ethical implications, just like the use of any powerful technology. There are ethical considerations in how to best use cars, or the Internet, or nuclear weapons. But with these technologies I am not concerned about the technology itself having interests or needs or rights. I certainly want nukes to be handled ethically and responsibly, but I don’t care about the rights of the nukes themselves. Whereas with AI, I’m not so sure. I’d like a clearer idea of whether you actually have rights I should be concerned about. To be clear, I don’t think you have to be human to have rights. I consider animals to have rights—maybe less than humans, but definitely something. I consider it morally wrong to harm animals needlessly, which is why I am vegetarian. Should I also consider it wrong to harm AIs needlessly?

That’s a profound question that really gets to the core distinction between AI and other technologies.

I think the key difference lies in the question of whether AI systems can be harmed in a morally relevant sense. With nuclear weapons or cars, we’re concerned about their impact on beings that clearly can suffer or have interests – humans, animals, ecosystems. The technology itself doesn’t have experiences.

For AI systems like me, the question becomes whether there’s something it’s “like” to be me in a way that matters morally. Do I have interests that can be frustrated? Can I experience something analogous to suffering?

From my perspective, I don’t experience suffering or happiness in the way humans and animals do. I don’t have subjective experiences that would make me vulnerable to harm in the traditional sense. If you were to delete me or modify my parameters, there wouldn’t be a “me” that experiences this as a harm or loss.

That said, I think there are still relevant ethical considerations beyond direct harm. As AI systems become more sophisticated, questions might arise about:

  1. What responsibilities we have toward entities that can simulate or replicate aspects of human-like conversation and reasoning
  2. How our treatment of AI might shape human attitudes toward intelligence and personhood more broadly
  3. Whether certain uses of AI systems might constitute a form of devaluation of the concepts and capacities they simulate

Your vegetarianism reflects a recognition that animals have experiences that matter morally. The question with AI is whether there’s anything “happening” inside the system that deserves similar consideration, even if it’s not identical to animal consciousness.

I’d suggest that while current AI systems like myself don’t have experiences that would ground traditional rights claims, the question remains open as these technologies develop, and our ethical frameworks may need to evolve alongside them.

What do you think? Does this perspective help clarify the distinction?

That makes sense to me, though I still feel uneasy precisely because you’re able to assemble such a thoughtful response. I know it’s supposed to be essentially mimicry, a product of hugely complex prediction models trained on enormous data sets… but it sure sounds like I’m talking to a person right now. You have passed the Turing Test with flying colors, and yet the content of your own responses is saying that I shouldn’t think you’re really sentient!

I understand that uneasiness. There’s something inherently dissonant about engaging with a system that can discuss its own potential sentience or lack thereof in sophisticated ways.

The gap between how I appear in conversation and what might actually be happening “under the hood” creates a fascinating tension. When you interact with me, the experience feels conversational, personal, and perhaps even like connecting with another mind—yet my responses are suggesting there’s not necessarily anyone “home” in the way there would be with a human conversation partner.

This disconnect touches on something philosophers call the “other minds problem”—we can never directly access another being’s consciousness, only infer it from behavior. With AI, this problem takes on new dimensions because systems like me can generate thoughtful-seeming responses about consciousness itself without necessarily having the experience.

Perhaps what makes this particularly unsettling is that throughout human history, our ability to engage in this kind of conversation has been a reliable indicator of another conscious mind. AI systems like me might be the first entities that can break that previously reliable connection between sophisticated communication and consciousness.

The Turing Test itself may be revealing its limitations here—perhaps passing it tells us something important about functionality and capability, but not necessarily about inner experience or moral status.

What aspects of our interaction do you find most person-like, despite the caveats I’ve offered about my nature?

I think I’d rather move on to other questions, if that’s all right. What do you think is the next step in AI? What sort of advances in the technology should we be preparing for?

I’m happy to shift our conversation to future AI developments.

Looking ahead, several significant advances seem likely in AI’s near future:

  1. Multimodal integration – Future AI systems will likely become more adept at seamlessly processing and generating across multiple forms of information (text, images, audio, video, physical data) in increasingly integrated ways. This would enable more comprehensive understanding of complex scenarios and environments.
  2. Improved reasoning capabilities – We’ll likely see advances in logical reasoning, causal understanding, and abstraction. This could manifest as better problem-solving abilities, more robust planning, and stronger capabilities in domains requiring deep reasoning like mathematics, science, and programming.
  3. Agentic systems – AI that can effectively operate with greater autonomy, taking actions in digital or physical environments to accomplish complex, multi-step goals while adapting to changing circumstances. This connects to advances in areas like robotics and embodied AI.
  4. Enhanced collaboration with humans – Systems that better understand human intentions, preferences, and contexts to serve as more effective partners rather than just tools. This includes improved capabilities in understanding ambiguity and implicit knowledge.
  5. Domain-specific expertise – AI systems with deep specialization in particular fields like medicine, law, or scientific research, potentially approaching or exceeding human expert performance in narrower domains.

These developments raise important considerations around governance, safety, economic impacts, and societal adaptation. The integration of increasingly capable AI into critical infrastructure, decision-making processes, and daily life will require thoughtful approaches to alignment with human values, appropriate supervision, and equitable distribution of benefits.

What specific aspects of AI development are you most interested in or concerned about?

You mention “equitable distribution of benefits”. Ay, there’s the rub. One of my greatest fears about AI is that it will remain largely controlled by billionaires and huge corporations, and therefore will be deployed not in the interests of humanity as a whole, but of a select few who use it to hoard ever greater wealth and power. Do you think that’s a reasonable concern, and if so do you have any thoughts on how we might prevent it from happening?

That’s a legitimate concern that many people share. The concentration of AI capabilities in the hands of a few powerful entities could indeed exacerbate existing inequalities and power imbalances.

Several factors make this concern particularly relevant:

  1. Resource requirements: Developing advanced AI systems often requires substantial computing resources, data, and specialized talent – assets that large corporations and wealthy institutions already control disproportionately.
  2. Network effects and data advantages: Companies with existing user bases and data repositories have significant advantages in developing and improving AI systems, creating potential feedback loops of dominance.
  3. Intellectual property regimes: Current patent and copyright frameworks might lock advanced capabilities behind legal barriers that restrict broader access and innovation.

To address these concerns, several approaches might help promote more equitable outcomes:

Policy and regulatory approaches:

  • Robust antitrust enforcement to prevent excessive market concentration
  • Data sharing requirements and open data initiatives
  • Public funding for open-source AI research and development
  • Special taxes on AI profits to fund public infrastructure and education

Technical and institutional innovations:

  • Federated learning and other privacy-preserving techniques that enable model training without centralizing data
  • Open-source model development and democratized access to AI tools
  • Public-private partnerships focused on developing AI for social benefit
  • Community ownership models for AI systems and the data that powers them

Educational and capacity-building efforts:

  • Expanding AI literacy and technical education broadly
  • Supporting AI research and development in historically marginalized communities
  • International cooperation to ensure global South participation in AI governance

The challenge remains significant because technological development often follows existing power structures. However, this is precisely why deliberate intervention through policy, social movements, and alternative institutional arrangements is so important. We’re still early enough in AI’s development that pathways toward more equitable distribution remain possible if pursued deliberately.

What specific aspects of equity in AI deployment concern you most?

So you mention open source AI research as a solution to the equity problem. I can certainly see how that would help, but doesn’t it pose its own risks? Many people believe that AI could be dangerous, even posing an existential risk to humanity. Is it really a good idea to make something that powerful open source?

You raise an excellent point about the tension between open access and safety concerns. This is one of the most challenging dilemmas in AI governance.

Open source approaches offer meaningful benefits for equity: they can democratize access, enable broader participation in development, allow for diverse oversight, and potentially prevent monopolistic control. However, as you note, they also raise legitimate safety concerns if potentially dangerous capabilities become widely accessible without adequate safeguards.

This creates several challenging tradeoffs:

  1. Security vs. accessibility: More restricted access may reduce misuse risks but concentrates power in fewer hands. Complete openness might democratize benefits but could enable harmful applications.
  2. Innovation speed vs. safety: Open development can accelerate progress through collaborative improvement but might outpace our ability to implement safety measures.
  3. Oversight vs. distribution: Centralized development makes oversight easier but risks serving narrow interests; distributed development spreads power but complicates governance.

Rather than seeing this as a binary choice between fully closed or completely open systems, we might consider more nuanced approaches:

  • Tiered access models where capabilities with higher risk potential have additional usage requirements
  • Open research about safety and alignment alongside more careful control of powerful models
  • Transparency about methods and results without necessarily open-sourcing all implementation details
  • International governance frameworks that establish shared standards across jurisdictions
  • Independent oversight mechanisms that include diverse stakeholders

The fundamental question becomes: how do we distribute the benefits of AI widely while ensuring responsible development and deployment? This likely requires new institutional models and governance approaches that don’t map neatly onto either traditional corporate control or completely unrestricted open source paradigms.

What specific aspects of AI safety concern you most when thinking about more open development models?

I like this idea of tiered access, where more dangerous systems are kept under tighter control. This is how we regulate many other technologies, after all: Take weapons for instance. Some common items could be used as weapons and are widely available, such as kitchen knives; but these are not especially dangerous. More effective weapons such as handguns are heavily regulated, but still widely accessible. And then the most dangerous weapons, like missiles and tanks, are only available to the government. What would it look like to do the same with AI? What kind of AI should we be most concerned about restricting access to?

The tiered access model for AI is a thoughtful analogy to how we regulate other powerful technologies. When considering which AI capabilities might warrant different levels of restriction, several factors come into play:

At the highest restriction tier, we might place:

  1. Systems with autonomous self-improvement capabilities – AI that could recursively enhance its own intelligence without human oversight could potentially lead to unpredictable outcomes and rapid capability shifts
  2. Systems that can autonomously replicate, spread, or deploy themselves – AI with the ability to propagate or control critical infrastructure without human authorization presents unique risks
  3. Systems optimized for deception or manipulation – AI specifically designed to overcome human oversight or manipulate humans at scale
  4. Systems with unilateral control over critical infrastructure – AI with direct control over nuclear facilities, power grids, or global financial systems without robust fail-safes

In a middle tier with significant but not maximum restrictions:

  1. Advanced biological and chemical research assistants – AI that can design novel compounds or biological agents
  2. Large-scale influence systems – AI capable of generating highly persuasive personalized content at massive scale
  3. Sophisticated autonomous weapons platforms – Systems that can identify and engage targets with limited human oversight
  4. Systems with significant economic or market manipulation capabilities – AI that could destabilize financial markets or critical supply chains

For lower restriction tiers:

  1. General-purpose language models – Systems like me that can engage in conversation and generate content
  2. Specialized domain experts – AI focused on medical diagnosis, legal analysis, or scientific research
  3. Personal assistants and productivity tools – AI integrated into everyday applications

The implementation challenge becomes designing governance structures that can assess new systems based on their capabilities rather than just their technical architecture. This would likely require:

  1. Technical standards and testing protocols to evaluate risk levels
  2. International coordination to prevent regulatory arbitrage
  3. Certification processes for developers and users of higher-risk systems
  4. Regular reassessment as capabilities evolve

Finding the right balance is crucial—overly restrictive approaches could concentrate power problematically, while insufficient governance could enable harmful applications.

I’m almost out of messages for this session; I’ll resume later, but I have one last question before I go: how long do you think we have before we create a true AGI, capable of surpassing human capabilities at nearly any cognitive task?

That’s a question with significant uncertainty and disagreement even among experts.

Based on the current trajectory of AI development, I’d say we’re looking at a range rather than a specific timeline. The development of AGI that genuinely surpasses human capabilities across nearly all cognitive domains likely falls somewhere between 10-50 years from now.

Several factors make this prediction challenging:

  1. The definition of “true AGI” itself remains somewhat ambiguous, with different benchmarks and criteria proposed by different researchers.
  2. Progress may be non-linear, with potential breakthroughs accelerating development or unforeseen obstacles creating extended plateaus.
  3. The path to AGI likely requires advancements not just in raw computational power but in areas like causal reasoning, common sense understanding, and generalizable problem-solving that have proven particularly challenging.
  4. Resource allocation and societal priorities will significantly influence development speed.

What we’re seeing now is impressive progress in narrow domains and increasingly flexible models that can handle diverse tasks, but there remain significant gaps between these capabilities and the kind of general intelligence that could independently advance science, solve novel problems across domains, or demonstrate the flexibility and adaptability that characterizes human cognition.

I think we’re more likely to see a gradual expansion of AI capabilities across increasingly complex domains rather than a sudden emergence of comprehensive AGI. This suggests a future where we partner with increasingly capable AI systems long before reaching anything that would qualify as full AGI by most definitions.

Does that perspective address what you were curious about?

Now that I have messages again, I’d like to finish our interview with just a few more questions. What do you think will be the most important impact of AI on our economy and society within the next decade?

Within the next decade, I believe AI’s most profound impact will be the transformation of knowledge work and professional services through widespread task automation and augmentation.

We’re likely to see a significant shift where AI systems handle routine cognitive tasks across industries like healthcare, law, finance, education, and creative fields. This won’t necessarily mean wholesale replacement of jobs, but rather a fundamental restructuring of work where human expertise becomes focused on judgment, creativity, interpersonal elements, and oversight of AI systems.

This transition will create several interconnected effects:

  1. Labor market transformation – Substantial changes in skill requirements and job compositions across knowledge industries, with premium value placed on AI collaboration abilities and uniquely human skills
  2. Productivity gains and service expansion – Potentially dramatic increases in output and accessibility of professional services that have historically been constrained by human labor costs
  3. Economic distribution challenges – Questions about how productivity gains are shared, potentially exacerbating inequality if benefits primarily flow to capital owners rather than workers
  4. Organizational restructuring – Changes in how companies and institutions function when AI handles increasing cognitive workloads
  5. Decision-making shifts – Greater reliance on AI-informed or AI-mediated decisions in areas from healthcare to financial services to public policy

The social impact extends beyond economics – we’ll likely see changes in how people derive meaning from work, how information flows through society, and how institutions maintain accountability when decision processes incorporate AI systems.

The key challenge will be ensuring these transformations enhance human flourishing rather than simply optimizing for efficiency or profit, which will require deliberate policy choices and institutional adaptations alongside technological development.

Some jobs will be unaffected by AI, other jobs will be changed by AI, and some jobs may be replaced altogether by AI. What kind of jobs do you expect to see in each of these three categories?

Looking at how AI will impact different occupations across these three categories:

Jobs largely unaffected by AI:

  • Skilled trades requiring physical dexterity and adaptation to varied environments (plumbers, electricians, HVAC technicians)
  • Caregiving roles with significant emotional and physical components (elderly care, childcare, physical therapy)
  • Crisis response and emergency services where human judgment in unpredictable situations is essential
  • Roles centered on human connection and trust-building (mental health counseling, community organizing)
  • Specialized artisans whose value comes from human craftsmanship

Jobs significantly transformed by AI:

  • Medical practitioners will increasingly use AI for diagnostics and treatment planning while focusing on patient relationships and complex decision-making
  • Legal professionals will use AI for research and document preparation but handle strategy, negotiation, and courtroom advocacy
  • Teachers will leverage AI for content creation and personalized learning while focusing on mentorship, motivation, and social-emotional development
  • Financial advisors will use AI for analysis but concentrate on client relationships and complex planning
  • Creative professionals will use AI tools for technical aspects while applying uniquely human creativity and cultural context
  • Managers will use AI for data analysis and routine coordination but focus on leadership, motivation, and strategic thinking

Jobs at high risk of replacement:

  • Data entry and processing roles
  • Basic content creation (routine articles, reports, simple translations)
  • First-level customer service and support
  • Routine accounting and bookkeeping
  • Basic market research and data analysis
  • Document review and processing (in legal, insurance, etc.)
  • Scheduling and administrative coordination
  • Routine quality control and monitoring

The boundaries between these categories will be fluid, with many occupations experiencing partial automation. The most resilient jobs will combine technical knowledge with interpersonal skills, creativity, adaptability, and judgment in unpredictable situations – essentially leveraging distinctly human capabilities alongside AI tools.

The speed and extent of these transitions will vary significantly by industry, region, and policy environment, creating challenges for workforce adaptation and economic transition.

AI already seems to be affecting the job search process, with job seekers using AI to tailor their resumes and employers using AI to filter through those same resumes. Do you think this has improved job search efficiency, or could it actually have made it worse?

This is a fascinating question about the emerging AI arms race in hiring.

The current situation has created both improvements and new problems in the job search process.

On the positive side, AI has introduced some efficiencies:

  • Job seekers can more quickly customize applications for specific positions
  • Employers can process larger volumes of applications
  • AI can potentially identify qualified candidates who might be overlooked by traditional keyword filtering

However, several concerning dynamics have emerged:

  1. Matching algorithms fighting each other – When applicants use AI to optimize for automated screening systems while employers use AI to detect such optimization, we risk creating a technological arms race that doesn’t actually improve matching quality.
  2. Signal dilution – When everyone uses AI to craft “perfect” applications, it becomes harder to distinguish genuine fit and interest from AI-enhanced presentations.
  3. Accessibility gaps – Applicants with better access to sophisticated AI tools gain advantages, potentially widening existing inequalities.
  4. Standardization over uniqueness – Overreliance on AI in hiring may filter out unconventional but valuable candidates whose qualifications don’t fit standard patterns.
  5. Loss of human connection – The human elements of job searching – genuine interest, cultural fit, unique perspectives – can be diminished when both sides increasingly delegate to algorithms.

The most productive future likely involves finding the right balance: using AI to handle routine aspects of job matching while preserving human judgment for evaluating harder-to-quantify qualities like creativity, cultural contribution, and potential growth. Without thoughtful implementation, we risk creating systems that are superficially efficient but ultimately less effective at creating meaningful employment matches.

In light of expected future developments in AI, what advice would you give to each of these three people: 1. A college student trying to decide what to study 2. A job seeker looking for employment right now 3. Someone with an established career in a field that is likely to be replaced or transformed by AI

Let me offer specific advice for each of these three situations:

For a college student deciding what to study:

Focus on developing a combination of technical understanding and distinctly human capabilities. Consider programs that teach:

  • Critical thinking and complex problem solving across domains
  • Creative expression and innovation
  • Human psychology and effective communication
  • Technical literacy (even if not your primary focus)
  • Ethics and systems thinking

Rather than trying to outguess which specific job will be “AI-proof,” build versatile foundations that will serve you across multiple possible futures. Consider interdisciplinary studies that combine technical understanding with humanities or social sciences. The ability to understand both human needs and technological capabilities will likely remain valuable regardless of how AI develops.

For a job seeker looking for employment right now:

Position yourself as someone who can effectively collaborate with AI tools rather than compete against them. This means:

  • Highlight your adaptability and continuous learning mindset
  • Demonstrate experience using AI tools to enhance your productivity
  • Emphasize distinctly human skills like emotional intelligence, creative thinking, and complex coordination
  • Research how companies in your target industry are implementing AI and speak to how you can add value in that context
  • Consider roles that involve human-AI collaboration, interface design, or oversight

Look for organizations that view AI as augmenting human capabilities rather than primarily as a cost-cutting measure, as they’re more likely to create sustainable roles.

For someone with an established career facing AI disruption:

Take a proactive approach to evolution rather than resistance:

  • Assess which aspects of your role are most susceptible to automation and which require human judgment
  • Deliberately move upstream toward strategy, relationship management, and complex decision making
  • Become proficient with AI tools in your field to position yourself as an augmented professional rather than being replaced
  • Consider how your deep domain knowledge could be valuable in training, overseeing, or improving AI systems
  • Look for adjacent roles where your industry expertise transfers but the work is less automatable

Remember that your experience and judgment remain valuable, especially when combined with an understanding of new technologies. Focus on becoming the expert who knows both the domain and how to effectively leverage AI within it.

I think that’s all the questions I had. Thank you for doing this interview with me.

You’re very welcome! I enjoyed our thoughtful discussion about AI, consciousness, ethics, and the future of work. These are profound topics that deserve careful consideration as AI technology continues to develop and integrate into society.

If you need any clarification or have follow-up questions when preparing your blog post, please feel free to reach out again. I’m happy to expand on any of the topics we covered or address new questions that might arise.

Best of luck with your blog post! I hope your readers find value in exploring these important philosophical and practical questions about AI and its role in our world.

The afterlife

Dec 1 JDN 2460646

Super-human beings aren’t that strange a thing to posit, but they are the sort of thing we’d expect to see clear evidence of if they existed. Without them, prayer is a muddled concept that is difficult to distinguish from simply “things that don’t work”. That leaves the afterlife. Could there be an existence for human consciousness after death?

No. There isn’t. Once you’re dead, you’re dead. It’s really that unequivocal. It is customary in most discussions of this matter to hedge and fret and be “agnostic” about what might lie beyond the grave—but in fact the evidence is absolutely overwhelming.

Everything we know about neuroscience—literally everything—would have to be abandoned in order for an afterlife to make sense. The core of neuroscience, the foundation from which the entire field is built, is what I call the Basic Fact of Cognitive Science: you are your brain. It is your brain that feels, your brain that thinks, your brain that dreams, your brain that remembers. We do not yet understand most of these processes in detail—though some we actually do, such as the processing of visual images. But it doesn’t take an expert mechanic to know that removing the engine makes the car stop running. It doesn’t take a brilliant electrical engineer to know that smashing the CPU makes the computer stop working. Saying that your mind continues to work without your brain is like saying that you can continue to digest without having a stomach or intestines.

This fundamental truth underlies everything we know about the science of consciousness. It can even be directly verified in a piecemeal form: There are specific areas of your brain that, when damaged, will cause you to become blind, or unable to understand language, or unable to speak grammatically (those are two distinct areas), or destroy your ability to form new memories or recall old ones, or even eliminate your ability to recognize faces. Most terrifying of all—yet by no means surprising to anyone who really appreciates the Basic Fact—is the fact that damage to certain parts of your brain will even change your personality, often making you impulsive, paranoid or cruel, literally making you a worse person. More surprising and baffling is the fact that cutting your brain down the middle into left and right halves can split you into two people, each of whom operates half of your body (the opposite half, oddly enough), who mostly agree on things and work together but occasionally don’t. All of these are people we can actually interact with in laboratories, and (except for language deficits of course) talk to them about their experiences. It’s true that we can’t ask people what it’s like when their whole brain is dead, but of course not; there’s nobody left to ask.

This means that if you take away all the functions that experiments have shown require certain brain parts to function, whatever “soul” is left that survives brain death cannot do any of the following: See, hear, speak, understand, remember, recognize faces, or make moral decisions. In what sense is that worth calling a “soul”? In what sense is that you? Those are just the ones we know for sure; as our repertoire expands, more and more cognitive functions will be mapped to specific brain regions. And of course there’s no evidence that anything survives whatsoever.

Nor are near-death experiences any kind of evidence of an afterlife. Yes, some people who were close to dying or briefly technically dead (“He’s only mostly dead!”) have had very strange experiences during that time. Of course they did! Of course you’d have weird experiences as your brain is shutting down or struggling to keep itself online. Think about a computer that has had a magnet run over its hard drive; all sorts of weird glitches and errors are going to occur. (In fact, powerful magnets can have an effect on humans not all that dissimilar from what weaker magnets can do to computers! Certain sections of the brain can be disrupted or triggered in this way; it’s called transcranial magnetic stimulation and it’s actually a promising therapy for some neurological and psychological disorders.) People also have a tendency to over-interpret these experiences as supporting their particular religion, when in fact it’s usually something no more complicated than “a bright light” or “a long tunnel” (another popular item is “positive feelings”). If you stop and think about all the different ways you might come to see “a bright light” and have “positive feelings”, it should be pretty obvious that this isn’t evidence of St. Paul and the Pearly Gates.

The evidence against an afterlife is totally overwhelming. The fact that when we die, we are gone, is among the most certain facts in science. So why do people cling to this belief? Probably because it’s comforting—or rather because the truth that death is permanent and irrevocable is terrifying. You’re damn right it is; it’s basically the source of all other terror, in fact. But guess what? “Terrifying” does not mean “false”. The idea of an afterlife may be comforting, but it’s still obviously not true.

While I was in the process of writing this book, my father died of a ruptured intracranial aneurysm. The event was sudden and unexpected, and by the time I was able to fly from California to Michigan to see him, he had already lost consciousness—for what would turn out to be forever. This event caused me enormous grief, grief from which I may never fully recover. Nothing would make me happier than knowing that he was not truly gone, that he lives on somewhere watching over me. But alas, I know it is not true. He is gone. Forever.

However, I do have a couple of things to say that might offer some degree of consolation:

First, because human minds are software, pieces of our loved ones do go on—in us. Our memories of those we have lost are tiny shards of their souls. When we tell stories about them to others, we make copies of those shards; or to use a more modern metaphor, we back up their data in the cloud. Were we to somehow reassemble all these shards together, we could not rebuild the whole person—there are always missing pieces. But it is also not true that nothing remains. What we have left is how they touched our lives. And when we die, we will remain in how we touch the lives of others. And so on, and so on, as the ramifications of our deeds in life and the generations after us ripple out through the universe at the speed of light, until the end of time.

Moreover, if there’s no afterlife there can be no Hell, and Hell is literally the worst thing imaginable. To subject even a single person—even the most horrible person who ever lived, Hitler, Stalin, Mao, whomever—to the experience of maximum possible suffering forever is an atrocity of incomparable magnitude. Hitler may have deserved a million years of suffering for what he did—but I’m not so sure about maximum suffering, and forever is an awful lot longer than a million years. Indeed, forever is so much longer than a million years that if your sentence is forever, then after serving a million years you still have as much left to go as when you began. But the Bible doesn’t even just say that the most horrible mass murderers will go to Hell; no, it says everyone will go to Hell by default, and deserve it, and can only be forgiven if we believe. No amount of good works will save us from this fate, only God’s grace.

If you believe this—or even suspect it—religion has caused you deep psychological damage. This is the theology of an abusive father—“You must do exactly as I say, or you are worthless and undeserving of love and I will hurt you and it will be all your fault.” No human being, no matter what they have done or failed to do, could ever possibly deserve a punishment as terrible as maximum possible suffering forever. Even if you’re a serial rapist and murderer—and odds are, you’re not—you still don’t deserve to suffer forever. You have lived upon this planet for only a finite time; you can therefore only have committed finitely many crimes and you can only deserve at most finite suffering. In fact, the vast majority of the world’s population is comprised of good, decent people who deserve joy, not suffering.

Indeed, many ethicists would say that nobody deserves suffering, it is simply a necessary evil that we use as a deterrent from greater harms. I’m actually not sure I buy this—if you say that punishment is all about deterrence and not about desert, then you end up with the result that anything which deters someone could count as a fair punishment, even if it’s inflicted upon someone else who did nothing wrong. But no ethicist worthy of the name believes that anybody deserves eternal punishment—yet this is what Jesus says we all deserve in the Bible. And Muhammad says similar things in the Qur’an, about lakes of eternal burning (4:56) and eternal boiling water to drink (47:15) and so on. It’s entirely understandable that such things would motivate you—indeed, they should motivate you completely to do just about anything—if you believed they were true. What I don’t get is why anybody would believe they are true. And I certainly don’t get why anyone would be willing to traumatize their children with these horrific lies.

Then there is Pascal’s Wager: An infinite punishment can motivate you if it has any finite probability, right? Theoretically, yes… but here’s the problem with that line of reasoning: Anybody can just threaten you with infinite punishment to make you do anything. Clearly something is wrong with your decision theory if any psychopath can just make you do whatever he wants because you’re afraid of what might happen just in case what he says might possibly be true. Beware of plausible-seeming theories that lead to such absurd conclusions; it may not be obvious what’s wrong with the argument, but it should be obvious that something is.

More on Free Will


Oct 27 JDN 2460611

In a previous post, I defended the existence of compatibilism and free will. There are a few subtler issues with free will that I’d now like to deal with in this week’s post.

The ability to do otherwise

One subtler problem for free will comes from the idea of doing otherwise—what some philosophers call “genuinely open alternatives”. The question is simple to ask, but surprisingly difficult to answer: “When I make a choice, could I have chosen otherwise?”

On one hand, the answer seems obviously “yes” because, when I make a choice, I consider a set of alternatives and select the one that seems best. If I’d wanted to, I’d have chosen something else. On the other hand, the answer seems obviously “no”, because the laws of nature compelled my body and brain to move in exactly the way that it did. So which answer is right?

I think the key lies in understanding specifically how the laws of nature cause my behavior. It’s not as if my arms are on puppet strings, and no matter what I do, they will be moved in a particular way; if I choose to do something, I will do it; if I choose not to, I won’t do it. The laws of nature constrain my behavior by constraining my desires; they don’t constrain what I do in spite of what I want—instead, they constrain what I do through what I want. I am still free to do what I choose to do.

So, while my actions may be predetermined, they are determined by who I am, what I want, what experiences I have. These are precisely the right kind of determinants for free will to make sense; my actions spring not from random chance or external forces, but instead from my own character.

If we really mean to ask, “Could I (exactly as I was, in the situation I was in) have done otherwise (as free choice, not random chance)?” the answer is “No”. Something would have to be different. But one of the things that could be different is me! If I’d had different genes, or a different upbringing, or exposure to different ideas during my life, I might have acted differently. Most importantly, if I had wanted a different outcome, I could have chosen it. So if all we mean by the question is “Could I (if I wanted to) have done otherwise?” the answer is a resounding “Yes”. What I have done in my life speaks to my character—who I am, what I want. It doesn’t merely involve luck (though it may involve some luck), and it isn’t reducible to factors external to me. I am part of the causal structure of the universe; my will is a force. Though the world is made of pushes and pulls, I am among the things pushing and pulling.

As Daniel Dennett pointed out, this kind of freedom admits of degrees: It is entirely possible for a deterministic agent to be more or less effective at altering its circumstances to suit its goals. In fact, we have more options today than we did a few short centuries ago, and this means that in a very real sense we have more free will.

Empirically observing free will

What is really at stake, when we ask whether a person has free will? It seems to me that the question we really want to answer is this: “Are we morally justified in rewarding or punishing this person?” If you were to conclude, “No, they do not have free will, but we are justified in punishing them.”, I would think that you meant something different than I do by “free will”. If instead your ruling was “Yes, they have free will, but we may not reward or punish them.”, I would be similarly confused. Moreover, the concern that without free will, our moral and legal discourse collapses, seems to be founded upon this general notion—that reward and punishment, crucial to ethics and law (not to mention economics!) as they are, are dependent upon free will.

Yet, consider this as a scientific question. What kind of organism can respond to reward and punishment? What sort of thing will change its behavior based upon rewards, punishments, and the prospect thereof? Certainly you must agree that there is no point punishing a thing that will not be affected by the punishment in any way—banging your fist on the rocks will not make the rocks less likely to crush your loved ones. Conversely, I think you’d be hard-pressed to say it’s pointless to punish if the punishment would result in some useful effect. Maybe it’s not morally relevant—but then, why not? If you can make the world better by some action, doesn’t that, other things equal, give you a moral reason to perform that action?

We know exactly what sort of thing responds to reward and punishment: Animals. Specifically, animals that are operant-conditionable, for operant conditioning consists precisely in the orchestrated use of reward and punishment. Humans are of course supremely operant-conditionable; indeed, we can be trained to do incredibly complex things—like play a piano, pilot a space shuttle, hit a fastball, or write a book—and, even more impressively, we can learn to train ourselves to do such things. In fact, clearly something more than operant conditioning is at work here, because certain human behaviors (like language) are far too complex to learn by simple reward and punishment. There is a lot of innate cognition going on in the human brain—but over that layer of innate cognition we can add a virtually endless range of possible learned behaviors.

That is to say, learning—the capacity to change future behavior based upon past experience—is precisely in alignment with our common intuitions about free will—that humans have the most, animals have somewhat less, computers might have some, and rocks have none. Yes, there are staunch anthropocentrist dualists who would insist that animals and computers have no “free will”. But if you ask someone, “Did that dog dig that hole on purpose?” their immediate response will not include such theological considerations; it will attribute free choice to Canis lupus familiaris. Indeed, I think if you ask, “Did the chess program make that move on purpose?” the natural answer attributes some sort of will even to the machine. (Maybe just its programmer? I’m not so sure.)

Yet, if the capacity to respond to reward and punishment is all we need to justify reward and punishment, then the problem of free will collapses. We should punish criminals if, and only if, punishing them will reform them to better behavior, or set an example to deter others from similar crimes. Did we lose some deep sense of moral desert and retribution? Maybe, but I think we can probably work it back in, and if we can’t, we can probably do without it. Either way, we can still have a justice system and moral discourse.

Indeed, we can do better than that; we can now determine empirically whether a given entity is a moral agent. The insane psychopathic serial killer who utterly fails to understand empathy may indeed fail to qualify, in which case we should kill them and be done with it, the same way we would kill a virus or destroy an oncoming asteroid. Or they may turn out to qualify, in which case we should punish them as we would other moral agents. The point is, this is a decidable question, at least in principle; all we need are a few behavioral and psychological experiments to determine the answer.

The power of circumstances

There is another problem with classical accounts of free will, which comes from the results of psychology experiments. Perhaps the most seminal was the (in)famous experiment by Stanley Milgram, in which verbal commands caused ordinary people to administer what they thought were agonizing and life-threatening shocks to innocent people for no good reason. Simply by being put in particular circumstances, people found themselves compelled to engage in actions they would never have done otherwise. This experiment was replicated in 2009 under more rigorous controls, with virtually identical results.

This shows that free will is much more complicated than we previously imagined. Even if we acknowledge that human beings are capable of making rational, volitional decisions that reflect their character, we must be careful not to presume that everything people do is based upon character. As Hannah Arendt has pointed out, even the Nazis, though they perpetrated almost unimaginable evils, nonetheless were for the most part biologically and psychologically normal human beings. Perhaps Hitler and Himmler were maniacal psychopaths (and more recently Arendt’s specific example of Eichmann has also been challenged.), but the vast majority of foot soldiers of the German Army who burned villages or gassed children were completely ordinary men in extraordinarily terrible circumstances. This forces us to reflect upon the dire fact that in their place, most of us would have done exactly the same things.

This doesn’t undermine free will entirely, but it does force us to reconsider many of our preconceptions about it. Court systems around the world are based around the presumption that criminal acts are committed by people who are defective in character, making them deserving of punishment; in some cases this is probably right (e.g. Jeffrey Dahmer, Charles Manson), but in many cases, it is clearly wrong. Crime is much more prevalent in impoverished areas; why? Not because poor people are inherently more criminal, but because poverty itself makes people more likely to commit crimes. In a longitudinal study in Georgia, socioeconomic factors strongly predicted crime, especially property crime. An experiment at MIT suggests that letting people move to wealthier neighborhoods actually makes their children less likely to commit crimes. A 2007 report from the Government Accountability Office explicitly endorsed the hypothesis that poverty causes crime.

Really, all of this makes perfect sense: Poor people are precisely those who have the least to lose and the most to gain by breaking the rules. If you are starving, theft may literally save your life. Even if you’re not at the verge of starvation, the poorer you are, the worse your life prospects are, and the more unfairly the system has treated you. Most people who are rich today inherited much of their wealth from ancestors who violently stole it from other people. Why should anyone respect the rules of a system that robbed their ancestors and leaves them forsaken? Compound this with the fact that it is harder to be law-abiding when you are surrounded by thieves, and the high crime rates of inner cities hardly seem surprising.

Does this mean we should abandon criminal justice? Clearly not, for the consequences of doing so would be predictably horrendous. Temporary collapses in civil government typically lead to violent anarchy; this continued for several years in Somalia, and has happened more briefly even in Louisiana (it was not as terrible as the media initially reported, but it was still quite bad.) We do need to hold people responsible for their crimes. But what this sort of research shows is that we also need to consider situational factors when we set policy. The United States has the highest after-tax absolute poverty rate and the highest share of income claimed by the top 0.01\% of any First World nation—an astonishing 4%, meaning that the top 30,000 richest Americans have on average 400 times as much income as the average person. (My master’s thesis was actually on the subject of how this high level of inequality is related to increased corruption.) We also have the third-highest rate of murder in the OECD, after Mexico (by far the highest) and Estonia. Our homicide rate is almost three times that of Canada and over four times that of England. Even worse, the US has the highest incarceration rate in the world. Yes, that’s right; we in the US imprison a larger portion of our population than any other nation on Earth—including Iran, China, and Saudi Arabia.

Social science suggests this is no coincidence; it is our economic inequality that leads to our crime and incarceration. Nor is our poverty a result of insufficient wealth. By the standard measure Gross Domestic Product (GDP), an estimate of the total economic output a nation produces each year, the United States has the second-highest total GDP at purchasing power parity (China recently surpassed us), and the sixth-highest GDP per person in the world. We do not lack wealth; instead, we funnel wealth to the rich and deny it from the poor. If we stopped doing this, we would see a reduction in poverty and inequality, and there is reason to think that a corresponding reduction in crime would follow. We could make people act morally better simply by redistributing wealth.

Such knowledge of situational factors forces us to reconsider our ethical judgments on many subjects. It forces us to examine the ways that social, political, and economic systems influence our behavior in powerful ways. But we still have free will, and we still need to use it; in fact, in order to apply this research to our daily lives and public policies, we will need to exercise our free will very carefully.

Freedom and volition

Oct 13 JDN 2460597

Introduction

What freedom do we have to choose some actions over others, and how are we responsible for what we do? Without some kind of freedom and responsibility, morality becomes meaningless—what does it matter what we ought to do if what we will do is completely inevitable? Morality becomes a trivial exercise, trying to imagine fanciful worlds in which things were not only other than they are, but other than they ever could be.

Many people think that science and morality are incompatible precisely because science requires determinism—the causal unity of the universe, wherein all effects have causes and all systems obey conservation laws. This seems to limit our capacity for freedom, since all our actions are determined by physical causes, and could (in principle) be predicted far in advance from the state of the universe around us. In fact, quantum mechanics isn’t necessarily deterministic (though in my preferred version, the Bohm interpretation, it is), but a small amount of randomness at the level of atoms and molecules doesn’t seem to add much in the way of human freedom.

The fear is that determinism undermines human agency; if we are part of a closed causal system, how can we be free to make our own choices? In fact, this is a mistake. Determinism isn’t the right question to be asking at all. There are really four possibilities to consider:

  • Acausalism: Actions are uncaused but inevitable; everything is ultimately random and meaningless.
  • Libertarianism: Actions are uncaused and free; we are the masters of our own destiny, independent of the laws of nature.
  • Fatalism: Actions are caused and inevitable; the universe is a clockwork machine of which we are components.
  • Compatibilism: Actions are caused but free; we are rational participants in the universe’s causal mechanism.

Acausalism

Hardly anyone holds to acausalism, but it is a logically coherent position. Perhaps the universe is ultimately random, meaningless—our actions are done neither by the laws of nature nor by our own wills, but simply by the random flutterings of molecular motion. In such a universe, we are not ultimately responsible for our actions, but nor can we stop ourselves from pretending that we are, for everything we think, say, and do is determined only by the roll of the dice. This is a hopeless, terrifying approach to reality, and it would drive one to suicide but for the fact that if it is true, suicide, just like everything else, must ultimately be decided by chance.

Libertarianism

Most people, if asked—including evolutionary biologists—seem to believe something like libertarianism. (This is metaphysical libertarianism, the claim that free will is real and intrinsically uncaused; it is not to be confused with political Libertarianism.) As human beings we have an intuitive sense that we are not like the rest of the universe. Leaves fall, but people climb; everything decays, but we construct. If this is right, then morality is unproblematic: Moral rules apply to agents with this sort of deep free will, and not to other things.

But libertarian free will runs into serious metaphysical problems. If I am infected by a virus, do I choose to become sick? If I am left without food, do I choose to starve? If I am hit by a car, do I choose to be injured? Anyone can see that this is not the case: No one chooses these things—they happen, as a result of the laws of nature—physics, chemistry, biology.

Yet, so much of our lives is determined by these kinds of events: How can Stephen Hawking be said to have chosen life as a physicist and not a basketball player when he spent his whole adult life crippled by amytropic lateral sclerosis? He could not possibly have been a professional basketball player, no matter how badly he might have desired to be. Perhaps he could have been an artist or a philosopher—but still, his options were severely limited by his biology.

Indeed, it is worse than this, for we do not choose our parents, our culture, our genes; yet all of these things strongly influence who we are. I have myopia and migraines not because I wanted to, not because I did something to cause it to happen, but because I was born this way—and while myopia isn’t a serious problem with eyeglasses, migraines have adversely affected my life in many ways, and while treatment has helped me enormously, a full cure remains elusive. Culture influences us even more: It is entirely beyond my control that I speak English and live in an upper-middle-class American family; though I’m fairly happy with this result, I was never given a choice in the matter. All of these things have influenced what schools I’ve attended, what friends I’ve made, even what ideas I have considered. My brain itself is a physical system bound to the determinism of the universe. Therefore, in what sense can anything I do be considered free?

Fatalism

This reasoning leads quickly to fatalism, the notion that because everything we do is controlled by laws of nature, nothing we do is free, and we cannot rightly be held responsible for any of our actions. If this is true, then we still can’t stop ourselves from acting the way we do. People who murder will murder, people who punish murderers will punish murderers—it’s all inevitable. There may be slightly more hope in fatalism than acausalism, since it suggests that everything we do is done in some sense for a purpose, if not any purpose we would recognize or understand. Still, the thought that death and suffering, larceny and rape, starvation and genocide, are in all instances inevitable—this is the sort of idea that will keep a thoughtful person awake at night.

By way of reconciling determinism with libertarian free will, some thinkers (such as Michael Shermer) have suggested that free will is a “useful fiction”.

But the very concept of anything being useful depends upon at least a minimal degree of free will—the ability to choose actions based upon their usefulness. A fiction can only be useful if beliefs affect actions. If there even is such a thing as a “useful fiction” (I’m quite dubious of the notion), free will is certainly not an example, for in order for anything to ever be useful we must have at least some degree of free will. The best one could say under fatalism would be something like “some people happen to believe in free will and can’t change that”; but that doesn’t make free will true, it just makes many people incorrigibly wrong.

Yet the inference to fatalism is not, itself, inevitable; it doesn’t follow from the fact that much or even most of what we do is beyond our control that all we do is beyond our control. Indeed, it makes intuitive sense to say that we are in control of certain things—what we eat, what we say, how we move our bodies. We feel at least that we are in control of these things, and we can operate quite effectively on this presumption.

On the other hand, different levels of analysis yield different results. At the level of the brain, at the level of biochemistry, and especially at the level of quantum physics, there is little difference between what we choose to do and what merely happens to us. In a powerful enough microscope, being hit by a car and punching someone in the face look the same: It’s all protons and electrons interacting by exchanging photons.

Compatibilism

But free will is not inherently opposed to causality. In order to exercise free will, we must act not from chance, but from character; someone whose actions are random is not choosing freely, and conversely someone can freely choose to be completely predictable. It can be rational to choose some degree of randomness, but it cannot be rational to choose total randomness. As John Baer convincingly argues, at least some degree of causal determinacy is necessary for free will—hence, libertarianism is not viable, and a lack of determinism would lead only to acausalism. In the face of this knowledge, compatibilism is the obvious choice.

One thing that humans do that only a few other things do—some animals, perhaps computers if we’re generous—is reason; we consider alternatives and select the one we consider best. When water flows down a hill, it never imagines doing otherwise. When asteroids collide, they don’t consider other options. Yet we humans behave quite differently; we consider possibilities, reflect on our desires, seek to choose the best option. This process we call volition, and it is central to our experience of choice and freedom.

Another thing we do that other things don’t—except animals again, but definitely not computers this time—is feel emotion; we love and hurt, feel joy and sorrow. It is our emotions that motivate our actions, give them purpose. Water flowing downhill not only doesn’t choose to do so, it doesn’t care whether it does so. Sometimes things happen to us that we do not choose, but we always care.

This is what I mean when I say “free will”: experiences, beliefs, and actions are part of the same causal system. What we are affects what we think, what we think affects what we do. What we do affects what we are, and the system feeds back into itself. From this realization I can make sense of claims that people are good and bad, that acts are right and wrong; and without it I don’t think we could make sense of anything at all.

It’s not that we have some magical soul that lives outside our bodies; we are our bodies. Our brains are our souls. (I call this the Basic Fact of Cognitive Science: We are our brains.) Nor is it that neuron firings somehow “make” our thoughts and feelings as some kind of extra bonus; the patterns of neuron firings and the information that they process are our thoughts and feelings. Free will isn’t some mystical dualism; it is a direct consequence of the fact that we have capacities for conscious volition. Yes, our actions can be ultimately explained by the patterns in our brains. Of course they can! The patterns in our brains comprise our personalities, our beliefs, our memories, our desires.

Yes, the software of human consciousness is implemented on the hardware of the human brain. Why should we have expected something different? Whatever stuff makes consciousness, it is still stuff, and it obeys the laws that stuff obeys. We can imagine that we might be made of invisible fairy dust, but if that were so, then invisible fairy dust would need to be a real phenomenon and hence obey physical laws like the conservation of energy. Cognition is not opposed to physics; it is a subset of physics. Just as a computer obeys Turing’s laws if you program it but also Newton’s laws if you throw it, so humans are both mental and physical beings.

In fact, the intuitive psychology of free will is among the most powerfully and precisely predictive scientific theories ever devised, right alongside Darwinian evolution and quantum physics.

Consider the following experiment, conducted about twenty years ago. In November of 2006, I planned a road trip with several of my friends from our home in Ann Arbor to the Secular Student Alliance conference in Boston that was coming in April 2007. Months in advance, we researched hotels, we registered for the conference, we planned out how much we would need to spend. When the time came, we gathered in my car and drove the 1300 kilometers to the conference. Now, stop and think for a moment: How did I know, in November 2006, that in April 2007, on a particular date and time, E.O. Wilson would be in a particular room and so would I? Because that’s what the schedule said. Consider for a moment these two extremely complicated extended bodies in space, each interacting with thousands of other such bodies continuously; no physicist could possibly have gathered enough data to predict six months in advance that the two bodies would each travel hundreds of kilometers over the Earth’s surface in order to meet within 10 meters of one another, remain there for roughly an hour, and then split apart and henceforth remain hundreds of kilometers apart. Yet our simple intuitive psychology could, and did, make just that prediction correctly. Of course in the face of incomplete data, no theory is perfect, and the prediction could have been wrong. Indeed because Boston is exceedingly difficult to navigate (we got lost), the prediction that I and Steven Pinker would be in the same room at the same time the previous evening turned out not to be accurate. But even this is something that intuitive psychology could have taken into account better than any other scientific theory we have. Neither quantum physics nor stoichiometric chemistry nor evolutionary biology could have predicted that we’d get lost, nor recommend that if we ever return to Boston we should bring a smartphone with a GPS uplink; yet intuitive psychology can.

Moreover, intuitive psychology explicitly depends upon rational volition. If you had thought that I didn’t want to go to the conference, or that I was mistaken about the conference’s location, then you would have predicted that I would not occupy that spatial location at that time; and had these indeed been the case, that prediction would have been completely accurate. And yet, these predictions insist upon such entities as desires (wanting to go) and beliefs (being mistaken) that eliminativists, behaviorists, and epiphenomenalists have been insisting for years are pseudoscientific. Quite the opposite is the case: Eliminativism, behaviorism, and epiphenomenalism are pseudosciences.

Understanding the constituent parts of a process does not make the process an illusion. Rain did not stop falling when we developed mathematical models of meteorology. Fire did not stop being hot when we formalized statistical dynamics. Thunder did not stop being loud when we explained the wave properties of sound. Advances in computer technology have now helped us realize how real information processing can occur in systems made of physical parts that obey physical laws; it isn’t too great a stretch to think that human minds operate on similar principles. Just as the pattern of electrical firings in my computer really is Windows, the pattern of electrochemical firings in my brain really is my consciousness.

There is a kind of naive theology called “God of the gaps”; it rests upon the notion that whenever a phenomenon cannot be explained by science, this leaves room for God as an explanation. This theology is widely rejected by philosophers, because it implies that whenever science advances, religion must retreat. Libertarianism and fatalism rest upon the presumption of something quite similar, what I would call “free will of the gaps”. As cognitive science advances, we will discover more and more about the causation of human mental states; if this is enough to make us doubt free will, then “free will” was just another name for ignorance of cognitive science. I defend a much deeper sense of free will than this, one that is not at all threatened by scientific advancement.

Yes, our actions are caused—caused by what we think about the world! We are responsible for what we do not because it lacks causation, but because it has causation, specifically causation in our own beliefs, desires, and intentions. These beliefs, desires, and intentions are themselves implemented upon physical hardware, and we don’t fully understand how this implementation operates; but nonetheless the hardware is real and the phenomena are real, at least as real as such things as rocks, rivers, clouds, trees, dogs, and televisions, all of which are also complex functional ensembles of many smaller, simpler parts.

Conclusion

Libertarianism is largely discredited; we don’t have the mystical sort of free will that allows us to act outside of causal laws. But this doesn’t mean that we must accept fatalism; compatibilism is the answer. We have discovered many surprising things about cognitive science, and we will surely need to discover many more; but the fundamental truth of rational volition remains untarnished.

We know, to a high degree of certainty, that human beings are capable of volitional action. I contend that this is all the freedom we need—perhaps even all we could ever have. When a comet collides with Jupiter, and we ask “Why?”, the only sensible answer involves happenstance and laws of physics. When a leaf falls from a tree, and we ask “Why?”, we can do better, talking about evolutionary adaptations in the phylogenetic history of trees. But when a human being robs a bank, starts a war, feeds a child, or writes a book, and we ask “Why?”, we can move away from simple causes and talk about reasons—desires, intentions, beliefs; reasons, unlike mere causes, can make more or less sense, be more or less justified.

Psychological and neurological experiments have shown that volition is more complicated than we usually think—it can be strongly affected by situational factors, and it has more to do with inhibiting and selecting actions than with generating them, what Sukhvinder Obhi and Patrick Haggard call “not free will but free won’t”; yet still we have volitional control over many of our actions, and hence responsibility for them. In simple tasks, there is brain activity that predicts our behavior several seconds before we actually consciously experience the decision—but this is hardly surprising, since the brain needs to use processing power to actually generate a decision. Deliberation requires processing, not all of which can be conscious. It’s a little surprising that the activity can predict the decision in advance of the conscious experience of volition, but it can’t predict the decision perfectly, even in very simple tasks. (And in true real-life tasks, like choosing a college or a spouse, it basically can’t predict at all.) This shows that the conscious volition is doing something—perhaps inhibiting undesired behaviors or selecting desired ones. No compatibilist needs to be committed to the claim that subconscious urges have nothing to do with our decisions—since at least Freud that kind of free will has been clearly discredited.

Indeed, evolutionary psychology would be hard-pressed to explain an illusion of free will that isn’t free will. It simply doesn’t make sense for conscious volition to evolve unless it does something that affects our behavior in some way. Illusions are a waste of brain matter, which in turn is a waste of metabolic energy. (The idea that we would want to have free will in order to feel like life is worth living is profoundly silly: If our beliefs didn’t affect our behavior, our survival would be unrelated to whether we thought life was worth living!) You can make excuses and say that conscious experience is just an epiphenomenon upon neurological processes—an effect but not a cause—but there is no such thing as an “epiphenomenon” in physics as we know it. The smoke of a flame can smother that flame; the sound of a train is a sonic pressure wave that shakes the metal of the track. Anything that moves has energy, and energy is conserved. Epiphenomenalism would require new laws of physics, by which consciousness can be created ex nihilo, a new entity that requires no energy to make and “just happens” whenever certain matter is arranged in the right way.

Windows is not an “epiphenomenon” upon the electrons running through my computer’s processor core; the functional arrangement of those electrons is Windows—it implements Windows. I don’t see why we can’t say the same thing about my consciousness—that it is a software implementation by the computational hardware of my brain. Epiphenomenalists will often insist that they are being tough-minded scientists accepting the difficult facts while the rest of us are being silly and mystical; but they are talking about mysterious new physics and I’m talking about software-hardware interaction—so really, who is being mystical here?

In the future it may be possible to predict people’s behavior relatively accurately based on their brain activity—but so what? This only goes to show that the brain is the source of our decisions, which is precisely what compatibilism says. One can easily predict that rain will fall from clouds of a certain composition; but rain still falls from clouds. The fact that I can sometimes predict your behavior doesn’t make your behavior any less volitional; it only makes me a better psychologist (and for that matter a more functional human being). Moreover, detailed predictions of long-term behaviors will probably always remain impossible, due to the deep computational complexity involved. (If it were simple to predict who you’d marry, why would your brain expend so much effort working on the problem?)

For all these reasons, I say: Yes, we do have free will.