We ignorant, incompetent gods

May 21 JDN 2460086

A review of Homo Deus

The real problem of humanity is the following: We have Paleolithic emotions, medieval institutions and godlike technology.

E.O. Wilson

Homo Deus is a very good read—and despite its length, a quick one; as you can see, I read it cover to cover in a week. Yuval Noah Harari’s central point is surely correct: Our technology is reaching a threshold where it grants us unprecedented power and forces us to ask what it means to be human.

Biotechnology and artificial intelligence are now advancing so rapidly that progress in other domains, such as aerospace and nuclear energy, seems positively mundane. Who cares about making flight or electricity a bit cleaner when we may soon have the power to modify ourselves, or be replaced by machines entirely?

Indeed, we already have technology that would have seemed to ancient people like the powers of gods. We can fly; we can witness or even control events thousands of miles away; we can destroy mountains; we can wipe out entire armies in an instant; we can even travel into outer space.

Harari rightly warns us that our not-so-distant descendants are likely to have powers that we would see as godlike: Immortality, superior intelligence, self-modification, the power to create life.

And while it is scary to think about what they might do with that power if they think the way we do—as ignorant and foolish and tribal as we are—Harari points out that it is equally scary to think about what they might do if they don’t think the way we do—for then, how do they think? If their minds are genetically modified or even artificially created, who will they be? What values will they have, if not ours? Could they be better? What if they’re worse?

It is of course difficult to imagine values better than our own—if we thought those values were better, we’d presumably adopt them. But we should seriously consider the possibility, since presumably most of us believe that our values today are better than what most people’s values were 1000 years ago. If moral progress continues, does it not follow that people’s values will be better still 1000 years from now? Or at least that they could be?

I also think Harari overestimates just how difficult it is to anticipate the future. This may be a useful overcorrection; the world is positively infested with people making overprecise predictions about the future, often selling them for exorbitant fees (note that Harari was quite well-compensated for this book as well!). But our values are not so fundamentally alien from those of our forebears, and we have reason to suspect that our descendants’ values will differ from ours no more than ours differ from our forebears’.

For instance, do you think that medieval people thought suffering and death were good? I assure you they did not. Nor did they believe that the supreme purpose in life is eating cheese. (They didn’t even believe the Earth was flat!) They did not have the concept of GDP, but they could surely appreciate the value of economic prosperity.

Indeed, our world today looks very much like a medieval peasant’s vision of paradise. Boundless food in endless variety. Near-perfect security against violence. Robust health, free from nearly all infectious disease. Freedom of movement. Representation in government! The land of milk and honey is here; there they are, milk and honey on the shelves at Walmart.

Of course, our paradise comes with caveats: Not least, we are by no means free of toil, but instead have invented whole new kinds of toil they could scarcely have imagined. If anything, I would have to guess that coding a robot or recording a video lecture probably isn’t substantially more satisfying than harvesting wheat or smithing a sword; and reconciling receivables and formatting spreadsheets is surely less. Our tasks are physically much easier, but mentally much harder, and it’s not obvious which of those is preferable. And we are so very stressed! It’s honestly bizarre just how stressed we are, given the abundance in which we live; there is no reason for our lives to have stakes so high, and yet somehow they do. It is perhaps this stress and economic precarity that prevents us from feeling such joy as the medieval peasants would have imagined for us.

Of course, we don’t agree with our ancestors on everything. The medieval peasants were surely more religious, more ignorant, more misogynistic, more xenophobic, and more racist than we are. But projecting that trend forward mostly means less ignorance, less misogyny, less racism in the future; it means that future generations should see the world catch up to what the best of us already believe and strive for—hardly something to fear. The values I believe in are surely not the values we as a civilization act upon, and I sorely wish they were. Perhaps someday they will be.

I can even imagine something that I myself would recognize as better than me: Me, but less hypocritical. Strictly vegan rather than lacto-ovo-vegetarian, or at least more consistent about only buying free range organic animal products. More committed to ecological sustainability, more willing to sacrifice the conveniences of plastic and gasoline. Able to truly respect and appreciate all life, even humble insects. (Though perhaps still not mosquitoes; this is war. They kill more of us than any other animal, including us.) Not even casually or accidentally racist or sexist. More courageous, less burnt out and apathetic. I don’t always live up to my own ideals. Perhaps someday someone will.

Harari fears something much darker, that we will be forced to give up on humanist values and replace them with a new techno-religion he calls Dataism, in which the supreme value is efficient data processing. I see very little evidence of this. If it feels like data is worshipped these days, it is only because data is profitable. Amazon and Google constantly seek out ever richer datasets and ever faster processing because that is how they make money. The real subject of worship here is wealth, and that is nothing new. Maybe there are some die-hard techno-utopians out there who long for us all to join the unified oversoul of all optimized data processing, but I’ve never met one, and they are clearly not the majority. (Harari also uses the word ‘religion’ in an annoyingly overbroad sense; he refers to communism, liberalism, and fascism as ‘religions’. Ideologies, surely; but religions?)

Harari in fact seems to think that ideologies are strongly driven by economic structures, so maybe he would even agree that it’s about profit for now, but thinks it will become religion later. But I don’t really see history fitting this pattern all that well. If monotheism is directly tied to the formation of organized bureaucracy and national government, then how did Egypt and Rome last so long with polytheistic pantheons? If atheism is the natural outgrowth of industrialized capitalism, then why are Africa and South America taking so long to get the memo? I do think that economic circumstances can constrain culture and shift what sort of ideas become dominant, including religious ideas; but there clearly isn’t this one-to-one correspondence he imagines. Moreover, there was never Coalism or Oilism aside from the greedy acquisition of these commodities as part of a far more familiar ideology: capitalism.

He also claims that all of science is now, or is close to, following a united paradigm under which everything is a data processing algorithm, which suggests he has not met very many scientists. Our paradigms remain quite varied, thank you; and if they do all have certain features in common, it’s mainly things like rationality, naturalism and empiricism that are more or less inherent to science. It’s not even the case that all cognitive scientists believe in materialism (though they probably should); there are still dualists out there.

Moreover, when it comes to values, most scientists believe in liberalism. This is especially true if we use Harari’s broad sense (on which mainline conservatives and libertarians are ‘liberal’ because they believe in liberty and human rights), but even in the narrow sense of center-left. We are by no means converging on a paradigm where human life has no value because it’s all just data processing; maybe some scientists believe that, but definitely not most of us. If scientists ran the world, I can’t promise everything would be better, but I can tell you that Bush and Trump would never have been elected and we’d have a much better climate policy in place by now.

I do share many of Harari’s fears of the rise of artificial intelligence. The world is clearly not ready for the massive economic disruption that AI is going to cause all too soon. We still define a person’s worth by their employment, and think of ourselves primarily as a collection of skills; but AI is going to make many of those skills obsolete, and may make many of us unemployable. It would behoove us to think in advance about who we truly are and what we truly want before that day comes. I used to think that creative intellectual professions would be relatively secure; ChatGPT and Midjourney changed my mind. Even writers and artists may not be safe much longer.

Harari is so good at sympathetically explaining other views that he takes it to a fault. At times it is actually difficult to know whether he himself believes something and wants you to, or if he is just steelmanning someone else’s worldview. There’s a whole section on ‘evolutionary humanism’ where he details a worldview that is at best Nietzschean and at worst Nazi, but he makes it sound so seductive. I don’t think it’s what he believes, in part because he has similarly good things to say about liberalism and socialism—but it’s honestly hard to tell.

The weakest part of the book is when Harari talks about free will. Like most people, he just doesn’t get compatibilism. He spends a whole chapter talking about how science ‘proves we have no free will’, and it’s just the same old tired arguments hard determinists have always made.

He talks about how we can make choices based on our desires, but we can’t choose our desires; well of course we can’t! What would that even mean? If you could choose your desires, what would you choose them based on, if not your desires? Your desire-desires? Well, then, can you choose your desire-desires? What about your desire-desire-desires?

What even is this ultimate uncaused freedom that libertarian free will is supposed to consist in? No one seems capable of even defining it. (I’d say Kant got the closest: He defined it as the capacity to act based upon what ought rather than what is. But of course what we believe about ‘ought’ is fundamentally stored in our brains as a particular state, a way things are—so in the end, it’s an ‘is’ we act on after all.)

Maybe before you lament that something doesn’t exist, you should at least be able to describe that thing as a coherent concept? Woe is me, that 2 plus 2 is not equal to 5!

It is true that as our technology advances, manipulating other people’s desires will become more and more feasible. Harari overstates the case on so-called robo-rats; they aren’t really mind-controlled, it’s more like they are rewarded and punished. The rat chooses to go left because she knows you’ll make her feel good if she does; she’s still freely choosing to go left. (Dangling a carrot in front of a horse is fundamentally the same thing—and frankly, paying a wage isn’t all that different.) The day may yet come when stronger forms of control become feasible, and woe betide us when it does. Yet this is no threat to the concept of free will; we already knew that coercion was possible, and mind control is simply a more precise form of coercion.

Harari reports on a lot of interesting findings in neuroscience, which are important for people to know about, but they do not actually show that free will is an illusion. What they do show is that free will is thornier than most people imagine. Our desires are not fully unified; we are often ‘of two minds’ in a surprisingly literal sense. We are often tempted by things we know are wrong. We often aren’t sure what we really want. Every individual is in fact quite divisible; we literally contain multitudes.

We do need a richer account of moral responsibility that can deal with the fact that human beings often feel multiple conflicting desires simultaneously, and often experience events differently than we later go on to remember them. But at the end of the day, human consciousness is mostly unified, our choices are mostly rational, and our basic account of moral responsibility is mostly valid.

I think for now we should perhaps be less worried about what may come in the distant future, what sort of godlike powers our descendants may have—and more worried about what we are doing with the godlike powers we already have. We have the power to feed the world; why aren’t we? We have the power to save millions from disease; why don’t we? I don’t see many people blindly following this ‘Dataism’, but I do see an awful lot blindly following a 19th-century vision of capitalism.

And perhaps if we straighten ourselves out, the future will be in better hands.

Selectivity is a terrible measure of quality

May 23 JDN 2459358

How do we decide which universities and research journals are the best? There are a vast number of ways we could go about this—and there are in fact many different ranking systems out there, though only a handful are widely used. But one primary criterion which seems to be among the most frequently used is selectivity.

Selectivity is a very simple measure: What proportion of people who try to get in, actually get in? For universities this is admission rates for applicants; for journals it is acceptance rates for submitted papers.

The top-rated journals in economics have acceptance rates of 1-7%. The most prestigious universities have acceptance rates of 4-10%. So a reasonable ballpark is to assume a 95% chance of not getting accepted in either case. Of course, some applicants are more or less qualified, and some papers are more or less publishable; but my guess is that most applicants are qualified and most submitted papers are publishable. So these low acceptance rates mean refusing huge numbers of qualified people.
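Under the simplifying assumption that each attempt is an independent draw with a fixed acceptance probability (real outcomes are correlated with applicant quality, so treat this as a back-of-the-envelope sketch), those rates translate into numbers of attempts like so:

```python
import math

# Expected number of independent tries before one acceptance,
# assuming each attempt succeeds with fixed probability p.
# (A simplification: real outcomes are correlated across attempts.)

def expected_tries(p):
    """Mean of a geometric distribution with success probability p."""
    return 1 / p

def tries_for_confidence(p, confidence=0.95):
    """Attempts needed before the chance of at least one
    acceptance reaches the given confidence level."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for p in (0.01, 0.05, 0.10):
    print(f"p = {p:.0%}: expect {expected_tries(p):.0f} tries on average, "
          f"{tries_for_confidence(p)} for 95% confidence")
```

At a 5% acceptance rate, a fully qualified applicant would need around 20 attempts on average, and nearly 60 to be reasonably sure of one acceptance—which gives a sense of just how much effort these rates impose on the rejected majority.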


Selectivity is an objective, numeric score that can be easily generated and compared, and is relatively difficult to fake. This may account for its widespread appeal. And it surely has some correlation with genuine quality: Lots of people are likely to apply to a school because it is good, and lots of people are likely to submit to a journal because it is good.

But look a little bit closer, and it becomes clear that selectivity is really a terrible measure of quality.


One, it is extremely self-fulfilling. Once a school or a journal becomes prestigious, more people will try to get in there, and that will inflate its selectivity rating. Harvard is extremely selective because Harvard is famous and high-rated. Why is Harvard so high-rated? Well, in part because Harvard is extremely selective.

Two, it incentivizes restricting the number of applicants accepted.

Ivy League schools have vast endowments, and could easily afford to expand their capacity, thus employing more faculty and educating more students. But that would require reducing their acceptance rates and hence jeopardizing their precious selectivity ratings. If the goal is to give as many people as possible the highest quality education, then selectivity is a deeply perverse incentive: It specifically incentivizes not educating too many students.

Similarly, most journals include something in their rejection letters about “limited space”, which in the age of all-digital journals is utter nonsense. Journals could choose to publish ten, twenty, fifty times as many papers as they currently do—or half, or a tenth. They could publish everything that gets submitted, or only publish one paper a year. It’s an entirely arbitrary decision with no real constraints. They choose what proportion of papers to publish based primarily on three factors that have absolutely nothing to do with limited space: One, they want to publish enough papers to make it seem like they are putting out regular content; two, they want to make sure they publish anything that will turn out to be a major discovery (though they honestly seem systematically bad at predicting that); and three, they want to publish as few papers as possible within those constraints to maximize their selectivity.

To be clear, I’m not saying that journals should publish everything that gets submitted. Actually I think too many papers already get published—indeed, too many get written. The incentives in academia are to publish as many papers in top journals as possible, rather than to actually do the most rigorous and ground-breaking research. The best research often involves spending long periods of time making very little visible progress, and it does not lend itself to putting out regular publications to impress tenure committees and grant agencies.

The number of scientific papers published each year has grown at about 5% per year since 1900. The number of peer-reviewed journals has grown at an increasing rate, from about 3% per year for most of the 20th century to over 6% now. These are far in excess of population growth, technological advancement, or even GDP growth; this many scientific papers is obviously unsustainable. There are now 300 times as many scientific papers published per year as there were in 1900—while the world population has only increased by about 5-fold during that time. Yes, the number of scientists has also increased—but not that fast. About 8 million people are scientists, publishing an average of 2 million articles per year—one per scientist every four years. But the number of scientist jobs grows at just over 1%—basically tracking population growth or the job market in general. If papers published continue to grow at 5% while the number of scientists increases at 1%, then in 100 years each scientist will have to publish 48 times as many papers as today, or about 1 every month.
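The arithmetic behind that projection is straightforward compound growth; here is a quick check using the figures above (the growth rates are the rough estimates cited in the text, not precise measurements):

```python
# Compound-growth check of the projection above: papers growing
# at ~5%/year vs. scientists at ~1%/year, over 100 years.

papers_growth = 1.05
scientists_growth = 1.01
years = 100

# Papers-per-scientist multiplier after `years` years:
multiplier = (papers_growth / scientists_growth) ** years
print(f"Papers per scientist: x{multiplier:.1f}")  # roughly x48

# Current rate: ~2 million papers/year from ~8 million scientists,
# i.e. one paper per scientist every four years.
current_rate = 2_000_000 / 8_000_000   # papers per scientist per year
future_rate = current_rate * multiplier
print(f"Future rate: {future_rate:.1f} papers/year, "
      f"i.e. one every {12 / future_rate:.1f} months")
```

Even small differences in exponential growth rates compound into absurdities over a century—which is the sense in which the current trajectory is unsustainable.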


So the problem with research journals isn’t so much that journals aren’t accepting enough papers, as that too many people are submitting papers. Of course the real problem is that universities have outsourced their hiring decisions to journal editors. Rather than actually evaluating whether someone is a good teacher or a good researcher (or accepting that they can’t and hiring randomly), universities have trusted in the arbitrary decisions of research journals to decide whom they should hire.

But selectivity as a measure of quality means that journals have no reason not to support this system; they get their prestige precisely from the fact that scientists are so pressured to publish papers. The more papers get submitted, the better the journals look for rejecting them.

Another way of looking at all this is to think about what the process of acceptance or rejection entails. It is inherently a process of asymmetric information.

If we had perfect information, what would the acceptance rate of any school or journal be? 100%, regardless of quality. Only the applicants who knew they would get accepted would apply. So the total number of admitted students and accepted papers would be exactly the same, but all the acceptance rates would rise to 100%.

Perhaps that’s not realistic; but what if the application criteria were stricter? For instance, instead of asking you your GPA and SAT score, Harvard’s form could simply say: “Anyone with a GPA less than 4.0 or an SAT score less than 1500 need not apply.” That’s practically true anyway. But Harvard doesn’t have an incentive to say it out loud, because then applicants who know they can’t meet that standard won’t bother applying, and Harvard’s precious selectivity number will go down. (These are far from sufficient, by the way; I was valedictorian and had a 1590 on my SAT and still didn’t get in.)

There are other criteria they’d probably be even less willing to emphasize, but are no less significant: “If your family income is $20,000 or less, there is a 95% chance we won’t accept you.” “Other things equal, your odds of getting in are much better if you’re Black than if you’re Asian.”

For journals it might be more difficult to express the criteria clearly, but they could certainly do more than they do. Journals could more strictly delineate what kind of papers they publish: This one only for pure theory, that one only for empirical data, this one only for experimental results. They could choose more specific content niches rather than literally dozens of journals all being ostensibly about “economics in general” (the American Economic Review, the Quarterly Journal of Economics, the Journal of Political Economy, the Review of Economic Studies, the European Economic Review, the International Economic Review, Economic Inquiry… these are just the most prestigious). No doubt there would still have to be some sort of submission process and some rejections—but if they really wanted to reduce the number of submissions they could easily do so. The fact is, they want to have a large number of submissions that they can reject.

What this means is that rather than being a measure of quality, selectivity is primarily a measure of opaque criteria. It’s possible to imagine a world where nearly every school and every journal accept less than 1% of applicants; this would occur if the criteria for acceptance were simply utterly unknown and everyone had to try hundreds of places before getting accepted.


Indeed, that’s not too dissimilar to how things currently work in the job market or the fiction publishing market. The average job opening receives a staggering 250 applications. In a given year, a typical literary agent receives 5000 submissions and accepts 10 clients—so about one in every 500.

For fiction writing I find this somewhat forgivable, if regrettable; the quality of a novel is a very difficult thing to assess, and to a large degree inherently subjective. I honestly have no idea what sort of submission guidelines one could put on an agency page to explain to authors what distinguishes a good novel from a bad one (or, not quite the same thing, a successful one from an unsuccessful one).

Indeed, it’s all the worse because a substantial proportion of authors don’t even follow the guidelines that agencies do provide! The most common complaint I hear from agents and editors at writing conferences is authors not following their submission guidelines—such basic problems as submitting content from the wrong genre, not formatting it correctly, having really egregious grammatical errors. Quite frankly I wish they’d shut up about it, because I wanted to hear what would actually improve my chances of getting published, not listen to them rant about the thousands of people who can’t be bothered to follow directions. (And I’m pretty sure that those people aren’t likely to go to writing conferences and listen to agents give panel discussions.)

But for the job market? It’s really not that hard to tell who is qualified for most jobs. If it isn’t something highly specialized, most people could probably do it, perhaps with a bit of training. If it is something highly specialized, you can restrict your search to people who already have the relevant education or training. In any case, having experience in that industry is obviously a plus. Beyond that, it gets much harder to assess quality—but also much less necessary. Basically anyone with an advanced degree in the relevant subject or a few years of experience at that job will probably do fine, and you’re wasting effort by trying to narrow the field further. If it is very hard to tell which candidate is better, that usually means that the candidates really aren’t that different.

To my knowledge, not a lot of employers or fiction publishers pride themselves on their selectivity. Indeed, many fiction publishers have a policy of simply refusing unsolicited submissions, relying upon literary agents to pre-filter their submissions for them. (Indeed, even many agents refuse unsolicited submissions—which raises the question: What is a debut author supposed to do?) This is good, for if they did—if Penguin Random House (or whatever that ludicrous all-absorbing conglomerate is calling itself these days; ah, what was it like in that bygone era, when anti-trust enforcement was actually a thing?) decided to start priding itself on its selectivity of 0.05% or whatever—then the already massively congested fiction industry would probably grind to a complete halt.

This means that by ranking schools and journals based on their selectivity, we are partly incentivizing quality, but mostly incentivizing opacity. The primary incentive is for them to attract as many applicants as possible, even knowing full well that they will reject most of these applicants. They don’t want to be too clear about what they will accept or reject, because that might discourage unqualified applicants from trying and thus reduce their selectivity rate. In terms of overall welfare, every rejected application is wasted human effort—but in terms of the institution’s selectivity rating, it’s a point in their favor.

Fake skepticism

Jun 3 JDN 2458273

“You trust the mainstream media?” “Wake up, sheeple!” “Don’t listen to what so-called scientists say; do your own research!”

These kinds of statements have become quite ubiquitous lately (though perhaps the attitudes were always there, and we only began to hear them because of the Internet and social media), and are often used to defend the most extreme and bizarre conspiracy theories, from moon-landing denial to flat Earth. The amazing thing about these kinds of statements is that they can be used to defend literally anything, as long as you can find some source with less than 100% credibility that disagrees with it. (And what source has 100% credibility?)

And that, I think, should tell you something. An argument that can prove anything is an argument that proves nothing.

Reversed stupidity is not intelligence. The fact that the mainstream media, or the government, or the pharmaceutical industry, or the oil industry, or even gangsters, fanatics, or terrorists believes something does not make it less likely to be true.

In fact, the vast majority of beliefs held by basically everyone—including the most fanatical extremists—are true. I could list such consensus true beliefs for hours: “The sky is blue.” “2+2=4.” “Ice is colder than fire.”

Even if a belief is characteristic of a specifically evil or corrupt organization, that does not necessarily make it false (though it usually is evidence of falsehood in a Bayesian sense). If only terrible people believe X, then maybe you shouldn’t believe X. But if both good and bad people believe X, the fact that bad people believe X really shouldn’t matter to you.
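That Bayesian point can be made concrete with a toy update. All the numbers here are hypothetical, chosen only to illustrate the direction and size of the effect:

```python
# Toy Bayesian update: how much should "an unreliable group endorses X"
# lower your credence in X? All numbers below are hypothetical.

def posterior(prior, p_given_true, p_given_false):
    """P(X | evidence) via Bayes' rule, where the evidence is
    'the unreliable group endorses X'."""
    num = prior * p_given_true
    return num / (num + (1 - prior) * p_given_false)

prior = 0.50             # credence in X before hearing who believes it
p_given_true = 0.30      # chance the group endorses X if X is true
p_given_false = 0.60     # chance the group endorses X if X is false

print(posterior(prior, p_given_true, p_given_false))
```

With these made-up numbers the endorsement drops your credence from 0.50 to about 0.33: genuine evidence against X, but nowhere near proof of falsehood—and if good people also endorse X, the two pieces of evidence largely cancel.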

People who use this kind of argument often present themselves as being “skeptics”. They imagine that they have seen through the veil of deception that blinds others.

In fact, quite the opposite is the case: This is fake skepticism. These people are not uniquely skeptical; they are uniquely credulous. If you think the Earth is flat because you don’t trust the mainstream scientific community, that means you do trust someone far less credible than the mainstream scientific community.

Real skepticism is difficult. It requires concerted effort and investigation, and typically takes years. To really seriously challenge the expert consensus in a field, you need to become an expert in that field. Ideally, you should get a graduate degree in that field and actually start publishing your heterodox views. Failing that, you should at least be spending hundreds or thousands of hours doing independent research. If you are unwilling or unable to do that, you are not qualified to assess the validity of the expert consensus.

This does not mean the expert consensus is always right—remarkably often, it isn’t. But it means you aren’t allowed to say it’s wrong, because you don’t know enough to assess that.

This is not elitism. This is not an argument from authority. This is a basic respect for the effort and knowledge that experts spend their lives acquiring.

People don’t like being told that they are not as smart as other people—even though, with any variation at all, that’s got to be true for a certain proportion of people. But I’m not even saying experts are smarter than you. I’m saying they know more about their particular field of expertise.

Do you walk up to construction workers on the street and critique how they lay concrete? When you step on an airplane, do you explain to the captain how to read an altimeter? When you hire a plumber, do you insist on using the snake yourself?

Probably not. And why not? Because you know these people have training; they do this for a living. Yeah, well, scientists do this for a living too—and our training is much longer. To be a plumber, you need a high school diploma and an apprenticeship that usually lasts about four years. To be a scientist, you need a PhD, which means four years of college plus an additional five or six years of graduate school.

To be clear, I’m not saying you should listen to experts speaking outside their expertise. Some of the most idiotic, arrogant things ever said by human beings have been said by physicists opining on biology or economists ranting about politics. Even within a field, some people have such narrow expertise that you can’t really trust them even on things that seem related—like macroeconomists with idiotic views on trade, or ecologists who clearly don’t understand evolution.

This is also why one of the great challenges of being a good interdisciplinary scientist is actually obtaining enough expertise in both fields you’re working in; it isn’t literally twice the work (since there is overlap—or you wouldn’t be doing it—and you do specialize in particular interdisciplinary subfields), but it’s definitely more work, and there are definitely a lot of people on each side of the fence who may never take you seriously no matter what you do.

How do you tell who to trust? This is why I keep coming back to the matter of expert consensus. The world is much too complicated for anyone, much less everyone, to understand it all. We must be willing to trust the work of others. The best way we have found to decide which work is trustworthy is by the norms and institutions of the scientific community itself. Since 97% of climatologists say that climate change is caused by humans, they’re probably right. Since 99% of biologists believe humans evolved by natural selection, that’s probably what happened. Since 87% of economists oppose tariffs, tariffs probably aren’t a good idea.

Can we be certain that the consensus is right? No. There is precious little in this universe that we can be certain about. But as in any game of chance, you need to play the best odds, and my money will always be on the scientific consensus.

Social construction is not fact—and it is not fiction

July 30, JDN 2457965

With the possible exception of politically-charged issues (especially lately in the US), most people are fairly good at distinguishing between true and false, fact and fiction. But there are certain types of ideas that can’t be neatly categorized into fact versus fiction.

First, there are subjective feelings. You can feel angry, or afraid, or sad—and really, truly feel that way—despite having no objective basis for the emotion coming from the external world. Such emotions are usually irrational, but even knowing that doesn’t make them automatically disappear. Distinguishing subjective feelings from objective facts is simple in principle, but often difficult in practice: A great many things simply “feel true” despite being utterly false. (Ask an average American which is more likely to kill them, a terrorist or the car in their garage; I bet quite a few will get the wrong answer. Indeed, if you ask them whether they’re more likely to be shot by someone else or to shoot themselves, almost literally every gun owner is going to get that answer wrong—or they wouldn’t be gun owners.)

The one I really want to focus on today is social constructions. This is a term that has been so thoroughly overused and abused by postmodernist academics (“science is a social construction”, “love is a social construction”, “math is a social construction”, “sex is a social construction”, etc.) that it has almost lost its meaning. Indeed, many people now react with automatic aversion to the term; upon hearing it, they immediately assume—understandably—that whatever is about to follow is nonsense.

But there is actually a very important core meaning to the term “social construction” that we stand to lose if we throw it away entirely. A social construction is something that exists only because we all believe in it.

Every part of that definition is important:

First, a social construction is something that exists: It’s really there, objectively. If you think it doesn’t exist, you’re wrong. It even has objective properties; you can be right or wrong in your beliefs about it, even once you agree that it exists.

Second, a social construction only exists because we all believe in it: If everyone in the world suddenly stopped believing in it, like Tinker Bell it would wink out of existence. The “we all” is important as well; a social construction doesn’t exist simply because one person, or a few people, believe in it—it requires a certain critical mass of society to believe in it. Of course, almost nothing is literally believed by everyone, so it’s more that a social construction exists insofar as people believe in it—and thus can attain a weaker or stronger kind of existence as beliefs change.

The combination of these two features makes social constructions a very weird sort of entity. They aren’t merely subjective beliefs; you can’t be wrong about what you are feeling right now (though you can certainly lie about it), but you can definitely be wrong about the social constructions of your society. But we can’t all be wrong about the social constructions of our society; once enough of our society stops believing in them, they will no longer exist. And when we have conflict over a social construction, its existence can become weaker or stronger—indeed, it can exist to some of us but not to others.

If all this sounds very bizarre and reminds you of postmodernist nonsense that might come from the Wisdom of Chopra randomizer, allow me to provide a concrete and indisputable example of a social construction that is vitally important to economics: Money.

The US dollar is a social construction. It has all sorts of well-defined objective properties, from its purchasing power in the market to its exchange rate with other currencies (also all social constructions). The markets in which it is spent are social constructions. The laws which regulate those markets are social constructions. The government which makes those laws is a social construction.

But it is not social constructions all the way down. The paper upon which the dollar was printed is a physical object with objective factual existence. It is an artifact—it was made by humans, and wouldn’t exist if we didn’t—but now that we’ve made it, it exists and would continue to exist regardless of whether we believe in it or even whether we continue to exist. The cotton from which it was made is also partly artificial, bred over centuries from a lifeform that evolved over millions of years. But the carbon atoms inside that cotton were made in a star, and that star existed and fused its carbon billions of years before any life on Earth existed, much less humans in particular. This is why the statements “math is a social construction” and “science is a social construction” are so ridiculous. Okay, sure, the institutions of science and mathematics are social constructions, but that’s trivial; nobody would dispute that, and it’s not terribly interesting. (What, you mean if everyone stopped going to MIT, there would be no MIT!?) The truths of science and mathematics were true long before we were even here—indeed, the fundamental truths of mathematics could not have failed to be true in any possible universe.

But the US dollar did not exist before human beings created it, and unlike the physical paper, the purchasing power of that dollar (which is, after all, mainly what we care about) is entirely socially constructed. If everyone in the world suddenly stopped accepting US dollars as money, the US dollar would cease to be money. If even a few million people in the US suddenly stopped accepting dollars, its value would become much more precarious, and inflation would be sure to follow.

Nor is this simply because the US dollar is a fiat currency. That makes it more obvious, to be sure; a fiat currency attains its value solely through social construction, as its physical object has negligible value. But even when we were on the gold standard, our currency was representative; the paper itself was still equally worthless. If you wanted gold, you’d have to exchange for it; and that process of exchange is entirely social construction.

And what about gold coins, one of the oldest forms of money? Here the physical object might actually be useful for something, but not all that much. It’s shiny, you can make jewelry out of it, it doesn’t corrode, it can be used to replace lost teeth, it has anti-inflammatory properties—and millennia later we found out that its dense nucleus is useful for particle accelerator experiments and it is a very reliable electrical conductor useful for making microchips. But all in all, gold is really not that useful. If gold were priced based on its true usefulness, it would be extraordinarily cheap; cheaper than water, for sure, as it’s much less useful than water. Yet very few cultures have ever used water as currency (though some have used salt). Thus, most of the value of gold is itself socially constructed; you value gold not to use it, but to impress other people with the fact that you own it (or indeed to sell it to them). Stranded alone on a desert island, you’d do anything for fresh water, but gold means nothing to you. And a gold coin actually takes on additional socially-constructed value; gold coins almost always had seignorage, additional value the government received from minting them over and above the market price of the gold itself.

Economics, in fact, is largely about social constructions; or rather I should say it’s about the process of producing and distributing artifacts by means of social constructions. Artifacts like houses, cars, computers, and toasters; social constructions like money, bonds, deeds, policies, rights, corporations, and governments. Of course, there are also services, which are not quite artifacts since they stop existing when we stop doing them—though, crucially, not when we stop believing in them; your waiter still delivered your lunch even if you persist in the delusion that the lunch is not there. And there are natural resources, which existed before us (and may or may not exist after us). But these are corner cases; mostly economics is about using laws and money to distribute goods, which means using social constructions to distribute artifacts.

Other very important social constructions include race and gender. Not melanin and sex, mind you; human beings have real, biological variation in skin tone and body shape. But the concept of a race—especially the race categories we ordinarily use—is socially constructed. Nothing biological forced us to regard Kenyan and Burkinabe as the same “race” while Ainu and Navajo are different “races”; indeed, the genetic data is screaming at us in the opposite direction. Humans are sexually dimorphic, with some rare exceptions (only about 0.02% of people are intersex; about 0.3% are transgender; and no more than 5% have sex chromosome abnormalities). But the much thicker concept of gender that comes with a whole system of norms and attitudes is all socially constructed.

It’s one thing to say that perhaps males are, on average, more genetically predisposed to be systematizers than females, and thus men are more attracted to engineering and women to nursing. That could, in fact, be true, though the evidence remains quite weak. It’s quite another to say that women must not be engineers, even if they want to be, and men must not be nurses—yet the latter was, until very recently, the quite explicit and enforced norm. Standards of clothing are even more obviously socially-constructed; in Western cultures (except the Celts, for some reason), flared garments are “dresses” and hence “feminine”; in East Asian cultures, flared garments such as kimono are gender-neutral, and gender is instead expressed through clothing by subtler aspects such as being fastened on the left instead of the right. In a thousand different ways, we mark our gender by what we wear, how we speak, even how we walk—and what’s more, we enforce those gender markings. It’s not simply that males typically speak in lower pitches (which does actually have a biological basis); it’s that males who speak in higher pitches are seen as less of a man, and that is a bad thing. We have a very strict hierarchy, which is imposed in almost every culture: It is best to be a man, worse to be a woman who acts like a woman, worse still to be a woman who acts like a man, and worst of all to be a man who acts like a woman. What it means to “act like a man” or “act like a woman” varies substantially; but the core hierarchy persists.

Social constructions like these are in fact some of the most important things in our lives. Human beings are uniquely social animals, and we define our meaning and purpose in life largely through social constructions.

It can be tempting, therefore, to be cynical about this, and say that our lives are built around what is not real—that is, fiction. But while this may be true for religious fanatics who honestly believe that some supernatural being will reward them for their acts of devotion, it is not a fair or accurate description of someone who makes comparable sacrifices for “the United States” or “free speech” or “liberty”. These are social constructions, not fictions. They really do exist. Indeed, it is only because we are willing to make sacrifices to maintain them that they continue to exist. Free speech isn’t maintained by us saying things we want to say; it is maintained by us allowing other people to say things we don’t want to hear. Liberty is not protected by us doing whatever we feel like, but by not doing things we would be tempted to do that impose upon other people’s freedom. If in our cynicism we act as though these things are fictions, they may soon become so.

But it would be a lot easier to get this across to people, I think, if folks would stop saying idiotic things like “science is a social construction”.

Argumentum ab scientia is not argumentum baculo: The difference between authority and expertise

May 7, JDN 2457881

Americans are, on the whole, suspicious of authority. This is a very good thing; it shields us against authoritarianism. But it comes with a major downside, which is a tendency to forget the distinction between authority and expertise.

Argument from authority is an informal fallacy, argumentum baculo. The fact that something was said by the Pope, or the President, or the General Secretary of the UN, doesn’t make it true. (Aside: You’re probably more familiar with the phrase argumentum ad baculum, which is terrible Latin. That would mean “argument toward a stick”, when clearly the intended meaning was “argument by means of a stick”, which is argumentum baculo.)

But argument from expertise, argumentum ab scientia, is something quite different. The world is much too complicated for any one person to know everything about everything, so we have no choice but to specialize our knowledge, each of us becoming an expert in only a few things. So if you are not an expert in a subject, when someone who is an expert in that subject tells you something about that subject, you should probably believe them.

You should especially be prepared to believe them when the entire community of experts is in consensus or near-consensus on a topic. The scientific consensus on climate change is absolutely overwhelming. Is this a reason to believe in climate change? You’re damn right it is. Unless you have years of education and experience in understanding climate models and atmospheric data, you have no basis for challenging the expert consensus on this issue.

This confusion has created a deep current of anti-intellectualism in our culture, as Isaac Asimov famously recognized:

There is a cult of ignorance in the United States, and there always has been. The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that “my ignorance is just as good as your knowledge.”

This is also important to understand if you have heterodox views on any scientific topic. The fact that the whole field disagrees with you does not prove that you are wrong—but it does make it quite likely that you are wrong. Cranks often want to compare themselves to Galileo or Einstein, but here’s the thing: Galileo and Einstein didn’t act like cranks. They didn’t expect the scientific community to respect their ideas before they had gathered compelling evidence in their favor.

When behavioral economists found that neoclassical models of human behavior didn’t stand up to scrutiny, did they shout from the rooftops that economics is all a lie? No, they published their research in peer-reviewed journals, and talked with economists about the implications of their results. There may have been times when they felt ignored or disrespected by the mainstream, but they pressed on, because the data was on their side. And ultimately, the mainstream gave in: Daniel Kahneman won the Nobel Prize in Economics.

Experts are not always right, that is true. But they are usually right, and if you think they are wrong you’d better have a good reason to think so. The best reasons are the sort that come about when you yourself have spent the time and effort to become an expert, able to challenge the consensus on its own terms.

Admittedly, that is a very difficult thing to do—and more difficult than it should be. I have seen firsthand how difficult and painful the slow grind toward a PhD can be, and how many obstacles will get thrown in your way, ranging from nepotism and interdepartmental politics, to discrimination against women and minorities, to mismatches of interest between students and faculty, all the way to illness, mental health problems, and the slings and arrows of outrageous fortune in general. If you have particularly heterodox ideas, you may face particularly harsh barriers, and sometimes it behooves you to hold your tongue and toe the line awhile.

But this is no excuse not to gain expertise. Even if academia itself is not available to you, we live in an age of unprecedented availability of information—it’s not called the Information Age for nothing. A sufficiently talented and dedicated autodidact can challenge the mainstream, if their ideas are truly good enough. (Perhaps the best example of this is the mathematician savant Srinivasa Ramanujan. But he’s… something else. I think he is about as far from the average genius as the average genius is from the average person.) No, that won’t be easy either. But if you are really serious about advancing human understanding rather than just rooting for your political team (read: tribe), you should be prepared to either take up the academic route or attack it as an autodidact from the outside.

In fact, most scientific fields are actually quite good about admitting what they don’t know. A total consensus that turns out to be wrong is actually a very rare phenomenon; much more common is a clash of multiple competing paradigms where one ultimately wins out, or they end up replaced by a totally new paradigm or some sort of synthesis. In almost all cases, the new paradigm wins not because it becomes fashionable or the ancien regime dies out (as Planck cynically claimed) but because overwhelming evidence is observed in its favor, often in the form of explaining some phenomenon that was previously impossible to understand. If your heterodox theory doesn’t do that, then it probably won’t win, because it doesn’t deserve to.

(Right now you might think of challenging me: Does my heterodox theory do that? Does the tribal paradigm explain things that either total selfishness or total altruism cannot? I think it’s pretty obvious that it does. I mean, you are familiar with a little thing called “racism”, aren’t you? There is no explanation for racism in neoclassical economics; to understand it at all you have to just impose it as an arbitrary term on the utility function. But at that point, why not throw in whatever you please? Maybe some people enjoy bashing their heads against walls, and other people take great pleasure in the taste of arsenic. Why would this particular self- (not to mention other-) destroying behavior be universal to all human societies?)

In practice, I think most people who challenge the mainstream consensus aren’t genuinely interested in finding out the truth—certainly not enough to actually go through the work of doing it. It’s a pattern you can see in a wide range of fringe views: Anti-vaxxers, 9/11 truthers, climate denialists, they all think the same way. The mainstream disagrees with my preconceived ideology, therefore the mainstream is some kind of global conspiracy to deceive us. The overwhelming evidence that vaccination is safe and (wildly) cost-effective, 9/11 was indeed perpetrated by Al Qaeda and neither planned nor anticipated by anyone in the US government, and the global climate is being changed by human greenhouse gas emissions—these things simply don’t matter to them, because it was never really about the truth. They knew the answer before they asked the question. Because their identity is wrapped up in that political ideology, they know it couldn’t possibly be otherwise, and no amount of evidence will change their mind.

How do we reach such people? That, I don’t know. I wish I did. But I can say this much: We can stop taking them seriously when they say that the overwhelming scientific consensus against them is just another “appeal to authority”. It’s not. It never was. It’s an argument from expertise—there are people who know this a lot better than you, and they think you’re wrong, so you’re probably wrong.

What good are macroeconomic models? How could they be better?

Dec 11, JDN 2457734

One thing that I don’t think most people know, but which is immediately obvious to any student of economics at the college level or above, is that there is a veritable cornucopia of different macroeconomic models. There are growth models (the Solow model, the Harrod-Domar model, the Ramsey model), monetary policy models (IS-LM, aggregate demand-aggregate supply), trade models (the Mundell-Fleming model, the Heckscher-Ohlin model), large-scale computational models (dynamic stochastic general equilibrium, agent-based computational economics), and I could go on.

This immediately raises the question: What are all these models for? What good are they?

A cynical view might be that they aren’t useful at all, that this is all false mathematical precision which makes economics persuasive without making it accurate or useful. And with such a proliferation of models and contradictory conclusions, I can see why such a view would be tempting.

But many of these models are useful, at least in certain circumstances. They aren’t completely arbitrary. Indeed, one of the litmus tests of the last decade has been how well the models held up against the events of the Great Recession and following Second Depression. The Keynesian and cognitive/behavioral models did rather well, albeit with significant gaps and flaws. The Monetarist, Real Business Cycle, and most other neoclassical models failed miserably, as did Austrian and Marxist notions so fluid and ill-defined that I’m not sure they deserve to even be called “models”. So there is at least some empirical basis for deciding what assumptions we should be willing to use in our models. Yet even if we restrict ourselves to Keynesian and cognitive/behavioral models, there are still a great many to choose from, which often yield inconsistent results.

So let’s compare with a science that is uncontroversially successful: Physics. How do mathematical models in physics compare with mathematical models in economics?

Well, there are still a lot of models, first of all. There’s the Bohr model, the Schrodinger equation, the Dirac equation, Newtonian mechanics, Lagrangian mechanics, Bohmian mechanics, Maxwell’s equations, Faraday’s law, Coulomb’s law, the Einstein field equations, the Minkowski metric, the Schwarzschild metric, the Rindler metric, Feynman-Wheeler theory, the Navier-Stokes equations, and so on. So a cornucopia of models is not inherently a bad thing.

Yet, there is something about physics models that makes them more reliable than economics models.

Partly it is that the systems physicists study are literally two dozen orders of magnitude or more smaller and simpler than the systems economists study. Their task is inherently easier than ours.

But it’s not just that; their models aren’t just simpler—actually they often aren’t. The Navier-Stokes equations are a lot more complicated than the Solow model. They’re also clearly a lot more accurate.
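To give a sense of scale, the entire Solow model fits in a few lines of code. The sketch below is purely illustrative (the parameter values are conventional textbook assumptions, not calibrated estimates): capital per effective worker k evolves as dk = s·k^alpha − (n + g + delta)·k and converges to a closed-form steady state.

```python
# Minimal sketch of the Solow growth model (illustrative parameters only).
# Capital per effective worker k evolves as:
#   dk/dt = s * k**alpha - (n + g + delta) * k
# and converges to k* = (s / (n + g + delta))**(1 / (1 - alpha)).

def solow_steady_state(s, n, g, delta, alpha):
    """Closed-form steady-state capital per effective worker."""
    return (s / (n + g + delta)) ** (1 / (1 - alpha))

def simulate_solow(k0, s, n, g, delta, alpha, periods):
    """Iterate the discrete-time Solow dynamics from initial capital k0."""
    k = k0
    for _ in range(periods):
        k = k + s * k**alpha - (n + g + delta) * k
    return k

if __name__ == "__main__":
    # Illustrative parameters: 25% saving rate, 1% population growth,
    # 2% technological growth, 5% depreciation, capital share 1/3.
    s, n, g, delta, alpha = 0.25, 0.01, 0.02, 0.05, 1 / 3
    k_star = solow_steady_state(s, n, g, delta, alpha)
    k_sim = simulate_solow(1.0, s, n, g, delta, alpha, periods=500)
    print(f"steady state k* = {k_star:.4f}, simulated k = {k_sim:.4f}")
```

However you set the starting capital stock, the simulation converges to the same steady state; that is the model's whole point, and it takes a dozen lines to express.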

The feature that models in physics seem to have that models in economics do not is something we might call nesting, or maybe consistency. Models in physics don’t come out of nowhere; you can’t just make up your own new model based on whatever assumptions you like and then start using it—which you very much can do in economics. Models in physics are required to fit consistently with one another, and usually inside one another, in the following sense:

The Dirac equation strictly generalizes the Schrodinger equation, which strictly generalizes the Bohr model. Bohmian mechanics is consistent with quantum mechanics, which strictly generalizes Lagrangian mechanics, which generalizes Newtonian mechanics. The Einstein field equations are consistent with Maxwell’s equations and strictly generalize the Minkowski, Schwarzschild, and Rindler metrics. Maxwell’s equations strictly generalize Faraday’s law and Coulomb’s law.

In other words, there are a small number of canonical models—the Dirac equation, Maxwell’s equations and the Einstein field equations, essentially—inside which all other models are nested. The simpler models like Coulomb’s law and Newtonian mechanics do not contradict these canonical models; they are contained within them, subject to certain constraints (such as macroscopic systems far below the speed of light).

This is something I wish more people understood (I blame Kuhn for confusing everyone about what paradigm shifts really entail); Einstein did not overturn Newton’s laws, he extended them to domains where they previously had failed to apply.

This is why it is sensible to say that certain theories in physics are true; they are the canonical models that underlie all known phenomena. Other models can be useful, but not because we are relativists about truth or anything like that; Newtonian physics is a very good approximation of the Einstein field equations at the scale of many phenomena we care about, and is also much more mathematically tractable. If we ever find ourselves in situations where Newton’s equations no longer apply—near a black hole, traveling near the speed of light—then we know we can fall back on the more complex canonical model; but when the simpler model works, there’s no reason not to use it.
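This nesting can be made concrete with a little arithmetic. The relativistic correction factor gamma = 1/sqrt(1 − v²/c²) is indistinguishable from 1 at everyday speeds, which is exactly why the simpler model works there; the speeds in the sketch below are chosen purely for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact, by definition of the meter)

def lorentz_factor(v):
    """Relativistic correction factor gamma = 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

if __name__ == "__main__":
    # At highway speed (~30 m/s), gamma differs from 1 by roughly 5e-15,
    # so Newtonian mechanics is an essentially perfect approximation.
    print(f"car:  gamma - 1 = {lorentz_factor(30.0) - 1.0:.2e}")
    # At 90% of the speed of light the correction is large (~2.294),
    # and we must fall back on the canonical model.
    print(f"0.9c: gamma     = {lorentz_factor(0.9 * C):.4f}")
```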

There are still very serious gaps in the knowledge of physics; in particular, there is a fundamental gulf between quantum mechanics and the Einstein field equations that has been unresolved for decades. A solution to this “quantum gravity problem” would be essentially a guaranteed Nobel Prize. So even a canonical model can be flawed, and can be extended or improved upon; the result is then a new canonical model which we now regard as our best approximation to truth.

Yet the contrast with economics is still quite clear. We don’t have one or two or even ten canonical models to refer back to. We can’t say that the Solow model is an approximation of some greater canonical model that works for these purposes—because we don’t have that greater canonical model. We can’t say that agent-based computational economics is approximately right, because we have nothing to approximate it to.

I went into economics thinking that neoclassical economics needed a new paradigm. I have now realized something much more alarming: Neoclassical economics doesn’t really have a paradigm. Or if it does, it’s a very informal paradigm, one that is expressed by the arbitrary judgments of journal editors, not one that can be written down as a series of equations. We assume perfect rationality, except when we don’t. We assume constant returns to scale, except when that doesn’t work. We assume perfect competition, except when that doesn’t get the results we wanted. The agents in our models are infinite identical psychopaths, and they are exactly as rational as needed for the conclusion I want.

This is quite likely why there is so much disagreement within economics. When you can permute the parameters however you like with no regard to a canonical model, you can more or less draw whatever conclusion you want, especially if you aren’t tightly bound to empirical evidence. I know a great many economists who are sure that raising the minimum wage results in large disemployment effects, because the models they believe in say that it must, even though the empirical evidence has been quite clear that these effects are small if they are present at all. If we had a canonical model of employment that we could calibrate to the empirical evidence, that couldn’t happen anymore; there would be a coefficient I could point to that would refute their argument. But when every new paper comes with a new model, there’s no way to do that; one set of assumptions is as good as another.

Indeed, as I mentioned in an earlier post, a remarkable number of economists seem to embrace this relativism. “There is no true model,” they say; “we do what is useful.” Recently I encountered a book by the eminent economist Deirdre McCloskey which, though I confess I haven’t read it in its entirety, appears to be trying to argue that economics is just a meaningless language game that doesn’t have or need to have any connection with actual reality. (If any of you have read it and think I’m misunderstanding it, please explain. As it is I haven’t bought it for a reason any economist should respect: I am disinclined to incentivize such writing.)

Creating such a canonical model would no doubt be extremely difficult. Indeed, it is a task that would require the combined efforts of hundreds of researchers and could take generations to achieve. The true equations that underlie the economy could be totally intractable even for our best computers. But quantum mechanics wasn’t built in a day, either. The key challenge here lies in convincing economists that this is something worth doing—that if we really want to be taken seriously as scientists we need to start acting like them. Scientists believe in truth, and they are trying to find it out. While not immune to tribalism or ideology or other human limitations, they resist them as fiercely as possible, always turning back to the evidence above all else. And in their combined strivings, they attempt to build a grand edifice, a universal theory to stand the test of time—a canonical model.

The replication crisis, and the future of science

Aug 27, JDN 2457628 [Sat]

After settling in a little bit in Irvine, I’m now ready to resume blogging, but for now it will be on a reduced schedule. I’ll release a new post every Saturday, at least for the time being.

Today’s post was chosen by Patreon vote, though only one person voted (this whole Patreon voting thing has not been as successful as I’d hoped). It’s about something we scientists really don’t like to talk about, but definitely need to: We are in the middle of a major crisis of scientific replication.

Whenever large-scale projects attempt to replicate published scientific results, the replication rate is almost always dismal.

Psychology is the one everyone likes to pick on, because their record is particularly bad. Only 39% of studies were really replicated with the published effect size, though a further 36% were at least qualitatively but not quantitatively similar. Yet economics has its own replication problem, and even medical research is not immune to replication failure.

It’s important not to overstate the crisis; the majority of scientific studies do at least qualitatively replicate. We are doing better than flipping a coin, which is better than one can say of financial forecasters.

There are three kinds of replication, and only one of them should be expected to give near-100% results. That kind is reanalysis: when you take the same data and use the same methods, you absolutely should get the exact same results. I favor making reanalysis a routine requirement of publication; if we can’t get your results by applying your statistical methods to your data, then your paper needs revision before we can entrust it to publication. A number of papers have failed on reanalysis, which is absurd and embarrassing; the worst offender was probably Reinhart-Rogoff, which was used in public policy decisions around the world despite having spreadsheet errors.

The second kind is direct replication—when you do the exact same experiment again and see if you get the same result within error bounds. This kind of replication should work something like 90% of the time, but in fact works more like 60% of the time.

The third kind is conceptual replication—when you do a similar experiment designed to test the same phenomenon from a different perspective. This kind of replication should work something like 60% of the time, but actually only works about 20% of the time.

Economists are well equipped to understand and solve this crisis, because it’s not actually about science. It’s about incentives. I facepalm every time I see another article by an aggrieved statistician about the “misunderstanding” of p-values; no, scientists aren’t misunderstanding anything. They know damn well how p-values are supposed to work. So why do they keep using them wrong? Because their jobs depend on doing so.

The first key point to understand here is “publish or perish”; academics in an increasingly competitive system are required to publish their research in order to get tenure, and frequently required to get tenure in order to keep their jobs at all. (Or they could become adjuncts, who are paid one-fifth as much.)

The second is the fundamentally defective way our research journals are run (as I have discussed in a previous post). As private for-profit corporations whose primary interest is in raising more revenue, our research journals aren’t trying to publish what will genuinely advance scientific knowledge. They are trying to publish what will draw attention to themselves. It’s a similar flaw to what has arisen in our news media; they aren’t trying to convey the truth, they are trying to get ratings to draw advertisers. This is how you get hours of meaningless fluff about a missing airliner and then a single chyron scroll about a war in Congo or a flood in Indonesia. Research journals haven’t fallen quite so far because they have reputations to uphold in order to attract scientists to read them and publish in them; but still, their fundamental goal is and has always been to raise attention in order to raise revenue.

The best way to do that is to publish things that are interesting. But if a scientific finding is interesting, that means it is surprising. It has to be unexpected or unusual in some way. And above all, it has to be positive; you have to have actually found an effect. Except in very rare circumstances, the null result is never considered interesting. This adds up to making journals publish what is improbable.

In particular, it creates a perfect storm for the abuse of p-values. A p-value, roughly speaking, is the probability of getting a result at least as extreme as the one you observed if there were no effect at all—for instance, the probability that you’d observe a wage gap this large between men and women in your sample if in the real world men and women were paid the exact same wages. The standard threshold is a p-value of 0.05; indeed, it has become so enshrined that it is almost an explicit condition of publication now. Your result must be less than 5% likely to happen if there is no real difference. But if you will only publish results that show a p-value below 0.05, then the papers that get published and read will only be the ones that found such p-values—which renders the p-values meaningless.
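To see why conditioning publication on p < 0.05 destroys the p-value's meaning, here is a minimal simulation (all names here are my own, and the z-test is a normal approximation): under the null hypothesis, p-values are uniformly distributed, so about 5% of no-effect studies clear the threshold; if only those get published, the published literature on a nonexistent effect consists entirely of "significant" results.

```python
import math
import random

def two_sample_p(n=50, effect=0.0, rng=random):
    """Two-sided p-value for a difference in means, via a z-test
    with known unit variances (a normal approximation, fine for n=50)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect, 1.0) for _ in range(n)]
    z = (sum(b) / n - sum(a) / n) / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))

random.seed(0)
null_ps = [two_sample_p() for _ in range(10_000)]
published = [p for p in null_ps if p < 0.05]

# About 5% of no-effect studies pass the filter; every one of them is a
# false positive, but they are the only ones a journal reader ever sees.
print(len(published) / len(null_ps))  # ≈ 0.05
```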

It was never particularly meaningful anyway; as we Bayesians have been trying to explain since time immemorial, it matters how likely your hypothesis was in the first place. For something like wage gaps where we’re reasonably sure, but maybe could be wrong, the p-value is not too unreasonable. But if the theory is almost certainly true (“does gravity fall off as the inverse square of distance?”), even a high p-value like 0.35 is still supportive, while if the theory is almost certainly false (“are human beings capable of precognition?”—actual study), even a tiny p-value like 0.001 is still basically irrelevant. We really should be using much more sophisticated inference techniques, but those are harder to do, and don’t provide the nice simple threshold of “Is it below 0.05?”
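The point about priors can be made concrete with a two-line Bayes computation. Suppose a study has 80% power when the effect is real (my assumed number) and uses the 0.05 threshold; then the probability the hypothesis is true given a significant result depends enormously on how plausible it was to begin with:

```python
def posterior_true(prior, power=0.8, alpha=0.05):
    """P(hypothesis true | significant result), by Bayes' rule,
    assuming the stated power and significance level."""
    return prior * power / (prior * power + (1 - prior) * alpha)

print(posterior_true(0.5))    # coin-flip prior: ≈ 0.94
print(posterior_true(0.001))  # precognition-grade prior: ≈ 0.016
```

Same threshold, same data quality; yet in one case a significant result means the effect is almost certainly real, and in the other it is still almost certainly false.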

But okay, p-values can be useful in many cases—if they are used correctly and you see all the results. If you have effect X with p-values 0.03, 0.07, 0.01, 0.06, and 0.09, effect X is probably a real thing. If you have effect Y with p-values 0.04, 0.02, 0.29, 0.35, and 0.74, effect Y is probably not a real thing. But I’ve just set it up so that these would be published exactly the same. They each have two published papers with “statistically significant” results. The other papers never get published and therefore never get seen, so we throw away vital information. This is called the file drawer problem.
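The file drawer problem can be quantified. Here is a sketch using Fisher's method for combining independent p-values (the closed-form chi-square tail below works because the degrees of freedom are even; the p-values are the ones from the paragraph above):

```python
import math

def fisher_combined_p(pvals):
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2k df under the null.
    For even degrees of freedom the survival function has a closed form."""
    half = -sum(math.log(p) for p in pvals)  # half the chi-square statistic
    k = len(pvals)
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

effect_x = [0.03, 0.07, 0.01, 0.06, 0.09]
effect_y = [0.04, 0.02, 0.29, 0.35, 0.74]

# Seen in full, the combined evidence for X is roughly a hundred times
# stronger than for Y...
print(fisher_combined_p(effect_x), fisher_combined_p(effect_y))
# ...but after the file drawer, only p < 0.05 survives, and the two effects
# look similarly strong.
print(fisher_combined_p([p for p in effect_x if p < 0.05]),
      fisher_combined_p([p for p in effect_y if p < 0.05]))
```

Throwing away the null results doesn't just lose information; it actively erases the difference between a robust effect and a fragile one.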

Researchers often have a lot of flexibility in designing their experiments. If their only goal were to find truth, they would use this flexibility to test a variety of scenarios and publish all the results, so they can be compared holistically. But that isn’t their only goal; they also care about keeping their jobs so they can pay rent and feed their families. And under our current system, the only way to ensure that you can do that is by publishing things, which basically means only including the parts that showed up as statistically significant—otherwise, journals aren’t interested. And so we get huge numbers of papers published that tell us basically nothing, because we set up such strong incentives for researchers to give misleading results.

The saddest part is that this could be easily fixed.

First, reduce the incentives to publish by finding other ways to evaluate the skill of academics—like teaching, for goodness’ sake. Working papers are another good approach. Journals already get far more submissions than they know what to do with, and most of these papers will never be read by more than a handful of people. We don’t need more published findings, we need better published findings—so stop incentivizing mere publication and start finding ways to incentivize research quality.

Second, eliminate private for-profit research journals. Science should be done by government agencies and nonprofits, not for-profit corporations. (And yes, I would apply this to pharmaceutical companies as well, which should really be pharmaceutical manufacturers who make cheap drugs based off of academic research and carry small profit margins.) Why? Again, it’s all about incentives. Corporations have no reason to want to find truth and every reason to want to tilt it in their favor.

Third, increase the number of tenured faculty positions. Instead of building so many new grand edifices to please your plutocratic donors, use your (skyrocketing) tuition money to hire more professors so that you can teach more students better. You can find even more funds if you cut the salaries of your administrators and football coaches. Come on, universities; you are the one industry in the world where labor demand and labor supply are the same people a few years later. You have no excuse for not having the smoothest market clearing in the world. You should never have gluts or shortages.

Fourth, require pre-registration of research studies (as some branches of medicine already do). If the study is sound, an optimal rational agent shouldn’t care in the slightest whether it had a positive or negative result, and if our ape brains won’t let us think that way, we need to establish institutions to force it to happen. Journal editors shouldn’t even see the effect size and p-value before they make the decision to publish; all they should care about is that the experiment makes sense and the proper procedure was followed.

If we did all that, the replication crisis could be almost completely resolved, as the incentives would be realigned to more closely match the genuine search for truth.

Alas, I don’t see universities or governments or research journals having the political will to actually make such changes, which is very sad indeed.

The Expanse gets the science right—including the economics

JDN 2457502

Despite constantly working on half a dozen projects at once (literally—preparing to start my PhD, writing this blog, working at my day job, editing a novel, preparing to submit a nonfiction book, writing another nonfiction book with three of my friends as co-authors, and creating a card game—that’s seven actually), I do occasionally find time to do things for fun. One I’ve been doing lately is catching up on The Expanse on DVR (I’m about halfway through the first season so far).

If you’re not familiar with The Expanse, it has been fairly aptly described as Battlestar Galactica meets Game of Thrones, though I think that particular comparison misrepresents the tone and attitudes of the series, because both BG and GoT are so dark and cynical (“It’s a nice day… for a… red wedding!”). I think “Star Trek meets Game of Thrones” might be better actually—the extreme idealism of Star Trek would cancel out the extreme cynicism of Game of Thrones, with the result being a complex mix of idealism and cynicism that more accurately reflects the real world (a world where Mahatma Gandhi and Adolf Hitler lived at the same time). That complex, nuanced world (or should I say worlds?) is where The Expanse takes place. Star Trek is also more geopolitical than BG, and The Expanse is nothing if not geopolitical.

But The Expanse is not just psychologically realistic—it is also scientifically and economically realistic. It may in fact be the hardest science fiction I have ever encountered, and is definitely the hardest science fiction I’ve seen in a television show. (There are a few books that might be slightly harder, as well as some movies based on them.)

The only major scientific inaccuracy I’ve been able to find so far is the use of sound effects in space, and actually even these can be interpreted as reflecting an omniscient narrator perspective that would hear any sounds that anyone would hear, regardless of what planet or ship they might be on. The sounds the audience hears all seem to be sounds that someone would hear—there’s simply no particular person who would hear all of them. When people are actually thrown into hard vacuum, we don’t hear them make any noise.

Like Firefly (and for once I think The Expanse might actually be good enough to deserve that comparison), there is no FTL, no aliens, no superhuman AI. Human beings are bound within our own solar system, and travel between planets takes weeks or months depending on your energy budget. They actually show holograms projecting the trajectory of various spacecraft and the trajectories actually make good sense in terms of orbital mechanics. Finally screenwriters had the courage to give us the terrifying suspense and inevitability of an incoming nuclear missile rounding a nearby asteroid and intercepting your trajectory, where you have minutes to think about it but not nearly enough delta-v to get out of its blast radius. That is what space combat will be like, if we ever have space combat (as awesome as it is to watch, I strongly hope that we will not ever actually do it). Unlike what Star Trek would have you believe, space is not a 19th century ocean.

They do have stealth in space—but it requires technology that even to them is highly advanced. Moreover it appears to only work for relatively short periods and seems most effective against civilian vessels that would likely lack state-of-the-art sensors, both of which make it a lot more plausible.

Computers are more advanced in the 2200s than they were in the 2000s, but not radically so: at most a million times faster, about the gain we’ve seen since the 1980s. I’m guessing a smartphone in The Expanse runs at a few petaflops. Essentially they’re banking on Moore’s Law finally dying sometime in the mid-21st century, but then, so am I. Perhaps a bit harder to swallow is that no one has figured out good enough heuristics to match human cognition; but then, human cognition is very tightly optimized.

Spacecraft don’t have artificial gravity except for the thrust of their engines, and people float around as they should when ships are freefalling. They actually deal with the fact that Mars and Ceres have lower gravity than Earth, and the kinds of health problems that result from this. (One thing I do wish they’d done is had the Martian cruiser set a cruising acceleration of Mars-g—about 38% Earth-g—that would feel awkward and dizzying to their Earther captives. Instead they basically seem to assume that Martians still like to use Earth-g for space transit, but that does make some sense in terms of both human health and simply transit time.) It doesn’t seem like people move around quite awkwardly enough in the very low gravity of Ceres—which should be only about 3% Earth-g—but they do establish that electromagnetic boots are ubiquitous and that could account for most of this.

They fight primarily with nuclear missiles and kinetic weapons, and the damage done by nuclear missiles is appropriately reduced by the fact that vacuum doesn’t transmit shockwaves. (Nuclear missiles would still be quite damaging in space by releasing large amounts of wide-spectrum radiation; but they wouldn’t cause the total devastation they do within atmosphere.) Oddly they decided not to go with laser weapons as far as I can tell, which actually seems to me like they’ve underestimated advancement; laser weapons have a number of advantages that would be particularly useful in space, once we can actually make them affordable and reliable enough for widespread deployment. There could also be a three-tier system, where missiles are used at long range, railguns at medium range, and lasers at short range. (Yes, short range—at short range a laser’s advantage in speed over a good railgun is only slight, and at long range it would be more than offset by the effect of diffraction. At orbital distances, a laser is a shotgun.) Then again, it could well work out that railguns are just better—depending on how vessels are structured, puncturing their hulls with kinetic rounds could well be more useful than burning them up with infrared lasers.
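The diffraction point is easy to sanity-check with the standard Rayleigh criterion. The wavelength and aperture below are my own illustrative assumptions, not anything from the show:

```python
# Diffraction-limited beam divergence: theta ≈ 1.22 * wavelength / aperture.
# Assumed numbers: 1.064-micron infrared light, a 1 m emitter aperture.
wavelength = 1.064e-6  # meters
aperture = 1.0         # meters
theta = 1.22 * wavelength / aperture  # half-angle, radians

for distance_km in (100, 1_000, 100_000):
    spot_radius_m = theta * distance_km * 1_000
    print(f"{distance_km:>7,} km: spot radius ≈ {spot_radius_m:,.1f} m")
```

At 100 km the beam is still focused to tens of centimeters; by 100,000 km (a quarter of the way to the Moon) it has spread over a football field. Hence: a shotgun.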

But I think what really struck me about the realism of The Expanse is how it even makes the society realistic (in a way that, say, Firefly really doesn’t—we wanted a Western and we got a Western!).

The only major offworld colonies are Mars and Ceres, both of which seem to be fairly well-established, probably originally colonized as much as a century ago. Different societies have formed on each world; Earth has largely united under the United Nations (one of the lead characters is an undersecretary for the UN), but meanwhile Mars has split off into its own independent nation (“Martian” is now an ethnicity like “German” rather than meaning “extraterrestrial”), and the asteroid belt colonists, while formally still under Earth’s government, think of themselves as a different culture (“Belters”) and are seeking independence. There are some fairly obvious—but deftly managed rather than heavy-handed—parallels between the Belter independence movement and real-world independence movements, particularly Palestine (it’s hard not to think of the PLO when they talk about the OPA). Both Mars and the Belt have their own languages, while Earth’s languages have largely coalesced around English as the language of politics and commerce. (If the latter seems implausible, I remind you that the majority of the Internet and all international air traffic control are in English.) English is the world’s lingua franca (a bizarre turn of phrase, given that it’s actually Italian for “Frankish language”).

There is some of the conniving and murdering of Game of Thrones, but it is at a much more subdued level, and all of the major factions display both merits and flaws. There is no clear hero and no clear villain, just conflict and misunderstanding between a variety of human beings each with their own good and bad qualities. There does seem to be a sense that the most idealistic characters suffer for their idealism much as the Starks often do, but unlike the Starks they usually survive and learn from the experience. Indeed, some of the most cynical also seem to suffer for their cynicism—in the episode I just finished, the grizzled UN Colonel assumed the worst of his adversary and ended up branded “the butcher of Anderson Station”.

Cost of living on Ceres is extraordinarily high because of the limited living space (the apartments look a lot like the tiny studios of New York or San Francisco), and above all the need to constantly import air and water from Earth. A central plot point in the first episode is that a ship carrying comet ice—i.e., water—to Ceres is lost in a surprise attack by unknown adversaries with advanced technology, and the result is a deepening of an already dire water shortage, exacerbating the Belters’ craving for rebellion.

Air and water are recyclable, so it wouldn’t be that literally every drink and every breath needs to be supplied from outside—indeed that would clearly be cost-prohibitive. But recycling is never perfect, and Ceres also appears to have a growing population, both of which would require a constant input of new resources to sustain. It makes perfect sense that the most powerful people on Ceres are billionaire tycoons who own water and air transport corporations.

The police on Ceres (of which another lead character is a detective) are well-intentioned but understaffed, underfunded and moderately corrupt, similar to what we seem to find in large inner-city police departments like the NYPD and LAPD. It felt completely right when they responded to an attempt to kill a police officer with absolutely overwhelming force and little regard for due process and procedure—for this is what real-world police departments almost always do.

But why colonize the asteroid belt at all? Mars is a whole planet, there is plenty there—and in The Expanse they are undergoing terraforming at a very plausible rate (there’s a moving scene where a Martian says to an Earther, “We’re trying to finish building our garden before you finish paving over yours.”). Mars has as much land as Earth, and it has water, abundant metals, and CO2 you could use to make air. Even just the frontier ambition could be enough to bring us to Mars.

But why go to Ceres? The explanation The Expanse offers is a very sensible one: Mining, particularly so-called “rare earth metals”. Gold and platinum might have been profitable to mine at first, but once they became plentiful the market would probably collapse or at least drop off to a level where they aren’t particularly expensive or interesting—because they aren’t useful for very much. But neodymium, scandium, and promethium are all going to be in extremely high demand in a high-tech future based on nuclear-powered spacecraft, and given that we’re already running out of easily accessible deposits on Earth, by the 2200s there will probably be basically none left. The asteroid belt, however, will have plenty for centuries to come.

As a result Ceres is organized like a mining town, or perhaps an extractive petrostate (metallostate?); but due to lightspeed interplanetary communication—very important in the series—and some modicum of free speech it doesn’t appear to have attained more than a moderate level of corruption. This also seems realistic; the “end-of-history” thesis is often overstated, but the basic idea that some form of democracy and welfare-state capitalism is fast becoming the only viable model of governance does seem to be true, and that is almost certainly the model of governance we would export to other planets. In such a system corruption can only get so bad before it is shown on the mass media and people won’t take it anymore.

The show doesn’t deal much with absolute dollar (or whatever currency) numbers, which is probably wise; but nominal incomes on Ceres are likely extremely high even though the standard of living is quite poor, because the tiny living space and need to import air and water would make prices (literally?) astronomical. Most people on Ceres seem to have grown up there, but the initial attraction could have been something like the California Gold Rush, where rumors of spectacularly high incomes clashed with similarly spectacular expenses incurred upon arrival. “Become a millionaire!” “Oh, by the way, your utility bill this month is $112,000.”

Indeed, even the poor on Ceres don’t seem that poor, which is a very nice turn toward realism that a lot of other science fiction shows seem unprepared to make. In Firefly, the poor are poor—they can barely afford food and clothing, and have no modern conveniences whatsoever. (“Jaynestown”, perhaps my favorite episode, depicts this vividly.) But even the poor in the US today are rarely that poor; our minimalistic and half-hearted welfare state has a number of cracks one can fall through, but as long as you get the benefits you’re supposed to get you should be able to avoid starvation and homelessness. Similarly I find it hard to believe that any society with high enough productivity to routinely build interplanetary spacecraft the way we build container ships would not have at least the kind of welfare state that provides for the most basic needs. Chronic dehydration is probably still a problem for Belters, because water would be too expensive to subsidize in this way; but they all seem to have fairly nice clothes, home appliances, and smartphones, and that seems right to me. At one point a character loses his arm, and the “cheap” solution is a cybernetic prosthetic—the “expensive” one would be to grow him a new arm. As today but perhaps even more so, poverty in The Expanse is really about inequality—the enormous power granted to those who have millions of times as much as others. (Another show that does this quite well, though it is considerably softer as far as the physics, is Continuum. If I recall correctly, Alec Sadler in 2079 is literally a trillionaire.)

Mars also appears to be a democracy, and actually quite a thriving one. In many ways Mars appears to be surpassing Earth economically and technologically. This suggests that Mars was colonized with our best and brightest, but not necessarily; Australians have done quite well for themselves despite being founded as a penal colony. Mars colonization would also have a way of justifying their frontier idealism that no previous frontiers have granted: No indigenous people to displace, no local ecology to despoil, and no gifts from the surrounding environment. You really are working entirely out of your own hard work and know-how (and technology and funding from Earth of course) to establish a truly new world on the open and unspoiled frontier. You’re not naive or a hypocrite; it’s the real truth. That kind of realistic idealism could make the Martian Dream a success in ways even the American Dream never quite was.

In all it is a very compelling series, and should appeal to people like me who crave geopolitical nuance in fiction. But it also has its moments of huge space battles with exploding star cruisers, so there’s that.

What does correlation have to do with causation?

JDN 2457345

I’ve been thinking of expanding the topics of this blog into some basic statistics and econometrics. It has been said that there are “Lies, damn lies, and statistics”; but in fact it’s almost the opposite—there are truths, whole truths, and statistics. Almost everything in the world that we know—not merely guess, or suppose, or intuit, or believe, but actually know, with a quantifiable level of certainty—is done by means of statistics. All sciences are based on them, from physics (when they say the Higgs discovery is a “5-sigma event”, that’s a statistic) to psychology, ecology to economics. Far from being something we cannot trust, they are in a sense the only thing we can trust.

The reason it sometimes feels like we cannot trust statistics is that most people do not understand statistics very well; this creates opportunities for both accidental confusion and willful distortion. My hope is therefore to provide you with some of the basic statistical knowledge you need to combat the worst distortions and correct the worst confusions.

I wasn’t quite sure where to start on this quest, but I suppose I have to start somewhere. I figured I may as well start with an adage about statistics that I hear commonly abused: “Correlation does not imply causation.”

Taken at its original meaning, this is definitely true. Unfortunately, it can be easily abused or misunderstood.

In its original meaning (the formal sense of the word “imply”, i.e. logical implication), to “imply” something is an extremely strong statement. It means that you logically entail that result: if the antecedent is true, the consequent must be true, on pain of logical contradiction. Logical implication is for most practical purposes synonymous with mathematical proof. (Unfortunately, it’s not quite synonymous, because of things like Gödel’s incompleteness theorems and Löb’s theorem.)

And indeed, correlation does not logically entail causation; it’s quite possible to have correlations without any causal connection whatsoever, simply by chance. One of my former professors liked to brag that from 1990 to 2010 whether or not she ate breakfast had a statistically significant positive correlation with that day’s closing price for the Dow Jones Industrial Average.

How is this possible? Did my professor actually somehow influence the stock market by eating breakfast? Of course not; if she could do that, she’d be a billionaire by now. And obviously the Dow’s price at 17:00 couldn’t influence whether she ate breakfast at 09:00. Could there be some common cause driving both of them, like the weather? I guess it’s possible; maybe in good weather she gets up earlier and people are in better moods so they buy more stocks. But the most likely reason for this correlation is much simpler than that: She tried a whole bunch of different combinations until she found two things that correlated. At the usual significance level of 0.05, on average you need to try about 20 combinations of totally unrelated things before two of them will show up as correlated. (My guess is she used a number of different stock indexes and varied the starting and ending year. That’s a way to generate a surprisingly large number of degrees of freedom without it seeming like you’re doing anything particularly nefarious.)
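The professor's trick is easy to reproduce in simulation. Here is a sketch (pure Python, with a Fisher z approximation for the correlation p-value; the sample sizes and trial counts are my arbitrary choices): generate pairs of completely unrelated random series, test 20 pairings at a time, and count how often at least one comes up "significant".

```python
import math
import random

def null_correlation_p(n=100, rng=random):
    """Two-sided p-value for the Pearson correlation of two independent
    random series, via the Fisher z transform (normal approximation)."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
trials = 1_000
hits = sum(any(null_correlation_p() < 0.05 for _ in range(20))
           for _ in range(trials))
print(hits / trials)  # ≈ 1 - 0.95**20 ≈ 0.64
```

Each individual test is honest; it is only the freedom to run twenty of them and report the winner that manufactures the "discovery".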

But how do we know they aren’t actually causally related? Well, I suppose we don’t. Especially if the universe is ultimately deterministic and nonlocal (as I’ve become increasingly convinced by the results of recent quantum experiments), any two data sets could be causally related somehow. But the point is they don’t have to be; you can pick any randomly-generated datasets, pair them up in 20 different ways, and odds are, one of those ways will show a statistically significant correlation.

All of that is true, and important to understand. Finding a correlation between eating grapefruit and getting breast cancer, or between liking bitter foods and being a psychopath, does not necessarily mean that there is any real causal link between the two. If we can replicate these results in a bunch of other studies, that would suggest that the link is real; but typically, such findings cannot be replicated. There is something deeply wrong with the way science journalists operate; they like to publish the new and exciting findings, which 9 times out of 10 turn out to be completely wrong. They never want to talk about the really important and fascinating things that we know are true because we’ve been confirming them over hundreds of different experiments, because that’s “old news”. The journalistic desire to be new and first fundamentally contradicts the scientific requirement of being replicated and confirmed.

So, yes, it’s quite possible to have a correlation that tells you absolutely nothing about causation.

But this is exceptional. In most cases, correlation actually tells you quite a bit about causation.

And this is why I don’t like the adage; “imply” has a very different meaning in common speech, meaning merely to suggest or evoke. Almost everything you say implies all sorts of things in this broader sense, some more strongly than others, even though it may logically entail none of them.

Correlation does in fact suggest causation. Like any suggestion, it can be overridden. If we know that 20 different combinations were tried until one finally yielded a correlation, we have reason to distrust that correlation. If we find a correlation between A and B but there is no logical way they can be connected, we infer that it is simply an odd coincidence.

But when we encounter any given correlation, there are three other scenarios which are far more likely than mere coincidence: A causes B, B causes A, or some other factor C causes A and B. These are also not mutually exclusive; they can all be true to some extent, and in many cases are.

A great deal of work in science, and particularly in economics, is based upon using correlation to infer causation, and has to be—because there is simply no alternative means of approaching the problem.

Yes, sometimes you can do randomized controlled experiments, and some really important new findings in behavioral economics and development economics have been made this way. Indeed, much of the work that I hope to do over the course of my career is based on randomized controlled experiments, because they truly are the foundation of scientific knowledge. But sometimes, that’s just not an option.

Let’s consider an example: In my master’s thesis I found a strong correlation between the level of corruption in a country (as estimated by the World Bank) and the proportion of that country’s income which goes to the top 0.01% of the population. Countries that have higher levels of corruption also tend to have a larger proportion of income that accrues to the top 0.01%. That correlation is a fact; it’s there. There’s no denying it. But where does it come from? That’s the real question.

Could it be pure coincidence? Well, maybe; but when it keeps showing up in several different models with different variables included, that becomes unlikely. A single p < 0.05 will happen about 1 in 20 times by chance; but five in a row should happen less than 1 in 1 million times (assuming they’re independent, which, to be fair, they usually aren’t).

Could it be some artifact of the measurement methods? It’s possible. In particular, I was concerned about the possibility of a halo effect, in which people tend to assume that something which is better (or worse) in one way is automatically better (or worse) in other ways as well. People might think of their country as more corrupt simply because it has higher inequality, even if there is no real connection. But it would have taken a very large halo bias to explain this effect.

So, does corruption cause income inequality? It’s not hard to see how that might happen: More corrupt individuals could bribe leaders or exploit loopholes to make themselves extremely rich, and thereby increase inequality.

Does inequality cause corruption? This also makes some sense, since it’s a lot easier to bribe leaders and manipulate regulations when you have a lot of money to work with in the first place.

Does something else cause both corruption and inequality? Also quite plausible. Maybe some general cultural factors are involved, or certain economic policies lead to both corruption and inequality. I did try to control for such things, but I obviously couldn’t include all possible variables.

So, which way does the causation run? Unfortunately, I don’t know. I tried some clever statistical techniques to try to figure this out; in particular, I looked at which tends to come first—the corruption or the inequality—and whether they could be used to predict each other, a method called Granger causality. Those results were inconclusive, however. I could neither verify nor exclude a causal connection in either direction. But is there a causal connection? I think so. It’s too robust to just be coincidence. I simply don’t know whether A causes B, B causes A, or C causes A and B.
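For the curious, here is a toy sketch of the idea behind Granger causality (numpy only, no formal F-test, and all names and numbers are my own illustrative choices, not anything from the thesis): predict each series from its own lags, then ask whether adding the other series' lags shrinks the residual variance.

```python
import numpy as np

def residual_variance(X, y):
    """Mean squared residual of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid) / len(y)

def granger_ratio(a, b, lags=2):
    """Restricted / unrestricted residual variance for predicting b.
    Ratios well above 1 suggest a's past helps predict b,
    i.e. that a 'Granger-causes' b."""
    T = len(b)
    y = np.asarray(b[lags:], dtype=float)
    own = np.column_stack(
        [np.ones(T - lags)] +
        [np.asarray(b[lags - k:T - k], dtype=float) for k in range(1, lags + 1)])
    both = np.column_stack(
        [own] +
        [np.asarray(a[lags - k:T - k], dtype=float) for k in range(1, lags + 1)])
    return residual_variance(own, y) / residual_variance(both, y)

# A toy world where a genuinely drives b, but not vice versa:
rng = np.random.default_rng(0)
T = 500
a = np.zeros(T)
b = np.zeros(T)
for t in range(1, T):
    a[t] = 0.5 * a[t - 1] + rng.normal()
    b[t] = 0.5 * b[t - 1] + 0.8 * a[t - 1] + rng.normal()

print(granger_ratio(a, b))  # well above 1: a's past helps predict b
print(granger_ratio(b, a))  # near 1: b's past adds little about a
```

In this toy world the asymmetry is built in, so the two ratios separate cleanly; the inconclusive case in real data corresponds to both ratios being modest and similar.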

Imagine trying to do this same study as a randomized controlled experiment. Are we supposed to create two societies and flip a coin to decide which one we make more corrupt? Or which one we give more income inequality? Perhaps you could do some sort of experiment with a proxy for corruption (cheating on a test or something like that), and then have unequal payoffs in the experiment—but that is very far removed from how corruption actually works in the real world, and worse, it’s prohibitively expensive to make really life-altering income inequality within an experimental context. Sure, we can give one participant $1 and the other $1,000; but we can’t give one participant $10,000 and the other $10 million, and it’s the latter that we’re really talking about when we deal with real-world income inequality. I’m not opposed to doing such an experiment, but it can only tell us so much. At some point you need to actually test the validity of your theory in the real world, and for that we need to use statistical correlations.

Or think about macroeconomics; how exactly are you supposed to test a theory of the business cycle experimentally? I guess theoretically you could subject an entire country to a new monetary policy selected at random, but the consequences of being put into the wrong experimental group would be disastrous. Moreover, nobody is going to accept a random monetary policy democratically, so you’d have to introduce it against the will of the population, by some sort of tyranny or at least technocracy. Even if this is theoretically possible, it’s mind-bogglingly unethical.

Now, you might be thinking: But we do change real-world policies, right? Couldn’t we use those changes as a sort of “experiment”? Yes, absolutely; that’s called a quasi-experiment or a natural experiment. They are tremendously useful. But since they are not truly randomized, they aren’t quite experiments. Ultimately, everything you get out of a quasi-experiment is based on statistical correlations.

Thus, abuse of the adage “Correlation does not imply causation” can lead to ignoring whole subfields of science, because there is no realistic way of running experiments in those subfields. Sometimes, statistics are all we have to work with.

This is why I like to say it a little differently:

Correlation does not prove causation. But correlation definitely can suggest causation.
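The "does not prove" half is easy to see with simulated data: a hidden common cause can generate a strong correlation between two variables that have no causal link in either direction. A minimal Python sketch (all data simulated, no real-world quantities implied):

```python
# A correlation produced entirely by a hidden common cause.
import random

random.seed(0)

# z is the confounder; x and y each depend on z but not on each other.
z = [random.gauss(0, 1) for _ in range(10_000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

def corr(a, b):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    var_a = sum((ai - ma) ** 2 for ai in a) / n
    var_b = sum((bi - mb) ** 2 for bi in b) / n
    return cov / (var_a * var_b) ** 0.5

# x and y correlate strongly (about 0.5 by construction),
# even though neither causes the other.
print(f"corr(x, y) = {corr(x, y):.2f}")
```

But notice the flip side: the correlation here is not a fluke. It is telling us that *something* causal is going on, namely the common cause z. That is the sense in which correlation can suggest causation, even when it cannot pin down the direction.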

Why being a scientist means confronting your own ignorance

I read an essay today arguing that scientists should be stupid. Or more precisely, ignorant. Or even more precisely, they should recognize their ignorance when all others ignore it and turn away.

What does it feel like to be wrong?

It doesn’t feel like anything. Most people are wrong most of the time without realizing it. (Explained brilliantly in this TED talk.)

What does it feel like to be proven wrong, to find out you were confused or ignorant?

It hurts, a great deal. And most people flinch away from this. They would rather continue being wrong than experience the feeling of being proven wrong.

But being proven wrong is the only way to become less wrong. Being proven ignorant is the only way to truly attain knowledge.

I once heard someone characterize the scientific temperament as “being comfortable not knowing”. No, no, no! Just the opposite, in fact. The unscientific temperament is being comfortable not knowing, being fine with your infinite ignorance as long as you can go about your day. The scientific temperament is being so deeply uncomfortable not knowing that it overrides the discomfort everyone feels when their beliefs are proven wrong. It is to have a drive to actually know—not to think you know, not to feel as if you know, not to assume you know and never think about it, but to actually know—that is so strong it pushes you through all the pain and doubt and confusion of actually trying to find out.

An analogy I like to use is The Armor of Truth. Suppose you were presented with a piece of armor, The Armor of Truth, which is claimed to be indestructible. You will have the chance to wear this armor into battle; if it is indeed indestructible, you will be invincible and will surely prevail. But what if it isn’t? What if it has some weakness you aren’t aware of? Then it could fail and you could die.

How would you go about determining whether The Armor of Truth is really what it is claimed to be? Would you test it with things you expect it to survive? Would you brush it with feathers, pour glasses of water on it, poke it with your finger? Would you seek to confirm your belief in its indestructibility? (As confirmation bias would have you do?) No, you would test it with things you expect to destroy it; you’d hit it with everything you have. You’d fire machine guns at it, drop bombs on it, pour acid on it, place it in a nuclear testing site. You’d do everything you possibly could to falsify your belief in the armor’s indestructibility. And only when you failed, only after you had tried everything you could think of to destroy the armor and it remained undented and unscratched, would you begin to believe that it is truly indestructible. (Popper was exaggerating when he said all science is based on falsification; but he was not exaggerating very much.)

Science is The Armor of Truth, and we wear it into battle—but now the analogy begins to break down, for our beliefs are within us, they are part of us. We’d like to be able to point the machine guns at armor far away from us, but instead it is as if we are forced to wear the armor as the guns are fired. When a break in the armor is found and a bullet passes through—a belief we dearly held is proven false—it hurts us, and we wish we could find another way to test it. But we can’t; and if we fail to test it now, it will only endanger us later—confront a false belief with reality enough and it will eventually fail. A scientist is someone who accepts this and wears the armor bravely as the test guns blaze.

Being a scientist means confronting your own ignorance: Not accepting it, but also not ignoring it; confronting it. Facing it down. Conquering it. Destroying it.