Social science is broken. Can we fix it?

May 16 JDN 2459349

Social science is broken. I am of course not the first to say so. The Atlantic recently published an article outlining the sorry state of scientific publishing, and several years ago Slate Star Codex published a lengthy post (with somewhat harsher language than I generally use on this blog) showing how parapsychology, despite being obviously false, can still meet the standards that most social science is expected to meet. I myself discussed the replication crisis in social science on this very blog a few years back.

I was pessimistic then about the incentives of scientific publishing being fixed any time soon, and I am even more pessimistic now.

Back then I noted that journals are often run by for-profit corporations that care more about getting attention than getting the facts right, university administrations are incompetent and top-heavy, and publish-or-perish creates cutthroat competition without providing incentives for genuinely rigorous research. But these are widely known facts, even if so few in the scientific community seem willing to face up to them.

Now I am increasingly concerned that the reason we aren’t fixing this system is that the people with the most power to fix it don’t want to. (Indeed, as I have learned more about political economy I have come to believe this more and more about all the broken institutions in the world. American democracy has its deep flaws because politicians like it that way. China’s government is corrupt because that corruption is profitable for many of China’s leaders. Et cetera.)

I know economics best, so that is where I will focus; but most of what I’m saying here would also apply to other social sciences such as sociology and psychology. (Indeed it was psychology that published Daryl Bem.)

Rogoff and Reinhart’s 2010 article “Growth in a Time of Debt”, which was a weak correlation-based argument to begin with, was later revealed (by an intrepid grad student! His name is Thomas Herndon.) to be based upon deep, fundamental errors. Yet the article remains published, without any notice of retraction or correction, in the American Economic Review, probably the most prestigious journal in economics (and undeniably in the vaunted “Top Five”). And the paper itself was widely used by governments around the world to justify massive austerity policies—which backfired with catastrophic consequences.

Why wouldn’t the AER remove the article from their website? Or issue a retraction? Or at least add a note on the page explaining the errors? If their primary concern were scientific truth, they would have done something like this. Their failure to do so is a silence that speaks volumes, a hound that didn’t bark in the night.

It’s rational, if incredibly selfish, for Rogoff and Reinhart themselves to not want a retraction. It was one of their most widely-cited papers. But why wouldn’t AER’s editors want to retract a paper that had been so embarrassingly debunked?

And so I came to realize: These are all people who have succeeded in the current system. Their work is valued, respected, and supported by the system of scientific publishing as it stands. If we were to radically change that system, as we would necessarily have to do in order to re-align incentives toward scientific truth, they would stand to lose, because they would suddenly be competing against other people who are not as good at satisfying the magical 0.05, but are in fact at least as good—perhaps even better—actual scientists than they are.

I know how they would respond to this criticism: I’m someone who hasn’t succeeded in the current system, so I’m biased against it. This is true, to some extent. Indeed, I take it quite seriously, because while tenured professors stand to lose prestige, they can’t really lose their jobs even if there is a sudden flood of far superior research. So in directly economic terms, we would expect the bias against the current system among grad students, adjuncts, and assistant professors to be larger than the bias in favor of the current system among tenured professors and prestigious researchers.

Yet there are other motives aside from money: Norms and social status are among the most powerful motivations human beings have, and these biases are far stronger in favor of the current system—even among grad students and junior faculty. Grad school is many things, some good, some bad; but one of them is a ritual gauntlet that indoctrinates you into the belief that working in academia is the One True Path, without which your life is a failure. If your claim is that grad students are upset at the current system because we overestimate our own qualifications and are feeling sour grapes, you need to explain our prevalence of Impostor Syndrome. By and large, grad students don’t overestimate our abilities—we underestimate them. If we think we’re as good at this as you are, that probably means we’re better. Indeed I have little doubt that Thomas Herndon is a better economist than Kenneth Rogoff will ever be.

I have additional evidence that insider bias is important here: When Paul Romer—Nobel laureate—left academia he published an utterly scathing criticism of the state of academic macroeconomics. That is, once he had escaped the incentives toward insider bias, he turned against the entire field.

Romer pulls absolutely no punches: He literally compares the standard methods of DSGE models to “phlogiston” and “gremlins”. And the paper is worth reading, because it’s obviously entirely correct. Every single punch lands on target. It’s also a pretty fun read, at least if you have the background knowledge to appreciate the dry in-jokes. (Much like “Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity.” I still laugh out loud every time I read the phrase “hegemonic Zermelo-Fraenkel axioms”, though I realize most people would be utterly nonplussed. For the uninitiated, these are the Zermelo-Fraenkel axioms. Can’t you just see the colonialist imperialism in sentences like “\forall x \forall y (\forall z, z \in x \iff z \in y) \implies x = y”?)

In other words, the Upton Sinclair Principle seems to be applying here: “It is difficult to get a man to understand something when his salary depends upon not understanding it.” The people with the most power to change the system of scientific publishing are journal editors and prestigious researchers, and they are the people for whom the current system is running quite swimmingly.

It’s not that good science can’t succeed in the current system—it often does. In fact, I’m willing to grant that it almost always does, eventually. When the evidence has mounted for long enough and the most adamant of the ancien regime finally retire or die, then, at last, the paradigm will shift. But this process takes literally decades longer than it should. In principle, a wrong theory can be invalidated by a single rigorous experiment. In practice, it generally takes about 30 years of experiments, most of which don’t get published, until the powers that be finally give in.

This delay has serious consequences. It means that many of the researchers working on the forefront of a new paradigm—precisely the people that the scientific community ought to be supporting most—will suffer from being unable to publish their work, get grant funding, or even get hired in the first place. It means that not only will good science take too long to win, but that much good science will never get done at all, because the people who wanted to do it couldn’t find the support they needed to do so. This means that the delay is in fact much longer than it appears: Because it took 30 years for one good idea to take hold, all the other good ideas that would have sprung from it in that time will be lost, at least until someone in the future comes up with them.

I don’t think I’ll ever forget it: At the AEA conference a few years back, I went to a luncheon celebrating Richard Thaler, one of the founders of behavioral economics, whom I regard as one of the top 5 greatest economists of the 20th century (I’m thinking something like, “Keynes > Nash > Thaler > Ramsey > Schelling”). Yes, now he is being rightfully recognized for his seminal work; he won a Nobel, and he has an endowed chair at Chicago, and he got an AEA luncheon in his honor among many other accolades. But it was not always so. Someone speaking at the luncheon offhandedly remarked something like, “Did we think Richard would win a Nobel? Honestly most of us weren’t sure he’d get tenure.” Most of the room laughed; I had to resist the urge to scream. If Richard Thaler wasn’t certain to get tenure, then the entire system is broken. This would be like finding out that Erwin Schrödinger or Niels Bohr wasn’t sure he would get tenure in physics.

A. Gary Shilling, a renowned Wall Street economist (read: One Who Has Turned to the Dark Side), once remarked (the quote is often falsely attributed to Keynes): “markets can remain irrational a lot longer than you and I can remain solvent.” In the same spirit, I would say this: the scientific community can remain wrong a lot longer than you and I can extend our graduate fellowships and tenure clocks.

How much should we give?

Nov 4 JDN 2458427

How much should we give of ourselves to others?

I’ve previously struggled with this basic question when it comes to donating money; I have written multiple posts on it now, some philosophical, some empirical, and some purely mathematical.

But the question is broader than this: We don’t simply give money. We also give effort. We also give emotion. Above all, we also give time. How much should we be volunteering? How many protest marches should we join? How many Senators should we call?

It’s easy to convince yourself that you aren’t doing enough. You can always point to some hour when you weren’t doing anything particularly important, and think about all the millions of lives that hang in the balance on issues like poverty and climate change, and then feel a wave of guilt for spending that hour watching Netflix or playing video games instead of doing one more march. This, however, is clearly unhealthy: You won’t actually make yourself into a more effective activist, you’ll just destroy yourself psychologically and become no use to anybody.

I previously argued for a sort of Kantian notion that we should commit to giving our fair share, defined as the amount we would have to give if everyone gave that amount. This is quite appealing, and if I can indeed get anyone to donate 1% of their income as a result, I will be quite glad. (If I can get 100 people to do so, that’s better than I could ever have done myself—a good example of highly cost-effective slacktivism.)

Lately I have come to believe that this is probably inadequate. We know that not everyone will take this advice, which means that by construction it won’t be good enough to actually solve global problems.

This means I must make a slightly greater demand: Define your fair share as the amount you would have to give if everyone among people who are likely to give gave that amount.

Unfortunately, this question is considerably harder. It may not even have a unique answer. The number of people n willing to give an amount x is obviously dependent upon the amount x itself, and we are nowhere close to knowing what that function n(x) looks like.

So let me instead put some mathematical constraints on it, by choosing an elasticity. Instead of an elasticity of demand or elasticity of supply, we could call this an elasticity of contribution.

Presumably the elasticity is negative: The more you ask of people, the fewer people you’ll get to contribute.

Suppose that the elasticity is something like -0.5, where contribution is relatively inelastic. This means that if you increase the amount you ask for by 2%, you’ll only decrease the number of contributors by 1%. In that case, you should be like Peter Singer and ask for everything. At that point, you’re basically counting on Bill Gates to save us, because nobody else is giving anything. The total amount contributed n(x) * x is increasing in x.

On the other hand, suppose that the elasticity is something like -2, where contribution is relatively elastic. This means that if you increase the amount you ask for by 2%, you will decrease the number of contributors by 4%. In that case, you should ask for very little. You’re asking everyone in the world to give 1% of their income, as I did earlier. The total amount contributed n(x) * x is now decreasing in x.

But there is also a third option: What if the elasticity is exactly -1, unit elastic? Then if you increase the amount you ask for by 2%, you’ll decrease the number of contributors by 2%. Then it doesn’t matter how much you ask for: The total amount contributed n(x) * x is constant.
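The three cases above can be checked numerically. This is only a sketch: it assumes a constant-elasticity form n(x) = A * x^e, which the argument does not commit to, and the scale A and the sample asks are arbitrary illustrative values rather than estimates.

```python
# Constant-elasticity sketch: n(x) = A * x**e, so total = n(x) * x = A * x**(1 + e).
# The functional form, the scale A, and the asks are assumptions for illustration only.

def contributors(x, elasticity, scale=1000.0):
    """Number of contributors when each one is asked for the amount x."""
    return scale * x ** elasticity

def total_contribution(x, elasticity, scale=1000.0):
    """Total raised: number of contributors times the amount each gives."""
    return contributors(x, elasticity, scale) * x

asks = [1.0, 2.0, 4.0]  # hypothetical amounts asked, in arbitrary units
totals = {
    e: [total_contribution(x, e) for x in asks]
    for e in (-0.5, -1.0, -2.0)
}

for e, values in totals.items():
    print(f"elasticity {e:+.1f}: " + ", ".join(f"{v:.1f}" for v in values))
```

With these inputs the total rises with the ask when the elasticity is -0.5, stays flat at -1, and falls at -2, which is the pattern described above.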

Of course, there’s no guarantee that the elasticity is constant over all possible choices of x—indeed, it would be quite surprising if it were. A quite likely scenario is that contribution is inelastic for small amounts, then passes through a regime where it is nearly unit elastic, and finally it becomes elastic as you start asking for really large amounts of money.

The simplest way to model that is to just assume that n(x) is linear in x, something like n = N – k x.

There is a parameter N that sets the maximum number of people who will ever donate, and a parameter k that sets how rapidly the number of contributors drops off as the amount asked for increases.

The first-order condition for maximizing n(x) * x is then quite simple: x = N/(2k)

This actually turns out to be precisely the point at which the elasticity of contribution is -1.

The total amount you can get under that condition is N^2/(4k)
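For readers who want the steps, here is a short derivation under the linear assumption n(x) = N - kx. The symbols T (total contribution), x^* (the maximizing ask), and \varepsilon (elasticity of contribution) are labels introduced only for this sketch.

```latex
\begin{align*}
T(x) &= x\,n(x) = Nx - kx^2 \\
T'(x) &= N - 2kx = 0 \quad\Longrightarrow\quad x^* = \frac{N}{2k} \\
T(x^*) &= \frac{N^2}{2k} - k\left(\frac{N}{2k}\right)^2 = \frac{N^2}{2k} - \frac{N^2}{4k} = \frac{N^2}{4k} \\
\varepsilon(x) &= \frac{dn}{dx}\cdot\frac{x}{n} = \frac{-kx}{N - kx},
\qquad \varepsilon(x^*) = \frac{-N/2}{N/2} = -1
\end{align*}
```

Since T''(x) = -2k is negative for any positive k, the first-order condition picks out a maximum rather than a minimum.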

Of course, I have no idea what N and k are in real life, so this isn’t terribly helpful. But what I really want to know is whether we should be asking for more money from each person, or asking for less money and trying to get more people on board.

In real life we can sometimes do both: Ask each person to give more than they are presently giving, whatever they are presently giving. (Just be sure to run your slogans by a diverse committee, so you don’t end up with “I’ve upped my standards. Now, up yours!”) But since we’re trying to find a benchmark level to demand of ourselves, let’s ignore that for now.

About 25% of American adults volunteer some of their time, averaging 140 hours of volunteer work per year. This is about 1.6% of all the hours in a year, or 2.4% of all waking hours. Total monetary contributions in the US reached $400 billion for the first time this year; this is about 2.0% of GDP. So the balance between volunteer hours and donations is actually pretty even. It would probably be better to tilt it a bit more toward donations, but it’s really not bad. About 60% of US households made some sort of charitable contribution, though only half of these received the charitable tax deduction.
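The percentages in this paragraph are easy to reproduce. The sketch below assumes 16 waking hours per day, which the text does not state but which matches its 2.4% figure; the variable names are purely illustrative.

```python
# Reproducing the volunteer-hour and GDP shares quoted above.
HOURS_PER_YEAR = 365 * 24
WAKING_HOURS_PER_DAY = 16          # assumption: roughly 8 hours of sleep per night
volunteer_hours = 140              # average hours per volunteer per year, from the text

share_of_all_hours = volunteer_hours / HOURS_PER_YEAR
share_of_waking_hours = volunteer_hours / (365 * WAKING_HOURS_PER_DAY)

donations = 400e9                  # total US monetary contributions, from the text
donation_share_of_gdp = 0.02       # share of GDP, from the text
implied_gdp = donations / donation_share_of_gdp

print(f"share of all hours:    {share_of_all_hours:.1%}")
print(f"share of waking hours: {share_of_waking_hours:.1%}")
print(f"implied GDP:           ${implied_gdp / 1e12:.0f} trillion")
```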

This suggests to me that the quantity of people who give is probably about as high as it’s going to get—and therefore we need to start talking more about the amount of money. We may be in the inelastic regime, where the way to increase total contributions is to demand more from each individual.

Our goal is to increase the total contribution to poverty eradication by about 1% of GDP in both the US and Europe. So if 60% of people give, and currently total contributions are about 2.0% of GDP, this means that the average contribution is about 3.3% of the contributor’s gross income. Therefore I should tell them to donate 4.3%, right? Not quite; some of them might drop out entirely, and the rest will have to give more to compensate.

Without knowing the exact form of the function n(x), I can’t say precisely what the optimal value is. But it is most likely somewhat larger than 4.3%; 5% would be a nice round number in the right general range. This would raise contributions in the US to 2.6% of GDP, or about $500 billion. That’s roughly a 25% increase over the current level, which is large, but feasible.
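The arithmetic behind these targets can be laid out explicitly. This is a rough sketch using only the figures quoted above; it treats the giving rate as a share of GDP spread evenly over contributors, which is a simplification, and the "break-even participation" line is one possible reading of the 2.6% figure rather than something the text states.

```python
# Rough arithmetic for the giving targets, using the figures quoted in the text.
participation = 0.60            # share of households that give
current_share_of_gdp = 0.02     # current total contributions as a share of GDP

def share_of_gdp(rate, participation):
    """Total contributions as a share of GDP if givers each give `rate` of income.

    Simplifying assumption: contributors' income is proportional to their
    share of the population.
    """
    return rate * participation

# Average rate among those who give: 2.0% of GDP spread over 60% of people.
average_rate = current_share_of_gdp / participation

# Naive target: add one percentage point to each contributor's rate.
naive_rate = average_rate + 0.01

# Participation at which a 5% ask yields 2.6% of GDP (illustrative reading only).
breakeven_participation = 0.026 / 0.05

print(f"average rate among givers:  {average_rate:.1%}")
print(f"naive target rate:          {naive_rate:.1%}")
print(f"GDP share at naive rate:    {share_of_gdp(naive_rate, participation):.1%}")
print(f"GDP share at 5%, no dropout: {share_of_gdp(0.05, participation):.1%}")
print(f"participation giving 2.6% at a 5% ask: {breakeven_participation:.0%}")
```

Under these simplifications, a 5% ask reaches 3.0% of GDP only if nobody drops out, and it still reaches 2.6% if participation falls from 60% to 52%.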

Accomplishing a similar increase in Europe would then give us a total of $200 billion per year in additional funds to fight global poverty; this might not quite be enough to end world hunger (depending on which estimate you use), but it would definitely have a large impact.

I asked you before to give 1%. I am afraid I must now ask for more. Set a target of 5%. You don’t have to reach it this year; you can gradually increase your donations each year for several years (I call this “Save More Lives Tomorrow”, after Thaler’s highly successful program “Save More Tomorrow”). This is in some sense more than your fair share; I’m relying on the assumption that half the population won’t actually give anything. But ultimately this isn’t about what’s fair to us. It’s about solving global problems.

Happy Capybara Day! Or the power of culture

JDN 2457131 EDT 14:33.

Did you celebrate Capybara Day yesterday? You didn’t? Why not? We weren’t able to find any actual capybaras this year, but maybe next year we’ll be able to plan better and find a capybara at a zoo; unfortunately the nearest zoo with a capybara appears to be in Maryland. But where would we be without a capybara to consult annually on the stock market?

Right now you are probably rather confused, perhaps wondering if I’ve gone completely insane. This is because Capybara Day is a holiday of my own invention, one which only a handful of people have even heard about.

But if you think we’d never have a holiday so bizarre, think again: For all I did was make some slight modifications to Groundhog Day. Instead of consulting a groundhog about the weather every February 2, I proposed that we consult a capybara about the stock market every April 17. And if you think you have some reason why groundhogs are better at predicting the weather (perhaps because they at least have some vague notion of what weather is) than capybaras are at predicting the stock market (since they have no concept of money or numbers), think about this: Capybara Day could produce extremely accurate predictions, provided only that people actually believed it. The prophecy of rising or falling stock prices could very easily become self-fulfilling. If it were a cultural habit of ours to consult capybaras about the stock market, capybaras would become good predictors of the stock market.

That might seem a bit far-fetched, but think about this: Why is there a January Effect? (To be fair, some researchers argue that there isn’t, and the apparent correlation between higher stock prices and the month of January is simply an illusion, perhaps the result of data overfitting.)

But I think it probably is real, and moreover has some very obvious reasons behind it. In this I’m in agreement with Richard Thaler, a founder of cognitive economics who wrote about such anomalies in the 1980s. December is a time when two very culturally-important events occur: The end of the year, during which many contracts end, profits are assessed, and tax liabilities are determined; and Christmas, the greatest surge of consumer spending and consumer debt.

The first effect means that corporations are very likely to liquidate assets—particularly assets that are running at a loss—in order to minimize their tax liabilities for the year, which will drive down prices. The second effect means that consumers are in search of financing for extravagant gift purchases, and those who don’t run up credit cards may instead sell off stocks. This is if anything a more rational way of dealing with the credit constraint, since interest rates on credit cards are typically far in excess of stock returns. But this surge of selling due to credit constraints further depresses prices.

In January, things return to normal; assets are repurchased, debt is repaid. This brings prices back up to where they were, which results in a higher than normal return for January.

Neoclassical economists are loath to admit that such a seasonal effect could exist, because it violates their concept of how markets work—and to be fair, the January Effect is actually weak enough to be somewhat ambiguous. But actually it doesn’t take much deviation from neoclassical models to explain the effect: Tax policies and credit constraints are basically enough to do it, so you don’t even need to go that far into understanding human behavior. It’s perfectly rational to behave this way given the distortions that are created by taxes and credit limits, and the arbitrage opportunity is one that you can only take advantage of if you have large amounts of credit and aren’t worried about minimizing your tax liabilities. It’s important to remember just how strong the assumptions of models like CAPM truly are; in addition to the usual infinite identical psychopaths, CAPM assumes there are no taxes, no transaction costs, and unlimited access to credit. I’d say it’s amazing that it works at all, but actually, it doesn’t—check out this graph of risk versus return and tell me if you think CAPM is actually giving us any information at all about how stock markets behave. It frankly looks like you could have drawn a random line through a scatter plot and gotten just as good a fit. Knowing how strong its assumptions are, we would not expect CAPM to work—and sure enough, it doesn’t.

Of course, that leaves the question of why our tax policy would be structured in this way—why make the year end on December 31 instead of some other date? And for that, you need to go back through hundreds of years of history, the Gregorian calendar, which in turn was influenced by Christianity, and before that the Julian calendar—in other words, culture.

Culture is one of the most powerful forces that influences human behavior—and also one of the strangest and least-understood. Economic theory is basically silent on the matter of culture. Typically it is ignored entirely, assumed to be irrelevant against the economic incentives that are the true drivers of human action. (There’s a peculiar emotion many neoclassical economists express that I can best describe as self-righteous cynicism, the attitude that we alone—i.e., economists—understand that human beings are not the noble and altruistic creatures many imagine us to be, nor beings of art and culture, but simply cold, calculating machines whose true motives are reducible to profit incentives—and all who think otherwise are being foolish and naïve; true enlightenment is understanding that human beings are infinite identical psychopaths. This is the attitude epitomized by the economist who once sent me an email with “altruism” written in scare quotes.)

Occasionally culture will be invoked as an external (in jargon, exogenous) force, to explain some aspect of human behavior that is otherwise so totally irrational that even invoking nonsensical preferences won’t make it go away. When a suicide bomber blows himself up in a crowd of people, it’s really pretty hard to explain that in terms of rational profit incentives—though I have seen it tried. (It could be self-interest at a larger scale, like families or nations—but then, isn’t that just the tribal paradigm I’ve been arguing for all along?)

But culture doesn’t just motivate us to do extreme or wildly irrational things. It motivates us all the time, often in quite beneficial ways; we wait in line, hold doors for people walking behind us, tip waiters who serve us, and vote in elections, not because anyone pressures us directly to do so (unlike say Australia we do not have compulsory voting) but because it’s what we feel we ought to do. There is a sense of altruism—and altruism provides the ultimate justification for why it is right to do these things—but the primary motivator in most cases is culture—that’s what people do, and are expected to do, around here.

Indeed, even when there is a direct incentive against behaving a certain way—like criminal penalties against theft—the probability of actually suffering a direct penalty is generally so low that it really can’t be our primary motivation. Instead, the reason we don’t cheat and steal is that we think we shouldn’t, and a major part of why we think we shouldn’t is that we have cultural norms against it.

We can actually observe differences in cultural norms across countries in the laboratory. This 2008 study by Massimo Castro (PDF) compared British and Italian people playing an economic game called the public goods game, in which you can pay a cost yourself to benefit the group as a whole. It found not only that people were less willing to benefit groups of foreigners than groups of compatriots, but also that British people were overall more generous than Italian people. This 2010 study by Gächter et al. (actually Joshua Greene talked about it last week) compared how people play the game in various cities and found three basic patterns: In Western European and American cities such as Zurich, Copenhagen and Boston, cooperation started out high and remained high throughout; people were just cooperative in general. In Asian cities such as Chengdu and Seoul, cooperation started out low, but if people were punished for not cooperating, cooperation would improve over time, eventually reaching about the same place as in the highly cooperative cities. And in Mediterranean cities such as Istanbul, Athens, and Riyadh, cooperation started low and stayed low; even when people could be punished for not cooperating, nobody actually punished them. (These patterns are broadly consistent with the World Bank corruption ratings of these regions, by the way; Western Europe shows very low corruption, while Asia and the Mediterranean show high corruption. Of course this isn’t all that’s going on; Asia isn’t much less corrupt than the Middle East, though this experiment might make you think so.)

Interestingly, these cultural patterns showed Melbourne as behaving more like an Asian city than a Western European one—perhaps being in the Pacific has worn off on Australia more than they realize.
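For anyone unfamiliar with the public goods game, the basic payoff rule can be sketched in a few lines. The endowment, the multiplier, and the group size of four below are hypothetical values chosen for illustration, not the parameters of the studies discussed, and the punishment stage is left out entirely.

```python
# One round of a public goods game (no punishment stage).
# Endowment, multiplier, and group size are illustrative assumptions.

def payoffs(contributions, endowment=20.0, multiplier=1.6):
    """Each player keeps what they did not contribute, plus an equal share
    of the common pot after it has been multiplied."""
    pot = sum(contributions) * multiplier
    share = pot / len(contributions)
    return [endowment - c + share for c in contributions]

all_cooperate = payoffs([20, 20, 20, 20])    # everyone contributes everything
all_defect = payoffs([0, 0, 0, 0])           # nobody contributes
one_free_rider = payoffs([0, 20, 20, 20])    # first player contributes nothing

print("all cooperate: ", all_cooperate)
print("all defect:    ", all_defect)
print("one free rider:", one_free_rider)
```

With these numbers the group does best when everyone contributes, yet the lone free rider earns more than anyone else, which is exactly the tension that cultural norms (and punishment, where people use it) have to overcome.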

This is very preliminary, cutting-edge research I’m talking about, so be careful about drawing too many conclusions. But in general we’ve begun to find some fairly clear cultural differences in economic behavior across different societies. While this would not be at all surprising to a sociologist or anthropologist, it’s the sort of thing that economists have insisted for years is impossible.

This is the frontier of cognitive economics, in my opinion. We know that culture is a very powerful motivator of our behavior, and it is time for us to understand how it works—and then, how it can be changed. We know that culture can be changed—cultural norms do change over time, sometimes remarkably rapidly; but we have only a faint notion of how or why they change. Changing culture has the power to do things that simply changing policy cannot, however; policy requires enforcement, and when the enforcement is removed the behavior will often disappear. But if a cultural norm can be imparted, it could sustain itself for a thousand years without any government action at all.