Signaling and the Curse of Knowledge

Jan 3 JDN 2459218

I received several books for Christmas this year, and the one I was most excited to read first was The Sense of Style by Steven Pinker. Pinker is exactly the right person to write such a book: He is both a brilliant linguist and cognitive scientist and also an eloquent and highly successful writer. There are two other books on writing that I rate at the same tier: On Writing by Stephen King, and The Art of Fiction by John Gardner. Don’t bother with style manuals from people who only write style manuals; if you want to learn how to write, learn from people who are actually successful at writing.

Indeed, I knew I’d love The Sense of Style as soon as I read its preface, containing some truly hilarious takedowns of Strunk & White. And honestly Strunk & White are among the best standard style manuals; they at least actually manage to offer some useful advice while also being stuffy, pedantic, and often outright inaccurate. Most style manuals only do the second part.

One of Pinker’s central focuses in The Sense of Style is the Curse of Knowledge, an all-too-common bias in which knowing something makes us unable to appreciate that other people don’t already know it. I think I succumbed to this failing most badly in my first book, Special Relativity from the Ground Up, in which my concept of “the ground” was above most people’s ceilings. I was trying to write for high school physics students, and I think the book ended up mostly being read by college physics professors.

The problem is surely a real one: After years of gaining expertise in a subject, we are all liable to forget the difficulty of reaching our current summit and automatically deploy concepts and jargon that only a small group of experts actually understand. But I think Pinker underestimates the difficulty of escaping this problem, because it’s not just a cognitive bias that we all suffer from time to time. It’s also something that our society strongly incentivizes.

Pinker points out that a small but nontrivial proportion of published academic papers are genuinely well written, using this to argue that obscurantist jargon-laden writing isn’t necessary for publication; but he didn’t seem to even consider the fact that nearly all of those well-written papers were published by authors who already had tenure or even distinction in the field. I challenge you to find a single paper written by a lowly grad student that could actually get published without being full of needlessly technical terminology and awkward passive constructions: “A murine model was utilized for the experiment, in an acoustically sealed environment” rather than “I tested using mice and rats in a quiet room”. This is not because grad students are more thoroughly entrenched in the jargon than tenured professors (quite the contrary), nor because grad students are worse writers in general (that one could really go either way), but because grad students have more to prove. We need to signal our membership in the tribe, whereas once you’ve got tenure, or especially once you’ve got an endowed chair or something, you have already proven yourself.

Pinker seems to briefly touch on this insight (p. 69), without fully appreciating its significance: “Even when we have an inkling that we are speaking in a specialized lingo, we may be reluctant to slip back into plain speech. It could betray to our peers the awful truth that we are still greenhorns, tenderfoots, newbies. And if our readers do know the lingo, we might be insulting their intelligence while spelling it out. We would rather run the risk of confusing them while at least appearing to be sophisticated than take a chance at belaboring the obvious while striking them as naive or condescending.”

What we are dealing with here is a signaling problem. The fact that one can write better once one is well-established is the phenomenon of countersignaling, where one who has already established their status stops investing in signaling.

Here’s a simple model for you. Suppose each person has a level of knowledge x, which they are trying to demonstrate. They know their own level of knowledge, but nobody else does.

Suppose that when we observe someone’s knowledge, we get two pieces of information. The first is an imperfect observation of their true knowledge, x+e: the real value of x plus some amount of error e. (The second is a signal they choose to send, which I’ll get to in a moment.) Nobody knows exactly what the error is. To keep the model as simple as possible I’ll assume that e is drawn from a uniform distribution between -1 and 1.

Finally, assume that we are trying to select people above a certain threshold: Perhaps we are publishing in a journal, or hiring candidates for a job. Let’s call that threshold z. If x < z-1, then since e can never be larger than 1, we will immediately observe that they are below the threshold and reject them. If x > z+1, then since e can never be smaller than -1, we will immediately observe that they are above the threshold and accept them.

But when z-1 < x < z+1, we may think they are above the threshold when they actually are not (if e is positive), or think they are not above the threshold when they actually are (if e is negative).

So then let’s say that they can invest in signaling by putting some amount of visible work in y (like citing obscure papers or using complex jargon). This additional work may be costly and provide no real value in itself, but it can still be useful so long as one simple condition is met: It’s easier to do if your true knowledge x is high.

In fact, for this very simple model, let’s say that you are strictly limited by the constraint that y <= x. You can’t show off what you don’t know.

If your true value x > z, then you should choose y = x. Then, upon observing your signal, we know immediately that you must be above the threshold.

But if your true value x < z, then you should choose y = 0, because there’s no point in signaling that you were almost at the threshold. You’ll still get rejected.

Yet remember from before that only those with z-1 < x < z+1 actually need to bother signaling at all. Those with x > z+1 can actually countersignal, by also choosing y = 0. Since you already have tenure, nobody doubts that you belong in the club.

This means we’ll end up with three groups: Those with x < z, who don’t signal and don’t get accepted; those with z < x < z+1, who signal and get accepted; and those with x > z+1, who don’t signal but get accepted. Then life will be hardest for those who are just above the threshold, who have to spend enormous effort signaling in order to get accepted—and that sure does sound like grad school.
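To make the three groups concrete, here is a minimal Python sketch of the strategy just described. The function name `strategy` and the example values are hypothetical, chosen only for illustration. It simply encodes the outcome as stated above rather than deriving it from what the evaluator can infer, and it assumes that someone sitting exactly at x = z counts as below the threshold, which the model leaves open.

```python
def strategy(x, z):
    """Return (y, accepted): the signal chosen and the outcome for one person.

    x is true knowledge, z is the acceptance threshold.
    Assumption (not stated in the model): x == z is treated as below threshold.
    """
    if x > z + 1:
        return 0.0, True    # far above the threshold: countersignal, still accepted
    if x > z:
        return x, True      # just above the threshold: signal the maximum, y = x
    return 0.0, False       # below the threshold: signaling is pointless, rejected


if __name__ == "__main__":
    z = 2.0  # hypothetical threshold
    for x in (1.0, 2.5, 3.5):
        y, accepted = strategy(x, z)
        print(f"x={x}: signal y={y}, accepted={accepted}")
```

Only the middle group ever pays a signaling cost, which is the whole point of the model.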

You can make the model more sophisticated if you like: Perhaps the error isn’t uniformly distributed, but some other distribution with wider support (like a normal distribution, or a logistic distribution); perhaps the signaling isn’t perfect, but itself has some error; and so on. With such additions, you can get a result where the least-qualified still signal a little bit so they get some chance, and the most-qualified still signal a little bit to avoid a small risk of being rejected. But it’s a fairly general phenomenon that those closest to the threshold will be the ones who have to spend the most effort in signaling.

This reveals a disturbing overlap between the Curse of Knowledge and Impostor Syndrome: We write in impenetrable obfuscationist jargon because we are trying to conceal our own insecurity about our knowledge and our status in the profession. We’d rather you not know what we’re talking about than have you realize that we don’t know what we’re talking about.

For the truth is, we don’t know what we’re talking about. And neither do you, and neither does anyone else. This is the agonizing truth of research that nearly everyone doing research knows, but one must be either very brave, very foolish, or very well-established to admit out loud: It is in the nature of doing research on the frontier of human knowledge that there is always far more that we don’t understand about our subject than that we do understand.

I would like to be more open about that. I would like to write papers saying things like “I have no idea why it turned out this way; it doesn’t make sense to me; I can’t explain it.” But to say that the profession disincentivizes speaking this way would be a grave understatement. It’s more accurate to say that the profession punishes speaking this way to the full extent of its power. You’re supposed to have a theory, and it’s supposed to work. If it doesn’t actually work, well, maybe you can massage the numbers until it seems to, or maybe you can retroactively change the theory into something that does work. Or maybe you can just not publish that paper and write a different one.

Here is a graph of one million published z-scores in academic journals:

It looks like a bell curve, except that almost all the values between -2 and 2 are mysteriously missing.

If we were actually publishing all the good science that gets done, it would in fact be a very nice bell curve. All those missing values are papers that never got published, or results that were excluded from papers, or statistical analyses that were massaged, in order to get a p-value less than the magical threshold for publication of 0.05. (For the statistically uninitiated, a z-score less than -2 or greater than +2 generally corresponds to a p-value less than 0.05, so these are effectively the same constraint.)
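For anyone who wants to check that correspondence, here is a small Python sketch. It assumes a two-sided test against a standard normal distribution, and the function name `two_sided_p` is just an illustrative choice. The exact cutoff is closer to 1.96 than to 2, which is why "about 2" is the usual rule of thumb.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal distribution.

    P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    for z in (1.0, 1.96, 2.0, 3.0):
        print(f"z = {z}: p = {two_sided_p(z):.4f}")
```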

I have literally never read a single paper published in an academic journal in the last 50 years that said in plain language, “I have no idea what’s going on here.” And yet I have read many papers—probably most of them, in fact—where that would have been an appropriate thing to say. It’s actually quite a rare paper, at least in the social sciences, that actually has a theory good enough to really precisely fit the data and not require any special pleading or retroactive changes. (Often the bar for a theory’s success is lowered to “the effect is usually in the right direction”.) Typically results from behavioral experiments are bizarre and baffling, because people are a little screwy. It’s just that nobody is willing to stake their career on being that honest about the depth of our ignorance.

This is a deep shame, for the greatest advances in human knowledge have almost always come from people recognizing the depth of their ignorance. Paradigms never shift until people recognize that the one they are using is defective.

This is why it’s so hard to beat the Curse of Knowledge: You need to signal that you know what you’re talking about, and the truth is you probably don’t, because nobody does. So you need to sound like you know what you’re talking about in order to get people to listen to you. You may be doing nothing more than educated guesses based on extremely limited data, but that’s actually the best anyone can do; those other people saying they have it all figured out are either doing the same thing, or they’re doing something even less reliable than that. So you’d better sound like you have it all figured out, and that’s a lot more convincing when you “utilize a murine model” than when you “use rats and mice”.

Perhaps we can at least push a little bit toward plainer language. It helps to be addressing a broader audience: it is both blessing and curse that whatever I put on this blog is what you will read, without any gatekeepers in my path. I can use plainer language here if I so choose, because no one can stop me. But of course there’s a signaling risk here as well: The Internet is a public place, and potential employers can read this as well, and perhaps decide they don’t like me speaking so plainly about the deep flaws in the academic system. Maybe I’d be better off keeping my mouth shut, at least for a while. I’ve never been very good at keeping my mouth shut.

Once we get established in the system, perhaps we can switch to countersignaling, though even this doesn’t always happen. I think there are two reasons this can fail: First, you can almost always try to climb higher. Once you have tenure, aim for an endowed chair. Once you have that, try to win a Nobel. Second, once you’ve spent years of your life learning to write in a particular stilted, obscurantist, jargon-ridden way, it can be very difficult to change that habit. People have been rewarding you all your life for writing in ways that make your work unreadable; why would you want to take the risk of suddenly making it readable?

I don’t have a simple solution to this problem, because it is so deeply embedded. It’s not something that one person or even a small number of people can really fix. Ultimately we will need to, as a society, start actually rewarding people for speaking plainly about what they don’t know. Admitting that you have no clue will need to be seen as a sign of wisdom and honesty rather than a sign of foolishness and ignorance. And perhaps even that won’t be enough: Because the fact will still remain that knowing what you know that other people don’t know is a very difficult thing to do.

Motivation under trauma

May 3 JDN 2458971

Whenever I ask someone how they are doing lately, I get the same answer: “Pretty good, under the circumstances.” There seems to be a general sense, at least among the sort of people I interact with regularly, that our own lives are still proceeding more or less normally, as we watch in horror the crises surrounding us. Nothing in particular is going wrong for us specifically. Everything is fine, except for the things that are wrong for everyone everywhere.

One thing that seems to be particularly difficult for a lot of us is the sense that we suddenly have so much time on our hands, but can’t find the motivation to actually use this time productively. So many hours of our lives were wasted on commuting or going to meetings or attending various events we didn’t really care much about but didn’t want to feel like we had missed out on. But now that we have these hours back, we can’t find the strength to use them well.

This is because we are now, as an entire society, experiencing a form of trauma. One of the most common long-term effects of post-traumatic stress disorder is a loss of motivation. Faced with suffering we have no power to control, we are made helpless by this traumatic experience; and this makes us learn to feel helpless in other domains.

There is a classic experiment about learned helplessness; like many old classic experiments, its ethics are a bit questionable. Though unlike many such experiments (glares at Zimbardo), its experimental rigor was ironclad. Dogs were divided into three groups. Group 1 was just a control, where the dogs were tied up for a while and then let go. Dogs in groups 2 and 3 were placed into a crate with a floor that could shock them. Dogs in group 2 had a lever they could press to make the shocks stop. Dogs in group 3 did not. (They actually gave the group 2 dogs control over the group 3 dogs to make the shock times exactly equal; but the dogs had no way to know that, so as far as they knew the shocks ended at random.)

Later, dogs from both groups were put into another crate, where they no longer had a lever to press, but they could jump over a barrier to a different part of the crate where the shocks wouldn’t happen. The dogs from group 2, who had previously had some control over their own pain, were able to quickly learn to do this. The dogs from group 3, who had previously felt pain apparently at random, had a very hard time learning this, if they could ever learn it at all. They’d just lie there and suffer the shocks, unable to bring themselves to even try to leap the barrier.

The group 3 dogs just knew there was nothing they could do. During their previous experience of the trauma, all their actions were futile, and so in this new trauma they were certain that their actions would remain futile. When nothing you do matters, the only sensible thing to do is nothing; and so they did. They had learned to be helpless.

I think for me, chronic migraines were my first crate. For years of my life there was basically nothing I could do to prevent myself from getting migraines—honestly the thing that would have helped most would have been to stop getting up for high school that started at 7:40 AM every morning. Eventually I found a good neurologist and got various treatments, as well as learned about various triggers and found ways to avoid most of them. (Let me know if you ever figure out a way to avoid stress.) My migraines are now far less frequent than they were when I was a teenager, though they are still far more frequent than I would prefer.

Yet, I think I still have not fully unlearned the helplessness that migraines taught me. Every time I get another migraine despite all the medications I’ve taken and all the triggers I’ve religiously avoided, this suffering beyond my control acts as another reminder of the ultimate caprice of the universe. There are so many things in our lives that we cannot control that it can be easy to lose sight of what we can.

This pandemic is a trauma that the whole world is now going through. And perhaps that unity of experience will ultimately save us—it will make us see the world and each other a little differently than we did before.

There are a few things you can do to reduce your own risk of getting or spreading the COVID-19 infection, like washing your hands regularly, avoiding social contact, and wearing masks when you go outside. And of course you should do these things. But the truth really is that there is very little any one of us can do to stop this global pandemic. We can watch the numbers tick up almost in real-time—as of this writing, 1 million cases and over 50,000 deaths in the US, 3 million cases and over 200,000 deaths worldwide—but there is very little we can do to change those numbers.

Sometimes we really are helpless. The challenge we face is not to let this genuine helplessness bleed over and make us feel helpless about other aspects of our lives. We are currently sitting in a crate with no lever, where the shocks will begin and end beyond our control. But the day will come when we are delivered to a new crate, and given the chance to leap over a barrier; we must find the strength to take that leap.

For now, I think we can forgive ourselves for getting less done than we might have hoped. We’re still not really out of that first crate.

Do I want to stay in academia?

Apr 5 JDN 2458945

This is a very personal post. You’re not going to learn any new content today; but this is what I needed to write about right now.

I am now nearly finished with my dissertation. It only requires three papers (which, quite honestly, have very little to do with one another). I just got my second paper signed off on, and my third is far enough along that I can probably finish it in a couple of months.

I feel like I ought to be more excited than I am. Mostly what I feel right now is dread.

Yes, some of that dread is the ongoing pandemic, though I am pleased to report that the global number of cases of COVID-19 has substantially undershot the estimates I made last week, suggesting that at least most places are getting the virus under control. The number of cases and the number of deaths have about doubled in the past week, which is a lot better than doubling every two days as they were at the start of the pandemic. And that’s all I want to say about COVID-19 today, because I’m sure you’re as tired of the wall-to-wall coverage of it as I am.

But most of the dread is about my own life, mainly my career path. More and more I’m finding that the world of academic research just isn’t working for me. The actual research part I like, and I’m good at it; but then it comes time to publish, and the journal system is so fundamentally broken, so agonizingly capricious, and has such ludicrous power over the careers of young academics that I’m really not sure I want to stay in this line of work. I honestly think I’d prefer they just flip a coin when you graduate and you get a tenure-track job if you get heads. Or maybe journals could roll a 20-sided die for each paper submitted and publish the papers that get 19 or 20. At least then the powers that be couldn’t convince themselves that their totally arbitrary and fundamentally unjust selection process was actually based on deep wisdom and selecting the most qualified individuals.

In any case I’m fairly sure at this point that I won’t have any publications in peer-reviewed journals by the time I graduate. It’s possible I still could—I actually still have decent odds with two co-authored papers, at least—but I certainly do not expect to. My chances of getting into a top journal at this point are basically negligible.

If I weren’t trying to get into academia, that fact would be basically irrelevant. I think most private businesses and government agencies are fairly well aware of the deep defects in the academic publishing system, and really don’t put a whole lot of weight on its conclusions. But in academia, publication is everything. Specifically, publication in top journals.

For this reason, I am now seriously considering leaving academia once I graduate. The more contact I have with the academic publishing system the more miserable I feel. The idea of spending another six or seven years desperately trying to get published in order to satisfy a tenure committee sounds about as appealing right now as having my fingernails pulled out one by one.

This would mean giving up on a lifelong dream. It would mean wondering why I even bothered with the PhD, when the first MA—let alone the second—would probably have been enough for most government or industry careers. And it means trying to fit myself into a new mold that I may find I hate just as much for different reasons: A steady 9-to-5 work schedule is a lot harder to sustain when waking up before 10 AM consistently gives you migraines. (In theory, there are ways to get special accommodations for that sort of thing; in practice, I’m sure most employers would drag their feet as much as possible, because in our culture a phase-delayed circadian rhythm is tantamount to being lazy and therefore worthless.)

Or perhaps I should aim for a lecturer position, perhaps at a smaller college, that isn’t so obsessed with research publication. This would still dull my dream, but would not require abandoning it entirely.

I was asked a few months ago what my dream job is, and I realized: It is almost what I actually have. It is so tantalizingly close to what I am actually headed for that it is painful. The reality is a twisted mirror of the dream.

I want to teach. I want to do research. I want to write. And I get to do those things, yes. But I want to do them without the layers of bureaucracy, without the tiers of arbitrary social status called ‘prestige’, without the hyper-competitive and capricious system of journal publication. Honestly I want to do them without grading or dealing with publishers at all, though I can at least understand why some mechanisms for evaluating student progress and disseminating research are useful, even if our current systems for doing so are fundamentally defective.

It feels as though I have been running a marathon, but was only given a vague notion of the route beforehand. There were a series of flags to follow: This way to the bachelor’s, this way to the master’s, that way to advance to candidacy. Then when I come to the last set of flags, the finish line now visible at the horizon, I see that there is an obstacle course placed in my way, with obstacles I was never warned about, much less trained for. A whole new set of skills, maybe even a whole different personality, is necessary to surpass these new obstacles, and I feel utterly unprepared.

It is as if the last mile of my marathon must be done on horseback, and I’ve never learned to ride a horse; no one ever told me I would need to ride a horse. (Or maybe they did and I didn’t listen?) And now every time I try to mount one, I fall off immediately; and the injuries I sustain seem to be worse every time. The bruises I thought would heal only get worse. The horses I must ride are research journals, and the injuries when I fall are psychological, but no less real, all too real. With each attempt I keep hoping that my fear will fade, but instead it only intensifies.

It’s the same pain, the same fear, that pulled me away from fiction writing. I want to go back, I hope to go back—but I am not strong enough now, and cannot be sure I ever will be. I was told that working in a creative profession meant working hard and producing good output; it turns out it doesn’t mean that at all. A successful career in a creative field actually means satisfying the arbitrary desires of a handful of inscrutable gatekeepers. It means rolling the dice over, and over, and over again, each time a little more painful than the last. And it turns out that this just isn’t something I’m good at. It’s not what I’m cut out for. And maybe it never will be.

An incompetent narcissist would surely fare better than I, willing to re-submit whatever refuse they produce a thousand times because they are certain they deserve to succeed. For, deep down, I never feel that I deserve it. Others tell me I do, and I try to believe them; but the only validation that feels like it will be enough is the kind that comes directly from those gatekeepers, the kind that I can never get. And truth be told, maybe if I do finally get that, it still won’t be enough. Maybe nothing ever will be.

If I knew that it would get easier one day, that the pain would, if not go away, at least retreat to a dull roar I could push aside, then maybe I could stay on this path. But this cannot be the rest of my life. If this is really what it means to have an academic career, maybe I don’t want one after all.

Or maybe it’s not academia that’s broken. Maybe it’s just me.

Reflections on Past and Future

Jan 19 JDN 2458868

This post goes live on my birthday. Unfortunately, I won’t be able to celebrate much, as I’ll be in the process of moving. We moved just a few months ago, and now we’re moving again, because this apartment turned out to be full of mold that keeps triggering my migraines. Our request for a new apartment was granted, but the university housing system gives very little time to deal with such things: They told us on Tuesday that we needed to commit by Wednesday, and then they set our move-in date for that Saturday.

Still, a birthday seems like a good time to reflect on how my life is going, and where I want it to go next. As for how old I am? This is probably the penultimate power of two I’ll reach.

The biggest change in my life over the previous year was my engagement. Our wedding will be this October. (We have the venue locked in; invitations are currently in the works.) This was by no means unanticipated; really, folks had been wondering when we’d finally get around to it. Yet it still feels strange, a leap headlong into adulthood for someone of a generation that has been saddled with a perpetual adolescence. The articles on “Millennials” talking about us like we’re teenagers still continue, despite the fact that there are now Millennials with college-aged children. Thanks to immigration and mortality, we now outnumber Boomers. Based on how each group voted in 2016, this bodes well for the 2020 election. (Then again, a lot of young people stay home on Election Day.)

I don’t doubt that graduate school has contributed to this feeling of adolescence: If we count each additional year of schooling as a grade, I would now be in the 22nd grade. Yet from others my age, even those who didn’t go to grad school, I’ve heard similar experiences about getting married, buying homes, or—especially—having children of their own: Society doesn’t treat us like adults, so we feel strange acting like adults. 30 is the new 23.

Perhaps as life expectancy continues to increase and educational attainment climbs ever higher, future generations will continue to experience this feeling ever longer, until we’re like elves in a Tolkienesque fantasy setting, living to 1000 but not considered a proper adult until we hit 100. I wonder if people will still get labeled by generation when there are 40 generations living simultaneously, or if we’ll find some other category system to stereotype by.

Another major event in my life this year was the loss of our cat Vincent. He was quite old by feline standards, and had been sick for a long time; so his demise was not entirely unexpected. Still, it’s never easy to lose a loved one, even if they are covered in fur and small enough to fit under an airplane seat.

Most of the rest of my life has remained largely unchanged: Still in grad school, still living in the same city, still anxious about my uncertain career prospects. Trump is still President, and still somehow managing to outdo his own high standards of unreasonableness. I do feel some sense of progress now, some glimpses of the light at the end of the tunnel. I can vaguely envision finishing my dissertation some time this year, and I’m hoping that in a couple years I’ll have settled into a job that actually pays well enough to start paying down my student loans, and we’ll have a good President (or at least Biden).

I’ve reached the point where people ask me what I am going to do next with my life. I want to give an answer, but the problem is, this is almost entirely out of my control. I’ll go wherever I end up getting job offers. Based on the experience of past cohorts, most people seem to apply to about 200 positions, interview for about 20, and get offers from about 2. So asking me where I’ll work in five years is like asking me what number I’m going to roll on a 100-sided die. I could probably tell you what order I would prioritize offers in, more or less; but even that would depend a great deal on the details. There are difficult tradeoffs to be made: Take a private sector offer with higher pay, or stay in academia for more autonomy and security? Accept a postdoc or adjunct position at a prestigious university, or go for an assistant professorship at a lower-ranked college?

I guess I can say that I do still plan to stay in academia, though I’m less certain of that than I once was; I will definitely cast a wider net. I suppose the job market isn’t like that for most people? I imagine most people at least know what city they’ll be living in. (I’m not even positive what country—opportunities for behavioral economics actually seem to be generally better in Europe and Australia than they are in the US.)

But perhaps most people simply aren’t as cognizant of how random and contingent their own career paths truly were. The average number of job changes per career is 12. You may want to think that you chose where you ended up, but for the most part you landed where the wind blew you. This can seem tragic in a way, but it is also a call for compassion: “There but for the grace of God go I.”

Really, all I can do now is hang on and try to enjoy the ride.

Darkest Before the Dawn: Bayesian Impostor Syndrome

Jan 12 JDN 2458860

At the time of writing, I have just returned from my second Allied Social Science Associations Annual Meeting, the AEA’s annual conference (or AEA and friends, I suppose, since several other, much smaller economics and finance associations are represented as well). This one was in San Diego, which made it considerably cheaper for me to attend than last year’s. Alas, next year’s conference will be in Chicago. At least flights to Chicago tend to be cheap because it’s a major hub.

My biggest accomplishment of the conference was getting some face-time and career advice from Colin Camerer, the Caltech economist who literally wrote the book on behavioral game theory. Otherwise I would call the conference successful, but not spectacular. Some of the talks were much better than others; I think I liked the one by Emmanuel Saez best, and I also really liked the one on procrastination by Matthew Gibson. I was mildly disappointed by Ben Bernanke’s keynote address; maybe I would have found it more compelling if I were more focused on macroeconomics.

But while sitting through one of the less-interesting seminars I had a clever little idea, which may help explain why Impostor Syndrome seems to occur so frequently even among highly competent, intelligent people. This post is going to be more technical than most, so be warned: Here There Be Bayes. If you fear yon algebra and wish to skip it, I have marked below a good place for you to jump back in.

Suppose there are two types of people, high talent H and low talent L. (In reality there is of course a wide range of talents, so I could assign a distribution over that range, but it would complicate the model without really changing the conclusions.) You don’t know which one you are; all you know is a prior probability h that you are high-talent. It doesn’t matter too much what h is, but for concreteness let’s say h = 0.50; you’ve got to be in the top 50% to be considered “high-talent”.

You are engaged in some sort of activity that comes with a high risk of failure. Many creative endeavors fit this pattern: Perhaps you are a musician looking for a producer, an actor looking for a gig, an author trying to secure an agent, or a scientist trying to publish in a journal. Or maybe you’re a high school student applying to college, or an unemployed worker submitting job applications.

If you are high-talent, you’re more likely to succeed—but still very likely to fail. And even low-talent people don’t always fail; sometimes you just get lucky. Let’s say the probability of success if you are high-talent is p, and if you are low-talent, the probability of success is q. The precise value depends on the domain; but perhaps p = 0.10 and q = 0.02.

Finally, let’s suppose you are highly rational, a good and proper Bayesian. You update all your probabilities based on your observations, precisely as you should.

How will you feel about your talent, after a series of failures?

More precisely, what posterior probability will you assign to being a high-talent individual, after a series of n+k attempts, of which k met with success and n met with failure?

Since failure is likely even if you are high-talent, you shouldn’t update your probability too much on a failure, but each failure should, in fact, lead to revising your probability downward.

Conversely, since success is rare, it should cause you to revise your probability upward—and, as will become important, your revisions upon success should be much larger than your revisions upon failure.

We begin as any good Bayesian does, with Bayes’ Law:

P[H|(~S)^n (S)^k] = P[(~S)^n (S)^k|H] P[H] / P[(~S)^n (S)^k]

In words, this reads: The posterior probability of being high-talent, given that you have observed k successes and n failures, is equal to the probability of observing such an outcome, given that you are high-talent, times the prior probability of being high-talent, divided by the prior probability of observing such an outcome.

We can compute the probabilities on the right-hand side using the binomial distribution:

P[H] = h

P[(~S)^n (S)^k|H] = (n+k C k) p^k (1-p)^n

P[(~S)^n (S)^k] = (n+k C k) p^k (1-p)^n h + (n+k C k) q^k (1-q)^n (1-h)

Plugging all this back in and canceling like terms yields:

P[H|(~S)^n (S)^k] = 1/(1 + [(1-h)/h] [q/p]^k [(1-q)/(1-p)]^n)
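For anyone who wants the cancellation spelled out, the step can be written in LaTeX with the same symbols (h, p, q, n, k). This only restates the algebra above; D is shorthand introduced here for the observed data, k successes and n failures.

```latex
\begin{align*}
P[H \mid D]
  &= \frac{\binom{n+k}{k}\, p^{k} (1-p)^{n}\, h}
          {\binom{n+k}{k}\, p^{k} (1-p)^{n}\, h
           + \binom{n+k}{k}\, q^{k} (1-q)^{n}\, (1-h)} \\[6pt]
  &= \frac{1}{1 + \dfrac{1-h}{h}
           \left(\dfrac{q}{p}\right)^{k}
           \left(\dfrac{1-q}{1-p}\right)^{n}}
\end{align*}
```

The binomial coefficient appears in every term, so it cancels; dividing the numerator and denominator by p^k (1-p)^n h then leaves the second line.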

This turns out to be particularly convenient in log-odds form:

L[X] = ln [ P(X)/P(~X) ]

L[H|(~S)^n (S)^k] = ln [h/(1-h)] + k ln [p/q] + n ln [(1-p)/(1-q)]

Since p > q, ln[p/q] is a positive number, while ln[(1-p)/(1-q)] is a negative number. This corresponds to the fact that you will increase your posterior when you observe a success (k increases by 1) and decrease your posterior when you observe a failure (n increases by 1).

But when p and q are small, it turns out that ln[p/q] is much larger in magnitude than ln[(1-p)/(1-q)]. For the numbers I gave above, p = 0.10 and q = 0.02, ln[p/q] = 1.609 while ln[(1-p)/(1-q)] = -0.085. You will therefore update substantially more upon a success than on a failure.
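As a quick check of that arithmetic, here is a minimal Python sketch. The function name `log_odds_steps` is an illustrative choice, not something from the original analysis; it simply evaluates the two logarithms for the stated p and q.

```python
import math

def log_odds_steps(p, q):
    """Return the log-odds change from one success and from one failure."""
    success_step = math.log(p / q)              # positive, since p > q
    failure_step = math.log((1 - p) / (1 - q))  # negative, since 1-p < 1-q
    return success_step, failure_step

up, down = log_odds_steps(0.10, 0.02)
print(f"update on a success: {up:+.3f}")
print(f"update on a failure: {down:+.3f}")
print(f"failures needed to offset one success: {up / -down:.1f}")
```

With these numbers, it takes roughly 19 failures to undo the effect of a single success.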

Yet successes are rare! This means that any given success will most likely be first preceded by a sequence of failures. This results in what I will call the darkest-before-dawn effect: Your opinion of your own talent will tend to be at its very worst in the moments just preceding a major success.

I’ve graphed the results of a few simulations illustrating this: On the X-axis is the number of overall attempts made thus far, and on the Y-axis is the posterior probability of being high-talent. The simulated individual undergoes randomized successes and failures with the probabilities I chose above.
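A simulation along these lines can be sketched in a few lines of Python. This is not the code behind the graph: the 30-attempt horizon, the seed, and the names `simulate_run` and `rng` are all assumptions made for illustration, so the individual runs will not match the ones plotted.

```python
import math
import random

def simulate_run(attempts, h=0.5, p=0.10, q=0.02, rng=random):
    """Posterior P[H | history] after each attempt, for a truly high-talent person."""
    log_odds = math.log(h / (1 - h))       # prior log-odds of being high-talent
    up = math.log(p / q)                   # log-odds step on a success
    down = math.log((1 - p) / (1 - q))     # log-odds step on a failure
    path = []
    for _ in range(attempts):
        success = rng.random() < p         # the true type is high-talent
        log_odds += up if success else down
        path.append(1 / (1 + math.exp(-log_odds)))  # convert back to a probability
    return path

rng = random.Random(0)                     # arbitrary seed, assumed for repeatability
runs = [simulate_run(30, rng=rng) for _ in range(10)]
for i, path in enumerate(runs, start=1):
    print(f"run {i:2d}: lowest posterior {min(path):.2f}, final posterior {path[-1]:.2f}")
```

Working in log-odds keeps each update to a single addition, which is the form derived above.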

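[Figure: Bayesian_Impostor_full (posterior probability of being high-talent vs. number of attempts, all 10 simulated runs)]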

There are 10 simulations on that one graph, which may make it a bit confusing. So let’s focus in on two runs in particular, which turned out to be run 6 and run 10:

[If you skipped over the math, here’s a good place to come back. Welcome!]

[Figure: Bayesian_Impostor_focus (posterior probability of being high-talent vs. number of attempts, runs 6 and 10 only)]

Run 6 is a lucky little devil. They had an immediate success, followed by another success in their fourth attempt. As a result, they quickly update their posterior to conclude that they are almost certainly a high-talent individual, and even after a string of failures beyond that they never lose faith.

Run 10, on the other hand, probably has Impostor Syndrome. Failure after failure after failure slowly eroded their self-esteem, leading them to conclude that they are probably a low-talent individual. And then, suddenly, a miracle occurs: On their 20th attempt, at last they succeed, and their whole outlook changes; perhaps they are high-talent after all.

Note that all the simulations are of high-talent individuals. Run 6 and run 10 are equally competent. Ex ante, the probability of success for run 6 and run 10 was exactly the same. Moreover, both individuals are completely rational, in the sense that they are doing perfect Bayesian updating.

And yet, if you compare their self-evaluations after the 19th attempt, they could hardly look more different: Run 6 is 85% sure that they are high-talent, even though they’ve been in a slump for the last 13 attempts. Run 10, on the other hand, is 83% sure that they are low-talent, because they’ve never succeeded at all.

It is darkest just before the dawn: Run 10’s self-evaluation is at its very lowest right before they finally have a success, at which point their self-esteem surges upward, almost to baseline. With just one more success, their opinion of themselves would in fact converge to the same as Run 6’s.
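These figures can be checked directly from the closed-form posterior. The sketch below assumes the tallies implied by the description: Run 6 has 2 successes and 17 failures after 19 attempts, and Run 10 has 0 successes and 19 failures. The function name `posterior` is an illustrative choice.

```python
def posterior(k, n, h=0.5, p=0.10, q=0.02):
    """P[H | k successes, n failures], from the closed-form expression."""
    odds_against = ((1 - h) / h) * (q / p) ** k * ((1 - q) / (1 - p)) ** n
    return 1 / (1 + odds_against)

run6_after_19 = posterior(k=2, n=17)    # assumed tally: successes on attempts 1 and 4
run10_after_19 = posterior(k=0, n=19)   # assumed tally: nineteen straight failures
run10_after_20 = posterior(k=1, n=19)   # the 20th attempt succeeds

print(f"Run 6 after 19 attempts:  {run6_after_19:.3f}")
print(f"Run 10 after 19 attempts: {run10_after_19:.3f}")
print(f"Run 10 after 20 attempts: {run10_after_20:.3f}")
```

Under those assumed tallies the posteriors come out near 0.85, 0.17, and 0.50: Run 6 is about 85% sure of being high-talent, Run 10 is about 83% sure of being low-talent, and a single success brings Run 10 back almost exactly to the prior.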

This may explain, at least in part, why Impostor Syndrome is so common. When successes are few and far between—even for the very best and brightest—then a string of failures is the most likely outcome for almost everyone, and it can be difficult to tell whether you are so bright after all. Failure after failure will slowly erode your self-esteem (and should, in some sense; you’re being a good Bayesian!). You’ll observe a few lucky individuals who get their big break right away, and it will only reinforce your fear that you’re not cut out for this (whatever this is) after all.

Of course, this model is far too simple: People don’t just come in “talented” and “untalented” varieties, but have a wide range of skills that lie on a continuum. There are degrees of success and failure as well: You could get published in some obscure field journal hardly anybody reads, or in the top journal in your discipline. You could get into the University of Northwestern Ohio, or into Harvard. And people face different barriers to success that may have nothing to do with talent—perhaps why marginalized people such as women, racial minorities, LGBT people, and people with disabilities tend to have the highest rates of Impostor Syndrome. But I think the overall pattern is right: People feel like impostors when they’ve experienced a long string of failures, even when that is likely to occur for everyone.

What can be done with this information? Well, it leads me to three pieces of advice:

1. When success is rare, find other evidence. If truly “succeeding” (whatever that means in your case) is unlikely on any given attempt, don’t try to evaluate your own competence based on that extremely noisy signal. Instead, look for other sources of data: Do you seem to have the kinds of skills that people who succeed in your endeavors have, preferably based on the most objective measures you can find? Do others who know you or your work have a high opinion of your abilities and your potential? This, perhaps, is the greatest mistake we make when falling prey to Impostor Syndrome: We imagine that we have somehow “fooled” people into thinking we are competent, rather than realizing that other people’s opinions of us are actually evidence that we are in fact competent. Use this evidence. Update your posterior on that.

2. Don’t over-update your posterior on failures, and don’t under-update on successes. Very few living humans (if any) are true and proper Bayesians. We use a variety of heuristics when judging probability, most notably the representativeness and availability heuristics. These will cause you to over-respond to failures, because this string of failures makes you “look like” the kind of person who would continue to fail (representativeness), and you can’t conjure to mind any clear examples of success (availability). Keeping this in mind, your update upon experiencing failure should be small, probably as small as you can make it. Conversely, when you do actually succeed, even in a small way, don’t dismiss it. Don’t look for reasons why it was just luck; it’s always luck, at least in part, for everyone. Try to update your self-evaluation more when you succeed, precisely because success is rare for everyone.

3. Don’t lose hope. The next one really could be your big break. While astronomically baffling (no, it’s darkest at midnight, in between dusk and dawn!), “it is always darkest before the dawn” really does apply here. You are likely to feel the worst about yourself at the very point where you are about to finally succeed. The lowest self-esteem you ever feel will be just before you finally achieve a major success. Of course, you can’t know if the next one will be it—or if it will take five, or ten, or twenty more tries. And yes, each new failure will hurt a little bit more, make you doubt yourself a little bit more. But if you are properly grounded by what others think of your talents, you can stand firm, until that one glorious day comes and you finally make it.

Now, if I could only manage to take my own advice….

Unsolved problems

Oct 20 JDN 2458777

The beauty and clearness of the dynamical theory, which asserts heat and light to be modes of motion, is at present obscured by two clouds. The first came into existence with the undulatory theory of light, and was dealt with by Fresnel and Dr. Thomas Young; it involved the question, how could the earth move through an elastic solid, such as essentially is the luminiferous ether? The second is the Maxwell-Boltzmann doctrine regarding the partition of energy.


~ Lord Kelvin, April 27, 1900

The above quote is part of a speech where Kelvin basically says that physics is a completed field, with just these two little problems to clear up, “two clouds” in a vast clear horizon. Those “two clouds” Kelvin talked about, regarding the ‘luminiferous ether’ and the ‘partition of energy’? They are, respectively, relativity and quantum mechanics. Almost 120 years later we still haven’t managed to really solve them, at least not in a way that works consistently as part of one broader theory.

But I’ll give Kelvin this: He knew where the problems were. He vastly underestimated how complex and difficult those problems would be, but he knew where they were.

I’m not sure I can say the same about economists. We don’t seem to have even reached the point where we agree where the problems are. Consider another quotation:

For a long while after the explosion of macroeconomics in the 1970s, the field looked like a battlefield. Over time however, largely because facts do not go away, a largely shared vision both of fluctuations and of methodology has emerged. Not everything is fine. Like all revolutions, this one has come with the destruction of some knowledge, and suffers from extremism and herding. None of this deadly however. The state of macro is good.


~ Olivier Blanchard, 2008

The timing of Blanchard’s remark is particularly ominous: It is much like the turkey who declares, the day before Thanksgiving, that his life is better than ever.

But the content is also important: Blanchard didn’t say that microeconomics is in good shape (which I think one could make a better case for). He didn’t even say that economics, in general, is in good shape. He specifically said, right before the greatest economic collapse since the Great Depression, that macroeconomics was in good shape. He didn’t merely underestimate the difficulty of the problem; he didn’t even see where the problem was.

If you search the Web, you can find a few lists of unsolved problems in economics. Wikipedia has such a list that I find particularly bad; Mike Moffatt offers a better list that still has significant blind spots.

Wikipedia’s list is full of esoteric problems that require deeply faulty assumptions to even exist, like the ‘American option problem’ which assumes that the Black-Scholes model is even remotely an accurate description of how option prices work, or the ‘tatonnement problem’ which ignores the fact that there may be many equilibria and we might never reach one at all, or the problem they list under ‘revealed preferences’ which doesn’t address any of the fundamental reasons why the entire concept of revealed preferences may fail once we apply a realistic account of cognitive science. (I could go pretty far afield with that last one, and perhaps I will in a later post, but for now, suffice it to say that human beings often freely choose to do things that we later go on to regret.) I think the only one that Wikipedia’s list really gets right is ‘Unified models of human biases’. The ‘home bias in trade’ and ‘Feldstein-Horioka Puzzle’ problems are sort of edging toward genuine problems, but they’re bound up in too many false assumptions to really get at the right question, which is actually something like “How do we deal with nationalism?” Referring to the ‘Feldstein-Horioka Puzzle’ misses the forest for the trees. Likewise, the ‘PPP Puzzle’ and the ‘Exchange rate disconnect puzzle’ (and to some extent the ‘equity premium puzzle’ as well) are really side effects of a much deeper problem, which is that financial markets in general are ludicrously volatile and inefficient and we have no idea why.

And Wikipedia’s list doesn’t have some of the largest, most important problems in economics. Moffatt’s list does better, including good choices like “What Caused the Industrial Revolution?”, “What Is the Proper Size and Scope of Government?”, and “What Truly Caused the Great Depression?”, but it also includes some of the more esoteric problems like the ‘equity premium puzzle’ and the ‘endogeneity of money’. The way he states the problem “What Causes the Variation of Income Among Ethnic Groups?” suggests that he doesn’t quite understand what’s going on there either. More importantly, Moffatt still leaves out very obviously important questions like “How do we achieve economic development in poor countries?” (Or as I sometimes put it, “What did South Korea do from 1950 to 2000, and how can we do it again?”), “How do we fix shortages of housing and other necessities?”, “What is causing the global rise of income and wealth inequality?”, “How altruistic are human beings, to whom, and under what conditions?” and “What makes financial markets so unstable?” Ironically, ‘Unified models of human biases’, the one problem that Wikipedia got right, is missing from Moffatt’s list.

And I’m also humble enough to realize that some of the deepest problems in economics may be ones that we don’t even quite know how to formulate yet. We like to pretend that economics is a mature science, almost on the coattails of physics; but it’s really a very young science, more like psychology. We go through these ‘cargo cult science’ rituals of p-values and econometric hypothesis tests, but there are deep, basic forces we don’t understand. We have precisely prepared all the apparatus for the detection of the phlogiston, and by God, we’ll get that 0.05 however we have to. (Think I’m being too harsh? “Real Business Cycle” theory essentially posits that the Great Depression was caused by everyone deciding that they weren’t going to work for a few years, and as whole countries fell into the abyss from failing financial markets, most economists still clung to the Efficient Market Hypothesis.) Our whole discipline requires major injections of intellectual humility: We not only don’t have all the answers; we’re not even sure we have all the questions.

I think the esoteric nature of questions like ‘the equity premium puzzle’ and the ‘tatonnement problem’ is precisely the source of their appeal: It’s the sort of thing you can say you’re working on and sound very smart, because the person you’re talking to likely has no idea what you’re talking about. (Or else they are a fellow economist, and thus in on the con.) If you said that you’re trying to explain why poor countries are poor and why rich countries are rich (and if economics isn’t doing that, then what in the world are we doing?), you’d have to admit that we honestly have only the faintest idea, and that millions of people have suffered from bad advice economists gave their governments based on ideas that turned out to be wrong.

It’s really quite problematic how closely economists are tied to policymaking (except when we do really know what we’re talking about?). We’re trying to do engineering without even knowing physics. Maybe there’s no way around it: We have to make some sort of economic policy, and it makes more sense to do it based on half-proven ideas than on completely unfounded ideas. (Engineering without physics worked pretty well for the Romans, after all.) But it seems to me that we could be relying more, at least for the time being, on the experiences and intuitions of the people who have worked on the ground, rather than on sophisticated theoretical models that often turn out to be utterly false. We could eschew ‘shock therapy’ approaches that try to make large interventions in an economy all at once, in favor of smaller, subtler adjustments whose consequences are more predictable. We could endeavor to focus on the cases where we do have relatively clear knowledge (like rent control) and avoid those where the uncertainty is greatest (like economic development).

At the very least, we could admit what we don’t know, and admit that there is probably a great deal we don’t know that we don’t know.

Why do we need “publish or perish”?

June 23 JDN 2458658

This question may seem a bit self-serving, coming from a grad student who is struggling to get his first paper published in a peer-reviewed journal. But given the deep structural flaws in the academic publishing system, I think it’s worth taking a step back to ask just what peer-reviewed journals are supposed to be accomplishing.

The argument is often made that research journals are a way of sharing knowledge. If this is their goal, they have utterly and totally failed. Most papers are read by only a handful of people. When scientists want to learn about the research their colleagues are doing, they don’t read papers; they go to conferences to listen to presentations and look at posters. The way papers are written, they are often all but incomprehensible to anyone outside a very narrow subfield. When published by proprietary journals, papers are often hidden behind paywalls and accessible only through universities. As a knowledge-sharing mechanism, the peer-reviewed journal is a complete failure.

But academic publishing serves another function, which in practice is its only real function: Peer-reviewed publications are a method of evaluation. They are a way of deciding which researchers are good enough to be hired, get tenure, and receive grants. Having peer-reviewed publications—particularly in “top journals”, however that is defined within a given field—is a key metric that universities and grant agencies use to decide which researchers are worth spending on. Indeed, in some cases it seems to be utterly decisive.

We should be honest about this: This is an absolutely necessary function. It is uncomfortable to think about the fact that we must exclude a large proportion of competent, qualified people from being hired or getting tenure in academia, but given the large number of candidates and the small amounts of funding available, this is inevitable. We can’t hire everyone who would probably be good enough. We can only hire a few, and it makes sense to want those few to be the best. (Also, don’t fret too much: Even if you don’t make it into academia, getting a PhD is still a profitable investment. Economists and natural scientists do the best, unsurprisingly; but even humanities PhDs are still generally worth it. Median annual earnings of $77,000 is nothing to sneeze at: US median household income is only about $60,000. Humanities graduates only seem poor in relation to STEM or professional graduates; they’re still rich compared to everyone else.)

But I think it’s worth asking whether the peer review system is actually selecting the best researchers, or even the best research. Note that these are not the same question: The best research done in graduate school might not necessarily reflect the best long-run career trajectory for a researcher. A lot of very important, very difficult questions in science are just not the sort of thing you can get a convincing answer to in a couple of years, and so someone who wants to work on the really big problems may actually have a harder time getting published in graduate school or as a junior faculty member, even though ultimately work on the big problems is what’s most important for society. But I’m sure there’s a positive correlation overall: The kind of person who is going to do better research later is probably, other things equal, going to do better research right now.

Yet even accepting the fact that all we have to go on in assessing what you’ll eventually do is what you have already done, it’s not clear that the process of publishing in a peer-reviewed journal is a particularly good method of assessing the quality of research. Some really terrible research has gotten published in journals—I’m gonna pick on Daryl Bem, because he’s the worst—and a lot of really good research never made it into journals and is languishing on old computer hard drives. (The term “file drawer problem” is about 40 years obsolete; though to be fair, it was in fact coined about 40 years ago.)

That by itself doesn’t actually prove that journals are a bad mechanism. Even a good mechanism, applied to a difficult problem, is going to make some errors. But there are a lot of things about academic publishing, at least as currently constituted, that obviously don’t seem like a good mechanism, such as for-profit publishers, unpaid reviewers, lack of double-blinded review, and above all, the obsession with “statistical significance” that leads to p-hacking.

Each of these problems I’ve listed has a simple fix (though whether the powers that be actually are willing to implement it is a different question: Questions of policy are often much easier to solve than problems of politics). But maybe we should ask whether the system is even worth fixing, or if it should simply be replaced entirely.

While we’re at it, let’s talk about the academic tenure system, because the peer-review system is largely an evaluation mechanism for the academic tenure system. Publishing in top journals is what decides whether you get tenure. The problem with “Publish or perish” isn’t the “publish”; it’s the “perish”. Do we even need an academic tenure system?

The usual argument for academic tenure concerns academic freedom: Tenured professors have job security, so they can afford to say things that may be controversial or embarrassing to the university. But the way the tenure system works is that you only have this job security after going through a long and painful gauntlet of job insecurity. You have to spend several years prostrating yourself to the elders of your field before you can get inducted into their ranks and finally be secure.

Of course, job insecurity is the norm, particularly in the United States: Most employment in the US is “at-will”, meaning essentially that your employer can fire you for any reason at any time. There are specifically illegal reasons for firing (like gender, race, and religion); but it’s extremely hard to prove wrongful termination when all the employer needs to say is, “They didn’t do a good job” or “They weren’t a team player”. So I can understand how it must feel strange for a private-sector worker who could be fired at any time to see academics complain about the rigors of the tenure system.

But there are some important differences here: The academic job market is not nearly as competitive as the private sector job market. There simply aren’t that many prestigious universities, and within each university there are only a small number of positions to fill. As a result, universities have an enormous amount of power over their faculty, which is why they can get away with paying adjuncts salaries that amount to less than minimum wage. (People with graduate degrees! Making less than minimum wage!) At least in most private-sector labor markets in the US, the market is competitive enough that if you get fired, you can probably get hired again somewhere else. In academia that’s not so clear.

I think what bothers me the most about the tenure system is the hierarchical structure: There is a very sharp divide between those who have tenure, those who don’t have it but can get it (“tenure-track”), and those who can’t get it. The lines between professor, associate professor, assistant professor, lecturer, and adjunct are quite sharp. The higher up you are, the more job security you have, the more money you make, and generally the better your working conditions are overall. Much like what makes graduate school so stressful, there are a series of high-stakes checkpoints you need to get through in order to rise in the ranks. And several of those checkpoints are based largely, if not entirely, on publication in peer-reviewed journals.

In fact, we are probably stressing ourselves out more than we need to. I certainly did for my advancement to candidacy; I spent two weeks at such a high stress level I was getting migraines every single day (clearly on the wrong side of the Yerkes-Dodson curve), only to completely breeze through the exam.

I think I might need to put this up on a wall somewhere to remind myself:

Most grad students complete their degrees, and most assistant professors get tenure.

The real filters are admissions and hiring: Most applications to grad school are rejected (though probably most applicants are ultimately accepted somewhere; I couldn’t find any good data on that in a quick search), and most PhD graduates do not get hired on the tenure track. But if you can make it through those two gauntlets, you can probably make it through the rest.

In our current system, publications are a way to filter people, because the number of people who want to become professors is much higher than the number of professor positions available. But as an economist, this raises a very big question: Why aren’t salaries falling?

You see, that’s how markets are supposed to work: When supply exceeds demand, the price is supposed to fall until the market clears. Lower salaries would both open up more slots at universities (you can hire more faculty with the same level of funding) and shift some candidates into other careers (if you can get paid a lot better elsewhere, academia may not seem so attractive). Eventually there should be a salary point at which demand equals supply. So why aren’t we reaching it?

Well, it comes back to that tenure system. We can’t lower the salaries of tenured faculty, not without a total upheaval of the current system. So instead what actually happens is that universities switch to using adjuncts, who have very low salaries indeed. If there were no tenure, would all faculty get paid like adjuncts? No, they wouldn’t, because universities would have all that money they’re currently paying to tenured faculty, and all the talent currently locked up in tenured positions would be on the market, driving up the prevailing salary. What would happen if we eliminated tenure is not that all salaries would fall to adjunct level; rather, salaries would all adjust to some intermediate level between what adjuncts currently make and what tenured professors currently make.

What would the new salary be, exactly? That would require a detailed model of the supply and demand elasticities, so I can’t tell you without starting a whole new research paper. But a back-of-the-envelope calculation would suggest something like the overall current median faculty salary, which puts the new salary somewhere around $75,000. This is a lot less than some professors make, but it’s also a lot more than what adjuncts make, and it’s a pretty good living overall.

If the salary for professors fell, the pool of candidates would decrease, and we wouldn’t need such harsh filtering mechanisms. We might decide we don’t need a strict evaluation system at all, and since the knowledge-sharing function of journals is much better served by other means, we could probably get rid of them altogether.

Of course, who am I kidding? That’s not going to happen. The people who make these rules succeeded in the current system. They are the ones who stand to lose high salaries and job security under a reform policy. They like things just the way they are.

The replication crisis, and the future of science

Aug 27, JDN 2457628 [Sat]

After settling in a little bit in Irvine, I’m now ready to resume blogging, but for now it will be on a reduced schedule. I’ll release a new post every Saturday, at least for the time being.

Today’s post was chosen by Patreon vote, though only one person voted (this whole Patreon voting thing has not been as successful as I’d hoped). It’s about something we scientists really don’t like to talk about, but definitely need to: We are in the middle of a major crisis of scientific replication.

Whenever large studies are conducted attempting to replicate published scientific results, their ability to do so is almost always dismal.

Psychology is the one everyone likes to pick on, because their record is particularly bad. Only 39% of studies were really replicated with the published effect size, though a further 36% were at least qualitatively but not quantitatively similar. Yet economics has its own replication problem, and even medical research is not immune to replication failure.

It’s important not to overstate the crisis; the majority of scientific studies do at least qualitatively replicate. We are doing better than flipping a coin, which is better than one can say of financial forecasters.

There are three kinds of replication, and only one of them should be expected to give near-100% results. That kind is reanalysis: when you take the same data and use the same methods, you absolutely should get the exact same results. I favor making reanalysis a routine requirement of publication; if we can’t get your results by applying your statistical methods to your data, then your paper needs revision before we can entrust it to publication. A number of papers have failed on reanalysis, which is absurd and embarrassing; the worst offender was probably Reinhart-Rogoff, which was used in public policy decisions around the world despite having spreadsheet errors.

The second kind is direct replication—when you do the exact same experiment again and see if you get the same result within error bounds. This kind of replication should work something like 90% of the time, but in fact works more like 60% of the time.

The third kind is conceptual replication—when you do a similar experiment designed to test the same phenomenon from a different perspective. This kind of replication should work something like 60% of the time, but actually only works about 20% of the time.

Economists are well equipped to understand and solve this crisis, because it’s not actually about science. It’s about incentives. I facepalm every time I see another article by an aggrieved statistician about the “misunderstanding” of p-values; no, scientists aren’t misunderstanding anything. They know damn well how p-values are supposed to work. So why do they keep using them wrong? Because their jobs depend on doing so.

The first key point to understand here is “publish or perish”; academics in an increasingly competitive system are required to publish their research in order to get tenure, and frequently required to get tenure in order to keep their jobs at all. (Or they could become adjuncts, who are paid one-fifth as much.)

The second is the fundamentally defective way our research journals are run (as I have discussed in a previous post). As private for-profit corporations whose primary interest is in raising more revenue, our research journals aren’t trying to publish what will genuinely advance scientific knowledge. They are trying to publish what will draw attention to themselves. It’s a similar flaw to what has arisen in our news media; they aren’t trying to convey the truth, they are trying to get ratings to draw advertisers. This is how you get hours of meaningless fluff about a missing airliner and then a single chyron scroll about a war in Congo or a flood in Indonesia. Research journals haven’t fallen quite so far because they have reputations to uphold in order to attract scientists to read them and publish in them; but still, their fundamental goal is and has always been to raise attention in order to raise revenue.

The best way to do that is to publish things that are interesting. But if a scientific finding is interesting, that means it is surprising. It has to be unexpected or unusual in some way. And above all, it has to be positive; you have to have actually found an effect. Except in very rare circumstances, the null result is never considered interesting. This adds up to making journals publish what is improbable.

In particular, it creates a perfect storm for the abuse of p-values. A p-value, roughly speaking, is the probability you would get the observed result if there were no effect at all—for instance, the probability that you’d observe this wage gap between men and women in your sample if in the real world men and women were paid the exact same wages. The standard heuristic is a p-value of 0.05; indeed, it has become so enshrined that it is almost an explicit condition of publication now. Your result must be less than 5% likely to happen if there is no real difference. But if you will only publish results that show a p-value of 0.05, then the papers that get published and read will only be the ones that found such p-values—which renders the p-values meaningless.

It was never particularly meaningful anyway; as we Bayesians have been trying to explain since time immemorial, it matters how likely your hypothesis was in the first place. For something like wage gaps where we’re reasonably sure, but maybe could be wrong, the p-value is not too unreasonable. But if the theory is almost certainly true (“does gravity fall off as the inverse square of distance?”), even a high p-value like 0.35 is still supportive, while if the theory is almost certainly false (“are human beings capable of precognition?”—actual study), even a tiny p-value like 0.001 is still basically irrelevant. We really should be using much more sophisticated inference techniques, but those are harder to do, and don’t provide the nice simple threshold of “Is it below 0.05?”
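To put rough numbers on why the prior matters, here is a minimal sketch using Bayes' rule. The priors (0.5 for something like a wage gap, 0.001 for precognition) and the statistical power of 0.8 are illustrative assumptions of mine, not figures from any study, and the function name is made up for this example.

```python
def prob_effect_is_real(prior, power=0.8, alpha=0.05):
    """Posterior probability that an effect is real, given a 'significant' result.

    prior: probability the hypothesis was true before the study (assumed)
    power: chance of a significant result if the effect is real (assumed)
    alpha: chance of a significant result if there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical priors, chosen only to contrast a plausible and an implausible claim.
for label, prior in [("wage gap", 0.5), ("precognition", 0.001)]:
    posterior = prob_effect_is_real(prior)
    print(f"{label}: prior {prior} -> posterior {posterior:.3f}")
```

Under these assumptions, the same "significant" result leaves the plausible hypothesis at about 94% and the implausible one below 2%.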

But okay, p-values can be useful in many cases—if they are used correctly and you see all the results. If you have effect X with p-values 0.03, 0.07, 0.01, 0.06, and 0.09, effect X is probably a real thing. If you have effect Y with p-values 0.04, 0.02, 0.29, 0.35, and 0.74, effect Y is probably not a real thing. But I’ve just set it up so that these would be published exactly the same. They each have two published papers with “statistically significant” results. The other papers never get published and therefore never get seen, so we throw away vital information. This is called the file drawer problem.
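The filtering step is easy to sketch with the p-values from the example above. The 0.05 cutoff is the usual threshold; the function names are just labels for this illustration.

```python
ALPHA = 0.05  # conventional significance threshold

# p-values for the two hypothetical effects described in the text
studies = {
    "X": [0.03, 0.07, 0.01, 0.06, 0.09],
    "Y": [0.04, 0.02, 0.29, 0.35, 0.74],
}


def published(p_values, alpha=ALPHA):
    """Results that clear the significance threshold and get into print."""
    return [p for p in p_values if p < alpha]


def file_drawer(p_values, alpha=ALPHA):
    """Results that miss the threshold and are never seen."""
    return [p for p in p_values if p >= alpha]


for name, p_values in studies.items():
    print(name, "published:", published(p_values),
          "file drawer:", file_drawer(p_values))
```

A reader who sees only the published lists gets two significant results for each effect. The difference between X and Y lives entirely in the file drawer.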

Researchers often have a lot of flexibility in designing their experiments. If their only goal were to find truth, they would use this flexibility to test a variety of scenarios and publish all the results, so they can be compared holistically. But that isn’t their only goal; they also care about keeping their jobs so they can pay rent and feed their families. And under our current system, the only way to ensure that you can do that is by publishing things, which basically means only including the parts that showed up as statistically significant—otherwise, journals aren’t interested. And so we get huge numbers of papers published that tell us basically nothing, because we set up such strong incentives for researchers to give misleading results.

The saddest part is that this could be easily fixed.

First, reduce the incentives to publish by finding other ways to evaluate the skill of academics—like teaching for goodness’ sake. Working papers are another good approach. Journals already get far more submissions than they know what to do with, and most of these papers will never be read by more than a handful of people. We don’t need more published findings, we need better published findings—so stop incentivizing mere publication and start finding ways to incentivize research quality.

Second, eliminate private for-profit research journals. Science should be done by government agencies and nonprofits, not for-profit corporations. (And yes, I would apply this to pharmaceutical companies as well, which should really be pharmaceutical manufacturers who make cheap drugs based off of academic research and carry small profit margins.) Why? Again, it’s all about incentives. Corporations have no reason to want to find truth and every reason to want to tilt it in their favor.

Third, increase the number of tenured faculty positions. Instead of building so many new grand edifices to please your plutocratic donors, use your (skyrocketing) tuition money to hire more professors so that you can teach more students better. You can find even more funds if you cut the salaries of your administrators and football coaches. Come on, universities; you are the one industry in the world where labor demand and labor supply are the same people a few years later. You have no excuse for not having the smoothest market clearing in the world. You should never have gluts or shortages.

Fourth, require pre-registration of research studies (as some branches of medicine already do). If the study is sound, an optimal rational agent shouldn’t care in the slightest whether it had a positive or negative result, and if our ape brains won’t let us think that way, we need to establish institutions to force it to happen. Journal editors and reviewers shouldn’t even see the effect size and p-value before they make the decision to publish a study; all they should care about is that the experiment makes sense and the proper procedure was conducted.

If we did all that, the replication crisis could be almost completely resolved, as the incentives would be realigned to more closely match the genuine search for truth.

Alas, I don’t see universities or governments or research journals having the political will to actually make such changes, which is very sad indeed.

What’s wrong with academic publishing?

JDN 2457257 EDT 14:23.

I just finished expanding my master’s thesis into a research paper that is, I hope, suitable for publication in an economics journal. As part of this I’ve been looking into the process of submitting articles for publication in academic journals… and what I’ve found has been disgusting and horrifying. It is astonishingly bad, and my biggest question is why researchers put up with it.

Thus, the subject of this post is what’s wrong with the system—and what we might do instead.

Before I get into it, let me say that I don’t actually disagree with “publish or perish” in principle—as SMBC points out, it’s a lot like “do your job or get fired”. Researchers should publish in peer-reviewed journals; that’s a big part of what doing research means. The problem is how most peer-reviewed journals are currently operated.

First of all, in case you didn’t know, most scientific journals are owned by for-profit corporations. The largest of them, Elsevier, owns The Lancet and all of ScienceDirect, and has net income of over 1 billion euros a year. Then there’s Springer and Wiley-Blackwell; between the three of them, these publishers account for over 40% of all scientific publications. These for-profit publishers retain the full copyright to most of the papers they publish, and tightly control access with paywalls; the cost to get through these paywalls is generally thousands of dollars a year for individuals and millions of dollars a year for universities. Their monopoly power is so great it “makes Rupert Murdoch look like a socialist.”

For-profit journals do often offer an “open-access” option in which you basically buy back your own copyright, but the price is high—the most common I’ve seen are $1800 or $3000 per paper—and very few researchers do this, for obvious financial reasons. In fact I think for a full-time tenured faculty researcher it’s probably worth it, given the alternatives. (Then again, full-time tenured faculty are becoming an endangered species lately; what might be worth it in the long run can still be very difficult for a cash-strapped adjunct to afford.) Open-access means people can actually read your paper and potentially cite your paper. Closed-access means it may languish in obscurity.

And of course it isn’t just about the benefits for the individual researcher. The scientific community as a whole depends upon the free flow of information; the reason we publish in the first place is that we want people to read papers, discuss them, replicate them, challenge them. Publication isn’t the finish line; it’s at best a checkpoint. Actually one thing that does seem to be wrong with “publish or perish” is that there is so much pressure for publication that we publish too many pointless papers and nobody has time to read the genuinely important ones.

These prices might be justifiable if the for-profit corporations actually did anything. But in fact they are basically just aggregators. They don’t do the peer-review, they farm it out to other academic researchers. They don’t even pay those other researchers; they just expect them to do it. (And they do! Like I said, why do they put up with this?) They don’t pay the authors who have their work published (on the contrary, they often charge submission fees—about $100 seems to be typical—simply to look at them). It’s been called “the world’s worst restaurant”, where you pay to get in, bring your own ingredients and recipes, cook your own food, serve other people’s food while they serve yours, and then have to pay again if you actually want to be allowed to eat.

They pay for the printing of paper copies of the journal, which basically no one reads; and they pay for the electronic servers that host the digital copies that everyone actually reads. They also provide some basic copyediting services (copyediting APA style is a job people advertise on Craigslist—so you can guess how much they must be paying).

And even supposing that they actually provided some valuable and expensive service, the fact would remain that we are making for-profit corporations the gatekeepers of the scientific community. Entities that exist only to make money for their owners are given direct control over the future of human knowledge. If you look at Cracked’s “reasons why we can’t trust science anymore”, all of them have to do with the for-profit publishing system. p-hacking might still happen in a better system, but publishers that really had the best interests of science in mind would be more motivated to fight it than publishers that are simply trying to raise revenue by getting people to buy access to their papers.

Then there’s the fact that most journals do not allow authors to submit to multiple journals at once, yet take 30 to 90 days to respond and only publish a fraction of what is submitted—it’s almost impossible to find good figures on acceptance rates (which is itself a major problem!), but the highest figures I’ve seen are 30% acceptance, a more typical figure seems to be 10%, and some top journals go as low as 3%. In the worst-case scenario you are locked into a journal for 90 days with only a 3% chance of it actually publishing your work. At that rate publishing an article could take years.
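Here is a back-of-the-envelope sketch of that worst case. It assumes that every submission is an independent draw with the same acceptance rate, and that each round takes the full response time. Both are simplifications, since a good paper has better odds than the average submission, so treat the output as a rough bound rather than a forecast.

```python
def expected_days_to_acceptance(acceptance_rate, days_per_round):
    """Expected wait when journals must be tried one at a time.

    With independent tries at a fixed acceptance rate, the expected
    number of submissions is 1 / acceptance_rate (geometric distribution).
    """
    expected_submissions = 1 / acceptance_rate
    return expected_submissions * days_per_round


# Scenarios built from the figures quoted in the text.
scenarios = [
    ("best case", 0.30, 30),
    ("typical", 0.10, 60),
    ("worst case", 0.03, 90),
]
for label, rate, days in scenarios:
    total = expected_days_to_acceptance(rate, days)
    print(f"{label}: about {total:.0f} days ({total / 365.25:.1f} years)")
```

Under these assumptions the worst case comes to roughly 3,000 days, a bit over eight years of sequential submissions.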

Is open-access the solution? Yes… well, part of it, anyway.

There are a large number of open-access journals, some of which do not charge submission fees, but very few of them are prestigious, and many are outright predatory. Predatory journals charge exorbitant fees, often after accepting papers for publication; many do little or no real peer review. There are almost seven hundred known predatory open-access journals; over one hundred have even been caught publishing hoax papers. These predatory journals are corrupting the process of science.

There are a few reputable open-access journals, such as BMC Biology and PLOS ONE. Though not actually a journal, arXiv serves a similar role. These will be part of the solution, most definitely. Yet even legitimate open-access journals often charge each author over $1000 to publish an article. There is a small but significant positive correlation between publication fees and journal impact factor.

We need to found more open-access journals which are funded by either governments or universities, so that neither author nor reader ever pays a cent. Science is a public good and should be funded as such. Even if copyright makes sense for other forms of content (I’m not so sure about that), it most certainly does not make sense for scientific knowledge, which by its very nature is only doing its job if it is shared with the world.

These journals should be specifically structured to be method-sensitive but results-blind. (It’s a very good thing that medical trials are usually registered before they are completed, so that publication is assured even if the results are negative—the same should be done with other sciences. Unfortunately, even in medicine there is significant publication bias.) If you could sum up the scientific method in one phrase, it might just be that: Method-sensitive but results-blind. If you think you know what you’re going to find beforehand, you may not be doing science. If you are certain what you’re going to find beforehand, you’re definitely not doing science.

The process should still be highly selective, but it should be possible—indeed, expected—to submit to multiple journals at once. If journals want to start paying their authors to entice them to publish in that journal rather than take another offer, that’s fine with me. Researchers are the ones who produce the content; if anyone is getting paid for it, it should be us.

This is not some wild and fanciful idea; it’s already the way that book publishing works. Very few literary agents or book publishers would ever have the audacity to say you can’t submit your work elsewhere; those that try are rapidly outcompeted as authors stop submitting to them. It’s fundamentally unreasonable to expect anyone to hang all their hopes on a particular buyer months in advance—and that is what you are, publishers, you are buyers. You are not sellers, you did not create this content.

But new journals face a fundamental problem: Good researchers will naturally want to publish in journals that are prestigious—that is, journals that are already prestigious. When all of the prestige is in journals that are closed-access and owned by for-profit companies, the best research goes there, and the prestige becomes self-reinforcing. Journals are prestigious because they are prestigious; welcome to tautology club.

Somehow we need to get good researchers to start boycotting for-profit journals and start investing in high-quality open-access journals. If Elsevier and Springer can’t get good researchers to submit to them, they’ll change their ways or wither and die. Research should be funded and published by governments and nonprofit institutions, not by for-profit corporations.

This may in fact highlight a much deeper problem in academia, the very concept of “prestige”. I have no doubt that Harvard is a good university, a better university than most; but is it actually the best, as it is in most people’s minds? Might Stanford or UC Berkeley be better, or University College London, or even the University of Michigan? How would we tell? Are the students better? Even if they are, might that just be because all the better students went to the schools that had better reputations? Controlling for the quality of the student, more prestigious universities are almost uncorrelated with better outcomes. Those who get accepted to Ivies but attend other schools do just as well in life as those who actually attend Ivies. (Good news for me, getting into Columbia but going to Michigan.) Yet once a university acquires such a high reputation, it can be very difficult for it to lose that reputation, and even more difficult for others to catch up.

Prestige is inherently zero-sum; for me to get more prestige you must lose some. For one university or research journal to rise in rankings, another must fall. Aside from simply feeding on other prestige, the prestige of a university is largely based upon the students it rejects—its “selectivity” score. What does it say about our society that we value educational institutions based upon the number of people they exclude?

Zero-sum ranking is always easier to do than nonzero-sum absolute scoring. Actually that’s a mathematical theorem, and one of the few good arguments against range voting (still not nearly good enough, in my opinion); if you have a list of scores you can always turn them into ranks (potentially with ties); but from a list of ranks there is no way to turn them back into scores.
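A small sketch makes the one-way street visible. The two score lists are invented for illustration, and the ranking rule (highest score gets rank 1, ties share a rank) is one reasonable convention among several.

```python
def to_ranks(scores):
    """Convert scores to ranks: rank 1 is the highest score, ties share a rank."""
    distinct = sorted(set(scores), reverse=True)
    rank_of = {score: position + 1 for position, score in enumerate(distinct)}
    return [rank_of[score] for score in scores]


# Two hypothetical score lists: a near tie and a blowout.
close_race = [91, 90, 90, 89]
blowout = [99, 40, 40, 2]

print(to_ranks(close_race))
print(to_ranks(blowout))
```

Both lists collapse to the same ranks, so anyone holding only the ranks cannot tell the near tie from the blowout. The scores determine the ranks, but not the other way around.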

Yet ultimately it is absolute scores that must drive humanity’s progress. If life were simply a matter of ranking, then progress would be by definition impossible. No matter what we do, there will always be top-ranked and bottom-ranked people.

There is simply no way mathematically for more than 1% of human beings to be in the top 1% of the income distribution. (If you’re curious where exactly that lies today, I highly recommend this interactive chart by the New York Times.) But we could raise the standard of living for the majority of people to a level that only the top 1% once had—and in fact, within the First World we have already done this. We could in fact raise the standard of living for everyone in the First World to a level that only the top 1%—or less—had as recently as the 16th century, by the simple change of implementing a basic income.

There is no way for more than 0.14% of people to have an IQ above 145, because IQ is defined to have a mean of 100 and a standard deviation of 15, regardless of how intelligent people are. People could get dramatically smarter over time (and in fact have), and yet it would still be the case that by definition, only 0.14% can be above 145.
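That share follows directly from the definition, assuming IQ scores are exactly normally distributed (real test scores are only approximately so). A quick check with Python's standard library:

```python
from statistics import NormalDist

# IQ is scaled to mean 100 and standard deviation 15 by definition.
iq = NormalDist(mu=100, sigma=15)

# 145 is three standard deviations above the mean.
share_above_145 = 1 - iq.cdf(145)
print(f"Share above 145: {share_above_145:.3%}")
```

The tail works out to roughly 0.135%, or about one person in 740, no matter how smart the population actually is.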

Similarly, there is no way for much more than 1% of people to go to the top 1% of colleges. There is no way for more than 1% of people to be in the highest 1% of their class. But we could increase the number of college degrees (which we have); we could dramatically increase literacy rates (which we have).

We need to find a way to think of science in the same way. I wouldn’t suggest simply using number of papers published or even number of drugs invented; both of those are skyrocketing, but I can’t say that most of the increase is actually meaningful. I don’t have a good idea of what an absolute scale for scientific quality would look like, even at an aggregate level; and it is likely to be much harder still to make one that applies on an individual level.

But I think that ultimately this is the only way, the only escape from the darkness of cutthroat competition. We must stop thinking in terms of zero-sum rankings and start thinking in terms of nonzero-sum absolute scales.