Mar 12 JDN 2460016

The next few posts are going to be a bit different, a bit more advanced and technical than usual. This is because, for the first time in several months at least, I am actually working on what could be reasonably considered something like theoretical research.

I am writing it up in the form of blog posts, because actually writing a paper is still too stressful for me right now. This also forces me to articulate my ideas in a clearer and more readable way, rather than dive directly into a morass of equations. It also means that even if I never actually get around to finishing a paper, the idea is out there, and maybe someone else could make use of it (and hopefully give me some of the credit).

I’ve written previously about the Yerkes-Dodson effect: On cognitively-demanding tasks, increased stress increases performance, but only up to a point, after which performance begins to decrease again. The effect is well-documented, but the mechanism is poorly understood.

I am currently on the wrong side of the Yerkes-Dodson curve, which is why I’m too stressed to write this as a formal paper right now. But that also gave me some ideas about how it may work.

I have come up with a simple but powerful mathematical model that may provide a mechanism for the Yerkes-Dodson effect.

This model is clearly well within the realm of a behavioral economic model, but it is also closely tied to neuroscience and cognitive science.

I call it the stochastic overload model.

First, a metaphor: Consider an engine, which can run faster or slower. If you increase its RPMs, it will output more power, and provide more torque—but only up to a certain point. Eventually it hits a threshold where it will break down, or even break apart. In real engines, we often include safety systems that force the engine to shut down as it approaches such a threshold.

I believe that human brains function on a similar principle. Stress increases arousal, which activates a variety of processes via the sympathetic nervous system. This activation improves performance on both physical and cognitive tasks. But it has a downside: Especially on cognitively demanding tasks that require sustained effort, I hypothesize that too much sympathetic activation can result in a kind of system overload, where your brain can no longer handle the stress and processes are forced to shut down.

This shutdown could be brief—a few seconds, or even a fraction of a second—or it could be prolonged—hours or days. That might depend on just how severe the stress is, or how much of your brain it requires, or how prolonged it is. For purposes of the model, this isn’t vital. It’s probably easiest to imagine it being a relatively brief, localized shutdown of a particular neural pathway. Then, your performance in a task is summed up over many such pathways over a longer period of time, and by the law of large numbers your overall performance is essentially the average performance of all your brain systems.

That’s the “overload” part of the model. Now for the “stochastic” part.

Let’s say that, in the absence of stress, your brain has a certain innate level of sympathetic activation, which varies over time in an essentially chaotic, unpredictable—stochastic—sort of way. It is never really completely deactivated, and may even have some chance of randomly overloading itself even without outside input. (In fact, the personality trait neuroticism could enter the model as an innate tendency toward higher levels of sympathetic activation in the absence of outside stress.)

Let’s say that this innate activation is x, which follows some kind of known random distribution F(x).

For simplicity, let’s also say that added stress s adds linearly to your level of sympathetic activation, so your overall level of activation is x + s.

For simplicity, let’s say that activation ranges between 0 and 1, where 0 is no activation at all and 1 is the maximum possible activation and triggers overload.

I’m assuming that if a pathway shuts down from overload, it doesn’t contribute at all to performance on the task. (You can assume it’s only reduced performance, but this adds complexity without any qualitative change.)

Since sympathetic activation improves performance, but can result in overload, your overall expected performance in a given task can be computed as the product of two terms:

[expected value of x + s, provided overload does not occur] * [probability overload does not occur]

E[x + s | x + s < 1] P[x + s < 1]

The first term can be thought of as the incentive effect: Higher stress promotes more activation and thus better performance.

The second term can be thought of as the overload effect: Higher stress also increases the risk that activation will exceed the threshold and force shutdown.
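Since the product of the two terms equals E[(x+s)·1{x+s&lt;1}], it is easy to estimate by simulation. Here is a minimal sketch in Python; the Beta(2, 5) distribution for innate activation, the sample size, and the stress levels are all stand-ins of my own choosing, not part of the model itself.

```python
import random

random.seed(1)

def expected_performance(s, n=200_000):
    """Monte Carlo estimate of E[x+s | x+s < 1] * P[x+s < 1],
    which equals E[(x+s) * 1{x+s < 1}]."""
    total = 0.0
    for _ in range(n):
        x = random.betavariate(2, 5)  # assumed innate-activation distribution on [0, 1]
        if x + s < 1:                 # pathway does not overload...
            total += x + s            # ...and contributes its activation level
        # overloaded pathways contribute 0
    return total / n

for s in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"s = {s:.1f}: expected performance ~ {expected_performance(s):.3f}")
```

Each simulated draw is one "pathway": it either overloads (contributing nothing) or contributes its full activation, and averaging over many draws is exactly the law-of-large-numbers aggregation described earlier.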

This equation actually turns out to have a remarkably elegant form as an integral (and here’s where I get especially technical and mathematical):

\int_{0}^{1-s} (x+s) dF(x)

The integral subsumes both the incentive effect and the overload effect into one term; you can also think of the +s in the integrand as the incentive effect and the 1-s in the limit of integration as the overload effect.
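The integral can be checked numerically with a few lines of Python. The density below is my own stand-in (the Beta(2, 5) PDF, chosen only because it lives on [0, 1] and is skewed toward low activation); the model itself doesn't commit to any particular F.

```python
def f(x):
    # Stand-in density for innate activation on [0, 1]: the Beta(2, 5) PDF.
    return 30.0 * x * (1.0 - x) ** 4

def performance(s, n=10_000):
    """Trapezoid-rule evaluation of the integral of (x + s) f(x) dx from 0 to 1 - s."""
    upper = 1.0 - s
    h = upper / n
    total = 0.5 * ((0.0 + s) * f(0.0) + (upper + s) * f(upper))
    for i in range(1, n):
        x = i * h
        total += (x + s) * f(x)
    return total * h

# With s = 0 the upper limit is 1, so the integral is just E[x] = 2/7 for Beta(2, 5):
print(performance(0.0))
print(performance(0.3))  # incentive effect dominates at moderate stress: higher than s = 0
```

Note how the two effects show up in the code: the `+ s` inside the loop is the incentive effect, and the shrinking `upper = 1 - s` limit is the overload effect.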

For the uninitiated, this is probably just Greek. So let me show you some pictures to help with your intuition. These are all freehand sketches, so let me apologize in advance for my limited drawing skills. Think of this as like Arthur Laffer’s famous cocktail napkin.

Suppose that, in the absence of outside stress, your innate activation follows a distribution like this (this could be a normal or logit PDF; as I’ll talk about next week, logit is far more tractable):

As I start adding stress, this shifts the distribution upward, toward increased activation:

Initially, this will improve average performance.

But at some point, increased stress actually becomes harmful, as it increases the probability of overload.

And eventually, the probability of overload becomes so high that performance becomes worse than it was with no stress at all:

The result is that overall performance, as a function of stress, looks like an inverted U-shaped curve—the Yerkes-Dodson curve:

The precise shape of this curve depends on the distribution that we use for the innate activation, which I will save for next week’s post.
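The whole curve can also be traced numerically rather than freehand. The distribution below is again just an illustrative stand-in (the text suggests a normal or logit PDF; I use a Beta here only because sampling it on [0, 1] is trivial), so the exact peak location should not be taken seriously.

```python
import random

random.seed(0)

# Illustrative stand-in for the innate-activation distribution on [0, 1]:
samples = [random.betavariate(2, 5) for _ in range(100_000)]

def performance(s):
    # Pathways with x + s >= 1 overload and contribute 0; the rest contribute x + s.
    return sum(x + s for x in samples if x + s < 1) / len(samples)

curve = [(k / 20, performance(k / 20)) for k in range(20)]
best_s, best_p = max(curve, key=lambda point: point[1])
print(f"no-stress performance: {curve[0][1]:.3f}")
print(f"peak performance {best_p:.3f} at moderate stress s = {best_s:.2f}")
print(f"high-stress performance (s = 0.95): {curve[-1][1]:.3f}")
```

Performance rises with stress, peaks at an intermediate level, and for high enough s falls below the no-stress baseline: exactly the inverted U sketched above.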

# To a first approximation, all human behavior is social norms

Dec 15 JDN 2458833

The language we speak, the food we eat, and the clothes we wear—indeed, the fact that we wear clothes at all—are all the direct result of social norms. But norms run much deeper than this: Almost everything we do is more norm than not.

Why do you sleep and wake up at a particular time of day? For most people, the answer is that they need to get up to go to work. Why do you need to go to work at that specific time? Why does almost everyone go to work at the same time? Social norms.

Even the most extreme human behaviors are often most comprehensible in terms of social norms. The most effective predictive models of terrorism are based on social networks: You are much more likely to be a terrorist if you know people who are terrorists, and much more likely to become a terrorist if you spend a lot of time talking with terrorists. Cultists and conspiracy theorists seem utterly baffling if you imagine that humans form their beliefs rationally—and totally unsurprising if you realize that humans mainly form their beliefs by matching those around them.

For a long time, economists have ignored social norms at our peril; we’ve assumed that financial incentives will be sufficient to motivate behavior, when social incentives can very easily override them. Indeed, it is entirely possible for a financial incentive to have a negative effect, when it crowds out a social incentive: A good example is a friend who would gladly come over to help you with something as a friend, but then becomes reluctant if you offer to pay him $25. I previously discussed another example, where taking a mentor out to dinner sounds good but paying him seems corrupt.

Why do you drive on the right side of the road (or the left, if you’re in Britain)? The law? Well, the law is already a social norm. But in fact, it’s hardly just that. You probably sometimes speed or run red lights, which are also violations of traffic laws. Yet somehow driving on the right seems to be different. That’s because driving on the right has a much stronger norm—and in this case, that norm is self-enforcing with the risk of severe bodily harm or death.

This is a good example of why it isn’t necessary for everyone to choose to follow a norm for that norm to have a great deal of power. As long as the norms include some mechanism for rewarding those who follow and punishing those who don’t, norms can become compelling even to those who would prefer not to obey. Sometimes it’s not even clear whether people are following a norm or following direct incentives, because the two are so closely aligned.

Humans are not the only social species, but we are by far the most social species. We form larger, more complex groups than any other animal; we form far more complex systems of social norms; and we follow those norms with slavish obedience.
Indeed, I’m a little suspicious of some of the evolutionary models predicting the evolution of social norms, because they predict it too well; they seem to suggest that it should arise all the time, when in fact only a handful of species exhibit it at all, and only we build our whole existence around it.

Along with our extreme capacity for altruism, this is another way that human beings actually deviate more from the infinite identical psychopaths of neoclassical economics than most other animals do. Yes, we’re smarter than other animals; other animals are more likely to make mistakes (though certainly we make plenty of our own). But most other animals aren’t motivated by goals entirely different from individual self-interest (or “evolutionary self-interest” in a Selfish Gene sort of sense) the way we typically are. Other animals try to be selfish and often fail; we try not to be selfish and usually succeed. Economics experiments often go out of their way to exclude social motives as much as possible—anonymous random matching with no communication, for instance—and still end up failing. Human behavior in experiments is consistent, systematic—and almost never completely selfish.

Once you start looking for norms, you see them everywhere. Indeed, it becomes hard to see anything else. To a first approximation, all human behavior is social norms.

# The replication crisis, and the future of science

Aug 27 JDN 2457628

After settling in a little bit in Irvine, I’m now ready to resume blogging, but for now it will be on a reduced schedule. I’ll release a new post every Saturday, at least for the time being. Today’s post was chosen by Patreon vote, though only one person voted (this whole Patreon voting thing has not been as successful as I’d hoped).

It’s about something we scientists really don’t like to talk about, but definitely need to: We are in the middle of a major crisis of scientific replication.
Whenever large studies attempt to replicate published scientific results, their success rate is almost always dismal. Psychology is the field everyone likes to pick on, because its record is particularly bad: Only 39% of studies were replicated with the published effect size, though a further 36% were at least qualitatively, if not quantitatively, similar. Yet economics has its own replication problem, and even medical research is not immune to replication failure.

It’s important not to overstate the crisis; the majority of scientific studies do at least qualitatively replicate. We are doing better than flipping a coin, which is more than one can say of financial forecasters.

There are three kinds of replication, and only one of them should be expected to give near-100% results.

That kind is reanalysis—when you take the same data and use the same methods, you absolutely should get the exact same results. I favor making reanalysis a routine requirement of publication; if we can’t get your results by applying your statistical methods to your data, then your paper needs revision before we can entrust it to publication. A number of papers have failed on reanalysis, which is absurd and embarrassing; the worst offender was probably Reinhart-Rogoff, which was used in public policy decisions around the world despite containing spreadsheet errors.

The second kind is direct replication—when you do the exact same experiment again and see if you get the same result within error bounds. This kind of replication should work something like 90% of the time, but in fact works more like 60% of the time.

The third kind is conceptual replication—when you do a similar experiment designed to test the same phenomenon from a different perspective. This kind of replication should work something like 60% of the time, but actually works only about 20% of the time.
Economists are well equipped to understand and solve this crisis, because it’s not actually about science. It’s about incentives. I facepalm every time I see another article by an aggrieved statistician about the “misunderstanding” of p-values; no, scientists aren’t misunderstanding anything. They know damn well how p-values are supposed to work. So why do they keep using them wrong? Because their jobs depend on doing so.

The first key point to understand here is “publish or perish”: Academics in an increasingly competitive system are required to publish their research in order to get tenure, and frequently required to get tenure in order to keep their jobs at all. (Or they could become adjuncts, who are paid one-fifth as much.)

The second is the fundamentally defective way our research journals are run (as I have discussed in a previous post). As private for-profit corporations whose primary interest is in raising more revenue, our research journals aren’t trying to publish what will genuinely advance scientific knowledge. They are trying to publish what will draw attention to themselves. It’s a similar flaw to what has arisen in our news media; they aren’t trying to convey the truth, they are trying to get ratings to draw advertisers. This is how you get hours of meaningless fluff about a missing airliner and then a single chyron scroll about a war in Congo or a flood in Indonesia. Research journals haven’t fallen quite so far, because they have reputations to uphold in order to attract scientists to read them and publish in them; but still, their fundamental goal is and has always been to raise attention in order to raise revenue.

The best way to do that is to publish things that are interesting. But if a scientific finding is interesting, that means it is surprising. It has to be unexpected or unusual in some way. And above all, it has to be positive; you have to have actually found an effect.
Except in very rare circumstances, the null result is never considered interesting. This adds up to making journals publish what is improbable. In particular, it creates a perfect storm for the abuse of p-values.

A p-value, roughly speaking, is the probability you would get the observed result if there were no effect at all—for instance, the probability that you’d observe this wage gap between men and women in your sample if in the real world men and women were paid the exact same wages. The standard heuristic is a p-value of 0.05; indeed, it has become so enshrined that it is almost an explicit condition of publication now. Your result must be less than 5% likely to happen if there is no real difference.

But if you will only publish results that show a p-value below 0.05, then the papers that get published and read will only be the ones that found such p-values—which renders the p-values meaningless. It was never particularly meaningful anyway; as we Bayesians have been trying to explain since time immemorial, it matters how likely your hypothesis was in the first place. For something like wage gaps, where we’re reasonably sure but maybe could be wrong, the p-value is not too unreasonable. But if the theory is almost certainly true (“does gravity fall off as the inverse square of distance?”), even a high p-value like 0.35 is still supportive, while if the theory is almost certainly false (“are human beings capable of precognition?”—an actual study), even a tiny p-value like 0.001 is still basically irrelevant. We really should be using much more sophisticated inference techniques, but those are harder to do, and don’t provide the nice simple threshold of “Is it below 0.05?”

But okay, p-values can be useful in many cases—if they are used correctly and you see all the results. If you have effect X with p-values 0.03, 0.07, 0.01, 0.06, and 0.09, effect X is probably a real thing.
If you have effect Y with p-values 0.04, 0.02, 0.29, 0.35, and 0.74, effect Y is probably not a real thing. But I’ve just set it up so that these would be published exactly the same: They each have two published papers with “statistically significant” results. The other papers never get published and therefore never get seen, so we throw away vital information. This is called the file drawer problem.

Researchers often have a lot of flexibility in designing their experiments. If their only goal were to find truth, they would use this flexibility to test a variety of scenarios and publish all the results, so they can be compared holistically. But that isn’t their only goal; they also care about keeping their jobs so they can pay rent and feed their families. And under our current system, the only way to ensure that is by publishing things, which basically means only including the parts that showed up as statistically significant—otherwise, journals aren’t interested. And so we get huge numbers of papers published that tell us basically nothing, because we set up such strong incentives for researchers to give misleading results.

The saddest part is that this could be easily fixed.

First, reduce the incentives to publish by finding other ways to evaluate the skill of academics—like teaching, for goodness’ sake. Working papers are another good approach. Journals already get far more submissions than they know what to do with, and most of these papers will never be read by more than a handful of people. We don’t need more published findings, we need better published findings—so stop incentivizing mere publication and start finding ways to incentivize research quality.

Second, eliminate private for-profit research journals. Science should be done by government agencies and nonprofits, not for-profit corporations.
(And yes, I would apply this to pharmaceutical companies as well, which should really be pharmaceutical manufacturers who make cheap drugs based on academic research and carry small profit margins.) Why? Again, it’s all about incentives. Corporations have no reason to want to find truth and every reason to want to tilt it in their favor.

Third, increase the number of tenured faculty positions. Instead of building so many new grand edifices to please your plutocratic donors, use your (skyrocketing) tuition money to hire more professors so that you can teach more students better. You can find even more funds if you cut the salaries of your administrators and football coaches. Come on, universities; you are the one industry in the world where labor demand and labor supply are the same people a few years later. You have no excuse for not having the smoothest market clearing in the world. You should never have gluts or shortages.

Fourth, require pre-registration of research studies (as some branches of medicine already do). If the study is sound, an optimal rational agent shouldn’t care in the slightest whether it had a positive or negative result; and if our ape brains won’t let us think that way, we need to establish institutions that force it to happen. Reviewers shouldn’t even see the effect size and p-value before they make the decision to publish; all they should care about is that the experiment makes sense and the proper procedure was followed.

If we did all that, the replication crisis could be almost completely resolved, as the incentives would be realigned to more closely match the genuine search for truth. Alas, I don’t see universities or governments or research journals having the political will to actually make such changes, which is very sad indeed.
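The file drawer problem is easy to demonstrate with a quick simulation. Below, a thousand hypothetical studies all measure a true effect of exactly zero; if only the "statistically significant" ones reach print, the published literature reports a substantial effect anyway. (The sample size and study count are arbitrary choices of mine, for illustration only.)

```python
import math
import random

random.seed(42)

def z_test_p(sample):
    """Two-sided p-value for a one-sample z-test of mean 0 (known sd = 1)."""
    z = sum(sample) / math.sqrt(len(sample))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# 1,000 studies, each with n = 30, all measuring a TRUE effect of zero.
studies = [[random.gauss(0.0, 1.0) for _ in range(30)] for _ in range(1000)]
effects = [sum(s) / len(s) for s in studies]
pvals = [z_test_p(s) for s in studies]

# The file drawer: only p < 0.05 gets published.
published = [e for e, p in zip(effects, pvals) if p < 0.05]

print(f"published: {len(published)} of 1000 studies")
print(f"mean |effect| across all studies:   {sum(abs(e) for e in effects) / len(effects):.3f}")
print(f"mean |effect| in the published set: {sum(abs(e) for e in published) / len(published):.3f}")
```

Roughly 5% of these null studies clear the 0.05 bar by chance, and they are precisely the ones with the largest apparent effects, so the published average is inflated several-fold relative to the truth of zero.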
# Bet five dollars for maximum performance

JDN 2457433

One of the more surprising findings from the study of human behavior under stress is the Yerkes-Dodson curve: This curve shows how well humans perform at a given task, as a function of how high the stakes are on whether or not they do it properly.

For simple tasks, it says what most people intuitively expect—and what neoclassical economists appear to believe: As the stakes rise, the more highly incentivized you are to do it, and the better you do it.

But for complex tasks, it says something quite different: While increased stakes do raise performance to a point—with nothing at stake at all, people hardly work at all—it is possible to become too incentivized. Formally, we say the curve is not monotonic; it has a local maximum.

This is one of many reasons why it’s ridiculous to say that top CEOs should make tens of millions of dollars a year on the rise and fall of their company’s stock price (as a great many economists do in fact say). Even if I believed that stock prices accurately reflect the company’s viability (they do not), and believed that the CEO has a great deal to do with the company’s success, it would still be a case of overincentivizing. When a million dollars rides on a decision, that decision is going to be worse than if the stakes had only been $100. With this in mind, it’s really not surprising that higher CEO pay is correlated with worse company performance. Stock options are terrible motivators, but do offer a subtle way of making wages adjust to the business cycle.

The reason for this is that as the stakes get higher, we become stressed, and that stress response inhibits our ability to use higher cognitive functions. The sympathetic nervous system evolved to make us very good at fighting or running away in the face of danger, which works well should you ever be attacked by a tiger. It did not evolve to make us good at complex tasks under high stakes, the sort of skill we’d need when calculating the trajectory of an errant spacecraft or disarming a nuclear warhead.

To be fair, most of us never have to worry about piloting errant spacecraft or disarming nuclear warheads—indeed, even in today’s world, you’re about as likely to get attacked by a tiger as to pilot a spacecraft. (The rate of tiger attacks in the US is just under 2 per year, and the rate of manned space launches in the US was about 5 per year until the Space Shuttle program ended.)

There are certain professions, such as pilots and surgeons, where performing complex tasks under life-or-death pressure is commonplace, but only a small fraction of people take such professions for precisely that reason. And if you’ve ever wondered why we use checklists for pilots and there is discussion of also using checklists for surgeons, this is why—checklists convert a single complex task into many simple tasks, allowing high performance even at extreme stakes.

But we do have to do a fair number of quite complex tasks with stakes that are, if not urgent life-or-death scenarios, then at least actions that affect our long-term life prospects substantially. In my tutoring business I encounter one in particular quite frequently: Standardized tests.

Tests like the SAT, ACT, GRE, LSAT, GMAT, and other assorted acronyms are not literally life-or-death, but they often feel that way to students because they really do have a powerful impact on where you’ll end up in life. Will you get into a good college? Will you get into grad school? Will you get the job you want? Even subtle deviations from the path of optimal academic success can make it much harder to achieve career success in the future.

Of course, these are hardly the only examples. Many jobs require us to complete tasks properly on tight deadlines, or else risk being fired. Working in academia infamously requires publishing in journals in time to rise up the tenure track, or else falling off the track entirely. (This incentivizes the production of huge numbers of papers, whether they’re worth writing or not; yes, the number of papers published goes down after tenure, but is that a bad thing? What we need to know is whether the number of good papers goes down. My suspicion is that most if not all of the reduction in publications is due to not publishing things that weren’t worth publishing.)

So if you are faced with this sort of task, what can you do? If you realize that you are faced with a high-stakes complex task, you know your performance will be bad—which only makes your stress worse!

My advice is to pretend you’re betting five dollars on the outcome.

Ignore all other stakes, and pretend you’re betting five dollars. $5.00 USD. Do it right and you get a Lincoln; do it wrong and you lose one. What this does is ensure that you care enough—you don’t want to lose $5 for no reason—but not too much—if you do lose $5, you don’t feel like your life is ending. We want to put you near that peak of the Yerkes-Dodson curve.

The great irony here is that you most want to do this when it is most untrue. If you actually do have a task for which you’ve bet $5 and nothing else rides on it, you don’t need this technique, and any technique to improve your performance is not particularly worthwhile. It’s when you have a standardized test to pass that you really want to use this—and part of me even hopes that people know to do this whenever they have nuclear warheads to disarm. It is precisely when the stakes are highest that you must put those stakes out of your mind.

Why five dollars? Well, the exact amount is arbitrary, but this is at least about the right order of magnitude for most First World individuals. If you really want to get precise, I think the optimal stakes level for maximum performance is something like 100 microQALY per task, and assuming logarithmic utility of wealth, $5 at the US median household income of $53,600 is approximately 100 microQALY. If you have a particularly low or high income, feel free to adjust accordingly. Literally you should be prepared to bet about an hour of your life; but we are not accustomed to thinking that way, so use $5. (I think most people, if asked outright, would radically overestimate what an hour of life is worth to them. “I wouldn’t give up an hour of my life for $1,000!” Then why do you work for $20 an hour?)
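The back-of-the-envelope numbers check out. Here is the arithmetic as a quick sketch; the income figure and the log-utility assumption are from the text, while the exact hours-per-year count is my own addition.

```python
import math

income = 53_600                # US median household income cited above
stake = 5                      # the proposed bet
hours_per_year = 365.25 * 24   # about 8,766 hours in a year

# With logarithmic utility, losing $5 costs roughly 5/53,600 of a year's
# utility from income:
utility_fraction = math.log(income) - math.log(income - stake)
print(f"utility cost of a $5 stake: {utility_fraction * 1e6:.0f} micro-units")  # prints 93

# An hour of life is 1/8,766 of a year:
hour_fraction = 1 / hours_per_year
print(f"one hour of life: {hour_fraction * 1e6:.0f} microQALY")  # prints 114
```

Both come out to roughly 100 micro-units of a year, which is why the $5 stake and the hour-of-life framing are interchangeable at this income level.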

It’s a simple heuristic, easy to remember, and sometimes effective. Give it a try.