Implications of stochastic overload

Apr 2 JDN 2460037

A couple weeks ago I presented my stochastic overload model, which posits a neurological mechanism for the Yerkes-Dodson effect: Stress increases sympathetic activation, and this increases performance, up to the point where it starts to risk causing neural pathways to overload and shut down.

This week I thought I’d try to get into some of the implications of this model, how it might be applied to make predictions or guide policy.

One thing I often struggle with when it comes to applying theory is what actual benefit we get from a quantitative mathematical model as opposed to a simple qualitative idea. In many ways I think these benefits are overrated; people seem to think that putting something into an equation automatically makes it true and useful. I am sometimes tempted to take advantage of this, putting things into equations even when I know there is no good reason to, simply because so many people seem to find equations so persuasive. (Studies have even shown that, particularly in disciplines that don’t use a lot of math, inserting a totally irrelevant equation into a paper makes it more likely to be accepted.)

The basic implications of the Yerkes-Dodson effect are already widely known, and utterly ignored in our society. We know that excessive stress is harmful to health and performance, and yet our entire economy seems to be based around maximizing the amount of stress that workers experience. I actually think neoclassical economics bears a lot of the blame for this, as neoclassical economists are constantly talking about “increasing work incentives”—which is to say, making work life more and more stressful. (And let me remind you that there has never been any shortage of people willing to work in my lifetime, except possibly briefly during the COVID pandemic. The shortage has always been employers willing to hire them.)

I don’t know if my model can do anything to change that. Maybe by putting it into an equation I can make people pay more attention to it, precisely because equations have this weird persuasive power over most people.

As far as scientific benefits, I think that the chief advantage of a mathematical model lies in its ability to make quantitative predictions. It’s one thing to say that performance increases with low levels of stress then decreases with high levels; but it would be a lot more useful if we could actually precisely quantify how much stress is optimal for a given person and how they are likely to perform at different levels of stress.

Unfortunately, the stochastic overload model can only make detailed predictions if you have fully specified the probability distribution of innate activation, which requires a lot of free parameters. This is especially problematic if you don’t even know what type of distribution to use, which we really don’t; I picked three classes of distribution because they were plausible and tractable, not because I had any particular evidence for them.

Also, we don’t even have standard units of measurement for stress; we have a vague notion of what more or less stressed looks like, but we don’t have the sort of quantitative measure that could be plugged into a mathematical model. Probably the best units to use would be something like blood cortisol levels, but then we’d need to go measure those all the time, which raises its own issues. And maybe people don’t even respond to cortisol in the same ways? But at least we could measure your baseline cortisol for a while to get a prior distribution, and then see how different incentives increase your cortisol levels; and then the model should give relatively precise predictions about how this will affect your overall performance. (This is a very neuroeconomic approach.)
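
Just to illustrate what a quantified version might look like, here is a toy simulation in R. Everything in it is an assumption I am making up for illustration (the log-normal distribution of innate activation, the overload threshold, and the rule that realized performance is proportional to total activation unless overload shuts the pathway down), so treat it as a sketch of the kind of prediction the model could make, not the model itself.

```r
# Toy sketch of the stochastic overload model; all parameters are illustrative.
# Innate activation is random; stress adds to it; performance rises with total
# activation, but drops to zero if activation exceeds an overload threshold.

expected_performance <- function(stress, threshold = 1,
                                 meanlog = -1, sdlog = 0.5, n = 100000) {
  innate <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)  # assumed distribution
  total  <- innate + stress
  perf   <- ifelse(total < threshold, total, 0)          # overload -> shutdown
  mean(perf)
}

stress_levels <- seq(0, 1, by = 0.05)
performance   <- sapply(stress_levels, expected_performance)
plot(stress_levels, performance, type = "b",
     xlab = "Stress (arbitrary units)", ylab = "Expected performance")
```

With these made-up numbers the curve rises and then falls, which is just the Yerkes-Dodson shape; the point of actually calibrating the distribution (with cortisol or anything else) would be to locate the peak for a particular person rather than an arbitrary one.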

So, for now, I’m not really sure how useful the stochastic overload model is. This is honestly something I feel about a lot of the theoretical ideas I have come up with; they often seem too abstract to be usefully applicable to anything.

Maybe that’s how all theory begins, and applications only appear later? But that doesn’t seem to be how people expect me to talk about it whenever I have to present my work or submit it for publication. They seem to want to know what it’s good for, right now, and I never have a good answer to give them. Do other researchers have such answers? Do they simply pretend to?

Along similar lines, I recently had one of my students ask about a theory paper I wrote on international conflict for my dissertation, and after sending him a copy, I re-read the paper. There are so many pages of equations, and while I am confident that the mathematical logic is valid, I honestly don’t know if most of them are really useful for anything. (I don’t think I really believe that GDP is produced by a Cobb-Douglas production function, and we don’t even really know how to measure capital precisely enough to say.) The central insight of the paper, which I think is really important but other people don’t seem to care about, is a qualitative one: International treaties and norms provide an equilibrium selection mechanism in iterated games. The realists are right that this is cheap talk. The liberals are right that it works. Because when there are many equilibria, cheap talk works.

I know that in truth, science proceeds in tiny steps, building a wall brick by brick, never sure exactly how many bricks it will take to finish the edifice. It’s impossible to see whether your work will be an irrelevant footnote or the linchpin for a major discovery. But that isn’t how the institutions of science are set up. That isn’t how the incentives of academia work. You’re not supposed to say that this may or may not be correct and is probably some small incremental progress the ultimate impact of which no one can possibly foresee. You’re supposed to sell your work—justify how it’s definitely true and why it’s important and how it has impact. You’re supposed to convince other people why they should care about it and not all the dozens of other probably equally-valid projects being done by other researchers.

I don’t know how to do that, and it is agonizing to even try. It feels like lying. It feels like betraying my identity. Being good at selling isn’t just orthogonal to doing good science—I think it’s the opposite. I think the better you are at selling your work, the worse you are at cultivating the intellectual humility necessary to do good science. If you think you know all the answers, you’re just bad at admitting when you don’t know things. It feels like in order to succeed in academia, I have to act like an unscientific charlatan.

Honestly, why do we even need to convince you that our work is more important than someone else’s? Are there only so many science points to go around? Maybe the whole problem is this scarcity mindset. Yes, grant funding is limited; but why does publishing my work prevent you from publishing someone else’s? Why do you have to reject 95% of the papers that get sent to you? Don’t tell me you’re limited by space; the journals are digital and searchable and nobody reads the whole thing anyway. Editorial time isn’t infinite, but most of the work has already been done by the time you get a paper back from peer review. Of course, I know the real reason: Excluding people is the main source of prestige.

Drift-diffusion decision-making: The stock market in your brain

JDN 2456173 EDT 17:32.

Since I’ve been emphasizing the “economics” side of things a lot lately, I decided this week to focus more on the “cognitive” side. Today’s topic comes from cutting-edge research in cognitive science and neuroeconomics, so we still haven’t ironed out all the details.

The question we are trying to answer is an incredibly basic one: How do we make decisions? Given the vast space of possible behaviors human beings can engage in, how do we determine which ones we actually do?

There are actually two phases of decision-making.

The first phase is alternative generation, in which we come up with a set of choices. Some ideas occur to us, others do not; some are familiar and come to mind easily, others only appear after careful consideration. Techniques like brainstorming exist to help us with this task, but none of them are really very good; one of the most important bottlenecks in human cognition is the individual capacity to generate creative alternatives. The task is mind-bogglingly complex; the number of possible choices you could make at any given moment is already vast, and with each passing moment the number of possible behavioral sequences grows exponentially. Just think about all the possible sentences I could type right now, and then think about how incredibly narrow a space of possible behavioral options it is to assume that I’m typing sentences.

Most of the world’s innovation can ultimately be attributed to better alternative generation; particularly with regard to social systems, but in many cases even with regard to technologies, the capability existed for decades or even centuries but the idea simply never occurred to anyone. (You can see this by looking at the work of Heron of Alexandria and Leonardo da Vinci; the capacity to build these machines existed, and a handful of individuals were creative enough to actually try it, but it never occurred to anyone that there could be enormous, world-changing benefits to expanding these technologies for mass production.)

Unfortunately, we basically don’t understand alternative generation at all. It’s an almost complete gap in our understanding of human cognition. It actually has a lot to do with some of the central unsolved problems of cognitive science and artificial intelligence; if we could create a computer that is capable of creative thought, we would basically make human beings obsolete once and for all. (Oddly enough, physical labor is probably where human beings would still be necessary the longest; robots aren’t yet very good at climbing stairs or lifting irregularly-shaped objects, much less giving haircuts or painting on canvas.)

The second phase is what most “decision-making” research is actually about, and I’ll call it alternative selection. Once you have a list of two, three or four viable options—rarely more than this, as I’ll talk about more in a moment—how do you go about choosing the one you’ll actually do?

This is a topic that has undergone considerable research, and we’re beginning to make progress. The leading models right now are variants of drift-diffusion (hence the title of the post), and these models have the very appealing property that they are neurologically plausible, predictively accurate, and yet close to rationally optimal.

Drift-diffusion models basically are, as I said in the subtitle, a stock market in your brain. Picture the stereotype of the trading floor of the New York Stock Exchange, with hundreds of people bustling about, shouting “Buy!” “Sell!” “Buy!” with the price going up with every “Buy!” and down with every “Sell!”; in reality the NYSE isn’t much like that, and hasn’t been for decades, because everyone is staring at a screen and most of the trading is automated and occurs in microseconds. (It’s kind of like how if you draw a cartoon of a doctor, they will invariably be wearing a head mirror, but if you’ve actually been to a doctor lately, they don’t actually wear those anymore.)

Drift-diffusion, however, is like that. Let’s say we have a decision to make, “Yes” or “No”. Thousands of neurons devoted to that decision start firing, some saying “Yes”, exciting other “Yes” neurons and inhibiting “No” neurons, while others say “No”, exciting other “No” neurons and inhibiting other “Yes” neurons. New information feeds in, triggering some to “Yes” and others to “No”. The resulting process behaves like a random walk, specifically a trend random walk, where the intensity of the trend is determined by whatever criteria you are feeding into the decision. The decision will be made when a certain threshold is reached, say, 95% agreement among all neurons.

I wrote a little R program to demonstrate drift-diffusion models; the images I’ll be showing are R plots from that program. The graphs represent the aggregated “opinion” of all the deciding neurons; as you go from left to right, time passes, and the opinions “drift” toward one side or the other. For these graphs, the top of the graph represents the better choice.
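
A stripped-down sketch of that kind of simulation looks like this; the drift, noise, and threshold values here are just illustrative, not the settings behind the plots below.

```r
# Minimal drift-diffusion sketch: each path accumulates noisy evidence until it
# crosses an upper ("better choice") or lower ("worse choice") threshold.

simulate_dd <- function(n_paths = 20, max_t = 1000,
                        drift = 0.02, noise = 0.1,
                        upper = 1, lower = -1) {
  paths <- matrix(NA_real_, nrow = max_t, ncol = n_paths)
  for (p in 1:n_paths) {
    x <- 0
    for (t in 1:max_t) {
      x <- x + drift + rnorm(1, sd = noise)   # evidence step: trend plus noise
      paths[t, p] <- x
      if (x >= upper || x <= lower) break     # threshold reached: decision made
    }
  }
  paths
}

plot_dd <- function(paths, upper = 1, lower = -1) {
  matplot(paths, type = "l", lty = 1,
          xlab = "Time", ylab = "Accumulated evidence (top = better choice)")
  abline(h = c(upper, lower), lty = 2)        # decision thresholds
}

set.seed(1)
plot_dd(simulate_dd(drift = 0.02))   # strong trend: fast, almost always correct
```

Lowering the drift makes the paths wander longer, and lowering the thresholds makes them finish sooner but sometimes on the wrong side; that is exactly what the following figures show.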

It may actually be easiest to understand if you imagine that we are choosing a belief; new evidence accumulates that pushes us toward the correct answer (top) or the incorrect answer (bottom), because even a true belief will have some evidence that seems to be against it. You encounter this evidence more or less randomly (or do you?), and which belief you ultimately form will depend upon both how strong the evidence is and how thoughtful you are in forming your beliefs.

If the evidence is very strong (or in general, the two choices are very different), the trend will be very strong, and you’ll almost certainly come to a decision very quickly:

[Figure: strong_bias]

If the evidence is weaker (the two choices are very similar), the trend will be much weaker, and it will take much longer to make a decision:

[Figure: weak_bias]

One way to make a decision faster would be to have a weaker threshold, like 75% agreement instead of 95%; but this has the downside that it can result in making the wrong choice. Notice how some of the paths go down to the bottom, which in this case is the worse choice:

[Figure: low_threshold]

But if there is actually no difference between the two options, a low threshold is good, because you don’t spend time waffling over a pointless decision. (I know that I’ve had a problem with that in real life, spending too long making a decision that ultimately is of minor importance; my drift thresholds are too high!) With a low threshold, you get it over with:

[Figure: indifferent]

With a high threshold, you can go on for ages:

[Figure: ambivalent]

This is the difference between being indifferent about a decision and being ambivalent. If you are indifferent, you are dealing with two small amounts of utility and it doesn’t really matter which one you choose. If you are ambivalent, you are dealing with two large amounts of utility and it’s very important to get it right—but you aren’t sure which one to choose. If you are indifferent, you should use a low threshold and get it over with; but if you are ambivalent, it actually makes sense to keep your threshold high and spend a lot of time thinking about the problem in order to be sure you get it right.
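
In terms of the simulation sketch above, indifference and ambivalence are the same weak-trend process run with different thresholds; only the parameter values (which are, again, arbitrary) change.

```r
set.seed(2)
plot_dd(simulate_dd(drift = 0, upper = 0.5, lower = -0.5),
        upper = 0.5, lower = -0.5)   # indifferent: low threshold, decided quickly

set.seed(3)
plot_dd(simulate_dd(drift = 0, upper = 2, lower = -2),
        upper = 2, lower = -2)       # ambivalent: high threshold, can drag on
```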

It’s also possible to set a higher threshold for one option than the other; I think this is actually what we’re doing when we exhibit many cognitive biases like confirmation bias. If the decision you’re making is between keeping your current beliefs and changing them to something else, your diffusion space actually looks more like this:

[Figure: confirmation_bias]

You’ll only make the correct choice (top) if you set equal thresholds (meaning you reason fairly instead of exhibiting cognitive biases) and high thresholds (meaning you spend sufficient time thinking about the question). If I may change to a sports metaphor, people tend to move the goalposts—the team “change your mind” has to kick a lot further than the team “keep your current belief”.
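
In the simulation sketch, moving the goalposts is just an asymmetric pair of thresholds: the “change your mind” boundary (top) sits much farther away than the “keep your current belief” boundary (bottom), so a fair share of paths end up keeping the old belief even when the evidence gently favors changing it. The specific numbers are arbitrary.

```r
set.seed(4)
plot_dd(simulate_dd(drift = 0.01, upper = 3, lower = -0.5),
        upper = 3, lower = -0.5)   # biased thresholds: the goalposts have moved
```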

We can also extend drift-diffusion models to changing your mind (or experiencing regret such as “buyer’s remorse”) if we assume that the system doesn’t actually cut off once it reaches a threshold; the threshold makes us take the action, but then our neurons keep on arguing it out in the background. We may hover near the threshold or soar off into absolute certainty—but on the other hand we may waffle all the way back to the other decision:

[Figure: regret]

There are all sorts of generalizations and extensions of drift-diffusion models, but these basic ones should give you a sense of how useful they are. More importantly, they are accurate; drift-diffusion models produce very sharp mathematical predictions about human behavior, and in general these predictions are verified in experiments.

The main reason we started using drift-diffusion models is that they account very well for the fact that decisions become more accurate when we spend more time on them. The way they do that is quite elegant: Under harsher time pressure, we use lower thresholds, which speeds up the process but also introduces more errors. When we don’t have time pressure, we use high thresholds and take a long time, but almost always make the right decision.

Under certain (rather narrow) circumstances, drift-diffusion models can actually be equivalent to the optimal Bayesian model: with the right thresholds, accumulating evidence until it crosses a boundary essentially implements the sequential probability ratio test, which is the optimal way to choose between two hypotheses. These models can also be extended for use in purchasing choices, and one day we will hopefully have a stock-market-in-the-brain model of actual stock market decisions!

Drift-diffusion models are based on decisions between two alternatives with only one relevant attribute under consideration, but they are being expanded to decisions with multiple attributes and decisions with multiple alternatives; the fact that this is difficult is in my opinion not a bug but a feature—decisions with multiple alternatives and attributes are actually difficult for human beings to make. The fact that drift-diffusion models have difficulty with the very situations that human beings have difficulty with provides powerful evidence that drift-diffusion models are accurately representing the processes that go on inside a human brain. I’d be worried if it were too easy to extend the models to complex decisions—it would suggest that our model is describing a more flexible decision process than the one human beings actually use. Human decisions really do seem to be attempts to shoehorn two-choice single-attribute decision methods onto more complex problems, and a lot of mistakes we make are attributable to that.

In particular, the phenomena of analysis paralysis and the paradox of choice are easily explained this way. Why is it that when people are given more alternatives, they often spend far more time trying to decide and often end up less satisfied than they were before? This makes sense if, when faced with a large number of alternatives, we spend time trying to compare them pairwise on every attribute, and then get stuck with a whole bunch of incomparable pairwise comparisons that we then have to aggregate somehow. If we could simply assign a simple utility value to each attribute and sum them up, adding new alternatives should only increase the time required by a small amount and should never result in a reduction in final utility.

When I have an important decision to make, I actually assemble a formal utility model, as I did recently when deciding on a new computer to buy (it should be in the mail any day now!). The hardest part, however, is assigning values to the coefficients in the model; just how much am I willing to spend for an extra gigabyte of RAM, anyway? How exactly do those CPU benchmarks translate into dollar value for me? I can clearly tell that this is not the native process of my mental architecture.
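
The model itself is nothing fancy; it is an additive utility function, something like the sketch below. Every coefficient here is a made-up illustration of the kind of guess I have to make, and the guessing is exactly the hard part.

```r
# Toy additive utility model for comparing laptops; all coefficients are
# illustrative guesses about dollar value per unit of each attribute.
base_value         <- 1200   # $ value of having a working laptop at all
value_per_gb_ram   <- 20     # $ per GB of RAM
value_per_cpu_mark <- 0.05   # $ per CPU benchmark point
value_per_gb_ssd   <- 0.30   # $ per GB of storage

utility <- function(price, ram_gb, cpu_mark, ssd_gb) {
  base_value +
    value_per_gb_ram * ram_gb +
    value_per_cpu_mark * cpu_mark +
    value_per_gb_ssd * ssd_gb -
    price
}

# Hypothetical alternatives; adding another one is just one more row and one
# more evaluation, so the cost grows linearly rather than combinatorially.
laptops <- data.frame(
  name     = c("A", "B", "C"),
  price    = c(900, 1100, 1300),
  ram_gb   = c(8, 16, 16),
  cpu_mark = c(8000, 10000, 14000),
  ssd_gb   = c(256, 512, 1024)
)
laptops$net_utility <- with(laptops, utility(price, ram_gb, cpu_mark, ssd_gb))
laptops[order(-laptops$net_utility), ]   # rank the alternatives by net utility
```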

No, alas, we seem to be stuck with drift-diffusion, which is nearly optimal for choices with two alternatives on a single attribute, but actually pretty awful for multiple-alternative multiple-attribute decisions. But perhaps by better understanding our suboptimal processes, we can rearrange our environment to bring us closer to optimal conditions—or perhaps, one day, change the processes themselves!

The Cognitive Science of Morality Part II: Molly Crockett

JDN 2457140 EDT 20:16.

This weekend has been very busy for me, so this post is going to be shorter than most—which is probably a good thing anyway, since my posts tend to run a bit long.

In an earlier post I discussed the Weinberg Cognitive Science Conference and my favorite speaker in the lineup, Joshua Greene. After a brief interlude from Capybara Day, it’s now time to talk about my second-favorite speaker, Molly Crockett. (Is it just me, or does the name “Molly” somehow seem incongruous with a person of such prestige?)

Molly Crockett is a neuroeconomist, though you’d never hear her say that. She doesn’t think of herself as an economist at all, but purely as a neuroscientist. I suspect this is because when she hears the word “economist” she thinks of only mainstream neoclassical economists, and she doesn’t want to be associated with such things.

Still, what she studies is clearly neuroeconomics—I in fact first learned of her work by reading the textbook Neuroeconomics, though I really got interested in her work after watching her TED Talk. It’s one of the better TED talks (they put out so many of them now that the quality is mixed at best); she talks about news reporting on neuroscience, how it is invariably ridiculous and sensationalist. This is particularly frustrating because of how amazing and important neuroscience actually is.

I could almost forgive the sensationalism if they were talking about something that’s actually fantastically boring, like, say, tax codes, or financial regulations. Of course, even then there is the Oliver Effect: You can hide a lot of evil by putting it in something boring. But Dodd-Frank is 2300 pages long; I read an earlier draft that was only (“only”) 600 pages, and it literally contained a three-page section explaining how to define the word “bank”. (Assuming direct proportionality, I would infer that there is now a twelve-page section defining the word “bank”. Hopefully not?) It doesn’t get a whole lot more snoozeworthy than that. So if you must be a bit sensationalist in order to get people to see why eliminating margin requirements and the swaps pushout rule are terrible, terrible ideas, so be it.

But neuroscience is not boring, and so sensationalism only means that news outlets are making up exciting things that aren’t true instead of saying the actually true things that are incredibly exciting.

Here, let me express without sensationalism what Molly Crockett does for a living: Molly Crockett experimentally determines how psychoactive drugs modulate moral judgments. The effects she observes are small, but they are real; and since these experiments are done using small doses for a short period of time, if these effects scale up they could be profound. This is the basic research component—when it comes to technological fruition it will be literally A Clockwork Orange. But it may be A Clockwork Orange in the best possible way: It could be, at last, a medical cure for psychopathy, a pill to make us not just happier or healthier, but better. We are not there yet by any means, but this is clearly the first step: Molly Crockett is to A Clockwork Orange roughly as Michael Faraday is to the Internet.

In one of the experiments she talked about at the conference, Crockett found that serotonin reuptake inhibitors enhance harm aversion. Serotonin reuptake inhibitors are very commonly used drugs—you are likely familiar with one called Prozac. So basically what this study means is that Prozac makes people more averse to causing pain in themselves or others. It doesn’t necessarily make them more altruistic, let alone more ethical; but it does make them more averse to causing pain. (To see the difference, imagine a 19th-century field surgeon dealing with a wounded soldier; there is no anesthetic, but an amputation must be made. Sometimes being ethical requires causing pain.)

The experiment is actually what Crockett calls “the honest Milgram Experiment”; under Milgram, the experimenters told their subjects they would be causing shocks, but no actual shocks were administered. Under Crockett, the shocks are absolutely 100% real (though they are restricted to a much lower voltage of course). People are given competing offers that contain an amount of money and a number of shocks to be delivered, either to you or to the other subject. They decide how much it’s worth to them to bear the shocks—or to make someone else bear them. It’s a classic willingness-to-pay paradigm, applied to the Milgram Experiment.

What Crockett found did not surprise me, nor do I expect it will surprise you if you imagine yourself in the same place; but it would totally knock the socks off of any neoclassical economist. People are much more willing to bear shocks for money than they are to give shocks for money. They are what Crockett terms hyper-altruistic; I would say that they are exhibiting an apparent solidarity coefficient greater than 1. They seem to be valuing others more than they value themselves.
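
To make the idea concrete, here is one simple way to operationalize a solidarity coefficient, using numbers I am making up entirely (this is not Crockett’s data): compare the price someone demands to bear shocks themselves with the price they demand to impose the same shocks on a stranger, and take the ratio.

```r
# Made-up numbers, not Crockett's data: money demanded to accept 20 extra shocks.
shocks      <- 20
price_self  <- 8    # $ demanded to take the shocks yourself
price_other <- 14   # $ demanded to impose the shocks on a stranger

per_shock_self  <- price_self / shocks    # implied harm price for your own pain
per_shock_other <- price_other / shocks   # implied harm price for the stranger's pain

# Apparent solidarity coefficient: the weight placed on the stranger's pain
# relative to your own; a value above 1 means valuing their suffering more.
per_shock_other / per_shock_self          # 1.75 in this made-up example
```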

Normally I’d say that this makes no sense at all—why would you value some random stranger more than yourself? Valuing them equally, perhaps; and obviously only a psychopath would value them not at all; but valuing them more? And there’s no way you can actually live this way in your daily life; you’d give away all your possessions and perhaps even starve yourself to death. (I guess maybe Jesus lived that way.) But Crockett came up with a model that explains it pretty well: We are morally risk-averse. If we knew we were dealing with someone very strong who had no trouble dealing with shocks, we’d be willing to shock them a fairly large amount. But we might actually be dealing with someone very vulnerable who would suffer greatly; and we don’t want to take that chance.

I think there’s some truth to that. But her model leaves something else out that I think is quite important: We are also averse to unfairness. We don’t like the idea of raising one person while lowering another. (Obviously not so averse as to never do it—we do it all the time—but without a compelling reason we consider it morally unjustified.) So if the two subjects are in roughly the same condition (being two undergrads at Oxford, they probably are), then helping one while hurting the other is likely to create inequality where none previously existed. But if you hurt yourself in order to help yourself, no such inequality is created; all you do is raise yourself up, provided that you do believe that the money is good enough to be worth the shocks. It’s actually quite Rawlsian; lifting one person up while not affecting the other is exactly the sort of inequality you’re allowed to create according to the Difference Principle.

There’s also the fact that the subjects can’t communicate; I think if I could make a deal to share the money afterward, I’d feel better about shocking someone more in order to get us both more money. So perhaps with communication people would actually be willing to shock others more. (And the sensationalist headline would of course be: “Talking makes people hurt each other.”)

But all of these ideas are things that could be tested in future experiments! And maybe I’ll do those experiments someday, or Crockett, or one of her students. And with clever experimental paradigms we might find out all sorts of things about how the human mind works, how moral intuitions are structured, and ultimately how chemical interventions can actually change human moral behavior. The potential for both good and evil is so huge, it’s both wondrous and terrifying—but can you deny that it is exciting?

And that’s not even getting into the Basic Fact of Cognitive Science, which undermines all concepts of afterlife and theistic religion. I already talked about it before—as the sort of thing that I sort of wish I could say when I introduce myself as a cognitive scientist—but I think it bears repeating.

As Patricia Churchland said on the Colbert Report: Colbert asked, “Are you saying I have no soul?” and she answered, “Yes.” I actually prefer Daniel Dennett’s formulation: “Yes, we have a soul, but it’s made of lots of tiny robots.”

We don’t have a magical, supernatural soul (whatever that means); we don’t have an immortal soul that will rise into Heaven or be reincarnated in someone else. But we do have something worth preserving: We have minds that are capable of consciousness. We love and hate, exalt and suffer, remember and imagine, understand and wonder. And yes, we are born and we die. Once the unique electrochemical pattern that defines your consciousness is sufficiently degraded, you are gone. Nothing remains of what you were—except perhaps the memories of others, or things you have created. But even this legacy is unlikely to last forever. One day it is likely that all of us—and everything we know, and everything we have built, from the Great Pyramids to Hamlet to Beethoven’s Ninth to Principia Mathematica to the US Interstate Highway System—will be gone. I don’t have any consolation to offer you on that point; I can’t promise you that anything will survive a thousand years, much less a million. There is a chance—even a chance that at some point in the distant future, whatever humanity has become will find a way to reverse the entropic decay of the universe itself—but nothing remotely like a guarantee. In all probability you, and I, and all of this will be gone someday, and that is absolutely terrifying.

But it is also undeniably true. The fundamental link between the mind and the brain is one of the basic facts of cognitive science; indeed I like to call it The Basic Fact of Cognitive Science. We know specifically which kinds of brain damage will make you unable to form memories, comprehend language, speak language (a totally different area), see, hear, smell, feel anger, integrate emotions with logic… do I need to go on? Everything that you are is done by your brain—because you are your brain.

Now why can’t the science journalists write about that? Instead we get “The Simple Trick That Can Boost Your Confidence Immediately” and “When it Comes to Picking Art, Men & Women Just Don’t See Eye to Eye.” HuffPo is particularly awful of course; the New York Times is better, but still hardly as good as one might like. They keep trying to find ways to make it exciting—but so rarely seem to grasp how exciting it already is.