What behavioral economics needs

Apr 16 JDN 2460049

The transition from neoclassical to behavioral economics has been a vital step forward in science. But lately we seem to have reached a plateau, with no major advances in the paradigm in quite some time.

It could be that there is work already being done which will, in hindsight, turn out to be significant enough to make that next step forward. But my fear is that we are getting bogged down by our own methodological limitations.

Neoclassical economics passed on to us its obsession with mathematical sophistication. To some extent this was inevitable; in order to impress neoclassical economists enough to convert some of them, we had to use fancy math. We had to show that we could do it their way in order to convince them why we shouldn’t—otherwise, they’d just have dismissed us the way they had dismissed psychologists for decades, as too “fuzzy-headed” to do the “hard work” of putting everything into equations.

But the truth is, putting everything into equations was never the right approach. Because human beings clearly don’t think in equations. Once we write down a utility function and get ready to take its derivative and set it equal to zero, we have already distanced ourselves from how human thought actually works.

When dealing with a simple physical system, like an atom, equations make sense. Nobody thinks that the electron knows the equation and is following it intentionally. That equation simply describes how the forces of the universe operate, and the electron is subject to those forces.

But human beings do actually know things and do things intentionally. And while an equation could be useful for analyzing human behavior in the aggregate—I’m certainly not objecting to statistical analysis—it really never made sense to say that people make their decisions by optimizing the value of some function. Most people barely even know what a function is, much less remember calculus well enough to optimize one.

Yet right now, behavioral economics is still all based in that utility-maximization paradigm. We don’t use the same simplistic utility functions as neoclassical economists; we make them more sophisticated and realistic. Yet in that very sophistication we make things more complicated, more difficult—and thus in at least that respect, even further removed from how actual human thought must operate.

The worst offender here is surely Prospect Theory. I recognize that Prospect Theory predicts human behavior better than conventional expected utility theory; nevertheless, it makes absolutely no sense to suppose that human beings actually do some kind of probability-weighting calculation in their heads when they make judgments. Most of my students—who are well-trained in mathematics and economics—can’t even do that probability-weighting calculation on paper, with a calculator, on an exam. (There’s also absolutely no reason to do it! All it does is make your decisions worse!) This is a totally unrealistic model of human thought.
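To make this concrete, here is a minimal sketch in Python of the kind of calculation cumulative prospect theory actually posits, for the simplest possible case: a single gamble with one nonzero outcome. The functional forms and parameter values are the standard Tversky-Kahneman (1992) ones; the particular gamble is just something I made up for illustration.

import math

def value(x, alpha=0.88, lam=2.25):
    # Value function: concave over gains, convex and steeper over losses.
    return x**alpha if x >= 0 else -lam * (-x)**alpha

def weight(p, gamma=0.61):
    # Probability weighting: small probabilities are overweighted,
    # moderate-to-large ones underweighted.
    return p**gamma / (p**gamma + (1 - p)**gamma)**(1 / gamma)

# A 10% chance of winning $100, otherwise nothing:
p, x = 0.10, 100
print(weight(p) * value(x))   # the decision weight times the subjective value

And that is the easy case; a mixed gamble with several outcomes requires ranking the outcomes and weighting cumulative probabilities. Nobody is doing this in their head.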

This is not to say that human beings are stupid. We are still smarter than any other entity in the known universe—computers are rapidly catching up, but they haven’t caught up yet. It is just that whatever makes us smart must not be easily expressible as an equation that maximizes a function. Our thoughts are bundles of heuristics, each of which may be individually quite simple, but all of which together make us capable of not only intelligence, but something computers still sorely, pathetically lack: wisdom. Computers optimize functions better than we ever will, but we still make better decisions than they do.

I think that what behavioral economics needs now is a new unifying theory of these heuristics, which accounts for not only how they work, but how we select which one to use in a given situation, and perhaps even where they come from in the first place. This new theory will of course be complex; there are a lot of things to explain, and human behavior is a very complex phenomenon. But it shouldn’t be—mustn’t be—reliant on sophisticated advanced mathematics, because most people can’t do advanced mathematics (almost by construction—we would call it something different otherwise). If your model assumes that people are taking derivatives in their heads, your model is already broken. 90% of the world’s people can’t take a derivative.

I guess it could be that our cognitive processes in some sense operate as if they are optimizing some function. This is commonly posited for the human motor system, for instance; clearly baseball players aren’t actually solving differential equations when they throw and catch balls, but the trajectories that balls follow do in fact obey such equations, and the reliability with which baseball players can catch and throw suggests that they are in some sense acting as if they can solve them.

But I think that a careful analysis of even this classic example reveals some deeper insights that should call this whole notion into question. How do baseball players actually do what they do? They don’t seem to be calculating at all—in fact, if you asked them to try to calculate while they were playing, it would destroy their ability to play. They learn. They engage in practiced motions, acquire skills, and notice patterns. I don’t think there is anywhere in their brains that is actually doing anything like solving a differential equation. It’s all a process of throwing and catching, throwing and catching, over and over again, watching and remembering and subtly adjusting.

One thing that is particularly interesting to me about that process is that it is astonishingly flexible. It doesn’t really seem to matter what physical process you are interacting with; as long as it is sufficiently orderly, such a method will allow you to predict and ultimately control that process. You don’t need to know anything about differential equations in order to learn in this way—and, indeed, I really can’t emphasize this enough, baseball players typically don’t.

In fact, learning is so flexible that it can even perform better than calculation. The usual differential equations most people would think to use to predict the throw of a ball would assume ballistic motion in a vacuum, which is absolutely not what a curveball is. In order to throw a curveball, the ball must interact with the air, and it must be launched with spin; curving a baseball relies very heavily on the Magnus Effect. I think it’s probably possible to construct an equation that would fully predict the motion of a curveball, but it would be a tremendously complicated one, and might not even have an exact closed-form solution. In fact, I think it would require solving the Navier-Stokes equations, for which there is an outstanding Millennium Prize. Since the viscosity of air is very low, maybe you could get away with approximating using the Euler fluid equations.
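If you actually wanted to predict a curveball’s flight, you wouldn’t solve anything in closed form anyway; you’d integrate numerically. Here is a rough sketch of that, treating the Magnus effect as nothing more than a constant-magnitude acceleration perpendicular to the velocity. The numbers (release speed, release height, the size of the Magnus acceleration) are assumptions I picked to be roughly baseball-like, not real aerodynamics, and drag is ignored entirely.

import math

def plate_height(v0=38.0, angle_deg=1.0, magnus=0.0, dt=0.001):
    # Simple Euler integration of a pitch; magnus is a sideways acceleration
    # (m/s^2) perpendicular to the velocity, positive for backspin-style lift.
    g = 9.81
    x, y = 0.0, 1.8   # released about 1.8 m above the ground
    vx = v0 * math.cos(math.radians(angle_deg))
    vy = v0 * math.sin(math.radians(angle_deg))
    while x < 18.4:   # roughly the distance from the mound to home plate (m)
        speed = math.hypot(vx, vy)
        ax = magnus * (-vy / speed)
        ay = magnus * (vx / speed) - g
        vx, vy = vx + ax * dt, vy + ay * dt
        x, y = x + vx * dt, y + vy * dt
    return y          # height of the ball as it crosses the plate

print(plate_height(magnus=0.0))    # the "textbook" vacuum pitch
print(plate_height(magnus=-3.0))   # topspin: the ball crosses noticeably lower

Even this is a cartoon; the real Magnus force depends on spin rate, seam orientation, and air density, which is exactly why nobody, least of all the pitcher, computes it.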

To be fair, a learning process that is adapting to a system that obeys an equation will yield results that become an ever-closer approximation of that equation. And it is in that sense that a baseball player can be said to be acting as if solving a differential equation. But this relies heavily on the system in question being one that obeys an equation—and when it comes to economic systems, is that even true?

What if the reason we can’t find a simple set of equations that accurately describe the economy (as opposed to equations of ever-escalating complexity that still utterly fail to describe the economy) is that there isn’t one? What if the reason we can’t find the utility function people are maximizing is that they aren’t maximizing anything?

What behavioral economics needs now is a new approach, something less constrained by the norms of neoclassical economics and more aligned with psychology and cognitive science. We should be modeling human beings based on how they actually think, not some weird mathematical construct that bears no resemblance to human reasoning but is designed to impress people who are obsessed with math.

I’m of course not the first person to have suggested this. I probably won’t be the last, or even the one who most gets listened to. But I hope that I might get at least a few more people to listen to it, because I have gone through the mathematical gauntlet and earned my bona fides. It is too easy to dismiss this kind of reasoning from people who don’t actually understand advanced mathematics. But I do understand differential equations—and I’m telling you, that’s not how people think.

Implications of stochastic overload

Apr 2 JDN 2460037

A couple weeks ago I presented my stochastic overload model, which posits a neurological mechanism for the Yerkes-Dodson effect: Stress increases sympathetic activation, and this increases performance, up to the point where it starts to risk causing neural pathways to overload and shut down.

This week I thought I’d try to get into some of the implications of this model, how it might be applied to make predictions or guide policy.

One thing I often struggle with when it comes to applying theory is what actual benefits we get from a quantitative mathematical model as opposed to simply a basic qualitative idea. In many ways I think these benefits are overrated; people seem to think that putting something into an equation automatically makes it true and useful. I am sometimes tempted to take advantage of this, putting things into equations even when I know there is no good reason to, simply because so many people seem to find equations so persuasive. (Studies have even shown that, particularly in disciplines that don’t use a lot of math, inserting a totally irrelevant equation into a paper makes it more likely to be accepted.)

The basic implications of the Yerkes-Dodson effect are already widely known, and utterly ignored in our society. We know that excessive stress is harmful to health and performance, and yet our entire economy seems to be based around maximizing the amount of stress that workers experience. I actually think neoclassical economics bears a lot of the blame for this, as neoclassical economists are constantly talking about “increasing work incentives”—which is to say, making work life more and more stressful. (And let me remind you that there has never been any shortage of people willing to work in my lifetime, except possibly briefly during the COVID pandemic. The shortage has always been employers willing to hire them.)

I don’t know if my model can do anything to change that. Maybe by putting it into an equation I can make people pay more attention to it, precisely because equations have this weird persuasive power over most people.

As far as scientific benefits, I think that the chief advantage of a mathematical model lies in its ability to make quantitative predictions. It’s one thing to say that performance increases with low levels of stress then decreases with high levels; but it would be a lot more useful if we could actually precisely quantify how much stress is optimal for a given person and how they are likely to perform at different levels of stress.

Unfortunately, the stochastic overload model can only make detailed predictions if you have fully specified the probability distribution of innate activation, which requires a lot of free parameters. This is especially problematic if you don’t even know what type of distribution to use, which we really don’t; I picked three classes of distribution because they were plausible and tractable, not because I had any particular evidence for them.

Also, we don’t even have standard units of measurement for stress; we have a vague notion of what more or less stressed looks like, but we don’t have the sort of quantitative measure that could be plugged into a mathematical model. Probably the best units to use would be something like blood cortisol levels, but then we’d need to go measure those all the time, which raises its own issues. And maybe people don’t even respond to cortisol in the same ways? But at least we could measure your baseline cortisol for a while to get a prior distribution, and then see how different incentives increase your cortisol levels; and then the model should give relatively precise predictions about how this will affect your overall performance. (This is a very neuroeconomic approach.)
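Just to make that concrete, here is a minimal sketch of what such a pipeline might look like, using the exponential-activation case of the model from the previous posts. The cortisol readings, the rescaling of cortisol into activation units, and the assumption that baseline activation is exponentially distributed are all placeholders; the point is only the shape of the procedure.

import math

def predicted_output(s, lam):
    # Exponential-activation case of the stochastic overload model:
    # Y = (1/lam + s) - (1/lam + 1) * exp(-lam * (1 - s))
    return (1 / lam + s) - (1 / lam + 1) * math.exp(-lam * (1 - s))

# Step 1: estimate lambda from baseline measurements (hypothetical numbers,
# already rescaled so that 1 is the overload threshold). For an exponential
# distribution the maximum-likelihood estimate is 1 / mean.
baseline_activation = [0.12, 0.25, 0.08, 0.18, 0.30, 0.15]
lam_hat = 1 / (sum(baseline_activation) / len(baseline_activation))

# Step 2: predict performance at different levels of added stress.
for s in [0.0, 0.2, 0.4, 0.6, 0.8]:
    print(s, round(predicted_output(s, lam_hat), 3))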

So, for now, I’m not really sure how useful the stochastic overload model is. This is honestly something I feel about a lot of the theoretical ideas I have come up with; they often seem too abstract to be usefully applicable to anything.

Maybe that’s how all theory begins, and applications only appear later? But that doesn’t seem to be how people expect me to talk about it whenever I have to present my work or submit it for publication. They seem to want to know what it’s good for, right now, and I never have a good answer to give them. Do other researchers have such answers? Do they simply pretend to?

Along similar lines, I recently had one of my students ask about a theory paper I wrote on international conflict for my dissertation, and after sending him a copy, I re-read the paper. There are so many pages of equations, and while I am confident that the mathematical logic is valid, I honestly don’t know if most of them are really useful for anything. (I don’t think I really believe that GDP is produced by a Cobb-Douglas production function, and we don’t even really know how to measure capital precisely enough to say.) The central insight of the paper, which I think is really important but other people don’t seem to care about, is a qualitative one: International treaties and norms provide an equilibrium selection mechanism in iterated games. The realists are right that this is cheap talk. The liberals are right that it works. Because when there are many equilibria, cheap talk works.

I know that in truth, science proceeds in tiny steps, building a wall brick by brick, never sure exactly how many bricks it will take to finish the edifice. It’s impossible to see whether your work will be an irrelevant footnote or the linchpin for a major discovery. But that isn’t how the institutions of science are set up. That isn’t how the incentives of academia work. You’re not supposed to say that this may or may not be correct and is probably some small incremental progress the ultimate impact of which no one can possibly foresee. You’re supposed to sell your work—justify how it’s definitely true and why it’s important and how it has impact. You’re supposed to convince other people why they should care about it and not all the dozens of other probably equally-valid projects being done by other researchers.

I don’t know how to do that, and it is agonizing to even try. It feels like lying. It feels like betraying my identity. Being good at selling isn’t just orthogonal to doing good science—I think it’s opposite. I think the better you are at selling your work, the worse you are at cultivating the intellectual humility necessary to do good science. If you think you know all the answers, you’re just bad at admitting when you don’t know things. It feels like in order to succeed in academia, I have to act like an unscientific charlatan.

Honestly, why do we even need to convince you that our work is more important than someone else’s? Are there only so many science points to go around? Maybe the whole problem is this scarcity mindset. Yes, grant funding is limited; but why does publishing my work prevent you from publishing someone else’s? Why do you have to reject 95% of the papers that get sent to you? Don’t tell me you’re limited by space; the journals are digital and searchable and nobody reads the whole thing anyway. Editorial time isn’t infinite, but most of the work has already been done by the time you get a paper back from peer review. Of course, I know the real reason: Excluding people is the main source of prestige.

The role of innate activation in stochastic overload

Mar 26 JDN 2460030

Two posts ago I introduced my stochastic overload model, which offers an explanation for the Yerkes-Dodson effect by positing that additional stress increases sympathetic activation, which is useful up until the point where it starts risking an overload that forces systems to shut down and rest.

The central equation of the model is actually quite simple, expressed either as an expectation or as an integral:

Y = E[x + s | x + s < 1] P[x + s < 1]

Y = \int_{0}^{1-s} (x+s) dF(x)

The amount of output produced is the expected value of innate activation plus stress activation, conditional on there being no overload, times the probability that there is no overload. Increased stress raises this expected value (the incentive effect), but also increases the probability of overload (the overload effect).

The model relies upon assuming that the brain starts with some innate level of activation that is partially random. Exactly what sort of Yerkes-Dodson curve you get from this model depends very much on what distribution this innate activation takes.

I’ve so far solved it for three types of distribution.
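(If you want to experiment yourself, the integral is easy to evaluate numerically for whatever distribution you like; here is a minimal Python/SciPy sketch, with a bell curve truncated to [0, 1] as a purely illustrative choice.)

import numpy as np
from scipy import stats, integrate

def output(s, dist):
    # Y = integral from 0 to 1-s of (x + s) dF(x)
    if s >= 1:
        return 0.0
    val, _ = integrate.quad(lambda x: (x + s) * dist.pdf(x), 0, 1 - s)
    return val

# Illustrative innate-activation distribution: a normal(0.3, 0.15) truncated to [0, 1].
dist = stats.truncnorm(a=(0 - 0.3) / 0.15, b=(1 - 0.3) / 0.15, loc=0.3, scale=0.15)
for s in np.linspace(0, 1, 11):
    print(round(float(s), 1), round(output(s, dist), 3))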

The simplest is a uniform distribution, where within a certain range, any level of activation is equally probable. The probability density function looks like this:

Assume the distribution has support between a and b, where a < b.

When b+s < 1, then overload is impossible, and only the incentive effect occurs; productivity increases linearly with stress.

The expected output is simply the expected value of a uniform distribution from a+s to b+s, which is:

E[x + s] = (a+b)/2+s

Then, once b+s > 1, overload risk begins to increase.

In this range, the probability of avoiding overload is:

P[x + s < 1] = F(1-s) = (1-s-a)/(b-a)

(Note that at b+s=1, this is exactly 1.)

The expected value of x+s in this range is:

E[x + s | x + s < 1] = (1+s+a)/2

Multiplying these two together:

Y = (1+s+a)(1-s-a)/(2(b-a)) = [1 - (a+s)^2]/(2(b-a))

Here is what that looks like for a=0, b=1/2 (that is, Y = 1/4 + s up to s = 1/2, and Y = 1 - s^2 beyond that):

It does have the right qualitative features: increasing, then decreasing. But it sure looks weird, doesn’t it? It has this strange kinked shape.
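Here is a quick sketch that computes this piecewise formula and checks it against direct numerical integration, in case you want to see the kink for yourself:

import numpy as np
from scipy import integrate

def output_uniform(s, a, b):
    # Piecewise closed form for innate activation uniform on [a, b].
    if b + s <= 1:
        return (a + b) / 2 + s                  # incentive effect only
    if a + s >= 1:
        return 0.0                              # overload is certain
    return (1 - (a + s)**2) / (2 * (b - a))     # both effects at work

def output_numeric(s, a, b):
    if 1 - s <= a:
        return 0.0
    val, _ = integrate.quad(lambda x: (x + s) / (b - a), a, min(b, 1 - s))
    return val

a, b = 0.0, 0.5
for s in np.linspace(0, 1, 11):
    print(round(float(s), 1), round(output_uniform(s, a, b), 4), round(output_numeric(s, a, b), 4))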

So let’s consider some other distributions.

The next distribution I was able to solve the model for is an exponential distribution, where the most probable activation is zero, and higher levels of activation are always less probable, decaying exponentially:

For this it was actually easiest to do the integral directly (I did it by integrating by parts, but I’m sure you don’t care about all the mathematical steps):

Y = \int_{0}^{1-s} (x+s) dF(x)

Y = (1/λ + s) - (1/λ + 1) e^(-λ(1-s))

The parameter λ determines how steeply your activation probability decays. Someone with low λ is relatively highly activated all the time, while someone with high λ is usually not highly activated; this seems like it might be related to the personality trait neuroticism.

Here are graphs of what the resulting Yerkes-Dodson curve looks like for several different values of λ:

[Graphs of the resulting Yerkes-Dodson curves for λ = 0.5, 1, 2, 4, and 8.]
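If you want to reproduce these curves yourself, here is a minimal sketch (the plotting choices are mine, the formula is the one above):

import numpy as np
import matplotlib.pyplot as plt

def output_exponential(s, lam):
    # Y = (1/lam + s) - (1/lam + 1) * exp(-lam * (1 - s))
    return (1 / lam + s) - (1 / lam + 1) * np.exp(-lam * (1 - s))

s = np.linspace(0, 1, 200)
for lam in [0.5, 1, 2, 4, 8]:
    plt.plot(s, output_exponential(s, lam), label=f"λ = {lam}")
plt.xlabel("stress (s)")
plt.ylabel("expected output (Y)")
plt.legend()
plt.show()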

The λ = 0.5 person has high activation a lot of the time. They are actually fairly productive even without stress, but stress quickly overwhelms them. The λ = 8 person has low activation most of the time. They are not very productive without stress, but can also bear relatively high amounts of stress without overloading.

(The low-λ people also have overall lower peak productivity in this model, but that might not be true in reality, if λ is inversely correlated with some other attributes that are related to productivity.)

Neither uniform nor exponential has the nice bell-curve shape for innate activation we might have hoped for. There is another class of distributions, beta distributions, which do have this shape, and they are sort of tractable—you need something called an incomplete beta function, which isn’t an elementary function but it’s useful enough that most statistical packages include it.

Beta distributions have two parameters, α and β. They look like this:

Beta distributions are quite useful in Bayesian statistics; if you’re trying to estimate the probability of a random event that either succeeds or fails with a fixed probability (a Bernoulli process), and so far you have observed a successes and b failures, then your best estimate of that probability is described by a beta distribution with α = a+1 and β = b+1 (assuming you started from a uniform prior).

For a beta distribution with parameters α and β, the result comes out to the following (here I(x, a, b) is that incomplete beta function I mentioned earlier, in its regularized form):

Y = [α/(α+β)] I(1-s, α+1, β) + s I(1-s, α, β)

For whole number values of α and β, the incomplete beta function can be computed by hand (though it is more work the larger they are); here’s an example with α = β = 2.

The innate activation probability looks like this:

And the result comes out like this:

Y = 2(1-s)^3 - (3/2)(1-s)^4 + 3s(1-s)^2 - 2s(1-s)^3

This person has pretty high innate activation most of the time, so stress very quickly overwhelms them. If I had chosen a much higher β, I could change that, making them less likely to be innately so activated.
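Here is a sketch of the general beta case, using SciPy’s regularized incomplete beta function, with the α = β = 2 polynomial above as a check:

import numpy as np
from scipy.special import betainc

def output_beta(s, alpha, beta):
    # Y = [alpha/(alpha+beta)] * I(1-s, alpha+1, beta) + s * I(1-s, alpha, beta),
    # where I(x, a, b) is the regularized incomplete beta function.
    # (SciPy's betainc takes its arguments in the order (a, b, x).)
    t = 1 - s
    return (alpha / (alpha + beta)) * betainc(alpha + 1, beta, t) + s * betainc(alpha, beta, t)

def output_beta_2_2(s):
    # The hand-computed polynomial for alpha = beta = 2.
    t = 1 - s
    return 2 * t**3 - 1.5 * t**4 + 3 * s * t**2 - 2 * s * t**3

for s in np.linspace(0, 1, 6):
    print(round(float(s), 1), round(float(output_beta(s, 2, 2)), 4), round(output_beta_2_2(s), 4))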

These are the cases I’ve found to be relatively tractable so far. They all have the right qualitative pattern: Increasing stress increases productivity for a while, then begins decreasing it once overload risk becomes too high. They also show a general pattern where people who are innately highly activated (neurotic?) are much more likely to overload and thus much more sensitive to stress.

The stochastic overload model

Mar 12 JDN 2460016

The next few posts are going to be a bit different, a bit more advanced and technical than usual. This is because, for the first time in several months at least, I am actually working on what could be reasonably considered something like theoretical research.

I am writing it up in the form of blog posts, because actually writing a paper is still too stressful for me right now. This also forces me to articulate my ideas in a clearer and more readable way, rather than dive directly into a morass of equations. It also means that even if I never actually get around to finishing a paper, the idea is out there, and maybe someone else could make use of it (and hopefully give me some of the credit).

I’ve written previously about the Yerkes-Dodson effect: On cognitively-demanding tasks, increased stress increases performance, but only to a point, after which it begins decreasing it again. The effect is well-documented, but the mechanism is poorly understood.

I am currently on the wrong side of the Yerkes-Dodson curve, which is why I’m too stressed to write this as a formal paper right now. But that also gave me some ideas about how it may work.

I have come up with a simple but powerful mathematical model that may provide a mechanism for the Yerkes-Dodson effect.

This model is clearly well within the realm of a behavioral economic model, but it is also closely tied to neuroscience and cognitive science.

I call it the stochastic overload model.

First, a metaphor: Consider an engine, which can run faster or slower. If you increase its RPMs, it will output more power, and provide more torque—but only up to a certain point. Eventually it hits a threshold where it will break down, or even break apart. In real engines, we often include safety systems that force the engine to shut down as it approaches such a threshold.

I believe that human brains function on a similar principle. Stress increases arousal, which activates a variety of processes via the sympathetic nervous system. This activation improves performance on both physical and cognitive tasks. But it has a downside; especially on cognitively demanding tasks which require sustained effort, I hypothesize that too much sympathetic activation can result in a kind of system overload, where your brain can no longer handle the stress and processes are forced to shut down.

This shutdown could be brief—a few seconds, or even a fraction of a second—or it could be prolonged—hours or days. That might depend on just how severe the stress is, or how much of your brain it requires, or how prolonged it is. For purposes of the model, this isn’t vital. It’s probably easiest to imagine it being a relatively brief, localized shutdown of a particular neural pathway. Then, your performance in a task is summed up over many such pathways over a longer period of time, and by the law of large numbers your overall performance is essentially the average performance of all your brain systems.

That’s the “overload” part of the model. Now for the “stochastic” part.

Let’s say that, in the absence of stress, your brain has a certain innate level of sympathetic activation, which varies over time in an essentially chaotic, unpredictable—stochastic—sort of way. It is never really completely deactivated, and may even have some chance of randomly overloading itself even without outside input. (Actually, a potential role in the model for the personality trait neuroticism is an innate tendency toward higher levels of sympathetic activation in the absence of outside stress.)

Let’s say that this innate activation is x, which follows some kind of known random distribution F(x).

For simplicity, let’s also say that added stress s adds linearly to your level of sympathetic activation, so your overall level of activation is x + s.

For simplicity, let’s say that activation ranges between 0 and 1, where 0 is no activation at all and 1 is the maximum possible activation and triggers overload.

I’m assuming that if a pathway shuts down from overload, it doesn’t contribute at all to performance on the task. (You can assume it’s only reduced performance, but this adds complexity without any qualitative change.)

Since sympathetic activation improves performance, but can result in overload, your overall expected performance in a given task can be computed as the product of two terms:

[expected value of x + s, provided overload does not occur] * [probability overload does not occur]

E[x + s | x + s < 1] P[x + s < 1]

The first term can be thought of as the incentive effect: Higher stress promotes more activation and thus better performance.

The second term can be thought of as the overload effect: Higher stress also increases the risk that activation will exceed the threshold and force shutdown.

This equation actually turns out to have a remarkably elegant form as an integral (and here’s where I get especially technical and mathematical):

\int_{0}^{1-s} (x+s) dF(x)

The integral subsumes both the incentive effect and the overload effect into one term; you can also think of the +s in the integrand as the incentive effect and the 1-s in the limit of integration as the overload effect.
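You can also just simulate it: draw a huge number of pathways, give each a random innate activation, add the stress, shut down any pathway that hits the threshold, and average. Here is a minimal sketch; the particular bell-curve innate distribution (a normal clipped to [0, 1]) is only an illustrative choice.

import numpy as np

rng = np.random.default_rng(0)

def simulated_output(s, n_pathways=1_000_000):
    # Innate activation: a normal(0.3, 0.15) clipped to [0, 1] (illustrative only).
    x = np.clip(rng.normal(0.3, 0.15, n_pathways), 0, 1)
    total = x + s
    # Pathways whose total activation reaches 1 overload and contribute nothing.
    return np.where(total < 1, total, 0.0).mean()

for s in np.linspace(0, 1, 11):
    print(round(float(s), 1), round(float(simulated_output(s)), 3))

The averages rise with s at first and then collapse, which is the inverted U we are after.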

For the uninitiated, these formulas are probably just Greek. So let me show you some pictures to help with your intuition. These are all freehand sketches, so let me apologize in advance for my limited drawing skills. Think of this as like Arthur Laffer’s famous cocktail napkin.

Suppose that, in the absence of outside stress, your innate activation follows a distribution like this (this could be a normal or logit PDF; as I’ll talk about next week, logit is far more tractable):

As I start adding stress, this shifts the distribution upward, toward increased activation:

Initially, this will improve average performance.

But at some point, increased stress actually becomes harmful, as it increases the probability of overload.

And eventually, the probability of overload becomes so high that performance becomes worse than it was with no stress at all:

The result is that overall performance, as a function of stress, looks like an inverted U-shaped curve—the Yerkes-Dodson curve:

The precise shape of this curve depends on the distribution that we use for the innate activation, which I will save for next week’s post.

The injustice of talent

Sep 4 JDN 2459827

Consider the following two principles of distributive justice.

A: People deserve to be rewarded in proportion to what they accomplish.

B: People deserve to be rewarded in proportion to the effort they put in.

Both principles sound pretty reasonable, don’t they? They both seem like sensible notions of fairness, and I think most people would broadly agree with both of them.

This is a problem, because they are mutually contradictory. We cannot possibly follow them both.

For, as much as our society would like to pretend otherwise—and I think this contradiction is precisely why our society would like to pretend otherwise—what you accomplish is not simply a function of the effort you put in.

Don’t get me wrong; it is partly a function of the effort you put in. Hard work does contribute to success. But it is neither sufficient, nor strictly necessary.

Rather, success is a function of three factors: Effort, Environment, and Talent.

Effort is the work you yourself put in, and basically everyone agrees you deserve to be rewarded for that.

Environment includes all the outside factors that affect you—including both natural and social environment. Inheritance, illness, and just plain luck are all in here, and there is general, if not universal, agreement that society should make at least some efforts to minimize inequality created by such causes.

And then, there is talent. Talent includes whatever capacities you innately have. It could be strictly genetic, or it could be acquired in childhood or even in the womb. But by the time you are an adult and responsible for your own life, these factors are largely fixed and immutable. This includes things like intelligence, disability, even height. The trillion-dollar question is: How much should we reward talent?

For talent clearly does matter. I will never swim like Michael Phelps, run like Usain Bolt, or shoot hoops like Steph Curry. It doesn’t matter how much effort I put in, how many hours I spend training—I will never reach their level of capability. Never. It’s impossible. I could certainly improve from my current condition; perhaps it would even be good for me to do so. But there are certain hard fundamental constraints imposed by biology that give them more potential in these skills than I will ever have.

Conversely, there are likely things I can do that they will never be able to do, though this is less obvious. Could Michael Phelps never be as good a programmer or as skilled a mathematician as I am? He certainly isn’t now. Maybe, with enough time, enough training, he could be; I honestly don’t know. But I can tell you this: I’m sure it would be harder for him than it was for me. He couldn’t breeze through college-level courses in differential equations and quantum mechanics the way I did. There is something I have that he doesn’t, and I’m pretty sure I was born with it. Call it spatial working memory, or mathematical intuition, or just plain IQ. Whatever it is, math comes easy to me in not so different a way from how swimming comes easy to Michael Phelps. I have talent for math; he has talent for swimming.

Moreover, these are not small differences. It’s not like we all come with basically the same capabilities with a little bit of variation that can be easily washed out by effort. We’d like to believe that—we have all sorts of cultural tropes that try to inculcate that belief in us—but it’s obviously not true. The vast majority of quantum physicists are people born with high IQ. The vast majority of pro athletes are people born with physical prowess. The vast majority of movie stars are people born with pretty faces. For many types of jobs, the determining factor seems to be talent.

This isn’t too surprising, actually—even if effort matters a lot, we would still expect talent to show up as the determining factor much of the time.

Let’s go back to that contest function model I used to analyze the job market awhile back (the one that suggests we spend way too much time and money in the hiring process). This time let’s focus on the perspective of the employees themselves.

Each employee has a level of talent, h. Employee X has talent h_x and exerts effort x, producing output of a quality that is the product of these: h_x x. Similarly, employee Z has talent h_z and exerts effort z, producing output h_z z.

Then, there’s a certain amount of luck that factors in. The most successful output isn’t necessarily the best, or maybe what should have been the best wasn’t because some random circumstance prevailed. But we’ll say that the probability an individual succeeds is proportional to the quality of their output.

So the probability that employee X succeeds is: h_x x / (h_x x + h_z z)

I’ll skip the algebra this time (if you’re interested you can look back at that previous post), but to make a long story short, in Nash equilibrium the two employees will exert exactly the same amount of effort.

Then, which one succeeds will be entirely determined by talent; because x = z, the probability that X succeeds is h_x / (h_x + h_z).

It’s not that effort doesn’t matter—it absolutely does matter, and in fact in this model, with zero effort you get zero output (which isn’t necessarily the case in real life). It’s that in equilibrium, everyone is exerting the same amount of effort; so what determines who wins is innate talent. And I gotta say, that sounds an awful lot like how professional sports works. It’s less clear whether it applies to quantum physicists.

But maybe we don’t really exert the same amount of effort! This is true. Indeed, it seems like actually effort is easier for people with higher talent—that the same hour spent running on a track is easier for Usain Bolt than for me, and the same hour studying calculus is easier for me than it would be for Usain Bolt. So in the end our equilibrium effort isn’t the same—but rather than compensating, this effect only serves to exaggerate the difference in innate talent between us.

It’s simple enough to generalize the model to allow for such a thing. For instance, I could say that the cost of producing a unit of effort is inversely proportional to your talent; then instead of h_x / (h_x + h_z), in equilibrium the probability of X succeeding would become h_x^2 / (h_x^2 + h_z^2). The equilibrium effort would also be different, with x > z if h_x > h_z.
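If you would rather check this numerically than take my word for the algebra, here is a minimal sketch of the contest. I have assumed the prize is normalized to 1 and effort costs are linear, which I believe matches the original setup, and found the equilibrium by iterating best responses.

from scipy.optimize import minimize_scalar

def payoff(x, z, hx, hz, cost_x):
    # Win probability proportional to output quality h * effort; prize = 1; linear cost.
    if x == 0 and z == 0:
        return 0.0
    return hx * x / (hx * x + hz * z) - cost_x * x

def best_response(z, hx, hz, cost_x):
    res = minimize_scalar(lambda x: -payoff(x, z, hx, hz, cost_x),
                          bounds=(0.0, 10.0), method="bounded")
    return res.x

def equilibrium(hx, hz, cost_x=1.0, cost_z=1.0, iterations=200):
    x = z = 0.1
    for _ in range(iterations):
        x = best_response(z, hx, hz, cost_x)
        z = best_response(x, hz, hx, cost_z)
    return round(x, 3), round(z, 3), round(hx * x / (hx * x + hz * z), 3)

# Equal effort costs: equal effort, and X wins with probability hx / (hx + hz) = 2/3.
print(equilibrium(hx=2.0, hz=1.0))
# Cost inversely proportional to talent: X wins with probability hx^2 / (hx^2 + hz^2) = 4/5.
print(equilibrium(hx=2.0, hz=1.0, cost_x=1 / 2.0, cost_z=1 / 1.0))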

Once we acknowledge that talent is genuinely important, we face an ethical problem. Do we want to reward people for their accomplishment (A), or for their effort (B)? There are good cases to be made for each.

Rewarding for accomplishment, which we might call meritocracy, will tend to, well, maximize accomplishment. We’ll get the best basketball players playing basketball, the best surgeons doing surgery. Moreover, accomplishment is often quite easy to measure, even when effort isn’t.

Rewarding for effort, which we might call egalitarianism, will give people the most control over their lives, and might well feel the most fair. Those who succeed will be precisely those who work hard, even if they do things they are objectively bad at. Even people who are born with very little talent will still be able to make a living by working hard. And it will ensure that people do work hard, which meritocracy can actually fail at: If you are extremely talented, you don’t really need to work hard because you just automatically succeed.

Capitalism, as an economic system, is very good at rewarding accomplishment. I think part of what makes socialism appealing to so many people is that it tries to reward effort instead. (Is it very good at that? Not so clear.)

The more extreme differences are actually in terms of disability. There’s a certain baseline level of activities that most people are capable of, which we think of as “normal”: most people can talk; most people can run, if not necessarily very fast; most people can throw a ball, if not pitch a proper curveball. But some people can’t throw. Some people can’t run. Some people can’t even talk. It’s not that they are bad at it; it’s that they are literally not capable of it. No amount of effort could have made Stephen Hawking into a baseball player—not even a bad one.

It’s these cases when I think egalitarianism becomes most appealing: It just seems deeply unfair that people with severe disabilities should have to suffer in poverty. Even if they really can’t do much productive work on their own, it just seems wrong not to help them, at least enough that they can get by. But capitalism by itself absolutely would not do that—if you aren’t making a profit for the company, they’re not going to keep you employed. So we need some kind of social safety net to help such people. And it turns out that such people are quite numerous, and our current system is really not adequate to help them.

But meritocracy has its pull as well. Especially when the job is really important—like surgery, not so much basketball—we really want the highest quality work. It’s not so important whether the neurosurgeon who removes your tumor worked really hard at it or found it a breeze; what we care about is getting that tumor out.

Where does this leave us?

I think we have no choice but to compromise, on both principles. We will reward both effort and accomplishment, to greater or lesser degree—perhaps varying based on circumstances. We will never be able to entirely reward accomplishment or entirely reward effort.

This is more or less what we already do in practice, so why worry about it? Well, because we don’t like to admit that it’s what we do in practice, and a lot of problems seem to stem from that.

We have people acting like billionaires are such brilliant, hard-working people just because they’re rich—because our society rewards effort, right? So they couldn’t be so successful if they didn’t work so hard, right? Right?

Conversely, we have people who denigrate the poor as lazy and stupid just because they are poor. Because it couldn’t possibly be that their circumstances were worse than yours? Or hey, even if they are genuinely less talented than you—do less talented people deserve to be homeless and starving?

We tell kids from a young age, “You can be whatever you want to be”, and “Work hard and you’ll succeed”; and these things simply aren’t true. There are limitations on what you can achieve through effort—limitations imposed by your environment, and limitations imposed by your innate talents.

I’m not saying we should crush children’s dreams; I’m saying we should help them to build more realistic dreams, dreams that can actually be achieved in the real world. And then, when they grow up, they either will actually succeed, or when they don’t, at least they won’t hate themselves for failing to live up to what you told them they’d be able to do.

If you were wondering why Millennials are so depressed, that’s clearly a big part of it: We were told we could be and do whatever we wanted if we worked hard enough, and then that didn’t happen; and we had so internalized what we were told that we thought it had to be our fault that we failed. We didn’t try hard enough. We weren’t good enough. I have spent years feeling this way—on some level I do still feel this way—and it was not because adults tried to crush my dreams when I was a child, but on the contrary because they didn’t do anything to temper them. They never told me that life is hard, and people fail, and that I would probably fail at my most ambitious goals—and it wouldn’t be my fault, and it would still turn out okay.

That’s really it, I think: They never told me that it’s okay not to be wildly successful. They never told me that I’d still be good enough even if I never had any great world-class accomplishments. Instead, they kept feeding me the lie that I would have great world-class accomplishments; and then, when I didn’t, I felt like a failure and I hated myself. I think my own experience may be particularly extreme in this regard, but I know a lot of other people in my generation who had similar experiences, especially those who were also considered “gifted” as children. And we are all now suffering from depression, anxiety, and Impostor Syndrome.

All because nobody wanted to admit that talent, effort, and success are not the same thing.

Scalability and inequality

May 15 JDN 2459715

Why are some molecules (e.g. DNA) billions of times larger than others (e.g. H2O), but all atoms are within a much narrower range of sizes (only a few hundred)?

Why are some animals (e.g. elephants) millions of times as heavy as others (e.g. mice), but their cells are basically the same size?

Why does capital income vary so much more (factors of thousands or millions) than wages (factors of tens or hundreds)?

These three questions turn out to have much the same answer: Scalability.

Atoms are not very scalable: Adding another proton to a nucleus causes interactions with all the other protons, which makes the whole atom unstable after a hundred protons or so. But molecules, particularly organic polymers such as DNA, are tremendously scalable: You can add another piece to one end without affecting anything else in the molecule, and keep on doing that more or less forever.

Cells are not very scalable: Even with the aid of active transport mechanisms and complex cellular machinery, a cell’s functionality is still very much limited by its surface area. But animals are tremendously scalable: The same exponential growth that got you from a zygote to a mouse only needs to continue a couple years longer and it’ll get you all the way to an elephant. (A baby elephant, anyway; an adult will require a dozen or so years—remarkably comparable to humans, in fact.)

Labor income is not very scalable: There are only so many hours in a day, and the more hours you work the less productive you’ll be in each additional hour. But capital income is perfectly scalable: We can add another digit to that brokerage account with nothing more than a few milliseconds of electronic pulses, and keep doing that basically forever (due to the way integer storage works, above 2^63 it would require special coding, but it can be done; and seeing as that’s over 9 quintillion, it’s not likely to be a problem any time soon—though I am vaguely tempted to write a short story about an interplanetary corporation that gets thrown into turmoil by an integer overflow error).

This isn’t just an effect of our accounting either. Capital is scalable in a way that labor is not. When your contribution to production is owning a factory, there’s really nothing to stop you from owning another factory, and then another, and another. But when your contribution is working at a factory, you can only work so hard for so many hours.

When a phenomenon is highly scalable, it can take on a wide range of outcomes—as we see in molecules, animals, and capital income. When it’s not, it will only take on a narrow range of outcomes—as we see in atoms, cells, and labor income.

Exponential growth is also part of the story here: Animals certainly grow exponentially, and so can capital when invested; even some polymers function that way (e.g. under polymerase chain reaction). But I think the scalability is actually more important: Growing rapidly isn’t so useful if you’re going to immediately be blocked by a scalability constraint. (This actually relates to the difference between r- and K- evolutionary strategies, and offers further insight into the differences between mice and elephants.) Conversely, even if you grow slowly, given enough time, you’ll reach whatever constraint you’re up against.

Indeed, we can even say something about the probability distribution we are likely to get from random processes that are scalable or non-scalable.

A non-scalable random process will generally converge toward the familiar normal distribution, a “bell curve”:

[Image from Wikipedia: By Inductiveload – self-made, Mathematica, Inkscape, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3817954]

The normal distribution has most of its weight near the middle; most of the population ends up near there. This is clearly the case for labor income: Most people are middle class, while some are poor and a few are rich.

But a scalable random process will typically converge toward quite a different distribution, a Pareto distribution:

[Image from Wikipedia: By Danvildanvil – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31096324]

A Pareto distribution has most of its weight near zero, but covers an extremely wide range. Indeed it is what we call fat tailed, meaning that really extreme events occur often enough to have a meaningful effect on the average. A Pareto distribution has most of the people at the bottom, but the ones at the top are really on top.

And indeed, that’s exactly how capital income works: Most people have little or no capital income (indeed only about half of Americans and only a third(!) of Brits own any stocks at all), while a handful of hectobillionaires make utterly ludicrous amounts of money literally in their sleep.

Indeed, it turns out that income in general is pretty close to distributed normally (or maybe lognormally) for most of the income range, and then becomes very much Pareto at the top—where nearly all the income is capital income.
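Here is a minimal simulation of that contrast: the very same random shocks, applied additively (non-scalable) versus multiplicatively (scalable). The additive outcomes pile up in a bell curve; the multiplicative ones come out heavily right-skewed and fat-tailed. (Getting a literal Pareto tail takes a bit more structure than this, but the qualitative contrast is the point.)

import numpy as np

rng = np.random.default_rng(0)
n_people, n_periods = 100_000, 50
shocks = rng.normal(loc=0.02, scale=0.10, size=(n_people, n_periods))

additive = 1 + shocks.sum(axis=1)            # non-scalable: gains add up
multiplicative = (1 + shocks).prod(axis=1)   # scalable: gains compound

for name, outcome in [("additive", additive), ("multiplicative", multiplicative)]:
    top_1_percent = np.sort(outcome)[-n_people // 100:]
    print(name,
          "median:", round(float(np.median(outcome)), 2),
          "mean:", round(float(outcome.mean()), 2),
          "share held by top 1%:", round(float(top_1_percent.sum() / outcome.sum()), 3))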

This fundamental difference in scalability between capital and labor underlies much of what makes income inequality so difficult to fight. Capital is scalable, and begets more capital. Labor is non-scalable, and we only have so much to give.

It would require a radically different system of capital ownership to really eliminate this gap—and, well, that’s been tried, and so far, it hasn’t worked out so well. Our best option is probably to let people continue to own whatever amounts of capital, and then tax the proceeds in order to redistribute the resulting income. That certainly has its own downsides, but they seem to be a lot more manageable than either unfettered anarcho-capitalism or totalitarian communism.

The fragility of encryption

Feb 13 JDN 2459620

I said in last week’s post that most of the world’s online security rests upon public-key encryption. It’s how we do our shopping, our banking, and paying our taxes.

Yet public-key encryption has an Achilles’ Heel. It relies entirely on the assumption that, even knowing someone’s public key, you can’t possibly figure out what their private key is. Yet obviously the two must be deeply connected: In order for my private key to decrypt all messages that are encrypted using my public key, they must, in a deep sense, contain the same information. There must be a mathematical operation that will translate from one to the other—and that mathematical operation must be invertible.

What we have been relying on to keep public-key encryption secure is the notion of a one-way function: A function that is easy to compute, but hard to invert. A typical example is multiplying two numbers: Multiplication is a basic computing operation that is extremely fast, even for numbers with thousands of digits; but factoring a number into its prime factors is far more difficult, and currently cannot be done in any reasonable amount of time for numbers that are more than a hundred digits long.
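To make the asymmetry concrete, here is a small sketch: multiplying two primes is effectively instantaneous, while recovering them by trial division already takes noticeable time at a dozen or so digits, and the time grows exponentially in the number of digits from there. (I use sympy here only to generate the primes; the sizes are tiny compared to the hundreds of digits in real keys.)

import time
from sympy import randprime

def trial_division(n):
    # The naive approach: try every candidate divisor up to sqrt(n).
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return n, 1

p = randprime(10**6, 10**7)   # two ~7-digit primes
q = randprime(10**6, 10**7)

start = time.time()
n = p * q
print("multiplying took", time.time() - start, "seconds")

start = time.time()
print("factors:", trial_division(n))
print("factoring took", time.time() - start, "seconds")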


“Easy” and “hard” in what sense? The usual criterion is in polynomial time.

Say you have an input that is n bits long—i.e. n digits, when expressed as a binary number, all 0s and 1s. A function that can be computed in time proportional to n is linear time; if it can only be done in time proportional to n^2, that is quadratic time; n^3 would be cubic time. All of these are examples of polynomial time.

But if instead the time required were 2^n, that would be exponential time. 3^n and 1.5^n would also be exponential time.

This is significant because of how much faster exponential functions grow relative to polynomial functions, for large values of n. For example, let’s compare n^3 with 2^n. When n=3, the polynomial is actually larger: n^3=27 but 2^n=8. At n=10 they are nearly equal: n^3=1000 but 2^n=1024. But by n=20, n^3 is only 8000 while 2^n is over 1 million. At n=100, n^3 is a manageable (for a modern computer) 1 million, while 2^n is a staggering 10^30; that’s a million trillion trillion.

You may see that there is already something a bit fishy about this: There are lots of different ways to be polynomial and lots of different ways to be exponential. Linear time n is clearly fast, and for many types of problems it seems unlikely one could do any better. But is n^100 time really all that fast? It’s still polynomial. It doesn’t take a large exponential base to make for very fast growth—2 doesn’t seem that big, after all, and when dealing with binary digits it shows up quite naturally. But while 2^n grows very fast even for reasonably-sized n, 1.0000001^n grows more slowly than most polynomials—even linear ones—for quite a long range, before eventually growing very fast once n reaches the hundreds of millions. Yet it is still exponential.


So, why do we use these categories? Well, computer scientists and mathematicians have discovered that many types of problems that seem different can in fact be translated into one another, so that solving one would solve the other. For instance, you can easily convert between the Boolean satisfiability problem and the subset-sum problem or the travelling salesman problem. These conversions always take time that is a polynomial in n (usually somewhere between linear and quadratic, as it turns out). This has allowed us to build complexity classes: classes of problems such that any problem in the class can be converted to any other in polynomial time or better.

Problems that can be solved in polynomial time are in class P, for polynomial.

Problems that can be checked—but not necessarily solved—in polynomial time are in class NP, which actually stands for “non-deterministic polynomial” (not a great name, to be honest). Given a problem in NP, you may not be able to come up with a valid answer in polynomial time. But if someone gave you an answer, you could tell in polynomial time whether or not that answer was valid.

Boolean satisfiability (often abbreviated SAT) is the paradigmatic NP problem: Given a Boolean formula like (A OR B OR C) AND (¬A OR D OR E) AND (¬D OR ¬C OR B) and so on, it isn’t a simple task to determine if there’s some assignment of the variables A, B, C, D, E that makes it all true. But if someone handed you such an assignment, say (¬A, B, ¬C, D, E), you could easily check that it does in fact satisfy the expression. It turns out that in fact SAT is what’s called NP-complete: Any NP problem can be converted into SAT in polynomial time.
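To see the asymmetry concretely, here is a minimal sketch using that same little formula: checking a proposed assignment is one quick pass over the clauses, while the obvious way to find an assignment from scratch is to try all 2^n of them.

from itertools import product

# (A OR B OR C) AND (¬A OR D OR E) AND (¬D OR ¬C OR B), with 1=A, ..., 5=E;
# a negative number stands for the negation of that variable.
clauses = [(1, 2, 3), (-1, 4, 5), (-4, -3, 2)]

def check(assignment, clauses):
    # Linear time: one pass over the clauses.
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def brute_force_solve(clauses, n_vars):
    # Exponential time: up to 2**n_vars candidate assignments.
    for values in product([False, True], repeat=n_vars):
        assignment = dict(enumerate(values, start=1))
        if check(assignment, clauses):
            return assignment
    return None

proposed = {1: False, 2: True, 3: False, 4: True, 5: True}   # (¬A, B, ¬C, D, E)
print(check(proposed, clauses))        # True, verified almost instantly
print(brute_force_solve(clauses, 5))   # found only by search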

This is important because in order to be useful as an encryption system, we need our one-way function to be in class P (otherwise, we couldn’t compute it quickly). Yet, by definition, this means its inverse must be in class NP.


Thus, simply because it is easy to multiply two numbers, I know for sure that factoring numbers must be in NP: All I have to do to verify that a factorization is correct is multiply the numbers. Since the way to get a public key from a private key is (essentially) to multiply two numbers, this means that getting a private key from a public key is equivalent to factorization—which means it must be in NP.

This would be fine if we knew some problems in NP that could never, ever be solved in polynomial time. We could just pick one of those and make it the basis of our encryption system. Yet in fact, we do not know any such problems—indeed, we are not even certain they exist.

One of the biggest unsolved problems in mathematics is P versus NP, which asks the seemingly-simple question: “Are P and NP really different classes?” It certainly seems like they are—there are problems like multiplying numbers, or even finding out whether a number is prime, that are clearly in P, and there are other problems, like SAT, that are definitely in NP but seem to not be in P. But in fact no one has ever been able to prove that P ≠ NP. Despite decades of attempts, no one has managed it.

To be clear, no one has managed to prove that P = NP, either. (Doing either one would win you a Clay Millennium Prize.) But since the conventional wisdom among most mathematicians is that P ≠ NP (99% of experts polled in 2019 agreed), I actually think this possibility has not been as thoroughly considered.

Vague heuristic arguments are often advanced for why P ≠ NP, such as this one by Scott Aaronson: “If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found.”

That really doesn’t follow at all. Doing something in polynomial time is not the same thing as doing it instantly.

Say for instance someone finds an algorithm to solve SAT in n^6 time. Such an algorithm would conclusively prove P = NP. n^6; that’s a polynomial, all right. But it’s a big polynomial. The time required to check a SAT solution is linear in the number of terms in the Boolean formula—just check each one, see if it works. But if it turns out we could generate such a solution in time proportional to the sixth power of the number of terms, that would still mean it’s a lot easier to check than it is to solve. A lot easier.

I guess if your notion of a “fundamental gap” rests upon the polynomial/exponential distinction, you could say that’s not “fundamental”. But this is a weird notion to say the least. If n = 1 million can be checked in 1 million processor cycles (that is, milliseconds, or with some overhead, seconds), but only solved in 10^36 processor cycles (that is, over a million trillion years), that sounds like a pretty big difference to me.

Even an n^2 algorithm wouldn’t show there’s no difference. The difference between n and n^2 is, well, a factor of n. So finding the answer could still take far longer than verifying it. This would be worrisome for encryption, however: Even a million times as long isn’t really all that much. It means that if something would work in a few seconds for an ordinary computer (the timescale we want for our online shopping and banking), then, say, the Russian government with a supercomputer a thousand times better could spend half an hour on it. That’s… a problem. I guess if breaking our encryption were only feasible for superpower national intelligence agencies, it wouldn’t be a complete disaster. (Indeed, many people suspect that the NSA and FSB have already broken most of our encryption, and I wouldn’t be surprised to learn that’s true.)

But what I really want to say here is that since it may be true that P=NP—we don’t know it isn’t, even if most people strongly suspect as much—we should be trying to find methods of encryption that would remain secure even if that turns out to be the case. (There’s another reason as well: Quantum computers are known to be able to factor numbers in polynomial time—though it may be awhile before they get good enough to do so usefully.)

We do know two such methods, as a matter of fact. There is quantum encryption, which, like most things quantum, is very esoteric and hard to explain. (Maybe I’ll get to that in another post.) It also requires sophisticated, expensive hardware that most people are unlikely to be able to get.

And then there is onetime pad encryption, which is shockingly easy to explain and can be implemented on any home computer.

The problem with substitution ciphers is that you can look for patterns. You can do this because the key ultimately contains only so much information, based on how long it is. If the key contains 100 bits and the message contains 10,000 bits, at some point you’re going to have to repeat some kind of pattern—even if it’s a very complex, sophisticated one like the Enigma machine.

Well, what if the key were as long as the message? What if a 10,000 bit message used a 10,000 bit key? Then you could substitute every single letter for a different symbol each time. What if, on its first occurrence, E is D, but then it’s Q, and then it’s T—and each of these was generated randomly and independently each time? Then it can’t be broken by searching for patterns—because there are no patterns to be found.

Mathematically, it would look like this: Take each bit of the plaintext, and randomly generate another bit for the key. Add the key bit to the plaintext bit (technically you want to use bitwise XOR, but that’s basically adding), and you’ve got the ciphertext bit. At the other end, subtracting out each key bit will give back each plaintext bit. Provided you can generate random numbers efficiently, this will be fast to encrypt and decrypt—but literally impossible to break without the key.
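Here is the whole scheme in a few lines of Python; this really is a one-time pad, though a real deployment would also have to handle key storage, synchronization, and authentication carefully.

import secrets

def generate_key(length):
    # The key must be truly random and as long as the message.
    return secrets.token_bytes(length)

def xor_bytes(data, key):
    # XOR each message byte with the corresponding key byte;
    # applying the same operation again undoes it.
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"Transfer $100 to account 12345"
key = generate_key(len(plaintext))
ciphertext = xor_bytes(plaintext, key)
print(xor_bytes(ciphertext, key) == plaintext)   # True: decryption recovers the message

The one rule, as the name says, is that no part of the key can ever be used twice.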

Indeed, onetime-pad encryption is so secure that it is a proven mathematical theorem that there is no way to break it. Even if you had such staggering computing power that you could try every possible key, you wouldn’t even know when you got the right one—because every possible message can be generated from a given ciphertext, using some key. Even if you knew some parts of the message already, you would have no way to figure out any of the rest—because there are no patterns linking the two.

The downside is that you need to somehow send the keys. As I said in last week’s post, if you have a safe way to send the key, why can’t you send the message that way? Well, there is still an advantage, actually, and that’s speed.

If there is a slow, secure way to send information (e.g. deliver it physically by armed courier), and a fast, insecure way (e.g. send it over the Internet), then you can send the keys in advance by the slow, safe way and then send ciphertexts later by the fast, risky way. Indeed, this kind of courier-based onetime-pad encryption is how the “red phone” (really a fax line) linking the White House to the Kremlin works.

Now, for online banking, we’re not going to be able to use couriers. But here’s something we could do. When you open a bank account, the bank could give you a, say, 128 GB flash drive of onetime-pad keys for you to use in your online banking. You plug that into your computer every time you want to log in, and it grabs the next part of key each time (there are some tricky technical details with synchronizing this that could, in practice, create some risk—but, done right, the risk would be small). If you are sending 10 megabytes of encrypted data each time (and that’s surely enough to encode a bank statement, though they might want to use a format other than PDF), you’ll get over 10,000 uses out of that flash drive. If you’ve been sending a lot of data and your key starts to run low, you can physically show up at the bank branch and get a new one.
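Just to make the mechanics concrete, here is a sketch of what the client side of that scheme might look like (the file names and layout are entirely hypothetical, and a real implementation would need authentication and much more careful synchronization):

```python
import os

PAD_FILE = "bank_pad.bin"        # hypothetical: the key material copied from the flash drive
OFFSET_FILE = "bank_pad.offset"  # hypothetical: how many pad bytes have already been used

def next_key_bytes(n: int) -> bytes:
    """Read, and permanently use up, the next n unused bytes of the pad."""
    offset = 0
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            offset = int(f.read())
    with open(PAD_FILE, "rb") as f:
        f.seek(offset)
        key = f.read(n)
    if len(key) < n:
        raise RuntimeError("Pad exhausted: time to visit the bank for a new flash drive.")
    with open(OFFSET_FILE, "w") as f:
        f.write(str(offset + n))  # never reuse these bytes
    return key
```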

Similarly, you could have onetime-pad keys on flash drives (more literal flash keys) given to you by the US government for tax filing, and another from each of your credit card issuers. For online purchases, the sellers would probably need to have their own onetime-pad keys set up with the banks and credit card companies, so that you send the info to VISA encrypted one way and they send it to the seller encrypted another way. Businesses with large sales volume would go through keys very quickly—but then, they can afford to keep buying new flash drives. Since each transaction should only take a few kilobytes, the cost of additional onetime-pad keys should be small compared to the cost of packing, shipping, and the items themselves. For larger purchases, businesses could even get in the habit of sending you a free flash key with each purchase so that future purchases are easier.

This would render paywalls very difficult to implement, but good riddance. Cryptocurrency would die, but even better riddance. It would be most inconvenient to deal with things like, well, writing a blog like this; needing to get a physical key from WordPress sounds like quite a hassle. People might actually just tolerate having their blogs hacked on occasion, because… who is going to hack your blog, and who really cares if your blog gets hacked?

Yes, this system is awkward and inconvenient compared to our current system. But unlike our current system, it is provably secure. Right now, it may seem like a remote possibility that someone would prove P=NP by finding a fast algorithm, and use it to break encryption. But it could happen, and if it did, it could happen quite suddenly. It would be far better to prepare for the worst than to be caught unprepared when it’s too late.

Risk compensation is not a serious problem

Nov 28 JDN 2459547

Risk compensation. It’s one of those simple but counter-intuitive ideas that economists love, and it has been a major consideration in regulatory policy since the 1970s.

The idea is this: The risk we face in our actions is partly under our control. It requires effort to reduce risk, and effort is costly. So when an external source, such as a government regulation, reduces our risk, we will compensate by reducing the effort we expend, and thus our risk will decrease less, or maybe not at all. Indeed, perhaps we’ll even overcompensate and make our risk worse!

It’s often used as an argument against various kinds of safety efforts: Airbags will make people drive worse! Masks will make people go out and get infected!

The basic theory here is sound: Effort to reduce risk is costly, and people try to reduce costly things.

Indeed, it’s theoretically possible that risk compensation could yield the exact same risk, or even more risk than before—or at least, I wasn’t able to prove that for any possible risk profile and cost function it couldn’t happen.

But plausible risk profiles and cost functions that actually yield this result are surprisingly hard to come by, even for a quite general functional form. Here, let me show you.

Let’s say there’s some possible harm H. There is also some probability that it will occur, which you can mitigate with some choice x. For simplicity let’s say that it’s one-to-one, so that your risk of H occurring is precisely 1-x. Since probabilities must be between 0 and 1, so must x.

Reducing that risk costs effort. I won’t say much about that cost, except to call it c(x) and assume the following:

(1) It is increasing: More effort costs more than less effort (and, by construction, more effort means less risk).

(2) It is convex: Reducing risk from a high level to a low level (e.g. 0.9 to 0.8) costs less than reducing it from a low level to an even lower level (e.g. 0.2 to 0.1).

These both seem like eminently plausible—indeed, nigh-unassailable—assumptions. And they result in the following total expected cost (the opposite of your expected utility):

(1-x)H + c(x)

Now let’s suppose there’s some policy which will reduce your risk by a factor r, which must be between 0 and 1. Your cost then becomes:

r(1-x)H + c(x)

Minimizing this yields the following result:

rH = c'(x)

where c'(x) is the derivative of c(x). Since c(x) is increasing and convex, c'(x) is positive and increasing.

Thus, if I make r smaller—an external source of less risk—then I will reduce the optimal choice of x. This is risk compensation.
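Here is a quick numerical sanity check of that comparative static (just a sketch, with an arbitrary convex cost function c(x) = x^2 and a grid search standing in for the calculus):

```python
# Minimize r*(1-x)*H + c(x) over x in [0,1] by brute force,
# and watch the optimal effort x fall as the external factor r falls.
H = 1.0
c = lambda x: x**2                       # increasing and convex on [0,1]
xs = [i / 10000 for i in range(10001)]

for r in (1.0, 0.75, 0.5, 0.25):
    best_x = min(xs, key=lambda x: r * (1 - x) * H + c(x))
    print(f"r = {r:.2f}  optimal x = {best_x:.3f}")
# r = 1.00  optimal x = 0.500
# r = 0.75  optimal x = 0.375
# r = 0.50  optimal x = 0.250
# r = 0.25  optimal x = 0.125
```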

But have I reduced or increased the amount of risk?

The total risk is r(1-x); since r decreased and so did x, it’s not clear whether this went up or down. Indeed, it’s theoretically possible to have cost functions that make it go up, but as we’ll see, only under fairly special conditions.

For instance, suppose we assume that c(x) = ax^b, where a and b are constants. This seems like a pretty general form, doesn’t it? To maintain the assumption that c(x) is increasing and convex, I need a > 0 and b > 1. (If 0 < b < 1, you get a function that’s increasing but concave. If b=1, you get a linear function and some weird corner solutions where you either expend no effort at all or all possible effort.)

Then I’m trying to minimize:

r(1-x)H + ax^b

Setting rH = c'(x) = abx^(b-1) and solving yields a closed-form solution for x:

x = (rH/ab)^(1/(b-1))

Since b>1, 1/(b-1) > 0.


Thus, the optimal choice of x is increasing in rH and decreasing in ab. That is, reducing the harm H or the overall risk r will make me put in less effort, while reducing the cost of effort (via either a or b) will make me put in more effort. These all make sense.

Can I ever increase the overall risk by reducing r? Let’s see.


My total risk r(1-x) is therefore:

r(1-x) = r[1-(rH/ab)^(1/(b-1))]

Can making r smaller ever make this larger?

Well, let’s compare it against the case when r=1. We want to see if there’s a case where it’s actually larger.

r[1-(rH/ab)^(1/(b-1))] > 1-(H/ab)^(1/(b-1))

Write K = (H/ab)^(1/(b-1)); this is just the effort level you would have chosen on your own, with no policy. Expanding the left side (note that r*r^(1/(b-1)) = r^(b/(b-1))) and rearranging:

r - r^(b/(b-1))K > 1 - K

K > (1-r)/(1-r^(b/(b-1)))

The threshold on the right is always at least (b-1)/b, and it climbs toward 1 as the external reduction gets larger. So within this family of cost functions, an external risk reduction can leave you with more total risk, but only if your own baseline effort K was already above that threshold. For more ordinary parameter values, reducing risk externally reduces total risk even after compensation.
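We can check this numerically with the closed-form solution above (a sketch; the parameter values are arbitrary):

```python
def total_risk(r, H=1.0, a=1.0, b=2.0):
    # Closed-form optimum x* = (rH/(ab))^(1/(b-1)), capped at 1; total risk is r*(1 - x*).
    x_star = min(1.0, (r * H / (a * b)) ** (1 / (b - 1)))
    return r * (1 - x_star)

# With a modest baseline effort (here K = H/(ab) = 0.5), risk falls as r falls:
print([round(total_risk(r), 3) for r in (1.0, 0.75, 0.5, 0.25)])
# -> [0.5, 0.469, 0.375, 0.219]

# Only with a baseline effort near 1 (e.g. H = 1.8, so K = 0.9) can a partial external
# reduction leave total risk above its baseline value of 0.1:
print([round(total_risk(r, H=1.8), 3) for r in (1.0, 0.75, 0.5, 0.25)])
# -> [0.1, 0.244, 0.275, 0.194]
```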

Now, to be fair, this isn’t a fully general model. I had to assume some specific functional forms. But I didn’t assume much, did I?

Indeed, there is a fully general argument that externally reduced risk will never harm you. It’s quite simple.

There are three states to consider: In state A, you have your original level of risk and your original level of effort to reduce it. In state B, you have an externally reduced level of risk and your original level of effort. In state C, you have an externally reduced level of risk, and you compensate by reducing your effort.

Which states make you better off?

Well, clearly state B is better than state A: You get reduced risk at no cost to you.

Furthermore, state C must be better than state B: You voluntarily chose to risk-compensate precisely because it made you better off.

Therefore, as long as your preferences are rational, state C is better than state A.

Externally reduced risk will never make you worse off.

QED. That’s it. That’s the whole proof.

But I’m a behavioral economist, am I not? What if people aren’t being rational? Perhaps there’s some behavioral bias that causes people to overcompensate for reduced risks. That’s ultimately an empirical question.

So, what does the empirical data say? Risk compensation is almost never a serious problem in the real world. Measures designed to increase safety, lo and behold, actually increase safety. Removing safety regulations, astonishingly enough, makes people less safe and worse off.

If we ever do find a case where risk compensation is very large, then I guess we can remove that safety measure, or find some way to get people to stop overcompensating. But in the real world this has basically never happened.

It’s still a fair question whether any given safety measure is worth the cost: Implementing regulations can be expensive, after all. And while many people would like to think that “no amount of money is worth a human life”, nobody does—or should, or even can—act like that in the real world. You wouldn’t drive to work or get out of bed in the morning if you honestly believed that.

If it would cost $4 billion to save one expected life, it’s definitely not worth it. Indeed, you should still be able to see that even if you don’t think lives can be compared with other things—because $4 billion could save an awful lot of lives if you spent it more efficiently. (Probably over a million, in fact, as current estimates of the marginal cost to save one life are about $2,300.) Inefficient safety interventions don’t just cost money—they prevent us from doing other, more efficient safety interventions.

And as for airbags and wearing masks to prevent COVID? Yes, definitely 100% worth it, as both interventions have already saved tens if not hundreds of thousands of lives.

Marriage and matching

Oct 10 JDN 2459498

When this post goes live, I will be married. We already had a long engagement, but it was made even longer by the pandemic: We originally planned to be married in October 2020, but then rescheduled for October 2021. Back then, we naively thought that the pandemic would be under control by now and we could have a wedding without COVID testing and masks. As it turns out, all we really accomplished was having a wedding where everyone is vaccinated—and the venue still required testing and masks anyway. Still, it should at least be safer than it would have been last year.

Since marriage is on my mind, I thought I would at least say a few things about the behavioral economics of marriage.

Now when I say the “economics of marriage” you likely have in mind things like tax laws that advantage (or disadvantage) marriage at different incomes, or the efficiency gains from living together that allow you to save money relative to each having your own place. That isn’t what I’m interested in.

What I want to talk about today is something a bit less economic, but more directly about marriage: the matching process by which one finds a spouse.

Economists would refer to marriage as a matching market. Unlike a conventional market where you can buy and sell arbitrary quantities, marriage is (usually; polygamy notwithstanding) a one-to-one arrangement. And unlike even the job market (which is also a one-to-one matching market), marriage usually doesn’t involve direct monetary payments (though in cultures with dowries it arguably does).

The usual model of a matching market has two separate pools: Employers and employees, for example. Typical heteronormative analyses of marriage have done likewise, separating men and women into different pools. But it turns out that sometimes men marry men and women marry women.

So what happens to our matching theory if we allow the pools to overlap?

I think the most sensible way to do it, actually, is to have only one pool: people who want to get married. Then, the way we capture the fact that most—but not all—men only want to marry women, and most—but not all—women only want to marry men is through the utility function: Heterosexuals are simply those for whom a same-sex match would have very low utility. This would actually mean modeling marriage as a form of the stable roommates problem. (Oh my god, they were roommates!)

The stable roommates problem actually turns out to be harder than the conventional (heteronormative) stable marriage problem; in fact, while the hetero marriage problem (as I’ll henceforth call it) guarantees at least one stable matching for any preference ordering, the queer marriage problem can fail to have any stable solutions. While the hetero marriage problem ensures that everyone will eventually be matched to someone (if the number of men is equal to the number of women), sadly, the queer marriage problem can result in some people being forever rejected and forever alone. (There. Now you can blame the gays for ruining something: We ruined marriage matching.)

The queer marriage problem is actually more general than the hetero marriage problem: The hetero marriage problem is just the queer marriage problem with a particular utility function that assigns everyone strictly gendered preferences.

The best-known algorithm for the queer marriage problem, Irving’s stable-roommates algorithm, is an extension of the standard Gale-Shapley algorithm for the hetero marriage problem, with the same O(n^2) complexity in theory but a considerably more complicated implementation in practice. Honestly, while I can grok the standard algorithm well enough to explain it to someone, I’m not sure I completely follow this one.
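For reference, here is a minimal sketch in Python of standard Gale-Shapley deferred acceptance for the hetero problem (the example preferences at the bottom are made up); the roommates version adds a further phase of eliminations that I won’t attempt here.

```python
def gale_shapley(proposer_prefs, reviewer_prefs):
    """Deferred acceptance: proposers propose in order of preference; reviewers
    tentatively hold the best offer so far. Returns {proposer: reviewer}."""
    # rank[r][p] = how highly reviewer r ranks proposer p (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)            # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}
    engaged_to = {}                        # reviewer -> proposer

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged_to:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][engaged_to[r]]:
            free.append(engaged_to[r])     # the previously held proposer is jilted
            engaged_to[r] = p
        else:
            free.append(p)                 # rejected; will propose to the next choice

    return {p: r for r, p in engaged_to.items()}

print(gale_shapley({"x1": ["y1", "y2"], "x2": ["y1", "y2"]},
                   {"y1": ["x2", "x1"], "y2": ["x1", "x2"]}))
# -> {'x2': 'y1', 'x1': 'y2'}
```

With the proposers doing the proposing, this always terminates in a stable matching—in fact, the proposer-optimal one.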

Then again, maybe preference orderings aren’t such a great approach after all. There has been a movement in economics toward what is called ordinal utility, where we speak only of preference orderings: You can like A more than B, but there’s no way to say how much more. But I for one am much more inclined toward cardinal utility, where differences have magnitudes: I like Coke more than Pepsi, and I like getting massaged more than being stabbed—and the difference between Coke and Pepsi is a lot smaller than the difference between getting massaged and being stabbed. (Many economists make much of the notion that even cardinal utility is “equivalent up to an affine transformation”, but I’ve got some news for you: So are temperature and time. All you are really doing by making an “affine transformation” is assigning a starting point and a unit of measurement. Temperature has a sensible absolute zero to use as a starting point, you say? Well, so does utility—not existing. )

With cardinal utility, I can offer you a very simple naive algorithm for finding an optimal match: Just try out every possible set of matchings and pick the one that has the highest total utility.

There are up to n!/((n/2)! 2^(n/2)) possible matchings to check, so this could take a long time—but it should work. There are certainly more efficient algorithms out there: this is just a maximum-weight matching problem, which Edmonds’ blossom algorithm solves in polynomial time, so it’s definitely not NP-hard. But the brute-force version is the easiest to explain.
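Here is a minimal sketch of that brute-force search in Python (the four-person utility table is made up purely for illustration):

```python
def best_matching(U, people):
    """Try every way of pairing people off (allowing singles, who get their
    self-utility) and return the matching with the highest total utility."""
    def search(remaining):
        if not remaining:
            return 0, []
        first, rest = remaining[0], remaining[1:]
        best_u, best_m = search(rest)            # option: first stays single
        best_u += U[first][first]
        for partner in rest:                     # option: pair first with partner
            others = [p for p in rest if p != partner]
            u, m = search(others)
            u += U[first][partner] + U[partner][first]
            if u > best_u:
                best_u, best_m = u, [(first, partner)] + m
        return best_u, best_m
    return search(list(people))

# A made-up example: two people who adore each other, two who are lukewarm about everyone.
U = {"A": {"A": 0, "B": 5, "C": 1, "D": 1},
     "B": {"A": 5, "B": 0, "C": 1, "D": 1},
     "C": {"A": 1, "B": 1, "C": 0, "D": 2},
     "D": {"A": 1, "B": 1, "C": 2, "D": 0}}

print(best_matching(U, "ABCD"))   # -> (14, [('A', 'B'), ('C', 'D')])
```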

Moreover, even once we find a utility-maximizing matching, that doesn’t guarantee a stable matching: Some people might still prefer to change even if it would end up reducing total utility.

Here’s a simple set of preferences for which that becomes an issue. In this table, the row is the person making the evaluation, and the columns are how much utility they assign to a match with each person. The total utility of a match is just the sum of utility from the two partners. The utility of “matching with yourself” is the utility of not being matched at all.


     A  B  C  D
A    0  3  2  1
B    2  0  3  1
C    3  2  0  1
D    3  2  1  0

Since everyone prefers every other person to not being matched at all (likely not true in real life!), the optimal matchings will always match everyone with someone. Thus, there are actually only 3 matchings to compare:

AB, CD: (3+2)+(1+1) = 7

AC, BD: (2+3)+(1+2) = 8

AD, BC: (1+3)+(3+2) = 9

The optimal matching, in utilitarian terms, is to match A with D and B with C. This yields total utility of 9.

But that’s not stable, because A prefers C over D, and C prefers A over B. So A and C would choose to pair up instead.

In fact, this set of preferences yields no stable matching at all. Whoever is partnered with D is someone else’s first choice, and prefers that person to D (because D is everyone’s last choice); since they are that person’s first choice, that person would rather switch too. And a matching that leaves people single fares no better, since any two single people would rather pair up. So no stable matching exists.
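A quick script can verify this (a sketch; it enumerates every possible matching of the four people, including matchings that leave people single, and looks for a blocking pair in each):

```python
from itertools import combinations

people = "ABCD"
U = {"A": {"A": 0, "B": 3, "C": 2, "D": 1},
     "B": {"A": 2, "B": 0, "C": 3, "D": 1},
     "C": {"A": 3, "B": 2, "C": 0, "D": 1},
     "D": {"A": 3, "B": 2, "C": 1, "D": 0}}

def all_matchings(remaining):
    """Yield every matching (a list of pairs), allowing people to stay single."""
    if not remaining:
        yield []
        return
    first, rest = remaining[0], remaining[1:]
    yield from all_matchings(rest)                 # first stays single
    for partner in rest:                           # or first pairs with someone
        others = [p for p in rest if p != partner]
        for m in all_matchings(others):
            yield [(first, partner)] + m

def is_stable(matching):
    partner = {p: p for p in people}               # the unmatched "match with themselves"
    for a, b in matching:
        partner[a], partner[b] = b, a
    # A blocking pair is two people who each prefer the other to their current situation.
    return not any(U[a][b] > U[a][partner[a]] and U[b][a] > U[b][partner[b]]
                   for a, b in combinations(people, 2))

print([m for m in all_matchings(list(people)) if is_stable(m)])   # -> []  (nothing is stable)
```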

There is always a nonempty set of utility-maximizing matchings. (There must be at least one, and there could in principle be as many as there are possible matchings.) This just follows from finiteness: any finite, nonempty set of real numbers has a maximum.

As this counterexample shows, there isn’t always a stable matching.

So here are a couple of interesting theoretical questions that this gives rise to:
1. If there is a stable matching, must it be in the set of utility-maximizing matchings?

2. If there is a stable matching, must all utility-maximizing matchings be stable?

Question 1 asks whether being stable implies being utility-maximizing.
Question 2 asks whether being utility-maximizing implies being stable—conditional on there being at least one stable possibility.

So, what is the answer to these questions? I don’t know! I’m actually not sure anyone does! We may have stumbled onto cutting-edge research!

I found a paper showing that these properties do not hold when you are doing the hetero marriage problem and you use multiplicative utility for matchings, but this is the queer marriage problem, and moreover I think multiplicative utility is the wrong approach. It doesn’t make sense to me to say that a marriage where one person is extremely happy and the other is indifferent to leaving is equivalent to a marriage where both partners are indifferent to leaving, but that’s what you’d get if you multiply 1*0 = 0. And if you allow negative utility from matchings (i.e. some people would prefer to remain single than to be in a particular match—which seems sensible enough, right?), since -1*-1 = 1, multiplicative utility yields the incredibly perverse result that two people who despise each other constitute a great match. Additive utility solves both problems: 1+0 = 1 and -1+-1 = -2, so, as we would hope, like + indifferent = like, and hate + hate = even more hate.

There is something to be said for the idea that two people who kind of like each other is better than one person ecstatic and the other miserable, but (1) that’s actually debatable, isn’t it? And (2) I think that would be better captured by somehow penalizing inequality in matches, not by using multiplicative utility.

Of course, I haven’t done a really thorough literature search, so other papers may exist. Nor have I spent a lot of time just trying to puzzle through this problem myself. Perhaps I should; this is sort of my job, after all. But even if I had the spare energy to invest heavily in research at the moment (which I sadly do not), I’ve been warned many times that pure theory papers are hard to publish, and I have enough trouble getting published as it is… so perhaps not.

My intuition is telling me that 2 is probably true but 1 is probably false. That is, I would guess that the set of stable matchings, when it’s not empty, is actually larger than the set of utility-maximizing matchings.

I think where I’m getting that intuition is from the properties of Pareto-efficient allocations: Any utility-maximizing allocation is necessarily Pareto-efficient, but many Pareto-efficient allocations are not utility-maximizing. A stable matching is sort of a strengthening of the notion of a Pareto-efficient allocation (though the problem of finding a Pareto-efficient matching for the general queer marriage problem has been solved).

But it is interesting to note that while a Pareto-efficient allocation must exist (typically there are many, but there must be at least one, because it’s impossible to have a cycle of Pareto improvements as long as preferences are transitive), it’s entirely possible to have no stable matchings at all.

On the quality of matches

Apr 11 JDN 2459316

Many situations in the real world involve matching people to other people: Dating, job hunting, college admissions, publishing, organ donation.

Alvin Roth won his Nobel Prize for his work on matching algorithms. I have nothing to contribute to improving his algorithm; what baffles me is that we don’t use it more often. It would probably feel too impersonal to use it for dating; but why don’t we use it for job hunting or college admissions? (We do use it for organ donation, and that has saved thousands of lives.)

In this post I will be looking at matching in a somewhat different way. Using a simple model, I’m going to illustrate some of the reasons why it is so painful and frustrating to try to match and keep getting rejected.

Suppose we have two sets of people on either side of a matching market: X and Y. I’ll denote an arbitrarily chosen person in X as x, and an arbitrarily chosen person in Y as y. There’s no reason the two sets can’t have overlap or even be the same set, but making them different sets makes the model as general as possible.

Each person in X wants to match with a person in Y, and vice-versa. But they don’t merely want to accept any possible match; they have preferences over which matches would be better or worse.

In general, we could say that people have some kind of utility function: Ux:Y->R and Uy:X->R that maps from possible match partners to the utility of such a match. But that gets very complicated very fast, because it raises the question of when you should keep searching, and when you should stop searching and accept what you have. (There’s a whole literature of search theory on this.)

For now let’s take the simplest possible case, and just say that there are some matches each person will accept, and some they will reject. This can be seen as a special case where the utility functions Ux and Uy always yield a result of 1 (accept) or 0 (reject).

This defines a set of acceptable partners for each person: A(x) is the set of partners x will accept: {y in Y|Ux(y) = 1} and A(y) is the set of partners y will accept: {x in X|Uy(x) = 1}

Then, the set of mutual matches that x can actually get is the set of ys that x wants, which also want x back: M(x) = {y in A(x)|x in A(y)}

Whereas, the set of mutual matches that y can actually get is the set of xs that y wants, which also want y back: M(y) = {x in A(y)|y in A(x)}

This relation is mutual by construction: If x is in M(y), then y is in M(x).

But this does not mean that the sets must be the same size.

For instance, suppose that there are three people in X, x1, x2, x3, and three people in Y, y1, y2, y3.

Let’s say that the acceptable matches are as follows:

A(x1) = {y1, y2, y3}

A(x2) = {y2, y3}

A(x3) = {y2, y3}

A(y1) = {x1,x2,x3}

A(y2) = {x1,x2}

A(y3) = {x1}

This results in the following mutual matches:

M(x1) = {y1, y2, y3}

M(y1) = {x1}

M(x2) = {y2}

M(y2) = {x1, x2}

M(x3) = {}

M(y3) = {x1}

x1 can match with whoever they like; everyone wants to match with them. x2 can match with y2. But x3, despite having the same preferences as x2, and being desired by y3, can’t find any mutual matches at all, because the one person who wants them is a person they don’t want.

y1 can only match with x1, but the same is true of y3. So they will be fighting over x1. As long as y2 doesn’t also try to fight over x1, x2 and y2 will be happy together. Yet x3 will remain alone.

Note that the number of mutual matches has no obvious relation with the number of individually acceptable partners. x2 and x3 had the same number of acceptable partners, but x2 found a mutual match and x3 didn’t. y1 was willing to accept more potential partners than y3, but got the same lone mutual match in the end. y3 was only willing to accept one partner, but will get a shot at x1, the one that everyone wants.
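For what it’s worth, computing the mutual-match sets from the acceptable-partner sets is trivial; here is a sketch that reproduces the example above:

```python
A = {"x1": {"y1", "y2", "y3"}, "x2": {"y2", "y3"}, "x3": {"y2", "y3"},
     "y1": {"x1", "x2", "x3"}, "y2": {"x1", "x2"}, "y3": {"x1"}}

# M(p) = the members of A(p) who also accept p.
M = {p: {q for q in accepted if p in A[q]} for p, accepted in A.items()}

print(M["x1"], M["x2"], M["x3"])   # -> {'y1', 'y2', 'y3'}  {'y2'}  set()
print(M["y1"], M["y2"], M["y3"])   # -> {'x1'}  {'x1', 'x2'}  {'x1'}
```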

One thing is true: Adding another acceptable partner will never reduce your number of mutual matches, and removing one will never increase it. But often changing your acceptable partners doesn’t have any effect on your mutual matches at all.

Now let’s consider what it must feel like to be x1 versus x3.

For x1, the world is their oyster; they can choose whoever they want and be guaranteed to get a match. Life is easy and simple for them; all they have to do is decide who they want most and that will be it.

For x3, life is an endless string of rejection and despair. Every time they try to reach out to suggest a match with someone, they are rebuffed. They feel hopeless and alone. They feel as though no one would ever actually want them—even though in fact there is someone who wants them, it’s just not someone they were willing to consider.

This is of course a very simple and small-scale model; there are only six people in it, and they each only say yes or no. Yet already I’ve got x1 who feels like a rock star and x3 who feels utterly hopeless if not worthless.

In the real world, there are so many more people in the system that the odds that no one is in your mutual match set are negligible. Almost everyone has someone they can match with. But some people have many more matches than others, and that makes life much easier for the ones with many matches and much harder for the ones with fewer.

Moreover, search costs then become a major problem: Even knowing that in all probability there is a match for you somewhere out there, how do you actually find that person? (And that’s not even getting into the difficulty of recognizing a good match when you see it; in this simple model you know immediately, but in the real world it can take a remarkably long time.)

If we think of the acceptable partner sets as preferences, they may not be within anyone’s control; you want what you want. But if we instead characterize them as decisions, the results look quite different, and I think it’s easy to see them, if nothing else, as the decision of how high to set your standards.

This raises a question: When we are searching and not getting matches, should we lower our standards and add more people to our list of acceptable partners?

This simple model would seem to say that we should always do that—there’s no downside, since the worst that can happen is nothing. And x3 for instance would be much happier if they were willing to lower their standards and accept y1. (Indeed, if they did so, there would be a way to pair everyone off happily: x1 with y3, x2 with y2, and x3 with y1.)

But in the real world, searching is often costly: There is at least the time and effort involved, and often a literal application or submission fee; but perhaps worst of all is the crushing pain of rejection. Under those circumstances, adding another acceptable partner who is not a mutual match will actually make you worse off.

That’s pretty much what the job market has been for me for the last six months. I started out with the really good matches: GiveWell, the Oxford Global Priorities Institute, Purdue, Wesleyan, Eastern Michigan University. And after investing considerable effort into getting those applications right, I made it as far as an interview at all those places—but no further.

So I extended my search, applying to dozens more places. I’ve now applied to over 100 positions. I knew that most of them were not good matches, because there simply weren’t that many good matches to be found. And the result of all those 100 applications has been precisely 0 interviews. Lowering my standards accomplished absolutely nothing. I knew going in that these places were not a good fit for me—and it looks like they all agreed.

It’s possible that lowering my standards in some different way might have worked, but even this is not clear: I’ve already been willing to accept much lower salaries than a PhD in economics ought to command, and included positions in my search that are only for a year or two with no job security, and applied to far-flung locales across the globe that I don’t know if I’d really be willing to move to.

Honestly at this point I’ve only been using the following criteria: (1) At least vaguely related to my field (otherwise they wouldn’t want me anyway), (2) a higher salary than I currently get as a grad student (otherwise why bother?), (3) a geographic location where homosexuality is not literally illegal and an institution that doesn’t actively discriminate against LGBT employees (this rules out more than you’d think—there are at least three good postings I didn’t apply to on these grounds), (4) in a region that speaks a language I have at least some basic knowledge of (i.e. preferably English, but also allowing Spanish, French, German, or Japanese) (5) working conditions that don’t involve working more than 40 hours per week (which has severely detrimental health effects, even ignoring my disability which would compound the effects), and (6) not working for a company that is implicated in large-scale criminal activity (as a remarkable number of major banks have in fact been implicated). I don’t feel like these are unreasonably high standards, and yet so far I have failed to land a match.

What’s more, the entire process has been emotionally devastating. While others seem to be suffering from pandemic burnout, I don’t think I’ve made it that far; I think I’d be just as burnt out even if there were no pandemic, simply from how brutal the job market has been.

Why does rejection hurt so much? Why does being turned down for a date, or a job, or a publication feel so utterly soul-crushing? When I started putting together this model I had hoped that thinking of it in terms of match-sets might actually help reduce that feeling, but instead what happened is that it offered me a way of partly explaining that feeling (much as I did in my post on Bayesian Impostor Syndrome).

What is the feeling of rejection? It is the feeling of expending search effort to find someone in your acceptable partner set—and then learning that you were not in their acceptable partner set, and thus you have failed to make a mutual match.

I said earlier that x1 feels like a rock star and x3 feels hopeless. This is because being present in someone else’s acceptable partner set is a sign of status—the more people who consider you an acceptable partner, the more you are “worth” in some sense. And when it’s something as important as a romantic partner or a career, that sense of “worth” is difficult to circumscribe into a particular domain; it begins to bleed outward into a sense of your overall self-worth as a human being.

Being wanted by someone you don’t want makes you feel superior, like they are “beneath” you; but wanting someone who doesn’t want you makes you feel inferior, like they are “above” you. And when you are applying for jobs in a market with a Beveridge Curve as skewed as ours, or trying to get a paper or a book published in a world flooded with submissions, you end up with a lot more cases of feeling inferior than cases of feeling superior. In fact, I even applied for a few jobs that I felt were “beneath” my level—they didn’t take me either, perhaps because they felt I was overqualified.

In such circumstances, it’s hard not to feel like I am the problem, like there is something wrong with me. Sometimes I can convince myself that I’m not doing anything wrong and the market is just exceptionally brutal this year. But I really have no clear way of distinguishing that hypothesis from the much darker possibility that I have done something terribly wrong that I cannot correct and will continue in this miserable and soul-crushing fruitless search for months or even years to come. Indeed, I’m not even sure it’s actually any better to know that you did everything right and still failed; that just makes you helpless instead of defective. It might be good for my self-worth to know that I did everything right; but it wouldn’t change the fact that I’m in a miserable situation I can’t get out of. If I knew I were doing something wrong, maybe I could actually fix that mistake in the future and get a better outcome.

As it is, I guess all I can do is wait for more opportunities and keep trying.