The injustice of talent

Sep 4 JDN 2459827

Consider the following two principles of distributive justice.

A: People deserve to be rewarded in proportion to what they accomplish.

B: People deserve to be rewarded in proportion to the effort they put in.

Both principles sound pretty reasonable, don’t they? They both seem like sensible notions of fairness, and I think most people would broadly agree with both of them.

This is a problem, because they are mutually contradictory. We cannot possibly follow them both.

For, as much as our society would like to pretend otherwise—and I think this contradiction is precisely why our society would like to pretend otherwise—what you accomplish is not simply a function of the effort you put in.

Don’t get me wrong; it is partly a function of the effort you put in. Hard work does contribute to success. But it is neither sufficient, nor strictly necessary.

Rather, success is a function of three factors: Effort, Environment, and Talent.

Effort is the work you yourself put in, and basically everyone agrees you deserve to be rewarded for that.

Environment includes all the outside factors that affect you—including both natural and social environment. Inheritance, illness, and just plain luck are all in here, and there is general, if not universal, agreement that society should make at least some efforts to minimize inequality created by such causes.

And then, there is talent. Talent includes whatever capacities you innately have. It could be strictly genetic, or it could be acquired in childhood or even in the womb. But by the time you are an adult and responsible for your own life, these factors are largely fixed and immutable. This includes things like intelligence, disability, even height. The trillion-dollar question is: How much should we reward talent?

For talent clearly does matter. I will never swim like Michael Phelps, run like Usain Bolt, or shoot hoops like Steph Curry. It doesn’t matter how much effort I put in, how many hours I spend training—I will never reach their level of capability. Never. It’s impossible. I could certainly improve from my current condition; perhaps it would even be good for me to do so. But there are certain hard fundamental constraints imposed by biology that give them more potential in these skills than I will ever have.

Conversely, there are likely things I can do that they will never be able to do, though this is less obvious. Could Michael Phelps never be as good a programmer or as skilled a mathematician as I am? He certainly isn’t now. Maybe, with enough time, enough training, he could be; I honestly don’t know. But I can tell you this: I’m sure it would be harder for him than it was for me. He couldn’t breeze through college-level courses in differential equations and quantum mechanics the way I did. There is something I have that he doesn’t, and I’m pretty sure I was born with it. Call it spatial working memory, or mathematical intuition, or just plain IQ. Whatever it is, math comes easy to me in not so different a way from how swimming comes easy to Michael Phelps. I have talent for math; he has talent for swimming.

Moreover, these are not small differences. It’s not like we all come with basically the same capabilities with a little bit of variation that can be easily washed out by effort. We’d like to believe that—we have all sorts of cultural tropes that try to inculcate that belief in us—but it’s obviously not true. The vast majority of quantum physicists are people born with high IQ. The vast majority of pro athletes are people born with physical prowess. The vast majority of movie stars are people born with pretty faces. For many types of jobs, the determining factor seems to be talent.

This isn’t too surprising, actually—even if effort matters a lot, we would still expect talent to show up as the determining factor much of the time.

Let’s go back to that contest function model I used to analyze the job market a while back (the one that suggests we spend way too much time and money in the hiring process). This time let’s focus on the perspective of the employees themselves.

Each employee has a level of talent, h. Employee X has talent h_x and exerts effort x, producing output of a quality that is the product of these: h_x x. Similarly, employee Z has talent h_z and exerts effort z, producing output h_z z.

Then, there’s a certain amount of luck that factors in. The most successful output isn’t necessarily the best, or maybe what should have been the best wasn’t because some random circumstance prevailed. But we’ll say that the probability an individual succeeds is proportional to the quality of their output.

So the probability that employee X succeeds is: h_x x / (h_x x + h_z z)

I’ll skip the algebra this time (if you’re interested you can look back at that previous post), but to make a long story short, in Nash equilibrium the two employees will exert exactly the same amount of effort.

Then, which one succeeds will be entirely determined by talent; because x = z, the probability that X succeeds is h_x / (h_x + h_z).

It’s not that effort doesn’t matter—it absolutely does matter, and in fact in this model, with zero effort you get zero output (which isn’t necessarily the case in real life). It’s that in equilibrium, everyone is exerting the same amount of effort; so what determines who wins is innate talent. And I gotta say, that sounds an awful lot like how professional sports works. It’s less clear whether it applies to quantum physicists.

But maybe we don’t really exert the same amount of effort! This is true. Indeed, it seems like actually effort is easier for people with higher talent—that the same hour spent running on a track is easier for Usain Bolt than for me, and the same hour studying calculus is easier for me than it would be for Usain Bolt. So in the end our equilibrium effort isn’t the same—but rather than compensating, this effect only serves to exaggerate the difference in innate talent between us.

It’s simple enough to generalize the model to allow for such a thing. For instance, I could say that the cost of producing a unit of effort is inversely proportional to your talent; then instead of h_x / (h_x + h_z), in equilibrium the probability of X succeeding would become h_x^2 / (h_x^2 + h_z^2). The equilibrium effort would also be different, with x > z if h_x > h_z.
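Here’s a minimal numerical sketch of that equilibrium (my own toy setup, not anything from the original post): I assume a prize of value V, a contest success function as above, and iterate best responses on a grid. The linear cost of effort in the first case is one simple way to get the equal-effort result; the second case scales the cost of effort inversely with talent.

```python
# A minimal numerical sketch of the contest model (illustrative parameters,
# not from the original post). Each worker's expected payoff is the
# probability of success times a prize V, minus the cost of effort. We
# iterate best responses on a grid until effort stops changing.

import numpy as np

V = 10.0                 # value of succeeding (arbitrary)
h_x, h_z = 2.0, 1.0      # talent levels (arbitrary)
grid = np.linspace(1e-6, V, 20001)   # candidate effort levels

def best_response(h_own, h_other, other_effort, cost):
    payoff = V * h_own * grid / (h_own * grid + h_other * other_effort) - cost(grid)
    return grid[np.argmax(payoff)]

def equilibrium(cost_x, cost_z):
    x, z = 1.0, 1.0
    for _ in range(200):
        x_new = best_response(h_x, h_z, z, cost_x)
        z_new = best_response(h_z, h_x, x_new, cost_z)
        if abs(x_new - x) < 1e-9 and abs(z_new - z) < 1e-9:
            break
        x, z = x_new, z_new
    return x, z, h_x * x / (h_x * x + h_z * z)

# Case 1: identical linear cost -> equal effort, P(X wins) = h_x/(h_x+h_z) = 2/3
# Case 2: cost inversely proportional to talent -> P(X wins) = h_x^2/(h_x^2+h_z^2) = 4/5
for label, cx, cz in [
    ("equal linear cost", lambda e: e, lambda e: e),
    ("talent-scaled cost", lambda e: e / h_x, lambda e: e / h_z),
]:
    x, z, p = equilibrium(cx, cz)
    print(f"{label}: x={x:.3f}, z={z:.3f}, P(X wins)={p:.3f}")
```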

Once we acknowledge that talent is genuinely important, we face an ethical problem. Do we want to reward people for their accomplishment (A), or for their effort (B)? There are good cases to be made for each.

Rewarding for accomplishment, which we might call meritocracy, will tend to, well, maximize accomplishment. We’ll get the best basketball players playing basketball, the best surgeons doing surgery. Moreover, accomplishment is often quite easy to measure, even when effort isn’t.

Rewarding for effort, which we might call egalitarianism, will give people the most control over their lives, and might well feel the most fair. Those who succeed will be precisely those who work hard, even if they do things they are objectively bad at. Even people who are born with very little talent will still be able to make a living by working hard. And it will ensure that people do work hard, which meritocracy can actually fail at: If you are extremely talented, you don’t really need to work hard because you just automatically succeed.

Capitalism, as an economic system, is very good at rewarding accomplishment. I think part of what makes socialism appealing to so many people is that it tries to reward effort instead. (Is it very good at that? Not so clear.)

The more extreme differences are actually in terms of disability. There’s a certain baseline level of activities that most people are capable of, which we think of as “normal”: most people can talk; most people can run, if not necessarily very fast; most people can throw a ball, if not pitch a proper curveball. But some people can’t throw. Some people can’t run. Some people can’t even talk. It’s not that they are bad at it; it’s that they are literally not capable of it. No amount of effort could have made Stephen Hawking into a baseball player—not even a bad one.

It’s these cases when I think egalitarianism becomes most appealing: It just seems deeply unfair that people with severe disabilities should have to suffer in poverty. Even if they really can’t do much productive work on their own, it just seems wrong not to help them, at least enough that they can get by. But capitalism by itself absolutely would not do that—if you aren’t making a profit for the company, they’re not going to keep you employed. So we need some kind of social safety net to help such people. And it turns out that such people are quite numerous, and our current system is really not adequate to help them.

But meritocracy has its pull as well. Especially when the job is really important—like surgery, not so much basketball—we really want the highest quality work. It’s not so important whether the neurosurgeon who removes your tumor worked really hard at it or found it a breeze; what we care about is getting that tumor out.

Where does this leave us?

I think we have no choice but to compromise, on both principles. We will reward both effort and accomplishment, to greater or lesser degree—perhaps varying based on circumstances. We will never be able to entirely reward accomplishment or entirely reward effort.

This is more or less what we already do in practice, so why worry about it? Well, because we don’t like to admit that it’s what we do in practice, and a lot of problems seem to stem from that.

We have people acting like billionaires are such brilliant, hard-working people just because they’re rich—because our society rewards effort, right? So they couldn’t be so successful if they didn’t work so hard, right? Right?

Conversely, we have people who denigrate the poor as lazy and stupid just because they are poor. Because it couldn’t possibly be that their circumstances were worse than yours? Or hey, even if they are genuinely less talented than you—do less talented people deserve to be homeless and starving?

We tell kids from a young age, “You can be whatever you want to be”, and “Work hard and you’ll succeed”; and these things simply aren’t true. There are limitations on what you can achieve through effort—limitations imposed by your environment, and limitations imposed by your innate talents.

I’m not saying we should crush children’s dreams; I’m saying we should help them to build more realistic dreams, dreams that can actually be achieved in the real world. And then, when they grow up, they either will actually succeed, or when they don’t, at least they won’t hate themselves for failing to live up to what they were told they’d be able to do.

If you were wondering why Millennials are so depressed, that’s clearly a big part of it: We were told we could be and do whatever we wanted if we worked hard enough, and then that didn’t happen; and we had so internalized what we were told that we thought it had to be our fault that we failed. We didn’t try hard enough. We weren’t good enough. I have spent years feeling this way—on some level I do still feel this way—and it was not because adults tried to crush my dreams when I was a child, but on the contrary because they didn’t do anything to temper them. They never told me that life is hard, and people fail, and that I would probably fail at my most ambitious goals—and it wouldn’t be my fault, and it would still turn out okay.

That’s really it, I think: They never told me that it’s okay not to be wildly successful. They never told me that I’d still be good enough even if I never had any great world-class accomplishments. Instead, they kept feeding me the lie that I would have great world-class accomplishments; and then, when I didn’t, I felt like a failure and I hated myself. I think my own experience may be particularly extreme in this regard, but I know a lot of other people in my generation who had similar experiences, especially those who were also considered “gifted” as children. And we are all now suffering from depression, anxiety, and Impostor Syndrome.

All because nobody wanted to admit that talent, effort, and success are not the same thing.

Scalability and inequality

May 15 JDN 2459715

Why are some molecules (e.g. DNA) billions of times larger than others (e.g. H2O), but all atoms are within a much narrower range of sizes (a factor of a few hundred at most)?

Why are some animals (e.g. elephants) millions of times as heavy as others (e.g. mice), but their cells are basically the same size?

Why does capital income vary so much more (factors of thousands or millions) than wages (factors of tens or hundreds)?

These three questions turn out to have much the same answer: Scalability.

Atoms are not very scalable: Adding another proton to a nucleus causes interactions with all the other protons, which makes the whole atom unstable after a hundred protons or so. But molecules, particularly organic polymers such as DNA, are tremendously scalable: You can add another piece to one end without affecting anything else in the molecule, and keep on doing that more or less forever.

Cells are not very scalable: Even with the aid of active transport mechanisms and complex cellular machinery, a cell’s functionality is still very much limited by its surface area. But animals are tremendously scalable: The same exponential growth that got you from a zygote to a mouse only needs to continue a couple years longer and it’ll get you all the way to an elephant. (A baby elephant, anyway; an adult will require a dozen or so years—remarkably comparable to humans, in fact.)

Labor income is not very scalable: There are only so many hours in a day, and the more hours you work the less productive you’ll be in each additional hour. But capital income is perfectly scalable: We can add another digit to that brokerage account with nothing more than a few milliseconds of electronic pulses, and keep doing that basically forever (due to the way integer storage works, above 2^63 it would require special coding, but it can be done; and seeing as that’s over 9 quintillion, it’s not likely to be a problem any time soon—though I am vaguely tempted to write a short story about an interplanetary corporation that gets thrown into turmoil by an integer overflow error).

This isn’t just an effect of our accounting either. Capital is scalable in a way that labor is not. When your contribution to production is owning a factory, there’s really nothing to stop you from owning another factory, and then another, and another. But when your contribution is working at a factory, you can only work so hard for so many hours.

When a phenomenon is highly scalable, it can take on a wide range of outcomes—as we see in molecules, animals, and capital income. When it’s not, it will only take on a narrow range of outcomes—as we see in atoms, cells, and labor income.

Exponential growth is also part of the story here: Animals certainly grow exponentially, and so can capital when invested; even some polymers function that way (e.g. under polymerase chain reaction). But I think the scalability is actually more important: Growing rapidly isn’t so useful if you’re going to immediately be blocked by a scalability constraint. (This actually relates to the difference between r- and K- evolutionary strategies, and offers further insight into the differences between mice and elephants.) Conversely, even if you grow slowly, given enough time, you’ll reach whatever constraint you’re up against.

Indeed, we can even say something about the probability distribution we are likely to get from random processes that are scalable or non-scalable.

A non-scalable random process will generally converge toward the familiar normal distribution, a “bell curve”:

[Image from Wikipedia: By Inductiveload – self-made, Mathematica, Inkscape, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3817954]

The normal distribution has most of its weight near the middle; most of the population ends up near there. This is clearly the case for labor income: Most people are middle class, while some are poor and a few are rich.

But a scalable random process will typically converge toward quite a different distribution, a Pareto distribution:

[Image from Wikipedia: By Danvildanvil – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=31096324]

A Pareto distribution has most of its weight near zero, but covers an extremely wide range. Indeed it is what we call fat tailed, meaning that really extreme events occur often enough to have a meaningful effect on the average. A Pareto distribution has most of the people at the bottom, but the ones at the top are really on top.

And indeed, that’s exactly how capital income works: Most people have little or no capital income (indeed only about half of Americans and only a third(!) of Brits own any stocks at all), while a handful of hectobillionaires make utterly ludicrous amounts of money literally in their sleep.

Indeed, it turns out that income in general is pretty close to distributed normally (or maybe lognormally) for most of the income range, and then becomes very much Pareto at the top—where nearly all the income is capital income.
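Here’s a quick simulation of that contrast (my own illustration, with arbitrary parameters): outcomes built by adding up many small independent shocks stay tightly clustered, while outcomes built by compounding multiplicative growth spread across orders of magnitude with a long right tail. (Strictly speaking the multiplicative process below produces a lognormal rather than a true Pareto tail, but the difference in spread is the point.)

```python
# Non-scalable outcomes built by *adding* many small shocks cluster in a bell
# curve; scalable outcomes built by *multiplying* growth factors spread out
# over orders of magnitude with a long right tail.

import numpy as np

rng = np.random.default_rng(42)
n_people, n_periods = 100_000, 50

shocks = rng.normal(1.0, 0.2, size=(n_people, n_periods))

additive = shocks.sum(axis=1)                     # like wages: bell-shaped
multiplicative = shocks.clip(0.01).prod(axis=1)   # like compounding capital

for name, sample in [("additive", additive), ("multiplicative", multiplicative)]:
    p50, p99, p999 = np.percentile(sample, [50, 99, 99.9])
    print(f"{name:>14}: median={p50:10.2f}   99th={p99:10.2f}   99.9th={p999:12.2f}")
```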

This fundamental difference in scalability between capital and labor underlies much of what makes income inequality so difficult to fight. Capital is scalable, and begets more capital. Labor is non-scalable, and we only have so much to give.

It would require a radically different system of capital ownership to really eliminate this gap—and, well, that’s been tried, and so far, it hasn’t worked out so well. Our best option is probably to let people continue to own whatever amounts of capital, and then tax the proceeds in order to redistribute the resulting income. That certainly has its own downsides, but they seem to be a lot more manageable than either unfettered anarcho-capitalism or totalitarian communism.

The fragility of encryption

Feb 13 JDN 2459620

I said in last week’s post that most of the world’s online security rests upon public-key encryption. It’s how we do our shopping, our banking, and paying our taxes.

Yet public-key encryption has an Achilles’ Heel. It relies entirely on the assumption that, even knowing someone’s public key, you can’t possibly figure out what their private key is. Yet obviously the two must be deeply connected: In order for my private key to decrypt all messages that are encrypted using my public key, they must, in a deep sense, contain the same information. There must be a mathematical operation that will translate from one to the other—and that mathematical operation must be invertible.

What we have been relying on to keep public-key encryption secure is the notion of a one-way function: A function that is easy to compute, but hard to invert. A typical example is multiplying two numbers: Multiplication is a basic computing operation that is extremely fast, even for numbers with thousands of digits; but factoring a number into its prime factors is far more difficult, and currently cannot be done in any reasonable amount of time for numbers that are more than a hundred digits long.
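Here’s a toy illustration of that asymmetry (my own example; these primes are absurdly small by cryptographic standards): multiplying is a single fast operation, while the naive way to undo it, trial division, only finishes quickly because the numbers are tiny.

```python
# Multiplying two primes is trivial; recovering them from the product is not.
# (Toy example only: real keys use primes hundreds of digits long, and trial
# division would take longer than the age of the universe.)

p, q = 65_537, 2_147_483_647     # two known primes, far too small for real crypto
n = p * q                        # the easy direction: one multiplication
print(n)

def naive_factor(n):
    d = 3
    while d * d <= n:            # trial division by odd candidates
        if n % d == 0:
            return d, n // d
        d += 2
    return None                  # no factor found (n would be prime)

print(naive_factor(n))           # finds (65537, 2147483647) almost instantly here
```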


“Easy” and “hard” in what sense? The usual criterion is in polynomial time.

Say you have an input that is n bits long—i.e. n digits, when expressed as a binary number, all 0s and 1s. A function that can be computed in time proportional to n is linear time; if it can only be done in time proportional to n^2, that is quadratic time; n^3 would be cubic time. All of these are examples of polynomial time.

But if instead the time required were 2^n, that would be exponential time. 3^n and 1.5^n would also be exponential time.

This is significant because of how much faster exponential functions grow relative to polynomial functions, for large values of n. For example, let’s compare n^3 with 2^n. When n=3, the polynomial is actually larger: n^3=27 but 2^n=8. At n=10 they are nearly equal: n^3=1000 but 2^n=1024. But by n=20, n^3 is only 8000 while 2^n is over 1 million. At n=100, n^3 is a manageable (for a modern computer) 1 million, while 2^n is a staggering 10^30; that’s a million trillion trillion.

You may see that there is already something a bit fishy about this: There are lots of different ways to be polynomial and lots of different ways to be exponential. Linear time n is clearly fast, and for many types of problems it seems unlikely one could do any better. But is n^100 time really all that fast? It’s still polynomial. It doesn’t take a large exponential base to make for very fast growth—2 doesn’t seem that big, after all, and when dealing with binary digits it shows up quite naturally. But while 2^n grows very fast even for reasonably-sized n, 1.0000001^n grows slower than most polynomials—even linear!—for quite a long range before eventually becoming very fast growth when n is in the hundreds of millions. Yet it is still exponential.
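Here’s a tiny script that generates the comparison above (nothing assumed beyond the arithmetic itself; Python integers are arbitrary-precision, so the exponential values are exact):

```python
# Compare polynomial growth (n^3) with exponential growth (2^n).
for n in (3, 10, 20, 50, 100):
    print(f"n={n:>3}   n^3={n**3:>10,}   2^n={2**n:,}")
```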


So, why do we use these categories? Well, computer scientists and mathematicians have discovered that many types of problems that seem different can in fact be translated into one another, so that solving one would solve the other. For instance, you can easily convert between the Boolean satisfiability problem and the subset-sum problem or the travelling salesman problem. These conversions always take time that is a polynomial in n (usually somewhere between linear and quadratic, as it turns out). This has allowed us to build complexity classes, classes of problems such that any problem in the class can be converted to any other in polynomial time or better.

Problems that can be solved in polynomial time are in class P, for polynomial.

Problems that can be checked—but not necessarily solved—in polynomial time are in class NP, which actually stands for “non-deterministic polynomial” (not a great name, to be honest). Given a problem in NP, you may not be able to come up with a valid answer in polynomial time. But if someone gave you an answer, you could tell in polynomial time whether or not that answer was valid.

Boolean satisfiability (often abbreviated SAT) is the paradigmatic NP problem: Given a Boolean formula like (A OR B OR C) AND (¬A OR D OR E) AND (¬D OR ¬C OR B) and so on, it isn’t a simple task to determine if there’s some assignment of the variables A, B, C, D, E that makes it all true. But if someone handed you such an assignment, say (¬A, B, ¬C, D, E), you could easily check that it does in fact satisfy the expression. It turns out that in fact SAT is what’s called NP-complete: Any NP problem can be converted into SAT in polynomial time.
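Here’s a minimal sketch of why the checking step is easy (my own encoding, not any standard library): number the variables, write a literal as +i for a variable or -i for its negation, and write a formula as a list of clauses. Checking a proposed assignment is then a single linear pass over the formula.

```python
# Checking a proposed SAT assignment is one linear pass over the formula.
# Encoding (my own, for illustration): variables are numbered 1..n, a literal
# is +i for the variable or -i for its negation, and a formula is a list of
# clauses, each clause a list of literals.

def check_sat(clauses, assignment):
    """assignment maps variable number -> True/False."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

# (A OR B OR C) AND (NOT A OR D OR E) AND (NOT D OR NOT C OR B),
# with A..E numbered 1..5 and the proposed assignment (NOT A, B, NOT C, D, E):
formula = [[1, 2, 3], [-1, 4, 5], [-4, -3, 2]]
proposed = {1: False, 2: True, 3: False, 4: True, 5: True}
print(check_sat(formula, proposed))   # True: the assignment satisfies the formula
```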

This is important because in order to be useful as an encryption system, we need our one-way function to be in class P (otherwise, we couldn’t compute it quickly). Yet, by definition, this means its inverse must be in class NP.


Thus, simply because it is easy to multiply two numbers, I know for sure that factoring numbers must be in NP: All I have to do to verify that a factorization is correct is multiply the numbers. Since the way to get a public key from a private key is (essentially) to multiply two numbers, this means that getting a private key from a public key is equivalent to factorization—which means it must be in NP.

This would be fine if we knew some problems in NP that could never, ever be solved in polynomial time. We could just pick one of those and make it the basis of our encryption system. Yet in fact, we do not know any such problems—indeed, we are not even certain they exist.

One of the biggest unsolved problems in mathematics is P versus NP, which asks the seemingly-simple question: “Are P and NP really different classes?” It certainly seems like they are—there are problems like multiplying numbers, or even finding out whether a number is prime, that are clearly in P, and there are other problems, like SAT, that are definitely in NP but seem to not be in P. But in fact no one has ever been able to prove that P ≠ NP. Despite decades of attempts, no one has managed it.

To be clear, no one has managed to prove that P = NP, either. (Doing either one would win you a Clay Millennium Prize.) But since the conventional wisdom among most mathematicians is that P ≠ NP (99% of experts polled in 2019 agreed), I actually think this possibility has not been as thoroughly considered.

Vague heuristic arguments are often advanced for why P ≠ NP, such as this one by Scott Aaronson: “If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found.”

That really doesn’t follow at all. Doing something in polynomial time is not the same thing as doing it instantly.

Say for instance someone finds an algorithm to solve SAT in n^6 time. Such an algorithm would conclusively prove P = NP. n^6; that’s a polynomial, all right. But it’s a big polynomial. The time required to check a SAT solution is linear in the number of terms in the Boolean formula—just check each one, see if it works. But if it turns out we could generate such a solution in time proportional to the sixth power of the number of terms, that would still mean it’s a lot easier to check than it is to solve. A lot easier.

I guess if your notion of a “fundamental gap” rests upon the polynomial/exponential distinction, you could say that’s not “fundamental”. But this is a weird notion to say the least. If n = 1 million can be checked in 1 million processor cycles (that is, milliseconds, or with some overhead, seconds), but only solved in 10^36 processor cycles (that is, over a million trillion years), that sounds like a pretty big difference to me.

Even an n^2 algorithm wouldn’t show there’s no difference. The difference between n and n^2 is, well, a factor of n. So finding the answer could still take far longer than verifying it. This would be worrisome for encryption, however: Even a million times as long isn’t really that great, actually. It means that if something would work in a few seconds for an ordinary computer (the timescale we want for our online shopping and banking), then, say, the Russian government with a supercomputer a thousand times better could spend half an hour on it. That’s… a problem. I guess if breaking our encryption was only feasible for superpower national intelligence agencies, it wouldn’t be a complete disaster. (Indeed, many people suspect that the NSA and FSB have already broken most of our encryption, and I wouldn’t be surprised to learn that’s true.)

But what I really want to say here is that since it may be true that P=NP—we don’t know it isn’t, even if most people strongly suspect as much—we should be trying to find methods of encryption that would remain secure even if that turns out to be the case. (There’s another reason as well: Quantum computers are known to be able to factor numbers in polynomial time—though it may be awhile before they get good enough to do so usefully.)

We do know two such methods, as a matter of fact. There is quantum encryption, which, like most things quantum, is very esoteric and hard to explain. (Maybe I’ll get to that in another post.) It also requires sophisticated, expensive hardware that most people are unlikely to be able to get.

And then there is onetime pad encryption, which is shockingly easy to explain and can be implemented on any home computer.

The problem with substitution ciphers is that you can look for patterns. You can do this because the key ultimately contains only so much information, based on how long it is. If the key contains 100 bits and the message contains 10,000 bits, at some point you’re going to have to repeat some kind of pattern—even if it’s a very complex, sophisticated one like the Enigma machine.

Well, what if the key were as long as the message? What if a 10,000 bit message used a 10,000 bit key? Then you could substitute every single letter for a different symbol each time. What if, on its first occurrence, E is D, but then it’s Q, and then it’s T—and each of these was generated randomly and independently each time? Then it can’t be broken by searching for patterns—because there are no patterns to be found.

Mathematically, it would look like this: Take each bit of the plaintext, and randomly generate another bit for the key. Add the key bit to the plaintext bit (technically you want to use bitwise XOR, but that’s basically adding), and you’ve got the ciphertext bit. At the other end, subtracting out each key bit will give back each plaintext bit. Provided you can generate random numbers efficiently, this will be fast to encrypt and decrypt—but literally impossible to break without the key.

Indeed, onetime-pad encryption is so secure that it is a proven mathematical theorem that there is no way to break it. Even if you had such staggering computing power that you could try every possible key, you wouldn’t even know when you got the right one—because every possible message can be generated from a given ciphertext, using some key. Even if you knew some parts of the message already, you would have no way to figure out any of the rest—because there are no patterns linking the two.
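Here’s a minimal onetime-pad sketch in Python (my own illustration; a real system would also need to handle key storage, synchronization, and message authentication, and the key must never be reused):

```python
# One-time pad: the key is as long as the message, comes from a
# cryptographically secure source, is used exactly once, and is XORed
# byte-by-byte; XORing with the same key again recovers the message.

import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    assert len(key) == len(plaintext), "the key must be as long as the message"
    return bytes(p ^ k for p, k in zip(plaintext, key))

otp_decrypt = otp_encrypt      # XOR is its own inverse

message = b"Attack at dawn"
key = secrets.token_bytes(len(message))   # use each key once, then discard it

ciphertext = otp_encrypt(message, key)
print(ciphertext.hex())
print(otp_decrypt(ciphertext, key))       # b'Attack at dawn'
```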

The downside is that you need to somehow send the keys. As I said in last week’s post, if you have a safe way to send the key, why can’t you send the message that way? Well, there is still an advantage, actually, and that’s speed.

If there is a slow, secure way to send information (e.g. deliver it physically by armed courier), and a fast, insecure way (e.g. send it over the Internet), then you can send the keys in advance by the slow, safe way and then send ciphertexts later by the fast, risky way. Indeed, this kind of courier-based onetime-pad encryption is how the “red phone” (really a fax line) linking the White House to the Kremlin works.

Now, for online banking, we’re not going to be able to use couriers. But here’s something we could do. When you open a bank account, the bank could give you a, say, 128 GB flash drive of onetime-pad keys for you to use in your online banking. You plug that into your computer every time you want to log in, and it grabs the next part of the key each time (there are some tricky technical details with synchronizing this that could, in practice, create some risk—but, done right, the risk would be small). If you are sending 10 megabytes of encrypted data each time (and that’s surely enough to encode a bank statement, though they might want to use a format other than PDF), you’ll get over 10,000 uses out of that flash drive. If you’ve been sending a lot of data and your key starts to run low, you can physically show up at the bank branch and get a new one.

Similarly, you could have onetime-pad keys on flash drives (more literal “flash keys”) given to you by the US government for tax filing, and another from each of your credit card issuers. For online purchases, the sellers would probably need to have their own onetime-pad keys set up with the banks and credit card companies, so that you send the info to VISA encrypted one way and they send it to the seller encrypted another way. Businesses with large sales volume would go through keys very quickly—but then, they can afford to keep buying new flash drives. Since each transaction should only take a few kilobytes, the cost of additional onetime-pad keys should be small compared to the cost of packing, shipping, and the items themselves. For larger purchases, businesses could even get in the habit of sending you a free flash key with each purchase so that future purchases are easier.

This would render paywalls very difficult to implement, but good riddance. Cryptocurrency would die, but even better riddance. It would be most inconvenient to deal with things like, well, writing a blog like this; needing to get a physical key from WordPress sounds like quite a hassle. People might actually just tolerate having their blogs hacked on occasion, because… who is going to hack your blog, and who really cares if your blog gets hacked?

Yes, this system is awkward and inconvenient compared to our current system. But unlike our current system, it is provably secure. Right now, it may seem like a remote possibility that someone would find an algorithm to prove P=NP and break encryption. But it could definitely happen, and if it did happen, it could happen quite suddenly. It would be far better to prepare for the worst than be unprepared when it’s too late.

Risk compensation is not a serious problem

Nov 28 JDN 2459547

Risk compensation. It’s one of those simple but counter-intuitive ideas that economists love, and it has been a major consideration in regulatory policy since the 1970s.

The idea is this: The risk we face in our actions is partly under our control. It requires effort to reduce risk, and effort is costly. So when an external source, such as a government regulation, reduces our risk, we will compensate by reducing the effort we expend, and thus our risk will decrease less, or maybe not at all. Indeed, perhaps we’ll even overcompensate and make our risk worse!

It’s often used as an argument against various kinds of safety efforts: Airbags will make people drive worse! Masks will make people go out and get infected!

The basic theory here is sound: Effort to reduce risk is costly, and people try to reduce costly things.

Indeed, it’s theoretically possible that risk compensation could yield the exact same risk, or even more risk than before—or at least, I wasn’t able to prove that for any possible risk profile and cost function it couldn’t happen.

But I wasn’t able to find any actual risk profiles or cost functions that would yield this result, even for a quite general form. Here, let me show you.

Let’s say there’s some possible harm H. There is also some probability that it will occur, which you can mitigate with some choice x. For simplicity let’s say that it’s one-to-one, so that your risk of H occurring is precisely 1-x. Since probabilities must be between 0 and 1, thus so must x.

Reducing that risk costs effort. I won’t say much about that cost, except to call it c(x) and assume the following:

(1) It is increasing: More effort reduces risk more and costs more than less effort.

(2) It is convex: Reducing risk from a high level to a low level (e.g. 0.9 to 0.8) costs less than reducing it from a low level to an even lower level (e.g. 0.2 to 0.1).

These both seem like eminently plausible—indeed, nigh-unassailable—assumptions. And they result in the following total expected cost (the opposite of your expected utility):

(1-x)H + c(x)

Now let’s suppose there’s some policy which will reduce your risk by a factor r, which must be between 0 and 1. Your cost then becomes:

r(1-x)H + c(x)

Minimizing this yields the following result:

rH = c'(x)

where c'(x) is the derivative of c(x). Since c(x) is increasing and convex, c'(x) is positive and increasing.

Thus, if I make r smaller—an external source of less risk—then I will reduce the optimal choice of x. This is risk compensation.

But have I reduced or increased the amount of risk?

The total risk is r(1-x); since r decreased and so did x, it’s not clear whether this went up or down. Indeed, it’s theoretically possible to have cost functions that would make it go up—but I’ve never seen one.

For instance, suppose we assume that c(x) = a x^b, where a and b are constants. This seems like a pretty general form, doesn’t it? To maintain the assumption that c(x) is increasing and convex, I need a > 0 and b > 1. (If 0 < b < 1, you get a function that’s increasing but concave. If b=1, you get a linear function and some weird corner solutions where you either expend no effort at all or all possible effort.)

Then I’m trying to minimize:

r(1-x)H + a x^b

This results in a closed-form solution for x:

x = (rH/ab)^(1/(b-1))

Since b>1, 1/(b-1) > 0.


Thus, the optimal choice of x is increasing in rH and decreasing in ab. That is, reducing the harm H or the overall risk r will make me put in less effort, while reducing the cost of effort (via either a or b) will make me put in more effort. These all make sense.
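Here’s a quick numerical check of that closed form (my own illustration, with arbitrary parameter values): minimize the total expected cost over a fine grid of x and compare with the formula.

```python
# Minimize r*(1-x)*H + a*x**b over a fine grid of x in [0, 1] and compare to
# the analytic solution x* = (r*H/(a*b))**(1/(b-1)). Parameters are arbitrary.

import numpy as np

H, a, b = 0.8, 1.0, 2.0
x_grid = np.linspace(0.0, 1.0, 100_001)

for r in (1.0, 0.75, 0.5, 0.25):
    total_cost = r * (1 - x_grid) * H + a * x_grid**b
    x_numeric = x_grid[np.argmin(total_cost)]
    x_analytic = (r * H / (a * b)) ** (1 / (b - 1))
    print(f"r={r:.2f}   grid x*={x_numeric:.4f}   formula x*={x_analytic:.4f}")
```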

Can I ever increase the overall risk by reducing r? Let’s see.


My total risk r(1-x) is therefore:

r(1-x) = r[1-(rH/ab)^(1/(b-1))]

Can making r smaller ever make this larger?

Well, let’s compare it against the case when r=1. We want to see if there’s a case where it’s actually larger.

r[1-(rH/ab)^(1/(b-1))] > [1-(H/ab)^(1/(b-1))]

r – r^(b/(b-1)) (H/ab)^(1/(b-1)) > 1 – (H/ab)^(1/(b-1))

Rearranging, that would require (H/ab)^(1/(b-1)) > (1 – r)/(1 – r^(b/(b-1))). The left-hand side is just the optimal effort when r = 1, and the right-hand side is always greater than (b-1)/b; so total risk can only end up higher if people were already putting a great deal of effort into mitigation before the policy arrived. For more moderate harms, where the pre-policy effort is below (b-1)/b, reducing risk externally reduces total risk even after compensation.
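And here’s a quick tabulation of the total risk itself (my own illustration; the parameters are arbitrary but chosen so the harm is moderate and the optimum stays strictly between 0 and 1): effort falls as r falls, but the total risk r(1-x*) falls too.

```python
# Total risk after compensation, r*(1 - x*), as the external reduction r falls.
# Parameters are arbitrary but keep the optimum strictly inside (0, 1).

H, a, b = 0.8, 1.0, 2.0

for r in (1.0, 0.8, 0.6, 0.4, 0.2):
    x_star = (r * H / (a * b)) ** (1 / (b - 1))   # optimal effort
    print(f"r={r:.1f}   effort x*={x_star:.3f}   total risk r(1-x*)={r * (1 - x_star):.3f}")
```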

Now, to be fair, this isn’t a fully general model. I had to assume some specific functional forms. But I didn’t assume much, did I?

Indeed, there is a fully general argument that externally reduced risk will never harm you. It’s quite simple.

There are three states to consider: In state A, you have your original level of risk and your original level of effort to reduce it. In state B, you have an externally reduced level of risk and your original level of effort. In state C, you have an externally reduced level of risk, and you compensate by reducing your effort.

Which states make you better off?

Well, clearly state B is better than state A: You get reduced risk at no cost to you.

Furthermore, state C must be better than state B: You voluntarily chose to risk-compensate precisely because it made you better off.

Therefore, as long as your preferences are rational, state C is better than state A.

Externally reduced risk will never make you worse off.

QED. That’s it. That’s the whole proof.

But I’m a behavioral economist, am I not? What if people aren’t being rational? Perhaps there’s some behavioral bias that causes people to overcompensate for reduced risks. That’s ultimately an empirical question.

So, what does the empirical data say? Risk compensation is almost never a serious problem in the real world. Measures designed to increase safety, lo and behold, actually increase safety. Removing safety regulations, astonishingly enough, makes people less safe and worse off.

If we ever do find a case where risk compensation is very large, then I guess we can remove that safety measure, or find some way to get people to stop overcompensating. But in the real world this has basically never happened.

It’s still a fair question whether any given safety measure is worth the cost: Implementing regulations can be expensive, after all. And while many people would like to think that “no amount of money is worth a human life”, nobody does—or should, or even can—act like that in the real world. You wouldn’t drive to work or get out of bed in the morning if you honestly believed that.

If it would cost $4 billion to save one expected life, it’s definitely not worth it. Indeed, you should still be able to see that even if you don’t think lives can be compared with other things—because $4 billion could save an awful lot of lives if you spent it more efficiently. (Probably over a million, in fact, as current estimates of the marginal cost to save one life are about $2,300.) Inefficient safety interventions don’t just cost money—they prevent us from doing other, more efficient safety interventions.

And as for airbags and wearing masks to prevent COVID? Yes, definitely 100% worth it, as both interventions have already saved tens if not hundreds of thousands of lives.

Marriage and matching

Oct 10 JDN 2459498

When this post goes live, I will be married. We already had a long engagement, but it was made even longer by the pandemic: We originally planned to be married in October 2020, but then rescheduled for October 2021. Back then, we naively thought that the pandemic would be under control by now and we could have a wedding without COVID testing and masks. As it turns out, all we really accomplished was having a wedding where everyone is vaccinated—and the venue still required testing and masks. Still, it should at least be safer than it was last year, because everyone is vaccinated.

Since marriage is on my mind, I thought I would at least say a few things about the behavioral economics of marriage.

Now when I say the “economics of marriage” you likely have in mind things like tax laws that advantage (or disadvantage) marriage at different incomes, or the efficiency gains from living together that allow you to save money relative to each having your own place. That isn’t what I’m interested in.

What I want to talk about today is something a bit less economic, but more directly about marriage: the matching process by which one finds a spouse.

Economists would refer to marriage as a matching market. Unlike a conventional market where you can buy and sell arbitrary quantities, marriage is (usually; polygamy notwithstanding) a one-to-one arrangement. And unlike even the job market (which is also a one-to-one matching market), marriage usually doesn’t involve direct monetary payments (though in cultures with dowries it arguably does).

The usual model of a matching market has two separate pools: Employers and employees, for example. Typical heteronormative analyses of marriage have done likewise, separating men and women into different pools. But it turns out that sometimes men marry men and women marry women.

So what happens to our matching theory if we allow the pools to overlap?

I think the most sensible way to do it, actually, is to have only one pool: people who want to get married. Then, the way we capture the fact that most—but not all—men only want to marry women, and most—but not all—women only want to marry men is through the utility function: Heterosexuals are simply those for whom a same-sex match would have very low utility. This would actually mean modeling marriage as a form of the stable roommates problem. (Oh my god, they were roommates!)

The stable roommates problem actually turns out to be harder than the conventional (heteronormative) stable marriage problem; in fact, while the hetero marriage problem (as I’ll henceforth call it) guarantees at least one stable matching for any preference ordering, the queer marriage problem can fail to have any stable solutions. While the hetero marriage problem ensures that everyone will eventually be matched to someone (if the number of men is equal to the number of women), sadly, the queer marriage problem can result in some people being forever rejected and forever alone. (There. Now you can blame the gays for ruining something: We ruined marriage matching.)

The queer marriage problem is actually more general than the hetero marriage problem: The hetero marriage problem is just the queer marriage problem with a particular utility function that assigns everyone strictly gendered preferences.

The best known algorithm for the queer marriage problem is an extension of the standard Gale-Shapley algorithm for the hetero marriage problem, with the same O(n^2) complexity in theory but a considerably more complicated implementation in practice. Honestly, while I can clearly grok the standard algorithm well enough to explain it to someone, I’m not sure I completely follow this one.

Then again, maybe preference orderings aren’t such a great approach after all. There has been a movement in economics toward what is called ordinal utility, where we speak only of preference orderings: You can like A more than B, but there’s no way to say how much more. But I for one am much more inclined toward cardinal utility, where differences have magnitudes: I like Coke more than Pepsi, and I like getting massaged more than being stabbed—and the difference between Coke and Pepsi is a lot smaller than the difference between getting massaged and being stabbed. (Many economists make much of the notion that even cardinal utility is “equivalent up to an affine transformation”, but I’ve got some news for you: So are temperature and time. All you are really doing by making an “affine transformation” is assigning a starting point and a unit of measurement. Temperature has a sensible absolute zero to use as a starting point, you say? Well, so does utility—not existing. )

With cardinal utility, I can offer you a very simple naive algorithm for finding an optimal match: Just try out every possible set of matchings and pick the one that has the highest total utility.

There are n!/((n/2)! 2^(n/2)) possible ways to pair everyone off (and more if some people are left single), so this could take a long time—but it should work. There are more efficient algorithms out there (this is just the problem of finding a maximum-weight matching on a general graph, which can be solved in polynomial time, so it isn’t NP-hard), but the naive version is enough to make the point.
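Here’s a naive sketch of that brute-force idea (my own code, and certainly not the efficient way to do it): recursively enumerate every way of pairing people off, allowing anyone to stay single, and keep the matching with the highest total utility.

```python
# Recursively enumerate every matching (anyone may also stay single) and keep
# the one with the highest total cardinal utility. u[i][j] is i's utility from
# being matched with j, and u[i][i] is i's utility from staying single.

def all_matchings(people):
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for matching in all_matchings(rest):          # first stays single
        yield [(first, first)] + matching
    for partner in rest:                          # first pairs with someone
        remaining = [p for p in rest if p != partner]
        for matching in all_matchings(remaining):
            yield [(first, partner)] + matching

def total_utility(matching, u):
    return sum(u[i][i] if i == j else u[i][j] + u[j][i] for i, j in matching)

def best_matching(u):
    people = list(u)
    return max(all_matchings(people), key=lambda m: total_utility(m, u))
```

Applied to the four-person example below, it picks out the matching of A with D and B with C, with total utility 9.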

Moreover, even once we find a utility-maximizing matching, that doesn’t guarantee a stable matching: Some people might still prefer to change even if it would end up reducing total utility.

Here’s a simple set of preferences for which that becomes an issue. In this table, the row is the person making the evaluation, and the columns are how much utility they assign to a match with each person. The total utility of a match is just the sum of utility from the two partners. The utility of “matching with yourself” is the utility of not being matched at all.


     A   B   C   D
A    0   3   2   1
B    2   0   3   1
C    3   2   0   1
D    3   2   1   0

Since everyone prefers every other person to not being matched at all (likely not true in real life!), the optimal matchings will always match everyone with someone. Thus, there are actually only 3 matchings to compare:

AB, CD: (3+2)+(1+1) = 7

AC, BD: (2+3)+(1+2) = 8

AD, BC: (1+3)+(3+2) = 9

The optimal matching, in utilitarian terms, is to match A with D and B with C. This yields total utility of 9.

But that’s not stable, because A prefers C over D, and C prefers A over B. So A and C would choose to pair up instead.

In fact, this set of preferences yields no stable matching at all. For anyone who is partnered with D, another member will rate them highest, and D’s partner will prefer that person over D (because D is everyone’s last choice).
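To make this concrete, here’s a self-contained sketch (my own code, not from the post) that enumerates every matching for this table, including ones that leave people single, scores each by total utility, and tests stability by looking for a blocking pair. It should confirm that matching A with D and B with C maximizes total utility at 9, and that none of the matchings is stable.

```python
# Enumerate every matching of A, B, C, D (people may stay single), compute the
# total utility, and test stability: a matching is unstable if two people who
# are not matched to each other would both strictly prefer each other to their
# current situation.

u = {  # u[i][j]: i's utility from partner j; u[i][i]: utility of staying single
    "A": {"A": 0, "B": 3, "C": 2, "D": 1},
    "B": {"A": 2, "B": 0, "C": 3, "D": 1},
    "C": {"A": 3, "B": 2, "C": 0, "D": 1},
    "D": {"A": 3, "B": 2, "C": 1, "D": 0},
}
people = list(u)

def matchings(remaining):
    if not remaining:
        yield {}
        return
    first, rest = remaining[0], remaining[1:]
    for partner in [first] + rest:            # stay single, or pair with someone
        left = [p for p in rest if p != partner]
        for sub in matchings(left):
            m = dict(sub)
            m[first], m[partner] = partner, first
            yield m

def total_utility(m):
    return sum(u[i][m[i]] for i in people)

def is_stable(m):
    for i in people:
        for j in people:
            if i != j and m[i] != j and u[i][j] > u[i][m[i]] and u[j][i] > u[j][m[j]]:
                return False                  # i and j would run off together
    return True

for m in matchings(people):
    pairs = sorted({tuple(sorted((i, m[i]))) for i in people})
    print(pairs, "utility =", total_utility(m), "stable =", is_stable(m))
```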

There is always a nonempty set of utility-maximizing matchings. (There must be at least one, and there could in principle be as many as there are possible matchings.) This just follows from the fact that any finite nonempty set of real numbers has a maximum.

As this counterexample shows, there isn’t always a stable matching.

So here are a couple of interesting theoretical questions that this gives rise to:
1. If there is a stable matching, must it be in the set of utility-maximizing matchings?

2. If there is a stable matching, must all utility-maximizing matchings be stable?

Question 1 asks whether being stable implies being utility-maximizing.
Question 2 asks whether being utility-maximizing implies being stable—conditional on there being at least one stable possibility.

So, what is the answer to these questions? I don’t know! I’m actually not sure anyone does! We may have stumbled onto cutting-edge research!

I found a paper showing that these properties do not hold when you are doing the hetero marriage problem and you use multiplicative utility for matchings, but this is the queer marriage problem, and moreover I think multiplicative utility is the wrong approach. It doesn’t make sense to me to say that a marriage where one person is extremely happy and the other is indifferent to leaving is equivalent to a marriage where both partners are indifferent to leaving, but that’s what you’d get if you multiply 1*0 = 0. And if you allow negative utility from matchings (i.e. some people would prefer to remain single than to be in a particular match—which seems sensible enough, right?), since -1*-1 = 1, multiplicative utility yields the incredibly perverse result that two people who despise each other constitute a great match. Additive utility solves both problems: 1+0 = 1 and -1+-1 = -2, so, as we would hope, like + indifferent = like, and hate + hate = even more hate.

There is something to be said for the idea that two people who kind of like each other is better than one person ecstatic and the other miserable, but (1) that’s actually debatable, isn’t it? And (2) I think that would be better captured by somehow penalizing inequality in matches, not by using multiplicative utility.

Of course, I haven’t done a really thorough literature search, so other papers may exist. Nor have I spent a lot of time just trying to puzzle through this problem myself. Perhaps I should; this is sort of my job, after all. But even if I had the spare energy to invest heavily in research at the moment (which I sadly do not), I’ve been warned many times that pure theory papers are hard to publish, and I have enough trouble getting published as it is… so perhaps not.

My intuition is telling me that 2 is probably true but 1 is probably false. That is, I would guess that the set of stable matchings, when it’s not empty, is actually larger than the set of utility-maximizing matchings.

I think where I’m getting that intuition is from the properties of Pareto-efficient allocations: Any utility-maximizing allocation is necessarily Pareto-efficient, but many Pareto-efficient allocations are not utility-maximizing. A stable matching is sort of a strengthening of the notion of a Pareto-efficient allocation (though the problem of finding a Pareto-efficient matching for the general queer marriage problem has been solved).

But it is interesting to note that while a Pareto-efficient allocation must exist (typically there are many, but there must be at least one, because it’s impossible to have a cycle of Pareto improvements as long as preferences are transitive), it’s entirely possible to have no stable matchings at all.

On the quality of matches

Apr 11 JDN 2459316

Many situations in the real world involve matching people to other people: Dating, job hunting, college admissions, publishing, organ donation.

Alvin Roth won his Nobel Prize for his work on matching algorithms. I have nothing to contribute to improving his algorithm; what baffles me is that we don’t use it more often. It would probably feel too impersonal to use it for dating; but why don’t we use it for job hunting or college admissions? (We do use it for organ donation, and that has saved thousands of lives.)

In this post I will be looking at matching in a somewhat different way. Using a simple model, I’m going to illustrate some of the reasons why it is so painful and frustrating to try to match and keep getting rejected.

Suppose we have two sets of people on either side of a matching market: X and Y. I’ll denote an arbitrarily chosen person in X as x, and an arbitrarily chosen person in Y as y. There’s no reason the two sets can’t have overlap or even be the same set, but making them different sets makes the model as general as possible.

Each person in X wants to match with a person in Y, and vice-versa. But they don’t merely want to accept any possible match; they have preferences over which matches would be better or worse.

In general, we could say that people have some kind of utility function: Ux:Y->R and Uy:X->R that maps from possible match partners to the utility of such a match. But that gets very complicated very fast, because it raises the question of when you should keep searching, and when you should stop searching and accept what you have. (There’s a whole literature of search theory on this.)

For now let’s take the simplest possible case, and just say that there are some matches each person will accept, and some they will reject. This can be seen as a special case where the utility functions Ux and Uy always yield a result of 1 (accept) or 0 (reject).

This defines a set of acceptable partners for each person: A(x) is the set of partners x will accept: {y in Y|Ux(y) = 1} and A(y) is the set of partners y will accept: {x in X|Uy(x) = 1}

Then, the set of mutual matches that x can actually get is the set of ys that x wants, which also want x back: M(x) = {y in A(x)|x in A(y)}

Whereas, the set of mutual matches that y can actually get is the set of xs that y wants, which also want y back: M(y) = {x in A(y)|y in A(x)}

This relation is mutual by construction: If x is in M(y), then y is in M(x).

But this does not mean that the sets must be the same size.

For instance, suppose that there are three people in X, x1, x2, x3, and three people in Y, y1, y2, y3.

Let’s say that the acceptable matches are as follows:

A(x1) = {y1, y2, y3}

A(x2) = {y2, y3}

A(x3) = {y2, y3}

A(y1) = {x1,x2,x3}

A(y2) = {x1,x2}

A(y3) = {x1}

This results in the following mutual matches:

M(x1) = {y1, y2, y3}

M(y1) = {x1}

M(x2) = {y2}

M(y2) = {x1, x2}

M(x3) = {}

M(y3) = {x1}
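Here’s a minimal sketch (my own code) that computes these mutual-match sets directly from the acceptable-partner sets above: for each person, keep only the partners who also accept them back.

```python
# Compute the mutual-match sets M from the acceptable-partner sets A.

A = {
    "x1": {"y1", "y2", "y3"},
    "x2": {"y2", "y3"},
    "x3": {"y2", "y3"},
    "y1": {"x1", "x2", "x3"},
    "y2": {"x1", "x2"},
    "y3": {"x1"},
}

M = {person: {other for other in accepted if person in A[other]}
     for person, accepted in A.items()}

for person in A:
    print(person, sorted(M[person]))
# x3 comes out empty: the only person who accepts x3 is someone x3 won't accept.
```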

x1 can match with whoever they like; everyone wants to match with them. x2 can match with y2. But x3, despite having the same preferences as x2, and being desired by y3, can’t find any mutual matches at all, because the one person who wants them is a person they don’t want.

y1 can only match with x1, but the same is true of y3. So they will be fighting over x1. As long as y2 doesn’t also try to fight over x1, x2 and y2 will be happy together. Yet x3 will remain alone.

Note that the number of mutual matches has no obvious relation with the number of individually acceptable partners. x2 and x3 had the same number of acceptable partners, but x2 found a mutual match and x3 didn’t. y1 was willing to accept more potential partners than y3, but got the same lone mutual match in the end. y3 was only willing to accept one partner, but will get a shot at x1, the one that everyone wants.

One thing is true: Adding another acceptable partner will never reduce your number of mutual matches, and removing one will never increase it. But often changing your acceptable partners doesn’t have any effect on your mutual matches at all.

Now let’s consider what it must feel like to be x1 versus x3.

For x1, the world is their oyster; they can choose whoever they want and be guaranteed to get a match. Life is easy and simple for them; all they have to do is decide who they want most and that will be it.

For x3, life is an endless string of rejection and despair. Every time they try to reach out to suggest a match with someone, they are rebuffed. They feel hopeless and alone. They feel as though no one would ever actually want them—even though in fact there is someone who wants them, it’s just not someone they were willing to consider.

This is of course a very simple and small-scale model; there are only six people in it, and they each only say yes or no. Yet already I’ve got x1 who feels like a rock star and x3 who feels utterly hopeless if not worthless.

In the real world, there are so many more people in the system that the odds that no one is in your mutual match set are negligible. Almost everyone has someone they can match with. But some people have many more matches than others, and that makes life much easier for the ones with many matches and much harder for the ones with fewer.

Moreover, search costs then become a major problem: Even knowing that in all probability there is a match for you somewhere out there, how do you actually find that person? (And that’s not even getting into the difficulty of recognizing a good match when you see it; in this simple model you know immediately, but in the real world it can take a remarkably long time.)

If we think of the acceptable partner sets as preferences, they may not be within anyone’s control; you want what you want. But if we instead characterize them as decisions, the results are quite different, and I think it’s easy to see them, if nothing else, as the decision of how high to set your standards.

This raises a question: When we are searching and not getting matches, should we lower our standards and add more people to our list of acceptable partners?

This simple model would seem to say that we should always do that—there’s no downside, since the worst that can happen is nothing. And x3 for instance would be much happier if they were willing to lower their standards and accept y1. (Indeed, if they did so, there would be a way to pair everyone off happily: x1 with y3, x2 with y2, and x3 with y1.)

But in the real world, searching is often costly: There is at least the time and effort involved, and often a literal application or submission fee; but perhaps worst of all is the crushing pain of rejection. Under those circumstances, adding another acceptable partner who is not a mutual match will actually make you worse off.

That’s pretty much what the job market has been for me for the last six months. I started out with the really good matches: GiveWell, the Oxford Global Priorities Institute, Purdue, Wesleyan, Eastern Michigan University. And after investing considerable effort into getting those applications right, I made it as far as an interview at all those places—but no further.

So I extended my search, applying to dozens more places. I’ve now applied to over 100 positions. I knew that most of them were not good matches, because there simply weren’t that many good matches to be found. And the result of all those additional applications has been precisely zero interviews. Lowering my standards accomplished absolutely nothing. I knew going in that these places were not a good fit for me—and it looks like they all agreed.

It’s possible that lowering my standards in some different way might have worked, but even this is not clear: I’ve already been willing to accept much lower salaries than a PhD in economics ought to command, included positions that last only a year or two with no job security, and applied to far-flung locales across the globe that I’m not sure I’d really be willing to move to.

Honestly, at this point I’ve only been using the following criteria:

1. At least vaguely related to my field (otherwise they wouldn’t want me anyway).
2. A higher salary than I currently get as a grad student (otherwise why bother?).
3. A geographic location where homosexuality is not literally illegal, and an institution that doesn’t actively discriminate against LGBT employees (this rules out more than you’d think—there are at least three good postings I didn’t apply to on these grounds).
4. In a region that speaks a language I have at least some basic knowledge of (i.e. preferably English, but also allowing Spanish, French, German, or Japanese).
5. Working conditions that don’t involve more than 40 hours per week (which has severely detrimental health effects, even ignoring my disability, which would compound those effects).
6. Not working for a company implicated in large-scale criminal activity (as a remarkable number of major banks in fact have been).

I don’t feel like these are unreasonably high standards, and yet so far I have failed to land a match.

What’s more, the entire process has been emotionally devastating. While others seem to be suffering from pandemic burnout, I don’t think I’ve made it that far; I think I’d be just as burnt out even if there were no pandemic, simply from how brutal the job market has been.

Why does rejection hurt so much? Why does being turned down for a date, or a job, or a publication feel so utterly soul-crushing? When I started putting together this model I had hoped that thinking of it in terms of match-sets might actually help reduce that feeling, but instead what happened is that it offered me a way of partly explaining that feeling (much as I did in my post on Bayesian Impostor Syndrome).

What is the feeling of rejection? It is the feeling of expending search effort to find someone in your acceptable partner set—and then learning that you were not in their acceptable partner set, and thus you have failed to make a mutual match.

I said earlier that x1 feels like a rock star and x3 feels hopeless. This is because being present in someone else’s acceptable partner set is a sign of status—the more people who consider you an acceptable partner, the more you are “worth” in some sense. And when it’s something as important as a romantic partner or a career, that sense of “worth” is difficult to circumscribe into a particular domain; it begins to bleed outward into a sense of your overall self-worth as a human being.

Being wanted by someone you don’t want makes you feel superior, like they are “beneath” you; but wanting someone who doesn’t want you makes you feel inferior, like they are “above” you. And when you are applying for jobs in a market with a Beveridge Curve as skewed as ours, or trying to get a paper or a book published in a world flooded with submissions, you end up with a lot more cases of feeling inferior than cases of feeling superior. In fact, I even applied for a few jobs that I felt were “beneath” my level—they didn’t take me either, perhaps because they felt I was overqualified.

In such circumstances, it’s hard not to feel like I am the problem, like there is something wrong with me. Sometimes I can convince myself that I’m not doing anything wrong and the market is just exceptionally brutal this year. But I really have no clear way of distinguishing that hypothesis from the much darker possibility that I have done something terribly wrong that I cannot correct and will continue in this miserable and soul-crushing fruitless search for months or even years to come. Indeed, I’m not even sure it’s actually any better to know that you did everything right and still failed; that just makes you helpless instead of defective. It might be good for my self-worth to know that I did everything right; but it wouldn’t change the fact that I’m in a miserable situation I can’t get out of. If I knew I were doing something wrong, maybe I could actually fix that mistake in the future and get a better outcome.

As it is, I guess all I can do is wait for more opportunities and keep trying.

Signaling and the Curse of Knowledge

Jan 3 JDN 2459218

I received several books for Christmas this year, and the one I was most excited to read first was The Sense of Style by Steven Pinker. Pinker is exactly the right person to write such a book: He is both a brilliant linguist and cognitive scientist and also an eloquent and highly successful writer. There are two other books on writing that I rate at the same tier: On Writing by Stephen King, and The Art of Fiction by John Gardner. Don’t bother with style manuals from people who only write style manuals; if you want to learn how to write, learn from people who are actually successful at writing.

Indeed, I knew I’d love The Sense of Style as soon as I read its preface, containing some truly hilarious takedowns of Strunk & White. And honestly Strunk & White are among the best standard style manuals; they at least actually manage to offer some useful advice while also being stuffy, pedantic, and often outright inaccurate. Most style manuals only do the second part.

One of Pinker’s central focuses in The Sense of Style is on The Curse of Knowledge, an all-too-common bias in which knowing things makes us unable to appreciate the fact that other people don’t already know it. I think I succumbed to this failing most greatly in my first book, Special Relativity from the Ground Up, in which my concept of “the ground” was above most people’s ceilings. I was trying to write for high school physics students, and I think the book ended up mostly being read by college physics professors.

The problem is surely a real one: After years of gaining expertise in a subject, we are all liable to forget the difficulty of reaching our current summit and automatically deploy concepts and jargon that only a small group of experts actually understand. But I think Pinker underestimates the difficulty of escaping this problem, because it’s not just a cognitive bias that we all suffer from time to time. It’s also something that our society strongly incentivizes.

Pinker points out that a small but nontrivial proportion of published academic papers are genuinely well written, using this to argue that obscurantist jargon-laden writing isn’t necessary for publication; but he didn’t seem to even consider the fact that nearly all of those well-written papers were published by authors who already had tenure or even distinction in the field. I challenge you to find a single paper written by a lowly grad student that could actually get published without being full of needlessly technical terminology and awkward passive constructions: “A murine model was utilized for the experiment, in an acoustically sealed environment” rather than “I tested using mice and rats in a quiet room”. This is not because grad students are more thoroughly entrenched in the jargon than tenured professors (quite the contrary), nor because grad students are worse writers in general (that one could really go either way), but because grad students have more to prove. We need to signal our membership in the tribe, whereas once you’ve got tenure—or especially once you’ve got an endowed chair or something—you have already proven yourself.

Pinker seems to touch briefly on this insight (p. 69), without fully appreciating its significance: “Even when we have an inkling that we are speaking in a specialized lingo, we may be reluctant to slip back into plain speech. It could betray to our peers the awful truth that we are still greenhorns, tenderfoots, newbies. And if our readers do know the lingo, we might be insulting their intelligence while spelling it out. We would rather run the risk of confusing them while at least appearing to be sophisticated than take a chance at belaboring the obvious while striking them as naive or condescending.”

What we are dealing with here is a signaling problem. The fact that one can write better once one is well-established is the phenomenon of countersignaling, where one who has already established their status stops investing in signaling.

Here’s a simple model for you. Suppose each person has a level of knowledge x, which they are trying to demonstrate. They know their own level of knowledge, but nobody else does.

Suppose that when we observe someone’s knowledge, we get two pieces of information: We have an imperfect observation of their true knowledge which is x+e, the real value of x plus some amount of error e. Nobody knows exactly what the error is. To keep the model as simple as possible I’ll assume that e is drawn from a uniform distribution between -1 and 1.

Finally, assume that we are trying to select people above a certain threshold: Perhaps we are publishing in a journal, or hiring candidates for a job. Let’s call that threshold z. If x < z-1, then since e can never be larger than 1, we will immediately observe that they are below the threshold and reject them. If x > z+1, then since e can never be smaller than -1, we will immediately observe that they are above the threshold and accept them.

But when z-1 < x < z+1, we may think they are above the threshold when they actually are not (if e is positive), or think they are not above the threshold when they actually are (if e is negative).

So then let’s say that they can invest in signaling by putting in some amount of visible work y (like citing obscure papers or using complex jargon). This additional work may be costly and provide no real value in itself, but it can still be useful so long as one simple condition is met: It’s easier to do if your true knowledge x is high.

In fact, for this very simple model, let’s say that you are strictly limited by the constraint that y <= x. You can’t show off what you don’t know.

If your true value x > z, then you should choose y = x. Then, upon observing your signal, we know immediately that you must be above the threshold.

But if your true value x < z, then you should choose y = 0, because there’s no point in signaling that you were almost at the threshold. You’ll still get rejected.

Yet recall from before that only those with z-1 < x < z+1 actually need to bother signaling at all. Those with x > z+1 can actually countersignal, by also choosing y = 0. Since you already have tenure, nobody doubts that you belong in the club.

This means we’ll end up with three groups: Those with x < z, who don’t signal and don’t get accepted; those with z < x < z+1, who signal and get accepted; and those with x > z+1, who don’t signal but get accepted. Then life will be hardest for those who are just above the threshold, who have to spend enormous effort signaling in order to get accepted—and that sure does sound like grad school.
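Here’s a quick simulation of that three-group result, as a sanity check. Since I haven’t pinned down exactly how the observer combines the noisy observation with the signal, the code assumes one simple rule: accept whenever the noisy observation clears the threshold, or whenever the signal alone proves x is above it. The threshold and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

z = 5.0              # acceptance threshold (arbitrary)
n = 200_000          # number of candidates
x = rng.uniform(z - 2, z + 2, n)   # true knowledge levels
e = rng.uniform(-1.0, 1.0, n)      # observation error

# Equilibrium strategy described above: signal y = x only if z <= x <= z + 1;
# everyone else sets y = 0 (the hopeless because it is pointless,
# the secure as countersignaling).
y = np.where((x >= z) & (x <= z + 1), x, 0.0)

# Assumed acceptance rule: accept if the noisy observation x + e clears the
# threshold, or if the signal alone proves x >= z (since y can never exceed x).
accepted = (x + e > z) | (y >= z)

groups = {"x < z": x < z,
          "z <= x <= z+1": (x >= z) & (x <= z + 1),
          "x > z+1": x > z + 1}
for name, mask in groups.items():
    print(f"{name:14s} fraction signaling: {np.mean(y[mask] > 0):.2f}   "
          f"fraction accepted: {accepted[mask].mean():.2f}")
```

Under those assumptions the middle group signals at full strength and always gets accepted, the top group gets accepted without signaling, and the bottom group mostly does not. That is the pattern described above.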

You can make the model more sophisticated if you like: Perhaps the error isn’t uniformly distributed, but some other distribution with wider support (like a normal distribution, or a logistic distribution); perhaps the signaling isn’t perfect, but itself has some error; and so on. With such additions, you can get a result where the least-qualified still signal a little bit so they get some chance, and the most-qualified still signal a little bit to avoid a small risk of being rejected. But it’s a fairly general phenomenon that those closest to the threshold will be the ones who have to spend the most effort in signaling.

This reveals a disturbing overlap between the Curse of Knowledge and Impostor Syndrome: We write in impenetrable obfuscationist jargon because we are trying to conceal our own insecurity about our knowledge and our status in the profession. We’d rather you not know what we’re talking about than have you realize that we don’t know what we’re talking about.

For the truth is, we don’t know what we’re talking about. And neither do you, and neither does anyone else. This is the agonizing truth of research that nearly everyone doing research knows, but one must be either very brave, very foolish, or very well-established to admit out loud: It is in the nature of doing research on the frontier of human knowledge that there is always far more that we don’t understand about our subject than that we do understand.

I would like to be more open about that. I would like to write papers saying things like “I have no idea why it turned out this way; it doesn’t make sense to me; I can’t explain it.” But to say that the profession disincentivizes speaking this way would be a grave understatement. It’s more accurate to say that the profession punishes speaking this way to the full extent of its power. You’re supposed to have a theory, and it’s supposed to work. If it doesn’t actually work, well, maybe you can massage the numbers until it seems to, or maybe you can retroactively change the theory into something that does work. Or maybe you can just not publish that paper and write a different one.

Here is a graph of one million published z-scores in academic journals:

It looks like a bell curve, except that almost all the values between -2 and 2 are mysteriously missing.

If we were actually publishing all the good science that gets done, it would in fact be a very nice bell curve. All those missing values are papers that never got published, or results that were excluded from papers, or statistical analyses that were massaged, in order to get a p-value less than the magical threshold for publication of 0.05. (For the statistically uninitiated, a z-score less than -2 or greater than +2 generally corresponds to a p-value less than 0.05, so these are effectively the same constraint.)
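If you want to see how stark that selection effect is, here’s a tiny simulation. The underlying distribution of z-scores is made up (I’ve just assumed a wide normal distribution), so only the shape of the result matters, not the particular numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical world: a million studies whose z-scores form a wide bell curve.
z_scores = rng.normal(loc=0.0, scale=2.5, size=1_000_000)

# Publication filter: only results with p < 0.05, i.e. |z| > 2, get published.
published = z_scores[np.abs(z_scores) > 2]

print(f"studies run: {len(z_scores):,}   studies published: {len(published):,} "
      f"({len(published) / len(z_scores):.0%})")
```

A histogram of the published values would look like a bell curve with the middle carved out, much like the real graph, except that in this toy version the gap is perfectly empty.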

I have literally never read a single paper published in an academic journal in the last 50 years that said in plain language, “I have no idea what’s going on here.” And yet I have read many papers—probably most of them, in fact—where that would have been an appropriate thing to say. It’s actually quite a rare paper, at least in the social sciences, that has a theory good enough to really precisely fit the data and not require any special pleading or retroactive changes. (Often the bar for a theory’s success is lowered to “the effect is usually in the right direction”.) Typically results from behavioral experiments are bizarre and baffling, because people are a little screwy. It’s just that nobody is willing to stake their career on being that honest about the depth of our ignorance.

This is a deep shame, for the greatest advances in human knowledge have almost always come from people recognizing the depth of their ignorance. Paradigms never shift until people recognize that the one they are using is defective.

This is why it’s so hard to beat the Curse of Knowledge: You need to signal that you know what you’re talking about, and the truth is you probably don’t, because nobody does. So you need to sound like you know what you’re talking about in order to get people to listen to you. You may be offering nothing more than educated guesses based on extremely limited data, but that’s actually the best anyone can do; those other people saying they have it all figured out are either doing the same thing, or they’re doing something even less reliable than that. So you’d better sound like you have it all figured out, and that’s a lot more convincing when you “utilize a murine model” than when you “use rats and mice”.

Perhaps we can at least push a little bit toward plainer language. It helps to be addressing a broader audience: it is both blessing and curse that whatever I put on this blog is what you will read, without any gatekeepers in my path. I can use plainer language here if I so choose, because no one can stop me. But of course there’s a signaling risk here as well: The Internet is a public place, and potential employers can read this too, and perhaps decide they don’t like me speaking so plainly about the deep flaws in the academic system. Maybe I’d be better off keeping my mouth shut, at least for a while. I’ve never been very good at keeping my mouth shut.

Once we get established in the system, perhaps we can switch to countersignaling, though even this doesn’t always happen. I think there are two reasons this can fail: First, you can almost always try to climb higher. Once you have tenure, aim for an endowed chair. Once you have that, try to win a Nobel. Second, once you’ve spent years of your life learning to write in a particular stilted, obscurantist, jargon-ridden way, it can be very difficult to change that habit. People have been rewarding you all your life for writing in ways that make your work unreadable; why would you want to take the risk of suddenly making it readable?

I don’t have a simple solution to this problem, because it is so deeply embedded. It’s not something that one person or even a small number of people can really fix. Ultimately we will need to, as a society, start actually rewarding people for speaking plainly about what they don’t know. Admitting that you have no clue will need to be seen as a sign of wisdom and honesty rather than a sign of foolishness and ignorance. And perhaps even that won’t be enough: Because the fact will still remain that knowing which things you know that other people don’t is itself a very difficult thing to do.

Hyper-competition

Dec 13 JDN 2459197

This phenomenon has been particularly salient for me the last few months, but I think it’s a common experience for most people in my generation: Getting a job takes an awful lot of work.

Over the past six months, I’ve applied to over 70 different positions and so far gone through 4 interviews (2 by video, 2 by phone). I’ve done about 10 hours of test work. That so far has gotten me no offers, though I have yet to hear from 50 employers. Ahead of me I probably have about another 10 interviews, then perhaps 4 of what would have been flyouts and in-person presentations but instead will be “comprehensive interviews” and presentations conducted online, likely several more hours of test work, and then finally, maybe, if I’m lucky, I’ll get a good offer or two. If I’m unlucky, I won’t, and I’ll have to stick around for another year and do all this over again next year.

Aside from the limitations imposed by the pandemic, this is basically standard practice for PhD graduates. And this is only the most extreme end of a continuum of intensive job search efforts, for which even applying to be a cashier at Target requires a formal application, references, and a personality test.

This wasn’t how things used to be. Just a couple of generations ago, low-wage employers would more or less hire you on the spot, with perhaps a resume or a cursory interview. More prestigious employers would almost always require a CV with references and an interview, but it more or less stopped there. I discussed in an earlier post how much of the difference actually seems to come from our chronic labor surplus.

Is all of this extra effort worthwhile? Are we actually fitting people to better jobs this way? Even if the matches are better, are they enough better to justify all this effort?

It is a commonly-held notion among economists that competition in markets is good, that it increases efficiency and improves outcomes. I think that this is often, perhaps usually, the case. But the labor market has become so intensely competitive, particularly for high-paying positions, that the costs of this competitive effort likely outweigh the benefits.

How could this happen? Shouldn’t the free market correct for such an imbalance? Not necessarily. Here is a simple formal model of how this sort of intensive competition can result in significant waste.

Note that this post is about a formal mathematical model, so it’s going to use a lot of algebra. If you are uninterested in such things, you can read the next two paragraphs and then skip to the conclusions at the end.

The overall argument is straightforward: If candidates are similar in skill level, a complicated application process can make sense from a firm’s perspective, but be harmful from society’s perspective, due to the great cost to the applicants. This can happen because the difficult application process imposes an externality on the workers who don’t get the job.

All right, here is where the algebra begins.

I’ve included each equation as both formatted text and LaTeX.

Consider a competition between two applicants, X and Z.

They are each asked to complete a series of tasks in an application process. The amount of effort X puts into the application is x, and the amount of effort Z puts into the application is z. Let’s say each additional bit of effort has a fixed cost, normalized to 1.

Let’s say that their skills are similar, but not identical; this seems quite realistic. X has skill level hx, and Z has skill level hz.

Getting hired has a payoff for each worker of V. This includes all the expected benefits of the salary, benefits, and working conditions. I’ll assume that these are essentially the same for both workers, which also seems realistic.

The benefit to the employer is proportional to the worker’s skill, so letting h be the skill level of the worker actually hired, the benefit of hiring that worker is hY, where Y is the constant of proportionality (roughly, the value of the job per unit of skill). The reason they are requiring this application process is precisely because they want to get the worker with the highest h. Let’s say that this application process has a cost to implement, c.

Who will get hired? Well, presumably whoever does better on the application. The skill level will amplify the quality of their output, let’s say proportionally to the effort they put in; so X’s expected output will be hxx and Z’s expected output will be hzz.

Let’s also say there’s a certain amount of error in the process; maybe the more-qualified candidate will sleep badly the day of the interview, or make a glaring and embarrassing typo on their CV. And quite likely the quality of application output isn’t perfectly correlated with the quality of actual output once hired. To capture all this, let’s say that having more skill and putting in more effort only increases your probability of getting the job, rather than actually guaranteeing it.

In particular, let’s say that the probability of X getting hired is P[X] = hxx/(hxx + hzz).

\[ P[X] = \frac{h_x x}{h_x x + h_z z} \]

This results in a contest function, a type of model that I’ve discussed in some earlier posts in a rather different context.


The expected payoff for worker X is:

E[Ux] = hxx/(hxx + hzz) V – x

\[ E[U_x] = \frac{h_x x}{h_x x + h_z z} V - x \]

Maximizing this with respect to the choice of effort x (which is all that X can control at this point) yields:

hxhzz V = (hxx + hzz)2

\[ h_x h_z z V = (h_x x + h_z z)^2 \]

A similar maximization for worker Z yields:

hxhzx V = (hxx + hzz)2

\[ h_x h_z x V = (h_x x + h_z z)^2 \]

It follows that x=z, i.e. X and Z will exert equal efforts in Nash equilibrium. Their probability of success will then be contingent entirely on their skill levels:

P[X] = hx/(hx + hz).

\[ P[X] = \frac{h_x}{h_x + h_z} \]

Substituting that back in, we can solve for the actual amount of effort:

hxhzx V = (hx + hz)2x2

\[h_x h_z x V = (h_x + h_z)^2 x^2 \]

x = hxhzV/(hx + hz)2

\[ x = \frac{h_x h_z}{(h_x + h_z)^2} V \]
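If you’d rather check that equilibrium numerically than take my algebra on faith, here’s a short Python sketch that computes each worker’s best response by brute-force grid search and iterates until the efforts settle down. The parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np

h_x, h_z, V = 10.0, 8.0, 180.0   # hypothetical skill levels and job value

def best_response(other_effort, h_own, h_other):
    """Effort maximizing h_own*a / (h_own*a + h_other*b) * V - a, by grid search."""
    grid = np.linspace(1e-6, V, 200_001)
    payoff = h_own * grid / (h_own * grid + h_other * other_effort) * V - grid
    return grid[np.argmax(payoff)]

# Iterate best responses until the efforts converge to the Nash equilibrium.
x_eff = z_eff = 1.0
for _ in range(100):
    x_eff = best_response(z_eff, h_x, h_z)
    z_eff = best_response(x_eff, h_z, h_x)

print(x_eff, z_eff)                        # both converge to the same effort
print(h_x * h_z * V / (h_x + h_z) ** 2)    # closed-form value, about 44.4 here
```

Both iterated efforts agree with the closed-form value (about 44.4 for these parameters) up to the grid resolution.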

Now let’s see what that gives for the expected payoffs of the firm and the workers. This is worker X’s expected payoff:

E[Ux] = hx/(hx + hz) V – hxhzV/(hx + hz)2 = (hx/(hx + hz))2 V

\[ E[U_x] = \frac{h_x}{h_x + h_z} V - \frac{h_x h_z}{(h_x + h_z)^2} V = \left( \frac{h_x}{h_x + h_z}\right)^2 V \]

Worker Z’s expected payoff is the same, with hx and hz exchanged:

E[Uz] = (hz/(hx + hz))2 V

\[ E[U_z] = \left( \frac{h_z}{h_x + h_z}\right)^2 V \]

What about the firm? Their expected payoff is the probability of hiring X, times the value of hiring X, plus the probability of hiring Z, times the value of hiring Z, all minus the cost c:

E[Uf] = hx/(hx + hz) hx Y + hz/(hx + hz) hz Y – c = (hx2 + hz2)/(hx + hz) Y – c

\[ E[U_f] = \frac{h_x}{h_x + h_z} h_x Y + \frac{h_z}{h_x + h_z} h_z Y - c = \frac{h_x^2 + h_z^2}{h_x + h_z} Y - c\]

To see whether the application process was worthwhile, let’s compare against the alternative of simply flipping a coin and hiring X or Z at random. The probability of getting hired is then 1/2 for each candidate.

Expected payoffs for X and Z are now equal:

E[Ux] = E[Uz] = V/2

\[ E[U_x] = E[U_z] = \frac{V}{2} \]

The expected payoff for the firm can be computed the same as before, but now without the cost c:

E[Uf] = 1/2 hx Y + 1/2 hz Y = (hx + hz)/2 Y

\[ E[U_f] = \frac{1}{2} h_x Y + \frac{1}{2} h_z Y = \frac{h_x + h_z}{2} Y \]

This has a very simple interpretation: The expected value to the firm is just the average quality of the two workers, times the overall value of the job.

Which of these two outcomes is better? Well, that depends on the parameters, of course. But in particular, it depends on the difference between hx and hz.

Consider two extremes: In one case, the two workers are indistinguishable, and hx = hz = h. In that case, the payoffs for the hiring process reduce to the following:

E[Ux] = E[Uz] = V/4

\[ E[U_x] = E[U_z] = \frac{V}{4} \]

E[Uf] = h Y – c

\[ E[U_f] = h Y - c \]

Compare this against the payoffs for hiring randomly:

E[Ux] = E[Uz] = V/2

\[ E[U_x] = E[U_z] = \frac{V}{2} \]

E[Uf] = h Y

\[ E[U_f] = h Y \]

Both the workers and the firm are strictly better off if the firm just hires at random. This makes sense, since the workers have identical skill levels.

Now consider the other extreme, where one worker is far better than the other; in fact, one is nearly worthless, so hz ~ 0. (I can’t do exactly zero because I’d be dividing by zero, but let’s say one is 100 times better or something.)

In that case, the payoffs for the hiring process reduce to the following:

E[Ux] = V

E[Uz] = 0

\[ E[U_x] = V \]

\[ E[U_z] = 0 \]

X will definitely get the job, so X is much better off.

E[Uf] = hx Y – c

\[ E[U_f] = h_x Y - c \]

If the firm had hired randomly, this would have happened instead:

E[Ux] = E[Uz] = V/2

\[ E[U_x] = E[U_z] = \frac{V}{2} \]

E[Uf] = hxY/2

\[ E[U_f] = \frac{h_x}{2} Y \]

As long as c < hxY/2, both the firm and the higher-skill worker are better off in this scenario. (The lower-skill worker is worse off, but that’s not surprising.) The total expected benefit for everyone is also higher in this scenario.


Thus, the difference in skill level between the applicants is vital. If candidates are very different in skill level, in a way that the application process can accurately measure, then a long and costly application process can be beneficial, not only for the firm but also for society as a whole.

In these extreme examples, it was either not worth it for the firm, or worth it for everyone. But there is an intermediate case worth looking at, where the long and costly process can be worth it for the firm, but not for society as a whole. I will call this case hyper-competition—a system that is so competitive it makes society overall worse off.

This inefficient result occurs precisely when the firm’s gross gain from running the process exceeds its cost c (so the firm chooses to run it), yet falls short of c plus the workers’ combined equilibrium effort, 2hxhz/(hx + hz)2 V, which is exactly their total expected loss relative to random hiring. That is:

c < (hx2 + hz2)/(hx + hz) Y – (hx + hz)/2 Y < c + 2hxhz/(hx + hz)2 V

\[ c < \frac{h_x^2 + h_z^2}{h_x + h_z} Y - \frac{h_x + h_z}{2} Y < c + \frac{2 h_x h_z}{(h_x + h_z)^2} V \]

This simplifies to:

c < (hx – hz)2/(2hx + 2hz) Y < c + 2hxhz/(hx + hz)2 V

\[ c < \frac{(h_x - h_z)^2}{2 (h_x + h_z)} Y < c + \frac{2 h_x h_z}{(h_x+h_z)^2} V \]

If c is small, then we are interested in the case where:

(hx – hz)2 Y/2 < 2hxhz/(hx + hz) V

\[ \frac{(h_x - h_z)^2}{2} Y < \frac{2 h_x h_z}{h_x + h_z} V \]

This holds when the difference hx – hz is small compared to the overall size of hx or hz—that is, when candidates are highly skilled but similar. This is pretty clearly the typical case in the real world. If the candidates were obviously different, you wouldn’t need a competitive process.

For instance, suppose that hx = 10 and hz = 8, while V = 180, Y = 20 and c = 1.

Then, if we hire randomly, these are the expected payoffs:

E[Uf] = (hx + hz)/2 Y = 180

E[Ux] = E[Uz] = V/2 = 90

If we use the complicated hiring process, these are the expected payoffs:

E[Ux] = (hx/(hx + hz))2 V ≈ 55.6

E[Uz] = (hz/(hx + hz))2 V ≈ 35.6

E[Uf] = (hx2 + hz2)/(hx + hz) Y – c ≈ 181.2

The firm gets a net benefit of only about 1.2, quite small; while the workers face a far larger total expected loss of about 89. And these candidates aren’t that similar: One is 25% better than the other. Yet because the effort expended in applying was so large, even this improvement in quality wasn’t worth it from society’s perspective.
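Here is the same arithmetic as a few lines of Python, in case you want to check it or try other parameter values; the last line tests the hyper-competition condition derived above.

```python
h_x, h_z, V, Y, c = 10.0, 8.0, 180.0, 20.0, 1.0

# Random hiring
firm_random = (h_x + h_z) / 2 * Y            # 180
workers_random = V                            # V/2 each, so V in total

# Costly application process (the contest equilibrium from above)
effort = h_x * h_z * V / (h_x + h_z) ** 2     # each worker's equilibrium effort
u_x = (h_x / (h_x + h_z)) ** 2 * V            # about 55.6
u_z = (h_z / (h_x + h_z)) ** 2 * V            # about 35.6
firm_process = (h_x**2 + h_z**2) / (h_x + h_z) * Y - c   # about 181.2

firm_gain = firm_process - firm_random        # about 1.2: the firm prefers the process
workers_loss = workers_random - (u_x + u_z)   # about 88.9: equals 2 * effort, the wasted work

print(f"firm gain: {firm_gain:.2f}   workers' total loss: {workers_loss:.2f}")
print(f"hyper-competition? {0 < firm_gain < workers_loss}")
```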

This concludes the algebra for today, if you’ve been skipping it.

In this model I’ve only considered the case of exactly two applicants, but this can be generalized to more applicants, and the effect only gets stronger: Seemingly large differences in each worker’s skill level can be outweighed by the massive cost of making so many people work so hard to apply and get nothing to show for it.

Thus, hyper-competition can exist despite apparently large differences in skill. Indeed, it is precisely in the typical real-world scenario of many similarly skilled applicants that we expect to see the greatest inefficiencies. In the absence of intervention, we should expect markets to get this wrong.

Of course, we don’t actually want employers to hire randomly, right? We want people who are actually qualified for their jobs. Yes, of course; but you can probably assess that with nothing more than a resume and maybe a short interview. Most employers are not actually trying to find qualified candidates; they are trying to sift through a long list of qualified candidates to find the one that they think is best qualified. And my suspicion is that most of them honestly don’t have good methods of determining that.

This means that it could be an improvement for society to simply ban long hiring processes like these—indeed, perhaps ban job interviews altogether, as I can hardly think of a more efficient mechanism for allowing employers to discriminate based on race, gender, age, or disability than a job interview. Just collect a resume from each applicant, remove the ones that are unqualified, and then roll a die to decide which one you hire.

This would probably make the fit of workers to their jobs somewhat worse than the current system. But most jobs are learned primarily through experience anyway, so once someone has been in a job for a few years it may not matter much who was hired originally. And whatever cost we might pay in less efficient job matches could be made up several times over by the much faster, cheaper, easier, and less stressful process of applying for jobs.

Indeed, think for a moment of how much worse it feels being turned down for a job after a lengthy and costly application process that is designed to assess your merit (but may or may not actually do so particularly well), as opposed to simply finding out that you lost a high-stakes die roll. Employers could even send out letters saying one of two things: “You were rejected as unqualified for this position.” versus “You were qualified, but you did not have the highest die roll.” Applying for jobs already feels like a crapshoot; maybe it should literally be one.

People would still have to apply for a lot of jobs—actually, they’d probably end up applying for more, because the lower cost of applying would attract more applicants. But since the cost is so much lower, it would still almost certainly be easier to do a job search than it is in the current system. In fact, it could largely be automated: simply post your resume on a central server and the system matches you with employers’ requirements and then randomly generates offers. Employers and prospective employees could fill out a series of forms just once indicating what they were looking for, and then the system could do the rest.
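In case it isn’t obvious how little machinery this would actually take, here’s a toy sketch of that process; the field names and requirements are invented purely for illustration.

```python
import random

# Toy version of the proposed system: screen on verifiable credentials only,
# then hire at random among everyone who clears the bar. All fields are made up.
applicants = [
    {"name": "A", "degree": "PhD", "years_experience": 2},
    {"name": "B", "degree": "MA",  "years_experience": 6},
    {"name": "C", "degree": "BA",  "years_experience": 1},
]
requirements = {"acceptable_degrees": {"MA", "PhD"}, "min_years": 2}

qualified = [a for a in applicants
             if a["degree"] in requirements["acceptable_degrees"]
             and a["years_experience"] >= requirements["min_years"]]

hire = random.choice(qualified) if qualified else None
print("Qualified:", [a["name"] for a in qualified])
print("Hired:", hire["name"] if hire else "nobody qualified")
```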

What I find most interesting about this policy idea is that it is in an important sense anti-meritocratic. We are in fact reducing the rewards for high levels of skill—at least a little bit—in order to improve society overall and especially for those with less skill. This is exactly the kind of policy proposal that I had hoped to see from a book like The Meritocracy Trap, but never found there. Perhaps it’s too radical? But the book was all about how we need fundamental, radical change—and then its actual suggestions were simple, obvious, and almost uncontroversial.

Note that this simplified process would not eliminate the incentives to get major, verifiable qualifications like college degrees or years of work experience. In fact, it would focus the incentives so that only those things matter, instead of whatever idiosyncratic or even capricious preferences HR agents might have. There would be no more talk of “culture fit” or “feeling right for the job”, just: “What is their highest degree? How many years have they worked in this industry?” I suppose this is credentialism, but in a world of asymmetric information, I think credentialism may be our only viable alternative to nepotism.

Of course, it’s too late for me. But perhaps future generations may benefit from this wisdom.

The straw that broke the camel’s back

Oct 18 JDN 2459141

You’ve probably heard the saying before: “It was the straw that broke the camel’s back.” Something has been building up for a long time, with no apparent effect; then suddenly it crosses some kind of threshold and the effect becomes enormous.

Some real-world systems do behave like this: Avalanches, for instance. There is a very sharp critical threshold at which snow suddenly becomes unstable and triggers an avalanche.

This is how weight works in many video games, and it seems ridiculous: In Skyrim, for instance, one 1-pound cheese wheel can mean the difference between being able to function normally and being unable to move. Fear not, however: You can simply eat that cheese wheel and then be on your way.

But most real-world systems aren’t like this. In particular, camels are not. Yes, zero pieces of straw will not break a camel’s back, and some quantity of straw will. No, there is not a well-defined threshold at which adding just one piece of straw will kill the camel. This is one of those times where formal mathematical modeling can help us to see things that we otherwise couldn’t.

If this seems too frivolous, consider that this model need not be about camels: It could be about the weight a bridge can hold, or the amount of pollution a region can sustain, or the amount of psychological stress a person can bear. I think applying it to psychological stress is particularly appropriate at the moment: COVID-19 has suddenly thrust us all above our usual level of stress, and it’s important to understand where our limits lie.

A really strict formal model useful for engineering purposes would be a stress-strain curve, showing the relationship between stress (the amount of force applied) and strain (the amount of deformation of the object). But for this purpose there are basically two regimes to consider:

Below some weight y (the yield strength), the camel’s back will compress under the weight, but once the weight is removed it will return to normal. A healthy camel can carry up to y in straw essentially indefinitely.

Above that point, additional weight will begin to strain the camel’s back. But this damage will not all occur at once; a larger amount of weight for a shorter time will have the same effect as a smaller amount of weight for a longer time.

The total strain on the camel from carrying a weight w above the yield strength will thus look something like this, for exposure time t: (w-y)t

There is a total amount of strain that the camel can take without breaking its back. This has units of momentum, so I’m going to use p.

What is the amount of straw that breaks the camel’s back? Well, that depends on how long it is there!

w = p/t + y

This implies that even an arbitrarily large weight is survivable, if experienced for a sufficiently small amount of time. This may seem counter-intuitive, but it’s actually quite realistic: I’m not aware of any tests on camels, but human beings have been able to survive impacts of 40 g for a few milliseconds.

If you are hoping to carry a certain load of straw by camel over a certain distance, and need to know how many camels to use (or how many trips to take), you would figure out how long it takes to cover that distance, then use that as your time parameter to figure out the maximum weight a camel could carry for that long.
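In code, the whole model is a single line; here’s a sketch with made-up numbers (these are not real camel physiology, just placeholders to show how the trade-off between load and trip time works).

```python
def max_load(trip_hours, yield_strength, strain_capacity):
    """Maximum weight the camel can carry for trip_hours without breaking,
    per the model w = p/t + y."""
    return strain_capacity / trip_hours + yield_strength

# Hypothetical parameters, for illustration only:
y = 150.0   # weight (kg) the camel can carry essentially indefinitely
p = 600.0   # cumulative strain tolerance above y, in kg-hours

for t in [1, 6, 12, 48]:
    print(f"{t:3d}-hour trip: at most {max_load(t, y, p):6.1f} kg")
```

The longer the trip, the closer the safe load gets to the yield strength y; for a short enough trip, the safe load can be much higher.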

So what would happen if you actually added one piece of straw at a time to a camel’s back? That depends on how fast you add them and how long you leave them there!