Why “marginal productivity” is no excuse for inequality

May 28, JDN 2457902

In most neoclassical models, workers are paid according to their marginal productivity—the additional (market) value of goods that a firm is able to produce by hiring that worker. This is often used as an excuse for inequality: If someone can produce more, why shouldn’t they be paid more?

The most extreme example of this is people like Maura Pennington writing for Forbes about how poor people just need to get off their butts and “do something”; but there is a whole literature in mainstream economics, particularly “optimal tax theory”, arguing based on marginal productivity that we should tax the very richest people the least and never tax capital income. The Chamley-Judd Theorem famously “shows” (by making heroic assumptions) that taxing capital just makes everyone worse off because it reduces everyone’s productivity.

The biggest reason this is wrong is that there are many, many reasons why someone would have a higher income without being any more productive. They could inherit wealth from their ancestors and get a return on that wealth; they could have a monopoly or some other form of market power; they could use bribery and corruption to tilt government policy in their favor. Indeed, most of the top 0.01% do literally all of these things.

But even if you assume that pay is related to productivity in competitive markets, the argument is not nearly as strong as it may at first appear. Here I have a simple little model to illustrate this.

Suppose there are 10 firms and 10 workers. Suppose that firm 1 has 1 unit of effective capital (capital adjusted for productivity), firm 2 has 2 units, and so on up to firm 10 which has 10 units. And suppose that worker 1 has 1 unit of so-called “human capital”, representing their overall level of skills and education, worker 2 has 2 units, and so on up to worker 10 with 10 units. Suppose each firm only needs one worker, so this is a matching problem.

Furthermore, suppose that productivity is equal to capital times human capital: That is, if firm 2 hired worker 7, they would make 2*7 = $14 of output.

What will happen in this market if it converges to equilibrium?

Well, first of all, the most productive firm is going to hire the most productive worker—so firm 10 will hire worker 10 and produce $100 of output. What wage will they pay? Well, they need a wage that is high enough to keep worker 10 from trying to go elsewhere. They should therefore pay a wage of $90—the next-highest firm productivity times the worker’s productivity. That’s the highest wage any other firm could credibly offer; so if they pay this wage, worker 10 will not have any reason to leave.

Now the problem has been reduced to matching 9 firms to 9 workers. Firm 9 will hire worker 9, making $81 of output, and paying $72 in wages.

And so on, until worker 1 at firm 1 produces $1 and receives… $0. Because there is no way for worker 1 to threaten to leave, in this model they actually get nothing. If I assume there’s some sort of social welfare system providing say $0.50, then at least worker 1 can get that $0.50 by threatening to leave and go on welfare. (This, by the way, is probably the real reason firms hate social welfare spending; it gives their workers more bargaining power and raises wages.) Or maybe they have to pay that $0.50 just to keep the worker from starving to death.
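
The matching-and-outbidding logic above can be replayed in a few lines of Python. This is a minimal sketch that assumes exactly the setup described: firm n and worker n each have n units, output is capital times human capital, and each worker is paid the best offer the next-most-productive firm could make. The $0.50 welfare floor is the hypothetical one from the text; the names `WELFARE_FLOOR`, `output` and `equilibrium` are illustrative choices of mine.

```python
# Sketch of the 10-firm, 10-worker matching model described above.
# Assumption: firm n has n units of effective capital and worker n has n units
# of human capital; the output of a match is capital * human capital.

N = 10
WELFARE_FLOOR = 0.50  # hypothetical outside option (welfare) for the least-skilled worker


def output(capital: int, human_capital: int) -> float:
    """Dollar output when a firm with `capital` hires a worker with `human_capital`."""
    return float(capital * human_capital)


def equilibrium(n: int = N, floor: float = WELFARE_FLOOR):
    """Return {k: (output, wage, profit)} when firm k hires worker k.

    The wage is the best credible outside offer: what firm k - 1 could produce
    with this worker, or the welfare floor, whichever is larger.
    """
    results = {}
    for k in range(1, n + 1):
        produced = output(k, k)
        outside_offer = output(k - 1, k)  # 0 for worker 1, since there is no firm 0
        wage = max(outside_offer, floor)
        results[k] = (produced, wage, produced - wage)
    return results


if __name__ == "__main__":
    for worker, (produced, wage, profit) in equilibrium().items():
        print(f"worker {worker:2d}: output ${produced:6.2f}  "
              f"wage ${wage:6.2f}  firm profit ${profit:5.2f}")
```

Calling `equilibrium(floor=0.0)` instead reproduces the no-welfare case, where worker 1 receives nothing.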

What does inequality look like in this society?

Well, the most-productive firm only has 10 times as much capital as the least-productive firm, and the most-educated worker only has 10 times as much skill as the least-educated worker, so we might think that incomes would vary only by a factor of 10.

But in fact they vary by a factor of over 100.

The richest worker makes $90, while the poorest worker makes $0.50. That’s a ratio of 180. (Still lower than the ratio of the average CEO’s pay to their average employee’s pay in the US, by the way.) The top worker is 10 times as productive as the bottom worker, but receives 180 times as much income.

The firm profits vary along a more reasonable scale in this case; firm 1 makes a profit of $0.50 while firm 10 makes a profit of $10. Indeed, except for firm 1, firm n always makes a profit of $n. So that’s very nearly a linear scaling in productivity.
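
The same pattern holds for every match. Assuming, as above, that firm n hires worker n and pays the best rival offer (which comes from firm n-1), wages and profits work out as follows, with the $0.50 welfare floor mattering only for worker 1:

```latex
\begin{aligned}
y_n   &= n \cdot n = n^2                      && \text{output of firm } n \text{ with worker } n\\
w_n   &= (n-1)\,n = n^2 - n                   && \text{best rival offer, from firm } n-1\\
\pi_n &= y_n - w_n = n^2 - (n^2 - n) = n      && \text{profit of firm } n,\ n \ge 2\\
\frac{w_{10}}{w_1} &= \frac{90}{0.50} = 180   && \text{with } w_1 \text{ set by the \$0.50 floor}
\end{aligned}
```

Wages grow roughly with the square of productivity while profits grow only linearly, which is where the extra inequality comes from.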

Where did this result come from? Why is it so different from the usual assumptions? All I did was change one thing: I allowed for increasing returns to scale.

If you make the usual assumption of constant returns to scale, this result can’t happen. Multiplying all the inputs by 10 should just multiply the output by 10, by assumption—since that is the definition of constant returns to scale.
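
To see the difference concretely, scale every input by the same factor and compare. This sketch contrasts the simple model’s production function (capital times human capital) with a constant-returns Cobb-Douglas function. The one-third/two-thirds exponents in `crs_output` are an illustrative assumption, chosen to match the conventional one-third capital share; they are not part of the model.

```python
# Compare how output responds when *all* inputs are scaled by the same factor.


def model_output(capital: float, human_capital: float) -> float:
    """The simple model's production function: capital times human capital."""
    return capital * human_capital


def crs_output(capital: float, human_capital: float) -> float:
    """Illustrative constant-returns Cobb-Douglas; exponents (assumed 1/3, 2/3) sum to 1."""
    return capital ** (1 / 3) * human_capital ** (2 / 3)


def scale_response(f, capital: float, human_capital: float, factor: float) -> float:
    """Ratio of output after scaling both inputs by `factor` to output before."""
    return f(factor * capital, factor * human_capital) / f(capital, human_capital)


if __name__ == "__main__":
    print(scale_response(model_output, 1, 1, 10))          # 100.0: increasing returns
    print(round(scale_response(crs_output, 1, 1, 10), 6))  # 10.0: constant returns
```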

But if you look at the structure of real-world incomes, it’s pretty obvious that we don’t have constant returns to scale.

If we had constant returns to scale, we would expect the wage of any given person to vary only slightly depending on where they worked. In particular, to get a 2-fold increase in wage for the same worker, you’d need more than a 2-fold increase in capital.

This is a bit counter-intuitive, so let me explain further. If a 2-fold increase in capital results in a 2-fold increase in wage for a given worker, that’s increasing returns to scale; indeed, it’s precisely the production function I assumed above.

If you had constant returns to scale, a 2-fold increase in wage would require something like an 8-fold increase in capital. This is because you should only get a 2-fold increase in total production by doubling everything: capital, labor, human capital, whatever else. So doubling capital by itself should produce a much weaker effect. For technical reasons I’d rather not get into at the moment, it’s usually assumed that production is approximately proportional to capital to the one-third power, so to double production you need to multiply capital by 2^3 = 8.
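
The 8-fold figure follows from a standard textbook assumption that the post only alludes to: a Cobb-Douglas production function with a capital exponent of one-third, where the wage equals the marginal product of labor. Treating A as total factor productivity and holding the worker (L) fixed, this is a sketch of the derivation, not a claim about any particular economy:

```latex
\begin{aligned}
Y &= A\,K^{1/3} L^{2/3}\\
w &= \frac{\partial Y}{\partial L} = \tfrac{2}{3}\,A\,K^{1/3} L^{-1/3} \;\propto\; K^{1/3}\\
\frac{w'}{w} &= \left(\frac{K'}{K}\right)^{1/3} = 2
  \quad\Longrightarrow\quad \frac{K'}{K} = 2^3 = 8
\end{aligned}
```

In the simple model, by contrast, a worker’s marginal product k·h rises one-for-one with capital k, so doubling capital doubles it.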

I wasn’t able to quickly find really good data on wages for the same workers across different countries, but this should at least give a rough idea. In Mumbai, the minimum monthly wage for a full-time worker is about $80. In Shanghai, it is about $250. If you multiply out the US federal minimum wage of $7.25 per hour by 40 hours by 4 weeks, that comes to $1160 per month.

Of course, these are not the same workers. Even an “unskilled” worker in the US has a lot more education and training than a minimum-wage worker in India or China. But it’s not that much more. Maybe if we normalize India to 1, China is 3 and the US is 10.

Likewise, these are not the same jobs. Even a minimum wage job in the US is much more capital-intensive and uses much higher technology than most jobs in India or China. But it’s not that much more. Again let’s say India is 1, China is 3 and the US is 10.

If we had constant returns to scale, what should the wages be? Well, for India at productivity 1, the wage is $80. So for China at productivity 3, the wage should be $240—it’s actually $250, close enough for this rough approximation. But the US wage should be $800—and it is in fact $1160, 45% larger than we would expect by constant returns to scale.
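
The back-of-the-envelope comparison can be written out explicitly. This sketch uses only the figures quoted above; the productivity indices of 1, 3 and 10 are the rough normalizations from the text, and the second prediction assumes the simple model’s rule that output scales with capital times human capital, so wages scale with the square of the index.

```python
# Rough cross-country minimum-wage comparison, using the figures quoted above.

INDIA_MONTHLY_WAGE = 80.0                          # Mumbai minimum, USD/month
OBSERVED = {"China": 250.0, "US": 7.25 * 40 * 4}   # US: $7.25/h * 40 h/week * 4 weeks

# Rough normalizations: physical and human capital both indexed to India = 1.
INDEX = {"China": 3, "US": 10}


def predicted_wage(index: float, increasing_returns: bool = False) -> float:
    """Monthly wage predicted from the India baseline.

    Constant returns: scaling both inputs by `index` scales output (and wages) by `index`.
    Simple model (capital * human capital): wages scale by index ** 2.
    """
    factor = index ** 2 if increasing_returns else index
    return INDIA_MONTHLY_WAGE * factor


if __name__ == "__main__":
    for country, idx in INDEX.items():
        crs = predicted_wage(idx)
        irs = predicted_wage(idx, increasing_returns=True)
        actual = OBSERVED[country]
        print(f"{country}: CRS ${crs:,.0f}  simple model ${irs:,.0f}  "
              f"actual ${actual:,.0f}  actual/CRS = {actual / crs:.2f}")
```

The US figure lands between the constant-returns prediction ($800) and the simple model’s prediction ($8,000), consistent with returns to scale that are increasing but weaker than in the model.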

Let’s try comparing within a particular industry, where the differences in skill and technology should be far smaller. The median salary for a software engineer in India is about 430,000 INR, which comes to about $6,700. If that sounds rather low for a software engineer, you’re probably more accustomed to the figure for US software engineers, which is $74,000. That is a factor of 11 to 1. For the same job. Maybe US software engineers are better than Indian software engineers—but are they that much better? Yes, you can adjust for purchasing power and shrink the gap: Prices in the US are about 4 times as high as those in India, so the real gap might be 3 to 1. But these huge price differences themselves need to be explained somehow, and even 3 to 1 for the same job in the same industry is still probably too large to explain by differences in either capital or education, unless you allow for increasing returns to scale.
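
Here is the same arithmetic for the software-engineer comparison, as a sketch using only the numbers quoted in the paragraph above (including the rough 4-to-1 price-level ratio). The implied exchange rate is simply backed out of the quoted INR and USD figures as a sanity check.

```python
# Software-engineer pay gap, using the figures quoted above.

INDIA_SALARY_INR = 430_000
INDIA_SALARY_USD = 6_700      # conversion as quoted in the text
US_SALARY_USD = 74_000
US_TO_INDIA_PRICE_LEVEL = 4   # rough price-level ratio from the text

nominal_gap = US_SALARY_USD / INDIA_SALARY_USD      # about 11
real_gap = nominal_gap / US_TO_INDIA_PRICE_LEVEL    # about 2.8, i.e. roughly 3 to 1
implied_rate = INDIA_SALARY_INR / INDIA_SALARY_USD  # INR per USD implied by the quote

print(f"nominal gap {nominal_gap:.1f}x, price-adjusted gap {real_gap:.1f}x, "
      f"implied exchange rate {implied_rate:.0f} INR/USD")
```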

In most industries, we probably don’t have quite as much increasing returns to scale as I assumed in my simple model. Workers in the US don’t make 100 times as much as workers in India, despite plausibly having both 10 times as much physical capital and 10 times as much human capital.

But in some industries, this model might not even be enough! The most successful authors and filmmakers, for example, make literally thousands of times as much money as the average author or filmmaker in their own country. J.K. Rowling has earned almost $1 billion from writing the Harry Potter series; this is despite having literally the same amount of physical capital and probably not much more human capital than the average author in the UK, who makes only about 11,000 GBP, or about $14,000. Harry Potter and the Philosopher’s Stone is now almost exactly 20 years old, which means that Rowling has made an average of $50 million per year, some 3500 times as much as the average British author. Is she better than the average British author? Sure. Is she three thousand times better? I don’t think so. And we can’t even make the argument that she has more capital and technology to work with, because she doesn’t! They’re typing on the same laptops and using the same printing presses. Either the return on human capital for British authors is astronomical, or something other than marginal productivity is at work here; either way, we don’t have anything close to constant returns to scale.
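
The Rowling figures can be checked the same way. This sketch rounds “almost $1 billion” up to $1 billion and uses the 20-year span and the roughly $14,000 average British author income quoted above, so the multiple is approximate.

```python
# J.K. Rowling vs. the average British author, using the figures quoted above.

ROWLING_EARNINGS_USD = 1_000_000_000  # "almost $1 billion", rounded up
YEARS = 20                            # Philosopher's Stone published in 1997
AVG_UK_AUTHOR_USD = 14_000            # about 11,000 GBP per year

per_year = ROWLING_EARNINGS_USD / YEARS  # $50 million per year
multiple = per_year / AVG_UK_AUTHOR_USD  # roughly 3,570

print(f"${per_year:,.0f} per year, about {multiple:,.0f} times the average author")
```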

What can we take away from this? Well, if we don’t have constant returns to scale, then even if wage rates are proportional to marginal productivity, they aren’t proportional to the component of marginal productivity that you yourself bring. The same software developer makes more at Microsoft than at some Indian software company, the same doctor makes more at a US hospital than at a hospital in China, the same college professor makes more at Harvard than at a community college, and J.K. Rowling makes over three thousand times as much as the average British author. Therefore we can’t speak of marginal productivity as inhering in you as an individual. It is an emergent property of a production process that includes you as a part. So even if you’re being paid entirely according to “your” productivity, it’s not really your productivity; it’s the productivity of the production process you’re involved in. A myriad of other factors had to snap into place to make your productivity what it is, most of which you had no control over. In what sense, then, can we say you earned your higher pay?

Moreover, this problem becomes most acute precisely when incomes diverge the most. The differential in wages between two welders at the same auto plant may well be largely due to their relative skill at welding. But there’s absolutely no way that the top athletes, authors, filmmakers, CEOs, or hedge fund managers could possibly make the incomes they do by being individually that much more productive.

Tax incidence revisited, part 2: How taxes affect prices

JDN 2457341

One of the most important aspects of taxation is also one of the most counter-intuitive and (relatedly) least-understood: Taxes are not externally applied to pre-existing exchanges of money. Taxes endogenously interact with the system of prices, changing what the prices will be and then taking a portion of the money exchanged.

The price of something “before taxes” is not actually the price you would pay for it if there had been no taxes on it. Your “pre-tax income” is not actually the income you would have had if there were no income or payroll taxes.

The most obvious case to consider is that of government employees: If there were no taxes, public school teachers could not exist, so the “pre-tax income” of a public school teacher is a meaningless quantity. You don’t “take taxes out” of a government salary; you decide how much money the government employee will actually receive, and then at the same time allocate a certain amount into other budgets based on the tax code: a certain amount into the state general fund, a certain amount into the Social Security Trust Fund, and so on. These two actions could in principle be done completely separately; instead of saying that a teacher has a “pre-tax salary” of $50,000 and is taxed 20%, you could simply say that the teacher receives $40,000 and that $10,000 is paid into the appropriate other budgets.

In fact, when there is a conflict of international jurisdiction, this is sometimes literally what we do. Employees of the World Bank are given immunity from all income and payroll taxes based on international law (effectively diplomatic immunity, though this is not usually how we use the term), except for US citizens, who have their taxes paid for them by the World Bank. As a result, all World Bank salaries are quoted “after-tax”, that is, as the actual amount of money employees will receive in their paychecks. This means a $120,000 salary at the World Bank is considerably higher than a $120,000 salary at Goldman Sachs; the latter would leave only (“only”) about $96,000 after taxes.
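
The Goldman Sachs comparison implies an effective tax rate of about 20% (96,000 out of 120,000). Assuming, purely for illustration, that this flat effective rate holds at nearby salary levels (real tax schedules are progressive, so this is only approximate), the sketch below also asks how large a taxable salary would need to be to match the World Bank’s $120,000. The function names are hypothetical.

```python
# Compare a salary quoted after tax with a taxable one.
# Assumption: a flat 20% effective rate, implied by the $96,000 figure in the text.

EFFECTIVE_RATE = 0.20  # 1 - 96,000 / 120,000


def take_home(gross: float, rate: float = EFFECTIVE_RATE) -> float:
    """Take-home pay from a taxable gross salary at a flat effective rate."""
    return gross * (1 - rate)


def gross_needed(target_take_home: float, rate: float = EFFECTIVE_RATE) -> float:
    """Taxable gross salary that leaves `target_take_home` after tax."""
    return target_take_home / (1 - rate)


print(round(take_home(120_000)))     # 96000: the Goldman Sachs salary after tax
print(round(gross_needed(120_000)))  # 150000: taxable salary matching the World Bank's
```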

For private-sector salaries, it’s not as obvious, but it’s still true. There is actually someone who pays that “before-tax” salary—namely, the employer. “Pre-tax” salaries are actually a measure of labor expenditure (sometimes erroneously called “labor costs”, even by economists—but a true labor cost is the amount of effort, discomfort, stress, and opportunity cost involved in doing labor; it’s an amount of utility, not an amount of money). The salary “before tax” is the amount of money that the employer has to come up with in order to pay their payroll. It is a real amount of money being exchanged, divided between the employee and the government.

The key thing to realize is that salaries are not set in a vacuum. In the real world, all sorts of economic and political pressures affect salaries: labor unions, regulations, racist and sexist biases, nepotism, psychological heuristics, employees with different levels of bargaining skill, employers with different concepts of fairness or levels of generosity, corporate boards concerned about public relations, shareholder activism, and so on.

But even if we abstract away from all that for a moment and just look at the fundamental economics, assuming that salaries are set at the price the market will bear, that price depends upon the tax system.

This is because taxes effectively drive a wedge between supply and demand.

Indeed, on a graph of supply and demand, it actually looks like a wedge.

Let’s pretend that we’re in a perfectly competitive market. Everyone is completely rational, we all have perfect information, and nobody has any power to manipulate the market. We’ll even assume that we are dealing with hourly wages and we can freely choose the number of hours worked. (This is silly, of course; but removing this complexity helps to clarify the concept and doesn’t change the basic result that prices depend upon taxes.)

We’ll have a supply curve, which is a graph of the minimum price the worker is willing to accept for each hour in order to work a given number of hours. We generally assume that the supply curve slopes upward, meaning that people are willing to work more hours if you offer them a higher wage for each hour. The idea is that it gets progressively harder to find the time—it eats into more and more important alternative activities. (This is in fact a gross oversimplification, but it’ll do for now. In the real world, labor is the one thing for which the supply curve frequently bends backward.)


We’ll also have a demand curve, which is a graph of the maximum price the employer is willing to pay for each hour, if the employee works that many hours. We generally assume that the demand curve slopes downward, meaning that the employer is willing to pay less for each hour if the employee works more hours. The reason is that most activities have diminishing marginal returns, so each extra hour of work generally produces less output than the previous hour, and is therefore not worth paying as much for. (This too is an oversimplification, as I discussed previously in my post on the Law of Demand.)


Put these two together, and in a competitive market the price will be set at the point at which supply is equal to demand, so that the very last hour of work is worth exactly what the employer pays for it. That last hour is just barely worth it to the employer, and just barely worth it to the worker; any additional time would either be too expensive for the employer or not lucrative enough for the worker. But for all the previous hours, the value to the employer is higher than the wage, and the cost to the worker is lower than the wage. As a result, both the employer and the worker benefit.


But now, suppose we implement a tax. For concreteness, suppose the previous market-clearing wage was $20 per hour, the worker was working 40 hours, and the tax is 20%. If the employer still offers a wage of $20 for 40 hours of work, the worker is no longer going to accept it, because they will only receive $16 per hour after taxes, and $16 isn’t enough for them to be willing to work 40 hours. The worker could ask for a pre-tax wage of $25 so that the after-tax wage would be $20, but then the employer will balk, because $25 per hour is too expensive for 40 hours of work.

In order to restore the balance (and when we say “equilibrium”, that’s really all we mean—balance), the employer will need to offer a higher pre-tax wage, which means they will demand fewer hours of work. The worker will then be willing to accept a lower after-tax wage for those reduced hours.

In effect, there are now two prices at work: A supply price, the after-tax wage that the worker receives, which must be at or above the supply curve; and a demand price, the pre-tax wage that the employer pays, which must be at or below the demand curve. The difference between those two prices is the tax.


In this case, I’ve set it up so that the pre-tax wage is $22.50, the after-tax wage is $18, and the amount of the tax is $4.50 or 20% of $22.50. In order for both the employer and the worker to accept those prices, the amount of hours worked has been reduced to 35.
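
The numbers in this example can be reproduced with straight-line curves. The post does not give the curves themselves, so this sketch assumes the simplest linear ones that pass through the quoted points: a demand price of 40 - 0.5h and a supply price of 4 + 0.4h, with h in hours. Both are hypothetical, chosen only so that the no-tax equilibrium is 40 hours at $20 and the 20%-tax equilibrium lands at 35 hours.

```python
# Sketch: how a 20% tax on the pre-tax wage drives a wedge between supply and demand.
# Hypothetical linear curves, chosen to pass through the example's numbers.


def demand_price(hours: float) -> float:
    """Maximum hourly wage the employer will pay for `hours` of work (assumed linear)."""
    return 40.0 - 0.5 * hours


def supply_price(hours: float) -> float:
    """Minimum after-tax hourly wage the worker needs to supply `hours` (assumed linear)."""
    return 4.0 + 0.4 * hours


def equilibrium(tax_rate: float) -> tuple[float, float, float]:
    """Solve supply_price(h) = (1 - tax_rate) * demand_price(h) for these linear curves.

    Returns (hours, pre-tax wage paid by the employer, after-tax wage received by the worker).
    """
    keep = 1.0 - tax_rate
    # 4 + 0.4h = keep * (40 - 0.5h)  =>  h = (40 * keep - 4) / (0.4 + 0.5 * keep)
    hours = (40.0 * keep - 4.0) / (0.4 + 0.5 * keep)
    pre_tax = demand_price(hours)
    return hours, pre_tax, keep * pre_tax


if __name__ == "__main__":
    for rate in (0.0, 0.2):
        h, paid, received = equilibrium(rate)
        print(f"tax {rate:.0%}: {h:.1f} hours, employer pays ${paid:.2f}, "
              f"worker gets ${received:.2f}, tax ${paid - received:.2f}/hour")
```

With a 20% tax this gives 35 hours, $22.50 paid by the employer and $18.00 received by the worker, matching the example; the gap between the two wages is the wedge.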

As a result of the tax, the wage that we’ve been calling “pre-tax” is actually higher than the wage that the worker would have received if the tax had not existed. This is a general phenomenon; it’s almost always true that your “pre-tax” wage or salary overestimates what you would have actually gotten if the tax had not existed. In one extreme case, the “pre-tax” wage might be exactly what you would have received; in the other extreme case, your after-tax wage is what you would have received, and the “pre-tax” wage rises high enough to account for the entirety of the tax revenue. It’s not really “pre-tax” at all; it’s the demand price, tax included.

Because of this, it’s fundamentally wrongheaded for people to complain that taxes are “taking your hard-earned money”. In all but the most exceptional cases, that “pre-tax” salary that’s being deducted from would never have existed. It’s more of an accounting construct than anything else, or, as I said before, a measure of labor expenditure. It is generally true that your after-tax salary is lower than the salary you would have gotten without the tax, but the difference is generally much smaller than the amount of the tax that you see deducted. In this case, the worker would see $4.50 per hour deducted from their wage, but in fact they are only down $2 per hour from where they would have been without the tax. And of course, none of this includes the benefits of the tax, which in many cases actually far exceed the costs; if we extended the example, it wouldn’t be hard to devise a scenario in which the worker who had their wage income reduced received an even larger benefit in the form of some public good such as national defense or infrastructure.
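
Splitting the $4.50 in this example between the two sides makes the point concrete. This sketch uses only the example’s three wages ($20 with no tax, $22.50 paid and $18 received with the tax); the split itself depends on the slopes of the supply and demand curves, which the example fixes implicitly.

```python
# Who actually bears the $4.50-per-hour tax in the example?

NO_TAX_WAGE = 20.00     # market-clearing wage with no tax
PRE_TAX_WAGE = 22.50    # what the employer pays with the tax
AFTER_TAX_WAGE = 18.00  # what the worker receives with the tax

tax_per_hour = PRE_TAX_WAGE - AFTER_TAX_WAGE   # 4.50, the amount "deducted"
worker_burden = NO_TAX_WAGE - AFTER_TAX_WAGE   # 2.00, the worker's actual loss per hour
employer_burden = PRE_TAX_WAGE - NO_TAX_WAGE   # 2.50, the employer's extra cost per hour

# The two burdens always add up to the full tax.
assert abs(worker_burden + employer_burden - tax_per_hour) < 1e-9
print(f"deducted ${tax_per_hour:.2f}; worker bears ${worker_burden:.2f} "
      f"({worker_burden / tax_per_hour:.0%}), employer bears ${employer_burden:.2f} "
      f"({employer_burden / tax_per_hour:.0%})")
```

So although the full $4.50 shows up as a deduction on the worker’s pay stub, the worker bears only about 44% of it in this example.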