JDN 2457471
There is a well-known principle in social science called the wisdom of the crowd, popularized in the book The Wisdom of Crowds by James Surowiecki. It says, roughly, that when a group of people aggregate their opinions, the result can be more accurate than any individual opinion, even that of an expert; it is one of the fundamental justifications for democracy and free markets.
It is also often used to justify what is called the efficient market hypothesis, which in its weak form is approximately true (financial markets are unpredictable, unless you’ve got inside information or really good tools), but in its strong form is absolutely ludicrous (no, financial markets do not accurately reflect the most rational expectation of future outcomes in the real economy).
This post is about what the wisdom of the crowd actually does—and does not—say, and why it fails to justify the efficient market hypothesis even in its weak form.
The wisdom of the crowd says that when a group of people with a moderate level of accuracy all get together and average their predictions, the resulting estimate is better, on average, than what they came up with individually. A group of people who all “sort of” know something can get together and create a prediction that is much better than any one of them could come up with.
This can actually be articulated as a mathematical theorem, the diversity prediction theorem:
Collective error = average individual error - prediction diversity

(\bar{x} - \mu)^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

Here the x_i are the individual predictions, \bar{x} is their mean (the collective prediction), and \mu is the true value.
This is a mathematical theorem; it’s beyond dispute. Write each individual error as (x_i - \bar{x}) + (\bar{x} - \mu), expand the square, and the cross term vanishes because deviations from the sample mean sum to zero; the identity follows.
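If you don’t feel like doing the algebra, you can also just check it numerically. Here’s a minimal sketch in Python (the guesses and the true value are arbitrary numbers I picked for illustration):

```python
def check_diversity_theorem(guesses, truth):
    """Return (collective error, average individual error, prediction diversity)."""
    n = len(guesses)
    mean = sum(guesses) / n
    collective_error = (mean - truth) ** 2
    avg_individual_error = sum((g - truth) ** 2 for g in guesses) / n
    prediction_diversity = sum((g - mean) ** 2 for g in guesses) / n
    # The theorem: these three quantities always balance exactly.
    assert abs(collective_error - (avg_individual_error - prediction_diversity)) < 1e-9
    return collective_error, avg_individual_error, prediction_diversity

print(check_diversity_theorem([3, 7, 4], truth=5))
# (0.111..., 3.0, 2.888...): collective error = 3.0 - 2.888...
```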
But in applying it, we must be careful; it doesn’t simply say that adding diversity will improve our predictions. Adding diversity will improve our predictions provided that we don’t increase average individual error too much.
Here, I’ll give some examples. Suppose we are guessing the weight of a Smart car. Person A says 1500 pounds; person B says 3000 pounds. Suppose the true weight is 2000 pounds.
Our collective estimate is the average of 1500 and 3000, which is 2250. So it’s a bit high.
Suppose we add person C, who guesses the weight of the car as 1800 pounds. This is closer to the real value, so we’d expect our collective estimate to improve, and it does: It’s now 2100 pounds.
But where the theorem can be a bit counter-intuitive is that we can add someone who is not particularly accurate, and still improve the estimate: If we also add person D, who guessed 1400 pounds, this seems like it should make our estimate worse—but it does not. Our new estimate is now 1925 pounds, which is a bit closer to the truth than 2100—and furthermore better than any individual estimate.
However, the theorem does not say that adding someone new will always improve the estimate; if we add person E, who has no idea how cars work and says that the car must weigh 50 pounds, we throw off the estimate so that it is now 1550 pounds. If we add enough such people, we can make the entire estimate wildly inaccurate: Add four more copies of person E and our new estimate of the car’s weight is a mere 883 pounds.
In all cases the theorem holds, however. Let’s consider the case where adding person E ruined our otherwise very good estimate.
Before we added person E, we had four estimates:
A said 1500, B said 3000, C said 1800, and D said 1400.
Our collective estimate was 1925.
Thus, collective error is (1925 – 2000)^2 = 5625, uh, square pounds? (Variances often have weird units.)
The individual errors are, respectively:
A: (1500 – 2000)^2 = 250,000
B: (3000 – 2000)^2 = 1,000,000
C: (1800 – 2000)^2 = 40,000
D: (1400 – 2000)^2 = 360,000
Average individual error is 412,500. So our collective error is much smaller than our average individual error. The difference is accounted for by prediction diversity.
Prediction diversity is found as the squared distance between each individual estimate and the average estimate:
A: (1500 – 1925)^2 = 180,625
B: (3000 – 1925)^2 = 1,155,625
C: (1800 – 1925)^2 = 15,625
D: (1400 – 1925)^2 = 275,625
Thus, prediction diversity is the average of these, 406,875. And sure enough, 412,500 – 406,875 = 5,625.
When we add on the fifth estimate of 50 and repeat the process, here’s what we get:
The new collective estimate is 1550. The prediction diversity went way up; it’s now 888,000. But the average individual error rose even faster; it’s now 1,090,500. As a result, the collective error got a lot worse: it’s now 202,500. So adding more people does not always improve your estimates, if those people have no idea what they’re doing.
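If you want to verify all of that bookkeeping at once, the check function from the earlier sketch reproduces every number in this example:

```python
four_guesses = [1500, 3000, 1800, 1400]
print(check_diversity_theorem(four_guesses, truth=2000))
# (5625.0, 412500.0, 406875.0)

five_guesses = four_guesses + [50]
print(check_diversity_theorem(five_guesses, truth=2000))
# (202500.0, 1090500.0, 888000.0): diversity went up, but average
# individual error went up faster, so collective error got worse.
```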
When it comes to the stock market, most people have no idea what they’re doing. Even most financial experts can forecast the market no better than chance.
The wisdom of the crowd holds when most people can basically get it right; maybe their predictions are 75% accurate for binary choices, or within a factor of 2 for quantitative estimates, something like that. Then, each guess is decent, but not great; and by combining a lot of decent estimates we get one really good estimate.
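To get a feel for how fast this pays off, here’s a quick Monte Carlo sketch; the noise model (each guesser independently off by up to 50 percent, but unbiased) is just my own illustration, not data:

```python
import random

random.seed(1)
TRUE_WEIGHT = 2000  # back to the Smart-car example

def typical_collective_error(n, trials=2000):
    """Average absolute error of the mean of n independent guesses."""
    total = 0.0
    for _ in range(trials):
        # Each guesser alone is mediocre: anywhere from half the true
        # weight to 1.5 times it, but unbiased and independent.
        guesses = [random.uniform(1000, 3000) for _ in range(n)]
        total += abs(sum(guesses) / n - TRUE_WEIGHT)
    return total / trials

for n in (1, 4, 25, 100):
    print(f"{n:4d} guessers: typical error {typical_collective_error(n):6.0f} pounds")
# Typical error shrinks roughly like 1/sqrt(n): around 500 pounds for
# one guesser, down to around 50 pounds for a hundred.
```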
Of course, the diversity prediction theorem does still apply: Most individual investors underperform the stock market as a whole, just as the theorem would say—average individual prediction is worse than collective prediction.
Moreover, stock prices do have something to do with fundamentals, because fundamental analysis does often work, contrary to most forms of the efficient market hypothesis. (It’s a very oddly named hypothesis, really; what’s “efficient” about a market that is totally unpredictable?)
But in order for stock prices to actually be a good measure of the real value of a company, most of the people buying and selling stock would have to be using fundamental analysis. In order for stocks to reflect real values, stock choices must be based on real values—that’s the only mechanism by which real values could ever enter the equation.
While there are definitely a lot of people who use fundamental analysis, there really don’t seem to be enough. At least for short-run ups and downs, most decisions seem to be made on a casual form of technical analysis: “It’s going up! Buy!” or “It just went down! Buy!” (Yes, you hear both of those; the latter is closer to the truth for short-run fluctuations, but the real pattern is a bit more complicated than that.)
For the wisdom of the crowd to work, the estimates need to be independent: each person makes a reasonable guess on their own, and then we average over all the guesses. When you do this for simple tasks like the weight of a car or the number of jellybeans in a jar, you get some astonishingly accurate results. Even for harder tasks where people have only a vague idea, like the number of visible stars in the sky, you can do pretty well. But if you let people talk about their answers first, the aggregate guess often gets much worse, especially if there are no experts in the group. And we definitely talk about stocks an awful lot; one of the best sources of utterly meaningless post hoc statements in the world is the financial news, which will always find some explanation, however tenuous, for any market change, and then offer a prediction of what will happen next that is almost always wrong.
This lack of independence fundamentally changes the system. The main thing people consider when choosing which stocks to buy is which stocks other people are buying. This is called a Keynesian beauty contest, after Keynes’s example of a 1930s newspaper contest; picture one where you send in pictures of your baby and people vote on which baby is the cutest. The key part in Keynes’s version is that you win money not based on whether your baby wins, but based on whether the baby you vote for wins. So you don’t necessarily vote for the one you think is cutest; you vote for the one you think other people will vote for, which is based on what they think other people will vote for, and so on. There are ways to make that infinite regress converge, but there are also lots of cases where it diverges (the sketch below shows both), and in reality I think what happens is that our brains max out and give up. (According to Dennett, we can handle about seven layers of intentionality before our brains max out.)
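The standard way to formalize that regress is a “guess p times the average” game, where each layer of reasoning multiplies the previous layer’s guess by p; here’s a minimal sketch (the multiplier and starting guess are illustrative):

```python
def level_k_guesses(p, level0_guess=50.0, levels=12):
    """Level-k reasoning in a 'guess p times the average' game:
    a level-(k+1) player guesses p times what level-k players guess."""
    guesses = [level0_guess]
    for _ in range(levels):
        guesses.append(p * guesses[-1])
    return guesses

print([round(g, 2) for g in level_k_guesses(2/3)])  # |p| < 1: layers converge toward 0
print([round(g, 2) for g in level_k_guesses(1.5)])  # |p| > 1: layers diverge without bound
```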
A similar process is at work in the stock market, as well as with strategic voting—yet another reason why we should be designing our voting system to disincentivize strategic voting.
What we have then is a system with a feedback loop: We buy Apple because we buy Apple because we buy Apple. (Just as we use Facebook because we use Facebook because we use Facebook.)
Feedback loops can introduce chaotic behavior. Depending on the precise parameters involved, all of this guessing could turn out to converge to the real value of companies—or it could converge to something else entirely, or keep fluctuating all over the place indefinitely. Since the latter seems to be what happens, I think the real parameters are probably in that range of fluctuating instability. (I’ve actually programmed some simple computer models with parameters in that chaotic range, and they come out pretty darn close to the real behavior of stock markets—much better than the Black-Scholes model, for instance.) If you want a really in-depth analysis of the irrationality of financial markets, I highly recommend Robert Shiller, who after all won a Nobel for this sort of thing.
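To see that whole menu of outcomes in the simplest possible setting, consider the textbook logistic map; this is a bare stand-in for a feedback loop, nothing like a real market model, but it shows how a single parameter flips the system from converging to cycling to fluctuating chaotically:

```python
def iterate_feedback(r, x0=0.4, steps=200):
    """Iterate the logistic map x -> r * x * (1 - x), the textbook
    example of a feedback loop whose behavior depends on one parameter."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
        yield x

for r in (2.8, 3.2, 3.9):
    tail = list(iterate_feedback(r))[-4:]
    print(f"r = {r}: ends with {[round(x, 3) for x in tail]}")
# r = 2.8 converges to a single fixed point, r = 3.2 settles into a
# two-cycle, and r = 3.9 keeps fluctuating chaotically forever.
```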
What does this mean for the efficient market hypothesis? That it’s basically a non-starter. We have no reason to believe that stock prices accurately integrate real fundamental information, and many reasons to think they do not. The unpredictability of stock prices could be just that—unpredictability, meaning that stock prices in the short run are simply random, and short-term trading is literally gambling. In the long run they seem to settle out into trends with some relation to fundamentals—but as Keynes said, in the long run we are all dead, and the market can remain irrational longer than you can remain solvent.