…The more rightward-skewed the distribution is, whether Pareto-Levy, log normal, or some related form, the more difficult it is to hedge against risk by supporting sizable portfolios of innovation projects. The potential variability of economic outcomes with Pareto-Levy distributions is so great that large portfolio draws from year to year can have consequences for the macroeconomy.[1]
We live in a Normal world. Most phenomena have a central tendency, and things that are not average tend not to be too far from average. Almost all people are within a fairly narrow range of heights; there are a few outliers, but only a few. If you are meeting an American man for the first time, you can be confident that most of the time he will be between 5'7″ and 6'1″, that almost all of the time he will be between 5'4″ and 6'4″, and you would be pretty surprised if he were shorter than 5'1″ or taller than 6'7″. Height in a population is normally distributed: it follows a bell curve. US adult men have an average height of 5'10″ with a standard deviation of 3″. If you're looking for men taller than 7'4″, you'll find perhaps one for every billion people. A normal curve drops off sharply as you move away from its mean.
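The height figures above can be checked by computing normal tail probabilities directly. This is a minimal sketch that assumes a mean of 70 inches (5'10″) and a standard deviation of 3 inches, as stated; `normal_tail` is a hypothetical helper built on Python's `math.erfc`.

```python
import math


def normal_tail(x, mean, sd):
    """P(X > x) for a normal distribution, computed with the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


MEAN_IN, SD_IN = 70.0, 3.0  # 5'10" average and 3" standard deviation, in inches

# Share of men within one (5'7"-6'1") and two (5'4"-6'4") standard deviations
within_1sd = 1 - 2 * normal_tail(MEAN_IN + SD_IN, MEAN_IN, SD_IN)
within_2sd = 1 - 2 * normal_tail(MEAN_IN + 2 * SD_IN, MEAN_IN, SD_IN)

# Chance a given man is taller than 7'4" (88 inches), six standard deviations out
p_giant = normal_tail(88.0, MEAN_IN, SD_IN)

print(f"within 1 sd: {within_1sd:.3f}")
print(f"within 2 sd: {within_2sd:.3f}")
print(f"taller than 88 in: {p_giant:.2e}")
```

The six-standard-deviation tail comes out on the order of one in a billion, which is how quickly a bell curve thins out.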
Normal distributions are normal because they are everywhere. And they are everywhere because of a mathematical property called the Central Limit Theorem: when a large number of independent, random inputs add together to produce a single outcome, that outcome tends toward a normal distribution, regardless of the individual distributions of the inputs.[2] So something like height, determined by a large number of different randomly distributed factors, ends up normally distributed.
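A small simulation illustrates the Central Limit Theorem. It assumes each outcome is the sum of 48 independent inputs drawn uniformly from [0, 1), an arbitrary, flat, clearly non-normal choice. If the theorem holds, the sums should have a mean near 24, a standard deviation near 2, and about 68% of values within one standard deviation of the mean, as a normal distribution would.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N_INPUTS = 48       # hypothetical number of independent factors per outcome
N_SAMPLES = 20_000  # number of simulated outcomes

# Each outcome is the sum of many independent inputs drawn from a flat (uniform) distribution.
outcomes = [sum(random.random() for _ in range(N_INPUTS)) for _ in range(N_SAMPLES)]

mu = statistics.fmean(outcomes)
sigma = statistics.pstdev(outcomes)

# For a normal distribution, about 68.3% of outcomes fall within one standard deviation.
frac_1sd = sum(abs(x - mu) <= sigma for x in outcomes) / N_SAMPLES
print(f"mean={mu:.2f}, sd={sigma:.2f}, within 1 sd: {frac_1sd:.3f}")
```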
Normal distributions are well understood and easy to work with. Almost all of modern finance theory is built around the assumption that things like prices and returns are normally distributed (or lognormally distributed: a lognormal variable becomes normally distributed when you take its logarithm, which is useful when increases in x are multiplicative rather than additive). Normal distributions underlie insurance and allow investors to minimize risk using modern portfolio theory.
But not everything is normally distributed; other processes can lead to other distributions. Earthquakes do not have a typical size: there is no central tendency. Cities do not have a typical size, and wars do not have a typical intensity. These things are power-law distributed.
A power law distribution is a curve that looks like this:
Small outcomes are most likely, and large outcomes less likely. The formula for the curve is $p(x) = Cx^{-\alpha}$, where $\alpha$ (alpha)[3] defines the shape of the power law and $C$ is a normalization constant that makes the total area under the curve sum to 1.[4]
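For a continuous power law that starts at a minimum value $x_{\min}$ (an assumption; the text does not fix one), the normalization constant works out to $C = (\alpha-1)\,x_{\min}^{\alpha-1}$, which requires $\alpha > 1$. This sketch checks numerically that the area under the curve is 1 for a few alphas, using $x_{\min} = 1$ and a midpoint rule on a log-spaced grid. Both are illustrative choices, and the helper names are hypothetical.

```python
import math


def power_law_pdf(x, alpha, x_min=1.0):
    """Continuous power law p(x) = C * x**-alpha for x >= x_min (zero below x_min).

    Assumes alpha > 1, which makes C = (alpha - 1) * x_min**(alpha - 1).
    """
    if x < x_min:
        return 0.0
    c = (alpha - 1) * x_min ** (alpha - 1)
    return c * x ** (-alpha)


def integrate_log_grid(f, lo, hi, steps=20_000):
    """Midpoint rule after substituting x = e**u, so the integrand becomes f(e**u) * e**u."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = math.exp(a + (k + 0.5) * h)
        total += f(x) * x * h
    return total


areas = {}
for alpha in (1.5, 2.0, 2.5, 3.0):
    # Integrate out to 1e12; the tail beyond that holds 1e12 ** -(alpha - 1) of the mass.
    areas[alpha] = integrate_log_grid(lambda x, a=alpha: power_law_pdf(x, a), 1.0, 1e12)
    print(f"alpha={alpha}: area under curve = {areas[alpha]:.5f}")
```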
Power laws have a property that normal distributions do not: they have "fat tails." Normal curves fall off much more quickly the further out along the x-axis you go.
Here's some detail on the tail, so you can see the faster drop-off of the normal distribution.
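The difference in tails can also be seen without a chart by asking how much rarer an outcome twice as large is under each distribution. For a power law that ratio is the constant $2^{-(\alpha-1)}$ at every scale, while for a normal distribution it collapses toward zero. This sketch uses a standard normal and an illustrative $\alpha = 2.5$; neither is fitted to data.

```python
import math


def normal_survival(x):
    """P(Z > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def power_law_survival(x, alpha, x_min=1.0):
    """P(X > x) for a continuous power law with exponent alpha (x >= x_min, alpha > 1)."""
    return (x / x_min) ** (-(alpha - 1))


ALPHA = 2.5  # illustrative exponent, not fitted to any data

normal_ratios, power_ratios = [], []
for x in (2, 4, 8, 16):
    # How much rarer is an outcome twice as far out?
    normal_ratios.append(normal_survival(2 * x) / normal_survival(x))
    power_ratios.append(power_law_survival(2 * x, ALPHA) / power_law_survival(x, ALPHA))
    print(f"x={x:>2}: normal {normal_ratios[-1]:.3g}, power law {power_ratios[-1]:.3g}")
```

The power-law ratio never changes, which is the "scale-free" property; the normal ratio shrinks by dozens of orders of magnitude.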
The most important thing about a power law distribution is the alpha. The smaller the alpha, the heavier the right tail of the curve is.
Here is some detail on the tails, so you can see more clearly that lower alphas mean a heavier tail.
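The same effect in numbers: for a power law with minimum value 1, the chance of an outcome above $x$ is $x^{-(\alpha-1)}$. At 100x, each half-step drop in alpha makes such an outcome ten times more likely. A minimal sketch with illustrative exponents and a hypothetical helper name:

```python
def power_law_survival(x, alpha, x_min=1.0):
    """P(X > x) for a continuous power law (assumes alpha > 1 and x >= x_min)."""
    return (x / x_min) ** (-(alpha - 1))


# Chance of an outcome at least 100 times the minimum, for several exponents
tail_at_100 = {alpha: power_law_survival(100, alpha) for alpha in (1.5, 2.0, 2.5, 3.0)}
for alpha, p in tail_at_100.items():
    print(f"alpha={alpha}: P(X > 100) = {p:g}")
```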
Some phenomena thought to follow power laws, and their alphas:
| Phenomenon | α | Ref. |
|---|---|---|
| Intensity of wars | 1.80 | [5] |
| Solar flare intensity | 1.83 | [5] |
| Frequency of use of words | 2.20 | [5] |
| Population of U.S. cities | 2.30 | [5] |
| Magnitude of earthquakes | 3.04 | [5] |
| Protein interaction degree | 3.1 | [6] |
| Email address book size | 3.5 | [6] |
| Sales of books | 3.7 | [6] |
| Papers authored | 4.3 | [6] |
Are Venture Capital Returns Power-Law Distributed?
The professional innovation community takes it as a given that venture returns are power-law distributed. In Peter Thiel's class at Stanford he said "…actual returns are incredibly skewed. The more a VC understands this skew pattern, the better the VC. Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution."
Not everyone agrees.[7] Power law distributions can be hard to distinguish from the tail of a lognormal distribution or from a distribution built out of several exponential distributions. People fit the data to all of these.
Lognormal and normal distributions are so widespread that they may seem universal (they are also well studied and easier to work with, so generally the path of least resistance), and many theoreticians prefer them to the relative novelty of the power law. Power law proponents, on the other hand, liken the effort to devise non-intuitive, ad-hoc distributions to fit the data to Ptolemaic astronomy. Benoit Mandelbrot, of fractal fame, in a comment on an academic paper that fitted a multi-part exponential to financial data better fit by a power law,[8] wrote: "An acknowledged feature of financial prices is that, compared to Gaussianity and independence or Markovian behaviour, their variability is extremely 'anomalous'…power laws acknowledge that the anomaly extends, at least, to the scale of centuries. If so, everything is simplified at no cost by using a model that implies that the anomaly extends forever." He concluded: "Power-law behaviours exemplify a 'wildly random' phenomenon. They do not go away by only looking at them through hasty and ad hoc approximations that exemplify 'mild randomness' and underestimate the difficulty and the conceptual novelty of the field."
But pretending power law behavior is a variant of normal ignores the reality of extreme outcomes.
The professors who live by the bell curve adopted it for mathematical convenience, not realism. It asserts that when you measure the world, the numbers that result hover around the mediocre; big departures from the mean are so rare that their effect is negligible. This focus on averages works well with everyday physical variables such as height and weight, but not when it comes to finance. One can disregard the odds of a person's being miles tall or tons heavy, but similarly excessive observations can never be ruled out in economic life…In other words, we live in a world of winner-take-all extreme concentration. Similarly, a very small number of days accounts for the bulk of stock market movements: Just ten trading days can represent half the returns of a decade.
The economic world is driven primarily by random jumps. Yet the common tools of finance were designed for random walks in which the market always moves in baby steps. Despite increasing empirical evidence that concentration and jumps better characterize market reality, the reliance on the random walk, the bell-shaped curve, and their spawn of alphas and betas is accelerating, widening a tragic gap between reality and the standard tools of financial measurement.[9]
Venture capital returns are not normal. Most investments return a small multiple or lose money, but many return larger multiples. And a few have had outcomes well outside what any normal curve would encompass. Venture returns are best described by a power law distribution.
What Would Cause Returns to be Power Law Distributed?
Here is a simple model of company growth and exit[10] that creates a power law distribution of venture capital return multiples (i.e. 1x, 2x, 3x, etc.). It is only meant to hold for returns of more than 1x,[11] because no matter what the distribution of actual company outcomes looks like, the preference provisions standard in VC contracts distort the part below 1x.[12]
At time 0 a company's value as a multiple of its initial value is 1. Value then grows continuously at a rate $g$,[13] so at time $T$ the company's value multiple is $X = e^{gT}$.
Time to exit is exponentially distributed, with an average time to exit of $i$. So the probability of exit at time $t$ is $p(T=t) = \frac{1}{i}e^{-t/i}$.
Taking these two together, the probability of an exit value of $x$ is:
$$p(X=x) = \frac{1}{gi}\,x^{-\left(\frac{1}{gi}+1\right)}$$
This is a power law distribution with $\alpha = \frac{1}{gi} + 1$.
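The step from the two ingredients to the power law is a change of variables from exit time to exit multiple. A sketch of the algebra, assuming (as the model does) that $T \ge 0$, so $x \ge 1$:

```latex
\begin{aligned}
X = e^{gT} \;&\Longrightarrow\; T = \frac{\ln X}{g}, \qquad \frac{dt}{dx} = \frac{1}{gx} \\
p(X=x) &= p\!\left(T = \frac{\ln x}{g}\right)\frac{dt}{dx}
        = \frac{1}{i}\,e^{-\ln x/(gi)} \cdot \frac{1}{gx} \\
       &= \frac{1}{gi}\,x^{-1/(gi)}\,x^{-1}
        = \frac{1}{gi}\,x^{-\left(\frac{1}{gi}+1\right)}, \qquad x \ge 1
\end{aligned}
```

The leading constant $\frac{1}{gi}$ equals $\alpha - 1$, which is exactly the normalization constant $C$ for a power law whose minimum value is 1.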
OK, math aside: this model says that a power law return can be generated by a simple mechanism that relies only on growth and time to exit. It also says that as the product of growth and expected time to exit gets larger, the tail of the power law distribution gets fatter.
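The mechanism can also be checked by simulation. This sketch assumes the rule-of-thumb values used in the next section (a continuously compounded growth rate of 0.26 and a four-year average time to exit). It draws exponential exit times, compounds growth to each exit, and compares the simulated tail $P(X > x)$ with the model's prediction $x^{-(\alpha-1)}$. The seed and sample size are arbitrary choices.

```python
import math
import random

random.seed(42)  # arbitrary seed, so the sketch is reproducible

G = 0.26         # continuously compounded growth rate (rule-of-thumb value)
MEAN_EXIT = 4.0  # average time to exit, in years
N = 200_000      # number of simulated companies

# Exit times are exponential with mean MEAN_EXIT (expovariate takes the rate, 1/mean);
# each company's multiple is its value compounded continuously until exit.
multiples = [math.exp(G * random.expovariate(1 / MEAN_EXIT)) for _ in range(N)]

# The model predicts P(X > x) = x ** -(alpha - 1) with alpha = 1/(G * MEAN_EXIT) + 1.
predicted_alpha = 1 / (G * MEAN_EXIT) + 1
checks = {}
for x in (2, 5, 10, 50):
    simulated = sum(m > x for m in multiples) / N
    predicted = x ** -(predicted_alpha - 1)
    checks[x] = (simulated, predicted)
    print(f"P(X > {x:>2}): simulated {simulated:.4f}, predicted {predicted:.4f}")
```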
Here's a chart showing the model's prediction of alpha at various average times to exit and year over year growth rates.[14]
Our model does generate a power law distribution of returns. But is it intuitively reasonable? It is simplistic.[15] It glosses over details that every practitioner knows: growth rates slow as companies get larger, exits cluster around raises, etc. But sometimes simple models can contain a good part of the explanatory power of more complicated models. Exponential growth is reasonable, at least during the time frame of a venture investment, and an exponentially distributed time to exit seems reasonable.[16] A better model could be built, but this one is a good first-order approximation.
Venture Capital Power Law Distributions
Let's plug in some rules of thumb and see what the model predicts.
Venture capitalists hold investments for an average of 4 years. They expect year over year growth of about 30%, meaning a continuously compounded growth rate of 26% ($\ln 1.3 \approx 0.26$). With these, the model gives us an alpha of $\frac{1}{0.26 \times 4} + 1 \approx 1.96$. How does this compare to the real world?
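The rule-of-thumb arithmetic as a short sketch; `model_alpha` is a hypothetical helper that converts year over year growth to a continuous rate before applying $\alpha = 1/(gi) + 1$.

```python
import math


def model_alpha(yoy_growth, mean_exit_years):
    """alpha = 1/(g*i) + 1, where g is the continuously compounded rate ln(1 + yoy_growth)."""
    g = math.log(1 + yoy_growth)
    return 1 / (g * mean_exit_years) + 1


g_exact = math.log(1.30)
alpha_exact = model_alpha(0.30, 4)
alpha_rounded = 1 / (0.26 * 4) + 1  # the calculation above, with g rounded to 0.26

print(f"g = {g_exact:.4f}")
print(f"alpha with exact g: {alpha_exact:.3f}")
print(f"alpha with g = 0.26: {alpha_rounded:.3f}")
```

With the unrounded rate (about 0.262), alpha comes out near 1.95 rather than 1.96, so the rounding does not change the comparison that follows.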
Below are estimates of venture capital power law alphas from various sources. Note that some of them look at the distribution of returns, some at the distribution of values, and some at the distribution of revenues. I believe the alpha in all approaches should be the same if we assume reasonable initial valuations and that value tends to track revenue in most companies.[17] I calculated the italicized alphas myself using thin data found in various summary charts; I did not have access to the underlying data. You should assume that the ones I calculated are less precise (on the order of ±0.1 to 0.2).
Sorted by alpha from smallest to largest:
| Data | α | Ref. |
|---|---|---|
| Return multiples, fund size <$100m | 1.68 | [18] |
| Total Value to Paid In, Small Funds ($50m-$250m), 1981-2003 | 1.75 | [19] |
| PSED Study, revenue growth yr 2 to 5 | 1.76 | [20] |
| Total Value to Paid In, Large Funds (>$250m), 1981-2003 | 1.78 | [21] |
| Kauffman Study, revenue growth yr 2 to 5 | 1.8 | [22] |
| North American angel investment returns | 1.8 | [23] |
| Return multiples, fund size $250m-$500m | 1.84 | [24] |
| Return multiples, fund size $100m-$250m | 1.85 | [24] |
| Inc 500, revenue growth year 2 to 5 | 1.86 | [25] |
| Derived from Correlation Ventures data | 1.88 | [26] |
| Return multiples, fund size > $1b | 1.89 | [27] |
| All VC-backed startups, per Horsley-Keogh | 1.9 | [28] |
| All VC-backed startups, per Venture Economics | 1.97 | [28] |
| British angel investment returns | 1.97 | [29] |
| Unicorn valuations | 2.13 | [30] |
| Return multiples, fund size $500m-$1b | 2.27 | [31] |
As our model predicts, these cluster around the value of 1.96 we calculated.
Some other things to note:
Some alphas for non-VC innovative activity:
| Activity | α | Ref. |
|---|---|---|
| Value of patents | 1.3 | [32] |
| U.S. Patents | 1.43 | [33] |
| Value of patents | 1.45-1.67 | [34] |
| Harvard Patents | 1.71 | [35] |
| German Patents | 1.87 | [35] |
| Size of all U.S. Firms | 2.06 | [36] |
| Corporate R&D (simulation from sparse data) | 2.21 | [37] |
| Pharmaceutical development, 1970s | 2.22 | [37] |
| Size of Largest 500 US Firms | 2.25 | [38] |
| Pharmaceutical development, 1980s | 2.36 | [39] |
| Movies with stars | 2.72 | [40] |
| Movie income | 2.91 | [40] |
| Movies without stars | 3.26 | [40] |
Infinite Mean
Power law distributions must have an alpha greater than one.[41] They do not have a finite standard deviation if alpha is three or less, and they do not have a finite average if alpha is two or less.
What does not having an average mean? Think about a normal distribution: if you make a large number of picks from a normal distribution, the average will be right in the middle of the distribution. If you measure the height of 100,000 U.S. men, the average will be 5'10", I guarantee it. The more picks you make, the closer the average of your picks gets to the average of the normal distribution.
If you do the same thing with a power law distribution with $\alpha < 2$, the average will tend to grow as you make more picks. If you make an infinite number of picks, the average will be infinite. This is strange behavior.[42] It is also what makes power laws so hard to intuit.
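Random draws from a power law with $\alpha < 2$ are so erratic that a single seeded simulation can mislead in either direction. This sketch instead places $n$ picks at evenly spaced quantiles of the distribution (a deterministic stand-in for random sampling, assuming a minimum value of 1) and averages them. For $\alpha = 3$ the average should settle near the theoretical mean of 2; for $\alpha = 1.5$ it should keep growing, roughly in proportion to $n$. The helper name is hypothetical.

```python
def quantile_sample_mean(alpha, n, x_min=1.0):
    """Average of n 'picks' placed at evenly spaced quantiles of a power law.

    Uses the inverse CDF x = x_min * (1 - u) ** (-1 / (alpha - 1)) at u = (k - 0.5) / n,
    a deterministic stand-in for n random draws.
    """
    total = 0.0
    for k in range(1, n + 1):
        u = (k - 0.5) / n
        total += x_min * (1 - u) ** (-1 / (alpha - 1))
    return total / n


picks = (100, 1_000, 10_000)
results = {alpha: [quantile_sample_mean(alpha, n) for n in picks] for alpha in (1.5, 3.0)}
for alpha, means in results.items():
    # alpha = 3 has a finite mean, (alpha - 1)/(alpha - 2) = 2; alpha = 1.5 does not.
    print(f"alpha={alpha}: " + ", ".join(f"n={n}: {m:.2f}" for n, m in zip(picks, means)))
```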
The largest value you are likely to get from a power law distribution depends on the number of picks, $n$, you take from it:[43] $\langle x_{\max}\rangle \sim n^{1/(\alpha-1)}$.
The graph to the left shows $\langle x_{\max}\rangle$ for various alphas, given a certain number of picks. When $\alpha = 2$, the largest pick is typically on the order of $n$. That is, if you invest in 10 companies, you should expect your largest multiple to be around 10x. When $\alpha < 2$, the largest pick is typically larger than $n$. In other words, if alpha is less than or equal to 2, one company is likely to return the entire amount invested in all of the successful companies. With some luck, it returns the fund.
Of course, when alpha is larger than 2, the largest pick is much smaller. When alpha equals 3, for instance, $\langle x_{\max}\rangle$ grows only as the square root of the number of picks.
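When alpha is 2 or less, the expected value of the largest pick is itself infinite, so this sketch tracks the median of the largest of $n$ picks instead. It solves $P(\max \le x) = (1 - x^{-(\alpha-1)})^n = 1/2$, assuming a minimum value of 1, and checks that a tenfold larger portfolio grows the typical largest pick by about $10^{1/(\alpha-1)}$, as the scaling law says. The helper name is hypothetical.

```python
def median_of_max(n, alpha):
    """Median of the largest of n independent draws from a power law with x_min = 1.

    Solves P(max <= x) = (1 - x ** -(alpha - 1)) ** n = 1/2 for x.
    """
    return (1 - 0.5 ** (1 / n)) ** (-1 / (alpha - 1))


growth = {}
for alpha in (1.5, 2.0, 3.0):
    # How much does the typical largest pick grow when the portfolio is 10x bigger?
    growth[alpha] = median_of_max(1_000, alpha) / median_of_max(100, alpha)
    predicted = 10 ** (1 / (alpha - 1))
    print(f"alpha={alpha}: largest pick grows {growth[alpha]:.2f}x, "
          f"n^(1/(alpha-1)) predicts {predicted:.2f}x")
```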
The average of all the picks grows quickly as alpha gets smaller. In our model, where we have an expected growth rate, $g$, and an average time to exit, $i$, it would make sense to expect an average return multiple on a given company of $m = e^{gi}$ (I'll call this deterministic growth). If venture capital were normal, that would be true. But the mean of a power law distribution with a minimum value of 1, as in our model, is $\frac{\alpha-1}{\alpha-2}$ (finite only when $\alpha > 2$). Using our model's result of $\alpha = \frac{1}{gi} + 1$ to substitute into the power law mean formula, we can compare the deterministic mean to the power law mean.
It's no surprise that these are similar when $gi$ is close to zero (equivalent to a high alpha). But the power law mean grows much more quickly than the deterministic mean as growth rates get larger. If the average time to exit is four years, then at a continuously compounded growth rate of 20%, the power law mean is more than twice the deterministic mean. This upside surprise is what draws investors to low-alpha power laws. But this strategy comes with risk.
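A sketch of the comparison using the formulas above: the deterministic mean $e^{gi}$ against the power law mean $(\alpha-1)/(\alpha-2)$ with $\alpha = 1/(gi) + 1$. Algebraically the latter simplifies to $1/(1-gi)$, which is also the exact mean of $e^{gT}$ when $T$ is exponential with mean $i$, so the two routes agree. The four-year exit and the grid of growth rates are illustrative choices, $g$ is the continuously compounded rate, and the helper names are hypothetical.

```python
import math


def deterministic_mean(g, i):
    """Multiple if every company grew at rate g for exactly the average holding time i."""
    return math.exp(g * i)


def power_law_mean(g, i):
    """Mean of the model's power law, (alpha - 1)/(alpha - 2) with alpha = 1/(g*i) + 1.

    Finite only when g*i < 1 (alpha > 2); returns infinity otherwise.
    """
    alpha = 1 / (g * i) + 1
    if alpha <= 2:
        return math.inf
    return (alpha - 1) / (alpha - 2)


MEAN_EXIT = 4.0
for g in (0.05, 0.10, 0.15, 0.20):
    d, p = deterministic_mean(g, MEAN_EXIT), power_law_mean(g, MEAN_EXIT)
    print(f"g={g:.2f}: deterministic {d:.2f}, power law {p:.2f}, ratio {p / d:.2f}")
```

At $g = 0.20$ the power law mean is 5 against a deterministic mean of about 2.2; push $gi$ to 1 or beyond and the power law mean is infinite.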
In his book The Black Swan, Taleb warns against the financial sector's use of risk measurement tools like VaR and Black-Scholes, which were built on an expectation of normal or lognormal returns. Normal and lognormal distributions give too little weight to the fat tails of many actual financial-sector probability distributions. Events that seem vanishingly unlikely in the Normal world (what Taleb calls "Mediocristan") are not that improbable in a power law world ("Extremistan"), and ignoring them can produce what looks in hindsight like reckless behavior. Taleb ascribes the failure of Long-Term Capital Management to this: the fund was brought down by events so many standard deviations from the mean that it would have been safe to ignore them in a Normal world. In a power law world, ignoring them meant economy-shaking losses.
If the public financial markets are Extremistan, then venture capital is Absurdistan. The fat tails in the public markets lead to black swans, but they're nowhere near as fat as the tails in venture capital.
Alphas Close to Two
Betting against a power law return (as at LTCM) can cause some nasty surprises, but going long on a fat tail is a good bet, so long as you can make enough investments and be patient enough to find the rare anomaly. Sure, you sacrifice predictability, and that's an issue for the investors in your fund. But once you've gone under two, why not keep going? The fatter the tail, the higher the probability of outsize events. Once you've sacrificed predictability, you're in for a penny. Why not be in for a pound? Why do the VC alphas cluster so closely around 2, the alpha where the mean goes to infinity? Why not even lower?
One reason is timing. If VCs have a 10-year fund life and they invest in the first two or three years, they have seven or eight years to realize gains. If exits are exponentially distributed and VCs want to exit 80% of their investments within eight years of making them, they need an average time to exit of about 5 years. If they want to exit 90%, they need an average time to exit of about 3.5 years.[44] This rules out investing in patents (with an alpha somewhere between 1.3 and 1.7): it would take too long to realize the investment.
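The exit-timing arithmetic follows from the exponential distribution's cumulative probability, $P(T \le t) = 1 - e^{-t/i}$, solved for $i$. A minimal sketch assuming an eight-year window; `mean_exit_needed` is a hypothetical helper name.

```python
import math


def mean_exit_needed(share_exited, horizon_years):
    """Average time to exit i such that P(T <= horizon) = share_exited for exponential T.

    From 1 - exp(-horizon / i) = share_exited, so i = horizon / ln(1 / (1 - share_exited)).
    """
    return horizon_years / math.log(1 / (1 - share_exited))


for share in (0.8, 0.9):
    print(f"exit {share:.0%} within 8 years -> average time to exit "
          f"{mean_exit_needed(share, 8):.2f} years")
```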
This points to the real problem: look at the earlier chart of year over year growth as a function of time to exit. For a given alpha, a shorter time to exit requires a larger growth rate. If it takes 20 years to exit a patent (alpha = 1.5), that implies a year over year growth rate in value of about 10%. If you wanted to exit in five years, you would need a year over year growth rate closer to 50%. To get an alpha close to 2, as in venture capital, with an average time to exit of 5 years, the year over year growth rate of the portfolio companies needs to be 22%. For a time to exit of 3.5 years, the growth rate needs to be 33%.
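Inverting the model gives $g = 1/((\alpha-1)\,i)$, the continuously compounded rate that produces a target alpha for a given average time to exit; $e^g - 1$ converts it to year over year growth. This sketch reproduces the four cases in the paragraph above; the helper name is illustrative.

```python
import math


def yoy_growth_needed(alpha, mean_exit_years):
    """Year over year growth implied by alpha = 1/(g*i) + 1, i.e. g = 1/((alpha - 1) * i)."""
    g = 1 / ((alpha - 1) * mean_exit_years)  # continuously compounded rate
    return math.exp(g) - 1


cases = [(1.5, 20), (1.5, 5), (2.0, 5), (2.0, 3.5)]
for alpha, i in cases:
    print(f"alpha={alpha}, mean exit {i} yrs -> {yoy_growth_needed(alpha, i):.1%} per year")
```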
These are high growth rates, and if the best VCs are the ones who can maintain the lowest alphas,[45] then they are the ones who have the highest growth rates in their portfolios.
At a given alpha, the more investments you make, the better, because your mean return multiple increases with the number of investments, as does the expected highest multiple. Dave McClure makes this case:
Most VC funds are far too concentrated in a small number (<20-40) of companies. The industry would be better served by doubling or tripling the average # of investments in a portfolio, particularly for early-stage investors where startup attrition is even greater. If unicorns happen only 1-2% of the time, it logically follows that portfolio size should include a minimum of 50-100+ companies in order to have a reasonable shot at capturing these elusive and mythical creatures.
Peter Thiel flatly contradicts this:
Given a big power law distribution, you want to be fairly concentrated. If you invest in 100 companies to try and cover your bases through volume, there's probably sloppy thinking somewhere. There just aren't that many businesses that you can have the requisite high degree of conviction about.
McClure believes he can find hundreds of companies with high enough growth to maintain his requisite alpha. Thiel thinks this is not possible. Venture capitalists have always faced this tension: the average growth rate of all small businesses in the US is closer to 7.5% than 30%. The pool of companies that can grow fast enough is limited. How many companies can you find that will grow fast enough, knowing that when you're wrong about the growth rate, you're probably wildly wrong?
But why 2?
A lower alpha is better, but getting a lower alpha is constrained by finding enough companies that can generate the required amount of growth in the time a VC has to go through a cycle of investing and exiting. But it seems a bit coincidental that these things balance out so close to the point where the power law distribution mean goes infinite.
The best explanation is supply and demand. When alphas of less than two are available (because the supply of fast-growth companies has increased), venture capitalists have an incentive to make more investments, so they raise more money and start more funds, increasing the demand for these companies until the alpha returns to 2.[46]
Unresolved Questions
1. Failure rates
Chris Dixon notes that better fund returns (implying a fatter tail) are tied to more failures (implying a fatter head). This is not power law distribution behavior. The area under a power law distribution sums to one, so if the tail gets fatter, the rest of the distribution gets thinner, including the head. Look at the fourth chart in this post to see this.
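The trade-off between head and tail can be made concrete with the power law's cumulative distribution, restricted (like the model) to multiples of 1x and above. This sketch uses illustrative cutoffs of 2x for the "head" and 10x for the "tail"; it shows that lowering alpha moves probability out of the head and into the tail. The helper names are hypothetical.

```python
def head_share(alpha, upper=2.0, x_min=1.0):
    """Share of outcomes between x_min and `upper`: 1 - (upper/x_min) ** -(alpha - 1)."""
    return 1 - (upper / x_min) ** (-(alpha - 1))


def tail_share(alpha, lower=10.0, x_min=1.0):
    """Share of outcomes above `lower`: (lower/x_min) ** -(alpha - 1)."""
    return (lower / x_min) ** (-(alpha - 1))


for alpha in (1.5, 2.0, 2.5):
    print(f"alpha={alpha}: 1x-2x share {head_share(alpha):.3f}, "
          f">10x share {tail_share(alpha):.3f}")
```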
But the model we are using applies only to companies with a return multiple of more than 1, that is, only to those that succeed. It is not clear whether failure rates should follow the implied power law that drives the returns power law distribution.
Picking growth rates is an inherently uncertain process, and the venture capitalist is likely to be wrong. Our model, ironically, assumes that when picking growth rates there is a central tendency and that errors cancel each other out: that the growth rates are 'normally' distributed around the picked growth rate. There is no evidence this is true. An alternative, one that many practitioners subscribe to, is that companies that do not achieve their targeted growth rates simply fail. While this behavior does not seem to fit with any model of underlying firm growth,[47] it could arise from the staged-funding model of venture capital: companies that underperform expectations cannot raise further funds and go out of business. Measuring the relationship between alpha and failure rate would help shed some light on this.
2. Growth
In our model, varying amounts of VC imply varying distributions of growth rates of early-stage companies. Work on the distribution of growth rates has focused on growth in firm size (measured by revenues, employees, or the like), not on firm value.[48] The distribution of firm size growth seems relatively stable over time. If this is true, then any increase (or decrease) in venture capital funding is due to anticipated growth (or decline) in the ratio of firm value to firm size. On the one hand, this seems obvious. On the other, it does not seem to account for new industry creation. A time study of the evolution of firm growth rate distributions in an emerging industry would lead to useful predictions of the money available for venture capital.
3. How much is a power law option worth?
Early-stage venture capital valuations are higher than standard finance theory would predict. No reasonable discounted cash flow model would assign $5m+ valuations to a person with a bright idea. When asked, VCs often cite the 'option value' of the investment.
Black-Scholes, the standard option pricing model, was built on the assumption that log returns are normally distributed (so prices are lognormal) and, specifically, that the underlying asset's return distribution has a finite variance. While option formulas assuming non-normal distributions have been proposed,[49] there seems to be no work connecting startup valuations to the pricing of options on power law outcomes. Although a theory like this would probably not influence actual VC valuations, it would be valuable in the debate over how much we should spend on R&D. My guess is that the rational amount to spend is quite a bit larger than the amount we spend today.
4. Is it really a power law distribution?
What does an infinite mean imply? The quote that started this post said "The potential variability of economic outcomes…is so great that large portfolio draws from year to year can have consequences for the macroeconomy." If returns are power-law distributed up to very high multiples (and I have not seen any data suggesting a tail-off, à la earthquakes), then this is undoubtedly true. If you think of value as economic value, not dollar value, then there is perhaps no limit to the largest multiple possible. One consequence of more companies being funded today (if the industry is maintaining an alpha less than two) is an increased probability that we will see something so far outside Mediocristan, so far along the fat tail, that it will fundamentally change how we live.