By Michael J. Wiener
Special to the Financial Independence Hub
How much can we spend from a portfolio each year in retirement? An early answer to this question came from William Bengen and became known as the 4% rule. Recently, Ben Felix reported on research showing that it’s more sensible to use a 2.7% rule.
Here, I examine how a seemingly minor detail, the size of the sampling blocks of stock and bond returns, affects the final conclusion of the safe withdrawal percentage. It turns out to make a significant difference. In my usual style, I will try to make my explanations understandable to non-specialists.
The research
Bengen’s original 4% rule was based on U.S. stock and bond returns for Americans retiring between 1926 and 1976. He determined that if these hypothetical retirees invested 50-75% in stocks and the rest in bonds, they could spend 4% of their portfolios in their first year of retirement and increase this dollar amount with inflation each year, and they wouldn’t run out of money within 30 years.
Researchers Anarkulova, Cederburg, O’Doherty, and Sias observed that U.S. markets were unusually good in the 20th century, and that foreign markets didn’t fare as well. Further, there is no reason to believe that U.S. markets will continue to perform as well in the future. They also observed that people often live longer in retirement than 30 years.
Anarkulova et al. collected worldwide market data as well as mortality data, and found that the safe withdrawal rate (5% chance of running out of money) for 65-year olds who invest within their own countries is only 2.26%! In follow-up communications with Felix, Cederburg reported that this increases to 2.7% for retirees who diversify their investments internationally.
Sampling block size
One of the challenges of creating a pattern of plausible future market returns is that we don’t have very much historical data. A century may be a long time, but 100 data points of annual returns is a very small sample.
Bengen used actual market data to see how 51 hypothetical retirees would have fared. Anarkulova et al. used a method called bootstrapping. They ran many simulations to generate possible market returns by choosing blocks of years randomly and stitching them together to fill a complete retirement.
They chose the block sizes randomly (with a geometric distribution) with an average length of 10 years. If the block sizes were exactly 10 years long, this means that the simulator would go to random places in the history of market returns and grab enough 10-year blocks to last a full retirement. Then the simulator would test whether a retiree experiencing this fictitious return history would have run out of money at a given withdrawal rate.
In reality, the block sizes varied with the average being 10 years. This average block size might seem like an insignificant detail, but it makes an important difference. After going through the results of my own experiments, I’ll give an intuitive explanation of why the block size matters.
My contribution
I decided to examine how big a difference this block size makes to the safe withdrawal percentage. Unfortunately, I don’t have the data set of market returns Anarkulova et al. used. I chose to create a simpler setup designed to isolate the effect of sampling block size. I also chose to use a fixed retirement length of 40 years rather than try to model mortality tables.
A minor technicality is that when I started a block of returns late in my dataset and needed a block extending beyond the end of the dataset, I wrapped around to the beginning of the dataset. This isn’t ideal, but it is the same across all my experiments here, so it shouldn’t affect my goal to isolate the effect of sampling block size.
I obtained U.S. stock and bond returns going back to 1926. Then I subtracted a fixed amount from all the samples. I chose this fixed amount so that for a 40-year retirement, a portfolio 75% in stocks, and using a 10-year average sampling block size, the 95% safe withdrawal rate came to 2.7%. The goal here was to use a data set that matches the Anarkulova et al. dataset in the sense that it gives the same safe withdrawal rate. I used this dataset of reduced U.S. market returns for all my experiments.
I then varied the average block size from 1 to 25 years, and simulated a billion retirements in each case to find the 95% safe withdrawal rate. This first set of results was based on investing 75% in stocks. I repeated this process for portfolios with only 50% in stocks. The results are in the following chart.
The chart shows that the average sample size makes a significant difference. For comparison, I also found the 100% safe withdrawal rate for the case where a herd of retirees each start their retirement in a different year of the available return data in the dataset. In this case, block samples are unbroken (except for wrapping back to 1926 when necessary) and cover the whole retirement. This 100% safe withdrawal rate was 3.07% for 75% stocks, and 3.09% for 50% stocks.
I was mainly concerned with the gap between two cases: (1) the case similar to the Anarkulova et al. research where the average sampling block size is 10 years and we seek a 95% success probability, and (2) the 100% success rate for a herd of retirees case described above. For 75% stock portfolios, this gap is 0.37%, and it is 0.32% for portfolios with 50% stocks.
In my opinion, it makes sense to add an estimate of this gap back onto the Anarkulova et al. 95% safe withdrawal rate of 2.7% to get a more reasonable estimate of the actual safe withdrawal rate. I will explain my reasons for this after the following explanation of why sampling block sizes make a difference.
Why do sampling block sizes matter?
It is easier to understand why block size in the sampling process makes a difference if we consider a simpler case. Suppose that we are simulating 40-year retirements by selecting two 20-year return histories from our dataset.
For the purposes of this discussion, let’s take all our 20-year return histories and order them from best to worst, and call the bottom 25% of them “poor.”
If we examine the poor 20-year return histories, we’ll find that, on average, stock valuations were above average at the start of the 20-year periods and below average at the end. We’ll also find that investor sentiment about stocks will tend to be optimistic at the start and pessimistic at the end. This won’t be true of all poor 20-year periods, but it will be true on average.
When the simulator chooses two poor periods in a row to build a hypothetical retirement, there will often be a disconnect in the middle. Stock valuations will jump from low to high and investor sentiment from low to high instantaneously, without any corresponding instantaneous change in stock prices. This can’t happen in the real world. Continue Reading…