Salil Mehta at Statistical Ideas commented on my recent post that compared the Atlanta Fed GDPNow model to the FRBNY Nowcast model.
Mehta notes that both models claim accuracy within one percentage point. However, that’s no longer mathematically possible given the difference between the two models is 2.3 percentage points.
Rogue one: Faithful GDP nowcasts by Salil Mehta
There is a 2/3 chance that both competing Federal Reserve 2017 Q1 GDP nowcasts are wrong! That’s an audacious prediction for the storied NY and Atlanta institutions (one of them led by my former big boss Timothy Geithner), and yet there is no way around the current confusion they are in. This is also critically important as one is showing a robust 3.2% growth reading, while the other is at 0.9% (the 2nd lowest reading in nearly 3-years) and essentially indicates that we are descending towards recession.
Are we descending towards recession? While we don’t forecast that, we certainly think there is only a single digit probability of a >3% GDP. How could the NY Fed plausibly give such a madly high estimate (which if true would be the second highest in 2-years)? Yet, there you have it, two extreme readings, and a 2.3% (3.2%-0.9%) chasm between them.
We show here that the Federal Reserve’s conclusions are somewhat ridiculous, though shouldn’t be since they impact the open market committee monetary decisions that the world looks to. And there are humbling lessons from these nascent Big Data, overfit models.
The chart here shows some basic information regarding the current GDP nowcasts. As we see via the two blue bars, we have the Atlanta nowcast on the left (the bar was recently as high as 3.4% earlier this year) and the NY nowcast on the right (the bar was recently as low as 1.5%). That’s right, the two nowcasts passed each other while aggressively moving further in opposite directions! The large swings in each are also doubtful, given each nowcast’s advertised margin of error. For a good chronology of these nowcast reports, refer to MishTalk.
Each nowcast boasts a margin of error of just ~1%, and this clearly poses a problem since the average of these two nowcasts (shown in orange at 2.1%) is clearly outside of both the Atlanta and the NY stated margins of error! As supportive reference, we also show (in green) that the current 2016 Q4 GDP is nearby at 1.9%. Now we should ask some important questions about how we keep getting such strange nowcasts in the past year that both have operated. The first thing to appreciate is that the nowcasts are supposed to have very tight errors that are uncorrelated with the variance in the actual GDP itself. And good nowcasts should have errors independent of one another, except that even though the NY and Atlanta Fed operate independently of one another, there is a good chance that there are some modeling similarities. We modestly assume this and derive through the variance formula for a difference ($\sigma_{\text{Atlanta}}^2 + \sigma_{\text{NY}}^2 - 2\rho_{\text{Atlanta,NY}}\,\sigma_{\text{Atlanta}}\sigma_{\text{NY}}$) that the margin of error of the difference between the models is just less than 1% (silver vertical interval arrows in chart above).
This is a highly plausible, tight expected variance. Sample size is also trivial here as we don’t have the true expectation to model a limit from. And with this, the probability of seeing an inadvertent 2.3% difference between two correct Federal Reserve models is <5%. Either that, or their publicized margin of error is awkwardly too low (to the point, as we’ll show, that randomly guessing the GDP would be safer).
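As a rough illustration of the variance argument, here is a minimal Python sketch. It rests on assumptions that are not in the original: the stated ~1 point margin of error is treated as one standard deviation, the errors are treated as normally distributed, and the correlation `rho` is a hypothetical input. The function names are illustrative only. For the difference of two estimates, the standard identity subtracts the correlation term.

```python
import math

def diff_sd(sd_a, sd_b, rho):
    """Standard deviation of (A - B) for errors with correlation rho."""
    return math.sqrt(sd_a**2 + sd_b**2 - 2.0 * rho * sd_a * sd_b)

def two_sided_tail(gap, sd):
    """P(|X| >= gap) for X ~ Normal(0, sd^2), an assumed error distribution."""
    return math.erfc(gap / (sd * math.sqrt(2.0)))

# Assumption: each ~1 point margin of error is read as one standard deviation.
SD_ATLANTA = 1.0
SD_NY = 1.0
GAP = 3.2 - 0.9  # the 2.3 point chasm between the two nowcasts

for rho in (0.0, 0.5, 0.6):  # hypothetical correlations, not estimated from data
    sd = diff_sd(SD_ATLANTA, SD_NY, rho)
    print(f"rho={rho:.1f}  sd of difference={sd:.2f}  "
          f"P(|gap| >= {GAP:.1f})={two_sided_tail(GAP, sd):.3f}")
```

Under these assumptions, a correlation of 0.5 or more keeps the spread of the difference at or below 1 point, and a 2.3 point gap then has a probability under 5%.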
We also have the probability that one of the nowcasts is correct, which due to symmetry means applying one nowcast’s stated margin of error to the other nowcast’s value. Or that the one nowcast is unintentionally correct, which would be like the <5% probability above. So, in total the probability of one of the Federal Reserve nowcasts being correct and the other being wrong is ~10% (not the >½ that many may generously assume as they critique these divergent outputs).
That leaves us with two other possible outcomes still! That is to split up the remaining 85% probability that both models are individually wrong into: (a) the average of the two models is still correct, and (b) even the average of both models is wrong. This leaves us with no choice but to conclude that there is a ~40% probability that the correct 2017 Q1 GDP is nowhere near either nowcast nor the average of the two, and a >½ probability that the GDP is near the 2.1% average that inappropriately happens to be well outside both nowcasts’ margins of error. And between those two we can safely claim that there is a 2/3 chance that both models are totally wrong (and merely a <5% chance they are still both right). There is perhaps a 30% chance one can smartly use both models in a deliberate way, though this is conditional on how one uses the information and not on taking the models at their endorsed face value.
Now a more practical assumption of this is that the margins of error should be more than doubled (to 2.6%!), in which case no one would even use such nowcast models. However, the probability breakdown in such a scenario is this:
- <30% chance both models -with their current 2.3% chasm- are correct
- ~60% chance one of the models is wrong and one is correct
- ~10% chance both models are still wrong individually, though the average in rare cases is correct
In all cases, all three probabilities sum to 100% as they should. And they lend a healthy sense of respect that incessantly observing each of these GDP nowcasts is commonly a waste of time, and that rarely will one gain insight from it other than from ex post luck. It’s the same as when the more senior open market committee models vainly attempt to forecast other macro-economic variables. Sometimes simply looking at the most recent quarter’s GDP (in this case ~2%) is as good a guess as any. So is giving a little more weight to the near-0 probability most assign to an outright contraction this quarter. As a business CEO, one would want to be prepared for anything at this point, which is what a genuine 2.6% margin of error about GDP implies.
End Salil Mehta
I have been following this divergence for some time. I expect model revisions following the next GDP report.
I last covered the divergence in GDPNow Forecast Dips to 0.9%: Divergence with Nowcast Hits 2.3 Percentage Points – Why?
Here are my charts.
Neither report had a significant movement on February 24. Let’s start there for a closer look.
- On March 1, Construction spending took 0.7 percentage points off GDPNow but only 0.046 percentage points off Nowcast.
- On March 2, light vehicle sales took 0.3 percentage points off GDPNow. Nowcast did not factor in light vehicles sales.
- On March 6, the manufacturing report took 0.2 percentage points off GDPNow. The same inventory report (different name) added 0.031 percentage points to Nowcast.
- On March 7, the import-export trade report did nothing for GDPNow. The import-export trade report added a net 0.032 percentage points to Nowcast.
- On March 10, the jobs report subtracted 0.3 percentage points from GDPNow. The jobs report added 0.003 percentage points (effectively nothing) to Nowcast.
- March 1, 2, 6, and 7 reports subtracted 1.2 percentage points from GDPNow.
- March 1, 2, 6, and 7 reports added 0.017 percentage points to Nowcast.
- The big difference is in how the models treat (or don’t treat at all), construction spending and light vehicle sales.
- The jobs report decline (an additional 0.3 percentage point decline for GDPNow) may be related to a model change.
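The totals in the list above can be checked with a few lines of Python. This is only an arithmetic sketch: the figures are transcribed from the bullets, the variable and function names are illustrative, and the 0.0 entry for Nowcast on March 2 encodes the statement that Nowcast did not factor in light vehicle sales.

```python
# Percentage-point contributions by report date, transcribed from the bullets above.
gdpnow = {"Mar 1": -0.7, "Mar 2": -0.3, "Mar 6": -0.2, "Mar 7": 0.0, "Mar 10": -0.3}
# Assumption: Nowcast's Mar 2 entry is 0.0 because it ignores light vehicle sales.
nowcast = {"Mar 1": -0.046, "Mar 2": 0.0, "Mar 6": 0.031, "Mar 7": 0.032, "Mar 10": 0.003}

def net_change(contributions, dates):
    """Sum the contributions for the given dates, rounded to 3 decimals."""
    return round(sum(contributions[d] for d in dates), 3)

first_four = ["Mar 1", "Mar 2", "Mar 6", "Mar 7"]
all_five = first_four + ["Mar 10"]

print("GDPNow, Mar 1-7: ", net_change(gdpnow, first_four))   # -1.2
print("Nowcast, Mar 1-7:", net_change(nowcast, first_four))  # 0.017
print("GDPNow, Mar 1-10: ", net_change(gdpnow, all_five))    # -1.5
print("Nowcast, Mar 1-10:", net_change(nowcast, all_five))   # 0.02
```

Adding the March 10 jobs report widens the gap further: GDPNow is down 1.5 percentage points over the five reports while Nowcast is up 0.02.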
I have a simple rule: Whenever one of the models seems ridiculously high, I place a mental bet against that outcome.
Last quarter I leaned towards the Nowcast forecast. This quarter I think GDPNow will be much closer.
Many recent economic reports have been weak. And failure to take into consideration vehicle sales when they have been holding up retail sales seems like a mistake.
However, construction spending, which took 0.7 percentage points off GDPNow, is so volatile and so frequently revised that it is hard to have much faith in that particular slide.
Finally, one additional problem with both models is they do not think. For a synopsis, please see Formulas Don’t Think: Investigating Weather-Related GDP.
Mike “Mish” Shedlock
If they could predict the future they would not be working for a living.
The economy is a complex adaptive system. It can’t be predicted by modeling a few variables. It also can’t be controlled by manipulating a few variables.
In my business the joke is MAI (Member Appraisal Institute) really stands for Made As Instructed. That ain’t no joke when it comes to Economists.
It’s hard to make predictions – especially about the future.
— Yogi Berra
Tony Bennett said:
After consideration … the only sensible solution … construct a mud pit for Tug of War between the two teams.
Tony Bennett said:
Industrial Production for February out this morning … 0.0% (expected +0.2%) mainly due to a big drop in utilities from nice weather (which should give a bump to other numbers since consumers haven’t had to dig out from big storms … but that is another story) … BUT one of the positives was manufacturing at +0.5% … which leads me to new vehicle (channel stuffing) production.
GM dealer days of inventory
January 31st 2017 … 91 days
February 28th 2017 … 108 days (878,590 vehicles)
January 31st 2016 … 67 days
February 29th 2016 … 74 days (629,878 vehicles)
Doesn’t take a crystal ball to see that new vehicle production “may” be cut back in near term.
Source: “not so” Federal Reserve = Fake News
Medex Man said:
I think it is funny that two branches of the same flawed central planning, “all knowing, all seeing” bureaucracy each have a separate staff of economists. Since the Fed knows all and sees all and has perfect forecasting models, why the hell do they need two sets of economists in the first place?
And why does the Federal Reserve SYSTEM have not two, but TWELVE different versions of reality?
These are academics who claim to have superior intellects, superior math skills, superior modeling capabilities, and superior access to underlying data — and these experts can’t agree on even a basic framework for the economy?
But CNBC and the like expect us to believe the Fed is managing the national economy? They can’t even manage the 12 district banks.
In a functioning economy, someone would arbitrage the high forecast against the low forecast and (over time) eliminate the worst performing Fed economists. Instead, the infestation of inexperienced academics just gets bigger and bigger, never held accountable when (not if) they are wrong.
“And why does the Federal Reserve SYSTEM have not two, but TWELVE different versions of reality?…In a functioning economy, someone would arbitrage the high forecast against the low forecast and (over time) eliminate the worst performing Fed economists.” -Medex Man
Actually those are very good ideas, and pretty much mirror the W. Edwards Deming approach to statistical process quality control. The FED Central Planning Bureaucracy is really a New Deal-style, make-work program for economics grads trained in model making. Winnowing from 12 to the best would blend well with the Trump Apprentice show: “You’re fired” notices to the losers, who would be left with generous unemployment benefits while transitioning to a trade in the real economy. As a Reality Show with sponsors, the GDP Show might even turn a profit, which it is not doing now. People and groups from the Outside could put in their own GDP bids/guesses. Thus, better GDP predictors might emerge from outside the FED.
Medex Man said:
Watching self important academics “compete” over who is the least inaccurate in guessing the next GDP number doesn’t make for good TV.
Fire all 12 groups, no severance.
You miss my main point, which is the Deming statistical process to achieve increasing quality control, which in this case could result in more accurate numbers if that is even possible in the prediction business. Not sure it is. Nor do I think much of polls of polls (a big failure in the 2016 election). But Deming’s statistical methods, which originated in his USDA agricultural work, are in a broad conceptual sense a mathematical pathway. Getting an accurate number in advance (prediction) might be worth paying for, and might attract a range of competitors outside the FED. Unfortunately we cannot YET fire 12 groups. Though perhaps the FED will get there, if it follows the bureaucratic pathway of infinite expansion.
TV does a good job with random draws of lottery numbers. GDP number components would be quite a lesson (educational) and more interesting than drawing random numbers (e.g. lottery). Could be quite suspenseful with jobs on the line like The Apprentice. You could also sell tickets, like the lottery to predict the final GDP number. Numbers rackets have always been popular.