RotoViz 101: Which NFL Team Stats Are Predictive (And Which Aren’t)?
In this series of posts I’ll discuss which of the various box score and advanced stats forecasters should pay attention to when projecting teams and players into the upcoming season.
Player projections begin with opportunity. The overall league offensive environment is important, and it’s one that Ben Gretch has looked at already this offseason. To get a sense of what opportunity will be available, either via targets or touches, you typically forecast each team’s total play count, along with a run/pass percentage split.
This is actually fairly difficult to do accurately. It’s even harder if you use the wrong metrics. Since perhaps the biggest error an analyst can make is to confuse a descriptive stat (“here’s what happened”) with a predictive stat (“here’s what may happen in the future”), I compiled a list of various offensive and defensive stats and tested how well they predict themselves year-over-year.
To test year-over-year predictiveness I used r-squared: how much of the variation in a stat in year N+1 is explained by the same stat in year N. It can be thought of as a measure of the stability, or stickiness, of a particular stat or metric. Stable metrics are always better for forecasting. They allow us to tether our evaluations and opinions to firm analytical ground.
Conversely, it is extremely valuable to know which stats and metrics are unstable, or subject to huge variance year-over-year. As analysts we can discount these and try to account for the unknown they represent in our models.
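For readers who want to run this kind of stability test themselves, here is a minimal sketch. It assumes r-squared means the squared Pearson correlation between a team's stat in year N and the same stat in year N+1, which equals the r-squared of a simple linear regression. The function names, the data layout, and the sample numbers are all invented for illustration and are not the data behind this article.

```python
from math import sqrt


def pair_seasons(stat_by_team_season):
    """Turn {(team, season): value} into a list of (year N, year N+1) pairs.

    Only team-seasons that have a following season in the data are paired.
    """
    pairs = []
    for (team, season), value in stat_by_team_season.items():
        next_value = stat_by_team_season.get((team, season + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs


def year_over_year_r_squared(pairs):
    """Squared Pearson correlation between year-N and year-N+1 values."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired seasons")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("stat has no variation, r-squared is undefined")
    r = cov / sqrt(var_x * var_y)
    return r * r


# Hypothetical pass attempts per game for three made-up teams.
sample = {
    ("AAA", 2012): 36.1, ("AAA", 2013): 37.4, ("AAA", 2014): 35.0,
    ("BBB", 2012): 31.2, ("BBB", 2013): 33.0, ("BBB", 2014): 30.5,
    ("CCC", 2012): 40.3, ("CCC", 2013): 38.8, ("CCC", 2014): 39.9,
}
print(round(year_over_year_r_squared(pair_seasons(sample)), 3))
```

Run over every stat you care about, a high value marks a sticky metric and a value near zero marks one that last year's number says little about.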
With all that out of the way, here are the year-over-year r-squared values for 29 defensive and 32 offensive stats and metrics with (bonus!) completely arbitrary color-coding.
- Efficiency, on both sides of the ball, is highly variable. I’m sorry I can’t type that with a straight face. Efficiency is a shit-show. No, that isn’t quite right. It’s a monkey-filled shit-throwing clown-circus led by a drunk ringmaster. Better.
- If you use year-N Defensive DVOA to help you project strength of schedule in year N+1 you are making a terrible mistake. DVOA is a wonderful metric for telling us how a defense performed the previous year. It tells us nothing – literally – about how a defense will perform in the future.
- Defenses that are attacked through the air in year N tend to be attacked through the air at vaguely similar rates in year N+1. We can probably use this as a proxy for “good defense to start QB/WR/TE/pass-catching RB against.” Note that this may mean the team has a very good offense that forces opponents to play catch-up in the second half, and not that the pass defense is necessarily bad.
- Rushing is a crapshoot. This has massive implications for RBs. It also poses questions about just how much a run-blocking offensive line is worth.
There are other takeaways from the data, but they mainly concern how you should approach modeling the NFL. In general, we can predict volume decently. In general, we cannot predict efficiency in any meaningful way. It follows, then, that our projections should take a range of possible volume projections and multiply them by a range of possible per-unit outcomes. A great way to do this is with Bayesian modeling using beta and gamma distributions as priors, or with RotoViz’s similarity score apps.
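To make the "range of volume times range of efficiency" idea concrete, here is a rough sketch that draws pass attempts from a gamma distribution and completion rate from a beta distribution, then multiplies the draws. It only samples from the priors; a full Bayesian model would also update them with observed data. Every number and function name below is an assumption chosen for illustration, not a RotoViz projection.

```python
import random


def gamma_from_mean_sd(mean, sd):
    """Convert a mean and standard deviation into gamma (shape, scale)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def beta_from_mean_concentration(mean, concentration):
    """Convert a mean rate and a concentration (alpha + beta) into (alpha, beta).

    A larger concentration means a tighter prior around the mean.
    """
    return mean * concentration, (1.0 - mean) * concentration


def simulate_completions(n_draws, attempts_mean, attempts_sd,
                         rate_mean, rate_concentration, seed=0):
    """Return sorted simulated season completion totals."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    shape, scale = gamma_from_mean_sd(attempts_mean, attempts_sd)
    alpha, beta = beta_from_mean_concentration(rate_mean, rate_concentration)
    draws = []
    for _ in range(n_draws):
        attempts = rng.gammavariate(shape, scale)  # volume draw
        rate = rng.betavariate(alpha, beta)        # efficiency draw
        draws.append(attempts * rate)
    return sorted(draws)


def percentile(sorted_draws, q):
    """Nearest-rank style percentile of an already sorted list, q in [0, 1]."""
    index = min(len(sorted_draws) - 1, int(q * len(sorted_draws)))
    return sorted_draws[index]


# Hypothetical team: about 560 attempts (sd 40) and a 62% completion rate.
draws = simulate_completions(10_000, 560, 40, 0.62, 200)
for q in (0.1, 0.5, 0.9):
    print(f"{int(q * 100)}th percentile completions: {percentile(draws, q):.0f}")
```

The output is a distribution rather than a point estimate. Since efficiency is so unstable, a lower concentration on the beta prior is one way to admit how little last year's rate tells us.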