RotoViz 101: Which NFL Team Stats Are Predictive (And Which Aren’t)?

In this series of posts I'll discuss which of the various box score and advanced stats forecasters should pay attention to when projecting teams and players into the upcoming season.

Player projections begin with opportunity. The overall league offensive environment is important, and it's one that Ben Gretch has looked at already this offseason. To get a sense of what opportunity will be available, either via targets or touches, you typically forecast each team's total play count, along with a run/pass percentage split. This is actually fairly difficult to do accurately. It's even harder if you use the wrong metrics.

Perhaps the biggest error an analyst can make is to confuse a descriptive stat ("here's what happened") with a predictive stat ("here's what may happen in the future"). To guard against that, I compiled a list of various offensive and defensive stats and tested how well they predict themselves year-over-year.

To test the year-over-year predictiveness I used r-squared. It can be thought of as a measure of the stability, or stickiness, of a particular stat or metric. Stable metrics are always better for forecasting. They allow us to tether our evaluations and opinions to firm analytical ground. Conversely, it is extremely valuable to know which stats and metrics are unstable, or subject to huge variance year-over-year. As analysts we can discount these and try to account for the unknown they represent in our models.

With all that out of the way, here are the year-over-year r-squared values for 29 defensive and 32 offensive stats and metrics with (bonus!) completely arbitrary color-coding.
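
For readers who want to run this kind of test themselves, here is a minimal sketch of a year-over-year r-squared calculation for a single team stat. It is not the code behind the numbers in this post. It assumes the data arrives as (team, season, value) rows, that each team-season is paired with the same team's value in the following season, and that r-squared is taken as the squared Pearson correlation between the two (which is what a one-variable linear regression reports). The function names are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / (var_x * var_y) ** 0.5


def year_over_year_r_squared(rows):
    """Square of the correlation between a stat in season N and season N+1.

    rows: iterable of (team, season, value) tuples. This layout is an
    assumption made for the sketch, not the format used in the article.
    """
    by_key = {(team, season): value for team, season, value in rows}
    this_year, next_year = [], []
    for (team, season), value in by_key.items():
        following = by_key.get((team, season + 1))
        if following is not None:  # skip seasons with no follow-up season
            this_year.append(value)
            next_year.append(following)
    r = pearson_r(this_year, next_year)
    return r * r
```

Calling `year_over_year_r_squared` once per stat, over every team and every pair of consecutive seasons in the sample, gives one stickiness number per stat that can be compared across stats.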

By Josh Hermsmeyer | @friscojosh | Archive

Comments

  1. Since you can predict volume better than efficiency, would it be worth it to check situations that are more likely to lead to scores?
    As an example, rushing expected fantasy points, receiving expected fantasy points or red zone touches?

    If there is any stickiness to these stats, it would help highlight those players that are in situations that are at least favorable.

    Looking at the data, the expectation would be that rushing expected points would be less sticky than number of rushes so it might not be useful.

  2. Is there an R-squared value "cutoff" where you would start considering stats as "firm analytical ground" to make y/y predictions from? Great article, btw.

  3. rushing expected points on the team level has a y/y R-squared of 0.09
    receiving expected points on the team level has a y/y R-squared of 0.247
    red zone touches on the team level has a y/y R-squared of 0.124

  4. Hey Nick. Typically I would want a model to give me an out-of-sample r-squared better than 0.5 to put any real faith in it. These stats are all low, and that casts serious doubt on our ability to project the statistical output of NFL teams prior to the season.

    It's not always the case, but a good rule of thumb is that your model will not be more predictive than the sum of its parts. That's why seeing all the components on their own is useful, even if their correlations with other data we care about may actually be higher.

  5. I wonder what would happen to the R^2 values if we only looked at year-to-year changes on offensive stats when the offensive coordinator stayed the same, and defensive stats only when the defensive coordinator stayed the same. Additionally, it would be interesting to know what happens in year 2 of a coordinator versus year 1. I would guess many R^2's improve, though probably not enough to "put faith" in them.
