Sabermetrics, Stabilization Rates, and Regression to the Mean
Fantasy Douche wrote a great post the other day asking whether Jarvis Landry being good even matters. It reminded me of something I consider from time to time regarding the conclusions that can be drawn from baseball statistics, and how difficult the task is in football.
It’s interesting to consider how data is compiled in the two sports. Baseball is largely a series of one-on-one interactions. Pitcher pitches. Hitter hits. Rinse and repeat. Each hitter comes up four or five times a game, 162 games a year. Controlling for the external variables that can affect the outcome of each one-on-one interaction seems, if not easy, at least doable.
While a 16-game stretch of good or bad play is usually written off as a small sample in baseball (a hot streak or a slump), in football it’s all we have. In the offseason, players change teams, maybe a quarter of the coaches in the league are replaced, and it becomes nearly impossible to control for the same factors we could in the previous season. One way to think about this is that a player’s career is divided into a series of 16-game splits, each not much more predictive than a smaller split taken from within a season.
Of course, it’s easy to place significantly more weight on the 16-game splits because they amount to the defined measure that is a season. But let’s look at some baseball numbers and see if that’s really a logical way to think of stats.
A 2009 study over at Fangraphs determined “the point at which split-half reliability tests produced a correlation of 0.70 or higher” (the point at which a stat stabilizes) for certain baseball statistics. These are the sample sizes at which a specific stat, for a specific player, becomes reliably correlated with that player’s larger body of work in that stat. These are estimates, but pretty educated ones.
- 50 PA: Swing %
- 100 PA: Contact Rate
- 150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
- 200 PA: Walk Rate, Groundball Rate, GB/FB
- 250 PA: Flyball Rate
- 300 PA: Home Run Rate, HR/FB
- 500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
- 550 PA: ISO
What’s fascinating about this is, even in a sport where there are remarkably few outside variables to account for, something as simple as the percentage of pitches a player swings at doesn’t stabilize until at least 50 plate appearances. Considering a league average of just under four pitches per plate appearance, we’re talking almost 200 pitches.
An actual skill such as the ability to make contact with a pitch doesn’t stabilize until 100 times up to bat, or perhaps a couple hundred swings. [1]
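For anyone curious what a split-half reliability test looks like mechanically, here is a rough simulation sketch. It is not the Fangraphs method, and everything in it is an assumption for illustration: the made-up range of true skill levels (0.70 to 0.90), the number of simulated players, and the choice to split each player's sample into two equal halves. The idea is just that each player's two halves are compared across the whole population, and the correlation between them climbs as the sample grows.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(n_events, n_players=500, seed=0):
    """Correlate the two halves of each simulated player's sample.

    Assumption: every hypothetical player has a fixed true success rate
    drawn uniformly from 0.70 to 0.90 (invented numbers, not real data).
    """
    rng = random.Random(seed)
    half = n_events // 2
    first_half, second_half = [], []
    for _ in range(n_players):
        true_rate = rng.uniform(0.70, 0.90)
        # Observed rate in each half is the true rate plus binomial noise.
        a = sum(rng.random() < true_rate for _ in range(half)) / half
        b = sum(rng.random() < true_rate for _ in range(half)) / half
        first_half.append(a)
        second_half.append(b)
    return pearson(first_half, second_half)


# Larger samples should produce a stronger half-to-half correlation.
for n in (20, 100, 400, 1600):
    print(n, round(split_half_correlation(n), 2))
```

In a setup like this, the "stabilization point" would be the smallest sample size where the printed correlation crosses 0.70. The narrower the spread of true talent, the longer that takes, which is one reason different stats stabilize at different points.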
How Does That Apply To Football?
Obviously there is no way to directly relate baseball statistics to football. But even if you have reason to believe specific skills would stabilize significantly earlier in football than in baseball, the above numbers are pretty interesting in the context of the conclusions we draw, particularly about young players.
NFL players play 16 games in a season, and see a varying number of opportunities depending on how you look at things. Running backs who see significant playing time will touch the ball at least a few hundred times. Receivers may see fewer than 100 targets, but there’s a good case to make that every route they run is a data point.
Still, if we try to adjust for things like line play, quarterbacks, or defenses faced — to isolate an individual’s skill — it becomes very hard to normalize a data set of that size and draw appropriate conclusions.
Apart from his Landry post, Fantasy Douche has also pointed out that people are shoveling dirt on Jeremy Langford’s grave this offseason. It calls to mind the conclusions that were drawn about Devonta Freeman after his rookie season, and the general variability between year-over-year efficiency stats across the sport.
Apart from the size of the sample, another interesting takeaway from the above baseball statistics is the variation in stabilization points for different skills. Continuing with the WR example, it seems fair to wonder whether there would be various stabilization points for catch rate or separation achieved, right? What about stats like forced missed tackles or touchdown rates?
Regression to the Mean
This is why we preach the value of opportunity in fantasy football. For a player like Jarvis Landry, it’s easy to look at his 2015 sample, perhaps his poor yards per target, and draw a conclusion about his talent. But a more appropriate way to predict the future is to expect there’s a good chance his yards per target regresses upward toward the mean, while acknowledging it’s possible he has already reached his stabilization point for that stat. In all likelihood the sample is not large enough, and the variables affecting his statistics can’t be controlled for accurately enough, for us to make judgments about what his efficiency rates will be in 2016. This isn’t rocket science, but I think it dovetails with what Fantasy Douche was saying. Sure, one could argue Landry is more likely to have a below-average YPT in 2016, but arguing he’ll match his extremely low 2015 number again would probably be an error.
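To put the regression idea in concrete terms, here is one common way to sketch it: blend the observed rate with the league average, weighting the observed number more heavily as the sample grows. This is only an illustration, not a projection method from this post. The function name, the `regression_constant` parameter (the sample size at which the player's own number and the league average get equal weight), and every number in the example are hypothetical, not real Landry or league data.

```python
def regress_to_mean(observed, sample_size, league_mean, regression_constant):
    """Shrink an observed rate toward the league mean.

    regression_constant is a hypothetical tuning value: the sample size
    at which the observed rate and the league mean are weighted equally.
    """
    weight = sample_size / (sample_size + regression_constant)
    return weight * observed + (1 - weight) * league_mean


# Invented example: a receiver at 6.0 yards per target over 100 targets,
# a league average of 8.0, and equal weighting at 100 targets.
estimate = regress_to_mean(6.0, 100, 8.0, 100)
print(estimate)
```

The estimate lands between the player's own number and the league average, which matches the argument above: expect below average, but not a repeat of the extreme.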
There are simply limitations to the conclusions we can draw from the data we’re given in the game of football. And while every analysis has limitations, the ones in football are severe enough that our desire to take a stance often far surpasses our willingness to acknowledge them. To be clear, that’s no reason to avoid running analyses and trying to learn whatever we can from the data we do have.
The more you’re beaten over the head with a player being hashtag bad, the more you should be willing to look past his priors and see if his projected opportunity is worth the gamble. This is something I need to remind myself frequently. There are a lot of months between football seasons, and it’s a fun game to talk about and predict, but keep in mind that value propositions abound when everyone seems to agree a player isn’t very good based on just a season or two.
[1] Swing percentages vary from roughly 35 to 60 percent, but obviously see inverse correlation with pitches per plate appearance. We’re looking at something like two swings per plate appearance as a decent average.