
Evaluating My 2016 Young WR Model Performance

Prior to the 2016 season, I created a set of models to predict PPR production on a per-game basis for young wide receivers, which I defined as WRs in their first four years in the league. This article breaks down how those models performed. For all the data the models were built from, as well as the 2016 projections themselves, refer to the linked article.

Reviewing the Models

First I’m going to start with a brief review of the models.

I created a separate model for each of the first four seasons of an NFL receiver’s career. The factors I looked at included:

  1. Year (SEAS)
  2. Logarithm of draft position (L.DPOS)
  3. Draft age z-score (AGE.Z)[1]
  4. Final year collegiate market share z-score (MS.Z)
  5. PPR points per game the prior year (PPR.N)
  6. Games played the prior year (GMS.N)

Rookie Model

The rookie year model retained the following predictors:

Term                           Estimate  p-value
Intercept                      -367.866  0.0045
SEAS                           0.189     0.0033
AGE.Z                          -0.276    0.1706
MS.Z                           0.482     0.014
L.DPOS                         -1.689    <.0001
(SEAS-2010.15)*(AGE.Z+0.402)   -0.136    0.0503
(SEAS-2010.15)*(L.DPOS-4.598)  -0.115    0.0835

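In this model, age was more important than market share (on a scaled basis). The main-effect estimates alone suggest the opposite (0.482 for MS.Z versus -0.276 for AGE.Z), but AGE.Z also enters through the season interaction: for a 2016 rookie, the combined age slope is about -0.276 - 0.136 × (2016 - 2010.15) ≈ -1.07 PPR points per game per standard deviation, more than twice the size of the market share coefficient. This model produced a backtested R-squared of only 0.298, with an RMSE of 3.357 on 318 observations. Projecting rookies is really hard.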
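To make that comparison concrete, here is a minimal Python sketch that plugs the published rookie coefficients back into the fitted equation. It assumes L.DPOS is the natural log of overall draft pick (the 4.598 centering constant, roughly pick 99, is consistent with that, but the article doesn't say), and `rookie_ppr` and `rookie_age_slope` are hypothetical helper names. Because SEAS multiplies a four-digit year, rounding its coefficient to three decimals can shift a projection by up to about one point, so treat the absolute outputs as approximate.

```python
import math

# Coefficients transcribed from the rookie-model table (rounded as published).
ROOKIE = {
    "intercept": -367.866,
    "seas": 0.189,
    "age_z": -0.276,
    "ms_z": 0.482,
    "l_dpos": -1.689,
    "seas_x_age": -0.136,   # (SEAS-2010.15)*(AGE.Z+0.402)
    "seas_x_dpos": -0.115,  # (SEAS-2010.15)*(L.DPOS-4.598)
}


def rookie_ppr(seas: int, age_z: float, ms_z: float, draft_pick: int) -> float:
    """Hypothetical helper: projected PPR/game for a rookie WR."""
    c = ROOKIE
    l_dpos = math.log(draft_pick)  # assumption: natural log of overall pick
    seas_c = seas - 2010.15        # centered season, as in the interaction terms
    return (
        c["intercept"]
        + c["seas"] * seas
        + c["age_z"] * age_z
        + c["ms_z"] * ms_z
        + c["l_dpos"] * l_dpos
        + c["seas_x_age"] * seas_c * (age_z + 0.402)
        + c["seas_x_dpos"] * seas_c * (l_dpos - 4.598)
    )


def rookie_age_slope(seas: int) -> float:
    """Change in projection per one-SD increase in AGE.Z: main effect plus season interaction."""
    return ROOKIE["age_z"] + ROOKIE["seas_x_age"] * (seas - 2010.15)


print(round(rookie_age_slope(2016), 3))                                  # -1.072
print(round(rookie_ppr(2016, age_z=0.0, ms_z=0.0, draft_pick=10), 2))   # 10.49
```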

Second Year Model

Here’s the model for second year receivers:

Term                         Estimate  p-value
Intercept                    0.765     0.1757
GMS.N                        0.260     <.0001
PPR.N                        0.461     <.0001
AGE.Z                        -0.686    0.0045
MS.Z                         0.882     0.0002
(GMS.N-9.450)*(PPR.N-6.060)  0.063     <.0001
(AGE.Z+0.533)*(MS.Z-0.406)   -0.658    0.0109

This model has a backtested R-squared of 0.514 with an RMSE of 3.451.

The SEAS term dropped out. In a player’s second year, we should also give slightly more weight to his college market share than to his draft age. Prior-year PPR points per game and rookie-year playing time are the most significant factors.
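The GMS.N × PPR.N interaction means rookie-year production counts for more when it came over more games: the slope on PPR.N is 0.461 + 0.063 × (GMS.N - 9.450). A minimal Python sketch using the rounded coefficients from the table (function names are hypothetical):

```python
def second_year_ppr(gms_n: float, ppr_n: float, age_z: float, ms_z: float) -> float:
    """Hypothetical helper: projected PPR/game in year two, using the published coefficients."""
    return (
        0.765
        + 0.260 * gms_n
        + 0.461 * ppr_n
        - 0.686 * age_z
        + 0.882 * ms_z
        + 0.063 * (gms_n - 9.450) * (ppr_n - 6.060)
        - 0.658 * (age_z + 0.533) * (ms_z - 0.406)
    )


def ppr_slope(gms_n: float) -> float:
    """Extra projected PPR/game per additional rookie-year PPR/game, given rookie games played."""
    return 0.461 + 0.063 * (gms_n - 9.450)


print(round(ppr_slope(16), 3), round(ppr_slope(4), 3))  # 0.874 0.118
```

In other words, a rookie PPR/game average posted over 16 games carries forward at roughly 0.87 per point, while one posted over 4 games carries forward at only about 0.12.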

Third Year Model

Term                          Estimate  p-value
Intercept                     4.818     0.014
GMS.N                         0.243     0.012
PPR.N                         0.391     <.0001
MS.Z                          1.029     0.0001
L.DPOS                        -0.874    0.0042
(GMS.N-11.348)*(PPR.N-7.596)  0.072     <.0001

This model produced a backtested R-squared of 0.513 and an RMSE of 3.641. Notice the AGE.Z term has dropped off completely, meaning by a receiver’s third year in the NFL we shouldn’t even factor in draft age. Draft position, NFL production, and collegiate production tell the whole story.
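To put the draft-position term in perspective, the sketch below compares two otherwise identical third-year receivers drafted 10th and 100th overall. It assumes L.DPOS is a natural log (under a base-10 log the gap would instead be 0.874), and `third_year_ppr` is a hypothetical helper:

```python
import math


def third_year_ppr(gms_n: float, ppr_n: float, ms_z: float, draft_pick: int) -> float:
    """Hypothetical helper: projected PPR/game in year three (rounded published coefficients)."""
    l_dpos = math.log(draft_pick)  # assumption: natural log of overall pick
    return (
        4.818
        + 0.243 * gms_n
        + 0.391 * ppr_n
        + 1.029 * ms_z
        - 0.874 * l_dpos
        + 0.072 * (gms_n - 11.348) * (ppr_n - 7.596)
    )


# Identical year-two games, PPR/game, and market share; drafted 10th vs. 100th overall.
gap = third_year_ppr(14, 9.0, 0.5, 10) - third_year_ppr(14, 9.0, 0.5, 100)
print(round(gap, 2))  # 2.01
```

Because the two players share every other input, the interaction term cancels and the gap is simply 0.874 × ln(10), roughly 2 PPR points per game.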

Fourth Year Model

Term                         Estimate  p-value
Intercept                    0.924     0.3075
GMS.N                        0.219     0.0069
PPR.N                        0.499     <.0001
(PPR.N-8.611)*(PPR.N-8.611)  0.027     0.0382

The R-squared for this model is 0.467 and the RMSE is 3.748. In this model, draft position, age, and college production have all dropped out. This is the point in a receiver’s career at which he has fully established (or failed to establish) himself as an NFL receiver.
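The squared PPR.N term makes the fourth-year model slightly convex: the marginal value of another point of year-three production is 0.499 + 2 × 0.027 × (PPR.N - 8.611), so it grows with the receiver's prior level. A short Python sketch (hypothetical helper names, rounded coefficients):

```python
def fourth_year_ppr(gms_n: float, ppr_n: float) -> float:
    """Hypothetical helper: projected PPR/game in year four from year-three games and PPR/game."""
    return 0.924 + 0.219 * gms_n + 0.499 * ppr_n + 0.027 * (ppr_n - 8.611) ** 2


def ppr_marginal(ppr_n: float) -> float:
    """Derivative of the projection with respect to PPR.N."""
    return 0.499 + 2 * 0.027 * (ppr_n - 8.611)


for ppr in (5.0, 8.611, 15.0):
    print(ppr, round(ppr_marginal(ppr), 3))
# 5.0 0.304
# 8.611 0.499
# 15.0 0.844
```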

Model Results

Alright, let’s jump into how the models fared.

Overall the models performed well.

[Figure: 2016 model results for all young WRs]

The R-squared value of 0.427 is strong. It means the factors I used to build my models explain 42.7 percent of the variation in PPR points per game among this group of WRs. That still leaves about 57 percent of the 2016 results unexplained, but for a multiple linear regression with only a handful of factors, that’s still damn good. To figure out where the overall results can improve, though, let’s break them down by experience class.
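The article doesn't specify how the 0.427 was computed. Two common definitions are sketched below in Python: 1 - SSE/SST on the projections as-is, which matches the "percent of variation explained" reading, and the squared correlation you get from fitting a trendline through a projected-versus-actual scatter. For out-of-sample projections the two can differ, because the trendline version doesn't penalize projections that are consistently too high or too low. Function names and the toy data are illustrative only.

```python
import math
from statistics import mean


def r_squared(actual, predicted):
    """1 - SSE/SST: share of variance in actual PPR/game explained by the projections as-is."""
    mu = mean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mu) ** 2 for a in actual)
    return 1 - sse / sst


def trendline_r_squared(actual, predicted):
    """Squared Pearson correlation, i.e. the R^2 of a least-squares line through the scatter."""
    ma, mp = mean(actual), mean(predicted)
    sap = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    saa = sum((a - ma) ** 2 for a in actual)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sap * sap / (saa * spp)


def rmse(actual, predicted):
    """Root mean squared error, in PPR points per game."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Toy example: projections that are all exactly 2 points too high.
actual = [4.0, 8.0, 12.0]
projected = [6.0, 10.0, 14.0]
print(trendline_r_squared(actual, projected))  # 1.0
print(r_squared(actual, projected))            # 0.625
print(rmse(actual, projected))                 # 2.0
```

The toy example shows the gap: a uniform 2-point miss leaves the trendline R-squared at 1.0 while cutting 1 - SSE/SST to 0.625.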

[Figure: 2016 model results, color coded by experience class]

This is the same graph as before, but color coded to visualize each of the four experience classes of young WRs. The graph alone doesn’t tell you much, but these R-squared values might:

Experience  R-Squared
Rookie      0.1499
2nd Year    0.3375
3rd Year    0.5008
4th Year    0.5297

The results improved with each year of experience, even as the models themselves became simpler.[2]

One thing I found interesting: if I trim the rookie data set to only those players who had a target in at least 6 games,[3] the R-squared value jumps to 0.310, a large improvement. The biggest takeaway from the rookie model, for me, is that it’s really hard to predict which rookie WRs will get playing time. Only a handful start the year as starters; the rest are either season-long backups or players who earn a starting role through injury or performance (like Malcolm Mitchell). However, the ones who played enough (read: 6+ games) were at least somewhat projectable.
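Both the per-class breakdown and the 6-game rookie filter are simple to reproduce from a table of projections and results. A minimal Python sketch, assuming each player-season is a dict with hypothetical keys `exp` (1 = rookie), `actual`, `projected`, and `games_with_target`; R-squared here is 1 - SSE/SST within each group:

```python
from collections import defaultdict
from statistics import mean


def r_squared(actual, predicted):
    """1 - SSE/SST for one group of players."""
    mu = mean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mu) ** 2 for a in actual)
    return 1 - sse / sst


def r_squared_by_class(rows):
    """Group rows by experience year (1 = rookie) and score each group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["exp"]].append(row)
    return {
        exp: r_squared([r["actual"] for r in g], [r["projected"] for r in g])
        for exp, g in sorted(groups.items())
    }


def rookies_with_target_games(rows, min_games=6):
    """Keep only rookies who drew a target in at least `min_games` games."""
    return [r for r in rows if r["exp"] == 1 and r["games_with_target"] >= min_games]
```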

So how can I improve the rookie model? A few things come to mind. First, I want to use something like Kevin Cole’s opportunity scores to quantify which rookies land in situations with plenty of available opportunity. That might have raised projections, to varying degrees, for Corey Coleman, Laquon Treadwell, Tyler Boyd, Sterling Shepard, Michael Thomas, and Will Fuller. On the flip side, it probably would have lowered projections for Josh Doctson, Tajae Sharpe, and Leonte Carroo. Next, I think accounting for the quality of a receiver’s NFL quarterback would really have helped. There’s no doubt Drew Brees played an integral role in Thomas’ success, while the revolving door at QB for the Browns probably held Coleman back a bit.[4] How to measure that quality, I’m not sure. Maybe AYA or the like. Finally, there are probably a few metrics around breakout age, whether the receiver left college early, and touchdown numbers that I can use to improve the rookie model as well.
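If "AYA" refers to adjusted yards per attempt (AY/A), that metric is a one-line formula; Pro Football Reference defines it as (passing yards + 20 × passing TDs - 45 × interceptions) / pass attempts. A minimal Python sketch with a hypothetical function name and a made-up stat line; how it would enter the rookie model is left open, as in the text:

```python
def adjusted_yards_per_attempt(pass_yds: float, pass_td: int, ints: int, attempts: int) -> float:
    """AY/A as defined by Pro Football Reference: (Yds + 20*TD - 45*Int) / Att."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return (pass_yds + 20 * pass_td - 45 * ints) / attempts


# Made-up season line: 4,000 yards, 30 TDs, 10 INTs on 600 attempts.
print(round(adjusted_yards_per_attempt(4000, 30, 10, 600), 2))  # 6.92
```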

I’ll continue to pound away at it this offseason to improve the rookie and second-year model projections. And while I’m pretty pleased with the third- and fourth-year results, I’m always looking to tweak my models to squeeze as much predictive power out of them as possible.


  1. Taken from Jon Moore’s Phenom Index.
  2. Except the rookie model, because we don’t have prior year points per game at the NFL level.
  3. Bye bye, Laquon Treadwell.
  4. In addition to injuries.
By RotoDoc | @RotoDoc
