Foreman, Fournette and Mixon Lead the 2017 RB Success Model

We have roughly a month until the 2017 NFL draft, when we will learn where our favorite (or not so favorite) prospects will land this coming season. While draft position and landing spot are huge factors for forecasting the success of any running back prospect, I’ve found that we can accurately predict whether a running back will be successful largely based on his production profile and athletic measurables.
We know that collegiate production isn’t everything for wide receivers; it’s the only thing. For running backs, the situation is wholly different. Production matters, but size-adjusted speed is king for determining which running backs will be successful in the NFL.
You can define success many ways, but I’m choosing to use a top-12 fantasy point season (PPR) for running backs. The model’s dependent variable for early NFL success is whether or not a player had such a season within his first three years in the NFL.
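To make that target concrete, here is a minimal Python sketch of how such a label could be derived. The `Season` record, the `early_success` helper, and the sample careers are hypothetical illustrations with made-up ranks, not the model's actual data pipeline.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Season:
    year: int      # NFL season
    ppr_rank: int  # finish among running backs in PPR scoring (1 = best)


def early_success(rookie_year: int, seasons: Iterable[Season],
                  window: int = 3, cutoff: int = 12) -> bool:
    """True if the back had a top-`cutoff` PPR finish in his first `window` seasons."""
    return any(
        rookie_year <= s.year < rookie_year + window and s.ppr_rank <= cutoff
        for s in seasons
    )


# Hypothetical careers (made-up ranks, for illustration only)
early_hit = [Season(2010, 30), Season(2011, 8)]
late_bloomer = [Season(2010, 40), Season(2011, 25),
                Season(2012, 18), Season(2013, 5)]

print(early_success(2010, early_hit))     # True
print(early_success(2010, late_bloomer))  # False: the top-12 year came in season four
```

Under this definition a back who breaks out in his fourth season counts as a miss, no matter how good that season was.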
We used age, production, and combine measurables to train and test the updated 2017 running back model. The data set covers 350 running back prospects who entered the NFL from 2000 to 2014, split roughly 2-to-1 into training and testing sets.
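A rough sketch of a 2-to-1 split in Python follows. It assumes a simple random shuffle, which may not be how the real split was drawn; the `split_two_to_one` helper and the seed are hypothetical.

```python
import random


def split_two_to_one(prospects, seed=2017):
    """Shuffle a copy of the prospects and split roughly 2-to-1 into (train, test)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(prospects)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]


prospect_ids = list(range(350))  # stand-ins for the 350 backs in the data set
train, test = split_two_to_one(prospect_ids)
print(len(train), len(test))  # 233 117
```

Holding out the test third means the reported accuracy comes from backs the model never saw while it was being fit.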
After plugging dozens of different production and combine statistics into the model and removing the least statistically significant one at a time, we were left with four (two combine, two production) that provide the most explanatory and predictive power, listed in order of statistical significance:
1. 40-yard dash
2. Weight
3. Final season rushing yards per game
4. Final season receptions per game
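The one-at-a-time pruning described above resembles backward elimination. The sketch below shows the general loop under some loud assumptions: the `backward_eliminate` helper, the 0.05 cutoff, the `vertical_jump` candidate, and every p-value in `PLACEHOLDER_P` are invented for illustration and are not output from the actual model.

```python
from typing import Callable, Dict, List


def backward_eliminate(features: List[str],
                       fit_pvalues: Callable[[List[str]], Dict[str, float]],
                       alpha: float = 0.05) -> List[str]:
    """Drop the least significant feature, refit, and repeat until all pass `alpha`."""
    kept = list(features)
    while kept:
        pvalues = fit_pvalues(kept)  # refit on the surviving features
        worst = max(kept, key=pvalues.__getitem__)
        if pvalues[worst] <= alpha:
            # Everything left is significant; report most significant first
            return sorted(kept, key=pvalues.__getitem__)
        kept.remove(worst)
    return kept


# Placeholder p-values (NOT real model output). A real fit would
# re-estimate them after every removal instead of looking them up.
PLACEHOLDER_P = {
    "forty": 0.001, "weight": 0.004, "rush_yds_per_game": 0.010,
    "rec_per_game": 0.030, "three_cone": 0.400, "vertical_jump": 0.650,
}


def fake_fit(kept: List[str]) -> Dict[str, float]:
    return {name: PLACEHOLDER_P[name] for name in kept}


selected = backward_eliminate(list(PLACEHOLDER_P), fake_fit)
print(selected)  # ['forty', 'weight', 'rush_yds_per_game', 'rec_per_game']
```

In practice `fit_pvalues` would wrap a logistic regression fit on the training set, so the p-values would shift each time a variable is removed.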
As you’d expect, the model favors faster, heavier prospects who had strong rushing and receiving production in their final college season. The 40-yard dash is by far the most influential statistic for predicting NFL success, followed by weight. The model’s c-statistic on the test set is nearly 0.80, which is generally considered a strong score for a logistic regression model.
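For readers unfamiliar with the c-statistic: for a yes/no outcome it equals the area under the ROC curve, and it can be read as the share of hit/miss pairs in which the model gave the hit the higher score. Here is a small Python sketch of that calculation. The `c_statistic` helper and the five sample scores are hypothetical, not the model's test-set results.

```python
def c_statistic(probs, outcomes):
    """Share of (hit, miss) pairs where the hit got the higher score; ties count half."""
    hits = [p for p, y in zip(probs, outcomes) if y]
    misses = [p for p, y in zip(probs, outcomes) if not y]
    if not hits or not misses:
        raise ValueError("need at least one hit and one miss")
    concordant = 0.0
    for h in hits:
        for m in misses:
            if h > m:
                concordant += 1.0
            elif h == m:
                concordant += 0.5
    return concordant / (len(hits) * len(misses))


# Hypothetical scores and outcomes (1 = top-12 season within three years)
scores = [0.58, 0.45, 0.30, 0.10, 0.05]
hit = [1, 0, 1, 0, 0]
print(round(c_statistic(scores, hit), 2))  # 0.83
```

A value of 0.50 would mean the scores rank hits and misses no better than a coin flip, and 1.00 would mean every hit outscored every miss.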
We are big believers in the predictive value of the three-cone drill here at RotoViz, and I found in my regression tree analysis that when looking at strictly combine measurables, the three-cone drill is significant for slower backs. But many RB prospects choose to skip the agility drills at the combine, leaving us with a hard choice to either exclude them from the analysis or estimate the missing times. I estimated the missing times using a linear regression on a prospect’s weight and 40-yard dash, which are strong predictors. Even with these estimates, my analysis found that adding three-cone times to weight and 40-yard dash didn’t enhance prediction. I don’t think you should ignore agility in your running back prospect analysis, but I wouldn’t give a prospect additional credit for a fast three-cone time if he already has strong weight-adjusted speed.
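The imputation step can be sketched in plain Python as an ordinary least-squares fit on weight and 40-yard dash. Everything numeric here is synthetic: the six sample backs and the linear relation used to generate their three-cone times are assumptions chosen only to show the mechanics, and the `fit_ols` and `predict` helpers are hypothetical.

```python
def fit_ols(rows, targets):
    """Least-squares [intercept, b_weight, b_forty] via the normal equations."""
    X = [[1.0, w, f] for w, f in rows]
    k = 3
    # Augmented system (X'X | X'y)
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, weight, forty):
    return coefs[0] + coefs[1] * weight + coefs[2] * forty


# Synthetic backs who DID run the drill: (weight in lbs, 40 time in seconds)
tested = [(195, 4.40), (205, 4.52), (214, 4.38),
          (222, 4.55), (230, 4.47), (241, 4.60)]
# Made-up relation used only to generate example three-cone times
three_cone = [2.0 + 0.010 * w + 0.65 * f for w, f in tested]

coefs = fit_ols(tested, three_cone)
estimate = predict(coefs, weight=220, forty=4.45)  # a back who skipped the drill
print(round(estimate, 2))  # about 7.09 with this synthetic data
```

Real three-cone times would not fall exactly on a line, so estimates for backs who skipped the drill carry extra uncertainty that the measured times do not.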
To get a historical perspective on the types of prospects the model favors, here are the top-15 scores for the entire 2000-2014 data set. Remember, draft position is not one of the inputs in the model; I only added it here for reference. You can think of the “Top-12 Predict” score as the likelihood that the running back will meet the model threshold of registering at least one top-12 PPR season in his first three years as a pro.
Player | School | Draft Year | Draft Position | Weight (lbs) | 40-Yard Dash (s) | RuYds/Gm | Rec/Gm | Top-12 | Top-12 Predict |
Chris Johnson | East Carolina | 2008 | 24 | 191 | 4.24 | 109.5 | 2.8 | Yes | 0.58 |
Darren McFadden | Arkansas | 2008 | 4 | 210 | 4.33 | 140.8 | 1.6 | Yes | 0.58 |
Matt Forte | Tulane | 2008 | 44 | 218 | 4.46 | 177.2 | 2.7 | Yes | 0.58 |
Kevin Jones | Virginia Tech | 2004 | 30 | 228 | 4.38 | 126.7 | 1.1 | Yes | 0.57 |
Michael Turner | Northern Illinois | 2004 | 154 | 244 | 4.49 | 137.3 | 1.6 | No | 0.57 |
J.J. Arrington | California | 2005 | 44 | 214 | 4.40 | 168.2 | 1.8 | No | 0.56 |
DeMarco Murray | Oklahoma | 2011 | 71 | 213 | 4.41 | 86.7 | 5.1 | Yes | 0.55 |
Latavius Murray | Central Florida | 2013 | 181 | 223 | 4.38 | 100.5 | 2.5 | Yes | 0.55 |
LaDainian Tomlinson | Texas Christian | 2001 | 5 | 221 | 4.46 | 196.2 | 0.9 | Yes | 0.50 |
Rashard Mendenhall | Illinois | 2008 | 23 | 225 | 4.45 | 129.3 | 2.6 | No | 0.50 |
Reggie Bush | USC | 2006 | 2 | 203 | 4.36 | 133.8 | 2.8 | Yes | 0.50 |
Adrian Peterson | Oklahoma | 2007 | 7 | 217 | 4.40 | 144.6 | 1.4 | Yes | 0.47 |
Jonathan Stewart | Oregon | 2008 | 13 | 235 | 4.48 | 132.5 | 1.7 | No | 0.47 |
Larry Johnson | Penn State | 2003 | 27 | 228 | 4.55 | 160.5 | 3.2 | Yes | 0.45 |
Ronnie Brown | Auburn | 2005 | 2 | 230 | 4.43 | 76.1 | 2.8 | No | 0.43 |
You’ll see that this draft-agnostic model was good at predicting success, even though only 10 of the 15 above went in the first round of the NFL draft. The model does have some misses, but even technical misses like Michael Turner, Rashard Mendenhall and Jonathan Stewart were more near-hits or late-bloomers than abject failures.
Now the part we’ve all been waiting for: Let’s apply our historically accurate model to the 2017 draft class. Here are the top-10 scores.
Comments
When I go back and look at the RB tree you put together last year,
http://rotoviz.com/2016/02/which-measurables-really-matter-for-running-backs/?hvid=4cwUNh
Am I wrong that I would find McCaffrey in the far right node that has a 78% success rate? I get there are many ways of looking at these things, but it is a bit surprising to see such a discrepancy from the same author. Which article / method has better predictive results?
There is a formula, but it's not easily calculated like a linear regression. Not sure it would be much value to share.
@colekev_FF Sooo he ran a 4.45 at his pro day, that's pretty amazing at that size
I think you're right that the tree nodes are smaller and more difficult to rely on for statistical significance. The value of the trees was more to put the combine drills into a digestible format based on past results. The trees can be overfit, or closely follow past data at the expense of being predictive.
Thanks for doing this. The tree model always bothered me since it created binary branches out of continuous data, and resulted in findings where a 0.01 difference in forty or agility time would create massive swings in "success" predictions.