Using Analytics to Project Carson Wentz and Jared Goff

Recently, RotoViz debutante Chris Hatcher wrote an interesting piece about quarterback ball velocity and how it translates into one metric of passing success at the NFL level. I decided to take his work a step further and use logistic regression to model QB success.

There are multiple ways to define QB success. One might be with fantasy stats. Another might be raw totals. The metric Chris used was whether or not a QB posted an adjusted yards per attempt (AYA) of 7.0 or higher in at least one season of his career while also starting 8+ games in that season. I will use the same criterion for success. I like this criterion for a few reasons:

  1. The data was easy to get [1]
  2. It leaves out rushing statistics, so we’re focusing only on passing
  3. It incorporates touchdowns and interceptions along with yards
  4. It strongly correlates with NFL win percent
  5. It doesn’t depend on how long a QB has played: those who reach the threshold usually do so early in their careers. This matters because you might expect the criterion to be biased against players with only one or two seasons under their belts, yet both Marcus Mariota and Jameis Winston met the threshold in their first year. I tested number of seasons in the NFL against probability of success and found no significant relationship (p=0.8). This held true even when controlling for other variables, such as draft position.
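The success criterion above can be sketched in a few lines. This is a minimal sketch, assuming AYA follows the common definition of (yards + 20 × TD − 45 × INT) / attempts; the article does not spell out the formula, and the `Season` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Season:
    # Hypothetical record layout; field names are illustrative only.
    starts: int
    attempts: int
    yards: int
    touchdowns: int
    interceptions: int

def adjusted_yards_per_attempt(s: Season) -> float:
    """AYA, assuming the common definition (yds + 20*TD - 45*INT) / att."""
    if s.attempts == 0:
        return 0.0
    return (s.yards + 20 * s.touchdowns - 45 * s.interceptions) / s.attempts

def met_success_criterion(career, min_aya=7.0, min_starts=8) -> bool:
    """True if any single season has AYA >= min_aya with min_starts+ starts."""
    return any(
        s.starts >= min_starts and adjusted_yards_per_attempt(s) >= min_aya
        for s in career
    )

# Made-up stat lines, not real players.
spot_starter = Season(starts=5, attempts=150, yards=1300, touchdowns=10, interceptions=2)
full_season = Season(starts=16, attempts=500, yards=4000, touchdowns=30, interceptions=10)
print(met_success_criterion([spot_starter]))               # high AYA, too few starts
print(met_success_criterion([spot_starter, full_season]))  # second season qualifies
```

Because the check is "any season", a single qualifying year is enough, which is why career length matters less than it might first appear.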

The data set I used for ball velocity is the same data set Chris used – combine data on every thrower since 2008 as recorded by Ben Allbright. I then grabbed other combine results and player stats to use when training and testing my model. In the few cases where combine data wasn’t available, I used pro-day numbers and on one occasion (for John David Booty) I imputed his 3-cone time from other metrics.

In all, there are 94 quarterbacks from 2008-2015 with velocity data. Of these 94 QBs, 73 played at least one snap at the NFL level (77.7 percent). Breaking that down further, only 15 of the 73 QBs that played met the success criteria that Chris established in his article. In other words, most QBs fail.

There are plenty of quarterbacks who either didn’t throw at the combine, or at least didn’t have the ball velocity stat recorded. As such, this analysis only applies to QBs who had a ball velocity recorded.

Building the Model

I held back a random sample of 20 players, thus training my model on 74 of the 94 players. This left me with four players in the test set of 20 who met the threshold. I held back a random sample, rather than building the model on, say, 2008-2013 data and testing on 2014-2015, to avoid any bias that might be introduced by training only on the older data. The NFL has trended toward becoming a passing league, especially with pass-catching running backs, and I wanted the training data to account for that.
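For readers who want to reproduce this kind of split, here is a minimal sketch of a random holdout. The player names are placeholders and the seed is arbitrary; the article does not say which tool or seed was actually used.

```python
import random

def random_holdout(players, test_size=20, seed=2016):
    """Shuffle a copy of the player list and split off a test set."""
    rng = random.Random(seed)   # arbitrary fixed seed, only for reproducibility
    shuffled = list(players)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]

players = [f"QB_{i}" for i in range(94)]   # placeholder names
train, test = random_holdout(players)
print(len(train), len(test))   # 74 20
```

Every draft year has the same chance of landing in either set, which is the point of sampling at random instead of splitting by season.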

When training my model, I used the Bayesian Information Criterion (BIC) to avoid overfitting to the training set. The model with the lowest BIC [2] included the following parameters:

Parameter     Wald    P-value
Log(D.Pos)    9.52    0.002
Last.AYA      4.3     0.0381
Vel           4.19    0.0407
Cone          2.66    0.1032

In other words, draft position [3] was the most statistically significant parameter when predicting this metric of success. Final-year AYA was next, in essentially a dead heat with velocity. Interestingly, the 3-cone drill time was only borderline significant, but it helped the predictive power of the model, so I left it in. This isn’t all that surprising, because rushing QBs tend to have a better AYA even when controlling for other variables. Since the 3-cone, shuttle, and 40-yard dash times are all positively correlated with each other, we can make an educated guess that the 3-cone represents some aspect of rushing ability.

Because there was missing data in some of these factors, this left me with 68 players in the training set and 17 players in the test set.
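To make the BIC trade-off concrete, here is a small sketch using the standard definition, BIC = k·ln(n) − 2·ln(L). The log-likelihood values in the example are invented for illustration and are not taken from the fitted model.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def min_gain_for_extra_term(n_obs: int) -> float:
    """Log-likelihood gain one extra parameter needs just to break even."""
    return math.log(n_obs) / 2.0

n = 68  # training rows left after dropping players with missing data
print(round(min_gain_for_extra_term(n), 2))   # 2.11

# Hypothetical fits: a fifth predictor that improves the log-likelihood by
# only 1.0 makes the BIC worse, so the smaller model is preferred.
four_predictors = bic(log_likelihood=-20.0, n_params=5, n_obs=n)  # 4 terms + intercept
five_predictors = bic(log_likelihood=-19.0, n_params=6, n_obs=n)
print(four_predictors < five_predictors)      # True
```

With only 68 training rows, each extra term has to earn a noticeable improvement in fit, which helps explain why a variable can look significant and still be left out.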

Model Results

The model did very well against data it was built with, placing 63 of the 68 players (92.6 percent) in the correct category.

              Predicted
              N     Y
Actual   N    55    2
         Y    3     8
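The headline accuracy can be recomputed directly from the training-set matrix, along with a few companion rates. This sketch reads rows as actual outcomes and columns as predictions; the sensitivity, specificity, and baseline figures are derived here and are not quoted in the article.

```python
# Training-set confusion matrix: rows = actual, columns = predicted.
tn, fp = 55, 2   # actual N: predicted N, predicted Y
fn, tp = 3, 8    # actual Y: predicted N, predicted Y

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
sensitivity = tp / (tp + fn)   # share of actual successes that were caught
specificity = tn / (tn + fp)   # share of actual misses that were caught
baseline = (tn + fp) / total   # accuracy from always predicting "N"

print(f"{accuracy:.1%} {sensitivity:.1%} {specificity:.1%} {baseline:.1%}")
# 92.6% 72.7% 96.5% 83.8%
```

Since most QBs fail, always guessing "N" would already score 83.8 percent on this set, so the model's value lies in the successes it identifies.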

However, this is a retrodictive look. A better test comes by making predictions on withheld data.

When I tested the model on the held back set, it correctly predicted 13 misses out of 13, and three successes out of four, for a total of 16 of 17 correct predictions (94.1 percent).

              Predicted
              N     Y
Actual   N    13    0
         Y    1     3

This validated these parameters, so I rebuilt the model on the full data set using the same four variables to get a more robust model. You’ll notice the p-values improved for draft position, velocity, and 3-cone…more data helps! The exception is final-season AYA, whose p-value slipped from 0.0381 to 0.0509. Surprisingly, that means velocity and 3-cone became more significant than final-season AYA in the model built on the full data set. Perhaps the velocity piece shouldn’t come as a surprise. After all, Peyton Manning put up the worst AYA of his career (5.0) in his final NFL season with a shot arm.

Parameter     Wald    P-value
Log(D.Pos)    11.39   0.0007
Vel           5.43    0.0197
Cone          4.72    0.0298
Last.AYA      3.81    0.0509
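The Wald and P-value columns are two views of the same number. As a sketch, assuming each Wald statistic is a chi-square value with one degree of freedom (the usual case for a single coefficient, though the article does not say so explicitly), the p-value is the upper tail of that distribution:

```python
import math

def wald_p_value(wald_chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(wald_chi2 / 2.0))

full_model = {"Log(D.Pos)": 11.39, "Vel": 5.43, "Cone": 4.72, "Last.AYA": 3.81}
for name, wald in full_model.items():
    print(f"{name:<11}{wald_p_value(wald):.4f}")
```

The computed values land within rounding distance of the published column, because the Wald statistics themselves are shown to only two decimals.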

The final model only missed on these six players:

Name            Actual    Predicted
Mike Reilly     N         Y
Jake Locker     N         Y
Tyrod Taylor    Y         N
Austin Davis    Y         N
Kirk Cousins    Y         N
Nick Foles      Y         N

Had this model been developed and in use back in 2011, it could have told the Minnesota Vikings to avoid spending the 12th overall pick on Christian Ponder. You can find the full results at the bottom.

2016 Class

So how does the model apply to the 2016 class? Here are their numbers:

Name                  Last.AYA  Cone  D.Pos  Log(D.Pos)  Vel  P(Success)
Carson Wentz          8.7       6.86  2      0.7         57   99.36%
Jared Goff            9.4       7.17  1      0.0         58   99.26%
Paxton Lynch          9.4       7.14  26     3.3         59   81.74%
Vernon Adams          11.2      6.82  330    5.8         53   14.97%
Christian Hackenberg  7.2       7.04  51     3.9         56   14.88%
Josh Woodrum          7.3       6.74  330    5.8         56   10.84%
Kevin Hogan           10        6.9   162    5.1         53   10.68%
Brandon Allen         9.9       7.06  201    5.3         55   8.18%
Jacoby Brissett       7.1       7.17  91     4.5         56   3.27%
Dak Prescott          8.7       7.11  135    4.9         54   2.80%
Cody Kessler          8.5       7.32  93     4.5         55   1.76%
Jeff Driskel          9.4       7.19  207    5.3         52   0.53%
Nate Sudfeld          9.2       7.42  187    5.2         54   0.35%
Brandon Doughty       10.4      7.49  223    5.4         53   0.22%
Joel Stave            6.5       7.29  330    5.8         56   0.21%
Connor Cook           8.1       7.21  100    4.6         50   0.17%
Cardale Jones         8.3       -     139    4.9         -    -
Jake Rudock           7.7       7.06  191    5.3         -    -
Trevone Boykin        9.5       -     330    5.8         55   -
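The Log(D.Pos) column can be reproduced from D.Pos. In this sketch the natural log is an assumption that happens to match the rounded table values, and treating 330 as the stand-in pick for undrafted players is inferred from the table, not stated in the text.

```python
import math

UNDRAFTED_PICK = 330  # value the tables show for undrafted players (inferred)

def log_draft_position(pick=None) -> float:
    """Natural log of overall pick; undrafted players use the 330 placeholder."""
    return math.log(UNDRAFTED_PICK if pick is None else pick)

for name, pick in [("Jared Goff", 1), ("Carson Wentz", 2),
                   ("Paxton Lynch", 26), ("Vernon Adams", None)]:
    print(name, round(log_draft_position(pick), 1))
# Jared Goff 0.0
# Carson Wentz 0.7
# Paxton Lynch 3.3
# Vernon Adams 5.8
```

The log compresses the scale: the gap between picks 1 and 26 (3.3) is larger than the gap between pick 26 and going undrafted (2.5), so early picks are separated far more than late ones.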

Carson Wentz actually leads Jared Goff by the slimmest of margins, but both are almost mortal locks to meet the success criteria. Paxton Lynch is the only other QB favored to do so from this class.

It’s interesting to see Vernon Adams as the fourth most likely to succeed, right in line with Justin Winn’s post-combine QB rankings, despite going undrafted and overlooked. His 3-cone time really helped, along with an AYA of 11.2 in his only college season. Perhaps I need to bake in a college experience factor and see if that has any predictive power. If it does, that could temper expectations for Adams.

Connor Cook came in fourth in the final RotoViz Scouting Index, then was selected by the Oakland Raiders at pick 100. Oops. He shows up dead last among the 2016 QBs for whom all the data was available. His ball velocity of only 50 MPH really hurts his chances. Add in questions about his leadership, and he’s a guy to avoid like the plague.

Moving Forward

I expect more variables to pop up as significant as the data set grows over time. Age, for example, was significant but also pushed the model into overfitting territory. Weight also showed up as significant, with lighter QBs actually faring better. This is likely because weight is correlated with a QB’s rushing prowess, which the 3-cone time likely already represents, so I didn’t add it as an extra model term; doing so would introduce an unacceptable level of multicollinearity beyond what already exists.

Name                  Year  Last.AYA  Cone  D.Pos  Log(D.Pos)  Vel  P(Success)
Cam Newton            2011  11.2      6.92  1      0.0         56   99.86%
Marcus Mariota        2015  11.5      6.87  2      0.7         56   99.80%
Carson Wentz          2016  8.7       6.86  2      0.7         57   99.36%
Jared Goff            2016  9.4       7.17  1      0.0         58   99.26%
Blake Bortles         2014  9.6       7.08  3      1.1         56   95.85%
Mark Sanchez          2009  9.4       7.06  5      1.6         57   95.40%
Colin Kaepernick      2011  8.6       6.85  36     3.6         59   92.05%
Jameis Winston        2015  7.7       7.16  1      0.0         55   90.50%
Paxton Lynch          2016  9.4       7.14  26     3.3         59   81.74%
Joe Flacco            2008  8.6       6.82  18     2.9         55   79.66%
Andy Dalton           2011  9.9       6.93  35     3.6         56   76.56%
Teddy Bridgewater     2014  10.3      7.17  32     3.5         58   74.64%
Jake Locker           2011  6.6       6.77  8      2.1         54   70.94%
Mike Reilly           2009  10.1      6.76  330    5.8         58   65.56%
Russell Wilson        2012  11.8      6.97  75     4.3         55   65.44%
Josh Freeman          2009  7.8       7.11  17     2.8         57   53.10%
Jimmy Garoppolo       2014  10        7.04  62     4.1         56   45.24%
Brandon Weeden        2012  8.6       7.36  22     3.1         59   44.30%
Chandler Harnish      2012  9.1       6.78  253    5.5         57   41.79%
Kirk Cousins          2012  8         7.05  102    4.6         59   36.48%
E.J. Manuel           2013  8.8       7.08  16     2.8         54   35.87%
Austin Davis          2012  7.6       6.73  330    5.8         58   31.65%
Logan Thomas          2014  6.5       7.05  120    4.8         60   23.40%
Case Keenum           2012  10.6      6.87  330    5.8         55   20.31%
Christian Ponder      2011  7         6.85  12     2.5         51   18.20%
Pat Devlin            2011  11.5      7.08  330    5.8         56   16.67%
Vernon Adams          2016  11.2      6.82  330    5.8         53   14.97%
Christian Hackenberg  2016  7.2       7.04  51     3.9         56   14.88%
Nick Foles            2012  7.6       7.14  88     4.5         58   14.84%
Bryce Petty           2015  9.6       6.91  103    4.6         53   13.35%
Scott Tolzien         2011  9.4       6.84  330    5.8         55   12.52%
Paul Smith            2008  9.5       7.02  330    5.8         57   12.28%
Josh Woodrum          2016  7.3       6.74  330    5.8         56   10.84%
Kevin Hogan           2016  10        6.9   162    5.1         53   10.68%
Keith Wenning         2014  9.1       7.07  194    5.3         56   8.22%
Brandon Allen         2016  9.9       7.06  201    5.3         55   8.18%
Tyler Bray            2013  8.3       7.2   330    5.8         59   5.60%
Matt Scott            2013  7.1       6.69  330    5.8         54   4.78%
Brett Hundley         2015  8.6       6.93  147    5.0         53   4.33%
Kevin O’Connell       2008  6.8       7.01  94     4.5         55   4.22%
Brian Brohm           2008  8.6       7.13  56     4.0         53   4.05%
Jacoby Brissett       2016  7.1       7.17  91     4.5         56   3.27%
Tyrod Taylor          2011  9.5       6.78  180    5.2         50   3.16%
Levi Brown            2010  8.5       7.07  330    5.8         56   3.01%
Dak Prescott          2016  8.7       7.11  135    4.9         54   2.80%
Brett Smith           2014  7.4       6.98  330    5.8         56   2.66%
Drew Willy            2009  7.6       7.18  330    5.8         58   2.45%
Sean Mannion          2015  6.9       7.29  89     4.5         57   2.36%
Stephen Morris        2014  8.5       7.36  330    5.8         59   2.34%
Rhett Bomar           2009  5.7       6.91  151    5.0         55   2.22%
Pat White             2009  7.1       7.06  44     3.8         52   1.97%
Tyler Wilson          2013  8         7.22  112    4.7         55   1.92%
Cody Kessler          2016  8.5       7.32  93     4.5         55   1.76%
Tom Savage            2014  7.6       7.33  135    4.9         57   1.69%
A.J. McCarron         2014  9.8       7.18  164    5.1         53   1.66%
Ryan Nassib           2013  8.1       7.34  110    4.7         56   1.65%
Curtis Painter        2009  5.7       7     201    5.3         56   1.49%
Ricky Stanzi          2011  9.4       6.95  135    4.9         50   1.45%
James Vandenberg      2013  5.2       6.95  330    5.8         57   1.37%
Jeff Mathews          2014  7.8       7.14  330    5.8         56   1.24%
Landry Jones          2013  7.9       7.12  115    4.7         53   1.14%
Chase Daniel          2009  8.2       7.28  330    5.8         57   1.11%
Chad Henne            2008  6.7       7.17  57     4.0         53   0.93%
Tajh Boyd             2014  9.8       7.33  213    5.4         54   0.78%
Cody Fajardo          2015  5.8       6.95  330    5.8         55   0.69%
Brandon Bridge        2015  6.1       7.18  330    5.8         57   0.55%
Jeff Driskel          2016  9.4       7.19  207    5.3         52   0.53%
T.J. Yates            2011  8         6.96  330    5.8         52   0.53%
Tom Brandstater       2009  6.7       6.93  330    5.8         53   0.48%
Nathan Enderle        2011  6.3       7.13  160    5.1         54   0.43%
Max Hall              2010  8.8       7.07  330    5.8         52   0.43%
Colby Cameron         2013  8.7       6.98  330    5.8         51   0.42%
Nate Sudfeld          2016  9.2       7.42  187    5.2         54   0.35%
Connor Shaw           2014  10.1      7.07  330    5.8         50   0.34%
Dan LeFevour          2010  8.1       6.93  181    5.2         49   0.29%
Dustin Vaughan        2014  9.1       7.25  330    5.8         53   0.28%
Brandon Doughty       2016  10.4      7.49  223    5.4         53   0.22%
Bryn Renner           2014  7.5       7.22  330    5.8         54   0.21%
Joel Stave            2016  6.5       7.29  330    5.8         56   0.21%
Mike Kafka            2010  6.5       6.96  330    5.8         52   0.20%
Connor Cook           2016  8.1       7.21  100    4.6         50   0.17%
Tony Pike             2010  8.4       7.06  204    5.3         49   0.13%
John Parker Wilson    2009  6.5       7.53  330    5.8         58   0.13%
Josh Johnson          2008  12.6      7.56  160    5.1         49   0.10%
Kellen Moore          2012  9.7       7.41  330    5.8         52   0.09%
Shane Carden          2015  7.9       7.17  330    5.8         51   0.07%
Jevan Snead           2010  6.1       7.08  330    5.8         52   0.07%
David Fales           2014  8.8       7.55  183    5.2         53   0.07%
John Skelton          2010  8.6       7.17  330    5.8         50   0.07%
Tim Hiller            2010  6.1       7.1   330    5.8         52   0.06%
Sean Canfield         2010  7.6       7.26  239    5.5         51   0.05%
Anthony Boone         2015  6         7.47  330    5.8         56   0.05%
Graham Harrell        2009  9         7.45  330    5.8         52   0.04%
Stephen McGee         2009  6.9       7.34  330    5.8         53   0.04%
Zac Robinson          2010  6.1       7.24  250    5.5         52   0.04%
Matt Flynn            2008  6.5       7.21  209    5.3         50   0.02%
Erik Ainge            2008  7.1       7.51  162    5.1         52   0.02%
Ryan Lindley          2012  7.3       7.52  185    5.2         52   0.02%
Mike Glennon          2013  6.9       7.49  73     4.3         49   0.01%
Jerry Lovelocke       2015  6.8       7.47  330    5.8         51   0.01%
John David Booty      2008  7         7.79  137    4.9         51   0.00%
Colt Brennan          2008  8.5       -     186    5.2         44   -
Nate Davis            2009  9.4       -     171    5.1         56   -
Colt McCoy            2010  7.5       -     85     4.4         56   -
Jarrett Brown         2010  6.6       -     330    5.8         50   -
Ryan Mallett          2011  9.7       -     74     4.3         58   -
Geno Smith            2013  9.2       -     39     3.7         55   -
Zac Dysert            2013  7.2       -     234    5.5         59   -
Bryan Bennett         2015  -         7.13  330    5.8         60   -
Blake Sims            2015  9.2       -     330    5.8         42   -
Cardale Jones         2016  8.3       -     139    4.9         -    -
Jake Rudock           2016  7.7       7.06  191    5.3         -    -
Trevone Boykin        2016  9.5       -     330    5.8         55   -

  1. It was right there in Chris’ article…yeah, I took a shortcut
  2. meaning the best fit model that also isn’t overfit
  3. the log of it
By RotoDoc | @RotoDoc