Ja’Marr Chase has been lauded by many as one of the best wide receiver prospects to enter the NFL in recent memory. With a ranking of 19 in the current RotoViz Dynasty Rankings, he’s our top prospect from the 2021 class. Why are we, and many others, so optimistic about his future? It’s pretty simple. He’s projected as an early Round 1 pick, made an impact at an early age, and had one of the most productive WR seasons in the history of college football.
When considering prospects, we often ask whether they “check all of the boxes.” It’s great when players like Chase do, but we know there are plenty of successful players who only checked a handful. Some boxes are more important than others, and in some cases the combination of checks matters more than the total.
Regression trees are a great way to help us reframe this idea. In prior seasons, Kevin Cole and Anthony Amico used regression trees to consider the importance of production and age in predicting NFL outcomes. Regression trees help us to understand the mixture of attributes that tend to drive NFL performance and provide a visual way to understand how these attributes interact.
What Is A Regression Tree?
So what in the world is a regression tree? For the purposes of this article, think of a regression tree as a series of questions. The response to each question leads to another question, which leads to another question. This process repeats until the final question is answered, at which point we arrive at an estimate of the points per game the prospect is expected to score over the first three seasons of his NFL career.
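To make the “series of questions” idea concrete, here is a minimal Python sketch of how a fitted tree routes a prospect to a points per game estimate. The features, thresholds, and leaf values below are invented purely for illustration; they are not the actual splits from the tree discussed in this article.

```python
# Toy illustration of a regression tree as a series of questions.
# All features, thresholds, and leaf values are hypothetical.
def predict_ppg(draft_pick: int, breakout_age: float, dominator: float) -> float:
    """Route a prospect through yes/no questions to an estimated PPG
    over his first three NFL seasons."""
    if draft_pick <= 32:                  # Question 1: drafted in Round 1?
        if breakout_age <= 19.5:          # Question 2: broke out early?
            return 12.0                   # leaf: estimated points per game
        return 9.0
    if dominator >= 0.35:                 # Question 2: big share of team production?
        return 8.0
    return 5.0

print(predict_ppg(draft_pick=5, breakout_age=19.0, dominator=0.45))  # 12.0
```

Each branch ends in a leaf, and the leaf value is the model’s points per game estimate for every player who answers the questions the same way.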
To create the regression tree used in this article, I gathered collegiate, age, and draft data for every WR included in the Prospect Box Score Scout. I filtered this listing to only include players that have logged three NFL seasons. I then worked through different combinations of measures until I arrived at a mixture that was predictive of NFL fantasy points and easy to visually follow.
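As a rough sketch of that data-prep step, the snippet below assumes the Prospect Box Score Scout data has been exported to a CSV with hypothetical column names such as nfl_seasons and ppg_first3; the actual column names and the final mixture of measures are not spelled out here.

```python
import pandas as pd

# Hypothetical export of the Prospect Box Score Scout WR data.
wrs = pd.read_csv("wr_prospects.csv")

# Keep only players who have already logged three NFL seasons, so the target
# (points per game over the first three years) is fully observed.
wrs = wrs[wrs["nfl_seasons"] >= 3]

# One possible mixture of collegiate, age, and draft measures (illustrative only).
features = ["draft_pick", "final_season_age", "dominator_rating", "yards_per_team_attempt"]
X, y = wrs[features], wrs["ppg_first3"]
```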
The specifics of how the tree is built fall outside the scope of this article. But for a little more background, I split my listing of players into training and test sets. I then fed the training set into an algorithm that built the tree by repeatedly splitting players into groups based on statistical thresholds. Once the tree was built, I fed the test set into the model and compared each player’s predicted result (as calculated by the regression tree) to his actual result. I repeated this process, workshopping different mixtures of statistics until the differences between predicted and actual results were small. On average, the model’s points per game predictions landed approximately 2.5 points above or below actual values.
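Continuing from the data-prep sketch above (which defined X and y), here is one way that train/test loop might look with scikit-learn’s DecisionTreeRegressor. The depth and leaf-size settings are assumptions for illustration, not the article’s actual tuning.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hold out a test set, fit the tree on the training set, then compare
# predicted PPG to actual PPG for the held-out players.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10, random_state=42)
tree.fit(X_train, y_train)

preds = tree.predict(X_test)
print(mean_absolute_error(y_test, preds))  # the article reports an average miss of roughly 2.5 PPG
```

Repeating this with different feature mixtures, and keeping whichever tree stays accurate while remaining easy to read, mirrors the workshopping process described above.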
The Results
Before we walk through the regression tree, there are a couple of key qualifiers worth mentioning. Firstly,