What Metrics Are Most Important for WR Evaluation? Machine Learning Provides Several Surprising Answers – The Wrong Read, No. 68

Blair Andrews
April 3, 2021

In the 68th edition of the Wrong Read, Blair Andrews uses a random forest to help you find the most important metrics for wide receiver evaluation.

A couple of years ago, I did a fun study called the Ultimate WR Prospect Metrics Guide to determine which metrics best correlated with NFL success. That was a good first step, but it had a couple of holes. First, it didn’t examine any physical measurements. Second, it assumed linear relationships for all the metrics involved. That’s not always the case. The study on WR hand size in the Wrong Read, No. 61 is a good example.

Non-Linearity and Regression Trees

With those things in mind, perhaps we can find a better way to determine which WR prospect metrics we should be paying the closest attention to. Dave Caban recently undertook a similar study using a regression tree. The great thing about regression trees is that they don’t assume variables have a linear relationship with whatever you’re trying to predict. And they make this fact easy to understand by translating each variable into a threshold for success.

In Dave’s regression tree, draft position is the most important variable. But rather than giving an equation for turning draft position into a projection, the regression tree tells us that WRs drafted in the top 105 picks tend to have greater success.

Intuitively, this makes sense — a WR picked at the end of the third round should probably not be viewed as a significantly worse asset compared to one picked at the end of the second round. But there’s a big difference between being a third-round WR and a sixth-round WR.

Later we see another split on draft position, with players picked in the top 30 going on to even greater success. This also makes some sense, as first-round WRs often get the most early opportunity. With these two splits, you divide the draft almost perfectly into three distinct sections: Day 1, Day 2, and Day 3. In other words, even draft position doesn’t appear to be linear.
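To make the idea of a threshold concrete, here is a minimal sketch of how a regression tree picks one. It is not Dave's model: the picks and scores are made-up numbers, and `best_split` is a hypothetical helper written for illustration. It tries every candidate cutoff and keeps the one where the two group averages fit the data best.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(picks, scores):
    """Return (threshold, error) for the rule 'pick <= threshold' that
    minimizes the combined squared error of the two resulting groups."""
    best = None
    # Skip the largest pick: splitting there would leave the right group empty.
    for t in sorted(set(picks))[:-1]:
        left = [s for p, s in zip(picks, scores) if p <= t]
        right = [s for p, s in zip(picks, scores) if p > t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total)
    return best

# Made-up data for illustration only: draft pick and fantasy points per game.
picks = [5, 20, 40, 70, 100, 120, 150, 180, 220, 250]
scores = [14.0, 13.0, 10.0, 9.5, 9.0, 4.0, 3.5, 4.5, 3.0, 3.5]

first_threshold, _ = best_split(picks, scores)

# A tree then repeats the search inside each group; here, the early picks.
early = [(p, s) for p, s in zip(picks, scores) if p <= first_threshold]
second_threshold, _ = best_split([p for p, _ in early], [s for _, s in early])

print(first_threshold, second_threshold)
```

On this toy data the first split lands at pick 100 and the second at pick 20, which mimics the three-tier pattern described above. The exact cutoffs are an artifact of the invented numbers, not a finding.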

Building on Regression Trees

The regression tree helps us understand how different variables interact and how each relates to the outcome we want to predict. And we can go even further. What if instead of growing one tree, we grow 500 trees, each with a random subset of the data and a random subset of the variables? This technique is called, fittingly, a random forest. It lets us better isolate different variables to reduce noise, and it also enables us to include more variables in our results.
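The sketch below shows the mechanics in miniature, under heavy simplifying assumptions. Each "tree" is only a one-split stump, the data is synthetic, and the names `fit_stump`, `fit_forest`, and `predict` are hypothetical. A real random forest grows much deeper trees and typically draws a fresh random subset of variables at every split rather than once per tree.

```python
import random
from statistics import mean

def fit_stump(rows, targets, features):
    """Fit a one-split tree using only the given feature indices.
    Returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            lm, rm = mean(left), mean(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # every candidate feature was constant in this sample
        m = mean(targets)
        return (features[0], float("inf"), m, m)
    return best[1:]

def fit_forest(rows, targets, n_trees=500, n_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample of the rows
    and a random subset of the columns."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), n_features)    # random subset of variables
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Average the predictions of every stump in the forest."""
    return mean(lm if row[f] <= t else rm for f, t, lm, rm in forest)

# Synthetic data: column 0 is draft pick, column 1 is pure noise.
data_rng = random.Random(1)
rows = [[pick, data_rng.random()] for pick in range(10, 260, 10)]
targets = [12.0 if pick <= 100 else 4.0 for pick, _ in rows]

forest = fit_forest(rows, targets)
early_pick = predict(forest, [20, 0.5])
late_pick = predict(forest, [200, 0.5])
print(round(early_pick, 2), round(late_pick, 2))
```

Because roughly half the stumps only ever see the noise column, the forest's predictions are pulled toward the overall average, yet the early pick still projects higher than the late one. Averaging many imperfect trees is what makes the technique robust.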

There are some trade-offs. What you gain in robustness and comprehensiveness, you lose in interpretability. Because we’re growing 500 trees, we can’t visualize just one as a representative sample. However, we can easily use a random forest model to measure relative variable importance.

There are many ways to measure variable importance, but one of my favorites, and one of the most intuitive, is the permutation method. The method gets its name because you measure variable importance by randomly shuffling each variable’s values and seeing what effect that has on overall model accuracy. If replacing actual values with random values has a negligible effect, that means the variable isn’t very important. A large negative effect means the variable is important.[1] We can do this multiple times to find the average decrease in model accuracy for each metric, which gives us a robust and illuminating ranking of variable importance.
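Here is a minimal sketch of the shuffling procedure, not the code behind this study. The `permutation_importance` helper, the stand-in `model`, and the data are all assumptions made for illustration. Importance is reported as the average increase in mean squared error across repeated shuffles of one column at a time.

```python
import random
from statistics import mean

def mse(model, rows, targets):
    """Mean squared error of the model's predictions."""
    return mean((model(r) - y) ** 2 for r, y in zip(rows, targets))

def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """For each column, shuffle its values across rows and record the
    average increase in MSE relative to the unshuffled baseline."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(mean(increases))
    return importances

# Synthetic data: column 0 is draft pick, column 1 is pure noise.
data_rng = random.Random(1)
rows = [[pick, data_rng.random()] for pick in range(10, 260, 10)]
targets = [12.0 if pick <= 100 else 4.0 for pick, _ in rows]

def model(row):
    """Stand-in for a fitted model: it uses draft pick and ignores the noise column."""
    return 12.0 if row[0] <= 100 else 4.0

importances = permutation_importance(model, rows, targets)
print(importances)
```

Shuffling the draft pick column wrecks the predictions, so its importance is large. Shuffling the noise column changes nothing, so its importance is zero. With a real fitted model the error is usually measured on data the model did not train on, which this toy example skips.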

What Are the Most Important WR Metrics?

The chart below measures relative importance in terms of the increase in mean squared error after random shuffling. In other words, how much error do random values add to the overall model compared to actual values? Higher numbers indicate that we lose more accuracy with random values. So higher numbers mean a metric is more important.



Footnotes
[1] If shuffling has a large positive effect on accuracy, that would imply that random values give you a more accurate picture than the actual values, which makes little sense. This doesn’t mean the actual values are actively misleading. Rather, it means that the actual values might as well be random, so an increase in model accuracy (a decrease in mean squared error) amounts to the same as no change.


Blair Andrews

Managing Editor, Author of The Wrong Read, Occasional Fantasy Football League Winner. All opinions are someone else's.
