Hello and welcome to First Line Sports Analytics on YouTube! In this video, we’ll be discussing the question: what makes a metric helpful? We’ll dive into the concepts of directness, lurking variables, and more.
Let’s imagine walking into a workshop. The first two things you see are a table saw and a hammer. These are two completely different tools that perform completely different tasks. But which is more helpful?
The short answer? It depends on the kind of task at hand, how much you’ll need the tool, whether you’ll have access to power, and so on. If the task is to hammer a bunch of nails, you’re gonna want the hammer. And if the task is to cut up some wood, you’re gonna want the table saw. You could make the argument that the table saw has the potential to be more helpful because it’s powered, it can cut a lot of wood that can be used for other tasks, and so on. But the point stands: if you need to hit a nail into a piece of wood, you need the hammer.
This situation is very similar to that of sports analytics. The hundreds of metrics in each sport are all different tools that perform certain tasks. And their helpfulness is just as complicated and context dependent as the hammer and the table saw.
Now, since helpfulness has already proven tricky to pin down, let’s lay out a definition that we can use in this context. A helpful metric is one that provides information about and insight into a given situation in sports. For example, 3-point shooting percentage provides some amount of insight into a player’s 3-point shooting.
But what makes something more or less helpful? All metrics are helpful in some way, but in order to tell how seriously we should take them, we need to figure out a way to answer this question. So, what we’ve found over our years of research is that how helpful a metric is generally depends on a mix of two things: directness and potential.
Directness refers to how much a metric is influenced by outside factors. The less it’s influenced, the more direct it is. And the more it’s influenced, the more indirect it is.
This is very similar to the statistical concept of lurking variables. Lurking variables usually come up in the context of statistical modeling. A lurking variable is one that exists outside of the model but has a substantial, and possibly significant, impact on the relationship in question.
A classic example of this can be found in a 1976 study trying to find the best spot in the oven to cook a meatloaf. Exciting, right? They measured the internal temperature of half of the meatloaves while in the oven and then measured how well they all cooked afterward. And the study did in fact find that there were certain spots that were significantly better than others. But, upon later review, it was found that the meatloaves that cooked worse were also the ones that had a thermometer left in them while cooking. In this example, the presence of a thermometer was a lurking variable.
To put this in terms of sports, this would be like looking at the angles at which two pitchers’ fastballs entered the strike zone over the course of a season. And you find that which pitcher threw the ball was a significant predictor of the entry angle. Great! The problem is that you later look back and realize that your two pitchers were 6-foot-10 Randy Johnson and 5-foot-11 Pedro Martinez. Here, height is the obvious lurking variable. Whoops.
The difference between the ideas of lurking variables and directness is that lurking variables will have unacknowledged influences on the conclusions reached after the study. Directness refers to outside influences on the actual value of a single metric. What does that mean? Well, it’s probably easiest to explain through examples.
Metrics that will consistently be the most direct are straight-attempt metrics. Examples include passes thrown, shots on goal, balls hit in your direction, three-pointers taken, and so on. There are occasionally exceptions, but these metrics are typically very directly descriptive of a very specific situation. And any fluctuations in these metrics are fairly easily explained.
For example, if a QB doesn’t attempt a lot of passes for a few games, it’s due to a combination of the coach’s game plan and the amount of time their team had the ball. Or if Steph Curry doesn’t take that many threes for a few days, it’s due to a combination of playing time and how frequently he was able to get his shot off.
Metrics that will consistently be the most indirect are counting stats that rely on other events. Two of the best examples are interceptions in football and defensive rebounds in basketball.
First, interceptions. Let’s say that a defender finishes a season with 6 INTs. In order for him to intercept an opposing quarterback’s pass and record ONE of those picks, the following is necessary: The defender needs to be one of the 11 players on the field for his team and in position to make the pick. Then, the QB needs to mistakenly throw the ball in just the right place for the defender to catch it, which requires the QB to throw it to the receiver that the defender is covering and not another one. And then the defender needs to have good enough hands to catch it and NOT drop it. But that same ability, which the QB is likely aware of, makes it even less likely that the ball gets thrown his way.
Or will his ability, instead, lead him to guard better receivers that get targeted more often? All of this goes into that single little number 6. Obviously, incredibly indirect.
Next, defensive rebounds. Let’s say Shaq finished a game having notched 12 defensive rebounds. In order to get one of them, the other team MUST miss one of their shots, which happens with some probability. And, in order to be in position to grab the ball, Shaq most likely had to be close to the hoop, which also happens with some probability. And he has to have the strength and size to push and block others out of the way and make sure nobody else grabs it, which he manages with some probability.
What makes interceptions indirect is how seemingly low the odds are that any one defensive player will record one. What makes defensive rebounds indirect is something different: the very real and present odds of everything happening around them. There is a percentage of shots made by the opponent while Shaq was on the floor. There is a percentage of time that Shaq was within 5 feet of the basket at the end of plays. There is a comparison of size, strength, wingspan, etc. with the other players around the hoop and close to Shaq. Defensive rebounds are so fabulously indirect that the total or per-game average will likely tell you more about the other elements of a player’s situation than about their actual rebounding ability.
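To make that chain of odds concrete, here is a minimal Python sketch. It is only an illustration: the function name, the idea of simply multiplying the three rates together, and every number in it are assumptions made up for this example, not real data on Shaq or anyone else.

```python
def expected_defensive_rebounds(opponent_shots, opponent_miss_rate,
                                near_basket_rate, win_rate_when_near):
    """Rough expected rebound count under a simplified, hypothetical model.

    Assumes the three factors are independent, which real games are not.
    """
    # Only missed shots can become defensive rebounds.
    available_misses = opponent_shots * opponent_miss_rate
    # The player must be near the hoop AND win the ball once there.
    return available_misses * near_basket_rate * win_rate_when_near


# The same hypothetical rebounder (same positioning, same ability to win
# the ball) dropped into two different situations.
vs_poor_shooters = expected_defensive_rebounds(60, 0.60, 0.70, 0.50)
vs_good_shooters = expected_defensive_rebounds(60, 0.40, 0.70, 0.50)

print(round(vs_poor_shooters, 1), round(vs_good_shooters, 1))
```

With these made-up inputs, the identical rebounder goes from roughly 12.6 expected rebounds to roughly 8.4 just because the opponent shoots better. Nothing about the player changed, only the situation around them.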
That leads us to the other factor that influences a metric’s helpfulness: potential. Meaning, how helpful a metric will be if used perfectly in the most correct situation. As an example, on-base percentage in baseball is a little helpful if you’re trying to use it to make an argument about a player’s overall ability, but much more helpful if you’re discussing their ability to get on base. Where potential comes in is that, as I think we can all agree, explaining a player’s overall ability is much more helpful than explaining just their ability to get on base. Like the tools we were thinking about earlier, if you need to hammer a nail, you should use the hammer, but it won’t complete as big of a job as the table saw and therefore has a lower potential to be helpful.
A metric has higher potential if it is most helpful in an especially general situation.
So, how do these two factors come together to judge a metric’s helpfulness? Well, in general a metric will be more helpful if it is direct and has high potential. Something that is easily interpreted and can explain an especially general situation. It’s rare that a metric achieves this, though.
To explain this further, let’s discuss catch-all metrics. These can also be known as single-value metrics, player-value metrics, etc. They are calculations that lead to a single value that is meant to explain an overarching situation. The wording and interpretation of that will change metric-to-metric, but that’s the general idea. Examples include WAR, VORP, PIPM, Offensive and Defensive Rating, Passer Rating, and more.
What you’ll usually find with catch-all metrics is that they will sacrifice directness to achieve a higher potential. Take something like WAR in baseball, which ends up being one of the least direct metrics in sports. But, like many other catch-alls, it has the highest possible potential: quantitatively explaining how valuable a player is to their team. The potential is so astronomically high that many people can see past the lack of directness.
WAR has directness issues for similar reasons to many other catch-all metrics, though. It combines dozens of other metrics together with elaborate calculations in order to come to its final value. When any metrics are combined, the final value possesses a blend of the original levels of directness. Remember that directness is about interpretive ability, and when multiple metrics are combined, the interpretation gets watered down.
For a simple example, let’s look at OPS, or on-base plus slugging, in baseball. It is the sum of on-base percentage and slugging percentage. Both metrics have their own interpretations and levels of helpfulness, but what is the interpretation of the sum of a player’s frequency of getting on base and their rate of gaining bases? Advocates for the metric would argue that their sum creates a brand new value with its own interpretation. Whether you agree with that or not, the fact of the matter is that a new interpretation is needed because combining these two specific metrics in such a way requires it.
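Here is a short Python sketch of that calculation, using the standard definitions of on-base percentage and slugging percentage. The function names and both stat lines are hypothetical, invented only to illustrate the interpretation problem.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP: times on base divided by the plate appearances that count."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """OPS is simply the sum of the two rates."""
    return obp + slg


# Hypothetical stat line for one made-up hitter.
obp = on_base_percentage(hits=150, walks=50, hit_by_pitch=5,
                         at_bats=500, sac_flies=5)
slg = slugging_percentage(singles=100, doubles=30, triples=5,
                          home_runs=15, at_bats=500)
print(round(ops(obp, slg), 3))

# Two made-up profiles that land on the same OPS.
patient_hitter = ops(obp=0.400, slg=0.400)
free_swinger = ops(obp=0.300, slg=0.500)
print(patient_hitter == free_swinger)
```

The two made-up profiles at the end are the point: one gets on base far more often, the other hits for far more power, and the sum cannot tell them apart. That is the watered-down interpretation in action.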
To think about the tools again, what these high-potential, low-directness catch-alls will do is put someone who doesn’t know how to use it in front of the table saw. The tool still has a higher potential helpfulness than the hammer, but the person doesn’t have any knowledge about how the machine works, so the tool’s potential goes to waste. Having especially low directness makes it difficult to figure out what a metric is actually telling you. Regardless of how helpful it could theoretically be, it’s rendered essentially useless.
So, can there be a metric that’s direct and high-potential? Theoretically, yes. In order to be high-potential, it would need to be a catch-all-type metric that aims to sum up the value of a player, rotation, etc. But to be the most direct, it would have to be rooted in some single measurement or observation, just like the straight-attempt metrics. This will most likely come from player-tracking data. Somehow, it would have to figure out approximately how integral the individual or individuals are to their team by looking at every single move they make, how those moves fit in with their team, how they respond to the other team, and how they compare to other individuals doing similar things.
But even then, it won’t be perfect. An important thing to remember about metrics in sports analytics is that they are tools, not materials. What we’re doing here is looking to maximize helpfulness in coming to conclusions, NOT offering conclusions. While entirely different tools, the hammer and the table saw have one thing in common: without the wood, they’re just junk. And if you don’t consider that the data going into these numbers are collected from the extraordinary actions of human beings with creativity, flaws, and the ability to improvise, they are just numbers.