By Chris Sinon
Sports fans love to argue. Who’s the best player? Who wins tonight’s game? What will Oregon’s jerseys look like?
With January being an especially great time of year for sports, these debates will only get louder and more aggressive. People nationwide will turn to statistics for help justifying their picks for this year’s championship teams and all-star game selections.
For most of the 20th century, the stats used by the sports community were fairly simple, often just raw counts (like goals or assists) or simple averages (like shooting percentages). While traditional sports stats make for great trivia and sporting lore, teams and fans have realized that these numbers alone won’t help them project future performance or predict team success. A statistical revolution that any scientist could respect is underway in professional sports, and here are a few examples:
- Baseball: Batting Average vs. On Base Percentage
The most famous example of the statistical revolution in pro sports was popularized by the book Moneyball and the film of the same name. You need to score runs to win baseball games, and the players who are the best at hitting score lots of runs. Success as a hitter has traditionally been represented by a player’s Batting Average (number of hits divided by number of at-bats), and MLB teams sought out and paid more money for players with the highest Batting Averages.
However, hitters can also get to first base in other ways, most commonly by a walk, and this isn’t accounted for by Batting Average. A new metric, On-Base Percentage, was created to track how often a player reaches base by any means, and it fared better at predicting how often a team could put itself in position to score runs. For context, Ted Williams, one of the best baseball players of all time, had a batting average of 0.344. When Williams played in the 1940s and ’50s, his ability to get a hit in roughly one third of his at-bats was considered extraordinary. But his On-Base Percentage of 0.482 shows he was actually reaching base almost half the time. That’s a huge difference when evaluating a player’s ability to score a run for his team.
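To make the difference concrete, here is a minimal Python sketch of the two formulas. The On-Base Percentage formula is the standard one (hits, walks, and hit-by-pitches divided by at-bats, walks, hit-by-pitches, and sacrifice flies); the function names and the sample stat line are hypothetical and chosen only for illustration.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats; walks are ignored entirely."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Times on base divided by plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season line, not a real player's numbers.
avg = batting_average(hits=150, at_bats=500)
obp = on_base_percentage(hits=150, walks=80, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)
print(f"AVG: {avg:.3f}  OBP: {obp:.3f}")
```

In this made-up example the hitter looks like a .300 hitter by Batting Average but reaches base nearly 40 percent of the time once the 80 walks are counted.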
- Hockey: Plus/Minus vs. Corsi Numbers
Great players in any sport have a way of controlling the outcome of a game, but how can that happen in a sport where even the best players play less than half of the time? The plus/minus statistic has been used to get a rough idea of whether a player is an asset or a liability to the team. It works by adding up the goals scored by a player’s team while that player is on the ice and subtracting the goals scored against the team during that same time. Even though this system may credit and discredit players unfairly for goals they have hardly anything to do with, the thought was that over time this stat would reveal trends related to their performance.
It turns out that’s not the case. Plus/minus has been shown to capture too many variables that are simply outside of a player’s control. The best hockey players maximize their team’s ability to win by keeping possession of the puck and creating scoring chances. These qualities can be estimated using a stat called a Corsi Number, which is calculated by adding up all of the team’s shot attempts while the player is on the ice (shots on goal, missed shots, and shots that get blocked, since you can’t shoot if you don’t control the puck) and subtracting the other team’s shot attempts over the same time. Even better, it’s possible to compute a team’s Corsi Number and compare it to a player’s individual Corsi Number to determine how much better a team controls the game when that player is on the ice.
It’s important to note that for a good Corsi Number it doesn’t matter whether shots hit the net or not; your team just needs to attempt them. The statistic assumes that players won’t try shots unless they think they might score and that the percentage of shots that lead to goals will remain fairly stable over time, and both of these assumptions tend to be true.
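The two hockey stats can be sketched in a few lines of Python. This is a simplified illustration, assuming shot attempts have already been tallied (shots on goal, missed shots, and blocked shots all counted as attempts); the function names and the single-game numbers are hypothetical.

```python
def plus_minus(goals_for, goals_against):
    """Goal differential while the player is on the ice."""
    return goals_for - goals_against


def corsi(attempts_for, attempts_against):
    """Shot-attempt differential while the player is on the ice."""
    return attempts_for - attempts_against


# Hypothetical single-game tallies.
team_for, team_against = 50, 55        # whole game, all players
on_ice_for, on_ice_against = 18, 12    # only while our player is on the ice

player_corsi = corsi(on_ice_for, on_ice_against)
team_corsi = corsi(team_for, team_against)
# What the team did in the minutes the player was on the bench.
off_ice_corsi = corsi(team_for - on_ice_for, team_against - on_ice_against)

print(player_corsi, team_corsi, off_ice_corsi)
```

Here the team was out-attempted overall, yet it out-attempted the opponent while this player was on the ice, which is the kind of gap the comparison is meant to reveal. A fuller version would adjust for ice time, which this sketch ignores.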
- Football: Turnover Differential
Here’s an example where a statistic long used to predict success turned out to be a red herring. In football, turning the ball over by interception or fumble is just about the worst thing that can happen to your team, other than signing Terrell Owens. History has shown that football teams that turn the ball over more than they take the ball away end up doing very poorly. Only 6 out of 48 teams to ever win the Super Bowl did so after committing more turnovers during the season than their opponents. This is normally represented as a team’s Turnover Differential, computed similarly to the hockey plus/minus stat (the opponents’ total turnovers minus the team’s own total turnovers, tallied over each week of the season), and this statistic has been used to predict future success.
The problem is that there’s not a lot of evidence to suggest any football team has ever been able to control its Turnover Differential. A lot of this is due to fumbles. Because footballs take unpredictable bounces and there are a variety of situations where a fumble can occur, a team’s likelihood of recovering any given fumble is no better than chance. For this reason, the argument has been made that Turnover Differential may actually be a better measure of a team’s luck each NFL season. In fact, some equations that attempt to predict a team’s future win total assume that a team with a high Turnover Differential one year will experience a regression towards the mean, and the team’s luck will run out next year.
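As a rough illustration of the regression-towards-the-mean idea, the Python sketch below computes a season’s Turnover Differential (with a positive number meaning the team took the ball away more than it gave it up) and then shrinks it toward the league average. The `persistence` value of 0.2 is a hypothetical placeholder, not a figure taken from any published model, and the season totals are made up.

```python
def turnover_differential(takeaways, giveaways):
    """Opponents' turnovers minus the team's own turnovers."""
    return takeaways - giveaways


def regress_toward_mean(observed, league_mean=0.0, persistence=0.2):
    """Project next season by keeping only a fraction of this season's edge.

    persistence=1.0 would mean the stat is pure skill and fully repeatable;
    persistence=0.0 would mean it is pure luck. The default is an assumption.
    """
    return league_mean + persistence * (observed - league_mean)


# Hypothetical season totals.
this_year = turnover_differential(takeaways=30, giveaways=18)
projected = regress_toward_mean(this_year)
print(this_year, round(projected, 1))
```

The smaller the assumed persistence, the more a great Turnover Differential is treated as good fortune rather than something the team can repeat.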
Improving the statistical analysis of sports performance is viewed as a low-cost, high-reward method for gaining an advantage over rivals. As a result, there is no sign that the current quantitative overhaul of sports logic will stop any time soon. With huge databases of box scores from every sport available on the internet, motivated fans looking to land jobs with their favorite teams or just improve the performance of their own fantasy teams will continue mining the data and searching for patterns in the numbers. But some things will always remain unpredictable…