The Dally Ms are a wank, let’s use…something else

Wait, haven’t we done this before? Absolutely, in a post that hasn’t aged super-well, I replaced the Dally Ms with the old player rating system to assess the best player in each season. That system has been improved upon with TPR and WARG, so the conclusions drawn there can be safely discarded.

Today, I specifically do not want to replace voting with a statistical rating. I cannot stress enough just how little I want to do that. Of the two half-decent, publicly available player rating systems, TPR and the League Eye Test’s Net Points Responsible For, neither accounts for everything that happens on the field, and attempting to quantify all aspects of sport dehumanises the experience of watching and enjoying rugby league.

Awards should be given partly on emotion because it’s stupid to assume there’s a purely rational way to hand them out. Rationally, we should only care who the best teams are, seeing as that’s the point of the sport, and who the best players are is a sideshow that should only concern hacks padding out column inches. Indeed, I believe the flaws in any system are ultimately good because they create fuel for the content machine and keep the sport in the news, particularly as the highlighting of flaws comes toward the end of the season when there are fewer games to talk about (this is my pet theory as to why college football doesn’t do away with its ridiculous system for anointing a national champion). Moreover, these flaws reflect our own flaws as humans, and that’s one of the things that makes life interesting. Irrational emotions are part of us and part of the sport.

The Dally Ms are back in the news cycle with a proposed change of voting system, courtesy of a shit-stirring Buzz Rothfield Sunday column. Under the current system, the judge has to be at the ground to vote the best player on the field three points, the second best two and the third best one. The player with the most votes at the end of the season wins. The current system typically favours good players on average teams, who are able to sweep up the points on a regular basis, though the award has generally been given to the right player, or close enough to, at the end of the year.

Under the proposed system, each player gets a rating out of ten in every game. The flaws in this are obvious. Even the keenest observer would struggle to give every single player in a match an objective and justifiable rating out of ten. I definitely couldn’t do it without making up at least a few. When presented with this kind of problem, people naturally tend towards an average of seven, not five, because people are generally kind of nice. Your five and my five are not likely to be the same, and there seems to be no way to calibrate for this other than intensive training. That’s not an issue under the current system because, while we might disagree about who is the best, we both understand what the best generally means. What is average is another kettle of fish.

There’s also a scaling problem, which affects both systems, wherein a player who absolutely plays everyone off the park is given the maximum score, but that score rarely reflects how far ahead the player is compared to the rest. If Taumalolo runs for 300 metres, scores three tries and pots a field goal, he will get either a three under the old system or a ten under the proposed one. The next best player gets a two for his efforts, implying that Taumalolo’s performance was only 50% more valuable, even though he probably would have earned those three points after the first try and 200 metres. Similarly, if he’s awarded a ten, is everyone else a six at best by comparison? Meanwhile, in the next game, the best player scores one try and assists another but gets an equal three points or a ten rating.

Bearing all of that in mind, here are the top tens from the five most recent Dally M votes.

As a point of comparison, here are the TPR and WARG champions for each of those seasons.

What I want to do is compare how the different voting systems affect the final results. We’re going to look at four different voting systems (sketched in code after the list) –

  • System A: after rounds 9 and 18 and at the end of the season, the top ten players are awarded votes, from ten for the best player through that part of the season down to one for the tenth best. Votes are tallied at season’s end and the player with the most votes wins.
  • System B: the same as system A, but players are assessed after every round based on their performance in that round.
  • System C: the current Dally M voting system, as a control.
  • System D: the proposed system.
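
To make the mechanics concrete, here’s a rough sketch of each tally in Python. This is not the code I actually ran; it assumes a hypothetical `ratings` dict keyed by (round, game, player), defaults to a 25-round season for illustration, and breaks ties arbitrarily.

```python
from collections import defaultdict

def best_to_worst(scores):
    """Players sorted from highest to lowest rating."""
    return sorted(scores, key=scores.get, reverse=True)

def totals_between(ratings, start, end):
    """Sum each player's ratings over rounds start..end inclusive."""
    totals = defaultdict(float)
    for (rnd, _game, player), rating in ratings.items():
        if start <= rnd <= end:
            totals[player] += rating
    return totals

def system_a(ratings, final_round=25):
    """Votes of 10 down to 1 for the top ten across each part of the season."""
    votes, start = defaultdict(int), 1
    for end in (9, 18, final_round):
        ranked = best_to_worst(totals_between(ratings, start, end))
        for rank, player in enumerate(ranked[:10]):
            votes[player] += 10 - rank
        start = end + 1
    return votes

def system_b(ratings, final_round=25):
    """As system A, but the top ten of every single round get votes."""
    votes = defaultdict(int)
    for rnd in range(1, final_round + 1):
        ranked = best_to_worst(totals_between(ratings, rnd, rnd))
        for rank, player in enumerate(ranked[:10]):
            votes[player] += 10 - rank
    return votes

def system_c(ratings):
    """The current Dally M: 3-2-1 to the best three players in each game."""
    games = defaultdict(dict)
    for (rnd, game, player), rating in ratings.items():
        games[(rnd, game)][player] = rating
    votes = defaultdict(int)
    for scores in games.values():
        for rank, player in enumerate(best_to_worst(scores)[:3]):
            votes[player] += 3 - rank
    return votes

def system_d(ratings_out_of_ten):
    """The proposed system: accumulate each player's 0-10 match ratings,
    assuming the winner is whoever racks up the most rating points."""
    totals = defaultdict(float)
    for (_rnd, _game, player), rating in ratings_out_of_ten.items():
        totals[player] += rating
    return totals

# Under any of the four, the winner is the player with the biggest tally:
# winner = max(votes, key=votes.get)
```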

Rather than go back and watch every game of the last five years, close to 1,900 hours of entertainment, I’m using my player ratings as a proxy for a (non-existent) rational voter. Systems A, B and C are assessed on WARG, while system D is assessed on TPR. I think this difference is justifiable based on how I think people would approach the different voting systems. When assessing players individually on a 0-10 scale, I’d expect judges would compare them to players in the same position and account for time on field, as TPR does, while WARG does a better job of assessing who has done the best by minimising those factors in favour of raw production.

To translate TPR into a 0-10 rating, I tried to put ratings into buckets that more or less reflect a normal distribution, as follows.

One would expect voters to award their ratings in a similar fashion, though each would probably skew them based on their personality and what they’re expected to judge.
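
In code, the idea looks something like the sketch below. The cutoffs are illustrative stand-ins rather than my actual bucket boundaries; the point is that TPR is sliced by z-score, so middling ratings turn up often and the extremes are rare.

```python
# Illustrative only: these z-score cutoffs are made up, not the actual
# bucket boundaries used for the table above.
RATING_CUTOFFS = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

def tpr_to_rating(tpr, mean, sd):
    """Map a TPR value to a 0-10 rating by slicing its z-score into buckets.

    Equal-width slices of a normal variable make the middle ratings far
    more common than the extremes, roughly matching a normal distribution.
    """
    z = (tpr - mean) / sd
    for rating, upper_bound in enumerate(RATING_CUTOFFS):
        if z < upper_bound:
            return rating
    return 10  # anything beyond the last cutoff is a perfect ten
```

Note that under a mapping like this, a dead-average performance lands on a five, which is exactly where a human judge probably wouldn’t put it.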

Here are the new Dally M results for each system.

If we just compare the results from the different systems, system A generates three or four defensible winners out of five, system B only a single one in 2019, system C three or four, and system D two to three. In my subjective opinion, the current Dally M system seems to perform the best, even if it would have handed the title of “the best” to sex pest Blake Ferguson in 2018.

You can review the top tens, but I watched most of these seasons and I couldn’t possibly remember who was seventh best in 2017. The point is less about who finished in what position and more about how the different voting systems affect the outcome. In none of the five years, despite having the same information, did the systems unanimously appoint a winner.

All systems are going to have pluses and minuses. The validity of the result comes from two things. The first is a widespread understanding of what the purpose of the award is. Is it for the best and fairest, or the most valuable, or something else? Each of those means something different, and the system used to award the winner needs to reflect its purpose.

The second is the capability of the voters. The average NRL twitter user (and/or person reading this) is going to assume that the judges are idiots, because they produce largely terrible commentary, and because they are susceptible to the same groupthink, biases and laziness as the rest of us. That, too, is very human, as is hubristically assuming you would do a better job over the long run. When I see two people on the timeline with completely opposite understandings of what just occurred in a replay, I know that individual punters will not do a better job and that the utopia of a perfect player award system is as far away as ever.