Category Archives: Primers

Primer – SCWP

People familiar with my philosophy will know that I put less stock in wins than most people. Because a win is binary (you either take everything or get nothing), a simple win-loss record is not particularly nuanced and, unless you have a very long timeframe to work with, doesn’t necessarily reflect a team’s actual talent. Points difference and, by extension, Pythagorean expectation do a better job of reflecting true team ability, but even they can be affected by luck or odd results. Does a 50-0 scoreline really tell you any more than a 30-0 scoreline about the relative disparity in talent? If a team scores more tries but loses the game, what does that tell you?

Baseball and college football analysts have developed a metric called “second order wins”. The actual win-loss record is considered to be zeroth order wins (nomenclature that I use and probably no one else does). Pythagorean wins, the number of wins expected based on the team’s Pythagorean expectation, are considered first order wins. Second order wins are calculated from a Pythagorean expectation based not on actual points scored but on expected points derived from advanced stats. The idea is that these expected points are more repeatable, less subject to good or bad luck, and provide a less wrong basis for estimating teams’ true talent and forecasting their performance.

For use in rugby league, I propose the following hierarchy:

  • 0th order wins – actual wins
  • 1st order wins – wins as calculated by Pythagorean expectation of points for and against
  • 2nd order wins – wins as calculated by Pythagorean expectation of SCWP (Should-a Could-a Would-a Points) for and against
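
As a minimal sketch of the three orders, the Python below applies the classic Pythagorean formula, PF^x / (PF^x + PA^x), once to actual points and once to SCWP. The exponent of 2 is a placeholder assumption (the original baseball value), not a figure fitted to rugby league, and the function names are illustrative only.

```python
def pythagorean_expectation(points_for: float, points_against: float,
                            exponent: float = 2.0) -> float:
    """Expected winning percentage from points for and against.

    The default exponent is a placeholder assumption, not a value
    fitted to rugby league data.
    """
    if points_for <= 0 and points_against <= 0:
        return 0.5  # no scoring either way: treat as a coin flip
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def order_wins(games: int, actual_wins: float,
               points_for: float, points_against: float,
               scwp_for: float, scwp_against: float,
               exponent: float = 2.0) -> dict:
    """0th, 1st and 2nd order wins for one team's season."""
    return {
        "0th": actual_wins,
        "1st": games * pythagorean_expectation(points_for, points_against, exponent),
        "2nd": games * pythagorean_expectation(scwp_for, scwp_against, exponent),
    }


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    print(order_wins(24, 15, 520, 430, 495, 455))
```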

Note that third order wins are second order wins adjusted for strength of schedule. I’m not really concerned with this right now: every NRL team plays every other team at least once, and mostly twice, so the adjustment would be of marginal value.

SCWP is not what I would call an advanced statistic because there’s only so much I can do with the data I have. I have taken two metrics for rugby league – metres gained, representing field position, and line breaks, representing playmaking – as our key statistics on which to estimate expected points. I briefly toyed with including tackle busts but it did not improve performance and I suspect we would get a similar result with other stats.

In a process similar to the one used to build the Taylors system, I took every NRL game (2013 – 2021 rd 10), QCup (2016 – 2021 rd 7), NSW Cup (2016 – 2021 rd 10) and Super League (2017 – 2021 rd 5) and calculated the running metres and line breaks for each team in each game. I put these games into buckets and then calculated the average score for each bucket containing at least five games. The net result is a near 1:1 relationship between metres/line breaks and points scored.
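
The bucketing step might look something like the following sketch. It assumes each game has been reduced to a (stat value, points scored) pair for one team; the bucket width is a hypothetical choice, since the width actually used isn’t stated.

```python
from collections import defaultdict
from statistics import mean


def bucket_averages(games, bucket_width, min_games=5):
    """Average points scored per bucket of a stat (e.g. running metres).

    games: iterable of (stat_value, points_scored), one entry per team per game.
    bucket_width: hypothetical bucket size, e.g. 100 metres or 1 line break.
    Buckets with fewer than min_games entries are dropped.
    """
    buckets = defaultdict(list)
    for stat_value, points in games:
        # Label each bucket by its lower edge.
        lower_edge = (stat_value // bucket_width) * bucket_width
        buckets[lower_edge].append(points)
    return {
        edge: mean(points)
        for edge, points in sorted(buckets.items())
        if len(points) >= min_games
    }
```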

The trendlines for these graphs allow us to calculate the Should-a Could-a Would-a Points (i.e. the expected points) that we would expect a team to have scored given the metres and breaks it made. Taking (roughly) the average of the points expected from metres and the points expected from breaks gives the SCWP for the game.
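
Continuing the sketch, a least-squares trendline can be fitted to each set of bucket averages and the two predictions averaged. A plain mean stands in for the “(basically) average” described, and the function names are illustrative rather than taken from the actual system.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scwp(metres, line_breaks, metres_line, breaks_line):
    """Expected points: the mean of the metres-based and breaks-based predictions.

    metres_line and breaks_line are (slope, intercept) pairs from fit_line,
    fitted to bucket averages of points scored.
    """
    from_metres = metres_line[0] * metres + metres_line[1]
    from_breaks = breaks_line[0] * line_breaks + breaks_line[1]
    return (from_metres + from_breaks) / 2
```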

Why bother? The 2nd order winning percentage, based on SCWP, has a lower mean absolute error (MAE) against the following season’s actual winning percentage than either the 0th or 1st order winning percentage. Over the 2013 to 2020 NRL, 2016 to 2019 State Cup and 2017 to 2020 Super League seasons (n = 221), we find:

  • 0th order winning percentage has a MAE of .149 when compared to next season’s winning percentage (equivalent to 3.6 wins over a 24 game schedule)
  • 1st order winning percentage has a MAE of .132 (3.2 wins)
  • 2nd order winning percentage has a MAE of .122 (2.9 wins)
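
The error comparison is straightforward to reproduce in principle. This sketch computes MAE between one season’s predicted winning percentages and the next season’s actual ones, then converts the reported MAEs into wins over a 24-game schedule.

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute error between paired sequences of winning percentages."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mae_in_wins(mae, games=24):
    """Express a winning-percentage error as a number of wins."""
    return mae * games


if __name__ == "__main__":
    for order, mae in (("0th", 0.149), ("1st", 0.132), ("2nd", 0.122)):
        # Prints 3.6, 3.2 and 2.9 wins respectively.
        print(f"{order} order: {mae_in_wins(mae):.1f} wins")
```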

Each iteration lowers the forecasting error by roughly 10%. There’s an additional layer of linear regression that could be applied over the top, and this might replace the now-defunct Poseidon ratings in pre-season sims.

The decreasing error is partly due to a built-in regression to the mean, as SCWP margins are typically smaller than actual margins. This reflects the fact that teams always put in some effort, even when they get shut out on the scoreboard. It is also partly because SCWP reflects repeatable statistics, whereas the scoring of actual points is somewhat prone to randomness (“we would’ve won that if he hadn’t dropped the ball three times/missed those conversions”, hence the name).

The current state of SCWP in the NRL (round 11), compared to actual for-against:

For Super League (round 6):

For Queensland Cup (first part of round 8):

For NSW Cup (round 11):

There’s an additional layer of efficiency to consider. I don’t know if the ratio between actual points scored and SCWP will prove meaningful, but if a team is consistently outscoring what we would expect given the fundamentals, that might give us a clue about their style of play or it might signal a coming regression to the mean. This is something to keep an eye on.

There’s every chance that an SCWP v2 will come along in the future, based on actual advanced statistics. I of course reserve the right to tinker with my own systems, but I’ll let you know when I do.

Primer – WCL

WCL is a means of estimating the probability of a team winning a rugby league match at a given point in the game.

The WCL system finds all instances of a given margin at a given point in the game in that league and calculates how often a team in that position went on to win. For example, if 60% of teams that had a 6 point lead after 24 minutes won the game, then we take that to mean that a team with a 6 point lead after 24 minutes has a 60% chance of winning. From this we can build up in-game win probability charts, not unlike those you might have seen on FiveThirtyEight or similar.
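
A minimal sketch of the lookup table, assuming match data has been reduced to one snapshot per game per minute: (minute, margin from one team’s perspective, final result for that team). Counting draws as half a win is an assumption of this sketch, as is recording every snapshot from both teams’ perspectives so that the table is symmetric.

```python
from collections import defaultdict


def build_wcl_table(snapshots):
    """Map (minute, margin) -> share of teams in that position that won.

    snapshots: iterable of (minute, margin, result), where result is 1.0 for
    a win, 0.5 for a draw and 0.0 for a loss, from the same team's perspective.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for minute, margin, result in snapshots:
        # Record each snapshot from both teams' perspectives.
        for m, r in ((margin, result), (-margin, 1.0 - result)):
            totals[(minute, m)] += r
            counts[(minute, m)] += 1
    return {key: totals[key] / counts[key] for key in counts}
```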

It’s that simple. I have used some averaging to smooth out rough edges in the dataset (especially for odd-numbered margins) and, where the sample is too small for the model’s results to make sense, I have edited some of them manually. For example, a one point lead between half time and the 60th minute should not give the leading team a less than 50% win probability, but it apparently does in the NRL data.

Note that 100% is only reached at full time; before then, the probability never exceeds 99.9%. Even though the difference is not visible on the charts, it reflects the reality that our dataset does not cover every possible outcome.
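
The clean-up described in the last two paragraphs could be approximated as follows. Averaging each margin with its immediate neighbours, forcing a leading team to at least 50%, and capping probabilities at 99.9% before full time are assumptions about how the smoothing might work; the manual edits themselves aren’t reproducible in code.

```python
def smooth_and_cap(table, full_time=80, cap=0.999):
    """Smooth a WCL table keyed by (minute, margin) -> win probability.

    Before full time each value is averaged with its neighbouring margins,
    a leading team is never given less than 50% (nor a trailing team more),
    and probabilities are kept within [1 - cap, cap].
    """
    smoothed = {}
    for (minute, margin), prob in table.items():
        if minute >= full_time:
            smoothed[(minute, margin)] = prob  # result is known: leave as is
            continue
        window = [table[(minute, m)] for m in (margin - 1, margin, margin + 1)
                  if (minute, m) in table]
        value = sum(window) / len(window)
        if margin > 0:
            value = max(value, 0.5)
        elif margin < 0:
            value = min(value, 0.5)
        smoothed[(minute, margin)] = min(max(value, 1 - cap), cap)
    return smoothed
```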

While I could build a more sophisticated model that includes all sorts of other elements, I wanted a basic means to gauge the in-game win probability based on the scoreboard. I do not care what the pre-game odds are and I do not care about the “momentum” or other states of the game. The model is blind to the teams playing and is entirely dependent on the margin and time on clock.

WCL has no overall predictive power, but it can summarise a game graphically, adding a layer of information that simply plotting the margin does not. The sum of a team’s win probability percentages at each minute of the game gives a WCL score, which indicates how dominant the team has been. A tight game will have each team’s score close to zero, while a perfectly dominant game will have the winning team’s score close to 50 and the loser’s close to -50.
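
This sketch assumes the score is the average, across each minute, of a team’s win probability minus 50 percentage points; that reading reproduces the stated behaviour of roughly +50 for a dominant winner, -50 for the loser and near zero for a tight game. The exact aggregation is an assumption.

```python
def wcl_score(win_probs):
    """WCL-style dominance score from per-minute win probabilities (0 to 1).

    Assumed aggregation: mean of (probability in percent - 50) over all minutes.
    """
    if not win_probs:
        raise ValueError("need at least one minute of win probabilities")
    return sum(100 * p - 50 for p in win_probs) / len(win_probs)
```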

There are separate WCL datasets for NRL, NSW Cup and Qld Cup, based on all matches from 2016 to date. There’s also a generalised men’s WCL set, combining all three, which should be suitable for representative games (which would otherwise have too small a sample to work with) or for Super League, should the need arise. I have been collecting NRLW and QRLW event data as well, but there are too few games to form a proper dataset.

WCL stands for Worm Chess Lathe. Worm because the graphs resemble the worm from Australian TV political debates, which is meant to reflect audience responses in real time. Chess and Lathe because the graphs sometimes resemble a chess piece in profile (bishops and queens, generally), as if it had been created with a wood lathe. The system needed a name and WCL is as good as any.

Generally speaking, this is for novelty purposes, but it can also help us answer questions like the following:

Was Magic Round ruined by bins and send offs?

While we all enjoy the chaos of bins and send offs during live football, the fun does wear off somewhat after eight of them in two games, so yes. But were the game outcomes materially changed by the bins and send offs?

Ah! Well, nevertheless…

Some selected games of interest

Panthers vs Raiders, round 14, 2017

Raiders vs Warriors, round 3, 2018

Storm vs Panthers, grand final, 2020

Broncos vs Titans, round 8, 2021

Primer – Rismans

A big thank you to Lorna Brown (@_Lorna_Brown), who provided me with the dataset and whose ongoing updates to it mean that we should be able to do some form of Super League player analysis. She has (presumably through some sort of black magic and/or competence with programming) managed to scrape a far more complete dataset out of the SL website than I managed in previous attempts.

In short, a Risman is the Super League equivalent of a Taylor. That is, it is a unit of measurement of rugby league production. Production is the accumulation of valuable work done on field as measured by traditional statistics.

The Risman, as a unit of production, is named for Gus Risman. He is a player whose name has largely stuck in my head due to Tony Collins’ podcast, Rugby Reloaded, wherein Collins makes the case that Risman is one of the all time great footballers of any code.

Gus Risman was one of the greatest of Cardiff’s rugby codebreakers. The son of immigrants who grew up in Tiger Bay, he played top-class rugby league for more than a quarter of a century, was a Championship and Challenge Cup winner with two clubs, and captained the 1946 Lions. Not only that, but he also captained the Wales in war-time rugby union internationals while a rugby league player.

Rugby Reloaded #138

As with Dave Taylor, the unit of production is named for a player who can do it all.

The Risman is derived by running linear regressions to confirm which statistics from the Super League dataset correlate with winning percentage. The stats get distributed into buckets and we review the success of teams achieving those statistics (minimum ten games in the bucket). The result is that tries, try assists, missed tackles, tackle busts, metres, clean breaks and errors (negative) have significant correlations with winning. This is considerably fewer statistics than the NRL dataset offers, which is why I’ve opted to give these production units a different name; Rismans don’t quite measure the same stuff as Taylors.

We multiply each stat by the slope of the trendline calculated in the regression, by a weighting proportional to its correlation with winning (the higher the correlation, the higher the weighting) and then by 1000.

Through this product of slope and weighting, we develop a series of equivalences between stats and can compare them across leagues. The following shows the quantity of each stat a player needs to accumulate to generate the same production as scoring one try in the 2021 season. The NRL’s values are calculated from the previous five seasons of data, while the others are based on the previous three seasons (the State Cups simply carry over the weightings that would have applied in 2020, given they didn’t play last year).
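
As a rough Python sketch of the last two paragraphs, production is the weighted stat total scaled by 1000, and the try-equivalence of any stat is the ratio of the try weighting to that stat’s weighting. The weightings here are hypothetical: the try weighting is chosen only so that a try comes out at the 17.3 Rismans per try quoted for Super League, and the others are made up.

```python
# Hypothetical slope-times-weighting values per unit of each stat.
HYPOTHETICAL_WEIGHTS = {
    "tries": 0.0173,
    "metres": 0.000173,
    "errors": -0.00346,  # negative: errors cost production
}


def production(stat_totals, weights, scale=1000):
    """Production (Rismans or Taylors): scale * sum(weight * quantity)."""
    return scale * sum(weights[stat] * qty
                       for stat, qty in stat_totals.items() if stat in weights)


def try_equivalents(weights):
    """Quantity of each stat worth the same production as one try.

    For negative stats, this is how many it takes to cancel out a try.
    """
    w_try = weights["tries"]
    return {stat: w_try / abs(w) for stat, w in weights.items() if w != 0}
```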

For the record, a try is worth 8.7 Taylors in the NRL, 8.4 Ty in QCup, 7.5 Ty in NSW Cup but a whopping 17.3 Rismans in Super League. This, of course, doesn’t mean anything, as Taylors and Rismans have no real-world value.

Due to the limitations of the dataset, we can only calculate raw production. Without positional information or time on field, it is not possible to calculate more exotic ratings like an English TPR equivalent, Wins Above Reserve Grade or undertake pre-season projections.

Raw production is still somewhat useful and if nothing else, I think it will likely come in handy for assessing squad strength at the next World Cup. Teams with superior production, as calculated post-game, win 90% of their games.

The average player generates approximately 20 Rismans per game, and for players with fewer than ten games this figure is used until a reasonable sample size can be drawn upon for that player. Based on the actual 17 fielded, the team with the superior expected Rismans (estimated pre-game as the sum of each player’s prior career average Rismans per game) has a 63.8% successful tipping rate (n = 131). This is comparable to using Taylors in the NRL. Applying the NRL’s win probability formula to expected Rismans, we can estimate a pre-game winning probability for a given line-up. (Re-deriving this formula from the small SL sample meant that the team with more expected Rismans had a lower winning probability when teams were closely matched, which doesn’t make sense.)
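
The pre-game estimate can be sketched as below, assuming each player’s history is a list of prior per-game Risman totals. The 20-Risman default and ten-game threshold come from the text; giving the tip to the home side when expected totals are exactly equal is an arbitrary choice of this sketch.

```python
AVERAGE_RISMANS_PER_GAME = 20.0
MIN_GAMES_FOR_OWN_AVERAGE = 10


def expected_rismans(lineup):
    """Sum of prior career Rismans per game for the players named.

    lineup: one list of prior per-game Risman totals per player. Players with
    fewer than ten prior games are assigned the league-average figure.
    """
    total = 0.0
    for history in lineup:
        if len(history) < MIN_GAMES_FOR_OWN_AVERAGE:
            total += AVERAGE_RISMANS_PER_GAME
        else:
            total += sum(history) / len(history)
    return total


def tip(home_lineup, away_lineup):
    """Tip the side with more expected Rismans (home side on an exact tie)."""
    if expected_rismans(home_lineup) >= expected_rismans(away_lineup):
        return "home"
    return "away"


def tipping_rate(tips, winners):
    """Share of games where the tipped side won."""
    return sum(t == w for t, w in zip(tips, winners)) / len(tips)
```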

I posted a leaderboard of players by total Rismans up to round 5 of the Super League. I’m not a particularly close observer of that part of the game (although I perhaps still have a better idea of what’s going on in England than in NSW Cup), but most of the top twenty at least rang bells as players I’d heard of.

I would have included an update for round 6, but the Super League website does not have any stats listed for the Leeds-Wakefield game except for who scored the tries. So we must bear in mind that the dataset has some fairly significant limitations, not just in scope but in completeness. For example, some of the Qualifiers games have been included but many, particularly those involving Championship teams, were not. Stats availability for finals games seems to be hit and miss.

There’s also probably something to be said for different positions accumulating different typical quantities of production but without an independent arbiter of who plays what position, I’m choosing to be blind to this because I refuse to do this manually.

Nonetheless, here’s the all-time (2017 to 2021 round 6) Risman leaderboard.

As a couple of reference points, George Williams’ 43.1 Rs/gm has translated into a TPR of .119 at NRL level. This should be exciting for Tigers fans, as Oliver Gildart will presumably perform at a similar level when he joins Wests next year based on his 43.3 Rs/gm. Undermining that somewhat is Jackson Hastings’ 56.4 Rs/gm, the second highest of any player with at least 50 games, behind Greg Eden, compared to his career .052 TPR in the NRL and .080 TPR in NSW Cup. Hastings is also en route to the Tigers in 2022. Whether a real correspondence between different leagues’ ratings can be derived will probably depend on sourcing more information to bolster the dataset but it should be interesting to see how those signings pan out in the meantime.

Primer – TPR

For the third season in a row, I’m changing the player rating system. We mourn the passing of StatScore (not really) and PPG (again, not really) as we slowly converge on a system that I can take for granted and don’t have to refine any further.

The core of the system hasn’t changed. The proposition is that there are important and unimportant statistics and that counting the important ones provides information about players and teams and can be predictive.

PPG was useful, and development and application through 2019 demonstrated that:

The last one should be taught in universities as a perfect example of ringing the bell at the top. Sheer narrative power subsequently forced Pearce back to mean and Brown onto the compost heap.

The mechanics of PPG have been preserved through TPR. My biggest issue is that when I wrote about production (that is, the accumulation of useful statistics), I didn’t have any units to work with. I originally didn’t think this would be a problem, but having units makes some things clearer. So I took a leaf from the sciences and named the unit after the man who could do it all, David “Coal Train” Taylor.


“PPG”, which was Production (and not Points) Per Game, doesn’t make much sense now, so that’s been punted and replaced with TPR, or Taylor Player Rating. There was also a substantial change between the way I calculated WARG in the primer at the start of 2019 and the way I calculated it in Rugby league’s replacement player at the end of the year. The latter method is now canonical but the name is going to stick.

In brief, TPR and WARG are derived through the following six steps:

  1. Run linear regressions to confirm which statistics correlate with winning percentage. The stats get distributed into buckets and we review the success of teams achieving those statistics. One crucial change was to exclude any buckets with fewer than ten games in them from the regression. We end up with tries, running metres, kick return metres, post-contact metres, line breaks, line break assists, try assists, tackle busts, hit ups, dummy half run metres, missed tackles (negative), kick metres, forced drop outs, errors (negative) and, in Queensland only, penalties (negative) as having significant correlations out of the data provided by the NRL.
  2. Take the slope of the trendline calculated in the regression and weight it by its correlation (the higher the correlation, the higher the weighting). Through this weighting, we develop a series of equivalences between stats. The following shows the quantity of each stat required to be equivalent to one try in 2020:
  3. Players who accumulate these statistics are said to be generating production, which is now measured in Taylors and calculated as the weighting/slope multiplied by the quantity of each stat accumulated, multiplied by 1000. However, due to the limitations of the statistics, some positions on the field generate significantly more Taylors than others.
    Average Taylors per game by position (1)
  4. To combat this, the production generated each game is then compared to the average production generated at that position (averaging previous 5 seasons of data in NRL, 3 seasons for State Cup). We make the same adjustments for time on field as in PPG and then divide by 10 for aesthetic purposes. The resulting number is the Taylor Player Rating, or TPR.
  5. We derive a formula for estimating win probability based on production for each competition and then substitute in a winning percentage of .083 (or two wins in twenty-four games, per the previous definition of a replacement-level team) and estimate the amount of production created by a team of fringe players against the competition average. This gives us a TPR that we can set replacement level at. The Taylors created over and above replacement level is added to the notional replacement level team’s production and the increase in winning probability is attributed to that player as a Win Above Reserve Grade, or WARG. Replacement level in TPR for the NRL is .057, Queensland is .072 and NSW is .070. The career WARG leaders are currently:
  6. Finally, we go back and check that it all makes sense by confirming that TPR has some predictive power (~61% successful tipping rate, head-to-head) and there’s a correlation with team performance (~0.60 r-squared for team season production against team winning percentage).
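
Steps 3 and 4 can be sketched in a few lines. Two things here are assumptions rather than the documented method: the time-on-field adjustment is taken to be a simple pro-rata scaling to 80 minutes, and “compared to the average” is read as a ratio, so an exactly average player rates .100 before being checked against the replacement levels in step 5.

```python
# Replacement-level TPR by competition, as given in step 5.
REPLACEMENT_TPR = {"NRL": 0.057, "Queensland Cup": 0.072, "NSW Cup": 0.070}


def taylors(stat_totals, weights, scale=1000):
    """Step 3: production in Taylors = scale * sum(weight/slope * quantity)."""
    return scale * sum(weights[s] * q for s, q in stat_totals.items() if s in weights)


def tpr(game_taylors, minutes_played, positional_average, full_game=80):
    """Step 4: production relative to the positional average, divided by 10.

    Assumes a pro-rata time-on-field adjustment (per 80 minutes) and a ratio
    comparison; positional_average is average Taylors per 80 at that position.
    """
    if minutes_played <= 0:
        raise ValueError("player must have taken the field")
    per_full_game = game_taylors * full_game / minutes_played
    return (per_full_game / positional_average) / 10


def above_replacement(rating, competition):
    """True if a TPR clears the replacement level for that competition."""
    return rating > REPLACEMENT_TPR[competition]
```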

For a more in-depth explanation, you can refer back to the original PPG primer. The differences between last year’s system and this year’s are slight and, for most intents and purposes, PPG and TPR are equivalent. Some of the changes are small in impact but important.

The most obvious change is the addition of NSW Cup data to the Queensland Cup and NRL datasets. This was driven by my interest in assessing the farm systems of each NRL club and you can’t make a decent fist of that if you’re missing twelve feeder clubs from the picture. It will also allow me to better test talent identification in the lower levels if I have more talents to identify and to better set expectations of players as they move between competitions.

For the most recent seasons, TPR only uses past data to calculate its variables, whereas PPG used all of the data available and created a false sense of success. A system that uses 2018 data to create after-the-fact predictions for the 2018 season isn’t going to give you an accurate view of how it will perform in 2019.

Projecting player performance into the future is a pretty powerful concept, even if the tools for doing so are limited. I went back and re-derived all of the reversion-to-mean formulas used in The Art of Projection. It turns out that the constants for the projection formula don’t change much between seasons, so they are fixed across the datasets for now. It also turns out that adjustments for age and experience are different and largely useless under the TPR system, such is the ephemera of statistical analysis.
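
For illustration only, a generic regression-to-the-mean projection weights a player’s own rate by the number of games behind it and pads it with a fixed number of league-average games. This is a standard shrinkage form, not necessarily the formula from The Art of Projection, and the padding constant is hypothetical.

```python
def project_rate(player_rate, games_played, league_mean, padding_games=20):
    """Shrink a player's rate (e.g. TPR) towards the league mean.

    projected = (games * player_rate + padding * league_mean) / (games + padding)
    padding_games is a hypothetical constant: larger values regress harder.
    """
    if games_played < 0 or padding_games <= 0:
        raise ValueError("games must be non-negative and padding positive")
    return ((games_played * player_rate + padding_games * league_mean)
            / (games_played + padding_games))
```

With 20 games played and 20 padding games, a .150 player projects to .125, halfway back to a .100 mean.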

One application for projections is that I’ll be able to run season simulations, using the winning probability formula and team production, that measure the impact of including or excluding a player on the outcome of a team’s season. It may not be super-accurate (the projections have large average errors) but it will be interesting. I also like the idea of using out- or under-performance of projections as an assessment of coaching.

Finally, to reiterate things that I think are important caveats: TPR is a value-over-average rate statistic, while WARG is a volume statistic. No, statistics don’t tell the whole story and even these ones don’t measure effectiveness. Yes, any player rating system is going to have a certain level of arbitrariness to it because the system designer has to make decisions about what they consider important and unimportant. I’m fully aware of these things and wrote 1500 words accordingly at the end of the PPG primer.

A thing I’m trying to do this season is publish all of my rating systems on Google Sheets so anyone can have a look. You can see match-by-match ratings for NRL and the two State Cups if that’s your jam.

Primer – PPG and WARG

Turns out that StatScore didn’t pan out the way I had hoped. There were some conceptual errors, but the biggest was that I wanted a measure of rate and a measure of volume, and you can’t have one statistic that does both. It’s like having one number that meaningfully states that a boxer is both the best pound-for-pound and the best in the world irrespective of weight class. The world doesn’t work like that. As a result, there was some okay, but not great, player analysis. Unfortunately, creating a new tool requires using it for a while on new scenarios in order to evaluate its usefulness. Sometimes it doesn’t work out as well as you would have hoped.

Also, the name sucked.

So I went back to the drawing board. There were some salvageable concepts from StatScore that have been repackaged, with corrections made for some fundamental mistakes, and repurposed into new player rating systems: PPG and WARG.



Primer – Poseidon ratings

Poseidon ratings are a new team rating system for both the NRL and the Queensland Cup.

For those who don’t have time to read 2000+ words, here’s the short version: the purpose of Poseidon ratings is to assess the offensive and defensive capabilities of rugby league teams in terms of the number of tries they score and concede against the league average. By using these ratings, we can estimate how many tries will be scored/conceded in specific match ups and then use that, with probability distributions, to calculate an expected score, margin and winning probabilities for the match-up.
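
A minimal sketch of that pipeline, assuming multiplicative attack and defence ratings relative to the league average and independent Poisson distributions for each side’s try count. Both are assumptions of this sketch rather than a description of how Poseidon ratings are actually built, and working in tries alone ignores goals.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k tries when lam are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_tries(league_avg_tries, attack_rating, opponent_defence_rating):
    """Ratings are multipliers on the league average (1.0 = average).

    A defence rating above 1.0 means that defence concedes more than average.
    """
    return league_avg_tries * attack_rating * opponent_defence_rating


def try_win_probability(lam_a, lam_b, max_tries=20):
    """P(team A scores more tries than team B), splitting ties 50/50."""
    dist_a = [poisson_pmf(k, lam_a) for k in range(max_tries + 1)]
    dist_b = [poisson_pmf(k, lam_b) for k in range(max_tries + 1)]
    win = draw = 0.0
    for i, p_a in enumerate(dist_a):
        for j, p_b in enumerate(dist_b):
            if i > j:
                win += p_a * p_b
            elif i == j:
                draw += p_a * p_b
    return win + 0.5 * draw
```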



Primer – StatScore and Win Shares: rating NRL players

The biggest off-season story in the NRL was the transfers of Cooper Cronk from Melbourne to Sydney and then Mitchell Pearce from Sydney to Newcastle. For two players likely on similar pay packets, how did the Roosters decide one was better than the other? That got me wondering whether it was possible to work out a way of judging value for money in player trades. It’s big in baseball, so why not rugby league? This led me to develop StatScore and Win Shares as ways to numerically evaluate rugby league players.


Primer – How do the Indices work?

If you’re wired for numbers, like I am, it can be hard to deal with people’s feelings and to understand why they think the things that they do. That’s why I’ve decided to quantify the feelings a team generates into five distinct indices: Power, Hope, Panic, Fortune and Disappointment.

Each index has two components. There’s a main mechanism for ranking the teams and some minor tie-breaking stats. The main mechanism typically uses Elo ratings to make an estimation of what we expect from a team, whether or not they are meeting that expectation and what that means for the season ahead. The tie-breakers are statistics used to award a few points here and there to help rank the teams should they have similar mechanism results.

Editor’s note: As much of last season’s material was influenced by The Arc, much of this season’s material owes a debt to SB Nation, including the idea of panic/hope indices.


Primer – Who are the Greeks? 2018 Edition

The Greeks is the collective name given to a series of Elo rating models for tracking performance of rugby league teams and forecasting the outcomes of games. I usually refer to them as if the philosopher himself was making the prediction, even though the Greeks have mostly been dead for a couple thousand years and certainly would never have heard of rugby league or Arpad Elo.
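
For readers new to Elo, the standard expected-result and update rules look like this. The 400-point scale is the conventional chess value and the K-factor of 20 is a placeholder; each Greek’s actual parameters and any adjustments (home advantage, margin of victory) aren’t given here.

```python
def expected_result(rating_a, rating_b, scale=400.0):
    """Standard Elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, result_a, k=20.0):
    """Move both ratings by K times the surprise in team A's result.

    result_a is 1 for an A win, 0.5 for a draw and 0 for a loss.
    """
    shift = k * (result_a - expected_result(rating_a, rating_b))
    return rating_a + shift, rating_b - shift
```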

The differences between each Greek are on the subtle side, with the intention of measuring different things. You may want to revisit primers from last year:


Primer – Introducing Pythago World Rankings

While the recent RLWC was on, I couldn’t help but notice that the RLIF had Scotland pegged as the world’s fourth-best team. Scotland hadn’t won a game since 2014 and even that was against Ireland. Since then, they’d lost to Australia, England, Ireland, Wales and France. I also got frustrated because a fifteen-second Google search didn’t reveal how the rankings are actually calculated.

So I figured I could come up with a better system. I did and this is how the Pythago World Rankings (PWR) work.

