Category Archives: Primers

Primer – TPR

For the third season in a row, I’m changing the player rating system. We mourn the passing of Statscore (not really) and PPG (again, not really) as we slowly converge on a system that I can take for granted and won’t have to refine any further.

The core of the system hasn’t changed. The proposition is that there are important and unimportant statistics and that counting the important ones provides information about players and teams and can be predictive.

PPG was useful, and development and application through 2019 demonstrated that:

The last one should be taught in universities as a perfect example of ringing the bell at the top. Sheer narrative power subsequently forced Pearce back to mean and Brown onto the compost heap.

The mechanics of PPG have been preserved through TPR. My biggest issue is that when I wrote about production (that is, the accumulation of useful statistics), I didn’t have any units to work with. I originally didn’t think this would be a problem, but having units would make some things clearer. So I took a leaf from the sciences and landed on naming the unit after the man who could do it all, David “Coal Train” Taylor.


“PPG”, which was Production – and not Points – Per Game, doesn’t make much sense now, so that’s been punted and replaced with TPR, or Taylor Player Rating. There was a substantial change between the way I calculated WARG in the primer at the start of 2019 and the way I calculated it in Rugby league’s replacement player at the end of the year. The latter method is now canonical but the name is going to stick.

In brief, TPR and WARG are derived through the following six steps:

  1. Run linear regressions to confirm which statistics correlate with winning percentage. The stats get distributed into buckets and we review the success of teams achieving those statistics. One crucial change was to exclude from the regression any bucket with fewer than ten games in it. We end up with tries, running metres, kick return metres, post-contact metres, line breaks, line break assists, try assists, tackle busts, hit ups, dummy half run metres, missed tackles (negative), kick metres, forced drop outs, errors (negative) and, in Queensland only, penalties (negative) as having significant correlations out of the data provided by the NRL.
  2. Take the slope of the trendline calculated in the regression and weight it by its correlation (the higher the correlation, the higher the weighting). Through this weighting, we develop a series of equivalences between stats. The table below shows the quantities of each stat required to be equivalent to one try in 2020:
    [Chart: stat equivalences to one try, 2020]
  3. Players who accumulate these statistics are said to be generating production, which is now measured in Taylors: the weighting/slope multiplied by the quantity of each stat accumulated, multiplied by 1000. However, due to the limitations of the statistics, some positions on the field generate significantly more Taylors than others.
    [Chart: average Taylors per game by position]
  4. To combat this, the production generated each game is compared to the average production generated at that position (averaging the previous five seasons of data in the NRL, three seasons for the State Cups). We make the same adjustments for time on field as in PPG and then divide by 10 for aesthetic purposes. The resulting number is the Taylor Player Rating, or TPR.
  5. We derive a formula for estimating win probability based on production for each competition and then substitute in a winning percentage of .083 (or two wins in twenty-four games, per the previous definition of a replacement-level team) to estimate the amount of production created by a team of fringe players against the competition average. This gives us a TPR at which replacement level can be set. The Taylors a player creates over and above replacement level are added to the notional replacement-level team’s production, and the resulting increase in winning probability is attributed to that player as Wins Above Reserve Grade, or WARG (there’s a rough sketch of this pipeline in code after the list). Replacement level in TPR for the NRL is .057, Queensland is .072 and NSW is .070. The career WARG leaders are currently:
    [Table: career WARG leaders]
  6. Finally, we go back and check that it all makes sense by confirming that TPR has some predictive power (~61% successful tipping rate, head-to-head) and there’s a correlation with team performance (~0.60 r-squared for team season production against team winning percentage).
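
The steps above can be compressed into a rough sketch. Everything numeric in it – the stat weights, the positional averages, the time-on-field adjustment and the shape of the win probability formula – is a placeholder standing in for the fitted values, not the real ones.

    import math

    # Illustrative weights only: slope x correlation from the regressions.
    STAT_WEIGHTS = {
        "tries": 0.0040,
        "run_metres": 0.00002,
        "line_breaks": 0.0015,
        "errors": -0.0010,
    }

    def taylors(stat_line):
        """Production in Taylors: weighted sum of a game's stats, scaled by 1000."""
        return 1000 * sum(STAT_WEIGHTS.get(stat, 0.0) * count
                          for stat, count in stat_line.items())

    def tpr(stat_line, positional_avg_taylors, minutes_played):
        """Compare a game's production to the positional average, adjust for time
        on field (the adjustment here is a guess at the PPG method) and divide
        by 10 for aesthetics."""
        production = taylors(stat_line) * (80 / max(minutes_played, 1))
        return production / positional_avg_taylors / 10

    def win_probability(team_taylors, league_avg_taylors):
        """Placeholder for the fitted production-to-winning-percentage formula."""
        return 1 / (1 + math.exp(-5 * (team_taylors / league_avg_taylors - 1)))

    def warg(player_taylors, replacement_taylors, replacement_team_taylors, league_avg_taylors):
        """Wins added by swapping a replacement-level player's production for this
        player's production in a notional replacement-level team."""
        base = win_probability(replacement_team_taylors, league_avg_taylors)
        improved = win_probability(
            replacement_team_taylors + (player_taylors - replacement_taylors),
            league_avg_taylors)
        return improved - base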

For a more in-depth explanation, you can refer back to the original PPG primer. The differences between last year’s system and this year’s are slight and, for most intents and purposes, PPG and TPR are equivalent. Some of the changes are small in impact but important.

The most obvious change is the addition of NSW Cup data to the Queensland Cup and NRL datasets. This was driven by my interest in assessing the farm systems of each NRL club and you can’t make a decent fist of that if you’re missing twelve feeder clubs from the picture. It will also allow me to better test talent identification in the lower levels if I have more talents to identify and to better set expectations of players as they move between competitions.

For the most recent seasons, TPR only uses past data to calculate its variables, whereas PPG used all of the data available and created a false sense of success. A system that uses 2018 data to create after-the-fact predictions for the 2018 season isn’t going to give you an accurate view of how it will perform in 2019.

The last of the changes relates to projection. Projecting player performance into the future is a pretty powerful concept, even if the tools for doing so are limited. I went back and re-derived all of the reversion-to-mean formulas used in The Art of Projection. It turns out that the constants for the projection formula don’t change much between seasons, so they are fixed across the datasets for now. It also turns out that the adjustments for age and experience come out differently and are largely useless under the TPR system; such is the ephemeral nature of statistical analysis.
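
The post doesn’t spell the formula out, so this is only a sketch of the standard regression-to-the-mean shape such a projection usually takes; the constant k and the league mean below are placeholders, not the re-derived values.

    def project_tpr(observed_tpr, games_played, league_mean_tpr=0.100, k=20):
        """Weight the observed rate against the league mean; with few games the
        projection stays near the mean, with many it approaches the observed rate."""
        weight = games_played / (games_played + k)
        return weight * observed_tpr + (1 - weight) * league_mean_tpr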

One application for projections is that I’ll be able to run season simulations, using the winning probability formula and team production, that can measure the impact of including or excluding a player on the outcome of a team’s season. It may not be super-accurate (the projections have large average errors) but it will be interesting. I also like the idea of using out- or under-performance of projections as an assessment of coaching.
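
A season simulation of that kind might look like the sketch below, reusing the placeholder win_probability from the TPR/WARG sketch earlier; the per-game production figures would come from the projections.

    import random

    def simulate_wins(projected_taylors_by_game, league_avg_taylors, sims=10_000):
        """Monte Carlo a team's season: convert projected production for each game
        into a win probability and sample a result, many times over."""
        total_wins = 0
        for _ in range(sims):
            total_wins += sum(
                1 for taylors in projected_taylors_by_game
                if random.random() < win_probability(taylors, league_avg_taylors))
        return total_wins / sims

    # Impact of a player: run simulate_wins twice, once with the team's projected
    # production including the player's Taylors and once without, and compare.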

Finally, to reiterate things that I think are important caveats: TPR is a value-over-average rate statistic, while WARG is a volume statistic. No, statistics don’t tell the whole story and even these ones don’t measure effectiveness. Yes, any player rating system is going to have a certain level of arbitrariness to it because the system designer has to make decisions about what they consider important and unimportant. I’m fully aware of these things and wrote 1500 words accordingly at the end of the PPG primer.

A thing I’m trying to do this season is publish all of my rating systems on Google Sheets so anyone can have a look. You can see match-by-match ratings for NRL and the two State Cups if that’s your jam.

Primer – PPG and WARG

Turns out that StatScore didn’t pan out the way I had hoped. There were some conceptual errors, but the biggest was that I wanted a measure of rate and a measure of volume, and you can’t have one statistic that does both. It’s like having one number that meaningfully states that a boxer is both the best in the world pound-for-pound and the best boxer in the world irrespective of weight classes. The world doesn’t work like that. As a result, there was some okay, but not great, player analysis. Unfortunately, the creation of a new tool requires that you use it for a while on new scenarios in order to evaluate its usefulness. Sometimes it doesn’t pan out as well as you would have hoped.

Also, the name sucked.

So I went back to the drawing board. There were some salvageable concepts from StatScore that have been repackaged, with corrections made for some fundamental mistakes, and repurposed into new player rating systems: PPG and WARG.


Read more

Primer – Poseidon ratings

Poseidon ratings are a new team rating system for both the NRL and the Queensland Cup.

For those who don’t have time to read 2000+ words, here’s the short version: the purpose of Poseidon ratings is to assess the offensive and defensive capabilities of rugby league teams in terms of the number of tries they score and concede against the league average. By using these ratings, we can estimate how many tries will be scored/conceded in specific match ups and then use that, with probability distributions, to calculate an expected score, margin and winning probabilities for the match-up.
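
As a rough illustration of the idea – not the actual Poseidon maths – the ratings can be expressed as multipliers of the league-average try count and fed through a probability distribution. The Poisson distribution and the 3.5-try league average below are my assumptions, not taken from the primer.

    import math

    def expected_tries(attack_rating, opponent_defence_rating, league_avg_tries=3.5):
        """A rating of 1.10 means a side scores (or concedes) 10% more tries than
        the league average; multiplying attack by opposing defence gives an
        expected try count for the match-up."""
        return league_avg_tries * attack_rating * opponent_defence_rating

    def poisson_pmf(k, mean):
        return mean ** k * math.exp(-mean) / math.factorial(k)

    def home_win_probability(home_expected, away_expected, max_tries=15):
        """Chance the home side scores more tries, assuming independent Poisson
        try counts for each team."""
        return sum(
            poisson_pmf(h, home_expected) * poisson_pmf(a, away_expected)
            for h in range(max_tries + 1)
            for a in range(h))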


Read more

Primer – StatScore and Win Shares: rating NRL players

The biggest off-season story in the NRL was the transfers of Cooper Cronk from Melbourne to Sydney and then Mitchell Pearce from Sydney to Newcastle. For two players likely on similar pay packets, how did the Roosters decide one was better than the other? Then I wondered if it were possible to work out a way of judging value for money in player trades. It’s big in baseball, so why not rugby league? This led me to develop StatScore and Win Shares as ways to numerically evaluate rugby league players.

Read more

Primer – How do the Indices work?

If you’re wired for numbers, like I am, it can be hard to deal with people’s feelings and understanding why they think the things that they do. That’s why I’ve decided to quantify the feelings a team generates into five distinct indices: Power, Hope, Panic, Fortune and Disappointment.

Each index has two components. There’s a main mechanism for ranking the teams and some minor tie-breaking stats. The main mechanism typically uses Elo ratings to make an estimation of what we expect from a team, whether or not they are meeting that expectation and what that means for the season ahead. The tie-breakers are statistics used to award a few points here and there to help rank the teams should they have similar mechanism results.
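
In pseudo-formula terms, each index boils down to something like the sketch below; the exact mechanism and tie-breakers differ from index to index, and the function here is purely illustrative.

    def index_score(actual_wins, elo_expected_wins, tiebreak_points=0.0):
        """How far a team is running ahead of (or behind) its Elo expectation,
        nudged by a few tie-breaking points when teams are otherwise level."""
        return (actual_wins - elo_expected_wins) + tiebreak_points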

Editor’s note: As much of last season’s material was influenced by The Arc, much of this season’s material owes a debt to SB Nation, including the idea of panic/hope indices.

Read more

Primer – Who are the Greeks? 2018 Edition

The Greeks is the collective name given to a series of Elo rating models for tracking performance of rugby league teams and forecasting the outcomes of games. I usually refer to them as if the philosopher himself was making the prediction, even though the Greeks have mostly been dead for a couple thousand years and certainly would never have heard of rugby league or Arpad Elo.

The differences between the Greeks are on the subtle side, with each intended to measure different things. You may want to revisit the primers from last year:

Read more

Primer – Introducing Pythago World Rankings

While the recent RLWC was on, I couldn’t help but notice that the RLIF had Scotland pegged as the world’s fourth best team. Scotland hadn’t won a game since 2014 and even that was against Ireland. Since then, they’d lost to Australia, England, Ireland, Wales and France. I also got frustrated because a fifteen second Google didn’t reveal how the rankings are actually calculated.

So I figured I could come up with a better system. I did and this is how the Pythago World Rankings (PWR) work.

Read more

Primer – How does the Collated Ladder work?

The Collated Ladder takes in two inputs:

  • The projected number of wins for each club from the Stocky
  • The projected number of wins for each club from the Pythagoras projection

Put simply, the Collated Ladder is an average of these two numbers, with a 2:1 weighting towards the output of the Stocky, rounded to the nearest whole number.

The Ladder is then based on sorting each team by its Collated number of wins, then by its Pythagoras projection, which is a loose analogue for for-and-against (the greater the number of wins projected, the better the team’s for-and-against will be).
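
In code, the collation is nothing more than the sketch below; the club names and projected win counts are made up for illustration.

    def collated_wins(stocky_wins, pythago_wins):
        """Average of the two projections, weighted 2:1 towards the Stocky and
        rounded to the nearest whole number of wins."""
        return round((2 * stocky_wins + pythago_wins) / 3)

    # Hypothetical projections (Stocky wins, Pythagoras wins) for three clubs,
    # sorted by collated wins and then by the Pythagoras projection.
    projections = {"Storm": (17.2, 16.5), "Roosters": (16.8, 17.1), "Broncos": (11.3, 12.0)}
    ladder = sorted(projections.items(),
                    key=lambda item: (collated_wins(*item[1]), item[1][1]),
                    reverse=True)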

Why bother with this if both systems have limitations and inaccuracies? Aren’t we just compounding that?

Read more

Primer – How does the Stocky work?

The Stocky, which is short for stochastic simulation, is a Monte Carlo simulation of the season using Elo modelling to work out what the outcome of that season might be.

The basic premise of a Monte Carlo simulation is that if you have a few pieces of the puzzle, an idea of how they relate and then throw enough random numbers at it, you’ll get a pretty good idea of what the puzzle picture is.

Let’s say you have a circle inside a square with sides the same length as the circle’s diameter. Then throw a bunch of sand onto the square/circle combination and count how many grains of sand end up in the circle. If you know the length of the square’s side and the proportion of sand that ends up in the circle, you can work out a value for π.

(You want more detail? Fine: the side of the square can be used to calculate the area of the square; multiplying that by the proportion of sand inside the circle gives you an estimate of the circle’s area; divide the circle’s area by the square of half the square’s side length and you get an estimate of π.)

The more grains of sand you throw at the square/circle, the closer the estimate will be to the actual answer.
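
Here is that example in code; the grain count is arbitrary and the whole thing is just a demonstration of the principle.

    import random

    def estimate_pi(grains=1_000_000):
        """Scatter random 'grains of sand' across a square of side 2 enclosing a
        circle of radius 1, then apply the arithmetic described above."""
        inside = sum(
            1 for _ in range(grains)
            if random.uniform(-1, 1) ** 2 + random.uniform(-1, 1) ** 2 <= 1)
        circle_area = (2 * 2) * inside / grains   # square area x proportion inside
        return circle_area / (1 ** 2)             # divide by (half the side)^2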

Read more

Primer – What variables do the Elo models use?

In my previous primer on Elo ratings, I talked about different ways of calculating Elo ratings with a view of measuring form and/or class. This primer will look in a bit more depth at how I arrived at the specific numbers for the variables.

The main variables in an Elo model are:

  • Starting ratings (discrete versus continuous)
    • If continuous, then the reversion to mean discount of ratings
  • Calculation method (margin versus result/winner-takes-all)
  • K, weighting for each game
  • h, homefield advantage
  • p, margin factor

Some are derived from game data, others from optimisation. Let’s tackle them one by one.
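
For context, a single-game update in a margin-based Elo model has roughly the shape below. K, h and p here are illustrative placeholders, not the tuned values the primer goes on to derive, and the margin multiplier is only one common way of using p.

    def elo_update(home_rating, away_rating, margin, K=30.0, h=40.0, p=0.7):
        """One game's update: expected result from the ratings gap plus home-field
        advantage h, actual result from the margin, and a margin-scaled change
        weighted by K."""
        expected_home = 1 / (1 + 10 ** (-((home_rating + h) - away_rating) / 400))
        actual_home = 1.0 if margin > 0 else (0.0 if margin < 0 else 0.5)
        change = K * (abs(margin) + 1) ** p * (actual_home - expected_home)
        return home_rating + change, away_rating - change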

Read more
