How much does Origin affect NRL club form?

Sometimes I take reader requests, although I can’t always meet the brief.

This was the closest I could do for a stat analysis of the next Immortal candidates. Sorry. pic.twitter.com/96atyhaF0G

— Liam (@pythagoNRL) June 26, 2018

And sometimes I read or hear someone ask a question and think, “hey, that’s an interesting idea. I’ll take a look.” A while back on Inside Sport’s Dead In Goal podcast, Jeff Centenera asked how much Origin affects team form. We normally associate heavy Origin loading with poor performance from the club team, stripped as it is of its star power and typically relying on inexperienced youngsters to fill the gaps. Being a Broncos fan, I am as familiar with this phenomenon as it is possible to be without actually playing the game.

But I thought that was a question worth working through.


We get into our time machine and look at each season from 1988 onwards, skipping 1997 due to the split competition. I picked this as a starting point because it’s the year that the Broncos, Knights and Giants enter the league and Origin has more or less taken its modern shape. For each team, I counted how many players appeared in each Origin game, as per Wikipedia. If the same player appeared in all three games, that counts as three appearances from that club for that season. I use this as a proxy for the Origin loading of each club.
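The counting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual workflow used here: the record layout, the player names and the club names are all made up, and the only thing taken from the text is that each player-game counts as one appearance for the player's club.

```python
from collections import Counter

# Hypothetical records: (season, origin_game, player, club).
# These are placeholder names, not real team lists.
appearances = [
    (1988, 1, "Player A", "Club X"),
    (1988, 2, "Player A", "Club X"),
    (1988, 3, "Player A", "Club X"),
    (1988, 1, "Player B", "Club Y"),
    (1988, 3, "Player C", "Club X"),
]


def origin_loading(records):
    """Count Origin appearances per (season, club), one per player per game."""
    return Counter((season, club) for season, _game, _player, club in records)


loading = origin_loading(appearances)
print(loading[(1988, "Club X")])  # 4: Player A three times, Player C once
```

A player who appears in all three games adds three to his club's tally, which is why Club X ends up on four from only two players.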

I split each season into three phases: the pre-Origin, the inter-Origin and the post-Origin periods. The pre-Origin period ends with the round that commences immediately before the first Origin game and the inter-Origin period ends with the round that commences directly after the final game. In the old days, the round might carry over two weekends (rather than splitting into two distinct rounds as we do these days), so there’s a small part of the dataset that is classified as pre-Origin but actually occurs during the second phase.
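One way to express that phase rule in code is below. It is a sketch under stated assumptions: rounds are identified only by the date they commence, the function name `classify_rounds` is hypothetical, and the dates are invented for illustration rather than taken from any real season.

```python
from datetime import date


def classify_rounds(round_starts, first_game, final_game):
    """Label each round 'pre', 'inter' or 'post' by the date it commences.

    Assumed reading of the rule: rounds commencing before the first Origin
    game are 'pre'; the inter-Origin phase runs up to and including the
    first round that commences after the final game; the rest are 'post'.
    """
    phases = {}
    inter_closed = False
    for start in sorted(round_starts):
        if start < first_game:
            phases[start] = "pre"
        elif not inter_closed:
            phases[start] = "inter"
            if start > final_game:
                inter_closed = True  # this round closes the inter-Origin phase
        else:
            phases[start] = "post"
    return phases


# Invented dates, one round per week.
rounds = [date(2001, 5, d) for d in (1, 8, 15, 22, 29)] + [
    date(2001, 6, 5),
    date(2001, 6, 12),
]
phases = classify_rounds(rounds, first_game=date(2001, 5, 10),
                         final_game=date(2001, 5, 31))
for start, phase in phases.items():
    print(start.isoformat(), phase)
```

Because the label depends only on when a round commences, an old-style round that straddles two weekends stays in a single phase, which is the source of the small misclassification mentioned above.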

For each phase, I calculated each team’s win percentage. I didn’t want to use Elo ratings for this, partly because the Greeks only start calculating in 1998 and partly because Archimedes purposely underweights the inter-Origin phase of the season and Eratosthenes more or less reflects the win record. Win percentage is a more appropriate tool for this exercise. To measure the potential change in form, I calculated the differences in win percentage between phases. Delta (Δ) 1 is the difference between pre and inter, delta 2 is between post and inter and delta 3 is the gap between pre and post win percentages. There are too many records to post them all, but I have summarised club averages in the table below:
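As a rough sketch of the delta calculation: the text does not spell out the sign convention or how draws are handled, so the code below assumes each delta is the later phase minus the earlier one and that results are only wins ("W") or losses ("L"). The results themselves are invented.

```python
def win_pct(results):
    """Share of games won; assumes every result is 'W' or 'L' (no draws)."""
    return sum(1 for r in results if r == "W") / len(results)


def phase_deltas(pre, inter, post):
    """Change in win percentage between phases.

    Assumed sign convention: later phase minus earlier phase, so a
    negative delta1 means form dropped during the Origin period.
    """
    p, i, o = win_pct(pre), win_pct(inter), win_pct(post)
    return {"delta1": i - p, "delta2": o - i, "delta3": o - p}


# Invented results for one hypothetical club-season.
d = phase_deltas(
    pre=["W", "W", "W", "L"],    # 0.75
    inter=["W", "L", "L", "L"],  # 0.25
    post=["W", "W", "L", "L"],   # 0.50
)
print(d)  # {'delta1': -0.5, 'delta2': 0.25, 'delta3': -0.25}
```

Under this convention delta 3 is always the sum of delta 1 and delta 2, so it summarises the net change across the Origin period.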

The Gold Coast stats include the Seagulls/Chargers/Giants franchise and the current Titans franchise, while the Manly records include their stint as the Northern Eagles.

I wouldn’t get too bogged down in specific elements of this table. That Brisbane averages a win percentage .096 lower during Origin than pre-Origin is but one datum. That the Warriors do the opposite to the tune of .069 is another. The averages conceal the variance from year to year.

The 1995 series is especially interesting and worth an aside. Four new clubs had been added to the newly rebranded premiership. Super League-signed players were overlooked for selection and a single Bronco was selected for Origin: Gavin Allen hadn’t signed with Super League due to his impending retirement. A slew of North Sydney and South Queensland players filled the void left by the Broncos in the Maroons lineup. Manly and Canberra were on for incredible years, both finishing 20-2, but the ARL’s Manly made a major contribution to the line-ups and Super League’s Canberra made none.

(It might have also been the year that I correctly picked the finals bracket as an eight-year-old, foreshadowing a later move into rugby league analytics as a thirty-year-old.)

The Broncos, despite having just one player out (the 1995 equivalent of Sam Thaiday), had a shocking inter-Origin period. North Sydney got better without their Origin stars. The Crushers meandered along and the Sea Eagles seemed unaffected one way or another. I think this amply demonstrates the fickleness of the year-to-year results.

Rather than be distracted by anecdotes, I want to see what correlations arise between Origin appearances and win percentages in each phase of the season across the whole dataset.

A brief reminder about R-squared, or the coefficient of determination (for a straight-line fit, the square of the correlation coefficient). This is a number between zero and one that shows the degree of correlation between two variables. The closer R-squared is to one, the tighter the correlation. Normally, I wouldn’t get too excited about anything less than 0.2, as any two things can be weakly correlated by chance and a low level of correlation is as indicative of noise in the data as of any actual connection. Remember that correlation is not causation and, to prove causation, we would need to do even more statistics that I’m not too interested in doing.
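For anyone wanting to reproduce the number, here is a minimal sketch of R-squared for a straight-line fit, computed as the square of Pearson's correlation coefficient. The function names are hypothetical and the data points are invented purely to show the mechanics; they are not from the Origin dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """R-squared of a simple linear fit: the square of Pearson's r."""
    return pearson_r(xs, ys) ** 2


# Invented points: a perfect upward line and a perfect downward line.
xs = [0, 1, 2, 3]
rising = [1, 3, 5, 7]
falling = [7, 5, 3, 1]

print(r_squared(xs, rising), pearson_r(xs, rising) > 0)
print(r_squared(xs, falling), pearson_r(xs, falling) > 0)
```

Both lines give the same R-squared even though one slopes up and the other down. Squaring throws away the sign, so the direction of a relationship has to be read from the trend line (or from r itself), not from R-squared.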

The first test is comparing the overall season win percentage against Origin appearances. These data have an R-squared of 0.22, which is a weak-to-moderate correlation. Obviously, there’s more to a team’s performance across a season than how many Origin appearances the players made but it’s not a bad proxy and suggests that the appearances metric will be useful. Additionally, the trend line shows that the correlation is positive, i.e. more appearances means a higher win percentage. This is all fairly intuitive as we would expect teams with lots of Origin players to win more games and teams with fewer or none to win fewer games.

The R-squared for pre-Origin win percentage versus appearances is relatively high at 0.20. This makes sense as selectors will choose good players from teams that are performing well and normally, you would conclude that the teams that are performing well are the teams winning games in the early phase of the season.

After that, the correlation basically disappears. The inter-Origin phase has an R-squared of 0.05 and the post-Origin phase, 0.11. These hint at relationships but there is nowhere near enough of a correlation to draw any conclusions.

This is not what we would have expected. We expected there to be an inverse relationship between the inter-Origin win percentage and Origin appearances. That is, as more players from a given team play more Origin games, we expect that team to lose more games, particularly in the inter-Origin phase. There should be a negative correlation but there isn’t anything. Strange.

Even looking at changes in form, there is basically no correlation between each of the deltas – representing the change in win percentage from phase to phase of the season – and the number of Origin appearances. Delta 1, the difference between pre- and inter-Origin win percentages, has the best R-squared of 0.02, which is nothing. We can’t even draw a conclusion as to the true directions of the trendlines based on this.

Just to make sure, I isolated teams with ten or more Origin appearances in one year to see if there was a specific effect on higher calibre teams. The result: none, nada, zéro, bubkes correlation.

There could be two mechanisms to explain this. Normally, the inter-Origin period lasts about six or seven weeks. In that time, Origin players will typically miss three or four club games and play in the remaining three or four. In the games with its Origin players, a team will resume its pre-Origin form; in the others, it will tend to lose. Averaged across six or seven rounds, this level of performance looks a lot like that of the teams not affected by Origin, who continue to perform at their usual level. Because all teams end up performing similarly, it is difficult to distinguish a correlation with Origin appearances.
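The arithmetic behind this first mechanism can be sketched as a weighted average. Every number below is hypothetical: a strong club winning 70% of games at full strength and 30% when depleted, with its Origin players available for four of seven inter-Origin games.

```python
def blended_win_pct(full_pct, depleted_pct, games_full, games_depleted):
    """Expected win percentage across a phase mixing full-strength and depleted games."""
    total = games_full + games_depleted
    return (full_pct * games_full + depleted_pct * games_depleted) / total


# Hypothetical strong club: 0.70 with its stars, 0.30 without them.
strong = blended_win_pct(0.70, 0.30, games_full=4, games_depleted=3)
print(round(strong, 2))  # 0.53
```

On these made-up inputs the Origin-heavy club's inter-Origin record lands close to that of an average club that loses nobody, which is the sense in which the two become hard to tell apart.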

Alternatively, and this is my preferred explanation, the results of the inter-Origin period are highly randomised. That is, the selection of Origin players doesn’t necessarily have any specific negative impact on performance but rather the replacement players are capable of winning games and losing them in an unknowable proportion. Match outcomes become a coin flip. Part of that will be due to factors outside the team’s control. The draw, for example, will have a huge impact on a team’s success through six games, as will the strength, experience and coaching of reserve grade options and, worst of all, injuries. The net result is that match outcomes are so unpredictable that any relationship we expect to occur is buried in randomised noise.
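To put a rough size on that noise, the sketch below uses the textbook standard deviation of a proportion, assuming each game is an independent coin flip with a fixed win probability. That is a simplifying assumption for illustration, not a model fitted to the data.

```python
from math import sqrt


def win_pct_std(p, games):
    """Standard deviation of win percentage over `games` independent matches,
    each won with probability `p` (binomial assumption)."""
    return sqrt(p * (1 - p) / games)


noise = win_pct_std(0.5, 6)
print(round(noise, 3))  # 0.204
```

Under that assumption, a six-game stretch swings by about .200 in win percentage on luck alone, roughly double the .096 average drop quoted for Brisbane earlier, so an effect of that size is easily lost in any single season.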

But the more important takeaway from this exercise is that having Origin players is usually a net benefit. We see this in the positive correlation between season win percentage and Origin appearances. Performance during a specific inter-Origin period may suffer (or it may not, depending on how the chips fall) but overall, average performance is improved by having Origin players in your squad. We can’t draw the conclusion that some clubs would have won a bunch more premierships if Origin didn’t exist but we can conclude that, over the long run, season outcomes didn’t suffer for having Origin players in the squad.

Just for interest, I’ve plotted each club’s Origin appearances each year since 1988 (or when the club was founded), along with their end of season Eratosthenes rating. The latter only starts at 1998, with the founding of the NRL, but I think it clearly demonstrates that the more Origin players you have or, if they’re good enough, the more times they appear, the better the team plays. It probably also says a lot about selectors.