1. ## Sabermetrics in Cricket

Recently, there has been a great deal of talk about the limitations of the batting average, particularly in one-day cricket. In Test cricket, the batting average is a sound indicator of a batsman's ability: the rate at which runs are scored, while becoming more important, is still nowhere near as important as the number of runs scored, and not-outs rarely distort the averages of top-order batsmen. However, in one-day cricket, a 25-ball fifty is often worth much more than a laboured 150-ball hundred. Because teams are less likely to be bowled out in fifty overs, top-order batsmen build up a large number of not-outs. This has led some pundits to consider the strike rate almost as important as the average itself.

In baseball, there have been similar problems. The basic batting average did not give enough detail about a player's true value to the team and failed to show his style of play: was he the classic slogger, or the dasher who enabled other players to move around the diamond? Statisticians have therefore developed a number of formulae to express different aspects of a player numerically, and the study of these came to be known as sabermetrics. Rather than concentrating on just a couple of factors to find a value, some sabermetric measures use large formulae, some complex, others basic.

So, with a bit of time on my hands, I decided to see how I could manipulate the figures of leading ODI batsmen to find several values that, used either together or alone, would paint a better numerical picture of what type of ODI batsman a player is.

Please note that my formulae have so far had only a few hours' work on them, and there is definite scope for improvement (which is why I'm posting them here).

I took the career ODI batting statistics of 53 leading batsmen from the top eight nations. The cut-off for who counted as a batsman was roughly 50 completed innings with an average above 25, though exceptions were made for the likes of Pietersen, Afridi, Dhoni and Hamish Marshall. These stats were taken from The Cricinfo Guide to International Cricket 2007 and only include ODIs up to mid-September 2006.

If the basic batting average was no longer the most useful statistic, then I needed something to replace it that showed the true value of a batsman to his team. So I introduce the total batting value, or TBV.

To begin with, a few terms I made up need explaining:

- Regular average - the standard batting average (runs divided by times out).
- Raw average - runs scored divided by innings batted, which removes the boost that not-outs give.
- 100s per innings - the number of hundreds scored divided by innings batted.
- 50s per innings - the number of fifties scored divided by innings batted.
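As a rough sketch, these four terms can be written as small Python functions. The function names and the sample career (4000 runs in 100 innings, 20 not out, 10 hundreds) are hypothetical, made up for illustration rather than taken from any player's record, and the regular average assumes the batsman has been dismissed at least once.

```python
def regular_average(runs: int, innings: int, not_outs: int) -> float:
    """Standard batting average: runs divided by times out."""
    dismissals = innings - not_outs
    if dismissals == 0:
        # Never dismissed: the regular average is undefined.
        raise ValueError("regular average needs at least one dismissal")
    return runs / dismissals


def raw_average(runs: int, innings: int) -> float:
    """Runs per innings batted, so not-outs no longer inflate the figure."""
    return runs / innings


def per_innings(count: int, innings: int) -> float:
    """Hundreds (or fifties) scored divided by innings batted."""
    return count / innings


# Hypothetical career, not a real player's figures.
print(regular_average(4000, 100, 20))  # 50.0
print(raw_average(4000, 100))          # 40.0
print(per_innings(10, 100))            # 0.1
```

The gap between 50.0 and 40.0 is the inflation that twenty not-outs in a hundred innings produce.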

The raw average multiplied by four, plus the regular average, is divided by five to give the weighted average.
(raw*4 + reg)/5
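A minimal sketch of the weighted average in Python, using the same kind of hypothetical career (the function name is illustrative, and at least one dismissal is assumed):

```python
def weighted_average(runs: int, innings: int, not_outs: int) -> float:
    """(raw*4 + reg)/5: four parts raw average to one part regular average."""
    raw = runs / innings               # raw average: not-outs count as innings
    reg = runs / (innings - not_outs)  # regular average: assumes >= 1 dismissal
    return (raw * 4 + reg) / 5


# Hypothetical: raw 40.0, regular 50.0, so (160 + 50) / 5
print(weighted_average(4000, 100, 20))  # 42.0
```

Because the raw average carries four-fifths of the weight, twenty not-outs lift the figure only from 40 to 42, rather than all the way to the regular average of 50.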

The 100s per innings (100/I) is multiplied by 450 to give the weighted 100/I.
100/I*450

The 50s per innings (50/I) is multiplied by 150 to give the weighted 50/I.
50/I*150
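The two scaled frequency terms can be sketched in Python as below. The function names and figures are hypothetical; the post does not say whether a hundred also counts as a fifty, so here the two counts are simply passed in separately. Multiplying before dividing is mathematically the same as 100/I * 450 and saves one floating-point rounding step.

```python
def weighted_hundreds_per_innings(hundreds: int, innings: int) -> float:
    """100/I * 450, computed as 450 * hundreds / innings."""
    return 450 * hundreds / innings


def weighted_fifties_per_innings(fifties: int, innings: int) -> float:
    """50/I * 150, computed as 150 * fifties / innings."""
    return 150 * fifties / innings


# Hypothetical: 10 hundreds and 20 fifties in 100 innings.
print(weighted_hundreds_per_innings(10, 100))  # 45.0
print(weighted_fifties_per_innings(20, 100))   # 30.0
```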

Matches played is also factored into the formula to reduce the anomalies caused by players with short careers (e.g. Pietersen, Dhoni).

Strike rate (SR) is divided by two to give the weighted SR.
(runs/balls*100)/2
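In Python the weighted strike rate is a one-liner. This sketch uses a hypothetical function name and made-up figures:

```python
def weighted_strike_rate(runs: int, balls: int) -> float:
    """(runs/balls*100)/2: half the usual strike rate."""
    strike_rate = runs * 100 / balls  # runs per 100 balls faced
    return strike_rate / 2


# Hypothetical: 4000 runs from 5000 balls is a strike rate of 80.
print(weighted_strike_rate(4000, 5000))  # 40.0
```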

The TBV is composed as follows:

- 0.65 - weighted average
- 0.20 - weighted SR
- 0.09 - weighted 100/I
- 0.04 - weighted 50/I
- 0.02 - matches played

The entire formula is therefore:
(0.65*weighted average)+(0.20*weighted SR)+(0.09*weighted 100/I)+(0.04*weighted 50/I)+(0.02*matches played)
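Putting every component together, a sketch of the whole TBV calculation in Python might look like this. The `OdiCareer` record, its field names and the sample player are hypothetical, and the calculation assumes at least one dismissal.

```python
from dataclasses import dataclass


@dataclass
class OdiCareer:
    """Career ODI batting figures (hypothetical field names)."""
    matches: int
    innings: int
    not_outs: int
    runs: int
    balls: int
    hundreds: int
    fifties: int


def total_batting_value(c: OdiCareer) -> float:
    """TBV from the weights listed above; assumes at least one dismissal."""
    raw = c.runs / c.innings
    reg = c.runs / (c.innings - c.not_outs)
    weighted_average = (raw * 4 + reg) / 5
    weighted_sr = (c.runs * 100 / c.balls) / 2
    weighted_100s = 450 * c.hundreds / c.innings
    weighted_50s = 150 * c.fifties / c.innings
    return (0.65 * weighted_average
            + 0.20 * weighted_sr
            + 0.09 * weighted_100s
            + 0.04 * weighted_50s
            + 0.02 * c.matches)


# Hypothetical player, not a real career record.
player = OdiCareer(matches=110, innings=100, not_outs=20, runs=4000,
                   balls=5000, hundreds=10, fifties=20)
print(round(total_batting_value(player), 2))  # 42.75
```

With these made-up figures the components are 42.0, 40.0, 45.0, 30.0 and 110 matches, so the matches term contributes 2.2 of the 42.75; the 0.02 weight on matches is the lever that decides how much a long career counts.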

My spreadsheet shows players' performances over their entire careers, so Tendulkar tops the pile. Pietersen is second. While the differences in the players' ratings may seem very small, they are unlikely to fluctuate as drastically as a batting average. At the moment the only problem is striking a balance between making experience important enough to remove the anomalies caused by short careers (which puts Tendulkar, Jayasuriya and Inzamam too high) and taking too much emphasis off experience (which puts Dhoni too high). It tends to work best with top-order players who have played over 100 ODIs (Nico Boje, for example, is too high, as is Shaun Pollock).

I'd appreciate some feedback here; if you can, try making subtle adjustments to the formula. I've already realised that it fails to take account of match conditions (e.g. runs scored on Asian flat tracks count as much as those scored in New Zealand), but I think that is difficult to do without a huge formula and lots of time. It looks like an improvement on the simple batting average anyhow. It is not an attempt to show how 'good' a batsman is, but how useful he is to his team.

I've also been working on a 'slog coefficient', but that requires a bit more work.

Thanks to Hakon for his help and suggestions.

2. If you haven't already, I suggest you read Moneyball. It's a very important book on how the correct use of stats is a far better gauge of potential and production than scouting, eyewitness accounts and traditional stats such as batting average.

I have similarly spent a lot of time searching for the correct way of analysing cricket stats to give predictors and accurate evaluation etc. I think I have come close.

These types of things should only ever be used to look at production and value rather than ability. Talent and ability are qualitative, whereas production and value can be measured. The difficulty is finding the right method.

EDITED - as I realised I contradicted myself over the word 'ability'.

3. George, this is really top-class stuff. I'm amazed by the accuracy of your method - I can't think of any obvious weaknesses that wouldn't be evened out over the course of a player's career.

Fascinating work.

4. 'Kin hell mate, quite something this is. Looks pretty bang-on as well, as far as I can get my head round it.

Ultimately though, I doubt we'll ever find something with cricket to match up with some of the numbers they churn out for baseball. Baseball is so much easier to do things like this with because of the way batting is split up, and because it's easy to record what happens each time.

5. Originally Posted by Goughy
If you haven't already, I suggest you read Moneyball. It's a very important book on how the correct use of stats is a far better gauge of potential and ability than scouting and eyewitness accounts and traditional stats such as batting average.

I have similarly spent a lot of time searching for the correct way of analysing cricket stats to give predictors and accurate evaluation etc. I think I have come close.
I'll have a look at the Moneyball book you mention, presumably it's about baseball statistics? I probably need to get a better grasp of baseball beforehand though, which I'm intending to do this year.

These types of things should only ever be used to look at production and value rather than ability. Talent and ability are qualitative, whereas production and value can be measured. The difficulty is finding the right method.
I fully agree with you there - statistics should only be used for that reason. You can easily tell that there are players with much better abilities lower down the table than those higher up. From the limited amount of baseball I have watched, it appears harder to work out how much effect individual players are having on the game than it is in cricket (I may be wrong here). Rather than reflecting the batsman's ability, the TBV shows his usefulness to the team: the rate at which he scores his runs, the amount he scores and the number of large innings he plays.

6. Originally Posted by The Baconator
'Kin hell mate, quite something this is. Looks pretty bang-on as well, as far as I can get my head round it.

Ultimately though, I doubt we'll ever find something with cricket to match up with some of the numbers they churn out for baseball. Baseball is so much easier to do things like this with because of the way batting is split up, and because it's easy to record what happens each time.
There is also much more raw data in baseball, whereas in cricket all we have is innings, balls, runs and the denominations the runs are scored in.

7. Strauss 38. Collingwood 34.

8. Originally Posted by Jungle Jumbo
I'll have a look at the Moneyball book you mention, presumably it's about baseball statistics? I probably need to get a better grasp of baseball beforehand though, which I'm intending to do this year.
Yeah, the Moneyball book sounds pretty good from what I've read; I'll probably try and get hold of it at some point this year. It's about a manager of the Oakland Athletics (A's) who managed to put together a top side on a much lower budget than yer Yankees and Red Sox. I think the idea is that he discarded the usual methods of judging players and used less common ones that show a player's actual effectiveness better.

9. Originally Posted by Neil Pickup
Strauss 38. Collingwood 34.
That's the effect of the raw average.

10. Originally Posted by The Baconator
Yeah, the Moneyball book sounds pretty good from what I've read; I'll probably try and get hold of it at some point this year. It's about a manager of the Oakland Athletics (A's) who managed to put together a top side on a much lower budget than yer Yankees and Red Sox. I think the idea is that he discarded the usual methods of judging players and used less common ones that show a player's actual effectiveness better.
I've had the book a few years and have read it many times.

TBH, the true value and aim of the systems they use is to identify the athletes they rank highly and sign them at below market value, because the non-statistical, eyeball methods other teams used overvalued certain skills. This allows them to build quality teams at a fraction of the price of other teams.

If you buy into it (as I do), it's pure genius.

11. Originally Posted by Goughy
I've had the book a few years and have read it many times.

TBH, the true value and aim of the systems they use is to identify the athletes they rank highly and sign them at below market value, because the non-statistical, eyeball methods other teams used overvalued certain skills. This allows them to build quality teams at a fraction of the price of other teams.

If you buy into it (as I do), it's pure genius.
Do you think that sabermetrics could (and should) be applied successfully to cricket then? Or do the different systems of the domestic game vs MLB make it pointless?

12. Originally Posted by Jungle Jumbo
Do you think that sabermetrics could (and should) be applied successfully to cricket then? Or do the different systems of the domestic game vs MLB make it pointless?
I've been working on it for a long time and I think the answer is probably yes. However, despite the full notepads of research, I can't find anything as simple as the use of On Base Percentage (OBP) in baseball. I've gone through many areas and looked hard, and I think I will find something, but there is nothing clean enough to release ATM.

There are many trends in domestic cricket that can be applied and used to predict. However, depressingly, first-class average (using the County Championship as a sample) is historically one of the best predictors.

13. The hardest thing I found was trying to put logical numbers together. Anyone can produce a figure by throwing a few numbers together, but they have to be there for a reason. My reasoning for the TBV formula was that average runs scored was still the primary factor, followed by rate of scoring and then by the frequency of the big, match-winning scores. A case could probably be made for putting more emphasis on any of them.

14. Moneyball doesn't work, and sabermetrics is given waaay too much weight in baseball. It tries to put numbers on things that can't be quantified (in my opinion, anyway). That is the reason the A's and the Blue Jays (two teams that have tried following it) have not won anything in the last 10 years.

According to Moneyball:

- You don't need a closer, because the last 3 outs are just as important as the outs in the 7th inning.
- No sacrifice bunts or steals.

Bunting/stealing is one of the most important parts of baseball, and it is that type of play that wins during the postseason.

15. Interesting stuff...

Personally, though, I'd prefer it if you just created some kind of figure that combined average and strike rate.

And obviously those averages would be first-chance averages... and the SR considering first-chance innings...
