# Thread: Fielding Statistics: A New Approach

1. ## Fielding Statistics: A New Approach

Fielding statistics in cricket are currently borderline useless for rating the quality of a fielder. Catches/match is heavily influenced by fielding position and by the quality of the bowlers involved (and how many opportunities they create). Even assuming that wasn't an issue, taking a catch by itself doesn't tell you much - it might've been a dolly that anyone could've taken. What really changes matches are the spectacular catches and, even more, the catches that should've been taken but were dropped.

The picture isn't any better when it comes to ground fielding. There is no measure whatsoever of how much value a great fielder brings to a team - do they save 3-4 runs on average in an ODI, or 10-12, or even more? How much of a difference in terms of runs does a great fielding side bring to the table compared to a poor one? How can you properly value a player without this information? Baseball does a much better job of this with error statistics, which is partly why ground fielding is generally accepted as the one aspect of baseball that is superior to cricket. When there are numbers involved, fielding performance improvements can be expected down the line.

Resolving these issues with statistics runs into several obvious problems:
• There are no "dropped catch" or "missed stumping" numbers stored anywhere
• No data on how much a misfield cost or how much some great fielding saved
• No standard way to rate the difficulty of an opportunity: taking a dolly vs a tough chance should not be rated the same

So how do we go about resolving these issues without resorting to subjective difficulty measures and hiring monkeys to painstakingly watch videos of old matches? There really isn't a simple answer to all of the issues above, but a first foray into generating meaningful fielding statistics is not impossible.

How would you go about doing this?

Three words - Live commentary feeds.

The steps that would need to be taken:
• Scrape commentary feeds for specific terms such as "dropped", "missed", "misfield", "great catch", etc.
• Parse out the fielder involved from the commentary
• In the case of a dropped chance, rate the significance of the drop in terms of the quality of the batsman involved, the difficulty of the chance (if possible) and how much the drop hurt the team
• In rating the difficulty of catches, attempt to be objective based on a few simple rules - say if only fingertips were involved classify the chance as difficult, if not classify as straightforward (admittedly ignores the case where the fielder is late to the ball)
• In rating the difficulty of ground fielding, if any sliding is involved classify as difficult, if not classify as straightforward (assuming the commentary has this information - if not, assume average difficulty)
• For misfields or great saves, default to a certain number of runs saved/given if explicit values are not found
• Generate drops/match, runs saved/match (possibly negative), fielder reliability ratings and statistics
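
A rough sketch of how the first few steps above might look in Python. The keyword lists, the function names and the default of 1 run are all assumptions made up for illustration, not tuned values, and treating "no mention of sliding" as average difficulty is a simplification of the rule above:

```python
import re

# Hypothetical keyword lists; real commentary vocabulary would need tuning.
EVENT_KEYWORDS = {
    "drop": ["dropped", "shelled", "shelling", "put down"],
    "missed_stumping": ["missed stumping"],
    "misfield": ["misfield"],
    "great_catch": ["great catch"],
    "save": ["saves", "saved"],
}

DEFAULT_RUNS = 1  # assumed default when the commentary gives no explicit number


def classify_event(text):
    """Return the first event category whose keyword appears in text, else None."""
    lowered = text.lower()
    for category, keywords in EVENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def rate_difficulty(category, text):
    """Apply the simple fingertips / sliding rules to one commentary line."""
    lowered = text.lower()
    if category in ("drop", "great_catch"):
        return "difficult" if "fingertip" in lowered else "straightforward"
    # Ground fielding: no mention of sliding is treated as average difficulty.
    return "difficult" if "slid" in lowered else "average"


def runs_involved(text):
    """Use an explicit 'saves 3' / 'costs 2' style number if present, else the default."""
    match = re.search(r"(?:saves|saved|costs|cost)\s+(\d+)", text.lower())
    return int(match.group(1)) if match else DEFAULT_RUNS
```

Plain substring matching like this will throw up plenty of false positives, so anything it flags would still need a look before it counts towards a fielder's numbers.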

I plan to attempt this holy endeavor sometime this year when I can spare the time with the blessings of the CW gods. Any comments, advice, tips or suggestions are appreciated. I only hope to survive this epic journey I am about to undertake.

When and if these preliminary measures are shown to have value, there would be impetus to go back in cricket video history and document each fielding related event more accurately - but that is an issue for another day.

2. I admire your determination viriya, but am genuinely fearful that the words "first chance average" are going to come into a thread title near here very soon.

3. Cricket stats gathering is going to increase massively in the near future, IMO.
Cricinfo has recently added in their pitch map, and occasionally when hawkeye is available that's in there too.

I think we'll get to a point where a fielder moving 2 metres to his right to gather the ball will be recorded.

4. Record something important like who sledged who and how effective it was or wasn't.

5. Baseball fielding statistics are easier to generate because generally fielders are in a fixed position making them easier to compare across players. I think the problem you have is trying to accommodate too many variables like the quality of batsman and catch difficulty. These things are complex and should even out over time anyway if you need to compare players.

A nice statistic would be something like errors in a certain position. I wouldn't be bothered with trying to work out runs saved, but treat every fielding chance as an instance and then just count the number of misfields/drop catches. Same with the significance of a dropped catch - dropping a number 11 should be the same as dropping a top-order player.

6. Originally Posted by hendrix
I think we'll get to a point where a fielder moving 2 metres to his right to gather the ball will be recorded.
We already have the tech to do this, will be cool to get something like a fielding range statistic.

7. Originally Posted by RossTaylorsBox
Baseball fielding statistics are easier to generate because generally fielders are in a fixed position making them easier to compare across players. I think the problem you have is trying to accommodate too many variables like the quality of batsman and catch difficulty. These things are complex and should even out over time anyway if you need to compare players.

A nice statistic would be something like errors in a certain position. I wouldn't be bothered with trying to work out runs saved, but treat every fielding chance as an instance and then just count the number of misfields/drop catches. Same with the significance of a dropped catch - dropping a number 11 should be the same as dropping a top-order player.
Yea the first step would be to just get the raw stats like drops/misfields etc.. if it's possible to parse the players involved there's no reason not to though. For example Sanga was dropped by Mitch yesterday, and the commentary had:

20.5 Williamson to Sangakkara, no run, sweeps hard at this fuller one going down the leg side, but ends up offering a chance to McClenaghan at backward point, who ends up shelling the chance. Would have been a significant double-blow had it come off for NZ!

Sanga was at 53 and went on to make 76, the keyword here is "shelling", there is no other player name in the commentary except "McClenaghan" so it would make sense to assume that he was the culprit. All this info should be relatively straightforward to parse.
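
Something like this is roughly what I have in mind for pulling the fielder out of a line like the one above. It is only a sketch: `parse_ball`, `find_fielder` and the squad list are hypothetical, and it assumes every line follows the "over bowler to batsman, outcome, description" layout and that the fielding side's surnames are known up front:

```python
import re

BALL_PATTERN = re.compile(
    r"^(?P<over>\d+\.\d+)\s+(?P<bowler>[^,]+?) to (?P<batsman>[^,]+), "
    r"(?P<outcome>[^,]+), (?P<text>.*)$"
)


def parse_ball(line):
    """Split one commentary line into its parts; None if the layout differs."""
    match = BALL_PATTERN.match(line)
    return match.groupdict() if match else None


def find_fielder(description, fielding_squad):
    """Return the only fielding-side name mentioned in the description.

    Returns None when zero or several names appear, so the line can be
    checked by hand instead of guessed at.
    """
    mentioned = [
        name for name in fielding_squad
        if re.search(r"\b" + re.escape(name) + r"\b", description)
    ]
    return mentioned[0] if len(mentioned) == 1 else None


line = (
    "20.5 Williamson to Sangakkara, no run, sweeps hard at this fuller one "
    "going down the leg side, but ends up offering a chance to McClenaghan "
    "at backward point, who ends up shelling the chance."
)
squad = ["Williamson", "McClenaghan", "Boult", "Southee"]  # illustrative subset

ball = parse_ball(line)
print(ball["bowler"], ball["batsman"], find_fielder(ball["text"], squad))
```

The bowler's name only appears in the "X to Y" prefix here, so searching just the description should pick out McClenaghan. If the bowler is named again in the description the function returns None rather than guessing.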

8. The runs scored after the drop and the quality of the batsman involved shouldn't impact the fielding analysis.

Edit: Other than, I guess, that a better batsman is likely to offer a more difficult chance, but that should be contained within the subjective assessment of the difficulty of the chance itself.

9. Originally Posted by viriya
Yea the first step would be to just get the raw stats like drops/misfields etc.. if it's possible to parse the players involved there's no reason not to though. For example Sanga was dropped by Mitch yesterday, and the commentary had:

20.5 Williamson to Sangakkara, no run, sweeps hard at this fuller one going down the leg side, but ends up offering a chance to McClenaghan at backward point, who ends up shelling the chance. Would have been a significant double-blow had it come off for NZ!

Sanga was at 53 and went on to make 76, the keyword here is "shelling", there is no other player name in the commentary except "McClenaghan" so it would make sense to assume that he was the culprit. All this info should be relatively straightforward to parse.
I think this is harder than it looks. For example:

Morkel to Brathwaite, no run, this was in the air and drops just short of gully! Oohs and aahs around. He was looking for the square cut but this just gets big on him
We don't know who gully was. Could well have been a half-chance.

Boult to Sangakkara, 1 run, these two are making some very tight calls in the running as Sangakkara attempts yet another drop single, with Brendon right on it, this time picking it up and scooping the throw in one fluid motion, but doesn't hit the target.
Keyword here is "drop" but this has nothing to do with a catch. Of course you could always find "drops" or "dropped" as keywords but the same situation could apply. You'd probably still need to check some things manually.

Southee to Sangakkara, no run, gets this one wrong as it spills down the leg side, with Sangakkara playing a half-hearted glance at it as it rushed past
Again, "spills" would be a common keyword for a dropped catch but here it means nothing.

Cool idea though, I would be interested in seeing how your parsing algorithm works.
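
To make the point concrete, here is a minimal sketch of a keyword match with a few exclusion phrases bolted on. The patterns are assumptions built only from the three examples above plus the "shelling" line, so they are nowhere near a complete list:

```python
import re

# Words that often signal a dropped catch (assumed list).
DROP_WORDS = re.compile(
    r"\b(drop(s|ped)?|spill(s|ed)?|shell(s|ed|ing)?|put down)\b", re.IGNORECASE
)

# Phrases where those words have nothing to do with a dropped catch (assumed list).
EXCLUSIONS = [
    re.compile(r"\bdrop single\b", re.IGNORECASE),
    re.compile(r"\bdrops?\b[^.!]*\bshort of\b", re.IGNORECASE),
    re.compile(r"\bspills? down the leg side\b", re.IGNORECASE),
]


def is_probable_drop(text):
    """True if a drop keyword appears and no exclusion phrase matches."""
    if not DROP_WORDS.search(text):
        return False
    return not any(pattern.search(text) for pattern in EXCLUSIONS)


examples = [
    "this was in the air and drops just short of gully! Oohs and aahs around.",
    "Sangakkara attempts yet another drop single, with Brendon right on it",
    "gets this one wrong as it spills down the leg side",
    "offering a chance to McClenaghan at backward point, who ends up shelling the chance",
]
for text in examples:
    print(is_probable_drop(text), "-", text[:45])
```

Only the last line gets through, but every new turn of phrase needs another exclusion, which is why I think some manual checking is unavoidable.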

10. It's yet another attempt to rate things without watching them so it is to all intents yet another waste of time.

11. I think that Viriya would be best served starting the analysis from the first game of the World Cup. Record your stats each game, ask questions (there will be people watching each game), compile the stats, perhaps provide updates after each stage of the tournament, and see how you go. If it's enjoyable, then start to work your way back in time. Up to you though, I guess.

12. Originally Posted by RossTaylorsBox
I think this is harder than it looks. For example:

Morkel to Brathwaite, no run, this was in the air and drops just short of gully! Oohs and aahs around. He was looking for the square cut but this just gets big on him
We don't know who gully was. Could well have been a half-chance.

Boult to Sangakkara, 1 run, these two are making some very tight calls in the running as Sangakkara attempts yet another drop single, with Brendon right on it, this time picking it up and scooping the throw in one fluid motion, but doesn't hit the target.
Keyword here is "drop" but this has nothing to do with a catch. Of course you could always find "drops" or "dropped" as keywords but the same situation could apply. You'd probably still need to check some things manually.

Southee to Sangakkara, no run, gets this one wrong as it spills down the leg side, with Sangakkara playing a half-hearted glance at it as it rushed past
Again, "spills" would be a common keyword for a dropped catch but here it means nothing.

Cool idea though, I would be interested in seeing how your parsing algorithm works.
Good point, I wasn't planning to make it fully automated off the bat due to issues like this - more like parse possible fielding moment instances, then eyeball them (I don't expect much more than 10 per game) before going forward. A little painstaking but would only need to be done once per game, so most probably worthwhile.

13. Originally Posted by marc71178
It's yet another attempt to rate things without watching them so it is to all intents yet another waste of time.
I'll be using my eyes more than spreadsheets though..

14. I have some preliminary results, but before I put it out there I want to do some sanity checks to see if I'm capturing at least most of the fielding events.

I know this is a long shot but does anyone have some examples of ODIs with significant fielding events (drops/direct hits/great catches etc) in the last 7-8 years? If you can point out certain series/players that would help too.

15. Some interesting preliminary results (only considering matches from mid-2006 since detailed commentaries only start then):

Ricky Ponting: 8 direct hits (that resulted in a wicket) over ~118 matches => 15 matches per direct hit
Tillakaratne Dilshan: 30 runs saved, 5 direct hits over ~200 matches => 6.7 matches per saved run, 40 matches per direct hit
Suresh Raina: 35 runs saved over ~155 games => 4.5 games per saved run

Umar Gul: 3 dropped catches over ~100 matches => 33 games per dropped catch

Lasith Malinga: 5 dropped catches when he was bowling over ~160 matches => 32 matches per dropped catch
Ajit Agarkar: 4 dropped catches when he was bowling over ~25 matches => 6 matches per dropped catch!
MS Dhoni: 7 dropped catches when he was batting over ~178 matches => 25 matches per dropped catch
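
For anyone who wants to check the arithmetic, the "matches per event" figures are just matches divided by event count. A small sketch using some of the approximate numbers above (the function name is made up for illustration):

```python
def matches_per_event(matches, events):
    """Average number of matches between events; None if there were no events."""
    if events == 0:
        return None
    return matches / events


# (player, event, approximate matches, event count) as quoted above
samples = [
    ("Ricky Ponting", "direct hit", 118, 8),
    ("Tillakaratne Dilshan", "saved run", 200, 30),
    ("Tillakaratne Dilshan", "direct hit", 200, 5),
    ("Suresh Raina", "saved run", 155, 35),
    ("Ajit Agarkar", "drop off his bowling", 25, 4),
]

for player, event, matches, count in samples:
    rate = matches_per_event(matches, count)
    print(f"{player}: {rate:.1f} matches per {event}")
```

Because the match counts are approximate, the output can differ slightly from the rounded figures quoted.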

I think considering that these numbers are believable in general, the method has some promise. I'm sure I'm missing some events though, so the setup can improve. I'm going to try to develop a career ratings + current ratings setup for fielders based on this - should add some insight.

Agarkar's numbers are the most mind-boggling - the Indian fielders dropped a catch off him every 6 matches on average for almost 1.5 years! Goes to show how much drops affect bowlers in inferior fielding sides.

One obvious thing I noticed was that the runs saved per match figure is not that high even for the great fielders - Ponting/Dilshan didn't save 5 runs a game, it was more like they saved a run every 5 games - not as significant as you might think.
What really has value when rating fielders are direct hits, great catches and low drop rates - runs saved is just a bonus, while those make a much more significant difference. Ponting getting a wicket with a direct hit so frequently is huge - basically Australia gets a free wicket every 15 games just due to his presence.
