Part III – Not Out Innings and Batting Averages – Demystifying Rarefied Solutions & a Searchlight on the Straightforward
Peter Kettle
PART III – CONCLUSIONS, RECOMMENDATIONS & APPLICATION
The various methods reviewed and evaluated have all been directed at arriving at a “true” set of batting averages: ones that improve on the Traditional Average and are representative of batsmen’s demonstrated capabilities at the crease. They have all paid particular attention to the issue of how many additional runs, if any, an unbeaten batsman could reasonably be expected to have made had he been allowed to continue until eventually dismissed, rather than being stranded without a partner or having his innings ended by a declaration, bad weather, retiring hurt, etc.
The Rarefied School
The approach adopted by Alan Kimber and by Peter Danaher to derive true averages, centred on the probabilistic Product Limit Estimator, is ruled out of contention for a recommendation because of an inherent feature: when projecting Not Out Scores to an expected conclusion, it is wedded to the mathematical notion of expected value (the mean), rather than being able to take the Median value as a general rule.
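To make the "expected value" point concrete, here is a generic product-limit (Kaplan-Meier) sketch. It is an illustration only and is not claimed to reproduce Kimber's or Danaher's exact calculations. It assumes whole-run scores and treats a Not Out on r as still "at risk" at r. The estimated average is the area under the survival curve, which is by construction a mean, not a median. The function name `product_limit_mean` is hypothetical.

```python
from collections import Counter


def product_limit_mean(completed, not_outs):
    """Estimated mean score as the area under a product-limit
    (Kaplan-Meier) survival curve.

    Assumptions for this sketch: scores are whole runs, and a Not Out on r
    is still 'at risk' at r (it was not dismissed there). If the top score
    is Not Out the curve never reaches zero, so the sum stops at that score
    (a restricted mean).
    """
    dismissals_at = Counter(completed)  # dismissals recorded at each score
    all_scores = list(completed) + list(not_outs)
    survival = 1.0  # running estimate of P(score > r)
    total = 0.0
    for r in range(max(all_scores) + 1):
        # Innings that reached r (dismissed or not out on r or higher).
        at_risk = sum(1 for s in all_scores if s >= r)
        if dismissals_at[r]:
            survival *= 1 - dismissals_at[r] / at_risk
        total += survival  # E[score] = sum over r of P(score > r)
    return total


print(product_limit_mean([10, 30], [20]))
```

With dismissals on 10 and 30 and a Not Out on 20, the result is 70/3 (about 23.33): the Not Out's weight is carried forward to the only higher dismissal, exactly as if the scores were 10, 30 and 30, and the figure reported is their mean.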
Hoffie Lemmer’s recommended Estimators of batting averages are considered suitable for players with a high proportion of Not Out Innings: specifically, at least 23% for Tests and similar matches, and at least 40% for ODIs and similar. His Estimators have a niche role to play for a player’s whole career or for participation in a fairly lengthy series of matches (such as a whole season of county or state competition), but a more general role for those participating in a short series of matches, such as World Cup competitions in the ODI and Twenty20 formats, or a year’s worth of Test matches. This reflects the fact that players with a large proportion of Not Out innings are far more common in short series.
Third, the exposure-to-risk approach put forward by Sanchit Maini and Sumit Narayanan does have its attractions as a general scheme, and it is widely cited as an interesting contribution. Working against it, though, is that it appears to contain an internal inconsistency. This is perhaps only a blemish rather than a really substantial drawback. More importantly, their approach cannot be applied to many prominent Test batsmen of the past, owing to the lack of readily available information on the number of deliveries faced: not only in pre-WW2 times, but also for some countries during the 1950s and 1960s, and for at least one country through to the mid-1980s.
The Straightforward School
To start with, the Weighted Batting Average method advocated by Anantha Narayanan is similar in spirit to the exposure-to-risk approach just commented on, though it is based on the number of runs scored rather than the number of deliveries faced.
The rationale Narayanan provides, however, is somewhat superficial and seems unconvincing: namely, that the method offers a compromise between the Traditional Average, viewed as “intrinsically unfair to batsmen with a low proportion of Not Outs”, and the simplest form of all, “the plain runs per innings played”, which makes no distinction between uncompleted and completed innings. Narayanan proposes “something in the middle”. Despite this reservation, the resulting averages for the handful of Test careers provided do seem intuitively reasonable.
The other three advocated methods of this school also address whole careers, in either Tests or ODIs, though they don’t explicitly exclude application to a series of seasons taken together, or indeed to one lengthy season. Whilst each of these solutions rests on projecting Not Out Scores (NOSs) to a notional completed score, they vary as to:
(a) whether they make the projection dependent on all scores that equal or exceed it in magnitude, or only on those scores made prior to the NOS in question,
(b) whether or not other NOSs are taken into account (at their own projected values), and
(c) whether they take the Median or Mean value of the scores relevant to making a projection.
Whichever of these variants is preferred, I consider this kind of approach to be inherently superior to that of Anantha Narayanan, as it has a more compelling rationale.
Charles Davis, who gave the initial lead, rightly indicates that in projecting a NOS to a conclusion one should take account of scores made both before and after its occurrence, and that one should apply the Median value as the best predictor. But he eschews taking account of other NOSs in making a projection, which represents a deficiency in my view.
Paul Ulrick has provided a more explicit account of Davis’ approach, though he departs from Davis by applying the Mean value in this “preliminary” work; whilst Uday Damodaran has sketched a refined version that factors in other NOSs (at their projected values). But Damodaran restricts the relevant scores for making projections to those made prior to the NOS in question, and in his sole worked example he applies the Mean value rather than the Median value. These are two blemishes that are easily corrected.
Uday Damodaran – Most appealing method of the Straightforwards
To conclude: the most suitable method of determining a batsman’s “true” average is a combination of the three versions noted immediately above. This can be represented by Damodaran’s proposal after two simple modifications, as specified below. It then provides a sound, readily comprehended and generally applicable solution.
The Recommended Method for General Use
For general use in arriving at truly representative batting averages, when projecting a Not Out Score (NOS) to a conclusion, Damodaran’s method is to be modified in two ways:
(i) Extend the search for relevant scores to all of a batsman’s innings (ie his whole career or whole series of matches in question), regardless of when the Not Out innings (NOI) in question happens to occur (for reasons given in Part II).
(ii) Substitute the Median value, in place of the Mean value, of the relevant scores for projecting.
Lemmer’s proposed Estimators should be treated as supplementary: something separate, to be used in place of the general method for players with an especially high proportion of NOIs.
The abbreviation for this combination is “MDSL”: modified Damodaran, supplemented with Lemmer.
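The switch between the two parts of MDSL can be expressed as a small rule. The sketch below assumes the thresholds stated earlier (Lemmer once Not Out Innings reach at least 23% in Tests and similar matches, or at least 40% in ODIs and similar); the function name and the format labels "test" and "odi" are hypothetical.

```python
def choose_estimator(innings, not_outs, match_format="test"):
    """Pick the MDSL estimator for a batsman (hypothetical helper).

    Lemmer's Estimator once Not Out Innings make up at least 23% of innings
    in Tests (and similar matches) or at least 40% in ODIs (and similar);
    otherwise the Modified Damodaran Estimator.
    """
    threshold_pct = {"test": 23, "odi": 40}[match_format]
    # Integer comparison avoids floating-point edge cases at the threshold.
    if 100 * not_outs >= threshold_pct * innings:
        return "Lemmer"
    return "Modified Damodaran"


print(choose_estimator(100, 30))         # Test player, 30% Not Outs
print(choose_estimator(100, 30, "odi"))  # same proportion in ODIs
```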
Mechanics of the General Method
For each Not Out Score (NOS), list those other innings that ended on the same or higher score and take the Median score of these innings for the purpose of projecting it to a notional conclusion.
The only complication (as noted earlier) arises in the treatment of other NOSs: these should be included at their own projected values when they equal or exceed the particular NOS being considered.
In this way, each of a batsman’s NOSs is converted into an ultimately terminated (or concluded) value, to be included along with all his actually completed innings scores. The sum of these scores divided by number of innings played then represents his “true” average. A worked example is provided at Appendix I, based on the innings of Alec Stewart when playing for Surrey in 1983.
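The mechanics above can be sketched in a few lines of Python. This is a minimal illustration, not the author's spreadsheet. It assumes that NOSs are handled from highest to lowest, so that any higher NOS already has a projected value when a lower one is considered. It also assumes that an NOS equal to the one being projected is left out, which avoids circularity; the text does not settle that tie case. The function names are hypothetical.

```python
from statistics import median


def project_not_outs(completed, not_outs):
    """Project each Not Out Score (NOS) to a notional completed score.

    Projection = median of all completed scores on the same or a higher
    score, plus any higher NOSs at their own projected values.
    Assumption: an NOS equal to the one being projected is not used.
    """
    projected = []  # (NOS, projected value), highest NOS first
    for nos in sorted(not_outs, reverse=True):
        relevant = [s for s in completed if s >= nos]
        relevant += [value for other, value in projected if other > nos]
        if not relevant:
            # Top score is Not Out: one of the separate rules is needed.
            raise ValueError(f"no relevant scores for Not Out {nos}")
        projected.append((nos, median(relevant)))
    return projected


def modified_damodaran_average(completed, not_outs):
    """Sum of completed and projected scores divided by innings played."""
    projected = project_not_outs(completed, not_outs)
    total = sum(completed) + sum(value for _, value in projected)
    return total / (len(completed) + len(not_outs))


print(project_not_outs([0, 12, 35, 60, 90], [40, 10]))
print(modified_damodaran_average([0, 12, 35, 60, 90], [40, 10]))
```

With these hypothetical scores, the Not Out 40 is projected to 75 (the median of 60 and 90), and the Not Out 10 is then projected to 60 (the median of 12, 35, 60, 75 and 90). The result is 332/7, about 47.4, against a Traditional Average of 247/5 = 49.4.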
This straightforward method can be applied in an Excel spreadsheet with a few simple instructions. For instance, to obtain a median value, enter MEDIAN followed by the range of the data, such as column D, rows 3 to 25 (written as =MEDIAN(D3:D25)).
In the unusual case of a batsman’s top score being Not Out, the choice is between:
- adding on the average of a batsman’s other scores that exceed ten or so runs (including other Not Out Innings at their projected levels),
- restricting its projected level to the highest score the batsman has achieved during his career or season in question (as proposed by van Staden and colleagues in 2011),
- simply letting the Not Out Score stand as it is, treating it as a completed innings (as Danaher does in his 1989 article),
- applying an intuitively reasonable margin – such as an uplift of 10% or 20%.
The first of these alternatives, reflecting evidence cited by Kimber, seems preferable (the rationale being given in Part I). The other solutions rest on rather arbitrary assumptions or judgements. When either of Lemmer’s Estimators is being applied, no adjustment is required; the Not Out score stands as it is.
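The preferred first alternative can be sketched as follows. The cut-off of ten runs ("ten or so") is treated as a parameter, with only scores strictly above it counting. The caller is assumed to pass other Not Outs already at their projected values; how to break any circularity with those projections is left open here. The fallback of letting the Not Out stand when nothing qualifies is also an assumption of this sketch. The function name is hypothetical.

```python
from statistics import mean


def project_top_not_out(top_nos, other_scores, min_runs=10):
    """Project a Not Out that is also the batsman's top score.

    Adds on the average of the batsman's other scores exceeding `min_runs`.
    `other_scores` should already hold any other Not Outs at their
    projected values. If no score qualifies, the Not Out stands as it is
    (an assumption of this sketch).
    """
    qualifying = [s for s in other_scores if s > min_runs]
    if not qualifying:
        return top_nos
    return top_nos + mean(qualifying)


print(project_top_not_out(120, [5, 20, 40, 60]))  # 120 + mean(20, 40, 60)
```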
Application of Recommendations to a Sample of Test Players
A two-pronged examination of the effect of the recommended approach has been carried out:
(i) Is the change to how batting averages are determined shown to be worthwhile? In other words, does the recommended approach produce materially different outcomes from the Traditional Average, and does it rank batsmen differently on their derived averages?
(ii) Reasonableness of resulting outcomes. Do the resulting changes to players’ averages accord with one’s cricketing knowledge and intuition about their respective demonstrated capabilities?
The Sample of Fifty Test Players
The players have been selected from the England, Australia, West Indies, India and South Africa teams of the last half century (five with careers still in progress), with a minimum of 35 innings played.
Openers: four players
Upper middle order (nos. 3-4): six
Lower middle order (nos. 5-8): twenty
Tail-enders (nos. 9-11): twenty
Jimmy Anderson – King of the undefeated: now a Centurion
Principal Features of the Findings
In considering differences between Traditional and MDSL-derived batting averages for the 50 Test players, a materiality yardstick of 2.0% is adopted. This reflects the fact that quite large groups of players occupy a single whole number in a Test-playing country’s all-time batting averages (eg, for England, 7 players occupy the 46 mark while 4 occupy the 47 mark; 6 occupy the 44 mark while 4 occupy the 43 mark; and 10 occupy the 40 mark), and 2.0% of an average in the mid-40s is roughly one run. On this basis, as shown by the table at Appendix II:
Openers: four players
- No material changes.
Upper Middle Order (nos. 3 and 4): six players
- Material change for 4 players; all being modest reductions on their Traditional Averages, within a range of 2.0 – 3.3%
Lower Middle Order (nos. 5-8): twenty players
- Material change for 6 players, all reductions on their Traditional Averages except in one case. (The exception is Brad Haddin, whose average rises by 11% because his Not Out Scores are concentrated in the middle range of his dismissal scores and so benefit from large uplifts when projected.)
- The reductions are all modest and lie within a narrow range of 2.1 – 2.9%.
The Tail (nos. 9-11): twenty players
- 14 of these players undergo material change, all being reductions to their Traditional Averages; 11 being in excess of 5.0%, with 7 of these exceeding 10%. This includes one reduction of around 20% and two of around 30%.
- 10 of the 14 material reductions result from applying Lemmer’s Estimator (average reduction of 16%), the other 4 from applying the Modified Damodaran Estimator (reductions of 3 – 6%).
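The 2.0% yardstick amounts to a simple percentage comparison. The helper names below are hypothetical. The check counts a change of exactly 2.0% as material, consistent with the 2.0 – 3.3% range reported for the upper middle order. The figures in the usage lines are Brad Haddin's and Stuart Broad's from the Appendix II table.

```python
def percent_change(traditional, mdsl):
    """Percentage change of the MDSL average on the Traditional Average."""
    return 100 * (mdsl - traditional) / traditional


def is_material(traditional, mdsl, yardstick_pct=2.0):
    """True when the change reaches the materiality yardstick."""
    return abs(percent_change(traditional, mdsl)) >= yardstick_pct


# Appendix II figures: official average, then "MDSL" average.
print(round(percent_change(32.98, 36.58), 1), is_material(32.98, 36.58))  # Haddin
print(round(percent_change(18.51, 18.60), 1), is_material(18.51, 18.60))  # Broad
```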
Median versus Mean Values for Projecting Not Out Innings
Use of Median values of relevant scores for projecting a Not Out Innings to a conclusion nearly always produces a lower estimate of a batsman’s “true” average than when using Mean values:
- for the twenty Lower Middle Order players: 7 material differences, all of 2-3%,
- for the ten relevant Tail-End players: 9 material differences, all of 3-8% (average 5%).
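Why the Median usually gives the lower figure can be seen from the shape of batting scores: the relevant scores above a Not Out typically have a long right-hand tail, and a few big innings pull the Mean well above the Median. The sketch below uses invented dismissal scores purely for illustration and, for brevity, ignores other Not Outs.

```python
from statistics import mean, median


def project(nos, completed, centre=median):
    """Project a Not Out Score using `centre` (median or mean) of the
    completed scores on the same or a higher score."""
    return centre([s for s in completed if s >= nos])


# Hypothetical dismissal scores with the long right tail typical of batting.
scores = [0, 2, 5, 9, 14, 22, 31, 38, 47, 65, 110, 160]
print(project(20, scores, median))  # 47
print(project(20, scores, mean))    # about 67.57
```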
Effect of Lemmer’s Estimator for Test matches
It is comforting to find that Lemmer’s Estimator (LE) has a switching effect when the proportion of batsmen’s Not Out Innings (Prop NOI) reaches 22-23%, the threshold at which it is applied in preference to the Modified Damodaran (MD) Estimator. From this point upwards, LE gives a consistent and material reduction on batting averages derived using MD. The difference between the two rises strongly as Prop NOI climbs into the 30% plus and 40% plus regions, the LE-derived averages then being some 15% to 30% lower than those from MD.
For Prop NOI in the range 4% to 21%, LE produces averages that are generally higher than those from MD, although the difference exceeds 3.0% in only 8 of the 22 cases, and exceeds 5.0% in only 1 case.
Resulting Reversals of Player Rankings
MDSL-estimated averages do produce some reversals of rankings based on Traditional Averages, although these are few in number. Of the three cases, two apply to players occupying positions 5-8 in the batting order, with the other case applying to a tail-ender.
Whilst, in my opinion, all three reversals are justified on demonstrated capability, the quantitative differences are slight in two of the three cases. In the other case, Brad Haddin’s deficit of 1.4 runs per innings relative to Paul Collingwood on Traditional Averages turns into an advantage of 3.2 runs per innings (refer to Appendix II).
Options for the Cricket Establishment
In overall terms, the findings reviewed make a substantial case for changing the traditional way batting averages are determined. Yet the quantitative differences involved imply that, apart from tail-enders, this is not a major matter: it is more than a Claytons (as Aussies would say), though not a truly big deal.
And some might say: why be conscientious about this matter for tail-enders, even though the batting averages of just over half of the tail-enders in my sample are affected by more than 5%? Why deny these particular players the pleasure, and fun, that tradition confers by a somewhat artificial boost to their averages? This is, perhaps, one of the reasons why the qualified statisticians’ shimmering delights, paraded in journals, have remained in the background as far as lay cricket enthusiasts are concerned.
For those establishment organisations that publish batting averages, the main options are:
- Maintain the status quo, at least for the time being.
- Publish a set of MDSL-derived averages in parallel with the Traditional set.
- Supersede the Traditional Averages, past and present, with those of MDSL.
I shall abstain from giving my own view and let readers decide for themselves, whilst hoping this essay brings forth a groundswell of opinion for change of some sort. Whether the second or third option above is realistically achievable will have to await responses to the recommendations made here.
To quote the German economist, political philosopher and social revolutionary, Karl Marx:
Ziel ist es, die Welt nicht nur zu verstehen, sondern vor allem zu verändern.
The aim is not only to understand the world but, more importantly, to change it.
Guidance for MDSLites
Those persuaded of the merit of the MDSL approach who wish to apply it should bear in mind the following verified propositions:
Applying the Modified Damodaran Estimator:
- If Not Out Scores (NOSs) are in general low for a batsman, the resulting Average will tend to approximate to the average of his Completed Innings Scores.
- If NOSs are generally high for a batsman, there will be little headroom when projecting them to notional completed scores, so the resulting Average will tend to be lower than the Traditional Average.
- If NOSs cluster around the middle of a batsman’s Completed Innings scoring, this will tend to produce an estimated Average substantially higher than the Traditional Average.
Applying either of Lemmer’s Estimators:
- If a high proportion of a batsman’s NOSs are large in magnitude, his resulting “true” average will tend to be over-stated.
- If a high proportion of his NOSs are small in magnitude, his resulting “true” average will tend to be under-stated.
Applying the Traditional Average formula:
- The greater the proportion of a batsman’s runs that come from his NOSs, the larger will be the effect of his Not Out Innings on the resulting Average.
- It is this factor, rather than the proportion of Not Out Innings, that is the more important.
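A short derivation, added here as an illustration, shows why the share of runs is what matters, assuming "effect" is read as the uplift of the Traditional Average (TA) over the average of completed innings scores. Write $C$ for runs scored in completed innings, $N$ for runs scored in Not Out innings, $d$ for the number of dismissals (so $C/d$ is the completed-innings average), and $r$ for the share of runs made in Not Out innings.

```latex
\begin{aligned}
\text{TA} &= \frac{C + N}{d}, \qquad r = \frac{N}{C + N},\\
\frac{C}{d} &= \frac{C + N}{d}\cdot\frac{C}{C + N} = \text{TA}\,(1 - r),\\
\Rightarrow\quad \text{TA} &= \frac{1}{1 - r}\cdot\frac{C}{d}.
\end{aligned}
```

The uplift factor $1/(1 - r)$ depends only on $r$, not on how many Not Out innings there were. Two batsmen with the same proportion of Not Outs can therefore receive very different boosts if one's Not Outs are single figures and the other's are substantial.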
Some cricket enthusiasts might wish to apply the MDSL approach as a default position when a particular player’s Traditional Average seems dubious as a representative measure.
Lastly, an enduring general point: batting averages – of whatever formulation – are not multiplicative. Hence, it is not legitimate to say that a player with an average of, say, 51 is three times more meritorious than some other player with an average of 17 – even if of the same country, batting position, era and opposition played against.
A worked example is given below for the innings of Alec Stewart when playing for Surrey in the 1983 England County Championship. (A fictitious completed innings of 90 has been added to aid the exposition.)
|(Projected Not Outs)||Total|
|Total Medians||Dismissal Scores||Grand Total||Divide by|
PK’s SAMPLE OF 50 TEST PLAYERS
Names in BOLD denote that Lemmer’s Estimator is applied
Summary of Findings
| Player | Career Span | Total Innings | Not Out Innings (% of innings) | Runs from Not Outs (% of runs) | Official Average | “MDSL” Average | Reduction on Official Ave |
|---|---|---|---|---|---|---|---|
| India – Opening | | | | | | | |
| Navjot Sidhu | 1983-99 | 78 | 2.6% | 2.7% | 42.13 | 42.17 | plus 0.1% |
| South Africa – Nos 3-4 | | | | | | | |
| England – Nos 5-8 | | | | | | | |
| Ben Stokes | 2013-21 | 130 | 3.8% | 8.7% | 37.04 | 37.18 | plus 0.4% |
| Australia – Nos 5-8 | | | | | | | |
| Brad Haddin | 2008-15 | 112 | 11.6% | 12.2% | 32.98 | 36.58 | plus 10.9% |
| England – Nos 9-11 | | | | | | | |
| Stuart Broad | 2007-21 | 218 | 16.5% | 19.0% | 18.51 | 18.60 | plus 0.5% |
| John Emburey | 1978-95 | 96 | 20.8% | 26.6% | 22.53 | 22.90 | plus 1.6% |
| Phil Edmonds | 1975-87 | 65 | 23.1% | 25.3% | 17.50 | 17.73 | plus 1.3% |
| West Indies – Nos 9-11 | | | | | | | |