Pitting Don Bradman Against Leaders of Related Sports: An Investigation – Part 1
Peter Kettle
Acknowledgement: My thanks to Dave Wilson, head of Cricket Web’s features, for helpful and thought-provoking comments provided on drafts of this three-part essay.
Bradman golfing in England, April 1938
Context and Purpose
The overall performance of Donald George Bradman (DGB) as a Test match batsman is universally recognised to be outstanding – simply without parallel. In an enterprising article published in 2015, Professor Stephen Walters (a statistician at the UK’s University of Sheffield) assessed the extent of DGB’s dominance in Test cricket and examined how that stacked up against the dominance of leaders in a range of other international sports.[i]
Walters based his assessments on data from the inception of Test cricket and of six other sports up to early/mid-2014. Applying a well-established statistical measure, he found that the extent of DGB’s dominance in batting outstrips the extent of dominance attained by those who lead performance in international soccer, international rugby union, the golf “Majors”, men’s and women’s Grand Slam tennis, and the dissimilar sport of Formula 1 motor racing. This finding had two dimensions:
Absolute: the magnitude of a participant’s dominance in a sport, as indicated by how far their measured career performance (batting average, wins attained and suchlike) stands above the overall average (the “mean”) for the whole field of qualifying participants.
Relative: the numerical gap between the estimated dominance of the leader of a sport and that of its second placed participant.
The more that an individual’s measured performance departs (or deviates) from the average for the qualifying group as a whole, the greater will be his or her dominance rating. A low positive rating signifies that their performance is a little above the average for the whole group; a high positive rating signifies that it is well above the group’s average. Those who perform below the group’s average are given a negative dominance rating. Hence, the concept of dominance applies to all qualifying competitors for a sport and not just to the few best performing ones. The rating scale itself is linear: so the difference in level (or quality) of performance between ratings of 0.5 and 1.0 is the same as that between ratings of 2.0 and 2.5, between 2.75 and 3.25, and so on.
Statisticians use the mysterious sounding term “Z Score” to refer to the dominance rating attained, and this term appears as a heading in the lists that follow of the leading participants for each sport examined.[ii]
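As a minimal sketch of how such a rating is computed, the snippet below derives Z Scores for a small, purely hypothetical group of career averages. The function name `z_scores` and the data are illustrative assumptions, and the use of the population standard deviation is also an assumption: whether Walters used the population or the sample form is not stated here.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return each value's Z Score: (value - group mean) / group standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (an assumption; statistics.stdev gives the sample SD)
    return [(v - mu) / sigma for v in values]

# Hypothetical career averages for a small qualifying group (not real data).
averages = [30.0, 35.0, 40.0, 45.0, 50.0]
ratings = z_scores(averages)

for avg, z in zip(averages, ratings):
    print(f"{avg:5.1f} -> {z:+.2f}")
```

Because the scale is linear, equal steps in the underlying average produce equal steps in the rating, wherever on the scale they occur.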
More specific aspects of Walters’ findings are:
- For Test cricket batting: a dominance rating for DGB of 6.48, the next highest being well below this at 2.24 for South Africa’s Graeme Pollock.
- The highest rating for the other six sports examined is 5.02 for golfer Jack Nicklaus, followed by 4.35 for the French soccer player of the 1950s, Just Fontaine and 3.92 for tennis player Roger Federer.
- The distinctive feature of DGB for Walters’ findings is not only that he stands far removed from the rest of the field in cricket and has a substantially higher dominance rating than each leader of the other six sports. There is also a greater gap between DGB and the runner-up in cricket – a lead of 4.24 rating points – than is the case for the leaders of each of the other sports he examined. The next largest first/second place gap is for golf, Jack Nicklaus having a far smaller lead of 1.41.
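The "relative" dimension is simple arithmetic on the ratings quoted above, sketched here for the two sports where enough figures are given. The helper name `lead_gap` is hypothetical; the runner-up rating for golf is not quoted in the text, so it is only implied from Nicklaus's rating and his stated lead.

```python
def lead_gap(leader_rating, runner_up_rating):
    """Gap in rating points between a sport's leader and its second-placed participant."""
    return round(leader_rating - runner_up_rating, 2)

# Cricket: both ratings are quoted in the text (Bradman 6.48, Pollock 2.24).
cricket_gap = lead_gap(6.48, 2.24)

# Golf: leader rating (5.02) and lead (1.41) are quoted; the runner-up is implied.
golf_gap = 1.41
golf_runner_up_implied = round(5.02 - golf_gap, 2)

print(f"Cricket lead: {cricket_gap}")
print(f"Implied golf runner-up rating: {golf_runner_up_implied}")
print(f"Cricket lead as a multiple of golf lead: {cricket_gap / golf_gap:.1f}")
```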
This article seeks to do two principal things: to propose, and apply, improvements to Walters’ methodology and bring the assessments up to the present day – end of year 2020. The improvements made concern a broadening of the range of ball sports considered; the issue of what criteria are to be applied for a player to be included in the assessment; and, for cricket, the desirability of deducting “dead runs” – those runs scored after a stage is reached when the opposing team has only a remote chance of being able to secure a win. I have been greatly assisted in this by Stephen Walters giving me access to his sports data set and the computations made, as well as some personal correspondence between us.
For some of Walters’ sports, updating from early/mid-2014 substantially increases the number of participants who qualify on the thresholds that he applied – as with soccer (qualifiers up by 38%), rugby union (up by 18%) and cricket (up by 16%). Whether these changes – and also the impact of relaxing some of Walters’ qualifying thresholds – materially raise or lower the values of the dominance ratings for the most prominent in each sport is, at the outset, an open question.
Walters confined consideration to those sports he happened to be familiar with. I have extended these to encompass most of the major ball sports: twelve in all, including men’s and women’s squash, American football and North American baseball, basketball and ice hockey – its puck sometimes being referred to as a “flat ball”. Some of the following ball sports might also have a claim, in terms of the prestige of events staged and extent of public interest, to be included in the “major” category: rugby league, field hockey, lacrosse, badminton, table tennis and snooker. These have not been looked at here on grounds of manageability and, in certain cases, limited data availability.
The topic of leading figures in the history of ball sports would doubtless be of much interest to DGB himself if he were still with us today as, besides cricket, he was highly competent at a number of them. On finishing school, he played a good deal of tennis which became his main sport for close on the next two years, joining a local club and playing in organised competitions. By the age of 16, he was highly promising with national rankings potential, nearly making it his primary sport for coming years. He continued to play tennis into his late-twenties and perhaps beyond.
At golf, DGB became proficient by the age of 21, playing throughout his Test cricket career and attaining a single figure handicap when in his forties. He was good enough to impress the likes of Jack Nicklaus and Gary Player when playing a round of 18 holes with them. And at squash, he won the South Australian amateur championship in 1939, defeating a former Davis Cup tennis player, doing so only a few years after taking up the game.
The inclusion of baseball, as well as some other American sports, would also have been of interest to DGB. During a light-hearted cricket-sightseeing tour of the USA and Canada in 1932, undertaken with a group of largely first-class standard Australians, a number of matches were played against local baseball teams. In mid-July, he met the baseball legend Babe Ruth, conversing with him during a game when the Babe’s team, the New York Yankees, were hosting the Chicago White Sox. They took to each other. DGB showed a liking for baseball and its relatively short time-frame for a game (around two to three hours) even though each batter comes in four or five times, commenting: “There is more snap and dash to baseball than cricket.” [iii]
It is noted that the scientist and statistician, Charles Davis has also compared the extent of dominance of DGB in Test match batting with leaders across a number of major ball sports – overlapping with my selection – doing so two decades ago in his book The Best of the Best.[iv] On his analysis, DGB came out clearly ahead of the leaders of the six other sports looked at. But the periods examined did not go back before WW2, except for Test cricket (from its inception), whereas Walters and I take matters back to a sport’s inception in all cases, which mostly pre-date Davis’ starting dates by at least three decades.
Given the present availability of data, Davis’ time scales would now unnecessarily limit the number of qualifying participants. Changes over time made to the rules and other playing conditions of the sports he examined have not been far-reaching enough to substantially alter the ease or difficulty of achieving high honours. There is a caveat with Test cricket, though, as the poor state of pitches used prior to the 1920s had a substantially adverse impact on batting averages, commented on later.
Qualifying Criteria for a Sport’s Participants
In deciding on the qualifying criteria to be applied, a choice has to be made between:
(a) Requiring a certain amount of participation in a given sport
This is done with the aim of ensuring that a competitor’s performance which is being captured is a reliable indicator, representative of their overall capability and not distorted from taking too limited a view. This relates to those with short careers at the level of competition being examined, or who are starting out. Setting a participation threshold to be met (a minimum number of times playing an innings, playing a match and so forth) is intended to neutralise volatility – the fluctuations that are inevitably experienced in the standard of an individual’s performance (which reflects not only their own efforts but also, of course, how well the opposition happen to be performing).
(b) Requiring a certain level of attainment
Specifying a threshold to be met in terms of a certain number of runs, goals or points to be scored during a career has the merit of focussing wholly, or mainly, on those players whose principal task is to score for the team. The findings on dominance ratings are not then muddied by the effect of including many players who are not strictly relevant.
Whether one specifies a minimum amount of participation or a minimum level of attainment doesn’t really matter, so long as the former captures one or two “cycles”, each consisting of some relatively good and poor form. A suitable number of innings, matches played or tournaments entered depends on the typical duration of a cycle of form, which varies across different sports (for instance, considerably shorter for cricket than for rugby). A threshold for a certain level of attainment can be the more difficult to set because, unless chosen carefully, it risks either not being met, or being met too slowly, by a sizeable tranche of capable players. This is, though, the device that Walters and I have applied to cricket, soccer and rugby union.
An attainment type threshold is inapplicable to golf Majors or Grand Slam tennis as these are not team competitions in which contributions to success can be determined. But there is a difficulty in applying a participation threshold to these two sports. Although data on the number of entries made to tournaments by an individual is generally available, each of these sports has only four events a year. The “tyranny of distance” has meant that a sizeable proportion of their all-time participants have entered only one of the four events, quite often for just a very few years. Hence, specifying a minimum number of entries risks ruling out too many participants who are worthy of being considered.
In practice, though, this turns out not to be a significant problem for my analysis. For golf Majors, of the field of 173 winning players taken into account, 5 have 6-10 tournament entries, with all the rest having 13 plus entries. For Grand Slam tennis, of the field examined for men of 135 winning players, 9 of them have between 2 and 7 entries, with all the rest having at least 8 entries; and of a field examined for the women of 102 winning players, 5 have 3-6 entries, all the rest having at least 9 entries. At this standard of golf and tennis, for those with short careers or who have recently begun playing at the Major level, around 8 plus tournament entries is taken to be generally sufficient to indicate a player’s capability.
How Competitor Performance is to be Represented
In general, it is either more interesting or more insightful to compare the quality of competitors’ performance over their careers in terms of intensity rather than longevity – the runs scored per innings played, number of goals or points scored per match, or wins achieved per tournament entry; rather than aggregate career runs, goals, points or wins. The chief reasons for this preference are two-fold. First, careers are often cut short for various reasons apart from loss of interest in a sport – such as debilitating injury, the demands of one’s occupation or lack of income derived from the sport itself. Secondly, relatively high aggregate scoring or wins can be, and often is, attained through a dogged determination to keep persisting, year after year, beyond a normal career span, though with few if any personally memorable matches.
However, for golf and tennis, Walters did have regard to longevity – relying on the absolute number of wins attained during a career, rather than wins per tournament entry. This is because, in many cases, establishing the number of entries made by an individual takes a lot of ferreting out (needing to resort to a lengthy series of tournament records). I follow Walters in this respect, so as to enable comparison between our respective findings.
Treatment of One-Time Winners
As noted in Walters’ article, his published dominance ratings for golf and tennis (and for Formula 1 racing) are based on those competitors who have attained two or more wins. After generating a set of dominance ratings on both bases – one-time winners and multiple winners – he decided to omit all those in the former category. His rationale was that rare things do happen in sport, and that “a benchmark of two or more major tournament wins implies some level of sustaining consistency and sporting excellence” (personal correspondence, February 2017).
I take these two terms to reflect distinct notions:
Sustained Consistency – relating to whether a participant’s overall performance indicates that they genuinely belong at the standard of competition being examined. Whilst it is true that, in certain cases, a single win during a career may have involved a large element of luck – such as a weak field of competitors or an injury to the opponent during play in the final – occurrences of this sort are extremely unusual. So I have applied a seemingly reasonable test of consistency to one-time winners, retaining all those who pass it. For a knock-out format, as applying to tennis, the test is whether – in addition to the sole win – a participant reaches the final at least once, or reaches the semi-finals stage on two or more occasions. With an open field, as with golf, the criterion used is attaining 1 second place or 2 third/fourth places, in addition to the sole win.
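The consistency test just described can be sketched as a small function. The function name, parameter names and format labels are hypothetical, and one reading is assumed: all counts exclude the tournament that was won, and a losing final also counts as a semi-final reached.

```python
def passes_consistency_test(sport_format, *, other_finals=0, other_semis=0,
                            second_places=0, third_or_fourth_places=0):
    """Apply the one-time-winner consistency test.

    All counts exclude the tournament that was won (an assumed reading).
    """
    if sport_format == "knockout":   # e.g. Grand Slam tennis
        return other_finals >= 1 or other_semis >= 2
    if sport_format == "open":       # e.g. golf Majors
        return second_places >= 1 or third_or_fourth_places >= 2
    raise ValueError(f"unknown format: {sport_format!r}")

# Hypothetical one-time winners (not real records).
print(passes_consistency_test("knockout", other_finals=1))          # True
print(passes_consistency_test("knockout", other_semis=1))           # False
print(passes_consistency_test("open", third_or_fourth_places=2))    # True
```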
The case for including some of the better performing one-time winners is strengthened by the observation that in men’s tennis, for example, one-time winners Chuck McKinley (with 18 Grand Slam entries), Sven Davidson (22 entries) and Mal Anderson (24 entries), each have a higher win/entry ratio than do Stan Smith with 2 wins from 51 entries, and Lleyton Hewitt with 2 wins from 66 entries. McKinley’s win/entry ratio is also higher than for Vic Seixas with 2 wins from 44 entries and Stan Wawrinka with 3 wins from 61 entries. The number of entries made by the one-time winners just noted is surely sufficient to meet a realistic qualifying number that might be proposed.
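The comparison can be checked directly from the wins and entries quoted in this paragraph. The snippet is only an illustrative calculation; the variable names are arbitrary.

```python
# (wins, Grand Slam entries) as quoted in the paragraph above.
players = {
    "Chuck McKinley": (1, 18),
    "Sven Davidson": (1, 22),
    "Mal Anderson": (1, 24),
    "Stan Smith": (2, 51),
    "Lleyton Hewitt": (2, 66),
    "Vic Seixas": (2, 44),
    "Stan Wawrinka": (3, 61),
}

ratios = {name: wins / entries for name, (wins, entries) in players.items()}

# List players from the highest win/entry ratio to the lowest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {ratio:.3f}")
```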
Sporting excellence – which essentially concerns the standard of participation that is the focus of analysis, as indicated by the status of the events staged and the participants’ prior achievements. In golf, for instance, there is a choice of whether to focus on the very top level – the four Majors – or to examine somewhat less prestigious events instead, such as a number of those held on the European continent; and in cricket, whether to focus on competition at county, state and provincial level, rather than on international matches. In selecting which particular events to include for different sports, the aim should be to have approximate parity in the standard of participants.
Benchmarks, and Sub-Sets, for Measuring Competitor Dominance
The mean (overall average) level of performance of a field of qualifying competitors stands as a generally accepted benchmark when measuring relative dominance. Yet it is no more than a working convention (promulgated by professional statisticians), and is not inherently more meritorious than certain alternatives. The principal option is to use median values instead, which are more suitable – being more representative – in those cases where the set of performance observations is distinctly skewed at the top and/or bottom end.[v] Moreover, there is no harm done in using median values with a data set that is not substantially skewed.
In the case of ball sports, the performance data are not markedly skewed, other than the tails invariably being relatively long; as this is a common feature, it is not a real concern. Using mean performance values as the benchmarks, as done here, has the advantage of facilitating more numerous comparisons with the results obtained by other researchers – such as with the findings of Stephen Walters and Charles Davis; and for Test cricket batting alone, enabling comparison with the results of Geoff Dickson and colleagues.[vi]
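To illustrate how the candidate benchmarks can differ, here is a minimal sketch computing the mean, the median and a 5% trimmed mean for a hypothetical sample with one long upper tail. The `trimmed_mean` helper and the sample values are assumptions made for illustration only.

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values trimmed from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

# Hypothetical career averages: 20 players, one far out on the upper tail.
sample = [22, 25, 27, 28, 30, 31, 33, 35, 38, 40,
          41, 43, 45, 47, 48, 50, 52, 55, 60, 100]

print(f"mean:         {mean(sample):.2f}")
print(f"median:       {median(sample):.2f}")
print(f"trimmed mean: {trimmed_mean(sample):.2f}")
```

With a single extreme value, the mean is pulled upwards while the median and trimmed mean stay close together, which is the behaviour that makes those two attractive for skewed data.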
Another tack in assessing competitor dominance could be to focus on a particular stratum of the overall field of qualifiers. For Test cricket, this might mean focussing on those having a batting average of at least, say, 30.0 or 40.0, forming a sub-set of those qualifying on a specified aggregate number of runs. The degrees of dominance of the batsmen would then be measured with respect to the mean (or median) value for that stratum. With soccer, similarly, a minimum number of goals per match could be specified to define a stratum of interest within all those qualifying on a minimum aggregate number of goals. As will now be realised, the scope for potentially interesting variations is wide.
There are also, of course, options for the time scale to be employed. The long-run simply happens to be of most interest to Walters and myself in assessing DGB’s status within ball sports. Others might wish to focus on some sports during a particular block of time or recognised era. It all depends on what is of greatest relevance to a person’s field of research: that lends the required rationale. The alternatives in this regard are endless!
Findings by Individual Sport
Test Cricket – Batting
Walters applied a qualifying minimum of 2,000 career runs which, from the inception of Test matches in 1877 through to March 2014, produced a field of 274 players. Only a few of them (seven) have a career span prior to the late-1920s – all bar one being Australians:
Clem Hill (Tests, 1896-1912), Victor Trumper (1899-1912), Warwick Armstrong (1902-21),
Warren Bardsley (1909-26), Syd Gregory (1890-1912), Charlie Macartney (1907-26),
Dave Nourse of South Africa (1902-24)
And no more than a further six players, within the same timeline, have scored 1,500 plus runs.
The paucity of such players during the initial five decades of Test cricket is due to a combination of two factors: the poor standard of pitches in pre-WW1 times, which improved substantially during the 1920s onwards; and matches being limited to a trio of countries (England, Australia and South Africa). The trio was supplemented by new admissions in the form of West Indies from 1928, New Zealand from 1929/30 and India from 1932, considerably boosting the number of fixtures and player numbers.
Adjustment for “dead runs”
As indicated, the term dead runs is used to refer to those runs a batsman scores after a point has been reached when any further runs for the team would not make a material difference to the prospect of securing a win. Indeed, as the opposition can then, at best, only draw the match, any further runs scored mean that their chances of securing a draw, instead of losing, are increased. And as nothing is riding on a batsman’s performance during a run accumulation phase that serves no real purpose – contributing nothing of value to the team – he has an absence of pressure, so runs are then that much easier to come by. For these reasons, I consider that runs of this sort should not only be treated as having a different status from other runs; they should be discarded when using averages to indicate the comparative capability of batsmen.
Accordingly, in estimating degrees of Test batsmen dominance, reductions are made to the raw (official) averages of those whose aggregate career runs include a material proportion of dead runs – taking 2.0% as the trigger point. DGB is conspicuous for his high proportion. At 8.9% (625 of his 6,996 runs), this is considerably higher than any other Test player in the history of the game who has passed 1,250 runs. Except for three cases, the dead runs for all others are under 3.0% – the highest being 4.9% for Bert Oldfield (1,427 career runs), followed by 4.1% for Lindsay Hassett (3,073 runs) and 3.4% for Adam Voges (1,485 runs).[vii]
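A minimal sketch of this adjustment follows, under stated assumptions: dead runs are simply subtracted from the career aggregate while the number of dismissals is left unchanged, and the 2.0% trigger is treated as inclusive. The function name is hypothetical, and DGB's 70 dismissals (80 innings, 10 not out) come from the standard Test record rather than from the text above.

```python
DEAD_RUN_TRIGGER = 0.02  # 2.0% of career runs; treated as inclusive (an assumption)

def adjusted_average(career_runs, dismissals, dead_runs):
    """Return (dead-run share, batting average after any dead-run deduction)."""
    share = dead_runs / career_runs
    if share < DEAD_RUN_TRIGGER:
        # Below the trigger point: the raw (official) average stands.
        return share, career_runs / dismissals
    # At or above the trigger: deduct dead runs, keep dismissals unchanged (assumed).
    return share, (career_runs - dead_runs) / dismissals

# DGB: 6,996 runs and 625 dead runs as quoted; 70 dismissals from the standard record.
share, average = adjusted_average(6996, 70, 625)
print(f"dead-run share: {share:.1%}, adjusted average: {average:.2f}")
```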
The chief reason for DGB’s high proportion concerns his own personality and that of Bill Woodfull. During Woodfull’s captaincy (1930-34), he let DGB indulge himself due to feeling insecure in the role, which he took on reluctantly, and his extreme risk-aversion that accompanied this trait. When DGB himself took on the captaincy (1936/37-1948), his piling up dead runs reflected his unceasing hunger for compiling mammoth personal scores. This served to upstage those Australian cricketers who had made wounding fun of him when starting out on his first class career, as a boy from the bush with a peculiar way of playing – especially his exaggerated “loopy” back-lift and various unorthodox shots. It also served as further payback for the know-all English critics of his technique during their tour down-under in 1928/29. The master seam bowler, Maurice Tate let it be known he doubted that DGB’s “crooked blade” would be able to cope with the swing and seam movement of English conditions in the return series! Also present, reporting events, was the shrewd Surrey captain, Percy Fender who pronounced: “Bradman will always be brilliant but unsound.”
In addition, there is a sound case for adopting a qualifying threshold of 1,500 career runs, instead of 2,000 as applied by Walters. Lowering the threshold to this level extends the number of genuine (bowling and wicket-keeping) all-rounders to end of 2020 from 57 to 75, while the number of specialist bowlers who have a batting average of at least 19.0 is little affected (there are now 6 instead of 3). All-rounders are included if they have a batting average of at least 25.0, subject to their bowling average not exceeding 35.0 runs per dismissal. The total number of players captured (all with a minimum of 20 innings) then rises by about a quarter, from 318 to 403.
Results based on 1,500 plus runs, and excluding dead runs (a deduction which applies to 14 of the field of 403 batsmen in addition to DGB), are as follows. The names of those having dead runs deducted are underlined.
Kettle – findings for Test cricket, top 16 to end 2020
|Rank||Player||Country||Career Span||Runs||Average||Z Score|
|2||Steven Smith||Australia||2010 –||7,540||61.80||2.56|
|3||Graeme Pollock||South Africa||1963-70||2,256||60.97||2.47|
|4||George Headley||West Indies||1930-54||2,190||60.83||2.46|
|5||Marnus Labuschagne||Australia||2018 –||1,885||60.81||2.45|
|8||Everton Weekes||West Indies||1948-58||4,455||58.62||2.21|
|10||Gary Sobers||West Indies||1954-74||8,032||57.78||2.12|
|13||Clyde Walcott||West Indies||1948-60||3,798||56.69||2.00|
|15||Kumar Sangakkara||Sri Lanka||2000-15||12,090||55.97||1.92|
|16||Jacques Kallis||South Africa||1995-2013||13,010||54.21||1.73|
Source: Howstat.com – Test Cricket – Most Career Runs
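Since the rating scale is linear in the batting average, the listed averages and Z Scores can be used to back out a rough estimate of the group mean and standard deviation behind them. The sketch below fits a straight line by ordinary least squares to the nine rows shown; the implied values are approximations derived from rounded figures, not numbers reported by the author.

```python
# (average, Z score) pairs from the rows listed in the table above.
rows = [
    (61.80, 2.56), (60.97, 2.47), (60.83, 2.46), (60.81, 2.45), (58.62, 2.21),
    (57.78, 2.12), (56.69, 2.00), (55.97, 1.92), (54.21, 1.73),
]

def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(rows)

# z = (average - mean) / sd  =>  slope = 1 / sd  and  intercept = -mean / sd
implied_sd = 1 / slope
implied_mean = -intercept / slope

print(f"implied group mean: {implied_mean:.1f}")
print(f"implied group SD:   {implied_sd:.1f}")
```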
To separate out the different impacts: the effect of updating from early-March 2014 to end of 2020, combined with moving to 1,500 plus runs, is to include Steven Smith, Marnus Labuschagne and Eddie Paynter, and to slightly increase the ratings of the other players listed above (by 0.22).
The impact of then discounting dead runs is substantial for DGB’s dominance – reducing his rating by 0.94 (14%). But it is only slight for the others listed, except for Paynter, Sangakkara and Kallis whose ratings fall by 0.17, 0.14 and 0.12 respectively, reflecting their dead runs at 2.9%, 2.5% and 2.1%.
[i] Stephen Walters: Did Don Bradman’s cricketing genius make him a statistical outlier? Significance magazine, Sports section, 3 February 2015. A magazine published bi-monthly on behalf of the UK’s Royal Statistical Society and the American Statistical Association.
[ii] Technically speaking, the value of an individual’s dominance rating (plus 1.7, minus 0.4, or whatever) represents the number of “standard deviations” that their measured performance stands above, or below, the average for the qualifying group as a whole. One standard deviation is a measure of how widely the performances of the group as a whole are typically spread around that average (strictly, the square root of the average of the squared distances from it). A rating of two therefore denotes a performance standing twice that typical distance above the group’s average, and so on. Similarly for those standing below the group’s average.
[iii] Arunabha Sengupta: Cricket in America: Don Bradman meets Babe Ruth. CricketCountry.com, 26 June 2016.
[iv] Published by ABC Books, Sydney, October 2000. Chapter 15 gives Davis’ analysis and results.
[v] Another way of dealing with a highly skewed data set is to calculate the mean value after trimming off a small designated proportion (perhaps around 5%) of the top and/or the bottom values (which is known as a “trimmed mean”). This is intended to eliminate the influence of data points on the tails which unduly affect the mean value.
[vi] Geoff Dickson, Kerry Mummery, Trevor Arnold and others: A cricketer for the ages: Changes to the performance variation of Test cricket batting from 1877-1997. Paper given at the third annual conference of the Sport Management Association of Australia and New Zealand, Griffith University, Queensland, 1998.
[vii] The estimates made are based on those contained, with derivation explained, in a recent book by the present author: Rescuing Don Bradman from Splendid Isolation. PK Associates, Melbourne, February 2019.