Running Medians

Calculating Championship Performances

(article in The Fellrunner magazine, January 1999)

Mark Rigby's article in the October issue of the Fellrunner magazine was most interesting and highlighted some important issues in scoring championship performances. First, I would like to support his suggestion for a system which better reflects actual performances, instead of just race rankings, and is less susceptible to variation in the field of runners. Second, I would like to add some (technical) thoughts on the proposed system which I hope may make it easier to use, as well as more robust.

Essentially what we need is a summary of each race which is consistent between races. If the races were over exactly the same course (hardly a good model for a championship, but it illustrates the problem), then a simple summary would be each runner's time in each race, and the overall winner would be the runner with the smallest total time. Using rankings from each race is in some ways equivalent, although less sensitive, since winning margins are ignored and one-off racers (to use Mark's terminology) can push contenders down unfairly.

While ranks account for races of different lengths, raw times do not: a total of times would clearly be dominated by the longer races. What we want to do, then, is to normalise the times so that they are more comparable between races - i.e. we seek a measure of each race which is consistent. For races on the road, we might use the distance, although this has its problems, since longer races are necessarily run at a slower pace. For races on the fells, we might use some measure derived from the distance and the ascent, although this doesn't take account of all of the conditions in the race (as an aside, distance and ascent do give quite a good prediction of course records for Scottish hill races).

A better measure is derived from the actual results on the day. One benchmark might be the winning time, which seems sensible, although it does assume that the winners of each race are equally good - as does the use of ranks, of course. Mark's suggestion of using the mean finishing time is a sound idea, since it relies only upon the field in general being comparable between races, which is more reasonable. The median (the middle time) would be a better summary and is simpler to calculate: in a race of 99 runners it is the time of the 50th finisher, and in a race of 100 runners it is the average of the 50th and 51st times. It is also less affected by very slow or very fast performances (though of course the very fast will still receive a score which reflects their achievement). Mark's calculation of the championship results changes slightly with medians, which are in general lower than the mean times - for Buttermere the median time is 7221s against a mean of 7322s.
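The rule for finding the median can be written out as a short sketch. The function name median_time and the finishing times below are my own illustration, not results from any race; the second half shows why the median is less disturbed than the mean by one very slow finisher.

```python
from statistics import mean


def median_time(times):
    """Return the middle finishing time (average of the two middle times if the field is even)."""
    ordered = sorted(times)
    n = len(ordered)
    if n == 0:
        raise ValueError("no finishers")
    mid = n // 2
    if n % 2 == 1:
        # Odd field: e.g. 99 runners -> index 49, i.e. the 50th finisher.
        return ordered[mid]
    # Even field: e.g. 100 runners -> average of the 50th and 51st finishers.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical finishing times in seconds (illustrative only).
field = [2900, 3500, 4000, 4300, 4600, 5200, 9000]
# The same field, but with the last runner taking far longer.
field_with_straggler = field[:-1] + [19000]

print("median:", median_time(field), "->", median_time(field_with_straggler))
print("mean:  ", round(mean(field)), "->", round(mean(field_with_straggler)))
```

The median stays where it was when the last runner slows down, whereas the mean moves by well over a thousand seconds.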

Can we look at this a little more to check that it is sensible? I have looked at the distribution of times in the Carnethy race in the Pentlands in 1998 (since these results were available electronically). Figure (a) shows the distribution of times in this race - the higher the curve, the more people finished around that time. This shows that most people took about 4300 seconds, while the winner (John Brooks) took 2901 seconds.

[Carnethy results]

Looking a little more closely, we see that the 'tail' of this distribution is quite long on the right - i.e. slow people take considerably longer. This agrees with what we all know: it is harder to make up time when you are already running fast than when you are running slowly. This is the key to making sense of these numbers. Instead of time we look at 1/time - a slightly odd measure, but it is simply proportional to speed, without the distance in the formula. Figure (b) shows that this is much more symmetric - i.e. a given difference in speed is equally hard to achieve at the front (right) and at the back (left) of the race, although of course such a difference has a bigger effect on the time at the back. This says that the distribution of speeds in a race follows approximately a Normal distribution, indicating that there is a good spread of different abilities, with more people running close to the average speed. This is as we might expect, whereas the times suggest that the field is biased towards fast runners (low times in Figure (a)), which seems unlikely.
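The link between symmetric speeds and a long right tail in the times can be checked with a small simulation. This is only a sketch under stated assumptions: the speeds are drawn from a Normal distribution with a spread I have chosen arbitrarily, the 4300 second scale is borrowed from the Carnethy figure above, and none of the numbers are real results. Skewness is used as a rough measure of asymmetry (zero for a symmetric sample, positive for a long right tail).

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: about 0 if symmetric, positive if the right tail is long."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Simulated field (assumption): relative speeds are Normal around 1.0.
rng = random.Random(1999)
speeds = [rng.gauss(1.0, 0.15) for _ in range(1000)]

# Time is proportional to 1/speed; 4300 s is the time of a runner at speed 1.0.
times = [4300 / s for s in speeds]

print(f"skewness of speeds: {skewness(speeds):+.2f}")
print(f"skewness of times:  {skewness(times):+.2f}")
```

In a simulation of this kind the speeds come out close to symmetric while the times show a clearly positive skew, which is the pattern described for Figures (a) and (b).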

My suggestion then is to use median(1/time) as the measure of the average speed of a race, and to divide each runner's 1/time by it. More simply, divide median(time) by the runner's time, since median(1/time) = 1/median(time). (This is exact when there is an odd number of finishers and very nearly so otherwise; the same is not true of the mean.) For this race, the winner ran at 4300/2901 = 148% of the median speed. The winning lady (Angela Mudge) ran at 4300/3456 = 124% of the median speed. Note that there are too few women to get a reliable estimate of their distribution of speeds, but we can nevertheless compare their performances using the whole field, since the median of the whole field is simply a measure of the difficulty of the race.
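The arithmetic for the two Carnethy examples can be reproduced in a few lines. The function name relative_speed is my own; 4300 seconds is the approximate median quoted above, and the five-runner list used to check the reciprocal identity mixes the two quoted times with made-up ones purely for illustration.

```python
import math
from statistics import median


def relative_speed(median_time, runner_time):
    """Runner's speed as a fraction of the median speed (bigger is better)."""
    return median_time / runner_time


# Figures quoted in the article for Carnethy 1998.
winner = relative_speed(4300, 2901)
first_lady = relative_speed(4300, 3456)
print(f"winner:     {winner:.0%}")      # 148%
print(f"first lady: {first_lady:.0%}")  # 124%

# Check that median(1/time) equals 1/median(time) for an odd-sized field.
# The last two times are hypothetical.
sample = [2901, 3456, 4300, 5100, 6200]
median_of_reciprocals = median(1 / t for t in sample)
reciprocal_of_median = 1 / median(sample)
print(math.isclose(median_of_reciprocals, reciprocal_of_median))  # True
```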

To summarise then, a robust and simple way of calculating championship performances in races of different distances is to divide the median (middle of the field) time by the time of any given athlete. Assuming that fields of similar quality (on average) run each race, which is reasonable for championship races, this gives a sound measure which can be compared between races and added up to calculate championship placings (bigger is better). It is not influenced by the distance of the races, nor by one-off racers unless there are enough of them to shift the median. It rewards large winning margins rather than simply winning, as the current rankings do, although the winner of course still has the highest score. Doing the calculation in terms of speed also uses a more symmetric distribution, of which the median is a good summary (this result holds in several large road races, cross-country and other fell races). The median time is also a simple yardstick which allows all runners to compare their own performances between races - median(time)/my(time) - regardless of whether they are near the 150% mark.
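Putting the pieces together, a championship table might be computed along the following lines. This is a minimal sketch, not a definitive scheme: the runner names and times are invented, and simply adding up every race a runner has finished (so that a missed race scores nothing) is my assumption, since the rules for how many races count are a separate matter.

```python
from statistics import median


def race_scores(results):
    """Map each runner to median(time) / runner's time for one race."""
    middle = median(results.values())
    return {runner: middle / t for runner, t in results.items()}


def championship_totals(races):
    """Add up each runner's scores over all races, highest total first."""
    totals = {}
    for results in races:
        for runner, score in race_scores(results).items():
            totals[runner] = totals.get(runner, 0.0) + score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results in seconds for a short race and a long race.
short_race = {"Runner A": 3000, "Runner B": 4000, "Runner C": 5000}
long_race = {"Runner A": 6000, "Runner B": 6000, "Runner C": 9000}

table = championship_totals([short_race, long_race])
for runner, total in table:
    print(f"{runner}: {total:.2f}")
```

Because each time is divided into its own race's median, the long race does not outweigh the short one, and Runner A's large margin in the short race is what separates A from B overall.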