Difference of medians vs difference of means

2.4. Difference of medians vs difference of means

An important point to consider for rank-based tests is that, when stating the null and alternative hypotheses, we refer to a difference in medians rather than a difference in means.

\(\mathcal{H_o}\): there is no difference in median salary between men and women

\(\mathcal{H_a}\): the median salary of men is higher than that of women

Why is this?

2.4.1. After ranking the data, we don’t know what the mean is any more!

This is because rank-based tests are sensitive to the relative ordering of values, not their exact numerical size. Consider the following two datasets, which have very different means, but after ranking are identical:

https://raw.githubusercontent.com/jillxoreilly/StatsCourseBook_2024/main/images/MT_wk6_MeanMedian.png

Rank-based statistical tests use only the ranks of the data, not the original numerical values. As a result, the difference between groups in the left- and right-hand datasets would be equally significant according to a rank-sum test, even though the difference in means is much larger in the dataset on the left (and would be more statistically significant under a test that compares means).

It would therefore be incorrect to say that a rank-based test is testing for a difference in means. In fact, such tests are not sensitive to differences in means as long as the ranks remain unchanged, as is the case for the two datasets shown above.
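The point can be checked numerically. The following is a minimal sketch, not the figure's actual data: the salary values are made up for illustration, and `ranks` and `rank_sum_test` are hypothetical helper names written here using only the Python standard library. The helper computes an exact one-sided rank-sum p-value by enumerating every possible assignment of ranks to the first group, which is only practical for small samples, and it assumes there are no tied values.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes no ties, for simplicity."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_sum_test(group_a, group_b):
    """Exact one-sided rank-sum test: does group_a tend to have higher ranks?

    Returns the observed rank sum for group_a and its p-value.
    """
    pooled = list(group_a) + list(group_b)
    pooled_ranks = ranks(pooled)
    n_a = len(group_a)
    observed = sum(pooled_ranks[:n_a])
    # Under the null, every set of n_a ranks is equally likely for group_a
    all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), n_a)]
    p_value = sum(s >= observed for s in all_sums) / len(all_sums)
    return observed, p_value


# Made-up salaries (in thousands), chosen so both datasets rank identically
men_1, women_1 = [31, 34, 36, 90, 120], [25, 28, 30, 33, 35]
men_2, women_2 = [31, 34, 36, 38, 40], [25, 28, 30, 33, 35]

diff_means_1 = mean(men_1) - mean(women_1)
diff_means_2 = mean(men_2) - mean(women_2)
result_1 = rank_sum_test(men_1, women_1)
result_2 = rank_sum_test(men_2, women_2)

print("difference in means:", diff_means_1, "vs", diff_means_2)
print("rank-sum result (rank sum, p):", result_1, "vs", result_2)
```

The two datasets differ considerably in their difference of means, yet the rank sum and p-value come out the same, because the test never sees anything but the ranks.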

In contrast, ranked data relate directly to the median, which by definition is simply the middle-ranked value.
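As a small illustration of this link between ranks and the median, the sketch below uses made-up values and a hypothetical helper, `middle_ranked`, which handles only samples with an odd number of values. Stretching the largest values changes the mean but leaves both the ranks and the median untouched.

```python
from statistics import mean, median


def middle_ranked(values):
    """Return the middle-ranked value (odd-length samples only)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# Made-up salaries (in thousands); the second list keeps the same ordering
# but has much larger top values
salaries = [31, 34, 36, 38, 40]
stretched = [31, 34, 36, 90, 120]

print("medians:", median(salaries), median(stretched))
print("middle-ranked values:", middle_ranked(salaries), middle_ranked(stretched))
print("means:", mean(salaries), mean(stretched))
```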