T-test and statistical power
I was fiddling with R to calculate the statistical power of an experiment that we recently conducted (using Welch’s t-test), when I decided to make a reference table of sample sizes that produce a power of 0.8, given a range of deltas (differences between the means of two groups) and standard deviations. Here it is, with delta in the first column and standard deviation in the first row:
| Delta \ SD | 1 | 3 | 5 | 7 | 10 | 15 | 20 | 30 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 17 | 143 | 394 | 771 | 1571 | 3533 | 6281 | 14129 |
| 2 | 6 | 37 | 100 | 194 | 394 | 884 | 1571 | 3533 |
| 3 | 4 | 17 | 45 | 87 | 176 | 394 | 699 | 1571 |
| 5 | 3 | 7 | 17 | 32 | 64 | 143 | 253 | 567 |
| 7 | 2 | 5 | 10 | 17 | 34 | 74 | 130 | 290 |
| 10 | 2 | 3 | 6 | 9 | 17 | 37 | 64 | 143 |
| 15 | 2 | 3 | 4 | 5 | 9 | 17 | 29 | 64 |
| 20 | 2 | 2 | 3 | 4 | 6 | 10 | 17 | 37 |
| 30 | 2 | 2 | 2 | 3 | 4 | 6 | 9 | 17 |
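For anyone who wants to regenerate a table like the one above, here is a minimal sketch built on `power.t.test`. The helper name `n_per_group` is made up for this example, and the floor of 2 per group is an assumption: when two observations per group already exceed the target power, the helper reports 2 instead of asking the solver for a smaller n. Results are rounded up to whole participants.

```r
# Smallest whole per-group n that reaches the target power in a
# two-sample, two-sided t-test (hypothetical helper, not from the post).
n_per_group <- function(delta, sd, power = 0.8, sig.level = 0.05) {
  # Assumption: never report fewer than 2 observations per group.
  p_min <- power.t.test(n = 2, delta = delta, sd = sd,
                        sig.level = sig.level)$power
  if (p_min >= power) return(2)
  fit <- power.t.test(power = power, delta = delta, sd = sd,
                      sig.level = sig.level,
                      type = "two.sample", alternative = "two.sided")
  ceiling(fit$n)
}

deltas <- c(1, 2, 3, 5, 7, 10, 15, 20, 30)
sds    <- c(1, 3, 5, 7, 10, 15, 20, 30)

# Rows are deltas, columns are standard deviations.
tab <- outer(deltas, sds, Vectorize(n_per_group))
dimnames(tab) <- list(delta = deltas, sd = sds)
print(tab)
```

Cells with the same ratio of standard deviation to delta should come out identical, which is a quick sanity check on any table built this way.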
Realistically, we want to keep the N of each group below 50, which means that the standard deviation should exceed delta by no more than about 75% (that is, sd should be at most roughly 1.75 times delta). A simple function call to demonstrate:
power.t.test(power=0.8, sig.level=0.05, type="two.sample", alternative="two.sided", delta=100, sd=175)
         Two-sample t test power calculation

                  n = 49.05349
              delta = 100
                 sd = 175
          sig.level = 0.05
              power = 0.8
        alternative = two.sided

    NOTE: n is number in *each* group
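The reason only the ratio matters can be seen from the usual normal approximation for a two-sided, two-sample test: n per group is roughly 2 (z_{1-alpha/2} + z_{power})^2 (sd / delta)^2. The sketch below uses that textbook approximation, not the noncentral t calculation that `power.t.test` performs, so it comes out about one participant lower (roughly 48 instead of 49 here). The function name `n_approx` is hypothetical.

```r
# Normal-approximation sample size per group (two-sided, two-sample).
# This is a textbook approximation, not what power.t.test computes.
n_approx <- function(delta, sd, power = 0.8, sig.level = 0.05) {
  z_alpha <- qnorm(1 - sig.level / 2)  # critical value for the test
  z_beta  <- qnorm(power)              # quantile for the target power
  2 * (z_alpha + z_beta)^2 * (sd / delta)^2
}

n_approx(100, 175)   # ratio sd/delta = 1.75
n_approx(10, 17.5)   # same ratio, so the same answer
```

Because delta and sd enter only through `sd / delta`, any pair with the same ratio needs the same group size, which is why the value 17 runs down the diagonal of the table.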
There’s a classic example in statistics which illustrates the difference between statistical significance and effect size. Suppose you are conducting clinical trials on a drug purported to lower blood pressure. Given a large enough sample size, you could find a blood pressure decrease of 1 point to be “significant,” but given the effect size, would it be worth marketing? Of course not. So I got to wondering: how big would the sample size have to be? Generally people throw out a large number, like 10,000. But how big does it really have to be? I found data on the standard deviation of blood pressure across many populations, and the average is about 20. Plugging that into the t-test, I got:
power.t.test(power=0.8, sig.level=0.05, type="two.sample", alternative="two.sided", delta=1, sd=20)

         Two-sample t test power calculation

                  n = 6280.064
              delta = 1
                 sd = 20
          sig.level = 0.05
              power = 0.8
        alternative = two.sided

    NOTE: n is number in *each* group
So you’d need 6,281 participants per group (drug vs. control), or over 12,500 in total, to detect a 1 mm Hg difference in blood pressure at the 0.05 significance level with a power of 0.8. At a power of 0.9, the group size must be 8,407, for a total study size of over 16,800.
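The power 0.9 figure comes from the same call with only the `power` argument changed. A short sketch, rounding up to whole participants (the variable names are just for illustration):

```r
# Same two-sample setup as before, but with a target power of 0.9.
res <- power.t.test(power = 0.9, sig.level = 0.05, type = "two.sample",
                    alternative = "two.sided", delta = 1, sd = 20)

n_group <- ceiling(res$n)  # participants per group, rounded up
n_total <- 2 * n_group     # drug group plus control group
print(c(per_group = n_group, total = n_total))
```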
EDIT: I made a big mistake. The proper statistical test for the blood pressure experiment, as described, would be a paired t-test. The sample size can then be calculated with this function call in R:
power.t.test(power=0.8, sig.level=0.05, type="paired", delta=1, sd=20)
Here’s the result:
         Paired t test power calculation

                  n = 3141.473
              delta = 1
                 sd = 20
          sig.level = 0.05
              power = 0.8
        alternative = two.sided

    NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs
Thus you’d need 3,142 participants in the study, and you’d need only one group (since the two “groups” are before and after administration of the drug). That makes the study considerably more feasible.
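One caveat, as the NOTE in the output says: for a paired test, `sd` is the standard deviation of the within-person differences, not of blood pressure across the population. If the before and after readings each have a standard deviation of 20 and a correlation of rho, the differences have a standard deviation of 20 * sqrt(2 * (1 - rho)), so plugging in 20 directly corresponds to rho = 0.5. The sketch below shows how the required number of pairs moves with rho. The correlations are made-up illustrative values, not measured ones, and the helper names are hypothetical.

```r
# sd of before/after differences when both readings share the same sd
# and have correlation rho (rho values below are illustrative only).
sd_diff <- function(sd, rho) sd * sqrt(2 * (1 - rho))

# Number of pairs needed for power 0.8 at a given correlation.
paired_n <- function(rho, delta = 1, sd = 20) {
  fit <- power.t.test(power = 0.8, sig.level = 0.05, type = "paired",
                      delta = delta, sd = sd_diff(sd, rho))
  ceiling(fit$n)
}

rhos <- c(0.3, 0.5, 0.8)
print(data.frame(rho = rhos, pairs = sapply(rhos, paired_n)))
```

The more strongly each person's two readings are correlated, the fewer pairs are needed, which is where the advantage of the paired design comes from.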