T-test and statistical power

I was fiddling with R to calculate the statistical power of an experiment that we recently conducted (using Welch’s t-test) when I decided to make a reference table of the per-group sample sizes needed for a power of 0.8 at a significance level of 0.05, given a range of deltas (differences between the means of the two groups) and standard deviations. Here it is, with delta down the first column and standard deviation across the first row:

Sample size per group (rows: delta, columns: SD)

Delta      1     3     5     7    10    15    20     30
1         17   143   394   771  1571  3533  6281  14129
2          6    37   100   194   394   884  1571   3533
3          4    17    45    87   176   394   699   1571
5          3     7    17    32    64   143   253    567
7          2     5    10    17    34    74   130    290
10         2     3     6     9    17    37    64    143
15         2     3     4     5     9    17    29     64
20         2     2     3     4     6    10    17     37
30         2     2     2     3     4     6     9     17
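
A table like this can be regenerated with a short loop over power.t.test. The sketch below is one way to do it, not necessarily how the numbers above were produced. The helper name n_per_group is made up for illustration, and so is the choice to report 2 whenever two per group already reaches the target power (the root search for n starts at 2 and may not bracket a solution in that case).

```r
# Sketch: per-group n for 80% power across a grid of deltas and SDs.
deltas <- c(1, 2, 3, 5, 7, 10, 15, 20, 30)
sds    <- c(1, 3, 5, 7, 10, 15, 20, 30)

# Hypothetical helper (not part of base R): smallest whole n per group.
n_per_group <- function(delta, sd, power = 0.8, sig.level = 0.05) {
  # Assumption: if n = 2 per group already meets the target, report 2
  # rather than asking power.t.test to solve for n.
  p2 <- power.t.test(n = 2, delta = delta, sd = sd, sig.level = sig.level,
                     type = "two.sample", alternative = "two.sided")$power
  if (p2 >= power) return(2)
  res <- power.t.test(power = power, delta = delta, sd = sd,
                      sig.level = sig.level, type = "two.sample",
                      alternative = "two.sided")
  ceiling(res$n)  # round up to a whole participant
}

# Evaluate the helper on every (delta, sd) combination.
tab <- outer(deltas, sds, Vectorize(n_per_group))
dimnames(tab) <- list(delta = deltas, sd = sds)
print(tab)
```

Because the power depends on delta and sd only through their ratio, cells with the same ratio (for example delta=1, sd=10 and delta=3, sd=30) should come out identical.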

Realistically, we want to keep the N of each group below 50, which means that the standard deviation should not exceed delta by more than about 75% (that is, sd should be at most about 1.75 times delta). A simple function call to demonstrate:

power.t.test(power=0.8, sig.level=0.05, type="two.sample", alternative="two.sided", delta=100, sd=175)

     Two-sample t test power calculation

              n = 49.05349
          delta = 100
             sd = 175
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group
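
The 75% rule of thumb works because the required n depends on delta and sd only through their ratio. A rough way to see this is the usual normal approximation, sketched below. approx_n is a hypothetical helper name, and the formula comes out slightly lower than what power.t.test reports because it ignores the t-distribution correction.

```r
# Normal-approximation sketch of the per-group n for a two-sided,
# two-sample test: n ~ 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta)^2
approx_n <- function(delta, sd, power = 0.8, sig.level = 0.05) {
  z <- qnorm(1 - sig.level / 2) + qnorm(power)
  2 * (z * sd / delta)^2
}

approx_n(delta = 100, sd = 175)   # roughly 48, versus 49.05 from power.t.test
approx_n(delta = 1,   sd = 1.75)  # same ratio, so the same answer
```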

There’s a classic example in statistics that illustrates the difference between statistical significance and effect size. Suppose you are conducting clinical trials on a drug purported to lower blood pressure. Given a large enough sample size, you could find a blood pressure decrease of 1 mm Hg to be “significant,” but given the effect size, would it be worth marketing? Of course not. So I got to wondering: how big would the sample size have to be? Generally people throw out a large number, like 10,000. But how big does it really have to be? I found data on the standard deviation of blood pressure across many populations, and the average is about 20. Plugging that into the power calculation, I got:

power.t.test(power=0.8, sig.level=0.05, type="two.sample", alternative="two.sided", delta=1, sd=20)

     Two-sample t test power calculation

              n = 6280.064
          delta = 1
             sd = 20
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

So you’d need 6,281 participants per group (drug vs. control), or over 12,500 in total, to have an 80% chance of detecting a 1 mm Hg difference in blood pressure at the 0.05 significance level. At a power of 0.9, the group size must be 8,407, for a total study size of over 16,800.
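
To compare the two power levels side by side, something like the following sketch would do. It only re-runs the same power.t.test call with power set to 0.8 and 0.9 and rounds up; the variable names are arbitrary.

```r
# Per-group n and total participants at two power levels
# (two-sample, two-sided, delta = 1 mm Hg, sd = 20).
powers <- c(0.8, 0.9)

n_needed <- sapply(powers, function(pw) {
  res <- power.t.test(power = pw, sig.level = 0.05, type = "two.sample",
                      alternative = "two.sided", delta = 1, sd = 20)
  ceiling(res$n)  # round up to a whole participant per group
})

# Two groups (drug and control), so the total is twice the per-group n.
data.frame(power = powers, per_group = n_needed, total = 2 * n_needed)
```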

EDIT: I made a big mistake.  The proper statistical test for the blood pressure experiment, as described, would be a paired t-test.  The sample size can then be calculated with this function call in R:

power.t.test(power=0.8, sig.level=0.05, type="paired", delta=1, sd=20)

Here’s the result:

     Paired t test power calculation

              n = 3141.473
          delta = 1
             sd = 20
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Thus you’d need 3,142 participants in the study, and you’d need only one group (since the two “groups” are before and after administration of the drug).  That makes the study considerably more feasible.
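
One caveat is worth sketching. As the NOTE in the output says, sd in the paired calculation is the standard deviation of each person's before/after difference, not of blood pressure across the population. If the before and after readings each have an SD of 20 and are correlated with coefficient rho, the SD of the differences is 20 * sqrt(2 * (1 - rho)), so sd = 20 corresponds to rho = 0.5. The rho values below are hypothetical, chosen only to show how much the required number of pairs moves with that assumption.

```r
sd_bp <- 20                     # population SD of blood pressure, from the post
rhos  <- c(0.3, 0.5, 0.7, 0.9)  # hypothetical before/after correlations

pairs_needed <- sapply(rhos, function(rho) {
  sd_diff <- sd_bp * sqrt(2 * (1 - rho))  # SD of within-person differences
  res <- power.t.test(power = 0.8, sig.level = 0.05, type = "paired",
                      alternative = "two.sided", delta = 1, sd = sd_diff)
  ceiling(res$n)  # round up to a whole pair
})

data.frame(rho = rhos, pairs = pairs_needed)
```

The rho = 0.5 row reproduces the 3,142 figure; a higher correlation between the two readings would bring the number down further.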
