Robert has submitted a response to PLoS and develops his argument and analysis in this post. Amongst the other responses to PLoS, biostatistician Jim Young develops a similar line of argument:
"...To see what might be regression to the mean with increasing initial disease severity, one would need to plot raw improvement against initial disease severity.
Most of the authors’ analyses seem to represent each trial as two group means, not as a single difference between groups. It is not possible to estimate the pure effect of treatment and of placebo within a single trial of this sort. Each effect is confounded with other effects – such as regression to the mean and spontaneous improvement. If it were possible to measure separate effects of treatment and placebo within each trial, then there would be no need for the placebo group at all. It would be much more efficient to run trials with only a treatment group.
The authors write “Finally…we calculated the difference between the change for the drug group minus the change for the placebo group, leaving the difference in raw units and deriving its analytic weight from its standard error.” This is the only analysis that makes any sense because it is an analysis of the difference between groups in each trial. I think this corresponds to Models 3a and 3b in Table 2 but, as PJ Leonard notes, Model 3b is best because it’s sensible to drop out the one study where patients had only moderate depression.
The authors first conclude that “Drug–placebo differences in antidepressant efficacy increase as a function of baseline severity”. I have no problem with this – that’s what Figure 4 shows, at least in patients with more than moderate depression. But the authors go on to conclude that “The relationship between initial severity and antidepressant efficacy is attributable to decreased responsiveness to placebo among very severely depressed patients, rather than to increased responsiveness to medication.” This second conclusion requires the assumption that other effects (such as regression to the mean and spontaneous improvement or deterioration) are the same in each trial. Because of measurement error, it’s logical to expect regression to the mean to be greater in trials recruiting patients with more severe disease or greater in trials with longer follow up. Likewise it’s logical to expect spontaneous improvement or deterioration to differ with length of follow up. Even if the authors are happy to make the assumption that these other effects are the same in each trial, I think they should have made this assumption explicit, because I would not want to assume this myself.
Even if one is willing to make this assumption, why would you base this second conclusion on Figure 3? Why plot the standardised difference (between initial and final measurements in each group); why not just plot the raw difference? In meta-analysis, the only reason for standardising is to convert measurements on different scales into a common metric so that one can compare them. But if measurements are already on the same scale in each trial, why standardise them? It’s more difficult to interpret and requires stronger assumptions (“that variation between standard deviations reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among trial populations”). Figure 3 may mislead because standard deviations will vary from trial to trial (there’s an order of magnitude difference in sample size between the smallest and largest trials). PJ Leonard says that if you use the raw differences, the “placebo response does not in fact decrease with increasing baseline severity”. If he’s right, then the authors’ second conclusion just looks like wishful thinking."
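The analysis Young endorses, pooling the per-trial drug-minus-placebo change with a weight derived from its standard error, is ordinary inverse-variance meta-analysis. A minimal fixed-effect sketch (the per-trial summaries below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Invented per-trial summaries: (mean drug change, mean placebo change,
# standard error of the drug-minus-placebo difference), all in raw scale points.
trials = [(10.2, 7.9, 0.8), (11.5, 8.1, 1.2), (9.4, 8.8, 0.5)]

diffs = np.array([drug - placebo for drug, placebo, _ in trials])
weights = np.array([1 / se**2 for _, _, se in trials])  # inverse-variance weights

pooled = (weights * diffs).sum() / weights.sum()
pooled_se = weights.sum() ** -0.5
print(f"pooled drug-placebo difference: {pooled:.2f} points (SE {pooled_se:.2f})")
```

Because the outcome is left in raw units, the pooled estimate is directly interpretable as points on the depression scale, which is Young's argument for not standardising in the first place.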
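Young's point that regression to the mean should be larger in trials recruiting more severe patients can be illustrated with a small simulation. The sketch below assumes a simple hypothetical model: each score is a stable true severity plus independent measurement error at baseline and follow-up, with no treatment or placebo effect at all; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: an HRSD-like score is stable true severity plus
# independent measurement error, with NO real change between visits.
n = 100_000
true_severity = rng.normal(22, 4, n)
noise_sd = 4.0
baseline = true_severity + rng.normal(0, noise_sd, n)
followup = true_severity + rng.normal(0, noise_sd, n)

# Apparent mean "improvement" among patients meeting each entry criterion:
# pure regression to the mean, and it grows with the severity cutoff.
rtm = {}
for cutoff in (20, 25, 30):
    selected = baseline >= cutoff
    rtm[cutoff] = (baseline[selected] - followup[selected]).mean()
    print(f"entry cutoff {cutoff}: mean apparent improvement {rtm[cutoff]:.2f}")
```

Even though nobody truly changes, the stricter the entry criterion, the larger the apparent improvement — exactly the confound Young says is not constant across trials of differing baseline severity.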
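Young's objection to standardising can also be made concrete. In the two hypothetical trials below, the placebo groups improve by exactly the same number of raw points, but because their score standard deviations differ, the standardised changes look quite different:

```python
# Two hypothetical trials whose placebo groups improve by the SAME raw
# amount (6 points) but whose scores have different standard deviations.
trials = {"trial A": (6.0, 5.0), "trial B": (6.0, 10.0)}  # (raw change, SD)

std_changes = {}
for name, (raw_change, sd) in trials.items():
    std_changes[name] = raw_change / sd  # standardised change
    print(f"{name}: raw change = {raw_change:.1f}, standardised = {std_changes[name]:.2f}")
```

If trial SDs happen to vary with baseline severity, a plot of standardised placebo change against severity could show a trend that is absent from the raw changes — which is why the choice of metric matters for the authors' second conclusion.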