Carrying out a meta-analysis using the raw Hamilton Rating Scale for Depression (HRSD) change scores derived from the Kirsch et al 2008 PLoS Medicine paper, I found that the effect size was larger than that reported by Kirsch et al, and that for paroxetine and venlafaxine it exceeded the NICE criterion for 'clinical significance' (a difference of 3 points on the HRSD). This suggests that the findings of Kirsch et al depend on their particular method of analysis and cannot be generalised to all the antidepressants included in their analysis.
The weighted mean difference (WMD) with a random effects model (see right) shows an overall effect size of 2.7 (95% CI 2.0-3.4), with paroxetine, fluoxetine, nefazodone, and venlafaxine at 3.4, 2.1 (non-significant), 1.7, and 3.5 respectively, so both paroxetine and venlafaxine exceeded the NICE criterion for 'clinical significance' of 3 HRSD points.
Analysing the standardised mean difference gave an overall effect size of .32 (95% CI .24-.41), with paroxetine, fluoxetine, nefazodone, and venlafaxine having effect sizes of .41, .24, .21, and .40 respectively.
There was minimal difference between fixed and random effects models, and excluding the study of subjects with mild depression gave an overall effect size of 2.81, with fluoxetine reaching statistical significance at 2.85.
The PLoS Medicine paper gives the data for individual studies in Table 1. It reports that the measure of effect size 'd' is the change score divided by the standard deviation (SD) of the change score, so the SD can be derived from the change score and 'd'.
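A minimal sketch of this derivation (the numbers below are hypothetical illustrations, not values from Table 1):

```python
# If d = mean change score / SD of the change score, the SD can be
# recovered by rearranging: SD = mean change score / d.
mean_change = 10.0   # HRSD change score (hypothetical value)
d = 1.25             # reported effect size 'd' (hypothetical value)

sd_change = mean_change / d   # recovered SD of the change score
print(sd_change)              # 8.0
```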
The raw HRSD change scores and SDs were entered, along with the sample sizes from Table 1, into the Cochrane Collaboration RevMan Analyses (v 1.0.5) software to perform a weighted mean difference (WMD) meta-analysis with random effects.
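The random effects WMD calculation that RevMan performs can be sketched with a DerSimonian-Laird estimator. This is a rough illustration of the method, not the actual analysis; the study rows are invented placeholders, not the Table 1 data:

```python
import math

# Each row: (mean change, SD, n) for the drug arm, then the placebo arm.
# These values are illustrative placeholders only.
studies = [
    (11.0, 8.0, 120, 8.5, 8.0, 118),
    (10.2, 7.5,  80, 8.0, 7.8,  75),
    ( 9.8, 8.2, 150, 7.9, 8.1, 149),
]

def random_effects_wmd(studies):
    # Per-study mean difference and its sampling variance
    diffs = [md - mp for md, _, _, mp, _, _ in studies]
    variances = [sd**2 / n + sp**2 / np_
                 for _, sd, n, _, sp, np_ in studies]

    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1 / v for v in variances]
    fixed = sum(wi * di for wi, di in zip(w, diffs)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, diffs))
    df = len(studies) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * di for wi, di in zip(w_re, diffs)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

wmd, ci = random_effects_wmd(studies)
```

When the heterogeneity statistic Q is no larger than its degrees of freedom, tau² is truncated to zero and the random effects result coincides with the fixed effect one, which is consistent with the minimal fixed-versus-random difference noted above.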
Subgroups were defined to analyse paroxetine, fluoxetine, venlafaxine, and nefazadone separately. Sensitivity analyses were performed by omitting the outlying fluoxetine study of subjects with mild depression ('ELC 62 (mild)').
For completeness a fixed effect analysis was also carried out, as well as an analysis of the standardised mean difference (the difference in change scores normalised to the standard deviation of the change scores) using Hedges' adjusted g (similar to Cohen's d but with an adjustment for small sample bias), although this is not strictly appropriate for studies that all used the same outcome measure (HRSD scores in this case).
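Hedges' adjusted g can be sketched as follows, using the standard small-sample correction factor J = 1 - 3/(4·df - 1) applied to Cohen's d (the input values are illustrative, not taken from any of the trials):

```python
def hedges_g(mean_drug, mean_placebo, sd_pooled, n_drug, n_placebo):
    """Standardised mean difference with Hedges' small-sample correction."""
    d = (mean_drug - mean_placebo) / sd_pooled   # Cohen's d
    df = n_drug + n_placebo - 2
    j = 1 - 3 / (4 * df - 1)                     # correction factor J < 1
    return j * d

# Illustrative values only: d = 2.5 / 8.0 = 0.3125, shrunk slightly by J
g = hedges_g(11.0, 8.5, 8.0, 30, 30)
```

The correction shrinks d towards zero, with the effect largest in small trials and negligible for the sample sizes typical of these studies.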
This study is an updated version of this analysis and this analysis, but derives the SD for the SMD more accurately. I also discuss the Kirsch et al paper here and here.
* UPDATE 11/3/8
Following on from Robert Waldmann's findings, and despite my protestations to the contrary, it looks like the confidence intervals of 'd' in Kirsch et al are a poor guide to the standard deviation of the change score; the effect size 'd' appears to be the HRSD change score divided by the SD of the change score, so the above analysis has been corrected to use this SD measure. Unsurprisingly, it makes little difference.