Thursday 28 February 2008

Antidepressants redux

For people interested in this sort of thing, I've done a back-of-the-envelope forest plot of the PLoS study. The data is derived from their Table 1 and isn't 100% accurate (due to the way I derived the SD from the confidence intervals), and I've used a weighted mean difference of standardised mean differences rather than a true standardised mean difference because of the way they've presented their data (but it shouldn't make much difference when comparing my results to theirs). Personally I wouldn't have used standardised mean difference scores like this; I'd have wanted to use the raw scores, since all studies used the same rating scale and it seems odd to standardise within treatment group in this way (but I don't know how the data was presented to the FDA - they may have had to use the data this way).
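For the curious, the SD back-derivation is the approximate step I mentioned. A minimal sketch of how a group SD can be recovered from a 95% confidence interval on its mean, assuming a normal approximation (the function name and numbers are mine, not from the paper):

```python
import math

def sd_from_ci(lower, upper, n, z=1.96):
    """Back-derive a group's SD from a 95% CI on its mean.

    The CI half-width is z * SD / sqrt(n),
    so SD = half-width * sqrt(n) / z.
    """
    half_width = (upper - lower) / 2.0
    return half_width * math.sqrt(n) / z
```

This is only exact if the reported intervals are z-based intervals on the mean; t-based intervals would make the derived SD slightly too large for small groups, which is one source of the inaccuracy.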

So we can see that I've pretty much replicated their finding of a .32 effect size (95% CI .24-.41), and this holds if we exclude studies with group sizes below 40.
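The pooling itself is just an inverse-variance weighted mean of the per-study effects. A sketch of the calculation (the function and the illustrative numbers are mine; the actual per-study effects come from their Table 1):

```python
import math

def pool_fixed_effect(effects, variances, z=1.96):
    """Fixed-effect pooled estimate: inverse-variance
    weighted mean of effect sizes, with a 95% CI."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)  # SE of the pooled estimate
    return pooled, pooled - z * se, pooled + z * se
```

A fixed-effect model like this weights more precise (usually larger) studies more heavily, which is why excluding the small studies barely moves the pooled estimate.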

It is interesting to note that this study hasn't told us much more than we already knew: the effect sizes are not exactly huge if we were looking for a d > .5 'medium' effect. You'll also note that our confidence limits do not include .5, so NICE would classify it as:
"There is evidence suggesting that there is a statistically significant difference between x and y but the size of this difference is unlikely to be of clinical significance."
Someone somewhere was asking about how effect size is influenced by study size (commonly used as a proxy for study quality). I've already said that excluding small samples doesn't affect the conclusions, and a simple scatter plot suggests that, if anything, larger studies have a smaller effect size. The funnel plot I've derived here is also unremarkable. Excluding the outlying study of mildly depressed subjects doesn't make much difference either.
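The scatter-plot eyeballing can be made slightly less informal with a correlation between sample size and effect size - a crude small-study check (this helper is hypothetical, not anything from the paper):

```python
import math

def size_effect_correlation(ns, effects):
    """Pearson correlation between study sample size and effect size.

    A clearly negative value suggests larger studies report smaller
    effects - consistent with what the scatter plot shows here.
    """
    mean_n = sum(ns) / len(ns)
    mean_d = sum(effects) / len(effects)
    num = sum((n - mean_n) * (d - mean_d) for n, d in zip(ns, effects))
    den = math.sqrt(sum((n - mean_n) ** 2 for n in ns)
                    * sum((d - mean_d) ** 2 for d in effects))
    return num / den
```

This is no substitute for a proper funnel plot or asymmetry test, but it puts a number on the "if anything, larger studies have a smaller effect size" impression.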

The interesting thing is to look at the data split by individual antidepressant. We can see that paroxetine and venlafaxine have larger effect sizes (both .42, with confidence intervals crossing .5) than nefazodone and fluoxetine (.22 and .24 respectively, neither CI crossing .5). In the PLoS study they remark that:
"Although venlafaxine and paroxetine had significantly (p < 0.001) larger weighted mean effect sizes comparing drug to placebo conditions (ds = 0.42 and 0.47, respectively) than fluoxetine (d = 0.22) or nefazodone (0.21), these differences disappeared when baseline severity was controlled."
But that is a rather troubling caveat to their overall conclusion. What they are saying is that their regression analysis suggests that the venlafaxine and paroxetine trials enrolled more severe patients and that could be why they had greater responses to the medication. But at the very least we must conclude that in the trials that were actually performed and submitted to the FDA there was a reasonable effect size due to these two drugs (we might also conclude that there was little evidence of a meaningful effect size of the other two). However, according to the NICE criteria we should still say that for venlafaxine and paroxetine:
"There is evidence suggesting that there is a statistically significant difference between x and y but there is insufficient evidence to determine its clinical significance."
