First of all, I don't like the Kirsch et al method of carrying out a regression on the drug and placebo groups separately - we are looking at how the difference between antidepressant and placebo groups varies with initial disease severity, and that is the analysis to carry out.
So taking my SD and mean effect from the last study, using the mean baseline HRSD score (in the drug group) from the paper, and weighting a linear regression by inverse variance*, what do we find?
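For concreteness, here is a minimal sketch of the regression just described, assuming a hypothetical per-trial data frame with columns for the baseline HRSD score, the drug-placebo difference in HRSD change, and the variance of that difference (column names are illustrative, and statsmodels' WLS stands in for whatever software was actually used):

```python
import pandas as pd
import statsmodels.api as sm

def inverse_variance_fit(df):
    """Weighted least squares of the drug-placebo difference on baseline severity,
    weighting each trial by 1 / variance of its difference."""
    X = sm.add_constant(df["baseline"])   # intercept plus baseline HRSD
    weights = 1.0 / df["var"]             # inverse-variance weights
    return sm.WLS(df["diff"], X, weights=weights).fit()

# Example with made-up numbers for three trials:
# trials = pd.DataFrame({"baseline": [24.0, 26.0, 28.5],
#                        "diff":     [1.8, 2.9, 3.6],
#                        "var":      [0.9, 0.7, 1.1]})
# fit = inverse_variance_fit(trials)
# print(fit.params)  # slope: how the drug-placebo gap changes per HRSD point
```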
Well, let's look at the graph on the top right. Here we see that, unsurprisingly, we replicate the Kirsch et al finding that effect size increases with baseline severity. We also see that those drugs which had statistically and 'clinically' significant effect sizes in the previous analysis did have greater disease severity - supporting the Kirsch et al contention that differences in baseline disease severity may underlie some of the differences in the efficacy of the different drugs (note that this is essentially the same as Kirsch et al's Figure 4, right).
But also compare this figure with one from the PLoS Medicine paper (Figure 2, right). Note that Kirsch et al have marked in green the region where their regression suggests an effect size over the NICE criterion (Cohen's d > .5). On my graph I have also marked (grey vertical line) where the regression effect size exceeds the NICE criterion (mean HRSD difference > 3, marked as the grey horizontal line). I find that this occurs around a baseline of 26 HRSD points**, versus the 28 points found by Kirsch et al. So there is not a large difference between my finding and theirs, and it is likely due to the contrast between standardised effect sizes with the associated NICE criterion and raw HRSD scores with the associated NICE criterion; but my figure is rather less striking, because a baseline of 26 points is actually more representative of the studies analysed, falling near the mean baseline score.
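The crossing point marked by the grey lines falls out of the fitted line algebraically; a sketch, assuming the 'fit' object from the snippet above:

```python
# Baseline HRSD at which the fitted line diff = a + b * baseline reaches the
# NICE criterion of a 3-point drug-placebo difference.
def nice_threshold_baseline(fit, criterion=3.0):
    a, b = fit.params["const"], fit.params["baseline"]
    return (criterion - a) / b  # baseline at which a + b * baseline == criterion
```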
If I repeat my analysis so that it is performed the same way as Kirsch et al's, modelling an inverse variance* regression line on the mean change in HRSD for each group separately, we can see (bottom right) that the point at which the regression lines are 3 HRSD points apart (grey bar) is similar to that in my analysis above, so the result is replicated when we consider the change scores and baseline HRSD scores for the drug and placebo groups separately***.
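A sketch of this Kirsch-style version, again with hypothetical per-arm data frames ('baseline', 'change' and 'var' per trial) and statsmodels standing in for the actual software:

```python
import statsmodels.api as sm

def fit_arm(df):
    """Inverse-variance weighted line for one arm: mean HRSD change on baseline."""
    X = sm.add_constant(df["baseline"])
    return sm.WLS(df["change"], X, weights=1.0 / df["var"]).fit()

def separation_baseline(drug_fit, placebo_fit, criterion=3.0):
    """Baseline HRSD at which the two fitted lines are 'criterion' points apart."""
    a = drug_fit.params["const"] - placebo_fit.params["const"]
    b = drug_fit.params["baseline"] - placebo_fit.params["baseline"]
    return (criterion - a) / b
```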
An interesting further point is that the placebo response does not seem to decrease with increasing baseline HRSD when we use raw HRSD change as our effect size, while the drug response does in fact rise with increasing severity. Again, this suggests that Kirsch et al's findings were due to using standardised mean differences as the measure of effect size rather than the raw figures. It also gives the lie to the widely reported claim that increasing effectiveness with severity is due to a decreasing placebo response rather than an increasing drug response.
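To see how standardising can produce this, here are some purely invented numbers in which the pooled SD grows with baseline severity: a constant raw placebo change then yields a falling d, while a rising raw drug change looks roughly flat in d units.

```python
# Invented figures only: pooled SD grows with baseline, raw placebo change is
# flat, raw drug change rises.
for baseline, sd, placebo_change, drug_change in [(22, 7.0, 8.0, 10.0),
                                                  (26, 8.5, 8.0, 11.5),
                                                  (30, 10.0, 8.0, 13.0)]:
    print(baseline, round(placebo_change / sd, 2), round(drug_change / sd, 2))
# Placebo d falls (1.14 -> 0.94 -> 0.80) despite a constant raw change;
# drug d is roughly flat (1.43 -> 1.35 -> 1.30) despite a rising raw change.
```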
* Weighting by sample size makes little difference.
** This is still in the 'very severe' category of depression, although the labels used by NICE do not necessarily reflect clinical practice, as Moncrieff & Kirsch say that in
"...the NICE meta-analysis [of] “moderate” (Hamilton score 14-18) through “severe” (19-22) to “very severe” depression ( ≥ 23)...the middle group...would generally be referred to as moderately depressed..."
So the 'very severe' group may be better thought of as simply 'severe' in normal practice, explaining why most studies were in this range of severity.
*** The regression analyses here are in broad agreement with the meta-analysis split by severity category.
**** UPDATE 11/3/8
Thanks to Robert Waldmann I've had to re-do the regression weights, as the SD is lower than I estimated from confidence intervals if we derive it directly from the 'd' and change scores, but it essentially makes no difference, as we would expect. As Robert points out in the comments, excluding the fluoxetine 'mild' depression study makes no difference to the regression, which is interesting, as from the plot it does look as if that study is driving the regression.
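A sketch of that leverage check, reusing the hypothetical 'trials' data frame and 'inverse_variance_fit' function from the earlier snippet (the study label is made up for illustration):

```python
def drop_one_slope(trials, label, fit_fn):
    """Return (slope from all trials, slope with the named study excluded)."""
    full = fit_fn(trials)
    reduced = fit_fn(trials[trials["study"] != label])
    return full.params["baseline"], reduced.params["baseline"]

# drop_one_slope(trials, "fluoxetine_mild", inverse_variance_fit)
```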
3 comments:
[Reposting Robert's comment without his email address]
What happens when you drop the single study with baseline HRSD 17 (protocol 62, mild, IIRC)? Your scatter looks like an illustration of how one leverage point can take over a regression.
Hmmmm, I'll ask STATA. I will get different results as I use change/d as the standard deviation within a trial. For the SSRIs, including all trials, I regress average change on baseline, weighted by the inverse of the variance of the average change.
Coefficient 0.53 (or, in other words, the end-of-trial HRSD score is 0.47 points higher if the initial HRSD score is 1 point higher), reported SE 0.038, R-squared of 0.66.
Now I redo the data for SSRI patients, dropping the study with baseline 17, and get a coefficient of 0.48. Much higher standard error of course, but almost no change in the point estimate.
Oh wow, you're right!
Also, I get similar results weighting by sample size which, as I explain at length on my blog, in my opinion trades a tiny loss of precision for a large gain in robustness.
In contrast, there is no hint of a slope in the placebo regressions.
For placebo slope = 0.023 se 0.162.