Thursday, 27 January 2011

Britain: leading the world in fake explosive detectors, because brown people's lives are cheap

On Newsnight tonight: how Britain's military and government promoted the sale of dowsing rods to third world countries for thousands of pounds. It isn't even as if they didn't know about it - they banned their sale to Iraq and Afghanistan but nowhere else:
"It has been alleged that hundreds of Iraqis died in explosions in Baghdad after ADE651 detectors failed to detect suicide bombers at checkpoints. 
Between 2001 and 2004 a Royal Engineers sales team went around the world demonstrating the GT200, another of the "magic wand" detectors which has been banned for export to Iraq and Afghanistan, at arms fairs around the world even though the British Army did not consider them suitable for its own use.

The government's Department of Trade and Industry, which has since been superseded by the Department for Business, Innovation and Skills, helped two of the manufacturers sell their products in Mexico and the Philippines."

The disgusting mandarins of the British government do the security equipment equivalent of letting companies manufacture and export sugar pills to third world countries labelled as antiretroviral drugs for HIV. And they still won't ban their sale because:
The Department of Business, Innovation and Skills told Newsnight that there was little point: "The impact of any further UK action in preventing the supply of these devices from the UK would be limited if they are available elsewhere"
It makes me sick.

Tuesday, 25 January 2011

The Academic-Medical Establishment

I was at a talk recently regarding the side-effects of a widely used drug, let us call it 'fictoxetine'. Now fictoxetine has long been thought to cause damage to the body, let us say the retina, leading to progressive visual loss and eventually blindness. Fictoxetine can be used for years, even decades, and so patients need to have their vision screened regularly. This talk reported their findings from a systematic review and meta-analysis that sought to find out whether the evidence actually supported this fear.

What they presented was data showing that over 1-2yrs of fictoxetine use visual acuity declined by around 10%. They also reported evidence that found that rates of fictoxetine use in people on a register of blind people were around twice those in the general population. They then concluded from this that fictoxetine doesn't really have as much of a major effect on vision as we had thought so we shouldn't be so worried about it and pontificated on how this myth had become so widespread in the medical community.

The majority of people at that talk took this message at face value and went away with that in their heads, maybe they will change their clinical practice - after all a respected academic in the field of fictoxetine research presented the evidence that showed fictoxetine doesn't have much effect on vision. Didn't he?

Well no, he didn't, the data I've just described is consistent with fictoxetine having really quite a large and serious effect on vision. If the short-term deficits of 10% in acuity over 1-2yrs continued over 10 years that would be a major loss of 40% of visual acuity. The blind register data is neither here nor there, most people would stop fictoxetine in patients with significant visual loss in the hope that they would never reach complete blindness (conversely, maybe they would be more happy to start it in people who were already completely blind than those with some vision).  So the data doesn't support the narrative being given to it but few people feel qualified or confident to gainsay a big name in the field.
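As a rough check on that arithmetic, here is a minimal Python sketch. It assumes (my assumption, nothing the talk stated) that the loss compounds at a constant proportional rate, and it brackets the "1-2yrs" by treating the 10% as lost either every two years or every year.

```python
def remaining_acuity(loss_per_period, periods):
    """Fraction of baseline acuity left after repeated proportional losses."""
    return (1.0 - loss_per_period) ** periods

# 10% lost every 2 years -> 5 periods in 10 years
slow = 1.0 - remaining_acuity(0.10, 5)
# 10% lost every year -> 10 periods in 10 years
fast = 1.0 - remaining_acuity(0.10, 10)

print(f"loss over 10 years: {slow:.0%} to {fast:.0%}")
# prints "loss over 10 years: 41% to 65%"
```

So the 40% figure sits at the optimistic end of the range, under these assumptions.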

And this is true throughout academic medicine, and academia in general, powerful personalities are able to shape the discourse in a scientific field not only through the research they perform but also the wider influence their ideas and opinions carry.

Monday, 24 January 2011

Is low dose citalopram as effective as other antidepressants?

From this post at the 21st Floor I found out about a study of citalopram dosing (Aursnes et al 2010) that is a sort of pre-print (apparently Webmedcentral is a kind of post-publication peer review model). It suggests that citalopram in lower doses (below 45mg per day) might not be as effective as comparator antidepressants. Citalopram is used in doses from 20-60mg for depression and an average dose is probably around 30mg so this is a significant claim*. It is actually titled "Are Regular Doses of Citalopram for Depression only Placebos? Meta-analysis And Meta-regression Analysis Of Pre-registration Clinical Trial Data" which is pretty tendentious since the study only includes two placebo controlled trials, both of which are in high doses of citalopram. Whether citalopram is or isn't a placebo at lower doses is not going to be answered by this study since, even if lower doses of citalopram are less effective than other antidepressants we don't know how they'd compare to placebo using this data.

The paper found a statistically significant relationship (p=.034) between mean citalopram dose and effect size (see Figure 1 below) and went on to split the data into doses above and below 45mg and found that doses below 45mg (but not above) had a statistically significantly smaller effect size than the control antidepressants.

Figure 1. Regression from Aursnes et al 2010

It is very unclear from the paper what methods they used - it appears that they used standardised mean difference (SMD) in change scores but it is very unclear how they computed the change scores from the data they had available. The data they show for most studies does not report change scores and you wouldn't normally be able to estimate the standard deviation of this measure (which you need for the meta-analysis) unless the study reported the correlation between baseline and final score (which is very unlikely).***
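To show why that correlation matters, here is a small Python sketch of the standard formula for the standard deviation of a change score. The SDs of 5 and 8 are made-up illustrative values, not numbers from the paper; only the within-study correlation r is varied.

```python
import math

def change_score_sd(sd_baseline, sd_final, r):
    """SD of (final - baseline), given both SDs and their within-study correlation r."""
    variance = sd_baseline**2 + sd_final**2 - 2.0 * r * sd_baseline * sd_final
    return math.sqrt(variance)

# Hypothetical SDs (not from Aursnes et al); only r changes.
for r in (0.0, 0.5, 0.9):
    print(r, round(change_score_sd(5.0, 8.0, r), 2))
```

With these made-up inputs the change-score SD more than halves as r goes from 0 to 0.9, so any SMD based on change scores depends heavily on a number that trials rarely report.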

Naturally I wanted to have a look at the data and regular readers of this blog will know that baseline severity is an important predictor of antidepressant efficacy in trials and I was interested to see what effect that would have in this study. Some of the source data is conveniently provided, it was obtained from Danish medicine licensing applications for citalopram so likely to be less subject to publication bias (in a similar way to the Kirsch et al data). I was able to get final score data for the studies using the Hamilton Rating Scale for Depression (HRSD-17) as outcome but had to impute standard deviations for those using the Montgomery-Åsberg Depression Rating Scale (MADRS). I could obtain change score SMD data from their figures but I think this is a little unreliable***. I used all the studies with an active comparator (usually a tricyclic antidepressant**). All analyses were random effects unless otherwise noted, and everything was done in R.

So the first thing I wanted to do was reproduce the correlation between mean dose of citalopram and effect size. Using the data I extracted (see Figure 2 below) my meta-regression showed a slope of .038 of SMD against citalopram dose (p=.13). Then I used their numbers (SMD and standard error) and performed another meta-regression. This showed a regression slope of .037 (similar, but not the same as their .038) but it was not statistically significant (p=.20) even if I used a fixed effects regression (p=.19). This is surprising as the authors report statistical significance at p=.034! The only way I can get their data to give a statistically significant regression is to include the placebo studies so that the slope is now .041 with p=.033 but this would be completely wrong. Having placebo studies included only for high doses of 50mg and 60mg will overestimate the effect of citalopram at high doses since we expect citalopram to be better than placebo but no better or worse than another antidepressant as control.
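The point about the placebo studies can be illustrated with a minimal fixed-effect meta-regression in Python. This is only a sketch: the original analyses were done in R, the function name is my own, and every number below is invented for illustration rather than taken from the paper. It weights each study by the inverse of its variance and tests the slope with a z-test.

```python
import math

def fixed_effect_meta_regression(doses, effects, ses):
    """Inverse-variance weighted regression of effect size on dose.

    Returns (slope, standard error of slope, two-sided p-value from a z-test).
    """
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    mean_dose = sum(w * x for w, x in zip(weights, doses)) / total
    mean_effect = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - mean_dose) ** 2 for w, x in zip(weights, doses))
    sxy = sum(w * (x - mean_dose) * (y - mean_effect)
              for w, x, y in zip(weights, doses, effects))
    slope = sxy / sxx
    se_slope = math.sqrt(1.0 / sxx)
    p = math.erfc(abs(slope / se_slope) / math.sqrt(2.0))
    return slope, se_slope, p

# Invented active-comparator studies: (mean dose in mg, SMD, standard error)
active = [(40, -0.3, 0.2), (40, -0.1, 0.2), (45, -0.2, 0.2),
          (50, 0.0, 0.2), (60, -0.1, 0.2)]
# Invented placebo-controlled studies, present only at high doses
placebo = [(50, 0.5, 0.2), (60, 0.6, 0.2)]

slope_active, _, p_active = fixed_effect_meta_regression(*zip(*active))
slope_all, _, p_all = fixed_effect_meta_regression(*zip(*(active + placebo)))
print(f"active only:  slope={slope_active:.4f}, p={p_active:.2f}")
print(f"with placebo: slope={slope_all:.4f}, p={p_all:.2f}")
```

In this toy data set the slope steepens once the placebo-controlled studies are added, simply because they contribute large positive effects at the high-dose end.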

Performing multiple regression with dose and baseline score against weighted mean difference in HRSD score and MADRS score separately**** showed slopes of .73 (p=.10) and -.047 (p=.76) for baseline severity and mean dose respectively for HRSD scores and 1.06 (p=.30) and .25 (p=.75) for MADRS scores. So no statistically significant effect of baseline severity. That isn't necessarily surprising because it may be that both citalopram and other antidepressants are all equally affected by increasing efficacy with increasing severity of depression.

Figure 2. The data I have extracted presented as weighted mean differences in RevMan 5.0

So, in summary, there does not seem to be a robust relationship between the effect size of citalopram and the mean dose used in trials - I do not think a discrepancy of this magnitude is likely to be due to methodological differences and I hypothesise that Aursnes et al (2010) have made a calculation error. Therefore it does not make sense to divide the studies based on citalopram dose. If we combine all studies using the SMD as outcome (see Figure 3 below) there is not a statistically significant difference between the citalopram and active control arms of trials although the difference is borderline significant (p=.057) and includes differences up to -.46 which corresponds to scores on the HRSD up to 4 points*****. It is worth bearing in mind that not a single trial had a mean dose of citalopram less than 40mg so this analysis cannot tell us much about lower doses of 20-30mg of citalopram and, if the correlation with dose is not real, then it can tell us nothing about these lower doses. In some ways this resembles the study by Kirsch et al where the authors used a regression analysis to make claims about patients with low severity by extrapolating the line into a region where there were not actually any studies.******

Figure 3. All active control studies combined using SMD as outcome, to the left of 0.0 favours control and to the right favours citalopram

* No study had a mean dose of citalopram less than 40mg.
** Two amitriptyline, two clomipramine, three mianserin, two maprotiline, nortriptyline, and imipramine. The paper seems to misreport this but looking at the original data sheets they get their numbers from I think my numbers are right and their report of five studies using mianserin is wrong.
*** This is what the authors say about their methods:
These data were fed into a tailor-made program for performing meta-analysis (Comprehensive Meta-analysis Version 2, from Biostat, Englewood, USA), which uses standard statistical procedures [9]. We filled in columns for the mean score, its standard deviation, and the number of patients in the citalopram group and in the comparator group. We added a column for correlation between before and after, Pearson’s coefficient of correlation found to be 0.48, and standardized the effect analysis with standard deviations of the differences between values before and after. We found I2 to be 73.5 and performed random effect meta-analyses with the effects weighted with the inverse of their variances.
And I think they've probably committed an error in their methodology because I'm not sure what they think that correlation is for - they can't use that (the between study correlation) to estimate the correlation between baseline and final scores within studies and then use that to calculate change score standard deviations. That would be completely wrong. It is also notable that the authors do an 'intention to treat' analysis in addition to 'last observation carried forward' but this is a flawed approach for continuous measures if you don't have access to the original data: it inflates the apparent sample size and does not capture any of the useful information regarding drop-outs that 'intention to treat' does with dichotomous data.
**** They can't be combined together because the baseline scores are in HRSD or MADRS units respectively. 
***** You can see from Figure 2 that the studies using the HRSD did show a statistically significant decrease in the citalopram group.
****** If the regression slope were to be taken seriously an arbitrary 'clinically significant' difference of .35 (around 3 points on the HRSD) is reached around a dose of 40mg.

Wednesday, 19 January 2011

Pay and conditions in the NHS

Our trust has just made a big chunk of people redundant: admin staff, nurses and doctors. Last year my pay was increased 1.5% and this year I will get 1% despite a recommendation of 1.5% from the Doctors and Dentists' Pay Review Board. For the next two years doctors and many other NHS staff will have a pay freeze.

Meanwhile MPs get 1.5% as recommended by the Senior Salaries Review Board (we'll see if they decide to vote themselves another 1% next year) and the Chief Executive of my trust got 14% last year and 16% the year before, which is high even for the bloated standards of NHS Chief Execs.

All in this together my arse.

Care clusters: A race to the bottom?

Do you think you've got severe depression? Well you're wrong, you are in 'care cluster 4':
"This group is characterised by severe depression and/or anxiety and/or other increasing complexity of needs. They may experience disruption to function in everyday life and there is an increasing likelihood of significant risks.

Likely to include: F32 Depressive Episode (Non-Psychotic), F40 Phobic Anxiety Disorders, F41 Other Anxiety Disorders, F42 Obsessive-Compulsive Disorder, F43 Stress Reaction/Adjustment Disorder, F44 Dissociative Disorder, F45 Somatoform Disorder, F48 Other Neurotic Disorders, F50 Eating Disorder

Some may experience significant disruption in everyday functioning.

Some may experience moderate risk to self through self-harm or suicidal thoughts or behaviours.

Unlikely to improve without treatment and may deteriorate with long term impact on functioning."
So you're basically the same as someone with OCD or an eating disorder. Care clusters (see Figure 1 below) are the result of the Labour government's payment by results programme. You get allocated to a cluster partly based on 'clinical judgement' and partly automatically by a computer program using scores entered by clinicians in answer to 18 questions.*

Figure 1. The 21 Care Clusters
Unlike 'acute' medical trusts which are paid by 'activity' (e.g. how many operations they do) the mental health trusts will be paid by 'need', and that is defined basically by 'care cluster', with a different local tariff for each cluster. So the local mental health trust will get paid for 20 cluster 1 patients, 30 cluster 2 etc. It isn't entirely clear at this point how that will take into account that many patients only stay on a mental health team's books for a few weeks or months.

As with my local mental health trust care providers are now going to start deciding what services they are prepared to deliver for each care cluster based on that cluster's tariff (i.e. how much they'll get paid for treating that patient) and not by the current combination of supply and need. You are not paid by activity, that is by what care (e.g. therapy sessions or meetings) you deliver, but on what cluster someone comes under - therefore there is going to be a pressure to reduce the amount of care provided within each cluster to maximise profit. You also have to wonder whether patients that seem like they'll require more work, and thus cost, than others (e.g. personality disorders) will get taken on at all.

Locally we're developing care packages for each cluster, but you have to wonder how a cluster that includes major depression, OCD, and eating disorders can really have a generic package that actually includes the appropriate evidence based treatments for those conditions. And where is the room for a bit of clinical judgement and addressing individual patient needs?

This might not have happened if it was left to the current model where only a single mental health trust provides services to a given PCT but under the new government proposals for NHS commissioning by GPs they are subject to EU competition law and must commission services from 'any willing provider' based on price. So it seems likely there will be a race to the bottom, I may not want to deny you access to, say, psychological therapy, but if another provider is tendering for work with the GP consortium and they don't offer it they will be able to save money and come in under my quote.

This payment regime hasn't been introduced yet but the tools for implementing it are being put in place in 2010/11 ready to roll it out. Something worth looking out for.

* Bah, who says you can't quantify mental ill-health on a simple 72 point scale, and who says depression can't be nicely quantified on a 4 point scale e.g. 
"Question 7. Problems with depressed mood (current): 
0 No problem associated with depressed mood during the period rated. 
1 Gloomy; or minor changes in mood.
2 Mild but definite depression and distress (eg feelings of guilt; loss of self-esteem). 
3 Depression with inappropriate self-blame; preoccupied with feelings of guilt. 
4 Severe or very severe depression, with guilt or self-accusation."

Monday, 17 January 2011

Diversifying the skill mix: or paying peanuts and getting monkeys

This is what Alan Maynard, economist, and influence behind much government thinking on the NHS, thinks is the way forward:
"...can patient care be maintained with fewer staff or changes in skill mix?...Expensive GPs replaced by nurse practitioners, for example? Double GP list sizes and reduce the demand for such physicians by half? Expensive registered nurses could be replaced by assistant practitioners. Evidence suggests skill mix changes such as these could be cost effective. However, the potential for skill mix is limited by the power of the craft guilds, especially the royal colleges."*
That's right, the future of the NHS should be half as many GPs (so there would be half as many appointments available), or replacing GPs with nurses, and replacing nurses with kids off the street. Sounds like a recipe for success.

Funnily enough there have been some papers looking at this, by one A Maynard, and they don't look like overwhelming evidence for his plans:

"Twenty-two large studies...strongly suggest that higher nurse staffing and richer skill mix (especially of registered nurses) are associated with improved patient outcomes"

"An extensive review of published studies where doctors were replaced by other health professions demonstrates considerable scope for alterations in skill mix. However, the studies reported are often dated and have design deficiencies. In health services world-wide there is a policy focus which emphasises the substitution of nurses in particular for doctors. However, this substitution may not be real and increased roles for non-physician personnel may result in service development/enhancement rather than labour substitution. Further study of skill mix changes and whether non-physician personnel are being used as substitutes or complements for doctors is required urgently." 
I must say, my experience of physician's assistants in the US and nurse practitioners in the UK to replace doctors and care assistants to replace ward nurses doesn't incline me to feel positive about the future. Currently the NHS doesn't have enough doctors delivering front line services or enough nurses delivering care on the wards. Diversifying the skill mix is just another way of saying that the magic of 'progress' and 'reform' will make up for cutting front line staff. It won't, no matter how many economists sit on their arses pontificating about how successfully managing minor self limiting ailments means that GP receptionists or your Granny can replace consultant oncologists. And who, at the end of the day, will take the brunt of these cuts, who will be ultimately responsible for what these people do? The handful of properly qualified people who are left, with their professional membership at stake and big lawsuits waiting for them as they desperately try and supervise a million under-qualified drones with no professional stake or commitment to their patients' care.

As a doctor I've spent enough of my life running around after 9-5 nurse practitioners, phlebotomists, ward clerks or whoever** doing the stuff they won't do because they can just wash their hands of it when 5 o'clock (or more likely 3pm) comes around.

* Worth bearing in mind when you hear him talking in the news recently about consultants:

"They don't always keep to their job plans and then get to do the overtime. I think there needs to be much more transparency about consultants' pay.
"The public are just not aware of the sums they can earn. If the data was published it would put pressure on them and reduce some of the figures we are seeing."
Interesting from a man making nearly £50k per year for 12 years from the NHS for chairing the board of the York NHS Hospitals Trust.

** Incidentally, one of the reasons that nurse practitioners are cheaper than junior doctors (a band 6 nurse like a nurse practitioner gets £25,472-34,189; a junior doctor's pay starts at £23,533 and goes up to £31,523 before specialist registrar level; healthcare assistants get £13,653-21,798; band 5 front line nurses get £21,176-27,534) is that they only work 9-5, so out of hours the ever decreasing number of junior doctors has to cover the stuff the nurse does during the day but with a concomitant reduction in overall numbers to cover the out-of-hours rota and no chance to practice under the supervision of superiors whatever it is the nurse does. Medicine is now no longer 'see one, do one, teach one', it's 'read about one, do one'. That's why people die so much more at night.

Friday, 7 January 2011

Antidepressants have no more than a minor effect in Minor Depression

Neuroskeptic reports on another paper that:
"...has added to the growing ranks of studies finding that antidepressant drugs don't work in people with milder forms of depression"
The study looked at 'Minor Depression', which is a DSM diagnosis that is similar to 'Major Depression' except that patients only have 2-4 rather than the 5 or more depressive symptoms necessary for the latter diagnosis.

Neuroskeptic reproduces the 'response rate' figure which shows no significant effect of antidepressants in this diagnosis but I think the findings from the continuous outcomes are more informative:

This shows the 3 studies that reported continuous outcomes (mean HRSD score) looking at paroxetine in nursing home residents, fluoxetine, and amitriptyline respectively. The effect size is a small* -0.93 HRSD points (95% CI: -2.27-0.41) and not statistically significant**. However, there are only about 200 patients included (which is about the minimum number to justify combining data in a meta-analysis) of which the vast majority came from one trial and I don't think there is enough evidence there to conclude that antidepressants do not work in minor depression***, although the effect size is likely to be small even if they do have a statistically significant effect when enough patients are studied.

* See my previous post for an antidepressant medication that is widely recommended and similarly shows a statistically non-significant HRSD score improvement around 1 point.
** The data presented are for random effects but fixed effects produce exactly the same results.
*** To compare with previous studies of major depression: there, mean baseline severity is usually around 24 points on the HRSD, whereas in these studies it was something like 12 points. If you extrapolate from studies in major depression then baseline severity of this magnitude would fall well within the region of 'no effect' and so the detected effect size of around 1 point is actually larger than you would have expected.

Thursday, 6 January 2011

Lamotrigine: an exciting new treatment for acute bipolar depression?

I've talked a fair bit about the efficacy of antidepressants (or lack thereof) in treating major depression. I won't go into that again, but I did want to discuss something that has been neglected in that debate.

Bipolar depression
It's estimated that some 10% of people with a major depressive episode have underlying bipolar disorder - that is they'll go on to have a manic or hypomanic episode (if they haven't had one already) - and if the efficacy of antidepressants in straight major depression is controversial then bipolar depression is a minefield.

It is certainly considered that antidepressant usage alone gives a big risk of provoking a swing from a depressive to a manic episode so they would usually be used alongside a 'mood stabiliser' like lithium or an anticonvulsant, and even then there is considerable disagreement as to how much they help.

Lamotrigine in acute bipolar depression
Something that has got a lot of attention in recent years is the anticonvulsant lamotrigine. There is good evidence for the efficacy of lamotrigine in preventing further depressive episodes in people with bipolar disorder. But recently there has been much interest in its use for treating an acute episode of depression in bipolar disorder, and this is despite the fact that it takes a considerable time to titrate up the dose to therapeutic levels (if you go by the BNF it takes 5 weeks to get to the usual dose of 200mg).

A major paper influencing people's thinking came out of Oxford by Geddes et al in 2009. This was a meta-analysis of trials of lamotrigine in acute bipolar depressive episodes and had a considerable impact. The Canadian Network for Mood and Anxiety Treatments (CANMAT) guidelines now recommend lamotrigine as a first-line treatment for acute bipolar depression largely on the strength of this analysis.

What they did was, apparently, contact GSK (who make lamotrigine as 'Lamictal') and get hold of all the individual patient level data from all five trials performed by the drug company and used it to perform a meta-analysis. They also identified two other studies not by GSK but didn't combine them because they didn't have any individual subject data, and because one was a crossover trial (which is difficult to correctly combine in a meta-analysis) and the other used lamotrigine as add-on to lithium therapy. They excluded data from one of the trials which had used a 50mg dose as this is generally considered subtherapeutic.

I'll concentrate on their findings using the Hamilton Rating Scale for Depression (HRSD, 17-item version), which is very widely used in antidepressant trials (I've mentioned it before). It has a maximum of 52 points and a score greater than 18 is usually needed to be recruited into a trial as 'moderately' depressed. Again, I'll focus on two measures of outcome, 'response rate' (the proportion of patients in each arm who achieve a 50% or more reduction in their initial HRSD score) and mean difference in the final HRSD score.
If we look at the mean difference in the final HRSD score (this was adjusted for baseline severity in a regression) it was –1.01 (–2.17 to 0.14). That mean difference is not statistically significant (although using a different measure, the Montgomery–Åsberg Depression Rating Scale, they did find a statistically significant effect), nor is it even suggesting a particularly large effect is possible (with the upper limit of the effect size around 2 points on the HRSD). This is smaller than the 1.8 point effect size reported by Irving Kirsch's meta-analysis of 'new' antidepressants in major depression and when I reanalysed Kirsch's data properly I (and others) found an effect size of 2.7 points on the HRSD, just short of NICE's (arbitrary) 3 point threshold for 'clinical significance'.

So a mean improvement of 1.0 points on the HRSD is not exactly impressive - certainly it wouldn't be very good if that was a uniform single point improvement across every patient. But potentially it could represent a really big, 'clinically significant' improvement for a subset of patients - and we'd be interested in a drug that could do that.

So this is why we look at 'response rates' - what proportion of patients got 'clinically significantly' better, or 'remission rates', the proportion who score sufficiently low to count as being better. Commonly the former is defined as a 50% reduction in the score on the HRSD (or other symptom scale). We can see below (Figure 1) that significantly more patients responded in the lamotrigine group than the placebo group with a risk ratio of 1.27 (95% CI: 1.09-1.47), that is 27% more patients in the lamotrigine group showed an improvement in HRSD of 50% or more - that implies 11 patients need to be treated for one additional 'response'.

Figure 1. Figure from Geddes et al - Meta-analysis of HRSD 50% response rates using individual patient data from GSK trials
As has been pointed out before, response rates in depression trials are slippery beasts. By calling a 50% reduction in HRSD score a 'response' to treatment you actually need to improve by fewer points on the HRSD if you are less depressed (and have a lower baseline score) so an improvement on exactly the same items of the HRSD could be classified as 'response' or 'no response' depending on that patient's initial severity. This also means that this measure is very vulnerable to small differences in baseline severity between the arms of a trial.* In practice response rates based on thresholding continuous variables like the HRSD (such as a 50% improvement threshold, or a threshold of 8 points for 'remission') are vulnerable to artefactual non-linear effects where very small improvements tip a few people over the threshold (something to particularly worry about if the threshold used seems a bit arbitrary anyway as you can easily pick one after the fact that amplifies any effect).
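A tiny Python sketch makes the baseline dependence concrete. The two patients are hypothetical, and the 50% rule is the 'response' definition described above; the function name is just for illustration.

```python
def is_response(baseline, final, threshold=0.5):
    """True if the score fell by at least `threshold` as a fraction of baseline."""
    return (baseline - final) >= threshold * baseline

# Two hypothetical patients who both improve by exactly 10 HRSD points.
milder = is_response(baseline=20, final=10)   # needs a 10 point drop
severer = is_response(baseline=28, final=18)  # needs a 14 point drop

print(milder, severer)
# prints "True False"
```

The same 10 point improvement counts as a 'response' for one patient and not for the other.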

The response rates in this study are around 35% of patients in the placebo arm and 45% of those taking lamotrigine - so an increase of 10 percentage points due to lamotrigine. If we consider that, for the average patient, 50% improvement implies a minimum change of around 12 points on the HRSD the actual mean improvement of 1.0 points (and the standard deviation) seems a little small (in a back of the envelope calculation you could say that you would expect an average of 1.2 point improvement - that's 10% getting 12 points averaged over all the group). So a bit of a mixed bag I'd say.
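For anyone who wants to check the envelope, here is the same arithmetic as a short Python sketch. It uses the approximate figures quoted in this post (35% placebo response, risk ratio 1.27, about 12 points for a minimal 'response') and assumes, crudely, that responders gain exactly 12 points and nobody else changes.

```python
import math

placebo_rate = 0.35   # approximate placebo response rate quoted above
risk_ratio = 1.27     # pooled risk ratio from Geddes et al

# Absolute increase in response rate implied by the risk ratio
absolute_increase = placebo_rate * (risk_ratio - 1.0)
# Number needed to treat, rounded up to a whole patient
nnt = math.ceil(1.0 / absolute_increase)

# Back of the envelope: 10% extra responders, each gaining about 12 points
extra_responders = 0.10
points_per_response = 12
expected_mean_gain = extra_responders * points_per_response

print(f"absolute increase: {absolute_increase:.1%}, NNT: {nnt}")
print(f"expected mean gain: {expected_mean_gain:.1f} HRSD points")
```

This reproduces the number needed to treat of 11 and the 1.2 point expected gain.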

Traditional meta-analysis versus individual patient data
A question that occurs to me is what extra information we gain from having the individual patient data? It is pretty rare to get hold of individual patient data when doing a meta-analysis, usually all we have is the overall results for each trial. Sometimes this doesn't make much difference, comparing the individual level meta-analysis by Fournier et al of antidepressants with Kirsch et al's meta-analysis did not reveal any major differences (and these studies looked at fairly different sets of antidepressants).

I had a quick go at performing a study level meta-analysis of the GSK lamotrigine data** and found that the response rate had a relative risk of 1.22 (1.06-1.41) (Figure 2 below) which is pretty similar to the individual level data 1.27 risk ratio.
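For readers unfamiliar with what a study level meta-analysis of response rates involves, here is a minimal Python sketch of fixed-effect inverse-variance pooling of log risk ratios. It is not necessarily the method behind Figure 2, and the trial counts are invented placeholders rather than the GSK data.

```python
import math

def pooled_risk_ratio(trials):
    """Fixed-effect inverse-variance pooling of log risk ratios.

    `trials` is a list of (events_treat, n_treat, events_ctrl, n_ctrl).
    Returns (pooled RR, lower 95% limit, upper 95% limit).
    """
    num = 0.0
    den = 0.0
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        # Large-sample variance of a log risk ratio
        var = 1.0 / a - 1.0 / n1 + 1.0 / c - 1.0 / n2
        num += log_rr / var
        den += 1.0 / var
    pooled = num / den
    se = math.sqrt(1.0 / den)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))

# Invented counts for illustration only (responders, arm size)
example_trials = [(45, 100, 35, 100), (50, 120, 44, 118), (30, 80, 27, 82)]
rr, lower, upper = pooled_risk_ratio(example_trials)
print(f"pooled RR {rr:.2f} ({lower:.2f}-{upper:.2f})")
```

Each trial contributes only four numbers, which is why this kind of analysis takes minutes once the per trial results are available.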

Figure 2. Meta-analysis of HRSD 50% response rates from GSK per trial data
Looking at the mean difference in the change in HRSD scores I found an effect size of -.86 points (-1.84-.12) which is similar to the -1.0 effect using individual level data and is similarly also only borderline statistically significant (Figure 3 below).

Figure 3. Meta-analysis of mean difference in HRSD scores from GSK per trial data
This is pretty reassuring as it suggests the majority of the information is present in the trial data so we wouldn't lose too much or miss any important effect if we didn't have the individual patient data available. You have to ask why GSK didn't re-analyse the data themselves, this would have been easily done (it took me a few minutes). I think there are a few reasons for this, one intimated in the paper is that regulatory authorities do not accept the results of meta-analyses, they require two large positive trials for licensing a drug, and all but one of the GSK trials were negative so a meta-analysis would have done them no good with licensing. However, they could still have influenced the scientific literature or provided justification for a further larger trial and I wonder whether they didn't in fact do the same analysis as either me or Geddes et al and realise that this was probably a small and fairly dubious effect and reckon it wouldn't do them any favours in the long run.

Antidepressant response and the severity of depression
Like many previous studies of antidepressants Geddes et al also find that there is an interaction between the size of the antidepressant effect of lamotrigine and how severely depressed the patients in the study were at baseline. They found a significant interaction on their ANCOVA analysis between baseline HRSD severity and final HRSD score with a regression coefficient of .30 (p=.04). They go on to comment that:
"Thus, the interaction by severity was because of a higher response rate in the moderately ill placebo-treated group, rather than, for example, a higher response rate in the severely ill lamotrigine-treated group."
This statement is very redolent of Kirsch et al's claim:
"The relationship between initial severity and antidepressant efficacy is attributable to decreased responsiveness to placebo among very severely depressed patients, rather than to increased responsiveness to medication."
Kirsch et al got a lot of stick for saying this, not least from yours truly, so I was interested in what a set of rather more mainstream figures in the psychiatric world (two Oxford academic psychiatrists and a psychiatrist from the GSK advisory board) were trying to say, and how this relates to Kirsch's work. Maybe I was being unfair to Kirsch?

Kirsch et al argued that the apparent increase in antidepressant efficacy with increasing baseline severity in trials was due to decreasing placebo response in more severe trials (see Figure 4 below). They further argued that this increasing efficacy was therefore only 'apparent' because the response to antidepressant was the same. I've discussed before how, on many levels, it is meaningless to claim that this effect is only 'apparent'. 

Figure 4. Figure from Kirsch et al - Regression of baseline severity against standardised mean difference of HRSD score improvement for antidepressant and placebo groups using per trial data
However, when I re-analysed their data it was their finding of decreased placebo response that was actually only 'apparent', and was due to needlessly normalising the raw HRSD data using the standardised mean difference (see Figure 5 below) and in fact placebo responses remained fairly static with increasing severity while antidepressant responses increased.

Figure 5. Regression of baseline severity against HRSD score improvement for antidepressant and placebo groups using per trial data from Kirsch et al

Of course these correlations only look at the average baseline severity across trials and don't tell us whether the relationship between baseline severity and HRSD improvement holds true within each trial for the individual patients in that trial. Fournier et al used individual level data to look at this relationship and found increases in both placebo and antidepressant responses, with the greater gradient in the latter leading to increased efficacy of antidepressants overall (see Figure 6 below). So there is actually not great evidence for a decreased placebo response with increasing baseline severity in straight trials of antidepressants in major depression.

Figure 6. Figure from Fournier et al - Regression of baseline severity against HRSD score improvement for antidepressant and placebo groups using individual patient data

So what did Geddes et al find? Well, unlike Kirsch et al and Fournier et al they primarily looked at response rates rather than mean change in HRSD. Obviously you can't have an individual patient's response rate (an individual either does or does not respond), so they divided the subjects into two groups: those with a baseline severity of 24 points or below, and those above. They found that only those in the 'severe' (>24 point) group had a statistically significant response rate greater than placebo, with 46% of those on lamotrigine responding compared to 30% on placebo. In the 'moderate' (<= 24 point) group response rates were 48% for lamotrigine and 45% for placebo. Geddes and others have argued that this increased placebo response at moderate severity is due to something like inflation of baseline severity at trial recruitment (that is, doctors subconsciously inflate severity for those around the lower threshold for trial recruitment, and these patients then regress to the 'real' lower score they would have had anyway when assessed blindly as part of the trial, while this doesn't happen for the more severe patients).***

Now I've outlined some of my reservations about 50% reduction in HRSD score as a measure of 'response rate' and I don't think that these numbers necessarily show what Geddes et al think they do. Let's consider a simple model of how antidepressants might work: say that an antidepressant reduces the baseline HRSD score by X HRSD points, which is the simple sum of the placebo effect (P) and a 'true' antidepressant effect (A):

X = P + A

Based on this model, as discussed above, we can see how patients in the 'moderate' severity group could be more likely to respond even if X is the same for both severity groups. This is because a lower baseline severity means fewer HRSD points need to be lost to reach a 50% reduction. Given this observation, it is conceivable that the lower placebo response in the 'severe' group could be, at least partly, due to pure artefact. Given that those treated with lamotrigine in the 'severe' group had a larger response rate than those on placebo you might then go on to posit that maybe P is larger for the more severely depressed patients.
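The artefact in the X = P + A model can be made concrete with a small sketch. It assumes, purely for illustration, that individual improvements are normally distributed around X with some standard deviation. The values of `X` and `SD` below are hypothetical and not taken from any of the trials.

```python
import math

def response_probability(baseline, mean_drop, sd_drop):
    """Chance of a >= 50% HRSD reduction if the drop is Normal(mean_drop, sd_drop)."""
    threshold = 0.5 * baseline  # points that must be lost to count as a 'responder'
    z = (threshold - mean_drop) / sd_drop
    # Upper tail of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical values: the same X in both groups, only the baseline differs
X = 10.0   # assumed mean HRSD improvement (P + A)
SD = 8.0   # assumed spread of individual improvements
for baseline in (20, 30):
    p = response_probability(baseline, X, SD)
    print(f"baseline {baseline}: needs {0.5 * baseline:.0f} points, "
          f"P(response) = {p:.2f}")
```

With a baseline of 20 the threshold equals the mean improvement, so exactly half of patients 'respond'. With a baseline of 30 the same X leaves most patients short of the 15 points required, so the response rate falls even though nothing about the treatment effect has changed.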

The way we ideally would want to answer this question would be to look at the mean HRSD scores as we did above for the data from Kirsch et al and Fournier et al. The response rate figures alone would still be completely consistent with a relationship like that shown in Figure 5 above and suggest that the claim that "the interaction by severity was because of a higher response rate in the moderately ill placebo-treated group" is false.

Unfortunately Geddes et al don't show the mean HRSD data by baseline severity, nor do they report the correlations between mean HRSD score and baseline severity for the lamotrigine and placebo groups separately, either of which might help to answer this question. This is a bit odd as this is one of the main areas where the individual patient data would prove very useful and answer questions that the trial only data cannot. So we'll never know whether they do or don't show that placebo responses are constant, decreased, or increased with greater severity of depression. I've reproduced the data presenting each trial below (Figures 7 & 8) but these can't really answer this question; for that we need the within trial data.

Figure 7. Simple regression of mean baseline severity against 50% response rates from GSK per trial data split by lamotrigine (blue) and placebo (red)

Figure 8. Simple regression of mean baseline severity against mean change in HRSD score from GSK per trial data split by lamotrigine (blue) and placebo (red)
Amusingly, if you consider Figure 8 in the same way as my Figure 5 and attempt to determine a severity 'threshold' above which the NICE 3 point 'clinical significance' criterion is reached then you find that you never actually reach it.
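The 'threshold' idea amounts to finding where two regression lines are 3 points apart. A minimal sketch, assuming each group's improvement is a straight line in baseline severity and that improvement is expressed as a positive number of HRSD points lost. The function name and the coefficients in the example are hypothetical, not the fitted values from Figure 5 or Figure 8.

```python
def severity_threshold(drug_line, placebo_line, criterion=3.0):
    """Baseline severity at which drug minus placebo improvement equals criterion.

    Each line is (intercept, slope) for: improvement = intercept + slope * baseline.
    Returns None if the lines are parallel, since the gap then never changes.
    """
    gap_intercept = drug_line[0] - placebo_line[0]
    gap_slope = drug_line[1] - placebo_line[1]
    if gap_slope == 0:
        return None
    return (criterion - gap_intercept) / gap_slope

# Hypothetical regression coefficients for illustration only
drug = (0.0, 0.5)      # (intercept, slope)
placebo = (2.0, 0.3)
print(severity_threshold(drug, placebo))
```

If the returned severity lies beyond the top of the scale being used, or beyond any baseline seen in the trials, then for practical purposes the criterion is never reached.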

So, what should we conclude? Well a few things:
  • Lamotrigine is not very effective for acute bipolar depression
  • Individual patient data in these trials doesn't tell us much more than a classical meta-analysis
  • The effect of lamotrigine, like other antidepressants in major depression, increases with greater baseline severity of depression, but never reaches the NICE 'clinical significance' criterion
  • It is unclear exactly why antidepressant efficacy increases in this way but it is far from established that it is due to "higher response rate in the moderately ill placebo-treated [patients], rather than, for example, a higher response rate in the severely ill lamotrigine-treated [patients]"

So what next?
So what medication should be used for acute treatment of bipolar depression? Well I think the data from quetiapine is pretty promising and certainly a lot more convincing and impressive than for lamotrigine monotherapy. Perhaps lamotrigine will be synergistic when added to quetiapine and the first author, John Geddes, is heading up the CEQUEL trial looking at just this question. 

* I think a better measure might be a fixed improvement in HRSD score, call it a 'clinically significant response'. This would avoid the assumption that antidepressants somehow cause an X% reduction in HRSD score rather than, say, improving Y number of symptoms, and thus the problem I've mentioned above about how mildly depressed patients need less improvement to 'respond' than more severely depressed patients.

** I got the response rate data from the Geddes et al paper and the HRSD mean difference data from the GSK trials register (since it wasn't presented in the paper). I used mean change scores rather than final HRSD scores (although this shouldn't make much difference). I had to estimate standard deviations for trial SCA40910, and the means for trial SCA30924 were already adjusted for baseline severity. Data are for fixed effect models but random effects makes minimal difference to the results.

*** Interestingly this explanation would only be tenable if trials of greater severity showed the same within-trial effect as trials of lower severity, since this effect should take place at the recruitment threshold irrespective of what that threshold is. So therefore within a trial the less severe subjects around the recruitment threshold should show a greater placebo response whereas there would not necessarily be any relationship between average baseline severity and average response between trials. This means that my regression data from Figure 5 above doesn't necessarily argue against this model - what we need to know is whether this relationship holds within the trials, something that even the individual subject data from Figure 6 doesn't rule out because we'd ideally have data separately plotted for each trial since the recruitment thresholds for each trial may have differed.