Thursday 29 May 2008

Absolutes versus rates

This post on Talking Philosophy highlights another example of journalists willfully taking absolute figures and using them to calculate meaningless headline statistics ('mobile phone thefts up one billion % since 1980!!!') even when the sensible rate figures have been provided and their significance explained. Do they not understand or just not care?

Creationism from the BBC Natural History Unit

On nullifidian:
"BBC (i.e. taxpayer) provided webcams at the creationist Noah’s Ark Zoo Farm near Bristol...Oddly enough, these webcams are still advertised within the “nature features” part of the BBC Bristol web site. I guess they haven’t got around to re-organising the web site since the supernatural became a part of everyday existence."

Friday 16 May 2008

Birth rape

Interesting, if bad tempered, exchange going on between Dr Crippen and the F Word (and others) over this post there on 'birth rape' and 'medical rape':

"A woman who is raped while giving birth does not experience the assault in a way that fits neatly within the typical definitions we hold true in civilised society. A penis is usually nowhere to be found in the story and the perpetrator may not even possess one. But fingers, hands, suction cups, forceps, needles and scissors… these are the tools of birth rape and they are wielded with as much force and as little consent as if a stranger grabbed a passer-by off the street and tied her up before having his way with her. Women are slapped, told to shut up, stop making noise and a nuisance of themselves, that they deserve this, that they shouldn’t have opened their legs nine months ago if they didn’t want to open them now. They are threatened, intimidated and bullied into submitting to procedures they do not need and interventions they do not want. Some are physically restrained from moving, their legs held open or their stomachs pushed on."
I'm somewhat ambivalent as to where my sympathies lie in this argument. I've seen some callous behaviour in the delivery room (more often from midwives than doctors I have to say). Of course any medical procedure carried out without consent is assault, and a patient can withdraw that consent at any time. On the other hand, when carrying out examinations and procedures which are uncomfortable it is always sensible, when a patient wants you to stop because of discomfort, to engage in at least some discussion with them - this is because people are often just temporarily unhappy about the discomfort and will continue with a little encouragement, avoiding terminating the procedure and requiring the whole experience to be repeated. This applies as much to inserting a cannula as to intimate examinations. I don't think there is any easy answer as to where the line lies, which is why doctors are always walking something of a tightrope between best interest and violation of autonomy.

Saturday 10 May 2008

Mad Nad

Via Hawk/Handsaw, Nadine Dorries shares her unique perspective on the BMJ study of survival rates in preterm infants:
"No improvement in neo-natal care in twelve years? Really? So where has all the money that has been pumped into neo-natal services gone then? A baby born at 23 weeks today stands no better a chance of living than it did in 1996? This report is the most desperate piece of tosh produced by the pro-choice lobby and it smells of one thing, desperation."
You may remember Ms Dorries from the minority report on abortion from the Commons Science and Technology Committee.

Sunday 4 May 2008

Kirsch et al reply again

Huedo-Medina, Johnson, and Kirsch have submitted a further response on PLOS Medicine in reply to further comments by myself and others, after their last reply.*


Placebo response and severity

Interestingly, I observed that they needed to:

"clarify their position on the claim that placebo response decreases with increasing baseline severity, since this appears to be an artefact"
Rather than address this observation they instead repeat the claim, saying that it is the 'unique' contribution of their article:

"without the within-group analyses it would not have been possible to conclude that placebo responses were lessening as initial severity of depression increased (whereas drug response remained constant; see our article’s Figures 2 and 3). This unique contribution of our article contradicts Wohlfarth’s conclusion that it contained “nothing new.”"

Flawed meta-analytic methods

Further, they defend their bizarre and biased analytical method:

"One of the main concerns in the new commentaries centred on one of our main analyses, which evaluated change for drug and placebo groups without taking a direct difference between them. Thus, effect sizes were calculated separately for each group for this analysis, though the analysis combined them. Leonard regarded this practice as “unorthodox” and Wohlfarth regarded it as “erroneous because the effect size in an RCT is defined as the difference between the effect of active compound and placebo.” First, these concerns ignore the fact that our article’s between-group analyses confirmed the major trends present in the analyses that considered within-group change. Specifically, both sets of analyses concluded that antidepressants’ efficacy was greater at higher initial severity, attaining clinical significance standards only for samples with extremely severe initial depression. Second, although the commentators may be correct that our within-group analyses are relatively innovative in this literature, it does not mean that they were wrong. To the contrary, these statistics are in conventional usage elsewhere (e.g., 3, 4, 5), as Waldman’s commentary implies...Finally, the analyses did incorporate a direct contrast between drug and placebo (see Table 2, and Model 2c, for example).

...

Although alternative weighting strategies may yield somewhat different results, the choices converge well both for the overall mean difference and for analyses of the trends across the literature. As an example, Leonard (04 March 2008) reported replicating our meta-regression patterns using alternative precision weights.

Importantly, as our article documented (Figures 2 & 3), the size of the difference between drug and placebo grows as the samples’ initial severity increases to extremely severe depression (but is very small at lower observed levels of initial severity). Because the overall differences between drug and placebo depended on initial severity, it is misleading to consider the overall difference in isolation."

But this does not address my objection that:

"This is not an acceptable analytic technique because it ignores that there is a relationship between the improvement in placebo and drug groups from the same study, but that the placebo and drug groups from any given study can have grossly different weightings when considered separately (e.g. there would be half as much weighting to the results from the fluoxetine trials in the drug analysis as the placebo analysis, the result of, for example, different sample sizes between the experimental arms)."
And as I say about the more conventional analysis they claim supports their 'unorthodox' analysis:


"I note that Figure 4 in the paper of Kirsch et al is actually more consistent with my finding of 'clinical significance' at a baseline of 26 (this threshold is found both by regression on the difference scores, or separate regressions for each group's change score) than their suggestion of 28 points..."

And Robert notes:

"The available unbiased estimate of the overall average benefit of NDA’s is equal to 2.65 HRSD units, which is considerably higher than Kirsch et al’s biased estimate [of 1.8]."

So while my and Robert's analyses confirm that the effect size of antidepressants increases with increasing baseline severity, they also show that Kirsch et al's claim that placebo responses decrease with baseline severity of depression is false, and that Kirsch et al report effect sizes that are considerably biased downwards.
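
The weighting objection quoted above can be made concrete with a toy calculation. This is only a sketch: the two trials and their figures are invented for illustration, and sample size stands in for the precision weights, which is an assumption rather than the weighting Kirsch et al actually used. Each hypothetical trial shows exactly a 2-point drug-placebo difference, but the arms carry very different weights when pooled separately.

```python
# Hypothetical trials (invented numbers, not from the Kirsch et al dataset):
# (drug_change, n_drug, placebo_change, n_placebo)
trials = [
    (10.0, 300, 8.0, 100),
    (14.0, 100, 12.0, 300),
]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def separate_arm_estimate(trials):
    """Pool drug arms and placebo arms separately, then subtract.

    This ignores the link between the two arms of the same trial.
    """
    drug = weighted_mean([t[0] for t in trials], [t[1] for t in trials])
    placebo = weighted_mean([t[2] for t in trials], [t[3] for t in trials])
    return drug - placebo


def within_trial_estimate(trials):
    """Take the drug-placebo difference inside each trial, then pool."""
    diffs = [t[0] - t[2] for t in trials]
    weights = [t[1] + t[3] for t in trials]
    return weighted_mean(diffs, weights)


print(separate_arm_estimate(trials))   # pooled arms, then subtracted
print(within_trial_estimate(trials))   # differences pooled
```

With these made-up figures the separately pooled arms show no drug-placebo difference at all, even though every trial shows a 2-point benefit, which is the kind of distortion the objection is about.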

It is worth thinking about the references they give to support their analytical method (numbers 3, 4, and 5; notice they are all in psychology or education journals). The most recent (and thus most easily available) is reference 5, Morris & DeShon (2002) 'Combining Effect Size Estimates in Meta-Analysis With Repeated Measures and Independent-Groups Designs' in Psychological Methods 7(1) 105-25. It is about combining the results from repeated measures designs and independent group designs, concentrating on training effectiveness, organizational development, and psychotherapy, and is not an article about medical meta-analysis:

"The issue of combining effect sizes across different research designs is particularly important when the primary research literature consists of a mixture of independent-groups and repeated measures designs. For example, consider two researchers attempting to determine whether the same training program results in improved outcomes (e.g., smoking cessation, job performance, academic achievement). One researcher may choose to use an independent-groups design, in which one group receives the training and the other group serves as a control. The difference between the groups on the outcome measure is used as an estimate of the treatment effect. The other researcher may choose to use a single-group pretest-posttest design, in which each individual is measured before and after treatment has occurred, allowing each individual to be used as his or her own control.1 In this design, the difference between the individuals’ scores before and after the treatment is used as an estimate of the treatment effect."

I hope you can already see why this situation is not comparable to a meta-analysis of double blind randomised placebo controlled drug trials. Single-group repeated measures designs would not be appropriate there (you can have within-subjects cross-over designs, but that is not what is being discussed here): we know placebo effects are very important in drug trials, so we require placebo control arms. The Kirsch et al meta-analysis therefore involved only independent groups, so there is no need to worry about combining repeated measures and independent groups, and, as Morris & DeShon say:

"When the research base consists entirely of independent-groups designs, the calculation of effect sizes is straightforward and has been described in virtually every treatment of meta-analysis"

That is, there is no need to use this unusual method because perfectly good methods already exist for analysing these data.

So Morris & DeShon are concerned with what to do when you have no control group for some of your studies - which is not the case in double blind RCTs because a study without a control group is considered an invalid measure of drug effects. The other two references, number 4, Gibbons et al (1993), and number 3, Becker (1988), also emphasise this aspect of the method (I haven't read these studies):

"With this approach, data from studies using different designs may be compared directly and studies without control groups do not need to be omitted."

But we have no need to do this, so we have no need for the analytical method used by Kirsch et al. We have no need for it precisely because medical meta-analyses consider only double blind RCTs and explicitly reject studies without control groups, since an estimate of the placebo response in each trial is considered essential.

So we have no reason to use the method of Kirsch et al, but what reasons do we have for not using these "innovative...statistics...in conventional usage elsewhere"? Well what do Morris & DeShon have to say? Well obviously they're concerned about when it is acceptable to combine studies with and without controls, and conclude that ideally, if you intend to do this, there oughtn't to be a change in the control group with time, i.e. there should be no placebo effect in the control group. But they do refer to Becker for a meta-analytic method proposed for use when there is a placebo effect:

"Becker (1988) described two methods that can be used to integrate results from single-group pretest-posttest designs with those from independent-groups pretest-posttest designs. In both cases, meta-analytic procedures are used to estimate the bias due to a time effect. The methods differ in whether the correction for the bias is performed on the aggregate results or separately for each individual effect size. The two methods are briefly outlined below, but interested readers should refer to Becker (1988) for a more thorough treatment of the issues.

An important assumption of this method is that the source of bias (i.e., the time effect) is constant across studies. This assumption should be tested as part of the initial meta-analysis used to estimate the pretest-posttest change in the control group. If effect sizes are heterogeneous, the investigator should explore potential moderators, and if found, separate time effects could be estimated for subsets of studies." [my emphasis]

That is, if you can't be sure that the placebo effect is constant across studies, you shouldn't combine studies using this method. And, of course, this is precisely the objection that I and others have raised to this method - because we already know that placebo responses can vary between trials - that is why we have placebo control arms in randomised controlled trials!

So Huedo-Medina, Johnson, and Kirsch are advocating the rejection of the usual meta-analytic techniques of medical research, where the highest standards are required and control groups are considered very important, in favour of a method from psychology and education. That method is only used when two different designs, one of which is rejected in medical research, need to be combined and where placebo effects are downplayed, and even its advocates recognise that it is unsuitable when placebo responses are heterogeneous between studies.

This is quite some defence when you look at the scatter on the placebo responses in the Kirsch et al meta-analysis: that's about as heterogeneous as it gets, and it isn't explained by baseline severity of depression. So the assumptions underlying the meta-analytic method used by Kirsch et al are violated, even according to the citations they refer to in justifying their approach! Even the original study (with standardised mean differences rather than raw change scores) showed great heterogeneity in the placebo arm:

"The amounts of change for...placebo groups varied widely around their respective means, Q(34)s = ... 74.59, p-values [less than] 0.05, and I2s = ... 54.47"

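For readers unfamiliar with the heterogeneity statistics in that quote, I2 can be recovered from Q and its degrees of freedom using the standard definition, I2 = 100 × (Q − df) / Q, floored at zero. The sketch below simply plugs in the quoted placebo figures; the function name is my own and nothing here comes from the paper's code.

```python
def i_squared(q, df):
    """I^2 as a percentage: the share of variation across studies
    beyond what chance alone would be expected to produce."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Placebo-arm figures as quoted: Q(34) = 74.59
placebo_i2 = i_squared(74.59, 34)
print(round(placebo_i2, 1))
```

This gives a little over 54%, in line with the 54.47 quoted (the small gap presumably reflects rounding or details of the original calculation not given here). In other words, roughly half of the variation in placebo response between trials is more than sampling error would explain.
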
Precision versus bias

Huedo-Medina et al also completely misunderstand the objections of Robert Waldmann and others by saying:

"Waldman argued that our estimates of the overall difference between drug and placebo was conservatively biased (i.e., too small) because of assumptions present in our estimates of precision for each effect size. It is of course not possible to be certain that one has completely removed error from any measurement, or for that matter, to do so in an analysis of measures from independent trials. As Young noted, there are uncontrolled measurement errors or artefacts that necessitate the use of a control group and the randomised controlled trial design.
...
The calculation of a weighted effect size by using the inverse of each within-subjects variance is more precise than a sample-size weighted average (9), contrary to the Waldman’s assertion."
When, of course, his assertion is that their estimates may be precise but are biased:

"In each case, Kirsch chose a method which, under strong assumptions, gives an efficient and unbiased estimate of the true overall average benefit. In each case there are alternative approaches which are less efficient under those assumptions but which are unbiased not only when the Kirsch et al estimates are unbiased, but also for many cases in which the Kirsch et al estimates are biased. That is they are less efficient under the null but more robust. In each case the null hypothesis that the Kirsch et al estimator is unbiased has been tested and overwhelmingly rejected. The available unbiased estimate of the overall average benefit of NDA’s is equal to 2.65 HRSD units, which is considerably higher than Kirsch et al’s biased estimate."

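The distinction between precision and bias can be shown with a toy example. To be clear, this is not Waldmann's calculation: the four studies are invented, and the assumption that studies with bigger effects also have noisier estimates is made purely for illustration. Under that assumption, inverse-variance weighting (efficient if weights and effects are unrelated) systematically leans towards the small effects.

```python
# Hypothetical studies (invented): (effect estimate, variance of estimate).
# Larger effects come with larger variances by construction.
studies = [(1.0, 0.5), (2.0, 1.0), (3.0, 2.0), (4.0, 4.0)]


def inverse_variance_mean(studies):
    """Precision-weighted mean: each study weighted by 1 / variance."""
    weights = [1.0 / var for _, var in studies]
    total = sum(w * eff for w, (eff, _) in zip(weights, studies))
    return total / sum(weights)


def unweighted_mean(studies):
    """Simple average of study effects: less efficient, but it does not
    depend on the weights being unrelated to the effects."""
    return sum(eff for eff, _ in studies) / len(studies)


print(inverse_variance_mean(studies))
print(unweighted_mean(studies))
```

Here the precision-weighted figure comes out well below the plain average of 2.5, so an estimator can be the more "precise" one and still be pulled in one direction when its assumptions fail.
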
* UPDATE
PJ Leonard replies to Huedo-Medina et al on PLoS Medicine.

Amateur psephology

Something that always annoys me about election coverage is the amazingly unsophisticated interpretation of the resulting statistics. For instance, in the current 2008 local elections, with something like a 35% turnout, the vote shares are projected nationally for the general election, which had a 61% turnout last time - and dire comparisons are made with seemingly little awareness of how meaningless this projection is. A great example is by Martin Kettle in the Guardian:
"Take, for instance, what has happened in Southampton. The Conservatives didn't just win the local council there this week. They also hoisted their share of the vote to levels that place both the city's Labour MPs - including the universities secretary John Denham - on clear notice of losing their seats. Three years ago, Denham had a 21-point majority over the Tories in Southampton Itchen; it made Itchen Labour's 189th most marginal seat. Yet this week the Tories stacked up a 20-point lead over Labour among the selfsame voters."
Ok, first of all we have to point out the boundary changes in Southampton Itchen, which make comparisons difficult (this doesn't seem to make much difference to the 20-point Labour lead in the 2005 general election with a 55% turnout overall in the old constituency, which was 70% in 1997). A quick look suggests that in the new Southampton Itchen area the 2005 general election turnout was only 30% so Kettle doesn't have problems of differential turnout to contend with.

Looking at the 2008 local election results for the new Itchen wards (Bargate, Bitterne, Bitterne Park, Harefield, Peartree, Sholing, Woolston) we get an 18-point Conservative lead with a 30% turnout. So pretty much what Kettle said. But local elections are, unsurprisingly, not the same as general elections, and we can demonstrate this by considering what the results were in the local elections just preceding that 20-point Labour lead in the 2005 general election.

In the 2004 local election in the new Itchen wards (obvious caveats about boundary redrawing problems), we find that the Tories actually had an 8-point lead over Labour, on a 31% turnout. So if we take a very naive approach to normalising general and local election results, we could suggest that an 18-point Conservative lead in the 2008 local elections in Southampton Itchen predicts that Labour will be reduced to a 10-point lead over the Tories in the general election (not considering what will happen if turnout returns to the 70% high of 1997). In other words, it is all a little more complicated than the overpaid opinion makers of the media like to make out.
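
The arithmetic behind that naive normalisation is just a constant offset between local and general election leads. A minimal sketch, using only the figures above (the function name is mine, and the assumption that the local-to-general gap stays fixed is exactly the naive part):

```python
def naive_general_election_lead(local_lead_now, local_lead_then, general_lead_then):
    """Project a general election lead from a local election lead.

    Assumes the gap between local and general results seen last time
    carries over unchanged. Leads are Labour minus Conservative,
    in percentage points (negative means a Tory lead).
    """
    offset = general_lead_then - local_lead_then
    return local_lead_now + offset


local_2004 = -8     # Tories 8 points ahead in the 2004 locals
general_2005 = 20   # Labour 20 points ahead in the 2005 general election
local_2008 = -18    # Tories 18 points ahead in the 2008 locals

projected = naive_general_election_lead(local_2008, local_2004, general_2005)
print(projected)  # 10, i.e. a 10-point Labour lead
```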

Ah, while writing this I notice that Kettle has another piece on Southampton Itchen, making most of the same mistakes:

"It also marks a big shift from the results in last year's local elections in Southampton, Tony Blair's last election as Labour leader. In 2007, these same Itchen wards produced a share as follows: Labour 34.9%, Tories 39.4%, Liberal Democrats 20.7% and others 4.9%. In other words, the swing in 2008 compared with last year was nearly 8%. Labour's change of leader to Gordon Brown has not improved things - it has made them much worse.

Yes, of course, there are caveats to enter. A local election is not a general election. Fewer people vote in locals and people can vote differently in different elections. Nevertheless, in this one constituency, Labour's 189th most marginal seat, Labour trailed by more than 19 points yesterday. If it had been a general election, Denham would have been swept out, a high profile victim of an enormous surge to the Tories. If he is to hold this seat in the next general election there will have to be an immense reversal of Labour fortunes"

Notice that the logic of his 8% swing to the Conservatives suggests the 10% Labour victory I indicate above, rather than the Tory victory he implies.

Saturday 3 May 2008

Oh dear god

Just think how many Tories there are who, while I wouldn't vote for them, would at least be competent to run a major world city. Instead we get Mayor Boris. Well done Londoners, and a BNP assembly member too.
"Democracy is a system ensuring that the people are governed no better than they deserve"