Friday, 15 October 2010

Levelling the spirit - pt 1.75

Following on from part 1.5, here's my final set of analyses. I've had quite some trouble reproducing the analyses from the Spirit Level. This time I'm going to source my data from somewhere other than the OECD: the UN Human Development Report 2009, which gives data for income inequality (average Gini, 1992-2007), life expectancy at birth (2007), and government health expenditure (2006, in $PPP).

I've specified my data set in advance. I'm going to look at all countries (where data is available) shown on Wilkinson & Pickett's (W&P's) figure below that have a greater national income than Portugal (the poorest country included in W&P's analyses). I'll also look at the same set excluding countries with a population of less than 3 million (a criterion W&P specified in their reply to critics). Finally, I'll look at just those countries included by W&P in their analyses.

So I'm throwing out Luxembourg, Liechtenstein, Hong Kong, and Andorra at the start because they aren't on the graph. This leaves me with 36 countries: all the usual ones plus Singapore, Korea, Israel, Slovenia, Brunei, Kuwait, Cyprus, Qatar, UAE, Czech Republic, Barbados, Bahrain, Hungary, and the Bahamas. Of these, I have no Gini data for Iceland, Brunei, Kuwait, Cyprus, Qatar, UAE, Barbados, Bahrain, and the Bahamas. So I start with 27 countries, and the correlation between the Gini coefficient and life expectancy is r=.13 (p>.5). The correlation between health expenditure and life expectancy is r=.37 (p=.06), so 'trend significant', as we say in the trade.*
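For anyone wanting to check figures like these, below is a minimal sketch of the usual way a Pearson correlation is computed and tested against zero: t = r·sqrt(n-2)/sqrt(1-r²) on n-2 degrees of freedom, two-tailed. It is a standard-library-only illustration, not the software used for the analyses in this post; the function names are hypothetical, and the t-distribution tail area is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of the same length, n >= 3")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def p_value_for_r(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0, using t = r*sqrt(n-2)/sqrt(1-r^2)."""
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the t density between 0 and |t|
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Both tails beyond |t| = 1 - twice the central half-area
    return 1 - 2 * area


# Rounded r values and sample sizes from this post
for r, n in [(0.37, 27), (0.35, 26), (-0.13, 23)]:
    print(r, n, round(p_value_for_r(r, n), 3))
```

With the rounded values above, `p_value_for_r(0.37, 27)` lands between .05 and .07, and `p_value_for_r(0.35, 26)` (the 27 countries minus Slovenia) between .07 and .09, consistent with the reported p-values; small discrepancies come from rounding r to two decimals.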

As for population size, the lack of data has already got rid of most of the small countries, but it's bye-bye Slovenia with its 2m population. Unsurprisingly, this makes little difference: life expectancy still doesn't correlate significantly with Gini (r=.12, p>.5), and the relationship with health expenditure is even less significant than before (r=.35, p=.08).

Finally, we'll look at W&P's subset of 23 countries. That means farewell to Hungary, the Czech Republic, and Korea, but we get to keep Singapore and Israel in addition to the usual crowd of Anglo-Saxon and Western European countries plus Japan. In this sample the correlation between life expectancy and Gini is still not significant (r=-.13, p>.5), and now there is no correlation at all with health expenditure (r=0). Below I've reproduced the scatterplot of this relationship:

Compare this chart with the one by W&P in the Spirit Level reproduced below:

The two graphs don't look massively different, yet W&P report a statistically significant relationship between income inequality and life expectancy, whereas I found hardly any relationship at all. Why would this be? As I've discussed in the two previous posts, the exact composition of the subsample of countries matters, but that can't be the issue here, since I've used exactly the same arbitrary sample of countries as W&P. I've also discussed how the data source matters, which is why I've done my analyses using sources other than W&P's, to check how robust their results are. But even so, the two graphs look pretty similar, and life expectancy estimates are unlikely to differ by enormous amounts between sources.** One area where my data differ from W&P's in all my analyses, though, is how inequality is estimated. This matters because an important difference between the two figures seems to be that Japan, in the top left-hand corner, and the USA, Singapore, and Portugal, in the bottom right-hand corner, are more extreme in W&P's scatterplot than in mine.

So how did I estimate inequality? I simply took Gini data from the UN report, as stated above. What is the Gini coefficient? I'll let W&P explain:
"Other more sophisticated measures include one called the Gini coefficient. It measures inequality across the whole society rather than simply comparing the extremes. If all income went to one person (maximum inequality) and everyone else got nothing, the Gini coefficient would be equal to 1. If income was shared equally and everyone got exactly the same (perfect equality), the Gini would equal 0. The lower its value the more equal a society is. The most common values tend to be between 0.3 and 0.5."
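To make that definition concrete, here is a minimal sketch of one standard way to compute a Gini coefficient from a list of individual incomes: the rank-weighted formula, which is equivalent to half the mean absolute difference between all pairs divided by the mean. The function name is hypothetical, and official figures such as the UN's are built from household survey data with their own weighting and adjustments, so this is illustrative only. Note that for a finite population of n people, giving all the income to one person yields (n-1)/n rather than exactly 1; the value approaches 1 as n grows.

```python
def gini(incomes):
    """Sample Gini coefficient: 0 = perfect equality, (n-1)/n = one person has everything."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form on ascending incomes x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))   # 0.0  (perfect equality)
print(gini([0, 0, 0, 10]))  # 0.75 (one of four people has everything: (4-1)/4)
print(gini([1, 2, 3, 4]))   # 0.25
```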
Sounds like quite a good measure of inequality then. So why didn't W&P use it in the Spirit Level? Well here they are to explain:
"To avoid being accused of picking and choosing our measure, our approach in this book has been to take measures provided by official agencies rather than calculating our own. We use the ratio of the income received by the top to the bottom 20% whenever we are comparing inequality in different countries: it is easy to understand and is one of the measures provided ready-made by the United Nations. When comparing inequality in US states, we use the Gini coefficient: it is the most common measure, it is favoured by economists and it is available from the US Census Bureau. In many academic research papers we and others have used two different inequality measures in order to show that the choice of measures rarely has a significant effect on results."
Gosh, if only there was a way to obtain Gini data for the countries they studied. Never mind, since they've told us it doesn't make any difference I'm sure their selection of income ratios rather than the more commonly used and academically accepted Gini coefficient was just for convenience.
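For comparison, here is a minimal sketch of the measure W&P chose: the ratio of the income received by the richest 20% to that received by the poorest 20%, computed from a list of individual incomes. The helper names, and counting a person who straddles a 20% cut-off fractionally, are my own assumptions; the UN publishes this ratio ready-made from survey data. It illustrates the point in W&P's own description of the Gini: the ratio only compares the extremes, so income can be reshuffled among the middle 60% without changing it at all.

```python
def poorest_share(sorted_incomes, frac):
    """Total income of the poorest `frac` of people (incomes sorted ascending).

    A person straddling the cut-off is counted fractionally (an assumption).
    """
    k = frac * len(sorted_incomes)
    whole = int(k)
    share = sum(sorted_incomes[:whole])
    if whole < len(sorted_incomes):
        share += (k - whole) * sorted_incomes[whole]
    return share


def quintile_ratio(incomes):
    """Income of the richest 20% divided by income of the poorest 20%."""
    xs = sorted(incomes)
    bottom20 = poorest_share(xs, 0.2)
    if bottom20 <= 0:
        raise ValueError("poorest 20% have no income; ratio undefined")
    top20 = sum(xs) - poorest_share(xs, 0.8)
    return top20 / bottom20


# Two hypothetical income lists with identical tails but different middles
spread_middle = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flat_middle = [1, 2, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 9, 10]
print(quintile_ratio(spread_middle), quintile_ratio(flat_middle))  # both equal 19/3
```

Both hypothetical lists give the same ratio, yet their Gini coefficients differ (0.30 versus roughly 0.24), which is one way two inequality measures can rank the same populations differently.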

It seems that finding no relationship between life expectancy and either wealth or other markers such as health expenditure, while simultaneously finding a relationship between income inequality and life expectancy, requires an exquisite balance of data source, subsample of countries, and measure of inequality. Can it really be the case that W&P just stumbled onto this analysis first time and didn't think to check how robust it was to slight variations? Or could it be that there is some intellectual dishonesty at work?

* For 'trend significant' read 'not statistically significant'

** As Neuroskeptic points out in the comments, there is a big difference between the life expectancy data in my graph and in the one from the book. Having downloaded the data from the Equality Trust website, it seems that their life expectancy data are about 3 years older (from 2004) but from the same source. Reanalysing the data using the Gini coefficients from this post and the life expectancy data from the book, I get a stronger correlation (r=-.34), but this is still not statistically significant (p=.12). Reanalysing the data using their inequality measure confirms their analysis, with r=-.44 (p<.05). So my failure to reproduce their analysis is partly about which year the life expectancy data are taken from and partly about the choice of inequality measure.


Neuroskeptic said...

There is however a big difference between the two graphs: everyone lives 1-2 years longer in your graph.

In your graph the Japanese live almost 83 years, in theirs it's more like 81.5; in yours the lowest country is 78 and a bit, in theirs it's 76 and a bit.

Could it be that your life expectancies are all higher, which also "compresses" the life expectancy differences, and weakens the correlation due to restricting the range?

pj said...

I'd just noticed that myself. Since my OECD and UN data are very similar (and both give data from 2007), I can only presume that W&P have older life expectancy data, although losing 2 years of life expectancy implies the data are pretty old.

There might be a ceiling effect (which isn't quite a range restriction) compressing the relationship; note that in my first OECD graph I got the same slope as W&P did with their data, so it isn't necessarily impossible to get such a slope. But I'm not sure W&P can use that explanation as a get-out, since their argument doesn't really allow for a secular increase in life expectancy: that would imply some kind of progressive factor increasing in prevalence over time.

pj said...

Updated: looking at their data, the relationship with Gini is still not statistically significant, so it's partly to do with the life expectancy year used and partly to do with the choice of income ratios rather than the Gini coefficient.