Why abandoning the census could be a good idea.

Sample size requirements are totally counterintuitive. I remember my statistics professor mathematically demonstrating to us the relationship between sample size and population size. Our minds were boggled.

For your personal enbogglement, I’ve charted that relationship here. It shows the sample size needed to get a margin of error of 1 per cent at a 99 per cent confidence level. The salient point is the flattening out at the top.

samplesizes
Please note the compressed nature of the horizontal axis. Data source: Research Advisors.

It is intuitive that if you’re enquiring about a big population, you need a big sample. Intuitive, but wrong.

Once you’re surveying a population of more than 500,000, there’s scarcely any need to increase your sample size. Sampling more than 16,000 people out of a large population means adding very little value. This is good news. It means our society’s reliance on survey data – for everything from who should be PM to how peanut butter should taste – is efficient.
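If you’d like to see the flattening for yourself, here is a minimal sketch. It assumes the standard Cochran formula with a finite population correction and the worst-case proportion of 50 per cent. That is my assumption about how tables like this are usually built, not a claim about how Research Advisors built theirs, and the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin=0.01, confidence=0.99, p=0.5):
    """Sample size for a given margin of error and confidence level.

    Uses Cochran's formula plus a finite population correction.
    p=0.5 is the worst case (it maximises the variance p * (1 - p)).
    """
    # Two-sided z-score, e.g. about 2.576 for 99 per cent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size needed if the population were infinitely large.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Shrink it to account for the population being finite.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for size in (1_000, 10_000, 100_000, 500_000, 1_000_000, 25_000_000):
    print(f"population {size:>12,} -> sample {required_sample_size(size):>7,}")
```

Run it and the required sample climbs quickly for small populations, then barely moves once the population passes half a million.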

So why do we run censuses? Partly because the Census and Statistics Act 1905 compels us to, every five years. Partly because we always have, and partly because the UN encourages us to.

The census offers value, no doubt. I use the data often. It makes sure no community is left out. You can use the census portal to dive down into detail on where you live and get accurate data.

census portal

But does the census get localised data effectively?

If you’re interested in small, remote, unusual communities that you will never otherwise survey, do you really want to ask them the same few questions that are appropriate for everyone else? Or is that a missed opportunity?

Instead of the census, we could ask small communities specific and relevant questions. In remote Aboriginal communities, it might not make sense to ask people about their journey to work, but it might make more sense to ask, for example, about what they eat, which the census does not do.

census mug
I love the ABS, and I cherish this mug I got on the day of the release of the 2011 census.

The census costs a lot – reportedly $440 million last time – with the price tag going up for the 2016 iteration. The entirety of the ABS budget is just $400 million, with which they put out new data and analysis just about every day.

I do not – DO NOT – support cutting the budget of the ABS to do their important day to day work. But I can see why running a census every five years might seem like a waste of resources that would be better used supporting that work.

The best reason to axe the census would be that it adds little empirical exactitude when obtaining estimates of the great homogeneous mass of us, and is too blunt to ask the questions that matter of the smaller communities it covers.

Anyone who has read this far may be interested in my review of the US census, which I was lucky enough to participate in during 2010: The US census incensed us. I sense a lack of consensus.

How focusing on “trend” unemployment figures is like chanting “scoreboard” at the football.

Today’s unemployment figures were SHOCKING: the unemployment rate shot up to 6.4 per cent, a whopping increase from last month’s result of 6.0 per cent, in seasonally adjusted terms.

Except.

There are two main series that report the unemployment rate: trend and seasonally adjusted. The former is more stable, the latter more variable.

There is a constant fight online between two gangs, the “wonks” and the “journos”. The former generally think the latter are too sensational with their taste for the more wildly variable series.

bloods_crips_
Wonkz v Press.

Here’s the latest update on the two series:

unemp july

Trend looks like a sensible person who never gets too carried away, while seasonally adjusted is a wild ball of emotions, one moment in the dumps, the next elated.

It’s obvious which one serious-minded people should prefer, right?

But what if I told you trend is faking it? See how it claims to be sloping up all year? Let’s go back in time and consider the countenance of our “friend” the trend back in April.

unemp apr

At the time, it also claimed to be feeling glum. Now it has changed its tune. Trend is like a talented politician, flip-flopping around to try to claim the middle ground and seem more reasonable than the rest.

unemp may

As recently as May, trend was headed downward. Then in June it made a small concession to the last two months of movement in the seasonally adjusted series:

unemp june

Below is how the trend is figured out. Essentially it uses a combination of old and new data to get a sense of how the series is moving over a longer time period.

“The smoothing of seasonally adjusted series to produce ‘trend’ series reduces the impact of the irregular component of the seasonally adjusted series. These trend estimates are derived by applying a 13-term Henderson-weighted moving average to all months except the last six. The last six monthly trend estimates are obtained by applying surrogates of the Henderson average to the seasonally adjusted series. Trend estimates are used to analyse the underlying behaviour of a series over time.

 While this smoothing technique enables estimates to be produced for the latest month, it does result in revisions in addition to those caused by the revision of seasonally adjusted estimates. Generally, revisions due to the use of surrogates of the Henderson average become smaller, and after three months have a negligible impact on the series.”
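To make that less abstract, here is a rough sketch of a 13-term Henderson moving average, using the standard closed-form expression for symmetric Henderson weights. It is not the ABS’s code. The function names and the monthly figures are made up for illustration, and I have not attempted the “surrogate” end-point filters, so the first and last six months are simply left blank.

```python
def henderson_weights(terms=13):
    """Symmetric Henderson weights for an odd-length moving average."""
    if terms < 3 or terms % 2 == 0:
        raise ValueError("terms must be an odd number of at least 3")
    m = (terms - 1) // 2
    n = m + 2
    denom = (8 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1)**2 - j**2) * (n**2 - j**2)
               * ((n + 1)**2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
        weights.append(num / denom)
    return weights


def henderson_trend(series, terms=13):
    """Apply the symmetric filter wherever a full window is available.

    Points too close to either end come back as None, because the
    surrogate (asymmetric) filters are not implemented in this sketch.
    """
    w = henderson_weights(terms)
    m = len(w) // 2
    trend = [None] * len(series)
    for t in range(m, len(series) - m):
        window = series[t - m:t + m + 1]
        trend[t] = sum(wi * xi for wi, xi in zip(w, window))
    return trend


# Made-up seasonally adjusted unemployment rates, NOT real ABS data.
sa_rates = [5.6, 5.7, 5.6, 5.8, 5.7, 5.9, 5.8, 6.0, 5.8, 5.9,
            6.0, 5.8, 6.1, 5.9, 6.0, 5.8, 5.9, 6.0, 6.0, 6.4]
for month, (sa, tr) in enumerate(zip(sa_rates, henderson_trend(sa_rates))):
    shown = "   n/a" if tr is None else f"{tr:6.2f}"
    print(f"month {month:2d}  seasonally adjusted {sa:4.1f}  trend {shown}")
```

Each trend value leans on six months either side of it. That is why the most recent months can’t be smoothed this way, and why they get revised as new data arrives.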

When wonks say “the trend is your friend” they are focusing on more than just the latest month’s data.

It’s like at the footy. One side kicks a goal and cheers. The other side points to the score, and chants “Scoreboard!” But in doing so, you can miss an important turning point.

Seasonally adjusted data look at what’s happened in the last month alone, just like the goal that just got kicked is the best measure of the passage of play that preceded it. Because they use less data, they can also include more statistical noise.

The scoreboard, like the trend series, shows more than that and exhibits less statistical noise.

But this is a game that never ends. If you want to know what’s happening, focusing on the most recent figures seems perfectly fair to me.

Revealed: Australia’s manliest and womanliest foods and drinks.

The ABS have set up and released a brand new dataset on Australia’s food consumption. It’s a banquet, a buffet, an all you can eat smorgasbord of delight for data gluttons.

Apparently we consume on average 3.1 kilograms of food and drink in 24 hours. It’s gross to think about.

I’ve gone to the ABS website (or as I call it, the Sizzler of data) and brought you back a doggy bag of sample treats from this survey of 9500 dwellings.

For example, guess which demographic drinks the least water?

Image
Proportion in each group consuming the item in the 24 hours before the survey

The aged! I always imagine them sipping tap water and doing the crossword while muttering darkly about Tony Abbott and their pension. So I’m somewhat amazed. I guess they’re not running 10km very often so perhaps they’re not that thirsty. (Meanwhile, the 14-18 year old bracket loves fizzy drink. No surprises there.)

How about this? Coffee is clearly for people with work to do, while the under-13s and over-71s are busy hosting tea parties.

Image
Proportion in each group consuming the item in the 24 hours before the survey

 

And how about this one, which shows why you can easily sell a bottle of wine for the same price as 24 bottles of beer:

Image
Proportion in each group consuming the item in the 24 hours before the survey

I’d note the ABS probably did this survey during the week, and alcohol consumption is more skewed to the weekend.

Anyway, the dataset is big and quite amazing, and I was able to run some numbers to see what foods and drinks are more skewed to men and women.

1. The biggest skew in the whole dataset was for men aged 51-70. In that age bracket, men were ten times as likely as women to have chugged back a brew in the preceding 24 hours: 26.8 per cent of men vs 2.6 per cent of women.

Image
Differences between proportion of women and men consuming in 24 hours preceding survey

2. The next biggest skew in the whole dataset was for women aged 19-30. In that category, women were nearly twice as likely to have sipped a cuppa as men: 35.6 per cent of women vs 20 per cent of men.

Image
Differences between proportion of women and men consuming in 24 hours preceding survey

3. For food, the manliest thing there is, is breakfast cereal, and this is especially so in the Nutri-Grain demographic, 9-13.

Image
Differences between proportion of women and men eating the food in 24 hours preceding survey

 

(This fact also reminded me of this line in this song by this quite popular comedy folk duo from New Zealand.)

4. Meanwhile, and lastly, for women, the biggest skew is in a little, tiny, unimportant category you’ve probably never heard of. Fruit. Across the age groups, 10 per cent more women had eaten fruit in the preceding 24 hours.

Image
Differences between proportion of women and men eating the food in 24 hours preceding survey
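A quick aside for the arithmetic-minded: “ten times as likely” is a ratio, while the charts show differences in proportions. Here is a tiny sketch of both measures using the beer and cuppa figures quoted in items 1 and 2. The helper name is made up.

```python
def compare(share_a, share_b):
    """Return (ratio, percentage-point gap) for two shares given in per cent."""
    return share_a / share_b, share_a - share_b


# Figures quoted in items 1 and 2 of the list.
beer_ratio, beer_gap = compare(26.8, 2.6)   # men vs women, aged 51-70
cuppa_ratio, cuppa_gap = compare(35.6, 20.0)  # women vs men, aged 19-30

print(f"Beer: men {beer_ratio:.1f}x as likely, a gap of {beer_gap:.1f} points")
print(f"Cuppa: women {cuppa_ratio:.1f}x as likely, a gap of {cuppa_gap:.1f} points")
```

The two measures can rank things differently. A big ratio can sit on top of a small gap when hardly anyone in either group consumes the item.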

Dammit, men, why’ve you got to be so stereotypical, living on Nutri-Grain and beer your whole lives and toppling off the perch by having a heart attack?!

Anyway, it’s Friday so I should probably not lecture you any more about this. See you at the pub.