X Factor 2009 Vote Analysis

Thanks to ITV, the voting stats for X Factor 2009 are available at ITV X Factor 2009 vote stats.

Unfortunately, they’re expressed as percentages of the total vote, which makes rounds with different numbers of contestants hard to compare. I’ve massaged the numbers to show each share relative to the average vote for that round. In the first round (with 12 contestants) the average vote would be 8.3%, so 16.7% would be +100%. In the last round, with 2 contestants, the average vote would be 50%, so 16.7% would be -67%.
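For anyone who wants to redo the sums, here is a minimal sketch of that conversion in Perl. The helper name relative_to_average is my own invention for illustration, not something from the spreadsheet, and it assumes every contestant in a round is counted in the average.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a raw vote share (in percent) into a percentage above or below
# the average share for a round with $contestants contestants.
# relative_to_average is a hypothetical helper name, used for illustration.
sub relative_to_average {
    my ($share, $contestants) = @_;
    my $average = 100 / $contestants;        # e.g. 8.3% for 12 contestants
    return ($share / $average - 1) * 100;    # +100 means twice the average
}

printf "%+.0f%%\n", relative_to_average(16.7, 12);   # first round: +100%
printf "%+.0f%%\n", relative_to_average(16.7, 2);    # final: -67%
```

The same 16.7% share is a strong result among twelve contestants and a weak one between two, which is the whole point of normalising.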

This chart lets you see how the contestants’ popularity changed over the contest. Joe was always popular. Stacey started well, peaked, then seemed to hand her support to Joe. Olly’s support was steady all the way through, but not high. Danyl started out popular, troughed, peaked, and then slipped away.

(Click to see full size, or see the spreadsheet with data)

Enjoy the chart!

–KW 😎

The things they teach you

“I” before “E” except after “C”

…or so I was taught, and you probably were too. But now schools in England have been asked to stop teaching it, because “there are too few words that follow the rule”.

I remember the rule with fondness, so I have sympathy with the KCL lecturer quoted. But facts are more reliable than impressions, so I thought I’d check it out myself.

The fab team at Lancaster University have made their English-language word frequency lists, based on the British National Corpus, freely available under a Creative Commons license. A quick download and a little bit of Perl later, we have the following:

Total: per million words of English, the rule is wrong 5820 times, and right 8475 times (the remainder do not contain either “ie” or “ei”). In other words, for any random word containing “ie” or “ei” that you might want to write or speak, the rule will be right 59% of the time, which is only fractionally better than tossing a coin! In fact, the top four words containing “ie” or “ei” all break the rule, and it’s not until you reach the fifth that it works:

  1. Wrong: their 2608
  2. Wrong: being 862
  3. Wrong: society 238
  4. Wrong: either 220
  5. Right: view 214
  6. Right: believe 212
  7. Right: experience 189
  8. Right: companies 178
  9. Right: patients 173
  10. Wrong: eight 173

The British National Corpus has written and spoken English in it, but that doesn’t make much difference – for the written corpus only, we have wrong 5760, right 7680, or 57% – slightly worse!
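The percentages are just right divided by (right plus wrong), counting only words that contain “ie” or “ei”. A quick sketch to check both figures, where percent_right is a hypothetical helper name of my own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Share of "ie"/"ei" word occurrences for which the rule gives the right answer.
# percent_right is a hypothetical helper name, used for illustration.
sub percent_right {
    my ($right, $wrong) = @_;
    return 100 * $right / ($right + $wrong);
}

printf "all:     %.1f%%\n", percent_right(8475, 5820);   # 59.3%
printf "written: %.1f%%\n", percent_right(7680, 5760);   # 57.1%
```

Both results round to the 59% and 57% quoted above.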

So the educationalists are right for once, and our intuition is wrong – the rule isn’t much better than flipping a coin!

Enjoy.

PS: Here’s the Perl:


#!/usr/bin/perl -w
use strict;

my $wrong = 0;
my $right = 0;

while (<>) {
    chomp;
    # use [0,1] for 1_2_all_freq.txt or [0,5] for 2_2_writtenspoken.txt
    my ($word,$freq) = (split)[0,1];
    # skip blank or short lines rather than warn about undefined values
    next unless defined $word && defined $freq;
    if ($word =~ /(?:ie|ei)/i) {
        # interesting: the word contains "ie" or "ei"
        if ($word =~ /cie|(?:^|[^c])ei/i) {
            # breaks the rule: "cie", or "ei" not preceded by "c"
            print "Wrong: $word $freq\n";
            $wrong += $freq;
        } else {
            print "Right: $word $freq\n";
            $right += $freq;
        }
    }
}

print "Total wrong $wrong, right $right\n";
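
If you want to see how the two patterns treat individual words without downloading the frequency lists, here is a small sketch that wraps the same regular expressions in a function. The name classify and the sample word list are mine, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# classify() is a hypothetical wrapper around the same two patterns
# used in the script above.
sub classify {
    my ($word) = @_;
    return 'n/a'   unless $word =~ /ie|ei/i;               # rule does not apply
    return 'wrong' if     $word =~ /cie|(?:^|[^c])ei/i;    # "cie", or "ei" not after "c"
    return 'right';
}

for my $word (qw(their being society view believe receive the)) {
    printf "%-8s %s\n", $word, classify($word);
}
```

Note that the test is purely about spelling, so “being” counts against the rule even though its “ei” straddles “be” and “ing”.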