The things they teach you

“I” before “E” except after “C”

…or so I was taught, and you probably were too. But now schools in England have been asked to stop teaching it, because “there are too few words that follow the rule”.

I remember the rule with fondness, so I have sympathy with the KCL lecturer quoted. But facts are more reliable than impressions, so I thought I’d check it out myself.

The fab team at Lancaster University have made their English-language word frequency lists, which are based on the British National Corpus, freely available under a Creative Commons license. A quick download and a little bit of Perl later, we have the following:

Total: per million words of English, the rule is wrong 5820 times and right 8475 times (the remaining words contain neither “ie” nor “ei”). In other words, for any random word containing “ie” or “ei” that you might want to write or speak, the rule will be right 59% of the time – not much better than tossing a coin! In fact, the top four words containing “ie” or “ei” all break the rule – it’s not until you reach the fifth that it works:

  1. Wrong: their 2608
  2. Wrong: being 862
  3. Wrong: society 238
  4. Wrong: either 220
  5. Right: view 214
  6. Right: believe 212
  7. Right: experience 189
  8. Right: companies 178
  9. Right: patients 173
  10. Wrong: eight 173

The British National Corpus has written and spoken English in it, but that doesn’t make much difference – for the written corpus only, we have wrong 5760, right 7680, or 57% – slightly worse!
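The percentages follow directly from the totals quoted above. Here is a minimal Perl sketch of the arithmetic; the `percent_right` helper is a name made up for this illustration and is not part of the script in the PS:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: share of "ie"/"ei" words (weighted by frequency)
# for which the rule gives the right spelling.
sub percent_right {
    my ($right, $wrong) = @_;
    return 100 * $right / ($right + $wrong);
}

# Totals quoted in the post, in occurrences per million words.
printf "written + spoken: %.0f%%\n", percent_right(8475, 5820);   # 59%
printf "written only:     %.0f%%\n", percent_right(7680, 5760);   # 57%
```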

So the educationalists are right for once, and our intuition is wrong – the rule isn’t much better than flipping a coin!

Enjoy.

PS: Here’s the Perl:


#!/usr/bin/perl -w
use strict;

# Frequency-weighted totals for words that break / follow the rule.
my $wrong = 0;
my $right = 0;

while (<>) {
    chomp;
    # use [0,1] for 1_2_all_freq.txt or [0,5] for 2_2_writtenspoken.txt
    my ($word, $freq) = (split)[0,1];
    # Only words containing "ie" or "ei" are interesting.
    if ($word =~ /(?:ie|ei)/i) {
        # The rule fails on "cie", and on "ei" that is not preceded by "c".
        if ($word =~ /cie|(?:^|[^c])ei/i) {
            print "Wrong: $word $freq\n";
            $wrong += $freq;
        } else {
            print "Right: $word $freq\n";
            $right += $freq;
        }
    }
}

print "Total wrong $wrong, right $right\n";
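To see how the two regexes sort individual words, here is a small sketch that wraps them in a function and runs it over a few words from the top-ten list. The `rule_verdict` name and the extra test words ("receive", "cat") are assumptions added for illustration; the patterns themselves are the ones used in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): classify one word
# against the short form of the rule, using the script's two regexes.
sub rule_verdict {
    my ($word) = @_;
    return 'n/a'   unless $word =~ /ie|ei/i;             # rule does not apply
    return 'Wrong' if $word =~ /cie|(?:^|[^c])ei/i;      # "cie", or "ei" not after "c"
    return 'Right';                                      # "ie" not after "c", or "cei"
}

for my $word (qw(their being society either view believe receive eight cat)) {
    printf "%-8s %s\n", $word, rule_verdict($word);
}
```

Note that a word is counted as Wrong as soon as it contains any rule-breaking sequence, even if it also contains one that follows the rule.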

4 thoughts on “The things they teach you”

  1. Are you weighting the words for how frequently they are used, or just taking number of words regardless? The weighting surely makes a difference to how useful the rule is. Although your list of 10 most frequently used words suggests you may well be using a weighting…

  2. Yes, this is weighted by frequency. So "their" counts as 2608 words per million, but "view" only counts as 214 words per million. So the rule is right 59% of the time for words weighted by the frequency with which they are used in English (per the British National Corpus).

  3. For a more reasoned response than you'll find in the Media, try Language Log. Basically, the correct form of the rule is highly accurate (the full form is "I before E except after C when the sound is EE"), and the actual recommendation isn't that the rule is wrong and should be dropped! Rather, they recommend teaching the "c" words as exceptions to the rule explicitly!

    (Indeed, about the only exceptions to the correct form of the rule appear to be the words 'species' and 'protein', and possibly 'weird', depending how you pronounce it.)

  4. Interesting. I'd never heard the "full form" until this story broke – I was certainly taught the short form when I was a kid (in New Zealand). Clearly it's not just NZ, since the journalists in the paper were also taught the short form. So "The rule is always taught, by anyone who knows what they are doing…" (per Language Log) is merely idealist ranting.

    I don't think rules like this do any harm, but I don't think they help much either. Rules don't get you very far in spelling English – you just have to read lots and lots and lots to get exposure to correct spelling.
