“A word,” said Humpty Dumpty to Alice, “means just what I choose it to mean” and went on to assert dogmatically, “The question is which is to be master.” Is the language to rule us or we it?
This goes to the heart of modern dictionary-making. Dictionary editors feel, almost to a man or woman, that language is what language does and that if people at large choose to join the fragile old egg in believing that glory means “a nice knock-down argument”, then the word means just that, “neither more nor less”, as Humpty Dumpty peremptorily said to Alice. This permissive approach (descriptive, in lexicographic jargon, as opposed to prescriptive), by which dictionaries record usage without claiming authority, still saddens some people.
That view, however, is at the heart of this new book from Jeremy Butterfield, a lexicographer himself. Central these days to making a dictionary is the corpus, a huge electronic word hoard. He begins by describing the Oxford corpus of 2006, a monumental accumulation of more than two billion words from every type of English language use, ranging from the free-wheeling text of the blogger to the formal language of scholars. Each text is tagged in various ways, such as the subject matter and the sex and country of origin of the writer; these allow detailed analysis of what words are used in what context and by which groups of people.
It’s magical what a skilled researcher can pull out of this conglomeration. One quarter of all that we write, on average, is made up of just ten words: the, be, to, and, of, a, in, that, have, I; it requires only another 90 words to cover half of our writing. (Strictly, as Butterfield is careful to explain, we should replace word by lemma, the term for the version of the word that appears at the head of an entry in a dictionary and which stands in for all the varied forms that it can take; for example, drive in corpus discussions implies also the other forms of the verb — drives, driving, drove, and driven.)
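The coverage figures above come from counting lemmas rather than raw word forms. As a rough illustration only, here is a minimal Python sketch of that kind of count. The tiny `LEMMAS` table, the `lemmatize` and `coverage` helpers and the sample sentence are all invented for this example; they are not taken from the book or from the Oxford corpus tools, which would rely on a full morphological lexicon.

```python
from collections import Counter

# Hypothetical lemma table mapping inflected forms to a headword (lemma).
# Invented for illustration; a real corpus tool would use a full lexicon.
LEMMAS = {
    "drives": "drive", "driving": "drive", "drove": "drive", "driven": "drive",
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be", "been": "be",
    "has": "have", "had": "have", "having": "have",
}


def lemmatize(token):
    """Return the lemma for a token, falling back to the lowercased token."""
    token = token.lower()
    return LEMMAS.get(token, token)


def coverage(tokens, top_n):
    """Share of all tokens accounted for by the top_n most frequent lemmas."""
    counts = Counter(lemmatize(t) for t in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total


sample = "She drove to the shop and he was driving to the station".split()
print(coverage(sample, 2))  # fraction of the sample covered by its two commonest lemmas
```

Run over a corpus of billions of words instead of one sentence, the same kind of count is what would yield statements such as "ten lemmas make up a quarter of what we write".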
Butterfield shows how the corpus illustrates the presence of common errors, some of which are well on their way to acceptance. The incorrect phrase just desserts, for example, is half as common again as just deserts (60% against 40%), suggesting that it may well become the standard form. On the other hand, baited breath, though common at 34%, is as yet some way from taking over from bated breath. He notes that in the past ten years, Web site (which is how, in my conservative way, I continue to write it) has been largely replaced by website, with 80% of examples in the latter form. One day, I’m going to have to change, or people will think I’m making a mistake. His title is taken from another error of similar type: damp squid instead of damp squib, which makes sense if you don’t know about the firework and feel that squid are more likely to be wet.
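Percentages like these are simply each spelling's share of all the occurrences of the competing forms. The short Python sketch below shows the arithmetic. The `variant_shares` helper is hypothetical and the raw counts are invented, chosen only so that they reproduce the 60% against 40% split quoted above; the review does not give the actual corpus counts.

```python
def variant_shares(counts):
    """Convert raw occurrence counts of competing forms into shares of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no occurrences to compare")
    return {form: n / total for form, n in counts.items()}


# Invented counts, picked to match the 60/40 split mentioned in the text.
counts = {"just desserts": 600, "just deserts": 400}
shares = variant_shares(counts)

for form, share in shares.items():
    print(f"{form}: {share:.0%}")

# How many times more common the first form is than the second.
ratio = counts["just desserts"] / counts["just deserts"]
print(ratio)
```

A 60/40 split means the commoner form turns up one and a half times as often as its rival.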
In later chapters, he continues to use the corpus to tease out the nature of English. He focuses for example on collocations, another technical term, this time for words that tend to appear together. These provide dictionary editors with personality profiles that help to make clear how words are being used, in particular in the kinds of ways that entrap unwary learners of English. One pair he uses is naked and bare. We don’t much talk about naked knees or feet, for example — they’re bare — but when we’re speaking about the body we almost always use naked, not bare. Each word has a well-defined constituency of associations.
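To make the idea of a collocation concrete, the following Python sketch counts which words directly follow a chosen word in a list of tokens. It is a toy under stated assumptions: the `collocates` function and the sample text are invented for illustration, and real corpus software would typically work on far more data and rank the pairs with a statistical association measure rather than raw counts.

```python
from collections import Counter


def collocates(tokens, node, window=1):
    """Count the words found within `window` positions to the right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + window])
    return counts


# Invented sample text, pre-split on whitespace so commas are separate tokens.
text = (
    "bare feet on the sand , bare knees in the cold , "
    "a naked body , bare feet again , the naked body"
).split()

print(collocates(text, "bare").most_common())
print(collocates(text, "naked").most_common())
```

Even in this tiny sample, bare keeps company with feet and knees while naked goes with body, which is the kind of profile a dictionary editor would read off the real corpus.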
His style is chatty and examples are plentiful. If you want a quick glimpse into the way dictionaries are compiled today, along the way getting insights into our language, both static and changing, you could do worse than buy this book.
[Jeremy Butterfield, Damp Squid: The English language laid bare, published by Oxford University Press on 29 October 2008; hardback, pp179, including index; ISBN-13: 978-0-19-923906-1, ISBN-10: 0-19-923906-1, publisher’s list price £9.99.]