Sick or ill, or both? Using a corpus for investigating meaning

Author: Dr. Tess Slavíčková

Position: Communication and Mass Media Faculty

I am fairly sure that all of us at UNYP consider ourselves learners of at least one foreign language.

In that case we are no doubt all familiar with the problem of not just learning new words, but learning how to use them properly. I remember, as a learner of Czech, being horrified when I heard that a friend had “angína” (“angina” in English is a serious heart condition, not a sore throat). When confronted with new words, we can ask a teacher or look for a definition in the dictionary, but still not be entirely sure about when and how they should be used. Meanwhile, those of us who are native English speakers and have spent time around Czechs have probably lost count of the times we have heard the Czenglish “I enjoy going to the nature”.

Dictionaries often provide us with more than one definition for a single word (sometimes dozens), but they are not always much help in picking the right one. Furthermore, as in the case of going to “the nature”, how can we figure out that this form does not typically appear in English?

A linguistic corpus (plural: corpora) is basically a large computerized database of naturally occurring language. A general corpus typically comprises transcribed texts taken from a wide range of sources, including newspaper articles, technical journals, novels, letters, speeches and spoken conversations. A corpus is a valuable resource because it shows us language as it is really used. There are many corpora of major languages, e.g. British and American English, and many modern dictionaries are based on corpus data, so definitions appear in order of frequency of use. There is also, for example, a corpus of (written only, obviously) medieval English, and there are excellent corpora of Czech. Some can be accessed online for free. There is also the option to build your own corpus and use a tool such as AntConc to analyse it.

We can use a corpus in a simple way to search for things like word frequencies, or we might want to see how a word is typically used in real contexts, or we might want to look at how meanings change over time.
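To make the idea of a frequency search concrete, here is a minimal sketch in Python. It assumes a toy three-sentence "corpus" invented for illustration (not real corpus data), and the helper names `tokenize` and `per_million` are hypothetical. Frequencies are normalised per million words, which is the usual way of comparing corpora of different sizes.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def per_million(word, tokens):
    """Return how often `word` occurs per million tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1_000_000

# Invented sample text; a real corpus would hold millions of words.
sample = "She felt sick on the boat. He was ill for a week. The team played poorly."
tokens = tokenize(sample)

for word in ("ill", "sick", "poorly"):
    print(word, per_million(word, tokens))
```

With a sample this small the figures are meaningless as evidence; the point is only the calculation, which is the raw count divided by the corpus size and scaled up to a million.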

For example, we might want to know something about the (apparently) synonymous words “ill”, “poorly” and “sick”. We might want to know which of the three is most frequently used overall. In fact, the British English 2006 corpus shows that “ill” and “sick” occur with almost the same frequency (about 35 times per million words), whereas “poorly” is much less frequent at 14 per million. Moreover, “poorly” is far more frequently used with a different meaning, not as an adjective meaning “ill” but as the adverb formed from “poor”, as in “the company performed poorly last year”. We might also want to compare use in American and British English (“sick” is more common than “ill” in American English), to see whether one form is used more frequently in written or spoken English, or to check whether one of them (“poorly”) is now declining in use. With “sick” and “ill” we would need to explore at a deeper level if we wanted to check whether their meanings are 100% interchangeable. We could explore how these words occur near other words (collocation). For example, we have the common colloquial expression “sick and tired”, as in “I’m sick and tired of your behavior”. But if someone said “I’m ill and tired of your behavior”, they would sound distinctly odd.
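A collocation search of the kind just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentences are invented, the function name `collocates` is hypothetical, and the approach (counting every word within two tokens of the search word) is a simple stand-in for the statistical measures that dedicated corpus tools offer.

```python
import re
from collections import Counter

def collocates(node, text, window=2):
    """Count words occurring within `window` tokens of each hit for `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            # Note: this simple window ignores sentence boundaries.
            counts.update(left + right)
    return counts

# Invented sample text, not real corpus data.
sample = ("I'm sick and tired of your behavior. "
          "We are sick and tired of waiting. "
          "She has been ill for a week.")

sick = collocates("sick", sample)
ill = collocates("ill", sample)
print(sick.most_common(3))
print(ill["tired"])
```

In this toy sample “tired” appears next to “sick” but never next to “ill”, which is the pattern a learner would hope to see confirmed on a much larger scale in a real corpus.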

We can also use historical corpora to show how meaning changes over time, for example by comparing corpora collected in two distinct time periods. For instance, we might want to study the interesting history of the word “gay” and its transformation in meaning from “happy” or “carefree” to “homosexual”. This transformation probably began to appear first in slang spoken by homosexuals before becoming the predominant meaning from about the 1970s. Unfortunately, most older corpora consist of written language only, so we can only imagine that (in some circles at least) this word was being used in conversation perhaps as early as the 1930s. It was later taken up by the media, and so from the late 1970s it begins to appear with increasing frequency in corpus databases, in both written and spoken form.

Because a corpus can reveal a great deal about how language is used in real-life contexts, it is a great resource not just for language learners and linguists. It can also show us much about how language mirrors social change and, as such, could be of interest to researchers in any field.

