1 The beginnings of corpus linguistics can be traced back much earlier than the widespread availability of the digital computer.
A O1
O1 True
R1
O2 False
R2 No, corpora were used well before then; see p.000.
2 In the modern sense of the term, corpus linguistics is an empirical investigation of large digitized collections of actual instances of language use.
A O1
O1 True
R1
O2 False
R2 No, this is true. See p.000.
3 Being an empirical discipline, corpus linguistics is independent of theory and aims to make no contribution to linguistic theory.
A O2
O1 True
R1 No, this is quite false: being empirical does not mean it is non-theoretical.
O2 False
R2
4 What was the approximate size of the earliest electronic corpora?
A O2
O1 10,000 words
R1 They were much larger than this; see p.000.
O2 1,000,000 words
R2
O3 10,000,000 words
R3 They were much smaller than this; see p.000.
O4 100,000 words
R4 They were larger than this; see p.000.
O5 100,000,000 words
R5 They were much smaller than this; see p.000.
5 What is the ultimate concern of corpus linguistics?
A O5
O1 To understand speakers' intuitions about their language
R1 No; see p.000.
O2 To provide statistical information about the frequency of use of linguistic variables
R2 No; this is one of the concerns of corpus linguistics, but is too narrow; see p.000.
O3 To inform the applications of linguistics to various domains such as language teaching and forensic linguistics
R3 No. These are certainly uses to which corpus linguistics has been put, but they are not its ultimate concern.
O4 To provide a solid basis for making prescriptions about how a language should be spoken or written
R4 No; corpus linguistics is not inherently prescriptive.
O5 To understand how speakers actually use their language, and ultimately to understand language itself
R5
6 A corpus that is designed to provide a snapshot of a particular language or language variety as it is spoken at some point in time is called:
A O3
O1 A multilingual corpus
R1 No, see p.000.
O2 A specialized corpus
R2 No, see p.000.
O3 A general corpus
R3
O4 A learner corpus
R4 See p.000.
O5 A parsed corpus
R5 See p.000.
7 Relevant considerations for a general corpus include that it:
A O1 & O6
O1 Is balanced
R1
O2 Is at least a billion words
R2 Not all general corpora are this large
O3 Represents all idiolects of a language
R3 Impossible for languages with large numbers of speakers
O4 Represents all dialects of the language
R4 Not necessarily: a general corpus could be of a single dialect
O5 Is parsed
R5 Not all general corpora are parsed
O6 Is representative
R6
O7 None of the above
R7 See p.000.
8 An example of a learner corpus is:
A O2
O1 COHA
R1 This is a historical corpus; see p.000.
O2 CHILDES
R2
O3 COBUILD
R3 This is a general corpus; see p.000.
O4 SiBol
R4 This is a specialized corpus; see p.000.
O5 None of the above
R5 See p.000.
9 Some modern digital corpora comprise more than a billion words.
A O1
O1 True
R1
O2 False
R2 See p.000.
10 A general corpus of a million words will be large enough to address any grammatical question.
A O2
O1 True
R1 See p.000.
O2 False
R2
11 What is meant by the range of an item in a corpus?
A O1
O1 The number of texts it occurs in
R1
O2 The frequency of occurrence of the item
R2 See p.000.
O3 The number of allomorphs it shows
R3 See p.000.
O4 The number of phonemes making up the item
R4 See p.000.
O5 The relative frequency of the item per million words
R5 See p.000.
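The difference between range (option O1) and raw or relative frequency (options O2 and O5) can be sketched in a few lines of Python. This is only an illustrative sketch: the three-text corpus and the function names item_range, frequency and per_million are invented for the example, and tokenization is simple whitespace splitting.

```python
# Hypothetical three-text mini corpus, for illustration only.
corpus = {
    "text_a": "the cat sat on the mat",
    "text_b": "the dog barked",
    "text_c": "a cat slept",
}

def item_range(item, corpus):
    """Range: the number of texts in which the item occurs at least once."""
    return sum(1 for text in corpus.values() if item in text.split())

def frequency(item, corpus):
    """Frequency: the total number of occurrences across all texts."""
    return sum(text.split().count(item) for text in corpus.values())

def per_million(item, corpus):
    """Relative frequency, normalized to occurrences per million tokens."""
    total_tokens = sum(len(text.split()) for text in corpus.values())
    return frequency(item, corpus) / total_tokens * 1_000_000

print(item_range("the", corpus), frequency("the", corpus))
```

Here "the" has a frequency of 3 but a range of only 2, since both occurrences in text_a count once towards range.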
12 What is the citation form of the set of inflected forms of a word called?
A O1
O1 A lemma
R1
O2 A headword
R2 See p.000.
O3 A paradigm
R3 See p.000.
O4 A root
R4 See p.000.
O5 A collocation
R5 See p.000.
13 High frequency words in a corpus tend to be short, typically monosyllabic.
A O1
O1 True
R1
O2 False
R2 Read p.000 again.
14 Which of the following generalizations is supported by the frequency studies of the three corpora referred to on pp.000-000:
A O4
O1 Around ten word types account for the majority of word tokens
R1 See p.000.
O2 Hapax legomena account for a small number of word types
R2 See p.000.
O3 Hapax legomena account for the majority of word tokens
R3 See p.000.
O4 Around ten word types account for about a fifth of the word tokens
R4
O5 The word types of a language show a bell-curve (normal) distribution in token frequency
R5 The figures given in Tables 9.1 and 9.2 show that the distribution can't be normal.
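The kind of calculation behind these generalizations can be sketched as follows. The token list is a made-up toy sample, not data from the corpora discussed in the chapter, and the function names top_share and hapaxes are hypothetical; on a real corpus the proportions would look very different.

```python
from collections import Counter

def top_share(tokens, n):
    """Proportion of all tokens accounted for by the n most frequent types."""
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

def hapaxes(tokens):
    """Word types that occur exactly once in the token list."""
    return [word for word, count in Counter(tokens).items() if count == 1]

# Toy sample, for illustration only.
tokens = "the cat and the dog and the bird".split()
print(top_share(tokens, 1))
print(hapaxes(tokens))
```

In this toy sample a single type ("the") covers 3 of the 8 tokens, while three of the five types are hapax legomena.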
15 Keywords identified by corpus software tell us what a text is about.
A O2
O1 True
R1 Not necessarily: see examples on p.000.
O2 False
R2
16 A list of all instances of a word or expression in a corpus, together with some context, is called:
A O3
O1 KWIC
R1 See p.000.
O2 A collocation
R2 See p.000.
O3 A concordance
R3
O4 A dictionary
R4 See p.000.
O5 A lexicon
R5 See p.000.
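A concordance of the kind described in this question can be approximated with a short Python function. This is a minimal sketch under simplifying assumptions: the function name concordance, the width parameter and the sample text are all invented for illustration, and tokenization is a crude lowercase word match rather than what dedicated corpus software does.

```python
import re

def concordance(text, target, width=2):
    """Return (left context, node word, right context) for each hit."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Made-up sample text, for illustration only.
sample = "The cat sat on the mat. The dog saw the cat."
for left, node, right in concordance(sample, "cat"):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>12}  {node}  {right}")
```

Aligning the node word in a central column gives the familiar KWIC (key word in context) display, which is one way of presenting a concordance.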
17 Which of the following regular expressions would you use to search for instances of the verb see in a corpus?
A O1
O1 see|sees|seen|seeing|saw
R1
O2 see*
R2 This will give too many irrelevant strings: see p.000.
O3 see?
R3 A number of forms of the verb will not be found: see p.000.
O4 s??
R4 This will give too many irrelevant strings, and not give a number of relevant ones: see p.000.
O5 SEE
R5 This is not a regular expression, though in some software tools it will find lemmas; see p.000.
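The contrast between options O1 and O2 can be tried out in Python's re module. This is a sketch only: corpus tools differ in their pattern syntax, the sample sentence is invented, and the \b word boundaries around the alternation are an added assumption (not part of option O1) so that strings such as "seed" are not matched.

```python
import re

# Made-up sample sentence, for illustration only.
text = "I see the seed he saw; she sees what was seen while seeing the sea."

# Option O1 as an alternation of the inflected forms.
# The \b boundaries are an assumption added for this example.
verb_see = re.compile(r"\b(?:see|sees|seen|seeing|saw)\b")
hits = verb_see.findall(text)

# In Python's re syntax, "see*" means "se" followed by zero or more "e",
# so it matches inside unrelated words and can never match "saw".
loose_hits = re.findall(r"see*", text)

print(hits)
print(loose_hits)
```

With this sample the alternation returns the five forms of the verb, while the looser pattern also matches inside "seed" and "sea" and misses "saw" altogether.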
18 Which of the following is the best description of a collocation:
A O3
O1 A list of instances of a word in its contexts of occurrence
R1 This is a concordance: see p.000.
O2 A lexical item made up of two lexical items
R2 This is a description of a compound; see p.000.
O3 A statistically significant cooccurrence of words
R3
O4 A sequence of words with a non-compositional meaning
R4 This describes an idiom: see p.000.
O5 A word that occurs only once in a corpus
R5 This is a hapax legomenon: see p.000.
19 According to J.R. Firth, the meaning of a word is partly determined by its collocates.
A O1
O1 True
R1
O2 False
R2 This is true, he did say this: see p.000.
20 Synonyms and near synonyms often differ in terms of the words they collocate with.
A O1
O1 True
R1
O2 False
R2 See p.000.
21 A sequence of words that is repeated a number of times in a corpus is called which of the following?
A O5
O1 A compound
R1 See p.000.
O2 A collocation
R2 A collocation is more general than this, and need not refer to a sequence of words: see p.000.
O3 A binomial
R3 This is a more specific term that refers to sequences of exactly two words; see p.000.
O4 A syntagm
R4 A syntagm need not be repeated a number of times in a corpus; see p.000.
O5 A cluster
R5
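Finding clusters in this sense amounts to counting repeated word sequences (n-grams). The following is a minimal sketch, assuming whitespace tokenization; the function name clusters, its parameters n and min_count, and the sample string are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def clusters(tokens, n=3, min_count=2):
    """Return the n-word sequences that occur at least min_count times."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}

# Made-up sample, for illustration only.
tokens = "at the end of the day at the end of the week".split()
for gram, count in clusters(tokens).items():
    print(" ".join(gram), count)
```

Raising min_count or n narrows the output to longer or more frequently repeated sequences.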
22 Which of the following research projects would a corpus study not be appropriate for?
A O1 and O5
O1 A study of speakers' intuitions about the grammaticality of a construction type
R1
O2 A study of the meaning of the word position
R2 Corpora are frequently used in studies of the meanings of words; see p.000.
O3 A study of the meaning of the construction Xs will be Xs
R3 Corpora are frequently used in studies of the meanings of constructions, and this construction could be fairly easily found in a corpus; see p.000.
O4 A study of words peculiar to a dialect
R4 This could be done by use of keyword software; see p.000.
O5 To determine whether a particular sequence of words is ungrammatical in a language
R5
23 Assuming you found that humour was a positive keyword when you did a keyword comparison of BE06 and AmE06, what would you suggest as the most likely explanation?
A O3
O1 A difference in dialect
R1 The word is found in both British and American English.
O2 A difference in the topics that British people and Americans commonly talk about
R2 This is unlikely given the selection criteria for texts in the two corpora.
O3 A difference in spelling conventions in Britain and America
R3
O4 A difference in formality between the corpora
R4 Formality seems unlikely to account for the greater use of this word in British English.
O5 A difference in register between the corpora
R5 Register seems unlikely to account for the greater use of this word in British English.