1 The beginnings of corpus linguistics can be traced back much earlier than the widespread availability of the digital computer.
A O1
O1 True R1
O2 False R2 No, corpora were used well before then; see p.000.

2 In the modern sense of the term, corpus linguistics is an empirical investigation of large digitized collections of actual instances of language use.
A O1
O1 True R1
O2 False R2 No, this is true. See p.000.

3 Being an empirical discipline, corpus linguistics is independent of theory and aims to make no contribution to linguistic theory.
A O2
O1 True R1 No, this is quite false: being empirical does not mean it is non-theoretical.
O2 False R2

4 What was the approximate size of the earliest electronic corpora?
A O2
O1 10,000 words R1 They were much larger than this; see p.000.
O2 1,000,000 words R2
O3 10,000,000 words R3 They were much smaller than this; see p.000.
O4 100,000 words R4 They were larger than this; see p.000.
O5 100,000,000 words R5 They were much smaller than this; see p.000.

5 What is the ultimate concern of corpus linguistics?
A O5
O1 To understand speakers' intuitions about their language R1 No; see p.000.
O2 To provide statistical information about the frequency of use of linguistic variables R2 No; this is one of the concerns of corpus linguistics, but is too narrow; see p.000.
O3 To inform the applications of linguistics to various domains such as language teaching and forensic linguistics R3 No. These are certainly uses to which corpus linguistics has been put, but they are not its ultimate concern.
O4 To provide a solid basis for making prescriptions about how a language should be spoken or written R4 No; corpus linguistics is not inherently prescriptive.
O5 To understand how speakers actually use their language, and ultimately to understand language itself R5

6 A corpus that is designed to provide a snapshot of a particular language or language variety as it is spoken at some point in time is called:
A O3
O1 A multilingual corpus R1 No, see p.000.
O2 A specialized corpus R2 No, see p.000.
O3 A general corpus R3
O4 A learner corpus R4 See p.000.
O5 A parsed corpus R5 See p.000.

7 Relevant considerations for a general corpus include that it:
A O1 & O6
O1 Is balanced R1
O2 Is at least a billion words R2 Not all general corpora are this large.
O3 Represents all idiolects of a language R3 Impossible for languages with large numbers of speakers.
O4 Represents all dialects of the language R4 Not necessarily: a general corpus could be of a single dialect.
O5 Is parsed R5 Not all general corpora are parsed.
O6 Is representative R6
O7 None of the above R7 See p.000.

8 An example of a learner corpus is:
A O2
O1 COHA R1 This is a historical corpus; see p.000.
O2 CHILDES R2
O3 COBUILD R3 This is a general corpus; see p.000.
O4 SiBol R4 This is a specialized corpus; see p.000.
O5 None of the above R5 See p.000.

9 Some modern digital corpora comprise more than a billion words.
A O1
O1 True R1
O2 False R2 See p.000.

10 A general corpus of a million words will be large enough to address any grammatical question.
A O2
O1 True R1 See p.000.
O2 False R2

11 What is meant by the range of an item in a corpus?
A O1
O1 The number of texts it occurs in R1
O2 The frequency of occurrence of the item R2 See p.000.
O3 The number of allomorphs it shows R3 See p.000.
O4 The number of phonemes making up the item R4 See p.000.
O5 The relative frequency of the item per million words R5 See p.000.

12 What is the citation form of the set of inflected forms of a word called?
A O1
O1 A lemma R1
O2 A headword R2 See p.000.
O3 A paradigm R3 See p.000.
O4 A root R4 See p.000.
O5 A collocation R5 See p.000.

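The contrast in question 11 between range (O1) and frequency of occurrence (O2) can be illustrated with a minimal Python sketch. The three-text mini-corpus and the function names `frequency` and `text_range` are hypothetical, invented for this illustration, and tokenization is naive whitespace splitting rather than what a real corpus tool would do.

```python
# Hypothetical mini-corpus of three short "texts", invented for illustration.
corpus = {
    "text_a": "the cat saw the dog",
    "text_b": "the dog barked",
    "text_c": "a cat slept",
}


def frequency(item, corpus):
    """Total number of tokens of `item` across all texts (option O2)."""
    return sum(text.split().count(item) for text in corpus.values())


def text_range(item, corpus):
    """Number of texts in which `item` occurs at least once (option O1)."""
    return sum(1 for text in corpus.values() if item in text.split())


print(frequency("the", corpus))   # 3 tokens in total
print(text_range("the", corpus))  # found in 2 of the 3 texts
```

In this toy data "the" has a frequency of 3 but a range of 2, which is why the two measures should not be confused.
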
13 High frequency words in a corpus tend to be short, typically monosyllabic.
A O1
O1 True R1
O2 False R2 Read p.000 again.

14 Which of the following generalizations is supported by the frequency studies of the three corpora referred to on pp.000-000:
A O4
O1 Around ten word types account for the majority of word tokens R1 See p.000.
O2 Hapax legomena account for a small number of word types R2 See p.000.
O3 Hapax legomena account for the majority of word tokens R3 See p.000.
O4 Around 10 word types account for about a fifth of the word tokens R4
O5 The word types of a language show a bell-curve (normal) distribution in token frequency R5 The figures given in Tables 9.1 and 9.2 show that the distribution can't be normal.

15 Keywords identified by corpus software tell us what a text is about.
A O2
O1 True R1 Not necessarily: see examples on p.000.
O2 False R2

16 A list of all instances of a word or expression in a corpus, together with some context, is called:
A O3
O1 KWIC R1 See p.000.
O2 A collocation R2 See p.000.
O3 A concordance R3
O4 A dictionary R4 See p.000.
O5 A lexicon R5 See p.000.

17 Which of the following regular expressions would you use to search for instances of the verb see in a corpus?
A O1
O1 see|sees|seen|seeing|saw R1
O2 see* R2 This will give too many irrelevant strings: see p.000.
O3 see? R3 A number of forms of the verb will not be found: see p.000.
O4 s?? R4 This will give too many irrelevant strings, and not give a number of relevant ones: see p.000.
O5 SEE R5 This is not a regular expression, though in some software tools it will find lemmas; see p.000.

18 Which of the following is the best description of a collocation:
A O3
O1 A list of instances of a word in its contexts of occurrence R1 This is a concordance: see p.000.
O2 A lexical item made up of two lexical items R2 This is a description of a compound; see p.000.
O3 A statistically significant cooccurrence of words R3
O4 A sequence of words with a non-compositional meaning R4 This describes an idiom: see p.000.
O5 A word that occurs only once in a corpus R5 This is a hapax legomenon: see p.000.

19 According to J.R. Firth, the meaning of a word is partly determined by its collocates.
A O1
O1 True R1
O2 False R2 This is true, he did say this: see p.000.

20 Synonyms and near synonyms often differ in terms of the words they collocate with.
A O1
O1 True R1
O2 False R2 See p.000.

21 A sequence of words that is repeated a number of times in a corpus is called which of the following?
A O5
O1 A compound R1 See p.000.
O2 A collocation R2 A collocation is more general than this, and need not refer to a sequence of words: see p.000.
O3 A binomial R3 This is a more specific term that refers to sequences of exactly two words; see p.000.
O4 A syntagm R4 A syntagm need not be repeated a number of times in a corpus; see p.000.
O5 A cluster R5

22 Which of the following research projects would a corpus study not be appropriate for?
A O1 and O5
O1 A study of speakers' intuitions about the grammaticality of a construction type R1
O2 A study of the meaning of the word position R2 Corpora are frequently used in studies of the meanings of words; see p.000.
O3 A study of the meaning of the construction Xs will be Xs R3 Corpora are frequently used in studies of the meanings of constructions, and this construction could be fairly easily found in a corpus; see p.000.
O4 A study of words peculiar to a dialect R4 This could be done by use of keyword software; see p.000.
O5 To determine whether a particular sequence of words is ungrammatical in a language R5

23 Assuming you found that humour was a positive keyword when you did a keyword comparison of BE06 and AmE06, what would you suggest as the most likely explanation?
A O3
O1 A difference in dialect R1 The word is found in both British and American English.
O2 A difference in the topics that British people and Americans commonly talk about R2 This is unlikely given the selection criteria for texts in the two corpora.
O3 A difference in spelling conventions in Britain and America R3
O4 A difference in formality between the corpora R4 Formality seems unlikely to account for the greater use of this word in British English.
O5 A difference in register between the corpora R5 Register seems unlikely to account for the greater use of this word in British English.
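
Returning to question 17, the difference between options O1 and O2 can be checked with a short sketch using Python's `re` module. The sample sentence is invented for illustration, and the word boundaries (`\b`) around O1 are an added assumption so that only whole words match. Corpus tools differ in how they interpret `*` and `?` (some treat them as wildcards rather than regular-expression quantifiers), so this shows only how Python reads the patterns.

```python
import re

# Hypothetical sample sentence, invented for illustration only.
text = "I saw the seesaw, and she sees that we have seen the seed seem to sprout."

# Option O1, wrapped in word boundaries (an added assumption) so that
# only whole-word forms of the verb are matched.
see_forms = re.compile(r"\b(?:see|sees|seen|seeing|saw)\b")

# Option O2 read as a Python regular expression:
# "se" followed by zero or more further "e" characters, anywhere in a word.
see_star = re.compile(r"see*")

whole_word_hits = see_forms.findall(text)
star_hits = see_star.findall(text)

print(whole_word_hits)  # ['saw', 'sees', 'seen']
print(len(star_hits))   # 5: also matches inside seesaw, seed and seem
```

The explicit list of forms picks out only the inflected forms of the verb in this sentence, while the starred pattern also matches inside unrelated words.
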