CONTENTS

A. Centers and Associations 269 (1) NCCH (Norwegian Computing Centre for Humanities) 269 (2) CTI (Computers in Teaching Initiative Centre for Textual Studies) 269 (3) CETH (Center for Electronic Texts in the Humanities) 270 (4) ACH (Association for Computers and the Humanities) 270 (5) ALLC (Association for Literary and Linguistic Computing) 271 (6) ACL (Association for Computational Linguistics) 271

B. Electronic Mail Distribution Lists and Discussion Lists 272 (1) HUMBUL 272 (2) CORPORA 272 (3) HUMANIST 272 (4) LINGUIST 273 (5) LN (Langage Naturel) 273 (6) PROSODY 274 (7) Comserve 274 (8) Applied linguistics (TESL-L, SLART-L, MULTI-L, LTEST-L) 274 (9) FUNKNET 275 (10) info-childes and info-psyling 275 (11) ASLING-Linguistics of Signed Languages 275 (12) List of lists 275

C. Email Addresses 276

3. TEXT ENCODING STANDARDS (TEI, IPA, SAM, TOBI) 276

4. DATA SOURCES 278

A. Electronic Data Archives and Repositories 278 (1) OTA (Oxford Text Archive) 278 (2) ICAME (International Computer Archive of Modern English) 278 (3) CHILDES (The Child Language Data Exchange System) 279 (4) CETH (Center for Electronic Texts in the Humanities) 279 (5) The AIATSIS Aboriginal Studies Electronic Data Archive 280 (6) Project Gutenberg 280 (7) Library of the Future 280

B. Surveys of Electronic Language Data 280 (1) Oxford Text Archive (OTA) catalogue 280 (2) University of Lancaster Survey 280 (3) Georgetown University Catalog of Archives and Projects 281 (4) Walker and Zampolli survey 281 (5) List of Electronic Texts in Philosophy 281 (6) List of Electronic Dictionaries 281 (7) Catalog of the University of Cambridge Literature and Linguistics Computing Centre 282 (8) Linguistic Society of America List 282 (9) The Marchand list of CD-ROM Projects 282 (10) ARL Directory of Electronic Publications 282

5. CORPORA AND TEXTBANKS 282

A. Running text: English Language 283 (1) Brown Corpus 283 (2) Lancaster-Oslo/Bergen (LOB) 284 (3) London-Lund Corpus 285 (4) Lancaster Spoken English Corpus (SEC) 285 (5) PDCI Corpora 285 (6) Helsinki Corpus of Historical English 286 (7) Macquarie (University) Corpus 286 (8) Kolhapur Corpus of Indian English 286 (9) American Heritage Intermediate Corpus 286 (10) Birmingham Collection of English Text (BCET) 286 (11) Longman/Lancaster English Language Corpus 287 (12) Corpus of Spoken American English (CSAE) 287 (13) International Corpus of English (ICE) 287 (14) British National Corpus Initiative (BNC) 287 (15) Bellcore Lexical Research Corpora 288 (16) Association for Computational Linguistics Data Collection Initiative (ACL/DCI) 288 (17) European Corpus Initiative (ACL/ECI) 289 (18) Cambridge Language Survey (CLS) 289 (19) Linguistic Data Consortium (LDC) 289 (20) American News Stories 290 (21) Nijmegen TOSCA Corpus 290 (22) Melbourne-Surrey Corpus 290 (23) Corpus of English-Canadian Writing 290 (24) Warwick Corpus 290 (25) Cornell corpus 290 (26) NEXIS, LEXIS, MEDIS (Mead Data Central) and WESTLAW (West Corporation) 291

B. Running text: French Language 291 (1) OTA holdings 291 (2) Hansard Canadian Parliamentary Sessions 291 (3) Ottawa-Hull Corpus of Spoken French 291 (4) Trésor de la Langue Française (TLF or ARTFL) 291

C. Running text: German Language 292 (1) Mannheim Corpus 292 (2) Bonner Zeitungskorpus 292 (3) Freiburger Corpus 292 (4) LIMAS Corpus 292 (5) Pfeffer Spoken German Corpus 292 (6) Ulm Textbank 292 (7) Muenster Textbank 292

D. Running text: Italian Language 292 (1) PIXI corpora 292 (2) Pisa corpus 292

E. Running text: Other Languages 293 (1) Native American Languages 293 (2) Australian Indigenous Languages 293 (3) Danish 293 (4) Estonian 293 (5) Finnish 293 (6) Spanish 293 (7) Swedish 293 (8) Yugoslavian 293

F. Running text: Language Acquisition 294 (1) Child Language Acquisition (CHILDES, PoW) 294 (2) Adult Second Language Acquisition (ESFSLDB, Montreal) 294

G. Phonetic Databases 295 (1) DARPA Speech Recognition Research Databases 295 (2) Phonetic Database (PDB) 295 (3) Multi-Language Speech Database 295

H. Electronic Dictionaries 296 (1) See the Wooldridge list 296 (2) Oxford Text Archive (OTA) holdings 296 (3) Oxford English Dictionary (OED) 296 (4) Le Robert Electronique 296

I. Lexical Databanks 296 (1) MRC Psycholinguistic Database 296 (2) Consortium for Lexical Research (CLR) 297 (3) Centre for Lexical Information (CELEX) 297 (4) Acquisition of Lexical Knowledge (ACQUILEX) 298 (5) Cambridge Language Survey (CLS) 298 (6) Japanese Electronic Dictionary Research Project 298

J. Treebanks 298 (1) Lancaster-Leeds Treebank 298 (2) Lancaster Parsed Corpus 298 (3) Linguistic DataBase System (LDB) 298 (4) Penn Treebank Project 299 (5) Treebank of Written and Spoken American English 299

K. Translation into English 299

6. LITERATURE PERTAINING TO ELECTRONIC CORPORA 300

ACKNOWLEDGMENTS 300

REFERENCES 301

1. INTRODUCTION

Corpora and textbanks of natural language sentences or utterances are increasingly used in linguistics, lexicography, and computer science research. This is due in part to facilitating technological advances, but also to a broadening of focus in these three fields to include a greater interest in produced language (vs. introspective knowledge), structured interdependencies involving larger stretches of text (vs. individual utterances or sentences), and contrasts across language varieties, genres, and modalities (e.g., British vs. American English; narratives vs. interviews; spoken vs. written language). For further discussion, see Chafe (1992), Church (1991), Fillmore (1992), Francis (1982), Halliday (1992), Leech (1991, 1992), Sinclair (1992), and Svartvik (1992a).