ABSTRACT

Although many researchers use LSA today, not that many run their own SVDs (singular value decompositions) to create their own spaces. There are several reasons for this: (a) A public Web site exists that is sufficient for most uses of LSA, and the group at the University of Colorado, Boulder, has had programmers able to do some “extra jobs” when researchers requested features not available on the Web site. (b) On some occasions, it is difficult to collect a representative corpus of text. (c) There are memory requirements: Current SVD methods place the entire matrix in memory to decompose it (see Martin & Berry, chap. 2 in this volume). This places a significant limit on the size of the LSA analyses that one can run, and it was the main reason why LSA analyses could not be run on personal computers. Nowadays, however, the memory bottleneck is no longer an issue, because a consumer-level PC can be configured with more than enough memory to run a large SVD. (d) Computer expertise is required. Traditionally, computers with enough memory to run an SVD ran UNIX, and the numerical computation support needed for large SVDs was available only on UNIX machines. Few people in psychology had the expertise to use that operating system, or the desire to learn it, and it was not available on most personal computers. Linux had been available for PCs since the early 1990s, but its user base was small.

This situation has changed. Both Apple and IBM-compatible PCs have reached a point where UNIX-based operating systems (OS) are available and mainstream (Mac OS X and Linux, respectively). Most of the classical and current work on LSA takes place on UNIX machines. One reason is that the UNIX philosophy is text-oriented: UNIX relies on a command-line interface in which small utilities are chained together, which makes it ideal for processing text. Another reason is that, in the past, machines with enough memory to carry out LSA analyses were large mainframes. Nowadays, LSA analyses can be performed on personal computers under any operating system. Even so, learning UNIX will help any researcher who is interested in working with text. This chapter focuses on implementations of SVD programs for this OS. Researchers who prefer to stay within a Microsoft Windows framework can still use several alternatives, both commercial and free (Matlab, Mathematica, R, general-purpose programming languages), which are described later.
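To give a rough sense of the memory requirements behind reason (c), the following Python sketch estimates how much memory a term-by-document matrix occupies when stored densely versus in compressed sparse row (CSR) form. The corpus size (100,000 terms by 50,000 documents) and the 0.1% density are hypothetical values chosen only for illustration, and the estimate covers storage of the matrix alone, not the additional working memory that an SVD routine needs.

```python
def dense_bytes(n_terms: int, n_docs: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense term-by-document matrix of 64-bit floats."""
    return n_terms * n_docs * bytes_per_value


def csr_bytes(n_terms: int, nnz: int, value_bytes: int = 8, index_bytes: int = 4) -> int:
    """Approximate memory for compressed sparse row (CSR) storage.

    CSR keeps one value and one column index per nonzero cell,
    plus n_terms + 1 row pointers (assumed here to be 32-bit integers).
    """
    return nnz * (value_bytes + index_bytes) + (n_terms + 1) * index_bytes


def gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical corpus size, for illustration only.
    n_terms, n_docs = 100_000, 50_000
    # Assume 0.1% of cells are nonzero (integer division keeps nnz exact).
    nnz = n_terms * n_docs // 1000
    print(f"dense:  {gib(dense_bytes(n_terms, n_docs)):.1f} GiB")
    print(f"sparse: {gib(csr_bytes(n_terms, nnz)):.3f} GiB")
```

With these assumed numbers, the dense matrix needs roughly 37 GiB, whereas the sparse version needs about 60 MB, which suggests why sparse representations matter when running large SVDs on personal computers.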