ABSTRACT

Rubber . . .
    UF  Caoutchouc
        Ebonite
        Gum elastic
        India rubber
        Vulcanite
    BT  Latex
        Non-timber forest products
    RT  Elastomers
        Gutta-percha
    NT  Elastic fabrics
        Electric insulators and insulation--Rubber
        Foam rubber
        Guayule rubber
        Vulcanization
. . .
Vulcanite
    USE Rubber

When indexing using this list, it makes no difference whether the document being indexed uses the term ‘Vulcanite’ or the term ‘Rubber’ in its text; the document must be indexed under the term ‘Rubber’. In this way, as was explained in the previous chapter, the indexing language is controlled. A search for the term ‘Vulcanite’ would be unproductive but the searcher could, by referring to the list, ascertain that the preferred term is ‘Rubber’. In addition, by referring to the term ‘Rubber’, a range of related terms that could be used in searching will be revealed.

If natural language, where terms are taken directly from the text of a document, is employed, a thesaurus, or an alphabetical subject heading list such as the Library of Congress list, may still assist in the search process. Imagine that an enquirer is searching a natural language system for information on the possible use of parachutes to slow down ocean-going ships. A search is made under ‘Parachutes AND Ships’ but nothing is found. There is, in fact, a relevant document in the information system entitled ‘Putting a halt to super tankers’ but the textual content does not include the word ‘ship’ and, as this is a natural language system, the document has not been retrieved. A check in Thesaurofacet (see page 103) indicates that among the possible alternative terms to ‘Ships’ is the narrower term ‘Tankers’. Amending the search to ‘Parachutes AND Tankers’ will retrieve the document. Without this aid, the searcher would either (a) have assumed, wrongly, that there is nothing relevant in the
system, or (b) have used some other source or even guesswork to come up with possible alternative search terms.

Boolean searching and full text databases

The search for ‘Parachutes AND Tankers’ utilises the ‘operator’ AND to link the two required terms. ‘AND’ is only one of three operators, the others being ‘OR’ and ‘NOT’. The use of these operators in searching is now very common and widely accepted. AND, OR and NOT are known as Boolean operators and this type of searching is referred to as Boolean searching. A search for ‘Venice AND Climate’ would locate items that had been indexed under both of these terms. A search for ‘Venice AND (Climate OR Weather)’ would find items which had been indexed under or contained ‘Venice’ and either ‘Climate’ or ‘Weather’. A search for ‘Venice AND (Glass NOT Crystal)’ would yield those items indexed under both ‘Venice’ and ‘Glass’, excluding any that are also indexed under ‘Crystal’.

Today, many online information systems contain abstracts or the full text of articles, documents, books, etc. It is now possible to search a complete text, even a large work such as an entire encyclopaedia, for a single word or term; this is ‘full text’ searching. Because of this, some writers argue that a knowledge of classification is no longer necessary for the information worker. Burton (1997), for instance, maintains, perhaps tongue in cheek, that ‘Internet search engines can rapidly find the growing volumes of information created and stored electronically using full text retrieval techniques. These search engines … are capable of handling complex search strategies with Boolean operators’.

What this statement fails to recognise is that even Boolean searching involves elements of classification. To take a practical example, let us imagine that an enquirer searches for the subject of this book, ‘Classification’, and the search results in thousands, or even millions, of ‘matches’ or ‘hits’.
This is far too great a number of items to browse, and many of the items found will be irrelevant, as classification can be concerned with a number of different subject areas: ‘Biology’; ‘Diseases’; ‘Plants’; and so on. The enquirer will then add a second term, such as ‘Information retrieval’, in order to narrow the search to the particular subject area required and thereby reduce the number of items retrieved and increase their relevance. If we examine what has happened here, this type of searching, although Boolean in that the user is searching for ‘Classification AND Information retrieval’, can be seen also as a type of classification, in that the user is, in effect, identifying characteristics of a subject in order
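to bring like items together and separate unlike ones, which is the basic principle of classification. The above Boolean search could be represented diagrammatically thus, the shaded area representing documents that have been indexed under both terms:

[Figure not reproduced: Boolean diagram in which the shaded area is the overlap between ‘Classification’ and ‘Information retrieval’]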
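
The three Boolean operators discussed above map directly onto set operations. The following is a minimal sketch, assuming a toy index in which each term points to a set of document numbers; the postings and the helper name `docs` are invented purely for illustration and are not taken from any real system.

```python
# Toy inverted index: index term -> set of document numbers.
# All postings here are hypothetical.
index = {
    "Venice": {1, 2, 3, 4},
    "Climate": {1, 5},
    "Weather": {2, 6},
    "Glass": {3, 4},
    "Crystal": {4, 7},
}


def docs(term):
    """Return the documents indexed under a term (empty set if unknown)."""
    return index.get(term, set())


# 'Venice AND Climate': intersection of the two sets.
venice_and_climate = docs("Venice") & docs("Climate")

# 'Venice AND (Climate OR Weather)': union first, then intersection.
venice_climate_or_weather = docs("Venice") & (docs("Climate") | docs("Weather"))

# 'Venice AND (Glass NOT Crystal)': set difference first, then intersection.
venice_glass_not_crystal = docs("Venice") & (docs("Glass") - docs("Crystal"))

print(venice_and_climate, venice_climate_or_weather, venice_glass_not_crystal)
```

The intersection in the first search corresponds to the shaded overlap of a two-term Boolean diagram.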

Despite the extensive use of Boolean in online searching, not everyone is convinced that it is the best search methodology. Hildreth (1989), for example, states that ‘Much research and experience with Boolean retrieval systems … indicates clearly and repeatedly that Boolean search formulation syntax and retrieval techniques are not very effective in search performance and not very usable or efficient search methods for end-users’. ‘Determined explorers and the just plain curious need a flexible, rich, contextual subject search and browsing mode which offers plenty of navigation and trail blazing options’. Schneiderman (1997) maintains that ‘until recently, computer scientists argued that the best way to search for information on the Web was by using keyword searching . . . But keyword searching often fails miserably’.

There is one word in the first of the above quotes that implies that classification must have a more significant and direct role to play over and above its inherent link with Boolean searching, and that word is ‘contextual’. Imagine that a user searches for the term ‘Churches’ and gets the response:

    Your search did not match any documents:
    Try to broaden your search using more general keywords

If the user is interested in the church as a physical entity, then, using a strategy similar to that in the manual search for ‘Stegosaurus’ already described, he or she might enter the more general, broader term ‘Buildings’ or perhaps ‘Architecture’. Clearly, this type of searching is making use of classification and, where a hierarchical relationship such as
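this is concerned, the diagrammatic representation would be very different to the Boolean diagram shown above. It would appear thus:

[Figure not reproduced: diagram of the hierarchical relationship between ‘Churches’ and the broader term ‘Buildings’]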
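
The broadening step described above, together with the USE and narrower-term references discussed earlier, can be sketched as a small lookup structure. This is only an illustration under stated assumptions: the dictionaries `USE`, `NT` and `BT`, the function names and the sample index are all hypothetical, and hold just the few relationships mentioned in the text.

```python
from collections import deque

# Hypothetical miniature thesaurus holding only relationships named in the text.
USE = {"Vulcanite": "Rubber"}                            # non-preferred -> preferred
NT = {"Ships": ["Tankers"], "Buildings": ["Churches"]}   # narrower terms
BT = {"Tankers": ["Ships"], "Churches": ["Buildings"]}   # broader terms


def preferred(term):
    """Follow a USE reference, if there is one."""
    return USE.get(term, term)


def expand_with_narrower(term):
    """Return the preferred term followed by its narrower terms."""
    term = preferred(term)
    return [term] + NT.get(term, [])


def search_with_broadening(term, index):
    """Search under a term; if nothing is found, climb to broader terms."""
    queue = deque([preferred(term)])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        hits = index.get(current, set())
        if hits:
            return current, hits
        queue.extend(BT.get(current, []))
    return None, set()


# Invented postings: nothing is indexed directly under 'Churches'.
index = {"Buildings": {"doc A", "doc B"}, "Rubber": {"doc C"}}

print(search_with_broadening("Churches", index))
print(expand_with_narrower("Ships"))
```

A search for ‘Churches’ finds nothing at first and is then retried under ‘Buildings’, which is the manual strategy described in the text carried out automatically.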

This sort of relationship is therefore not conducive to Boolean type searches but requires some more explicit form of hierarchical classificatory facility. The distinction between the two kinds of inter-term relationship described here equates with the syntactical (or a posteriori) and thesaural (or a priori) relationships described in the international standard ISO 2788-1986. The thesaural, or a priori, relationship, this standard states, ‘adds a second dimension to an indexing language’ and ‘the effectiveness of a subject index as a means of identifying and retrieving documents’ in any system (including ‘those systems in which a computer is used to store and manipulate terms or to identify documents associated with terms’) ‘depends upon a well-constructed indexing language’ (International Organization for Standardization, 1986). The use of classification is essential for efficient subject access.

Use of the classification schedule

So far in this chapter we have been concerned with searching under alphabetical terms but it is, of course, also possible to make use of the classification schedules. Notation can help solve terminological problems and allow complex subjects to be represented by simpler coding. For example, using the Dewey Decimal Classification, a search for 598.2 would find ‘Birds’, ‘Ornithology’ and ‘Aves’. Also using Dewey, a search for 670.427 would equal a search for ‘Mechanisation and automation of factory operations’ and, using the British Catalogue of Music
Classification, a search for QPG would equal a search for ‘Suites for solo piano’. When searching a system arranged according to an enumerative classification, the search will usually be made for a classification number that represents the complete subject. For example, a search for the Dewey Decimal Classification number 629.4753, or the Library of Congress Classification number TL783.5, is a search for the subject ‘Nuclear propulsion systems for spacecraft’. When using a faceted scheme, the search may be for the complete subject, for example using Thesaurofacet for:

RKM/SBH

where RKM is ‘Spacecraft’ and SBH is ‘Nuclear propulsion’, or a ‘string’ search (that is, a search for a specified string of characters contained within a larger string) could be made for any class number which includes a particular element, for example a search for:

RKM

The idea of searching for a particular element rather than a complete subject can be extended further. In order to understand how this can be done the reader must be aware of the difference between pre-coordinate and post-coordinate indexing. If the indexer devises an alphabetical subject entry or classification number for the complete subject, for example:

Spacecraft : Nuclear propulsion
or 629.4753
or RKM/SBH

then this is referred to as pre-coordinate indexing; the significant point being that the concepts, or elements, which together make up the complete subject, are coordinated, or combined, by the indexer. If the indexer merely indexes the constituent components of the subject, for example:

Spacecraft
or Nuclear propulsion
or RKM
or SBH

then this is referred to as post-coordinate indexing. The components of the subject description or class number are left separate and they must be coordinated by the searcher. Boolean searching is a post-coordinate method: the terms ‘Spacecraft’ and ‘Nuclear propulsion’ could be coordinated, or linked, by the searcher as ‘Spacecraft AND Nuclear propulsion’. Because a faceted classification scheme provides class numbers for separate concepts rather than complete subjects, such a scheme can also be used post-coordinately and searches may be made for subjects such as ‘RKM AND SBH’.

Note that a hierarchical enumerative scheme must always be used pre-coordinately but a faceted scheme may be used in either a pre- or post-coordinate manner. When using a faceted scheme post-coordinately, no attempt is made to combine class numbers for constituent concepts into a composite number. Citation order and facet linking devices are therefore irrelevant (see also page 91).

Searches can be broadened or narrowed by reducing or increasing the number of elements. For example, using the London Classification of Business Studies (see pages 31 and 104), a search for ‘Safety measures for materials handling in the explosives industry’ would be for:

CMG AND JZRD AND KDQ

where CMG is ‘Materials handling’, JZRD is ‘Safety measures’ and KDQ is ‘Explosives industry’. It is then possible to broaden the search by reducing the number of elements. If a search is begun using three elements, as in the above example, each of these could be discarded in turn by searching for:

CMG AND JZRD       i.e. ‘Safety measures for materials handling’
CMG AND KDQ        i.e. ‘Materials handling in the explosives industry’
or JZRD AND KDQ    i.e. ‘Safety measures in the explosives industry’

Alternatively, the three searches could be combined as:

(CMG AND JZRD) OR (CMG AND KDQ) OR (JZRD AND KDQ)

The search could be broadened even further by searching for a single element, for example:

KDQ i.e. ‘Explosives industry’
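
The broadening steps above can be sketched in code. This is a minimal illustration, assuming a handful of invented documents that have each been indexed post-coordinately under separate London Classification of Business Studies codes; the document names and the functions `search_all` and `search_any_k` are hypothetical.

```python
from itertools import combinations

# Hypothetical documents, each indexed under separate (uncombined) codes.
documents = {
    "doc1": {"CMG", "JZRD", "KDQ"},
    "doc2": {"CMG", "JZRD"},
    "doc3": {"JZRD", "KDQ"},
    "doc4": {"KDQ"},
    "doc5": {"CMG"},
}


def search_all(codes):
    """AND search: documents indexed under every one of the given codes."""
    wanted = set(codes)
    return {name for name, assigned in documents.items() if wanted <= assigned}


def search_any_k(codes, k):
    """OR together the AND searches for every combination of k codes."""
    hits = set()
    for combo in combinations(codes, k):
        hits |= search_all(combo)
    return hits


query = ["CMG", "JZRD", "KDQ"]

full = search_all(query)         # CMG AND JZRD AND KDQ
pairs = search_any_k(query, 2)   # (CMG AND JZRD) OR (CMG AND KDQ) OR (JZRD AND KDQ)
single = search_all(["KDQ"])     # KDQ alone

print(sorted(full), sorted(pairs), sorted(single))
```

Each reduction in the number of required elements can only keep or enlarge the result set, which is why discarding elements broadens the search.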

This illustrates the degree of flexibility that can be achieved when searching post-coordinately with a faceted scheme.

Some of the other special facilities introduced to improve alphabetical term searching in computerised systems can be adapted for use with classification numbers. An example is the ‘truncation’ device, which allows searching on word stems, e.g. a search for ‘Comput*’ would find ‘Computer’, ‘Computers’, ‘Computing’, ‘Computerization’ and ‘Computerisation’. This same device can also be used on classification numbers as a means of broadening a search. Where the previous example of ‘Churches’ is concerned, in the Dewey Decimal Classification this topic would be classified at 726.5. Truncation would allow the search to be widened progressively. Searching for:

726.5    would equal a search for ‘Churches’ (Buildings associated with Christianity)
726      would equal a search for ‘Religious buildings’
72       would equal a search for ‘Architecture’
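
Truncation on both word stems and class numbers amounts to prefix matching. The sketch below assumes simple lists of terms and Dewey-style numbers; the function names `truncation_search` and `broadening_steps`, the extra term ‘Compost’ and the sample class numbers other than 726.5 are invented for illustration.

```python
def truncation_search(pattern, entries):
    """Return the entries that begin with the stem before any trailing '*'."""
    stem = pattern.rstrip("*")
    return [entry for entry in entries if entry.startswith(stem)]


def broadening_steps(class_number, min_length=2):
    """List progressively shorter forms of an expressive class number."""
    steps = [class_number]
    current = class_number
    while True:
        # Drop the last character, and a decimal point left dangling at the end.
        current = current[:-1].rstrip(".")
        if len(current) < min_length:
            break
        steps.append(current)
    return steps


terms = ["Computer", "Computers", "Computing",
         "Computerization", "Computerisation", "Compost"]
class_numbers = ["726.5", "726.7", "725.1", "728.8", "730.9"]

print(truncation_search("Comput*", terms))
print(broadening_steps("726.5"))
print(truncation_search("726", class_numbers))
print(truncation_search("72", class_numbers))
```

This only works as a broadening device when the notation is expressive, that is, when a shorter number really does denote the containing class.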

Thus, if a classification scheme is hierarchical and the notation is expressive, the search can be broadened or narrowed in a similar manner to that explained above for a thesaurus.

Chain procedure

The above type of search relates to what Ranganathan referred to as a ‘chain’, that is, the classification hierarchy for a subject, working through from the general to the more specific. Such a chain can play a significant part in the search process, not only by the use of classification numbers but also by means of alphabetical entries derived from the classification number by the ‘chain procedure’ method. This method can be applied not only to hierarchical classification with expressive notation but also to faceted classification and non-expressive notation. For example, using the Universal Decimal Classification, the classification number for the subject ‘Public health aspects of petroleum pollution of sea water’ would be:

614.777(26):665.6

This number clearly is not expressive but a hierarchical ‘chain’ can still be constructed, that is:

6                    Technology
61                   Medicine
614                  Public health
614.7                Pollution
614.777              Water pollution
614.777(26)          Seas
614.777(26):665.6    Oil. Petroleum

Alphabetical subject index entries can be produced from this chain by beginning with the last, or most specific ‘link’ and proceeding step by step back through the chain, qualifying where necessary by a more general term or terms to indicate the subject context:

Petroleum: Sea water pollution: Public health    614.777(26):665.6
Oil: Sea water pollution: Public health          614.777(26):665.6
Sea water pollution: Public health               614.777(26)
Water pollution: Public health                   614.777
Pollution: Public health                         614.7
Public health                                    614
Medicine                                         61
Technology                                       6
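
The derivation of these entries can be sketched as follows. This is an illustration under stated assumptions, not a full account of chain procedure: the chain is entered after the manual adjustments have already been made (for instance ‘Seas’ reworded as ‘Sea water pollution’), and the flag marking which links serve as qualifiers is an indexer's decision recorded by hand. The function name `chain_index_entries` is hypothetical.

```python
# Each link: (notation, index terms, whether the link is used to qualify
# more specific links). The flags and the adjusted wording are assumptions
# chosen so that the output matches the entries listed in the text.
chain = [
    ("6", ["Technology"], False),
    ("61", ["Medicine"], False),
    ("614", ["Public health"], True),
    ("614.7", ["Pollution"], False),
    ("614.777", ["Water pollution"], False),
    ("614.777(26)", ["Sea water pollution"], True),
    ("614.777(26):665.6", ["Petroleum", "Oil"], False),
]


def chain_index_entries(chain):
    """Work back from the most specific link, qualifying by flagged links."""
    entries = []
    for i in range(len(chain) - 1, -1, -1):
        notation, terms, _ = chain[i]
        # Collect qualifiers from the nearest flagged ancestor outwards.
        qualifiers = [chain[j][1][0] for j in range(i - 1, -1, -1) if chain[j][2]]
        for term in terms:
            entries.append((": ".join([term] + qualifiers), notation))
    return entries


entries = chain_index_entries(chain)
for heading, notation in entries:
    print(heading, notation)
```

Sorting the headings from many such chains alphabetically would bring entries such as the various ‘Petroleum’ contexts together. The hand-set flags and rewording reflect the point that the method is semi-mechanical.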

The process of qualification would lead to the production of a relative index (see pages 42-3) to the classified arrangement, for example:

Petroleum: Economic geology                      553.982
Petroleum: Mining                                622.323
Petroleum: Sea water pollution: Public health    614.777(26):665.6

Because of problems relating to terminology and ‘missing’ or ‘false’ links, this method is not purely mechanical but semi-mechanical in that some adjustment to the chain, as derived from the classification, may be required. Nevertheless it does provide a means of producing a specific alphabetical entry for a subject, based upon the classification schedule in use, which will indicate the context in which the subject is treated. Entries for related aspects of the same subject can also be pinpointed.

Chain procedure has been applied to a number of information systems, especially in libraries and information services, and it may also be used, either consciously or unconsciously, for book indexing. The prime example of the successful use of chain procedure is probably the British National Bibliography (BNB). The BNB is a weekly list which aims to include all new and forthcoming books published in the United Kingdom and Republic of Ireland. Works are arranged by the Dewey Decimal Classification and name, title and subject indexes to this classified sequence are provided. Chain procedure was used to produce the printed subject index between 1950 and 1970.