Parts of speech tagging tagsets for Penn Treebank: A survey

doi:10.1201/9781003350057-39

ABSTRACT

Natural language processing (NLP) extracts useful information from natural language. Among the tools that NLP comprises, part-of-speech (PoS) tagging plays a significant part. PoS tagging assigns every word in a sentence a category such as noun, adjective, preposition, pronoun, article, verb, or adverb, among many others. The Penn Treebank tagset contains only 48 tags, so many more tag sets need to be added. Over its eight years of operation, the Penn Treebank produced approximately seven million words of part-of-speech tagged text, three million words of skeletally parsed text, over two million words of text parsed for predicate-argument structure, and 1.6 million words of transcribed spoken text annotated for speech disfluencies. This material spans a wide range of genres, for example IBM computer manuals and Wall Street Journal (WSJ) articles. This paper reviews various techniques for PoS tagging and describes the PoS tagging process.
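To make the tagging task concrete, the following is a minimal sketch of a dictionary-lookup tagger that labels each token with a Penn Treebank tag. It is not one of the techniques surveyed here, only an illustration of the input and output of PoS tagging. The names `TOY_LEXICON` and `tag_sentence`, the lexicon entries, and the fallback to `NN` for unknown words are all assumptions made for this example.

```python
# Hypothetical toy lexicon: lowercase word -> Penn Treebank tag.
# The entries are illustrative only and cover just the example sentence.
TOY_LEXICON = {
    "the": "DT",      # determiner
    "a": "DT",        # determiner
    "dog": "NN",      # noun, singular
    "cat": "NN",      # noun, singular
    "small": "JJ",    # adjective
    "quickly": "RB",  # adverb
    "chased": "VBD",  # verb, past tense
    ".": ".",         # sentence-final punctuation
}


def tag_sentence(sentence, lexicon=TOY_LEXICON, default_tag="NN"):
    """Return (token, tag) pairs using plain dictionary lookup.

    Unknown words receive default_tag; this is an assumed fallback,
    not a rule taken from the Penn Treebank guidelines.
    """
    # Naive tokenization: split the period off as its own token.
    tokens = sentence.replace(".", " .").split()
    return [(tok, lexicon.get(tok.lower(), default_tag)) for tok in tokens]


tagged = tag_sentence("The dog quickly chased a small cat.")
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

A lookup table of this kind cannot resolve words whose tag depends on context, which is the ambiguity that the tagging techniques reviewed in this paper are meant to handle.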