      Chapter

      Text Preprocessing
      By David D. Palmer

      Edited By Nitin Indurkhya, Fred J. Damerau
      Book Handbook of Natural Language Processing

      Edition 2nd Edition
      First Published 2010
      Imprint Chapman and Hall/CRC
      Pages 22
      eBook ISBN 9780429149207

      ABSTRACT

      In the linguistic analysis of a digital natural language text, it is necessary to clearly define the characters, words, and sentences in any document. Defining these units presents different challenges depending on the language being processed and the source of the documents, and the task is not trivial, especially when considering the variety of human languages and writing systems. Natural languages contain inherent ambiguities, and writing systems often amplify ambiguities as well as generate additional ambiguities. Much of the challenge of Natural Language Processing (NLP) involves resolving these ambiguities. Early work in NLP focused on a small number of well-formed corpora in a small number of languages, but significant advances have been made in recent years by using large and diverse corpora from a wide range of sources, including a vast and ever-growing supply of dynamically generated text from the Internet. This explosion in corpus size and variety has necessitated techniques for automatically harvesting and preparing text corpora for NLP tasks.
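
As a small illustration of why defining words and sentences is not trivial, the sketch below applies two deliberately naive rules to one English sentence pair. It is not taken from the chapter: the sample text, the splitting rules, and the variable names are all assumptions chosen for illustration only.

```python
import re

# Hypothetical sample text (an assumption for illustration, not from the chapter).
text = "Dr. Smith paid $3.50 for coffee. He didn't mind."

# Naive sentence segmentation: break after every period that is followed by
# whitespace. The period in the abbreviation "Dr." is ambiguous, so this rule
# produces one boundary too many.
naive_sentences = re.split(r"(?<=\.)\s+", text)

# Naive word segmentation: break on whitespace only. Sentence-final punctuation
# stays glued to the preceding word ("coffee.", "mind.").
naive_words = text.split()

print(naive_sentences)
print(naive_words)
```

The text contains two sentences, yet the period rule yields three pieces because it cannot tell an abbreviation period from a sentence-final one. Whitespace splitting likewise leaves open whether "$3.50" or "didn't" should count as one token or several.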

      Registered in England & Wales No. 3099067
      5 Howick Place | London | SW1P 1WG © 2022 Informa UK Limited