ABSTRACT
This chapter examines how “imperfect yet practical” AI outputs can measurably enhance Buddhist scripture research, both by integrating high-accuracy AI-OCR with IIIF and by deploying a literature-centred retrieval-augmented generation (RAG) system. Building on the SAT Daizōkyō Text Database and a community curation platform, the chapter applies the National Diet Library’s AI-OCR to large woodblock Tripiṭaka image sets and implements passage-level alignment between the OCR text and SAT’s line-numbered main text. Click-through navigation from text to IIIF images closes the “last-mile” gap of locating passages within long volumes, turning OCR, despite residual errors, into an efficient pointer to material evidence. The chapter reports accuracy benchmarks across OCR versions, describes an editing pipeline that incrementally corrects OCR at the line level, and discusses TEI-conformant strategies for recording variants. It also introduces “Bauddha AI”, a RAG service that synthesises findings from domain papers retrieved via Apache Solr, enabling sourced, link-backed summaries. The author argues that (1) aligning imperfect OCR with trusted texts and (2) constraining generation with curated literature jointly produce reliable, scholar-auditable workflows. While limitations remain (coverage, multilingual expansion, evolving LLM behaviour), the approach is portable across humanities domains and lowers the cost of source verification and literature synthesis.
