Tibetan OCR Project

Project Representative(s): Masami Kojima (Associate Professor, Dept. of Electrical Communication, Tohoku Institute of Technology) <mkojima@titan.tohtech.ac.jp>
Yoshiyuki Kawazoe (Professor, Institute for Materials Research, Tohoku University)
Masayuki Kimura (Professor, Japan Advanced Institute of Science and Technology, Hokuriku)

This research originated from the desire to ease the work of encoding and compiling Buddhist texts, converting them from the original Tibetan script into romanized form, and thereby to encourage the study of Buddhist literature with present-day computer assistance. As a test case, we used the "rGyal rabs gsal ba'i me long," a 250-page volume published by "Mi rigs dpe skrun khang" in February 1993, and ran the experiment with character recognition methods designed by the authors. We hope that a computer system capable of automatically recognizing these characters will be welcomed by scholars of Buddhist literature, because many printed Buddhist texts are now being converted into this form.

The experiments achieved a segmentation rate of more than 99.9% on 141,988 characters and a recognition rate of more than 99.0% on 17,753 characters. The group's current research aims to recognize woodblock-printed Tibetan manuscripts.
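
To make the reported rates concrete, the short sketch below converts each lower bound into the largest number of errors it allows. It is only an illustration of the arithmetic: the helper name `max_errors` is hypothetical, and it assumes that "more than" means a strict inequality and that each rate is the number of correct characters divided by the total.

```python
import math
from fractions import Fraction


def max_errors(total_chars: int, min_rate_percent: str) -> int:
    """Largest error count consistent with a rate strictly above min_rate_percent.

    Assumption (not stated in the text): rate = correct / total, and
    "more than X %" is a strict inequality.
    """
    # Exact arithmetic avoids floating-point rounding near the boundary.
    error_fraction = 1 - Fraction(min_rate_percent) / 100
    allowed = total_chars * error_fraction
    # errors must be strictly less than `allowed`
    return math.ceil(allowed) - 1


print(max_errors(141_988, "99.9"))  # segmentation: at most 141 errors
print(max_errors(17_753, "99.0"))   # recognition: at most 177 errors
```

Under these assumptions, the figures correspond to at most 141 missegmented characters out of 141,988 and at most 177 misrecognized characters out of 17,753.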