Digital Dictionary of Buddhism / CJKV-English Dictionary: An Introduction
July 5, 2015
A. Charles Muller
The Digital Dictionary of Buddhism [DDB] is a compilation of Chinese ideograph-based terms, texts, temples, schools, persons, etc. found in Buddhist canonical sources. The Chinese-Japanese-Korean-Vietnamese/English Dictionary [CJKV-E] is a compilation of Chinese ideographs, as well as ideograph-comprised compound words, text names, person names, etc., found primarily in the Confucian and Daoist classics. It also includes vocabulary from Neo-Confucian texts, as well as other philosophical and historical sources. Its information on individual ideographs is intended to be comprehensive, containing pronunciations and meanings from ancient and modern sources from the Sinitic cultural sphere including China, Korea, Japan, and Vietnam. Modern-day compound words are included incidentally, but the coverage of modern materials is not intended to be comprehensive.
These two lexicons, now separate entities, started out as a single work, which I (Charles Muller) initiated in 1986, during my first semester of graduate school, upon realizing the dearth of reliable English-language reference works for Buddhist and Classical Chinese technical terminology. Since my basic area of interest was the Chinese Buddhist canon, the orientation of the DDB was toward East Asian sources, and the dictionary was therefore known during its first 15 years of existence as the Dictionary of East Asian Buddhist Terms (DEABT). Realizing, however, that a large portion of the content actually dealt with Indian and other cultural manifestations of Buddhism, and not wanting to discourage potential collaborators who work in these areas, we renamed it in 2001 to the present Digital Dictionary of Buddhism (DDB). Thus, while there is a predominance of East Asian terminology, since much of what East Asian Buddhists have written about is the Buddhism of India, Central Asia, and Tibet, the content of this reference work is intended to be pan-Buddhist in character.
The project was initiated at a time (1986) before anyone had conceived of the World Wide Web as we know it today. In 1995, shortly after the inception of the Web, and after learning the basics of creating an HTML document, it occurred to me that this dictionary-in-progress might benefit from being made available on the web. From the present-day perspective this might seem like a no-brainer, but at that time the notion of being able to make such reference materials available more freely, more quickly, and less expensively, to a wider range of people than one could have ever imagined was mind-boggling—a huge leap from the print reference works that were the standard at the time. It enabled the kind of collaboration not conceivable in the age of paper publication.
Within a few months of my placing of this compilation on the web in a simple hard-linked HTML format, it was discovered by Christian Wittern, a scholar of Chinese Chan Buddhism, who also happened to be one of the most advanced theorists and practitioners of digital technology in the Humanities fields. Christian immediately converted the data to SGML format, and I was over time able to learn enough about SGML such that I could continue to develop the underlying format on my own with SGML as the data storage format. After this, a few of the earliest content contributors, including Gene Reeves, Jamie Hubbard, Charles Patton, and Iain Sinclair contacted me to offer their own digitized glossaries to the DDB.
During the process of developing this compilation in the late 90's, two distinct areas of content became apparent: (1) content related directly to Buddhism, and (2) Confucian-Daoist content, historical information, and other secular-oriented content. For the purpose of organizing content, contributions, and contributors, as well as for grant application purposes, it became clear that it would be advantageous to separate the data into two separately identifiable compilations, which were renamed as the present DDB and CJKV-E dictionaries.
During the late 90's, the SGML world turned to the emerging XML standard, and we followed. This publicized shift in the format of the DDB and CJKV-E attracted Louis-Dominique Dubeau, who wrote the first proper DTD for both works. At this time, however, while the local data was saved in XML, the online version was published in static HTML every few months or so, lacking a search engine or any technological advantage beyond simple hyperlinking. But in 2001 I was extremely fortunate (through the Mulberry XSLT list) to make contact with Michael Beddow (ret., Leeds University), an expert in the area of humanities computing, who, with consummate skill, care, and generosity, took the XML data and created a search mechanism using XPath/XLinking along with Perl. This search engine was, to the best of our knowledge, the first at the time that would search mixed Latin and double-byte East Asian text in XML/UTF-8 encoding. Michael's search engine was thus a novel creation, and it still serves its purpose quite well (albeit in a regularly updated and enhanced form). 1 Michael has continued to support the DDB up to the present, adding various enhancements, periodically updating the system, and providing web site security. The level of technical support Michael has provided to the project is rather incredible. Simply put, the project would never have succeeded to anything close to its present level without his support.
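To give a rough sense of what searching mixed Latin and East Asian text in UTF-8 encoded XML involves, here is a minimal sketch in Python. It is not Michael's Perl implementation, which is not described in detail here. The element names (`entry`, `hdwd`, `sense`) and the sample entries are assumptions made up for illustration and do not reflect the actual DDB schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The element names are illustrative assumptions,
# not the real DDB/CJKV-E markup.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <entry><hdwd>佛</hdwd><sense>Buddha; awakened one</sense></entry>
  <entry><hdwd>佛性</hdwd><sense>buddha-nature</sense></entry>
  <entry><hdwd>法</hdwd><sense>dharma; teaching</sense></entry>
</dictionary>
"""


def search(root, query):
    """Return (headword, sense) pairs whose headword or sense contains query.

    Matching is a case-insensitive substring test, so the same function
    handles a Latin-script query ("buddha") and an ideograph query ("佛").
    """
    needle = query.casefold()
    hits = []
    # "./entry" is a path in ElementTree's limited XPath subset.
    for entry in root.findall("./entry"):
        hdwd = entry.findtext("hdwd", default="")
        sense = entry.findtext("sense", default="")
        if needle in hdwd.casefold() or needle in sense.casefold():
            hits.append((hdwd, sense))
    return hits


# Parse from UTF-8 bytes, as the data would arrive from a file on disk.
root = ET.fromstring(SAMPLE.encode("utf-8"))
```

Calling `search(root, "佛")` and `search(root, "buddha")` both return the first two sample entries, one query matching on headwords and the other on English glosses.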
Middle Period: Heading for Critical Mass
The dictionaries had gained a limited amount of attention almost immediately, with Jamie Hubbard introducing them in his 1995 article on web resources (probably the first in the field in a major journal on digital resources). 2 In 1996, I made my first presentation on the project at the second meeting of the Electronic Buddhist Text Initiative. By the late 90's, usage of the dictionaries was steadily increasing. In 2002, student assistants working on the project completed the input of digitized (and edited) materials from a major copyright-expired reference work on Buddhism (Soothill’s Dictionary of Chinese Buddhist Terms), work that was funded by JSPS. This, along with my own input, raised the DDB content to 15,000 entries, thus creating a respectable basic range of coverage. The same grant also enabled us to digitize Lewis Lancaster's landmark work, The Korean Buddhist Canon: A Descriptive Catalogue. Using the data from this compilation whenever we created a new entry on a text from the Chinese canon, we were able to quickly gain all the basic information on dating, provenance, translation, variant versions, and so forth. To this we were able to add content information for the given text from other sources. And of course, we could at the same time include corrections based on interim research. Around the same time, we completed the main part of the construction of a master index of the major East Asian Buddhist lexicons and encyclopedias (300,000 words, an ongoing project), which we set into place as a supplementary digital concordance for our work. All these enhancements led to a dramatic increase in the number of users of the dictionaries.
What? No Contributions?
But there were also disappointing moments. Our team had put extensive volunteer effort into offering all this material for free in the hope of stimulating collaboration, and we had clearly explained that this was intended to be a collaborative project. Yet as of 2002, despite our strongly expressed requests for contributions from users, we were receiving almost no contributions beyond those of a very small handful of generous, forward-thinking scholars. On the other hand, we had clear numerical data and anecdotal information to confirm that this resource was now being used extensively for teaching and research by large numbers of scholars and students in our field.
We thus began to experiment with leveraging the password policy (which had originally been set up only for security purposes) to establish a two-tiered access structure of member/guest. We started out by allowing guests 50 searches a day. Users seemed quite happy with this, but very few felt motivated to contribute. We then gradually decreased the number in increments until we reached the number of 10. At this point users began to scream, and we knew we had the right number. And so, we began to tell them . . . “If you want full access, you have to contribute, one way or another.” We set the bar low: for qualified scholars, one A4 (letter) page of data earns two years of full access. This is actually quite a small amount, but the aim, which has been achieved, was to create a sense of being a collaborator rather than a simple “user.”
And it worked. Before long, a surprising number of highly respected scholars began to operate in a way that they never had before, and even began to develop a sense of pride and belonging in being part of the project. We made further adjustments as we went along. For example, we responded to the suggestion made by a representative of Starr Library at Columbia University to allow institutions to pay for access, setting a fee of $300 per year (an incredibly low fee, I am told by most librarians). Berkeley and UCLA soon followed, and within a year, more than twenty more institutions were subscribed. The present number of subscribing libraries is fifty-five. Thinking along the same lines, to meet the demand from non-scholars who wanted access, we offered the option of individual subscriptions at $30 per year. From this, we secured a small but steady income that we used for creating and adding new data, mostly by paying graduate students to do input. The size of the database thus continued to grow faster and faster, as it still does today.
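For readers curious how a two-tiered member/guest policy of this kind might be expressed in code, the following is a minimal sketch. It is not the DDB's actual access-control code, which is not described here. The class name, method names, and in-memory counter are all hypothetical. Only the figure of 10 guest searches per day comes from the account above.

```python
from datetime import date

GUEST_DAILY_LIMIT = 10  # the guest limit the project eventually settled on


class SearchQuota:
    """Hypothetical per-user quota: members are unlimited, guests are capped."""

    def __init__(self, is_member=False, limit=GUEST_DAILY_LIMIT):
        self.is_member = is_member
        self.limit = limit
        self._day = None   # the day the current count belongs to
        self._used = 0     # searches already used on that day

    def allow(self, today):
        """Return True and count the search if it is permitted on `today`."""
        if self.is_member:
            return True
        if today != self._day:
            # A new day has started, so the guest counter resets.
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

A guest created with `SearchQuota()` is allowed ten searches on a given date and refused the eleventh, while `SearchQuota(is_member=True)` is never refused.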
The above series of transformations resulted in a situation where contributions, large and small, began to flood in. Most important were the massive contributions by scholars who had compiled, or were compiling, their own reference materials. These consisted of either thousands of small entries or hundreds of pages of longer entries, often coupled with extensive voluntary proofreading of other entries in the dictionary. These scholars, listed in order of the approximate size of their contributions, are: Ockbae Chun, Paul Swanson, Michael Radich, Jeffrey Kotyk, Griffith Foulk, John Powers, Dan Lusthaus, Gene Reeves, Seishi Karashima, Iain Sinclair, Stephen Hodge, and Ven. Ñāṇatusita. (Details of the contributions of these scholars are provided here.) In addition to this, more than three hundred individuals have made contributions, the more prolific of whom are listed on our credits page in approximate order of the significance of their contributions. And then there are a number of people who, when reading through the dictionaries, carefully record and report errors, an indispensable dimension of the project. 3
Based on these collaborative developments, the coverage of both dictionaries has leaped dramatically, such that the combined coverage of both compilations is presently (in July 2015) more than 104,000 entries. The DDB has become a basic reference work for the field of Buddhist studies. It is used in the teaching of courses on Buddhism, and is regularly cited in scholarly research articles and monographs. Many DDB authors are acknowledged as leading authorities in their sub-areas of Buddhist Studies.
Based on the expert programming work of Kiyonori Nagasaki of the International Institute of Digital Humanities, the DDB is also implemented in an interoperable manner with the online SAT Taishō Text Database. It is set up in such a way that when one opens a text from the online Taishō canon and selects a portion of the text with the mouse, the words in that text that are contained in the DDB are displayed in the right-hand window, with short definitions and links to the DDB entries themselves. Jean Soulat, a Windows specialist based in France, has also created the superb parsing/lookup application called DDB Access, which allows one to parse and look up words from Sinitic texts in a powerful manner. The DDB is also integrated into Jean Soulat's web-based Smarthanzi Chinese lookup and parsing tool.
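The general idea of finding dictionary words inside a selected stretch of unpunctuated Sinitic text can be sketched as follows. This is only an illustration under stated assumptions: it uses a simple greedy longest-match scan, which is not necessarily how the SAT integration, DDB Access, or Smarthanzi actually work. The function name and the tiny glossary are hypothetical.

```python
def find_terms(text, glossary):
    """Scan text left to right, taking the longest glossary match at each position.

    Returns a list of (term, short_definition) pairs in order of appearance.
    Characters that start no glossary term are skipped.
    """
    max_len = max(map(len, glossary), default=0)
    hits = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in glossary:
                hits.append((candidate, glossary[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts here, so move on by one character
    return hits


# Hypothetical mini-glossary, for illustration only.
GLOSSARY = {
    "佛": "buddha",
    "佛性": "buddha-nature",
    "如來": "tathāgata",
    "如來藏": "tathāgatagarbha",
}
```

With this glossary, `find_terms("如來藏佛性", GLOSSARY)` yields 如來藏 and 佛性 rather than the shorter 如來 and 佛. A real lookup tool might instead report every overlapping match so the reader can choose among them.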
The CJKV-E Dictionary has more competition as a Web resource, where Chinese ideographic dictionaries presently abound. However, the CJKV-E is distinguished from the rest of these in its being the only online lexicon of its type that is (1) not merely a computerized aggregation, and (2) not merely a reproduction of an older print dictionary. It is being actively developed in an ongoing manner by scholars in conjunction with the reading of classical texts. Besides its inherent digital advantages, the CJKV-E dictionary already surpasses many of its hard-copy counterpart dictionaries in a number of ways. The total number of entries in July 2015 was 42,000, with 12,050 of these being single-logograph entries. As distinguished from the numerous computer-aggregated East Asian language dictionaries proliferating on the Web, each of the entries in this CJKV-E dictionary is human-edited, and usually contains far more detailed information than any other comparable lexicon, being developed while consulting a wide range of authoritative Chinese, Korean, and Japanese sources, and usually through the direct reading of primary classical texts. During the past three years, the coverage of the CJKV-E has increased by more than 10,000 entries, based on the work of an incredibly productive and reliable UTokyo PhD student named Yao Zhang.
While a number of the Japanese-oriented modern kanji dictionaries that have appeared during recent decades have been of acceptable quality in terms of precision within their respective purviews, they are, from the perspective of the classical scholar, limited in their scope and orientation to modern vocabulary, and thus are not useful to those who are doing scholarly research/translation of pre-modern han-wen texts, who need to know all of the ancient semantic implementations and readings of a particular ideograph.
There is no limit to the intended future expansion in coverage of both works. We are interested in developing and expanding these compilations in any direction where we can receive collaboration: from any linguistic/cultural region of Buddhist or East Asian studies where scholars would like to contribute information. We have no limit on the length of articles, and we are happy to add images and any other sort of data that is appropriate. It is our hope, in terms of reflecting the history of the Buddhist tradition, to provide as balanced and accurate an account as possible.
For more detailed background material on the history and development of the DDB and CJKV-E, we have published a few papers and have made a number of conference presentations on the topic over the years, which are available through my personal publications page.
1. Soon after the completion of this framework, Michael and I were asked to submit an article to the online Journal of Digital Information. That article, entitled “Moving into XML Functionality: The Combined Digital Dictionaries of Buddhism and East Asian Literary Terms,” can be read at http://journals.tdl.org/jodi/article/view/jodi-65/82. (Journal of Digital Information: Special Issue on Chinese Collections in the Digital Library, Volume 3, issue 2, October 2002).
2. Jamie Hubbard, “Upping the Ante: firstname.lastname@example.org,” in the Journal of the International Association of Buddhist Studies, vol. 18-2, pp. 309-322 (1995).
3. A few of the more steady contributors of this category include Wolfgang Waletzki, Gene Reeves, Robert Kritzer, Dan Lusthaus, Jeffrey Kotyk, Charles Jones, Ven. Ñāṇatusita, Achim Bayer, Jimmy Yu, Charles Patton, Ockbae Chun, Michael Beddow, and Pierce Salguero.