Webology, Volume 2, Number 4, December, 2005 |
A. Neelameghan
Hon. Visiting Professor, Documentation Research and Training Centre (DRTC), ISI, Bangalore 560059, India. E-mail: anm2002@vsnl.net
K.S. Raghavan
Professor, DRTC, ISI, Bangalore 560059, India. E-mail: ksragav@hotmail.com
Received November 2, 2005; Accepted December 1, 2005
Reports the progress on a project to design and develop a machine-readable multi-lingual, multi-faith thesaurus, specifically for the domain Religious Mysticism. Describes the procedure adopted for identifying the core concepts of the subject and related fields, and ensuring literary warrant for the concepts and relationships among them. Problems of variations in meaning for a concept in different cultures and languages, and alternative structuring and presentation of the schedules are discussed. The system permits selecting a term occurring in Sufi, Vedic, or English sources on mysticism, and navigation through hypertext linking to equivalent term(s) in the other two sources. Future work envisaged is briefly described.
Online Thesaurus, Mysticism, Sufism, Facet structure
Devices such as classification schemes, taxonomies, thesauri, ontologies, term maps, termnets, framenets, semantic maps/nets, and self-organizing maps are used for vocabulary management in information processing, presentation, organization, search and retrieval from databases, including web resources. Most of these devices implicitly or explicitly exhibit relationships (hierarchical and lateral or non-hierarchical associative relationships) among the concepts. Neelameghan and Satish (2003) noted that application of such relationships among concepts is practised in such domains as the following:
Interestingly such applications were reported in conferences and periodicals devoted to different areas such as the following:
Artificial intelligence | Memory and cognition |
Text analysis / summarization | Experimental psychology |
Linguistics | Learning |
Computational linguistics | Hypermedia |
Natural language processing | Information retrieval |
Knowledge organization | System studies |
In a comparative study of multilingual thesauri, InfoDEFT and Esser's EXPO 2000 thesauri, Jorna and Davies (2001) remarked that ". . . multilingual tools are getting importance as increasingly diverse groups from different cultural and linguistic backgrounds seek access to equally diverse pieces of information". In the expanding globalization scenario, "Conflict arises in the minds of men" through misinterpretation and misunderstanding of messages from different cultures, classes of people and linguistic groups (UNESCO), and it is therefore important to devise means, methods and tools for improving inter-cultural and inter-faith exchange of ideas. Jorna and Davies also mention the problems of developing a vocabulary tool such as a multilingual thesaurus for different user groups. See also the review by Nielsen (2004).
This paper reports on the progress in a project to design and develop a machine-readable online thesaurus for the field of religious mysticism (or mysticism in world religions) that may facilitate inter-faith, inter-cultural communication. Currently, terms (mostly of Persian / Farsi origin) occurring in Sufi sources (printed and web-based) are listed together with their corresponding equivalents and near-equivalents in English and, where available, in Sanskrit from Vedic sources. The terms from Sufi and Vedic sources are transliterated into Roman script. For each Descriptor, a Scope Note (SN), Broader Terms (BT), Narrower Terms (NT) and Related Terms (RT) are given. USE and UF (Used For) cross-references are given where necessary. In the course of actual preparation of the thesaurus, some problems relevant to the design and development of multilingual thesauri in general were identified. We also examine some of these issues. To facilitate searching the alphabetical index, transliterated Sufi terms beginning with a diacritical mark are cross-referred to the form without the mark, for example: 'Ajz USE Ajz; 'ilm ilhaami USE ilm ilhaami.
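The cross-referencing of terms that begin with a diacritical mark can be illustrated with a minimal sketch. The function names and the set of characters treated as leading marks are assumptions made for illustration; the paper does not describe how the references were actually generated.

```python
from typing import List, Optional

# Characters treated as leading diacritical marks in the transliteration.
# This set is an assumption for illustration, not taken from the thesaurus.
LEADING_MARKS = "'`\u2018\u2019\u02bb\u02bc"


def use_reference(term: str) -> Optional[str]:
    """Return a 'X USE Y' entry if the term starts with a leading mark."""
    stripped = term.lstrip(LEADING_MARKS)
    if stripped == term or not stripped:
        return None  # no leading mark, or nothing left after stripping
    return f"{term} USE {stripped}"


def build_cross_references(terms: List[str]) -> List[str]:
    """Collect USE references for every term that needs one."""
    refs = (use_reference(t) for t in terms)
    return [r for r in refs if r is not None]


refs = build_cross_references(["'Ajz", "'ilm ilhaami", "Qalandar"])
for line in refs:
    print(line)
```

Only the leading mark is stripped; marks inside a term are left untouched, which matches the two examples given above.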
In building a thesaurus for a subject-field, to begin with, it is helpful to identify the core area and the related fields. This can
Such work is facilitated by using dictionaries and glossaries, encyclopedic essays, books, papers etc. dealing with the subject. Existing thesauri covering the field are useful. The methodology for study of subjects developed at the Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, India, can also be adopted (Bhattacharyya, 1975).
"Mysticism is concerned with the nature of Reality, the individual's struggle to attain a clear vision of Reality, and the transformation of consciousness that accompanies such vision." (Mysticism in world religion, Platt, see Appendix 1). The major religions of the world - Hinduism, Buddhism, Taoism, Judaism, Christianity, and Islam - have common elements in their mystical traditions: the movement and progress of the individual / seeker from the unreal to the realization of the Real (Bhashyananda, 2003; Meher Baba, 2000, 2001; Wilber, 2002), even though there may be differences in the prescriptions for reaching the goal in the different religions (Ashokananda, 2001). The common elements of the traditions in the path of the movement toward the Real may be broadly categorized as follows:
These categories of ideas (and their inter-relationships) may also be mapped into the framework of S.R. Ranganathan's generalized facet structure for a subject - Context Specifying element (BASIC FACET); PERSONALITY FACET; MATTER MATERIAL FACET; MATTER PROPERTY FACET; ACTION or ENERGY FACET; SPACE FACET; TIME FACET; and SPECIATOR or QUALIFIER applicable to any of the components. Any element can be a qualifier to any other element; and a concept may have two or more qualifiers simultaneously. An example of concepts in Mysticism organized according to Ranganathan's Generalized Facet structure is given below:
Qualifier / Speciator: By religion: | |
Mysticism-Hinduism --Vedanta | |
Mysticism-Judaism | |
Mysticism-Buddhism | |
Mysticism-Islam (Sufism) | |
Mysticism-Christianity |
Core Entity of Study or Personality (P) Elements With Qualifiers | Property (MP) Elements With Qualifiers | Action / Energy (E) Elements With Qualifiers |
---|---|---|
Ultimate Reality (God, Allah, Paramatma, etc.); Deities; Avatars; Realized / Liberated souls; Saints; Alwars and Nayanmars; Seeker (Individual, Devotee) (By Religion); . . . | (Attributes Relating to the Ultimate Reality): | Movement / Journey toward the Reality (By Modality) |
Space Facet (S) elements | |
---|---|
On earth | |
In heaven | |
In hell | |
. . . . . . . . . |
Time Facet (T) elements | |
---|---|
Past | |
Present | |
Future | |
. . . . . . . . . | |
Spring | |
Summer | |
Fall / Autumn | |
Winter | |
. . . . . . . . . |
Given this commonality of elements in the different mystical traditions and a framework for analysis of the concepts, it is possible to develop vocabulary tools that facilitate inter-faith communication.
Building a multilingual thesaurus for a culture-specific domain raises several issues. Most of these are related to the nature of the respective culture-specific domains. First, concepts encountered in and associated with the Human Sciences (Humanities) in general, and culture-specific domains in particular, are abstract in nature, and we can rarely relate these concepts to concrete referents. Secondly, a large number of concepts encountered in culture-specific domains are those that have some meaning in the life of the members of the community belonging to the culture. These have implications for a multilingual thesaurus. A language is a product of, and reflects, the culture of a particular community or communities. In other words, it is the culture and lifestyle prevalent among the members of a particular community that necessitates and results in the formation of lexemes / expressions (words / terms) for concepts associated with that culture and lifestyle. Given this, it is highly likely that, unless the communities that speak two different languages share the same culture, certain concepts in culture-specific domains may have verbal expressions only in a particular language. In building a multilingual thesaurus, therefore, the focus should often be on finding near-equivalent concepts / ideas in the languages of the other cultures for a given concept in a particular language. This problem is peculiar to the humanities and normally does not arise in the physical and life sciences. By and large, in these domains, communities irrespective of their geographical location speak about the same concepts and ideas. The concepts / ideas in these domains are more universal than in the human sciences.
The construction of a thesaurus and more so a multilingual one is an abstract process. There are three general approaches to building a multilingual thesaurus (Hudon, 1997; IFLA, 2005; ISO, 1985; Landry, 2004; Nielsen, 2004):
The second and third approaches pre-suppose the existence of a thesaurus in the domain under consideration. In this project, therefore, the only option available was the first one. As mentioned earlier, one of our principal objectives is to facilitate inter-faith communication, i.e. communication across different religions and cultures. Early in our work we realized the usefulness of having terms from one of the sources as the base particularly so in culture-specific domains where one often encounters concepts unique to a particular culture and therefore to the language widely used by the people belonging to that culture. In such a situation it was noted that starting simultaneously with all the languages of a multilingual thesaurus might, in effect, amount to building a multi-domain thesaurus making it difficult to focus on the specific domain. Therefore, in building this multi-lingual multi-faith thesaurus it was important to have adequate control over the scope of the thesaurus. We started with Sufi terminology as the base and proceeded to identify corresponding concepts in English and Vedic writings. This way the domain was reasonably bounded (namely, Sufi Mysticism) even though terms from different faiths (religions) are linked in the thesaurus. Once this was decided other issues such as source of candidate concepts, semantic and structural issues related to the thesaurus became more focused.
A personal (A. Neelameghan's) collection of over 450 books on religion, spirituality and mysticism was extensively used for identifying candidate concepts. Some of these books were glossaries and some others included a glossary of terms in spirituality, religion and mysticism. There are also some useful Web sources. A few examples are listed in Appendix 1.
An earlier paper described an integrated, interlinked multimedia set of databases - OM Database Service - to support studies in the spiritual / religious domains (Rajashekar, Ravi, Neelameghan, 1998).
The databases of the OM Database Service include:
UNESCO's CDS-ISIS and then WINISIS software have been used in developing these databases and for inter-linking among the records.
Example of a record from OM02 Database:
CONCEPT/S: EMANATIONS |
TEXT: A particular emanation possesses a particular degree of spiritual perfection. The word "Allah", for example, cannot be used for the emanation at the plane of sensible things... The first degree of God's perfection consists of His transcending Himself. "In the first degree, He is unmanifested and unconditioned and exempt from all limitation or relation. " Here He is beyond all categories and attributes. He is beyond human thinking and transcends all the ways of description." His first characteristic is the lack of all characteristics, and the last result of the attempt to know Him is stupefaction ('hairaani'). The second degree of perfection lies in the emanation of God's 'active', 'necessary', 'divine', 'passive', 'contingent' and 'mundane' aspects. This is the stage of the First Emanation (ta'ayyun-i awwal)(taayyun-i awwal) or the Universal Reason ('aql-i kull') ('aql-i kull'). The third degree of God's perfection consists in His active and efficient phases. It is the 'Unity of the Whole Aggregate.' It can be called the Second Emanation or Divinity ('Ilaahiyat'). The fourth degree of His perfection is contained in the detailed expression of the Second Emanation. It is exposed in various names and forms. This is the plane of the Third Emanation or Necessary Being (wujud). The fifth degree of His perfection lies in 'passivity' or the quality of receiving impressions. This is again a 'Unity of the Whole Aggregate.' It is the Fourth Emanation or 'Mundane Existence' and 'Contingency.' The sixth degree of perfection lies in the detailed manifestation of the Fourth Emanation. It is the stage of the Fifth Emanation or the Sensible World ('aalam'). The last two emanations are the outward aspects of the intelligible world belonging to Contingency. |
SOURCE : Abdu'l Rahmaan Jaami. Abdur Rahmaan Jaami. |
CONTEXT : Unitism and pantheism |
NOTES : Islam, Mysticism, Sufism |
GLOSSARY | ||
CONCEPT | : | Hairaani |
MEANING | : | Stupefaction |
OCCURS IN | : | Writings on Islam, Sufism |
CONTEXT | : | Emanations |
LANGUAGE | : | Arabic / Persian |
REC. NO. | : | 4700 |
GLOSSARY | ||
CONCEPT | : | Ta'ayyun-i awwal |
MEANING | : | First stage of the First Emanation |
OCCURS IN | : | Writings on Islam, Sufism |
CONTEXT | : | Emanations |
LANGUAGE | : | Arabic / Persian |
ALTER. TERM | : | Taayyun-i awwal |
REC. NO. | : | 4701 |
GLOSSARY | ||
CONCEPT | : | Wujud |
MEANING | : | Existence. Third Emanation or Necessary Being |
OCCURS IN | : | Writings on Islam, Sufism |
LANGUAGE | : | Arabic / Persian |
ALTER. TERM | : | Arif-e-wujud |
REC. NO. | : | 214 |
It may be noted in the above example that the Sufi terms (in Arabic / Persian transliterated into Roman script) are automatically linked to the corresponding terms in the GLOS database (See below).
The GLOS (Glossary) database is a component of the OM Database Service. It was the main source of terms for the thesaurus. The fields and structure of a record in the GLOS database are shown below. The fields of the database are:
Tag | Field Name | |
---|---|---|
1 | Term | |
2 | Definition | |
3 | Occurs in | |
6 | Context | |
7 | Reference | |
8 | Notes | |
10 | Orig. lang. | |
11 | Alter. Term | |
12 | Cross ref. |
Example of data entry:
1 Term (1) | Qalandar | |
2 Definition (2) | A dervish who does not recognize outward mystical form or convention. | |
3 Occurs in (3) | Writings on Islam; Sufism | |
7 Reference (7) | BHATGLOS | |
10 Orig. lang. (10) | Arabic / Persian |
Example of display:
CONCEPT | : | Qalandar |
MEANING | : | A dervish who does not recognize outward mystical form or convention. |
OCCURS IN | : | Writings on Islam; Sufism |
LANGUAGE | : | Arabic / Persian |
REC. NO. | : | 2 |
SOURCE | : | Bhatnagar, R.S. Dimensions of classical Sufi thought. Delhi: New Age Books; 1984; 1992. |
The structure for the thesaurus database (F-THES) was designed, again using WINISIS 1.5. The fields included are:
Tag | Field Name | Subfield |
---|---|---|
1 | Descriptor | |
2 | SN | |
3 | US | |
4 | UF | |
5 | BT | |
6 | NT | |
7 | RT | a |
90 | BS | |
91 | Type | |
92 | IN | |
93 | CN | |
99 | TOP term | |
100 | Remarks | |
900 | Links | abcde |
Fields 1 to 7, and 100 are usually found in most thesauri. Fields 90 to 93 are included for data input for future research. Field 900 is for the code for the hypertext links, e.g. links to corresponding terms in the GLOS database (see example above).
To begin with, the terms in the GLOS database were exported to the F-THES database using an appropriate Field Select Table (FST). The conversion FST (CONX.FST) has the following structure:
Tag | IT | Data extraction format |
---|---|---|
1 | 0 | v1 |
2 | 0 | mhl,(v2/); (v3/);(v6/);(v8/);(v10/) |
3 | 0 | mhl,(v11/) |
7 | 0 | mhl,(v12/) |
100 | 0 | mhl,(v7/) |
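The effect of the conversion table can be sketched in Python. This is only an illustrative model of what the FST rows above appear to do (which GLOS tags feed which F-THES tags); it does not reproduce the CDS/ISIS formatting language, and details such as the `mhl` mode are ignored. The names `CONX_FST` and `convert` and the dictionary representation of a record are assumptions.

```python
from typing import Dict, List

Record = Dict[int, List[str]]  # tag -> list of field occurrences

# Target F-THES tag -> GLOS source tags, read off the CONX.FST rows above.
CONX_FST: Dict[int, List[int]] = {
    1: [1],                 # Descriptor <- Term
    2: [2, 3, 6, 8, 10],    # SN <- Definition, Occurs in, Context, Notes, Orig. lang.
    3: [11],                # tag 3 <- Alter. Term
    7: [12],                # RT <- Cross ref.
    100: [7],               # Remarks <- Reference
}


def convert(glos_record: Record) -> Record:
    """Build an F-THES record from a GLOS record, skipping empty fields."""
    out: Record = {}
    for target, sources in CONX_FST.items():
        values = [v for tag in sources for v in glos_record.get(tag, [])]
        if values:
            out[target] = values
    return out


# The Qalandar data-entry example from the GLOS database.
qalandar: Record = {
    1: ["Qalandar"],
    2: ["A dervish who does not recognize outward mystical form or convention."],
    3: ["Writings on Islam; Sufism"],
    7: ["BHATGLOS"],
    10: ["Arabic / Persian"],
}
thes = convert(qalandar)
print(thes)
```

In this model the scope note gathers the definition together with the occurrence, context, notes and language fields, one occurrence per source field, so it would still need manual editing afterwards.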
After downloading the records from the GLOS database, duplicate entries were eliminated, other corrections were carried out, and new records were added to the F-THES database. The thesaurus database currently contains records for over 7500 descriptors, one record per descriptor. As already mentioned, the Sufi, Vedic, and Pali terms are transliterated into Roman script. About 30 percent of the descriptors are Sufi Descriptors; another 30 percent are English Descriptors mostly corresponding to the Sufi terms; about 30 percent are Sanskrit Descriptors occurring in Vedic sources corresponding to the Sufi terms; and about 8 percent are Pali and Japanese terms occurring in works on Buddhism including Zen Buddhism. In this paper we do not deal with Pali and Japanese terms. The BT, NT, and RT terms for a descriptor in a particular language are hyper-linked to equivalent / near-equivalent term(s) in the same language and in the other two languages to facilitate surfing. The objective is to enable the user to search a concept using the English, Sanskrit or Arabic / Persian term and get linked to all related concepts / terms in the same language or the other two languages. An example is given in Fig. 1.
Fig. 1. Schedule for the descriptor QALANDAR (S)
Each of the underlined terms is hyper-linked to the corresponding thesaurus record. For example, clicking on the term 'Dargah (S)' we get a schedule of terms related to it, and so on. Initially the terms in Arabic / Persian, English, and Vedic Sanskrit were arranged in a single alphabetical sequence. However, we encountered problems with this kind of display:
The first improvement in presentation was to group the terms listed under a descriptor by the source - Sufi (S), Vedic (V), English (E) - as shown in Fig. 2.
After some experimentation and discussions it was decided that for a descriptor say from Sufi writings, its Sufi BTs, NTs, and RTs, be enumerated, and only the Equivalent or Near-equivalent term (to the descriptor) in English and Vedic sources be listed in that schedule. Thus the schedule given in Fig. 1 now appears as shown in Fig. 2.
In this the RTs are limited to related terms from the same source, that is, Sufi terms in this particular example. The Equivalent / Near-equivalent terms corresponding to the descriptor from other sources - in English and Vedic sources - in this case, are shown as EE (Equivalent / near equivalent English Term), ES (Equivalent / near equivalent Sufi Term), EV (Equivalent / near equivalent Vedic Sanskrit Term) in the multi-faith thesaurus. Clicking on Dervish (E) one gets linked to the thesaurus record for the term showing BT, NT, and RTs to it. So is the case with Sannyaasin (V). This allows navigating back and forth.
Schedules for the descriptors QALANDAR (S), DERVISH (E), and SANNYAASIN (V)
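The linking between a descriptor and its equivalents in the other sources can be sketched as a small data structure. This is a hypothetical in-memory model, not the WINISIS implementation: the class `ThesRecord`, the `thesaurus` dictionary and the `follow` function are illustrative names, and only links mentioned in this paper are entered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThesRecord:
    descriptor: str
    source: str                                   # "S" (Sufi), "V" (Vedic), "E" (English)
    rt: List[str] = field(default_factory=list)   # related terms, same source only
    equivalents: Dict[str, List[str]] = field(default_factory=dict)  # EE / ES / EV

    @property
    def label(self) -> str:
        return f"{self.descriptor} ({self.source})"


thesaurus: Dict[str, ThesRecord] = {}
for rec in [
    ThesRecord("Qalandar", "S", rt=["Dargah (S)"],
               equivalents={"EE": ["Dervish (E)"], "EV": ["Sannyaasin (V)"]}),
    ThesRecord("Dervish", "E", equivalents={"ES": ["Qalandar (S)"]}),
    ThesRecord("Sannyaasin", "V", rt=["Rishi (V)"],
               equivalents={"ES": ["Qalandar (S)"]}),
]:
    thesaurus[rec.label] = rec


def follow(start: str, link_type: str) -> List[ThesRecord]:
    """Return the records reached from `start` via EE, ES or EV links."""
    targets = thesaurus[start].equivalents.get(link_type, [])
    return [thesaurus[t] for t in targets if t in thesaurus]


# Navigate from the Sufi descriptor to its English equivalent and back.
english = follow("Qalandar (S)", "EE")
back = follow(english[0].label, "ES")
print(english[0].label, "->", back[0].label)
```

Keeping the equivalence links separate from the RT list mirrors the decision described above: related terms stay within one source, and movement between sources happens only through the EE, ES and EV links.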
The scope note (SN) for the descriptor term explains and indicates the degree of correspondence (exact equivalence or near-equivalence) between a term and its corresponding terms from the other sources. It may be useful to look up the related original sources mentioned in Appendix 1.
The experience of working on this multilingual thesaurus has helped in identifying some of the semantic issues relevant to multilingual thesauri in general, and more particularly to multilingual thesauri in culture-specific domains. Traditionally, semantic relations in a thesaurus have been grouped under three broad categories of relations, namely, Equivalence Relations, Hierarchical Relations and Lateral (Non-Hierarchical Associative) Relations. Hierarchical relations are not discussed in this paper. Equivalence Relations include both intra-language equivalence and inter-language equivalence. It has already been pointed out that it is difficult to find exact inter-language equivalence in culture-specific domains. In the present context, for many Sufi terms in the original, exact equivalent terms in Vedic Sanskrit and English were not available. As such, in most cases near-equivalent terms had to be used. In practice it has to be a combination of the above two and the decision should be based on the degree of correspondence between the two near-equivalent terms in the two different sources. This approach has been adopted here as mentioned in the preceding section.
Lateral Relations (non-hierarchical associative relations) among concepts are another important issue. Neelameghan has discussed lateral relations in the spiritual domain, more particularly in the context of inter-cultural, inter-faith communication (Neelameghan, 2001). It is important to explore the possibility of developing guidelines for lateral relations in thesauri and other vocabulary control devices. The general guideline suggested by Soergel (1974) for identifying and defining lateral (Associative) relations in thesaurus construction is:
"Concept A is related to Concept B if the following holds: an indexer or searcher weighing the use of A should be reminded of the existence of B and there is no hierarchical relationship between A and B."
The major problem in adopting such a broad guideline for the construction of a thesaurus is that it often leads to inconsistency in linking to laterally related terms in a thesaurus.
What should be the scope and nature of 'Related Terms'? In the 1970s a typology of RT relationships was developed (Neelameghan and Ravichandra Rao, 1973; Neelameghan and Maitra, 1975). The work has had good acceptance among thesaurus builders and in term-relationship studies. More recently the typology of lateral relations has been expanded taking into consideration application of the typology in other disciplines as well (Neelameghan and Raghavan, 2005). An attempt is being made to map the RTs in the present thesaurus to this schema of lateral relations (See Appendix 2). The idea is to examine whether a minimal set of lateral relations can be identified and defined that is necessary and sufficient to represent semantic structures and relations across several domains (See also: Hudon, 2001; Milstead, 2001; Molholt, 2001).
If we do recognize and categorize different types of RT relations, an issue that arises is: 'Should the display in a thesaurus indicate the nature (type) of relation among the RT terms to a descriptor?' This will also have a bearing on the sequence of RTs displayed, for it may now be possible to group the RTs according to the category to which they belong.
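One way such a grouped display might look is sketched below. The relation data are a few rows taken from the Qalandar entries in Appendix 2; the function names and the output layout are assumptions for illustration and do not represent a display format adopted in the project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (descriptor, related term, relation type), from the Qalandar rows of Appendix 2.
RELATIONS: List[Tuple[str, str, str]] = [
    ("Qalandar", "Dargah", "Structural / Functional Relation"),
    ("Qalandar", "Faqih", "Affiliation Relation"),
    ("Qalandar", "Faqir", "Affiliation Relation"),
    ("Qalandar", "Zuhd", "Entity - Property"),
]


def grouped_rts(descriptor: str,
                relations: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group the RTs of a descriptor by relation type, both sorted by name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for term1, term2, relation in relations:
        if term1 == descriptor:
            groups[relation].append(term2)
    return {rel: sorted(terms) for rel, terms in sorted(groups.items())}


def render(descriptor: str, relations: List[Tuple[str, str, str]]) -> str:
    """Produce a plain-text schedule with one RT line per relation type."""
    lines = [descriptor.upper()]
    for relation, terms in grouped_rts(descriptor, relations).items():
        lines.append(f"  RT [{relation}]: " + "; ".join(terms))
    return "\n".join(lines)


schedule = render("Qalandar", RELATIONS)
print(schedule)
```

Sorting by relation name is only one possible sequence; a thesaurus could equally order the groups by the numbering of the lateral-relation typology.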
Multi-lingual, multi-cultural thesauri are gaining importance as tools for knowledge organization as people with different cultural and linguistic backgrounds continue to publish and seek information. There have been suggestions that the presentation and structure of a thesaurus should be language independent and that efforts be made to adopt a semantic classificatory structure, especially in a multilingual thesaurus. As has been shown in the preliminary sections of this paper, the General Theory of Classification of S.R. Ranganathan does offer such a framework.
Navigation between terms of Latin origin and those of non-Latin origin that use different scripts (e.g. Arabic, Sanskrit, Pali, Yiddish) raises additional problems.
An important issue that has a bearing on the design of tools for knowledge organization including multilingual thesauri is that related to the uses to which such tools can be put in the emerging information environment. It has been suggested earlier that multilingual thesauri in culture-specific domains facilitate inter-cultural communication and comparative studies. In the context of the changing information landscape and growing importance of distributed digital resources on the Web, it is possible that tools for knowledge organization could be put to use in organizing, indexing and searching such resources.
Traditionally the focus in research in the area of developing tools for Knowledge Organization such as thesauri and schemes of classification has been on the construction of such tools. In view of what has been discussed above it may be useful, while designing such tools, to focus on the use and possible applications of such tools. For example, the structure of a multilingual thesaurus may have a bearing on the design of search interfaces.
Term 1 | Term 2 | Relation Type |
---|---|---|
Qalandar | Dargah | Structural / Functional Relation (8) |
Qalandar | Faqih; Faqir; Rahib; Sayaa; Zahid | Affiliation Relation (8) |
Qalandar | Zuhd | Entity - Property (17) |
Al-Qalb | Fuad | Near Equivalent Terms (13) |
Qand't | Itminan; Qarar; Radiya; Rida | Near Equivalent Terms (13) |
Al-Qayyum | Abad; Al-Qadim; Al-Qidam | Near Equivalent Terms (13) |
Al-Qayyum | Fana; Fana Baqa | Entity - Process / Entity - Attribute (?) |
Qudsi | Aqdasi | BT-NT (?) Relation |
Qurb | Qurb - I -Faraid; Qurb - I - nawafi | BT-NT (genus-species) |
Rabb | Rabbani; Rahbur-ul-'alum; Rububiyyat | Near Equivalent (13) |
Rahib | Dargah | Affiliation Relation (8) |
Rahib | Faqih; Faqir; Rahib; Sayaa; Zahid | Near Equivalent (13) |
Rahib | Zuhd | Entity - Property |
Al-Rahman | Rahmaniyya | Entity - Property |
Ribat | Dargah | Near Equivalent (13) |
Rida | Itminan; Qarar; Radiya; Quand't | Near Equivalent (13) |
Riya | Nafaq | Near Equivalent (13) |
Rububiyyat | Rabbani; Rahbur-ul-'alum; Rabb | |
An-Nafs | Ar-Ruh; Jan; Ruh; Ruh-Allah; Ruhu'l -'Alam; Ruhu'l -Azam; Sirr | Near Equivalent (13) |
Ruhu'l -Azam | Ar-Ruh; Jan; Ruh; Ruh-Allah; Ruhu'l -'Alam; An-Nafs; Sirr | Near Equivalent (13) |
Term 1 | Term 2 | Relation Type |
---|---|---|
SANNYAASIN (V) | Rishi (V) | |
Sannyaasa (V) | ||
Sannyaasa aashrama (V) | ||
Sannyaasini (V) |