Refining thesaurus relational structure: implications and opportunities

Fulvio Mazzocchi and Paolo Plini

(CNR: National research council. Institute for Atmospheric pollution. Environmental knowledge organization laboratory)
paper presented at the 10th German ISKO conference : Vienna : 3-5 July 2006


The usefulness of a well-defined and well-structured domain-specific thesaurus for the management of information is widely acknowledged. However, there is a widespread opinion that the thesaurus format, as conceived by the International Standard, does not completely meet the current needs of knowledge organization systems.

One of the main problems posed by traditional thesauri seems to be that they provide a poorly differentiated set of relationships among terms, distinguishing only among hierarchical, associative and equivalence relationships. It has also been said that, since thesaurus relationships are characterised by semantic vagueness, they are not applied consistently. This causes ambiguity in interpretation and can result in unpredictable semantic structures. Moreover, thesauri are expected to be developed on the basis of a more fully concept-oriented model, in which concepts are considered to be independent of their designations and to precede them; a term-oriented model, according to this viewpoint, may promote ambiguity and incompatibility.

The solution commonly proposed to overcome these limitations, and to enable more powerful searching and intelligent information processing, involves re-engineering traditional knowledge organization systems into systems that contain domain concepts linked through a rich network of well-defined relationships, together with a rich set of terms identifying these concepts.
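As a minimal sketch of what such a concept-oriented structure could look like, the fragment below stores concepts independently of their terms and links them through explicitly named relationships. The class names, the relationship names (`is_a_kind_of`, `causes`) and the sample concepts are hypothetical, chosen only for illustration; they are not taken from any standard or existing thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A concept exists on its own; terms are only its designations."""
    concept_id: str
    # language code -> terms, the first one being the preferred term
    labels: Dict[str, List[str]] = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two concepts (by identifier)."""
    source: str
    kind: str
    target: str


# Hypothetical sample data, for illustration only.
concepts = {
    "c1": Concept("c1", {"en": ["air pollution", "atmospheric pollution"]}),
    "c2": Concept("c2", {"en": ["smog"]}),
    "c3": Concept("c3", {"en": ["respiratory disease"]}),
}
relations = [
    Relation("c2", "is_a_kind_of", "c1"),
    Relation("c1", "causes", "c3"),
]


def related(concept_id: str, kind: str, lang: str = "en") -> List[str]:
    """Preferred terms of the concepts reached from concept_id via `kind`."""
    return [
        concepts[r.target].labels[lang][0]
        for r in relations
        if r.source == concept_id and r.kind == kind
    ]
```

Here `related("c1", "causes")` follows only the relationship named `causes`, which a traditional thesaurus would not distinguish from any other associative link.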

The augmentation of thesaurus relationships should ensure stronger semantic control, also because different relationships can hold each other in check, and open up new application possibilities for information retrieval. The enrichment and increased semantic clarification of the relationships should, for example, enable a better semantic description of Web resources and guide users in meaningful information discovery on the Web. Moreover, it will increase the possibility of using thesauri in artificial intelligence applications. Traditional thesauri, in fact, were not designed for automated information processing: their semantic structure supports it only in a limited way.

Thesauri also need to strengthen their role as semantic connectors, i.e. to reinforce the structure of related-term (RT) relationships in order to emphasize the weak ties, the bridges that, by limiting the degrees of separation, make the structure of connections within a conceptual field evident. This will also be very useful in dealing with the networked and barely hierarchical management of information and knowledge on the Internet.
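The effect of such a bridge on the degrees of separation can be illustrated with a short sketch. It treats thesaurus links as an undirected graph and counts the links on the shortest path between two terms with a breadth-first search. The function name and the sample terms are assumptions made for this example only.

```python
from collections import deque


def degrees_of_separation(links, start, goal):
    """Breadth-first search over undirected links.

    Returns the number of links on the shortest path from start to goal,
    or None if the two terms are not connected.
    """
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical hierarchical links (broader term, narrower term).
hierarchy = [
    ("environment", "pollution"),
    ("pollution", "air pollution"),
    ("environment", "health"),
    ("health", "respiratory disease"),
]
# A single hypothetical RT link acting as a bridge between two branches.
rt_bridge = [("air pollution", "respiratory disease")]

without_rt = degrees_of_separation(hierarchy, "air pollution", "respiratory disease")
with_rt = degrees_of_separation(hierarchy + rt_bridge, "air pollution", "respiratory disease")
```

With the hierarchy alone, the two terms can only be connected by climbing to the common top term and descending again; the single RT link connects them directly.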

New-generation thesauri thus seem to need a refinement of their structure in order to provide more powerful tools for semantic control and knowledge organization. Nevertheless, other characteristics also seem to be important for their future development.

We should, in fact, be aware of how difficult it is to deal with the semantics of terms. Contemporary lexicons are highly complex and dynamic systems, rich in redundancy, ambiguity, polysemy and so on. The meaning of each term results from a process of cultural sedimentation and stratification. We have to provide tools that ensure semantic control that is as strong as possible. But we must also take great care not to create an excess of compulsory paths or artificially compressed meanings: unwanted effects of this kind should be avoided when applying a highly elaborate net of semantic relationships. Thesauri should not consist of a rigid, static, monolithic structure. It therefore seems reasonable to adopt a more hermeneutical attitude, one that "accepts", to a certain extent, the weak nature of lexicons and maintains flexibility and openness.

Moreover, we believe that ensuring a high degree of modularity in these systems is another important requirement. This should also allow use by those who simply do not need such a fine distinction among thesaurus relations and who are interested in a simpler or more traditional version of the thesaurus relational structure.
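One possible way to obtain this kind of modularity, sketched here under our own assumptions, is to keep the refined relationships as the stored data and derive the traditional view from them on demand. The mapping table, the relationship names and the fallback to RT for unknown relationship types are all hypothetical choices made for this example.

```python
# Hypothetical mapping from refined relationship types to the three
# traditional ones: BT (broader term), NT (narrower term), RT (related term).
REFINED_TO_TRADITIONAL = {
    "is_a_kind_of": "BT",
    "is_part_of": "BT",
    "has_kind": "NT",
    "has_part": "NT",
    "causes": "RT",
    "is_measured_by": "RT",
}


def simplified_view(relations):
    """Collapse (source, kind, target) triples onto BT/NT/RT.

    Unknown relationship types fall back to RT (an assumption of this
    sketch); duplicates produced by the collapse are dropped.
    """
    view = []
    for source, kind, target in relations:
        coarse = REFINED_TO_TRADITIONAL.get(kind, "RT")
        triple = (source, coarse, target)
        if triple not in view:
            view.append(triple)
    return view


# Hypothetical refined relations, for illustration only.
refined = [
    ("smog", "is_a_kind_of", "air pollution"),
    ("air pollution", "causes", "respiratory disease"),
    ("air pollution", "is_measured_by", "respiratory disease"),
]
traditional = simplified_view(refined)
```

The refined triples remain available to applications that can exploit them, while the derived view reads as in a traditional thesaurus, e.g. "smog BT air pollution".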

Of course, things are even more complex when developing multilingual thesauri. The task cannot be reduced to finding equivalents for concepts and terms; its cultural dimension should be considered too. Even when we refer to regional differences within a common general culture, such as Western culture, conceptualizations may differ. In order to better reflect the point of view of each culture and language, a multilingual thesaurus may then be developed with a non-symmetrical structure. In this sense, we also need systems that are able to integrate differences and to work at both local and global levels.
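A non-symmetrical structure could, for instance, be sketched as follows: each language keeps its own hierarchy, and cross-language links carry a degree of equivalence instead of assuming a one-to-one match. The example uses the familiar case of English "river", which French divides into "fleuve" and "rivière"; the data layout, the degree labels and the function name are assumptions made for illustration.

```python
# Each language keeps its own hierarchy (broader term -> narrower terms).
STRUCTURES = {
    "en": {"watercourse": ["river"]},
    "fr": {"cours d'eau": ["fleuve", "rivière"]},
}

# Cross-language links with a hypothetical degree of equivalence.
EQUIVALENCES = [
    ("en", "watercourse", "fr", "cours d'eau", "exact"),
    ("en", "river", "fr", "fleuve", "partial"),
    ("en", "river", "fr", "rivière", "partial"),
]


def equivalents(term, source_lang, target_lang):
    """Return (term, degree) pairs linked to `term` in the target language."""
    found = []
    for lang_a, term_a, lang_b, term_b, degree in EQUIVALENCES:
        if (lang_a, term_a, lang_b) == (source_lang, term, target_lang):
            found.append((term_b, degree))
        elif (lang_b, term_b, lang_a) == (source_lang, term, target_lang):
            found.append((term_a, degree))
    return found
```

The two hierarchies need not have the same number of terms at each level: `equivalents("river", "en", "fr")` yields two partial equivalents, while each French term maps back to the single English one.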

