Indexing concepts and/or named entities
Building: Main building
Room: Hall V
Date: 2010-02-23 05:30 PM – 06:00 PM
Last modified: 2009-12-31
Abstract
[Submitted condensed paper]
The pair “concept or named entity” appears in the first of IFLA’s Principles underlying subject heading languages (SHLs) (1999). The principle of the uniform heading refers to both in the same way. The principles that follow mention “names of persons, places, families, corporate bodies and works” in order to recommend the same form used in author and title catalogues. No other difference is stated between the two elements of the pair.
Therefore it is assumed, and indeed commonly practiced in alphabetical subject indexing, that named entities receive separate treatment in authority control systems or authority files, particularly persons, corporate bodies and geographical entities. This affects only the vocabulary, the form of headings, the morphology or formal side of indexing. We are not concerned here with the problems that normally arise when a name, instead of standing alone, is incorporated in a phrase, for example the question of whether it should keep the inverted form or take the direct one (e.g. Romantic drama – Influence by Shakespeare, William, or – Influence by William Shakespeare).
The real question is: what are the differences, if any, between concepts and named entities with regard to the theory and the models of subject indexing?
In IFLA’s final report on Functional requirements for bibliographic records, FRBR (1998), the relationship ... is subject of a work (and, reciprocally, a work has as a subject ...) has not been developed. The entities considered for the subject relationship are concept, object, event, place (group 3 entities), plus all the other entities already considered in the model for bibliographic resources (work, expression, manifestation, item, person, family, corporate body). Note that, following this flaw, the Statement of International Cataloguing Principles, approved by IFLA in 2009 and aiming to cover “all aspects of bibliographic and authority data used in library catalogues”, has to admit that “with regard to subject thesauri, there are additional principles that apply but are not yet included in this statement”. In the draft of the new international rules Resource Description and Access, RDA, which replaces AACR2 and was published in November 2008, chapters 33-37 on indexing are empty: “to be developed after the first release of RDA in 2009”.
The lack of international agreement on subject indexing is historical. Starting from different traditions, it is not easy to reach agreement now, but we have sound theories (e.g. S.R. Ranganathan, the Classification Research Group) and tools (ISO, NISO, BSO standards for subject analysis and thesaurus construction) at our disposal.
We probably need to reconsider current practices as they are reflected in the FRBR model. The entities in an entity-relationship model are classes of individual instances, each normally identified by a unique number or code (id) for the functions of a database, so it is convenient to apply the model to bibliographic entities like works, persons, corporate bodies. Each instance of the entity work is a distinct individual work: Hamlet or King Lear, the 5th or the 9th symphony by Beethoven, and so on; never a collective entity like “tragedies” or “symphonies”. Each instance of the entity person is a real person, with a personal name; it is never a group or class of persons, like engineers, psychiatrists or madmen, and so on. Therefore only “named entities” have been accepted in the FRBR model as entities functioning as subjects: concepts, objects, events, places are exemplified with individual instances like Romanticism, Buckingham Palace, The Battle of Hastings, Bristol.
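The point about instances can be illustrated with a minimal sketch. It is not part of FRBR: the names Instance, records and instances_of are hypothetical, and the id values are arbitrary. It only shows that each row of such a model is one identified individual, so a class term like “tragedies” has no row of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One individual instance of an entity type (hypothetical structure)."""
    id: int       # arbitrary unique identifier, as a database would assign
    entity: str   # entity type, e.g. "work", "person", "place"
    name: str     # the proper name denoting the individual


# Examples taken from the text; the id values are assumptions.
records = [
    Instance(1, "work", "Hamlet"),
    Instance(2, "work", "King Lear"),
    Instance(3, "concept", "Romanticism"),
    Instance(4, "object", "Buckingham Palace"),
    Instance(5, "event", "The Battle of Hastings"),
    Instance(6, "place", "Bristol"),
]


def instances_of(entity, table):
    """Return the names of all individual instances of one entity type."""
    return [row.name for row in table if row.entity == entity]


works = instances_of("work", records)
```

Here works holds the two individual works, Hamlet and King Lear; nothing in the table stands for the class to which both belong.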
However, if we browse the subject headings in any subject catalogue, we find many nouns in the plural (or in the singular) for common concepts, objects, events, places, as well as for works, persons, corporate bodies.
We have already gained a good basis from which to take a step towards a satisfactory model of subject relationships: considering “concept” as a “unit of thought that can be expressed by a single term” (as in ISO 2788), using the term “theme” to refer to the entity that acts as the subject of a work, and recognizing the existence of relationships among the different concepts in a theme (as presented in The Italian model at ISKO 2008 in Montreal).
Another step would be to analyse named entities in order to test whether they are rightly separated from, or identified with, concepts.
On the one hand, we must consider the name of an individual entity, its proper name, only as a referent, a way to denote it without saying anything about it. Thus it lacks the features of a concept, the qualities that permit a subject to be part of reasoning, of discourse, and to be the theme of a work, until we know something about it. A proper name has no definition, while for a concept we can obtain a definition, which relates it to other concepts and gives it a particular place or role in knowledge. A proper name has only denotation and no connotative property, as in some linguistic theories. The instance relationship typically used in thesauri and subject indexing systems to connect a proper name to a term for the category of persons, places, things, etc. of which it is an example seems to deny this position. But is it a real semantic relationship (a priori, independent of circumstances and documents), or is it only a useful device for collecting individuals and making their overall recall easier, without surveying the members of the category? Is a proper name allowed to have more than one instance relationship?
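The two questions about the instance relationship can be made concrete with a small sketch. The category terms below are illustrative assumptions, not taken from any real thesaurus, and the names instance_of and members_by_category are hypothetical. The sketch treats the relationship purely as a collecting device and allows one proper name to be linked to more than one category.

```python
from collections import defaultdict

# Proper name -> category terms it is recorded as an instance of.
# The category terms are assumptions chosen only for illustration.
instance_of = {
    "Shakespeare, William": {"Dramatists", "Poets"},
    "Buckingham Palace": {"Palaces"},
    "Bristol": {"Cities", "Ports"},
}


def members_by_category(links):
    """Invert the links so that each category collects its named individuals."""
    index = defaultdict(set)
    for name, categories in links.items():
        for category in categories:
            index[category].add(name)
    return dict(index)


index = members_by_category(instance_of)
```

Looking up index["Poets"] recalls every individual filed under that category without listing the members by hand, which is the collecting function described above. Whether such a link is also a genuine a priori semantic relationship is exactly what the sketch leaves open.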
On the other hand, we must recognize that we cannot hear a famous proper name without associating it with many notions: the life, image and activities of a person, or the position and history of a place. So we immediately connect many proper names to their referents, but also to connotations arising from our knowledge, without anybody needing to mention them.
In classification systems individuals have no dedicated notation, but are classed in different ways depending on the discourse in which they are involved. So there are no classes of one, and indeed they could not exist if, following the teaching of Ranganathan, the most specific class may always be subdivided further. We can equally say that individuals or named entities are not considered by classifications and, at the same time, that they are included in classifications.
From a conceptual point of view, we consider that a named entity enters the subject relationship when it is assumed as (or transformed into) a concept: what is said about it.
From a practical point of view, challenging the semantic value of proper names and the place of named entities in the flow of information and in the organization of knowledge is a means to better understand and carry out the indexing process, even in comparison with derived indexing.