
The structure of subject classifications for document retrieval

by Brian Vickery

Originally published (2008-2009) on Brian Vickery's personal website <www.lucis.me.uk>, which was closed after the author's death. This version is republished here in consideration of its connection to the ILC project and the León Manifesto, a comment on which developed into this paper.

Classification for what?

A document is any kind of recorded text, in whatever format and in whatever medium; any uniquely identifiable (and hence, separately searchable) part of a larger document; a textual description of a document, such as a catalogue entry or an abstract; any uniquely identifiable part of a description, such as the title field; a title or description of any non-textual item (a picture, a museum object, a video, a program, a data table, etc); and no doubt more.

A subject is a summary indication, on the one hand, of what a document is "about", and on the other hand, of what a searcher is "interested" in.

Document retrieval (by subject) is the act of finding a document on a particular subject in a collection that is held in a store - that may be a library, an archive, a database, the Internet.

How do we find documents in a store? One way is to select separately searchable portions of the text of the document (e.g. the title of a book or chapter or section, or a caption or table heading, etc, or indeed every word in the document) and to treat them as "the subject(s)" of the document: a search mechanism seeking a particular subject then in some way scans the document collection in the store to find documents containing that subject. Alternatively, externally created subject markers can be assigned and added to each document, and be sought by the search mechanism.
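The two routes just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three documents, their texts, and their assigned markers are invented, and `search_text` and `search_markers` are hypothetical names, not part of any real retrieval system.

```python
# Hypothetical miniature store; texts and markers are invented for illustration.
documents = [
    {"id": 1, "text": "Breeding of horses on upland farms",
     "markers": {"horses", "breeding"}},
    {"id": 2, "text": "Equine law: a handbook for owners",
     "markers": {"horses", "law"}},
    {"id": 3, "text": "Diagnosis in general practice",
     "markers": {"diagnosis", "medicine"}},
]


def search_text(store, word):
    """Route 1: scan the words of each document's own text."""
    word = word.lower()
    return [d["id"] for d in store if word in d["text"].lower().split()]


def search_markers(store, term):
    """Route 2: match against externally assigned subject markers."""
    return [d["id"] for d in store if term in d["markers"]]


text_hits = search_text(documents, "horses")
marker_hits = search_markers(documents, "horses")
```

In this toy store the text scan misses document 2, whose title says "equine" rather than "horses", while the assigned marker retrieves it.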

A subject marker is a combination of words, phrases, or other symbols (let us call each of these a "term") intended to express what the document is "about". Normally the decisions on what the document is about, and on the terms to be used to express this, are made humanly (I will not discuss machine assignation of subject markers). For a given document store, the set of terms chosen to figure in subject markers may be called its term list. For convenient use by indexer and searcher, the term list may be displayed in some ordered sequence (e.g. as a thesaurus).

A classification is a term list displayed in some kind of structured form, initially a hierarchy based on the generic, class-inclusion relation, but subsequently more complex.
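A minimal sketch of such a hierarchy, assuming each term records at most one broader term under the generic relation. The terms and the names `broader`, `ancestors`, and `narrower` are hypothetical and chosen only for illustration.

```python
# Hypothetical fragment of a term list; each term points to its broader class.
broader = {
    "Horses": "Mammals",
    "Mammals": "Animals",
    "Birds": "Animals",
    "Animals": None,  # top of this small hierarchy
}


def ancestors(term):
    """Follow the class-inclusion relation upward from a term."""
    chain = []
    parent = broader[term]
    while parent is not None:
        chain.append(parent)
        parent = broader[parent]
    return chain


def narrower(term):
    """List the terms immediately included in a class."""
    return sorted(t for t, p in broader.items() if p == term)
```

A term list without the `broader` links would be a plain vocabulary; the links are what turn it into a classification in the sense used here.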

This paper will discuss the form(s) of classification - and, more particularly, of general classifications of wide scope, intended for use by many document stores. The stimulus for this discussion has come from interaction with Claudio Gnoli and his work.

Traditional classification

Traditionally, general classifications for document retrieval have grouped terms first within a set of "main classes", which are held collectively to make up the "universe of knowledge" represented in the document store. These have emerged historically, and are often known as "disciplines", suggesting that each is an area of subject matter that is the object of study by some group of scholars. As new areas have become of scholarly interest, so new subjects have become candidates as main classes. No specific criteria were put forward as to what kinds of subject should be valid candidates. Ranganathan wrote that we had not yet got a clear-cut definition of a main class, so they had to be postulated. "Of course, the postulation should conform to the ideas about them conventionally current among scholars and currently figuring in educational curricula" (Ranganathan 1967, p.84) - criteria that had been proposed by Bliss. Thus, traditional classifications of wide scope usually have main classes such as Philosophy, Mathematics, Physics, Biology, Agriculture, Medicine, Education, History, Sociology, Law, Engineering, Construction, Industrial chemistry, Metals technology, Literature, Arts, Religion.

Now it is true that these are all subject areas that "currently figure in educational curricula". But that they are "subjects taught" is a secondary characteristic. Their principal characteristic is that they are human activities pursued by philosophers, physicists, agriculturists, medicos, historians, lawyers, engineers, chemists, writers, artists, and other professionals. Each subject is about "what people do and what they do it to". It begins to figure in education only when it becomes socially important enough to need a regular supply of educated entrants to the profession.

Of the total volume of documents that exist in any subject area, relatively few are concerned with educational, or even "scholarly" aspects. People produce documents about, and want to retrieve documents about, what interests them, and what interests a professional is what he does and what he does it to - e.g. a doctor is interested in such topics as diagnosis, prognosis and therapy, medical materials and equipment, the human body, its organs and diseases, hygiene and public health.

Because all such topics are the subjects of documents, they necessarily figure among the subject markers that are assigned, and thus classifications and other term lists do in practice cover this wide range of topics. It has been said "General classifications are based primarily upon subject disciplines, which are methodologies and special points of view usually, but not necessarily, focussing upon a definite set of entities or phenomena" (Coates) - the methodologies being "what people do", and the entities and phenomena being "what they do it to". The steps between the world and classification might be summarised thus:

The existence of both activities and phenomena in classifications was clearly evident when we started to group the terms within classifications into facets. Heavily influenced by my science/technology background, I put forward a set of facets as a "guideline" for those likely to be needed in the classification of a subject field (Vickery 1975). It included both phenomena (things, their parts and attributes, their interactions) and activities (operations on things, and subsequent facets). Similarly, but more briefly, Coates for the BSO found that the commonest facet pattern he used was as follows, in which the facets 1-2 relate to activities and 3-5 to phenomena.

Coates pointed out that while methodologies are generally specific to a subject area (for example, radiotherapy is not used outside medicine), entities and phenomena are not necessarily so specific (for example, the human body is of interest in many fields other than medicine), and so could be present as subject markers in many subject areas, i.e. in many main classes. He made a place in his classification (Broad System of Ordering, BSO), outside the main classes, for entities and phenomena discussed from many points of view (Coates). This raised the question, is there a case for treating "what people do" (i.e. human activities) and "what they do it to" (i.e. entities and phenomena) differently, even separately, in documentary classification?

Phenomena

A case against traditional classification has been put by Gnoli (2008). "Many scholars in bibliographic classification theory have observed that the disciplinary approach is not the only possible one, and that together with benefits (like reflecting the most frequent approach of researchers) it also brings limitations, especially concerning interdisciplinary and innovative research. Indeed, disciplines act like a canonical grid that forces the researchers to follow their conceptual structures, failing to find potentially fruitful cross-disciplinary relations between information items. Interdisciplinary classification is especially desirable in the present age of increasing cross-fertilization between knowledge fields, and of information exchange on a global scale, making impossible to foresee the identity of users that will access a particular knowledge base: today, classification should serve an international and intercultural user target."

The alternative that Gnoli puts forward is as follows: "To abandon disciplines as the primary structuring principle of knowledge organization means that now what should be organized are directly phenomena of the world (as known by us). A good classification scheme should then have phenomena as its primary subdivisions. The scheme should make its users able to express, instead of 'the objects of zoological studies', directly the concept of 'animals', without any a-priori implication of their being studied by genetics, or zoology, or veterinary medicine, or transport history. This does not mean that the approaches and methods of disciplinary study cannot be expressed in the classmarks: but they will be one optional specification among others, while priority will be given to phenomena. Disciplines, in turn, can be viewed as one kind of cultural phenomenon, and be classified themselves as an object of knowledge among the others" (Gnoli, 2008).

In a series of papers, Gnoli has argued strongly in favour of a new structure of classification, based on (entities and) phenomena rather than on traditional main classes, and he has in fact implemented a draft classification of this kind (ILC). "One suitable principle to classify phenomena independently from disciplines has been found to be the notion of integrative levels, also referred to as 'levels of organization' or (less accurately) 'levels of complexity'. These terms refer to the observation that phenomena of the world belong to different ontological levels, spanning from the material ones, to the organic, the mental, and the cultural. This notion can be found in various formulations in the writings of many philosophers and scientists. In the modern age, it is connected with the idea of a cosmic evolution, through which more and more organized entities have been formed (particles, atoms, molecules, celestial objects, cells, organisms, minds, societies, cultures)" (Gnoli, 2008).

In the draft ILC, the "main classes" are mostly entities such as are listed above, but some of them (e.g. Behaviours, Consciousness, Signals, Public health, Wealth, Wisdom) cannot easily be so considered. Gnoli comments "I acknowledge that the captions of some of my main classes don't look like entities. However, they would be intended to mean generically the corresponding entities associated with that level. Maybe the most problematic case is 'Behaviours'. This term is usually referred to processes, so in notation it should correspond to a process facet of the corresponding entities. What actually behaves are 'intelligent animals/animated agents/active subjects', but I fail to find a suitable term suggesting immediately what I'm speaking about, so I provisionally use 'behaviours'. Something similar happens with the others that you mention. Your comment is one more reason to find better terms" (private correspondence).

It appears, therefore, that by "phenomena" Gnoli means all entities of the world (physical, organic, mental and cultural), at a series of levels of organisation, and the facets associated with each. The set of facets used by Gnoli for phenomena is as follows (I have combined terms from various lists):

It is not clear how "human activities", as discussed above, fit into the schedule, either as main classes or as facets.

Commentary

Many documentary subjects relate to particular phenomena considered from a particular viewpoint, the consideration being a human activity. Gnoli complains that, in a disciplinary classification, a phenomenon is structurally "tied" to one particular viewpoint. Particular examples of the phenomenon are therefore scattered among all the viewpoints that have considered it. By making a phenomenon into a main class, all its occurrences are brought together, all the viewpoints from which it has been considered. But conversely, particular examples of each viewpoint then become scattered among all the phenomena that they have considered, even though, as Gnoli agreed above, viewpoints reflect the most frequent approach of researchers. Gnoli quotes, "concepts are not bound to disciplinary classes, but organised in classes of phenomena". But this simply frees them from one bounding structure to deliver them to another.
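The symmetry of the scattering problem can be shown with a small sketch, assuming each subject is reduced to a (phenomenon, activity) pair. The pairs and the helper `group_by` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical subjects, each reduced to a (phenomenon, activity) pair.
subjects = [
    ("horses", "breeding"),
    ("horses", "law"),
    ("horses", "medicine"),
    ("cattle", "breeding"),
    ("cattle", "medicine"),
]


def group_by(pairs, position):
    """Collocate pairs on one element; the other element is scattered."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[position]].append(pair[1 - position])
    return dict(groups)


by_phenomenon = group_by(subjects, 0)  # phenomena as main classes
by_activity = group_by(subjects, 1)    # activities (viewpoints) as main classes
```

Grouping by phenomenon gathers everything on horses but spreads "breeding" over two classes; grouping by activity does the reverse.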

The solution, as Gnoli has stressed elsewhere (Gnoli 2007), is "free faceting". "Every concept has a constant notation, and can be combined with any other, by expressing the kind of relationship between them", as can be readily implemented on computer. To me, this implies that there could be two schedules, one listing phenomena in all their variety, the other listing "viewpoints" (or preferably, activities) in all their variety, so that each set of concepts has the same freedom. Each human activity can then, in principle, be applied to any phenomenon (e.g. we can sell anything). Each phenomenon can be made the object of any activity (horses can be bred, reared, sold, be subject to law, medically treated, used in polo, used by the police, painted, studied scientifically).
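One possible reading of this two-schedule arrangement is sketched below. Everything in it is an assumption made for illustration: the notations `p1`, `a1`, and so on are invented and are not taken from ILC, BSO, or any other scheme, and only a single kind of relationship is modelled.

```python
# Hypothetical schedules; the notations are invented, not from any real scheme.
PHENOMENA = {"p1": "horses", "p2": "molecules"}
ACTIVITIES = {"a1": "breeding", "a2": "selling", "a3": "law"}
RELATIONS = {"obj": "as object of"}  # assumed single relationship kind


def combine(phenomenon, relation, activity):
    """Join two constant notations with an explicit relationship."""
    for code, schedule in ((phenomenon, PHENOMENA),
                           (relation, RELATIONS),
                           (activity, ACTIVITIES)):
        if code not in schedule:
            raise KeyError(f"unknown notation: {code}")
    return (phenomenon, relation, activity)


def caption(marker):
    """Spell out a compound marker in words."""
    phenomenon, relation, activity = marker
    return f"{PHENOMENA[phenomenon]} {RELATIONS[relation]} {ACTIVITIES[activity]}"


def search(markers, code):
    """Find every compound marker that contains a given notation."""
    return [m for m in markers if code in m]


markers = [
    combine("p1", "obj", "a1"),
    combine("p1", "obj", "a3"),
    combine("p2", "obj", "a2"),
]
```

Because each notation is constant wherever it occurs, the same `search` call serves equally for a phenomenon (`"p1"`) or an activity (`"a2"`), which is the freedom argued for above.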

So overall, the Phenomena classes would list what is known to exist in the world, the entities and their characteristics. In contrast, the Activity classes would list in what ways men interact with phenomena (natural, man-made, and other people), seeking to understand them and to direct and use them to meet human needs. These human needs might be summarised as: food and nutrition, water, clothing and adornment, housing and furnishings, good health, satisfying work, leisure and recreation, sports and games, tourism and travel, good environment, artistic insight, moral support, security and personal protection, knowledge. There are also secondary needs, e.g. for processes, tools (physical and mental), machines and materials used in satisfying the primary needs. Areas of human activity to meet all these needs might be summarised as:

Some examples

I thus visualise two schedules in a classification, one for Phenomena and one for Activities (maybe more for bibliographic form, etc). I will illustrate this with an example from chemistry. Activities cover what chemists do and how they do it, and Phenomena the entities that they do it to, and their attributes. We may need to combine terms from both if we are to represent specific subjects.

PHENOMENA/Molecules (chemical level)

I would expect this class to contain facets such as:

ACTIVITIES/Chemical science

Facets such as:

This is probably a relatively easy example. Chemicals, in principle, are "naturally occurring" entities, studied and used by man in a variety of ways - although in practice, of course, most of the now known chemical compounds have been man-made. They and their attributes can be listed as Phenomena, and various Activity schedules would combine with them to form subjects, e.g. Chemical science, Medicine (drugs), Agriculture (e.g. pesticides). It has been recognised that any entity, such as a chemical, may crop up in many contexts, because it can play various roles or functions within different activities, for example, that of:

More problems arise with wholly man-made artefacts - where should they be listed among Phenomena? This was discussed by Coates: "One problem is simply that of individualising the great number of kinds of products which emerge from technical processes. In BSO products defined by purpose or designed for a particular purpose are classed at the end of the Technology schedule, and individualised by reference to the BSO code for the particular purpose, elsewhere in the scheme. It is necessary to emphasise 'elsewhere in the scheme' as the purpose of some products is simply to contribute to more complex technology. Such products (e.g. Switchgear) with a role internal to technology are normally enumerated within the BSO Technology schedules" (Coates).

Second, what range of Activity classes is needed to cover the human operations associated with artefacts, such as design, manufacture, testing, maintenance, use? How specific to particular groups of artefacts will such operations and their facets be? Consider the Colon class Food technology, which includes the following very specific facets (both phenomena, facets 1-5, and activities, facets 6-9):

A further level of difficulty appears when we come to mental and cultural phenomena. Here we have people who are both the phenomena and the sources of activity. For example:

PHENOMENA/People, persons

Facets such as:

(1) Personal characteristics e.g.
Male/female
Adult/child/age group
Sexual preference: heterosexual, homosexual (gay, lesbian), bisexual
Literate/illiterate
Handicapped, sick, disabled
Nationality
Ethnic, linguistic, religious, etc group
Married/single/separated/divorced/widowed
(2) Senses: e.g. sight, smell, taste, touch, hearing, pain perception, proprioception
(3) Emotions, moods, temperaments
(4) Higher mental processes, e.g. ideation, imagination, memory, thought, reasoning, calculation, learning
(5) Personal actions, e.g. walking, running, resting, swimming, diving, jumping, sleeping, feeding (eating), defecating, urinating, watching, talking, listening, hitting, kicking, weeping, laughing, singing, dancing, gesticulating, shouting, kissing, copulating, giving birth, reading, writing, manipulating (objects of every kind)
(6) Social behaviour, e.g. cooperation, competition, communication, imitation, conflict, conformity, deviance
(7) Occupational status, level (managerial, professional, associate professional, administrative and secretarial, skilled trade, personal service, sales and customer service, operative, elementary), unemployed, retired, homekeeper, student [Educational level of student, e.g. pre-school, primary, secondary, higher, postgraduate, vocational, further education], prisoner, hospitalised, self-supporting
(8) Occupational field
(9) Needs: food and nutrition, water, clothing and adornment, housing and furnishings, health, satisfying work, leisure and recreation, good environment, moral support, security and protection, knowledge

ACTIVITIES/Behavioural science

Facets such as:

(1) Data acquisition, e.g. interview, survey, observation, participant observation, textual analysis, experiment
(2) Data analysis, e.g. statistical analysis, classification, mathematical modelling
(3) Theories, schools

People as phenomena would also be the target of activities such as Law, Welfare, Medicine, Commerce, etc.

To conclude

This paper has only been able to present a tentative proposal as to how a future general classification might be structured. The devil, of course, is in the details, and I am in no position to be able to get into them. I applaud Claudio Gnoli for his draft listing of phenomena, and hope that ideas in this paper may prove of some use in the discussions about new classification structures that are now going on.

References

Coates, E.J. Broad System of Ordering. http://www.ucl.ac.uk/fatks/bso
Gnoli, C. (2007). Classic versus freely faceted classification. http://www.iskoi.org/ilc/iskouk.ppt
Gnoli, C. (2008). Categories and facets in integrative levels. Axiomathes, 18, n. 2, p. 177-192. http://springerlink.com.
Ranganathan, S.R. (1967). Prolegomena to library classification, 3rd ed.
Vickery, B.C. (1975). Classification and indexing in science, 3rd ed.

 


The structure of subject classifications for document retrieval / Brian Vickery = (ILC) — <http://www.iskoi.org/ilc/vickery.php> : 2010.04.15 - 2011.07.20 -   [vickery.htm until 2011.07.20]
« <http://www.lucis.me.uk/classification.htm> : 2008-2009