ISKO Italia. Documenti

Evaluating automated subject indexing
A framework

by Koraljka Golub

keynote speech at 7th ISKO Italy Meeting : Bologna : April 20, 2015


The presentation is available in PDF format.


To improve how people search for information, it is important to know to what degree automated subject indexing or classification can be applied, whether based on controlled indexing languages or on keywords derived from the resource itself.

Automated subject assignment has the potential to help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. While some software vendors and experimental researchers claim automated tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations.

The talk reviews issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval. A comprehensive evaluation framework is proposed, informed by a review of the literature on manual and automated indexing.

Three major steps are envisaged.

  1. Automatically assigned terms are compared against a carefully crafted "gold standard". The "gold standard" of subject index terms is developed with input from professional catalogue librarians, end users who are experts in the subject at hand, end users who are inexperienced in the subject, as well as automated subject indexers.
  2. The evaluation takes place in the context of actual information retrieval. This step involves end users searching the indexed collection of resources and marking how relevant each retrieved resource is. The analysis also looks at what caused the retrieval of each document: a cataloguer's term, a subject expert's term, an inexperienced user's term or an automatically assigned term.
  3. The quality of computer-assisted indexing is evaluated in the context of an indexing workflow. The methodology is also enriched by log analysis and questionnaires to help contextualize the results.
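
The comparison in step 1 can be illustrated with a minimal sketch. The abstract does not say how the "gold standard" is merged or which measures are used, so everything here is an assumption made for illustration: the voting rule (a term is kept if at least two contributor groups propose it), the choice of precision, recall and F1, the function names and the sample terms.

```python
from collections import Counter


def build_gold_standard(term_sets, min_votes=2):
    """Merge term sets from several contributor groups.

    Hypothetical rule: keep a term if at least `min_votes` groups proposed it.
    """
    votes = Counter(term for terms in term_sets.values() for term in set(terms))
    return {term for term, count in votes.items() if count >= min_votes}


def precision_recall_f1(assigned, gold):
    """Compare automatically assigned terms with the gold standard."""
    assigned, gold = set(assigned), set(gold)
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up terms for one resource, one set per contributor group from step 1.
contributions = {
    "cataloguer": {"information retrieval", "subject indexing", "evaluation"},
    "subject_expert": {"subject indexing", "evaluation", "automatic classification"},
    "novice_user": {"search", "evaluation", "subject indexing"},
    "automated_indexer": {"subject indexing", "automatic classification", "metadata"},
}

gold = build_gold_standard(contributions, min_votes=2)

# Made-up output of the tool under evaluation.
automatic_terms = {"subject indexing", "metadata", "automatic classification", "libraries"}
precision, recall, f1 = precision_recall_f1(automatic_terms, gold)
print(sorted(gold))
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Exact string matching is the simplest possible comparison. It treats a near-synonym or a broader term as a plain miss, which is one reason the framework supplements this step with retrieval-based and workflow-based evaluation.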


Evaluating automated subject indexing : a framework / Koraljka Golub = (ISKO Italia. Documenti) : 2015.03.31 - 2015.04.17