ISKO Italia. Documenti

Inside semantic Web search engines

Semantic annotation and Natural Language Processing (NLP)

by Mela Bosch


paper presented at the 4th ISKO Italia meeting : Torino : 3 April 2009



The full presentation is available for download in PPS format [1 MB].


Web search engines are constantly being developed to respond to user needs. This development focuses not only on lexical pattern matching, but also on processing the sense of the query.

There are two ways of doing this. The first is to extract content through Natural Language Processing (NLP); the second is to assign semantic descriptors from controlled languages. The technological options available are therefore either free text analysis or semantic annotation. Semantic annotation depends on human interaction, whereas free text analysis does not; the quality of semantic retrieval by means of NLP, however, is still under discussion.

Although these solutions represent contrasting positions in the traditional debate on this matter, the two methodologies are now converging. Semantic Web search engines need many pages to be annotated, which requires an enormous effort, so NLP is an important aid for automatic or semi-automatic annotation. At the same time, the precision of text analysis can be improved by assignment techniques applied by users and professionals.
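The semi-automatic annotation described above can be illustrated with a minimal sketch. It is not taken from the presentation: the vocabulary, the function names `suggest_descriptors` and `annotate`, and the reviewer callback `accept` are all hypothetical, and simple term matching stands in for real NLP. The automatic step proposes descriptors from a controlled vocabulary; a human confirms or rejects each proposal.

```python
import re

# Hypothetical controlled vocabulary (an assumption for illustration):
# each descriptor maps to surface terms that suggest it.
VOCABULARY = {
    "Information retrieval": ["search engine", "search engines", "retrieval", "query"],
    "Natural language processing": ["nlp", "parsing", "chunking", "text analysis"],
    "Semantic annotation": ["annotation", "annotated", "descriptor", "descriptors"],
}


def suggest_descriptors(text, vocabulary=VOCABULARY):
    """Automatic step: propose descriptors whose terms occur in the text.

    Returns (descriptor, matched_terms) pairs, most matched terms first.
    """
    lowered = text.lower()
    suggestions = []
    for descriptor, terms in vocabulary.items():
        # Whole-word matching, so "annotation" does not match "annotated".
        matched = [
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        ]
        if matched:
            suggestions.append((descriptor, matched))
    suggestions.sort(key=lambda pair: len(pair[1]), reverse=True)
    return suggestions


def annotate(text, accept):
    """Human step: keep only the descriptors the reviewer callback accepts."""
    return [
        descriptor
        for descriptor, matched in suggest_descriptors(text)
        if accept(descriptor, matched)
    ]


if __name__ == "__main__":
    page = "Semantic Web search engines need many pages to be annotated."
    # Here the "reviewer" accepts everything; in practice this would be a person.
    print(annotate(page, accept=lambda descriptor, matched: True))
```

In a real setting the matching step would be replaced by proper NLP (for example parsing or chunking), while the human confirmation step would remain the point where annotation quality is controlled.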

In conclusion, the trend is towards collective knowledge systems that are based on human contributions and improve as more people participate. These will possibly be complemented by chunking, clustering, parsing, spell-checking and other NLP algorithms.


Internet search engines are being developed to offer users not only a mechanical response to their queries but also the processing of meaning. Two paths are being explored, and they revive the traditional debate on semantic analysis techniques: extracting content by means of NLP, or assigning semantic descriptors drawn from controlled languages. Today the debate thus appears to be between semantic annotation and free text analysis.

The first requires human interaction and the second does not, but the boundaries are blurring. For semantic search engines to spread, many pages must carry semantic annotation, yet producing it, even semi-automatically, requires an extraordinary workload; NLP techniques are therefore being reconsidered, both to perform the annotation and to keep it consistent with the content. In parallel, new possibilities are being explored for users to actively make semantic annotations on the one hand and, on the other, to normalize the vocabulary in use so as to improve the retrieval quality of automatic NLP algorithms.

In short, we are at a point where it is necessary to think in terms of collective knowledge systems, which are able to provide useful information based on human contributions and which get better as more people participate.


Mela Bosch is currently working as an independent document management and professional Spanish language consultant in Milan. She is a member of the research project on Integrative Level Classification of ISKO Italia. She is also among the E-learning staff at the Journalism Department, La Plata National University, Argentina. Her educational background includes training in Linguistics and Software Engineering in Argentina, E-Learning in Milan and Document Management in Barcelona. She has worked as an ordinary professor at the Library Department, La Plata National University, Argentina. Articles and papers can be found at E-LIS.


Inside semantic Web search engines : semantic annotation and Natural Language Processing (NLP) / Mela Bosch = (ISKO Italia. Documenti) – <> : 2009.03.10 - 2009.04.06 -