Tutorial 7

Evaluation of NLP systems

Abstract:

This tutorial will present and illustrate several issues in the evaluation of natural language processing systems. It will be structured as follows:
Part 1. Introduction
1.1 Several kinds of evaluation
1.2 Several uses of evaluation
1.3 A typology of evaluation-related concepts
  • Measures
  • Resources
  • Evaluation contests
  • Test setups
  • User involvement
  • Bibliography
1.4 Things to be aware of about evaluation generally

Part 2. NLP as seen through applications (or problems it intends to solve)

Part 3. Some basic concepts
  • precision/recall
  • comparison with human performance
  • baseline
  • gold standard
  • test suite
  • treebank

Part 4. Evaluation in several domains
  • spellcheckers
  • grammar checkers
  • machine translation
  • text alignment
  • information retrieval
  • question-answering
  • Web search
  • parsing
  • POS tagging
  • PP attachment
  • semantic discrimination
  • dictation machines
  • etc.

Part 5. Other bridges between NLP and evaluation

Part 6. Relevant references

--------

Diana Santos received her PhD from IST in 1996 with a dissertation on corpus-based contrastive semantic studies, following an MSc on machine translation in 1988.

She has worked on natural language processing since 1987, when she joined an international machine translation project as leader of the IBM-INESC Scientific Group. She has worked as a researcher at INESC and IBM in Portugal, and at the University of Oslo and SINTEF in Norway. In 1993, she was an associate professor of NLP at IST.

Diana Santos is currently a member of the HCI group at SINTEF Telecom and Informatics, Oslo. Since 1998, she has been in charge of the Computational Processing of Portuguese project (http://www.portugues.mct.pt), a project initiated by the Portuguese Ministry of Science and Technology to foster and advance language engineering for Portuguese, in which evaluation activities play an important part.

Her main interests and activities are translation, evaluation, corpus processing, Web services and semantics. She has also worked on spelling and grammar checking, parsing, and computational lexicography. See her page for publications, most of them available on-line:
http://www.portugues.mct.pt/Diana/public.html