International Conference on AI in Work, Innovation, Productivity and Skills

Benchmarks and competitions: How do they help us evaluate AI?

Feb 22, 2022 | 12:45 PM - 1:45 PM


As AI continues to develop, evaluating what systems can and cannot do has become both necessary and challenging for understanding AI’s impact on our societies and guiding future policies. The session will describe some of the ways that computer scientists have evaluated AI systems. It will first consider competitions and benchmarks that have been used in the field, including the well-known Turing Test, work on games such as chess and Go, as well as more specialised datasets. The session will then discuss the more formal evaluation campaigns of the United States National Institute of Standards and Technology (NIST) and the French Laboratoire National de Métrologie et d’Essais (LNE). The speakers will discuss the insights and limitations of these different ways of evaluating AI.

Moderator: José Hernandez-Orallo, Professor, Universitat Politècnica de València, Spain; Senior Research Fellow, Leverhulme Centre for the Future of Intelligence, University of Cambridge, UK

Panellists:
• Anthony Cohn, Professor of Automated Reasoning, University of Leeds
• Guillaume Avrin, Manager, LNE
• Lucy Cheke, Lecturer, Department of Psychology, University of Cambridge

The session will be recorded and available on replay the day after the live stream.

