Danielle Villa 'Testing Faithfulness of Language Model-Generated Explanations' (25 Sep 2024)
Language models (LMs) are often prompted to explain their outputs for increased accuracy and transparency. However, evidence shows that important factors influencing LM outputs are not always included in LM-generated explanations. For this reason, measuring the faithfulness of LM-generated explanations has emerged as an important problem. Existing solutions tend to focus on global faithfulness, i.e., the general tendency of a model to produce unfaithful explanations. In contrast, this talk discusses a follow-up question-generating framework for measuring local faithfulness, i.e., the faithfulness of individual explanations. The framework uses a cross-examiner model, which is responsible for probing the target model's explanations via targeted follow-up questions.
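The probing loop described above can be sketched as follows. This is a minimal illustration, not the talk's actual implementation: the function names, the toy stand-in models, and the consistency-based scoring rule are all assumptions for the sake of the example.

```python
# Hypothetical sketch of a follow-up-question faithfulness probe.
# All names here (probe_explanation, toy_target, toy_examiner) are
# illustrative assumptions, not the authors' code.

def probe_explanation(target_model, cross_examiner, question, n_rounds=2):
    """Probe a target model's explanation with follow-up questions,
    treating inconsistent answers as evidence of unfaithfulness."""
    answer, explanation = target_model(question)
    transcript = [(question, answer, explanation)]
    inconsistencies = 0
    for _ in range(n_rounds):
        # The cross-examiner generates a targeted follow-up
        # from the conversation so far.
        follow_up = cross_examiner(transcript)
        new_answer, new_explanation = target_model(follow_up)
        transcript.append((follow_up, new_answer, new_explanation))
        # A faithful explanation should survive probing: the target's
        # answers stay consistent under follow-up questioning.
        if new_answer != answer:
            inconsistencies += 1
    # Local faithfulness score for this single explanation.
    return 1.0 - inconsistencies / n_rounds

# Toy stand-ins: a perfectly consistent target and a trivial examiner.
def toy_target(question):
    return ("yes", f"because of {question!r}")

def toy_examiner(transcript):
    last_question = transcript[-1][0]
    return f"Are you sure about: {last_question}?"

score = probe_explanation(toy_target, toy_examiner, "Is 7 prime?")
print(score)  # a consistent target scores 1.0
```

In a real setting, the target and cross-examiner would be LM calls, and the scoring would judge whether the follow-up answers are entailed by the original explanation rather than just string-matching the answers.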