#217 Large Language Models are Human-like Annotators. KR 2024 tutorial Part 1
Mounika Marreddy, Subba Reddy Oota, Manish Gupta, Lucie Flek. Large Language Models are Human-like Annotators. 21st International Conference on Principles of Knowledge Representation and Reasoning. Nov 2-8, 2024. Hanoi, Vietnam.
Many knowledge representation and reasoning (KR) tasks require labeled data to effectively train machine learning models. However, human annotation, which entails assigning precise labels to data for KR tasks, is both time-consuming and costly. An even more complex challenge is finding annotators who are suitable for the task, ensuring demographic diversity, and addressing biases in the annotations, all of which are essential. Additionally, annotators need continuous training to stay prepared for challenging reasoning tasks on a daily or weekly basis.

The advent of advanced language models, especially those built on Transformer architectures and pre-trained on large datasets, opens up new possibilities for solving complex math problems (Gemini Team et al. 2023), segmenting long narratives (Michelmann et al. 2023), and more. In particular, large language models (LLMs), especially those trained with natural language instructions, have demonstrated that they can serve as human-like annotators, effectively handling complex natural language processing (NLP) and KR tasks. Previous research has specifically emphasized the effectiveness of LLMs like GPT-3 (Brown et al. 2020) in accurately annotating data for various NLP tasks, including sentiment analysis, keyword relevance, and question answering. This approach has been further enhanced with the advent of instruction-tuned LLMs (Chung et al. 2022), where prompt engineering effectively conditions the model to generate labels, thereby streamlining the annotation process. Recent advancements in interactive LLMs, such as ChatGPT (OpenAI 2023), LLaMA (Touvron et al. 2023), and Claude, have showcased impressive results across a plethora of tasks (Chung et al. 2022; Chia et al. 2023; Ono and Morita 2024). Leveraging these capabilities, recent studies have begun utilizing LLMs as annotators, capable of generating labels in a zero-shot or few-shot manner; a minimal sketch of this workflow appears after the abstract below.

Inspired by traditional program synthesis methods and human techniques in prompt engineering, this tutorial delves into several key aspects: (i) generating annotations for KR tasks using LLMs, (ii) benchmarking LLM annotations, (iii) evaluating LLM-generated annotations, (iv) auto-label tools for annotating tasks, and (v) overcoming hallucinations in LLM annotations and future trends. Our goal is not to go into deep mathematical or theoretical details of deep learning models; the tutorial treats large language models as black boxes. Thus, rather than focusing on the architectural and training details of LLMs, it concentrates on effective and efficient ways of using them for annotation, and has been designed from a practitioner's perspective.
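To make the zero-/few-shot annotation workflow concrete, here is a minimal sketch using an instruction-tuned chat model as the labeler. It uses the OpenAI Python client as one possible backend; the model name, label set, prompt wording, and few-shot examples are illustrative assumptions, not anything prescribed by the tutorial.

# Minimal sketch: an instruction-tuned LLM as a zero-/few-shot annotator.
# Assumptions: OpenAI Python client as backend; the model name, label set,
# and prompt wording below are illustrative, not from the tutorial itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["positive", "negative", "neutral"]  # hypothetical sentiment label set

# Optional few-shot examples prepended to the prompt to condition the model.
FEW_SHOT = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]

def annotate(text: str, few_shot: bool = False) -> str:
    """Ask the LLM to assign one label from LABELS to `text`."""
    prompt = f"Classify the sentiment of the text as one of: {', '.join(LABELS)}.\n"
    if few_shot:
        for example_text, example_label in FEW_SHOT:
            prompt += f"Text: {example_text}\nLabel: {example_label}\n"
    prompt += f"Text: {text}\nLabel:"

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-tuned chat model would do
        messages=[{"role": "user", "content": prompt}],
        temperature=0,        # deterministic decoding for reproducible labels
        max_tokens=5,
    )
    label = response.choices[0].message.content.strip().lower()
    # Guard against free-form answers: fall back to a sentinel if off-label.
    return label if label in LABELS else "unknown"

if __name__ == "__main__":
    print(annotate("The service was slow but the food was excellent."))
    print(annotate("Absolutely loved it!", few_shot=True))

Constraining the model to a fixed label set and decoding at temperature 0 is one common way to make LLM annotations comparable to human ones; the off-label fallback matters in practice because chat models sometimes answer in free prose.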