[QA] Inference-Time Scaling for Generalist Reward Modeling

This paper explores how to improve reward modeling for large language models, and how to scale it at inference time, by combining pointwise generative reward modeling (GRM) with Self-Principled Critique Tuning (SPCT). The authors report improved performance and quality.
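The description does not spell out how inference-time scaling works. The sketch below assumes one simple reading: sample several independent judgments from the same pointwise reward model, where each judgment gives every candidate response its own score, and then sum each response's scores across samples. This is a voting-style aggregation, and the paper's exact procedure may differ. The names `toy_judge`, `scaled_pointwise_reward` and `best_response` are hypothetical. `toy_judge` is only a seeded random stub standing in for a generative model whose critique has already been parsed into scores.

```python
import random
from typing import Callable, List

# A judge maps (query, responses, seed) to one pointwise score per response.
# In a real system this would be a generative reward model whose sampled
# critique is parsed into scores; here it is an assumption for illustration.
Judge = Callable[[str, List[str], int], List[int]]


def toy_judge(query: str, responses: List[str], seed: int) -> List[int]:
    """Hypothetical stub: a noisy 1-10 score per response, varying with the seed."""
    rng = random.Random(seed)
    scores = []
    for r in responses:
        base = min(10, max(1, len(r) // 5))  # placeholder "quality" signal
        noise = rng.choice([-1, 0, 1])       # sampling variability between judgments
        scores.append(min(10, max(1, base + noise)))
    return scores


def scaled_pointwise_reward(judge: Judge, query: str,
                            responses: List[str], k: int) -> List[int]:
    """Sample k independent judgments and sum each response's pointwise scores."""
    if k < 1:
        raise ValueError("k must be at least 1")
    totals = [0] * len(responses)
    for seed in range(k):
        scores = judge(query, responses, seed)
        if len(scores) != len(responses):
            raise ValueError("judge must return one score per response")
        for i, s in enumerate(scores):
            totals[i] += s
    return totals


def best_response(judge: Judge, query: str, responses: List[str], k: int) -> int:
    """Index of the response with the highest aggregated score."""
    totals = scaled_pointwise_reward(judge, query, responses, k)
    return max(range(len(responses)), key=lambda i: totals[i])


if __name__ == "__main__":
    candidates = ["short", "a somewhat longer and more detailed answer"]
    print(scaled_pointwise_reward(toy_judge, "example query", candidates, k=8))
    print(best_response(toy_judge, "example query", candidates, k=8))
```

In this sketch, increasing `k` spends more inference compute to average out noise in individual judgments. Because scoring is pointwise, the same code handles any number of candidate responses.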