با برنامه Player FM !
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
Manage episode 376038108 series 3364760
In Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.
The paper contained the striking plot reproduced below, which shows sycophancy
- increasing dramatically with model size
- while being largely independent of RLHF steps
- and even showing up at 0 RLHF steps, i.e. in base models!
[...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?
At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here.
I found very different results for these models:
- OpenAI base models are not sycophantic (or only very slightly sycophantic).
- OpenAI base models do not get more sycophantic with scale.
- Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.
Source:
https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓
543 قسمت
Manage episode 376038108 series 3364760
In Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.
The paper contained the striking plot reproduced below, which shows sycophancy
- increasing dramatically with model size
- while being largely independent of RLHF steps
- and even showing up at 0 RLHF steps, i.e. in base models!
[...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?
At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here.
I found very different results for these models:
- OpenAI base models are not sycophantic (or only very slightly sycophantic).
- OpenAI base models do not get more sycophantic with scale.
- Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.
Source:
https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓
543 قسمت
همه قسمت ها
×به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.