"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist

LessWrong (Curated & Popular)

محتوای ارائه شده توسط LessWrong. تمام محتوای پادکست شامل قسمت‌ها، گرافیک‌ها و توضیحات پادکست مستقیماً توسط LessWrong یا شریک پلتفرم پادکست آن‌ها آپلود و ارائه می‌شوند. اگر فکر می‌کنید شخصی بدون اجازه شما از اثر دارای حق نسخه‌برداری شما استفاده می‌کند، می‌توانید روندی که در اینجا شرح داده شده است را دنبال کنید.https://fa.player.fm/legal

۲ سال پیش 4:43

MP3•خانه قسمت

In Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question.

The paper contained the striking plot reproduced below, which shows sycophancy

increasing dramatically with model size
while being largely independent of RLHF steps
and even showing up at 0 RLHF steps, i.e. in base models!

[...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" character agrees with the user so strongly, lacking any other information about the scenario?

At the time, I ran one of Anthropic's sycophancy evals on a set of OpenAI models, as I reported here.

I found very different results for these models:

OpenAI base models are not sycophantic (or only very slightly sycophantic).
OpenAI base models do not get more sycophantic with scale.
Some OpenAI models are sycophantic, specifically text-davinci-002 and text-davinci-003.

Source:
https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓

606 قسمت

#Tech #Society #Philosophy #LessWrong #LessWrong Curated