“Self-fulfilling Misalignment Data Might Be Poisoning Our AI Models” by TurnTrout | LessWrong (Curated & Popular) podcast

This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Evidence suggests that when you pretrain a powerful model to predict a blog post about how powerful models will probably have bad goals, then the model is more likely to adopt bad goals. I discuss ways to test for and mitigate these potential mechanisms. If tests confirm the mechanisms, then frontier labs should act quickly to break the self-fulfilling prophecy.
Research I want to see
Each of the following experiments assumes positive signals from the previous ones:

1. Create a dataset and use it to measure existing models
2. Compare mitigations at a small scale
3. An industry lab running large-scale mitigations

Let us avoid the dark irony of creating evil AI because some folks worried that AI would be evil. If self-fulfilling misalignment has a strong [...]
The original text contained 1 image which was described by AI.
---
First published: March 2nd, 2025
Source: https://www.lesswrong.com/posts/QkEyry3Mqo8umbhoK/self-fulfilling-misalignment-data-might-be-poisoning-our-ai
---
Narrated by TYPE III AUDIO.
