13 subscribers
"Sum-threshold attacks" by TsviBT
How do you affect something far away, a lot, without anyone noticing?
(Note: you can safely skip sections. It is also safe to skip the essay entirely, or to read the whole thing backwards if you like.)
Source:
https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks
Narrated for LessWrong by TYPE III AUDIO.
Share feedback on this narration.
[125+ Karma Post] ✓
562 episodes
All episodes
“Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda 11:13
“Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah 2:15
“Comparing risk from internally-deployed AI to insider and outsider threats from humans” by Buck 5:19
“Why Do Some Language Models Fake Alignment While Others Don’t?” by abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger 11:06
“A deep critique of AI 2027’s bad timeline models” by titotal 1:12:32
“Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild” by Adam Karvonen, Sam Marks 7:56