با برنامه Player FM !
Refusal in LLMs is an Affine Function
Manage episode 450334590 series 3524393
The paper introduces affine concept editing (ACE) for controlling language model behavior through activation manipulation, demonstrating improved precision in managing refusal responses across various prompts.
https://arxiv.org/abs//2411.09003
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1687 قسمت
Manage episode 450334590 series 3524393
The paper introduces affine concept editing (ACE) for controlling language model behavior through activation manipulation, demonstrating improved precision in managing refusal responses across various prompts.
https://arxiv.org/abs//2411.09003
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1687 قسمت
All episodes
×به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.