Artwork

محتوای ارائه شده توسط Michaël Trazzi. تمام محتوای پادکست شامل قسمت‌ها، گرافیک‌ها و توضیحات پادکست مستقیماً توسط Michaël Trazzi یا شریک پلتفرم پادکست آن‌ها آپلود و ارائه می‌شوند. اگر فکر می‌کنید شخصی بدون اجازه شما از اثر دارای حق نسخه‌برداری شما استفاده می‌کند، می‌توانید روندی که در اینجا شرح داده شده است را دنبال کنید.https://fa.player.fm/legal
Player FM - برنامه پادکست
با برنامه Player FM !

[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

2:16:08
 
اشتراک گذاری
 

Manage episode 418754446 series 2966339
محتوای ارائه شده توسط Michaël Trazzi. تمام محتوای پادکست شامل قسمت‌ها، گرافیک‌ها و توضیحات پادکست مستقیماً توسط Michaël Trazzi یا شریک پلتفرم پادکست آن‌ها آپلود و ارائه می‌شوند. اگر فکر می‌کنید شخصی بدون اجازه شما از اثر دارای حق نسخه‌برداری شما استفاده می‌کند، می‌توانید روندی که در اینجا شرح داده شده است را دنبال کنید.https://fa.player.fm/legal

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

55 قسمت

Artwork
iconاشتراک گذاری
 
Manage episode 418754446 series 2966339
محتوای ارائه شده توسط Michaël Trazzi. تمام محتوای پادکست شامل قسمت‌ها، گرافیک‌ها و توضیحات پادکست مستقیماً توسط Michaël Trazzi یا شریک پلتفرم پادکست آن‌ها آپلود و ارائه می‌شوند. اگر فکر می‌کنید شخصی بدون اجازه شما از اثر دارای حق نسخه‌برداری شما استفاده می‌کند، می‌توانید روندی که در اینجا شرح داده شده است را دنبال کنید.https://fa.player.fm/legal

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

55 قسمت

همه قسمت ها

×
 
Loading …

به Player FM خوش آمدید!

Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.

 

راهنمای مرجع سریع