Content provided by Michaël Trazzi. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Michaël Trazzi or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal

Owain Evans - AI Situational Awareness, Out-of-Context Reasoning

2:15:46

Owain Evans is an AI alignment researcher, a research associate at the Center for Human-Compatible AI (CHAI) at UC Berkeley, and now leads a new AI safety research group.

In this episode we discuss two of his recent papers, “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” and “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data”, along with some listener questions from Twitter.

LINKS

Patreon: https://www.patreon.com/theinsideview
Manifund: https://manifund.org/projects/making-52-ai-alignment-video-explainers-and-podcasts
Ask questions: https://twitter.com/MichaelTrazzi
Owain Evans: https://twitter.com/owainevans_uk

OUTLINE

(00:00:00) Intro

(00:01:12) Owain's Agenda

(00:02:25) Defining Situational Awareness

(00:03:30) Safety Motivation

(00:04:58) Why Release A Dataset

(00:06:17) Risks From Releasing It

(00:10:03) Claude 3 on the Longform Task

(00:14:57) Needle in a Haystack

(00:19:23) Situating Prompt

(00:23:08) Deceptive Alignment Precursor

(00:30:12) Distribution Over Two Random Words

(00:34:36) Discontinuing a 01 Sequence

(00:40:20) GPT-4 Base On the Longform Task

(00:46:44) Human-AI Data in GPT-4's Pretraining

(00:49:25) Are Longform Task Questions Unusual?

(00:51:48) When Will Situational Awareness Saturate?

(00:53:36) Safety And Governance Implications Of Saturation

(00:56:17) Evaluation Implications Of Saturation

(00:57:40) Follow-up Work On The Situational Awareness Dataset

(01:00:04) Would Removing Chain-Of-Thought Work?

(01:02:18) Out-of-Context Reasoning: the "Connecting the Dots" paper

(01:05:15) Experimental Setup

(01:07:46) Concrete Function Example: 3x + 1

(01:11:23) Isn't It Just A Simple Mapping?

(01:17:20) Safety Motivation

(01:22:40) Out-Of-Context Reasoning Results Were Surprising

(01:24:51) The Biased Coin Task

(01:27:00) Will Out-Of-Context Reasoning Scale?

(01:32:50) Checking If In-Context Learning Works

(01:34:33) Mixture-Of-Functions

(01:38:24) Inferring New Architectures From ArXiv

(01:43:52) Twitter Questions

(01:44:27) How Does Owain Come Up With Ideas?

(01:49:44) How Did Owain's Background Influence His Research Style And Taste?

(01:52:06) Should AI Alignment Researchers Aim For Publication?

(01:57:01) How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?

(01:58:52) Could Owain's Research Accelerate Capabilities?

(02:08:44) How Was Owain's Work Received?

(02:13:23) Last Message

55 episodes