با برنامه Player FM !
“AI companies aren’t really using external evaluators” by Zach Stein-Perlman
Manage episode 420076539 series 3364760
New blog: AI Lab Watch. Subscribe on Substack.
Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no special access since then.[1] Other model evaluators also seem to have little access before deployment.
Frontier AI labs' pre-deployment risk assessment should involve external model evals for dangerous capabilities.[2] External evals can improve a lab's risk assessment and—if the evaluator can publish its results—provide public accountability.
The evaluator should get deeper access than users will get.
- To evaluate threats from a particular deployment protocol, the evaluator should get somewhat deeper access than users will — then the evaluator's failure to elicit dangerous capabilities is stronger evidence [...]
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
May 24th, 2024
Source:
https://www.lesswrong.com/posts/WjtnvndbsHxCnFNyc/ai-companies-aren-t-really-using-external-evaluators
---
Narrated by TYPE III AUDIO.
702 قسمت
Manage episode 420076539 series 3364760
New blog: AI Lab Watch. Subscribe on Substack.
Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it access to models before they are deployed. This is incorrect. METR (then called ARC Evals) did pre-deployment evaluation for GPT-4 and Claude 2 in the first half of 2023, but it seems to have had no special access since then.[1] Other model evaluators also seem to have little access before deployment.
Frontier AI labs' pre-deployment risk assessment should involve external model evals for dangerous capabilities.[2] External evals can improve a lab's risk assessment and—if the evaluator can publish its results—provide public accountability.
The evaluator should get deeper access than users will get.
- To evaluate threats from a particular deployment protocol, the evaluator should get somewhat deeper access than users will — then the evaluator's failure to elicit dangerous capabilities is stronger evidence [...]
The original text contained 5 footnotes which were omitted from this narration.
---
First published:
May 24th, 2024
Source:
https://www.lesswrong.com/posts/WjtnvndbsHxCnFNyc/ai-companies-aren-t-really-using-external-evaluators
---
Narrated by TYPE III AUDIO.
702 قسمت
همه قسمت ها
×به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.