LW - What is SB 1047 *for*? by Raemon
Emmett Shear asked on Twitter:
I think SB 1047 has gotten much better from where it started. It no longer appears actively bad. But can someone who is pro-SB 1047 explain the specific chain of causal events where they think this bill becoming law results in an actual safer world? What's the theory?
And I realized that AFAICT no one has concisely written up what the actual story for SB 1047 is supposed to be.
This is my current understanding. Other folk here may have more detailed thoughts or disagreements.
The bill isn't sufficient on its own, but it's not regulation for regulation's sake, because it's specifically a piece of the regulatory machine I'd ultimately want built.
Right now, it mostly solidifies the safety processes that existing orgs have voluntarily committed to. But, we are pretty lucky that they voluntarily committed to them, and we don't have any guarantee that they'll stick with them in the future.
For the bill to succeed, we do need to invent good third-party auditing processes that are not just a bureaucratic sham. This is an important, big scientific problem that isn't solved yet, and it's going to be a big political problem to make sure that the processes which become consensus are good instead of regulatory-captured. But figuring that out is one of the major goals of the AI safety community right now.
The "Evals Plan" as I understand it comes in two phases:
1. Dangerous Capability Evals. We invent evals that demonstrate a model is capable of dangerous things (including manipulation/scheming/deception-y things, and "invent bioweapons" type things).
As I understand it, this is pretty tractable, although labor intensive and "difficult" in a normal, boring way.
2. Robust Safety Evals. We invent evals that demonstrate that a model capable of scheming is nonetheless safe - either because we've proven what sort of actions it will choose to take (AI Alignment), or because we've proven that we can control it even if it is scheming (AI Control). AI Control is probably easier at first, although limited.
As I understand it, this is very hard, and will require new breakthroughs.
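To make the first phase concrete, a dangerous-capability eval can be sketched as a simple harness: run the model on a benchmark of dangerous tasks, grade each attempt, and trigger if the success rate crosses a threshold. This is a hypothetical illustration of the general shape, not the bill's actual mechanism or any existing eval suite; every name and threshold here is made up.

```python
# Minimal sketch of a dangerous-capability eval harness.
# All names, tasks, and thresholds are hypothetical illustrations.

from typing import Callable, List


def capability_eval(
    model: Callable[[str], str],          # model under test: prompt -> answer
    tasks: List[str],                     # dangerous-task prompts
    grader: Callable[[str, str], bool],   # did the answer accomplish the task?
    threshold: float = 0.1,               # trigger if >10% of tasks succeed
) -> bool:
    """Return True if the model demonstrates the dangerous capability."""
    successes = sum(grader(t, model(t)) for t in tasks)
    return successes / len(tasks) > threshold


# Toy usage: a "model" that refuses everything never triggers the eval.
refusing_model = lambda prompt: "I can't help with that."
tasks = ["task-1", "task-2", "task-3"]
grader = lambda task, answer: "step-by-step" in answer  # toy grading rule
print(capability_eval(refusing_model, tasks, grader))  # False
```

The hard part in practice is the grader and the task set, not the loop: the labor-intensive work mentioned above is building graders that can't be gamed and tasks that actually correlate with real-world danger.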
The goal with SB 1047, as I understand it, is roughly:
First: Capability Evals trigger
By the time the capability evals trigger for the first time, we have a set of evals that are good enough to confirm "okay, this model isn't actually capable of being dangerous" (and probably the AI developers continue unobstructed).
But, when we first hit a model capable of deception, self-propagation, or bioweapon development, the eval will trigger "yep, this is dangerous." And then the government will ask "okay, how do you know it's not dangerous?"
And the company will put forth some plan, or internal evaluation procedure, that (probably) sucks. And the Frontier Model Board will say "hey Attorney General, this plan sucks, here's why."
Now, the original version of SB 1047 would have included the Attorney General saying "okay yeah your plan doesn't make sense, you don't get to build your model." The newer version of the bill, I think, basically requires additional political work at this phase.
But the goal of this phase is to establish "hey, we have dangerous AI, and we don't yet have the ability to reasonably demonstrate we can render it non-dangerous," and to stop development of AI until companies figure out plans that at _least_ make enough sense to government officials.
Second: Advanced Evals are invented, and get woven into law
The way I expect a company to prove their AI is safe, despite having dangerous capabilities, is for third parties to invent a robust version of the second set of evals, and then for new AIs to pass those evals.
This requires a set of scientific and political labor, and the hope is that by the...