AF - Fields that I reference when thinking about AI takeover prevention by Buck Shlegeris
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fields that I reference when thinking about AI takeover prevention, published by Buck Shlegeris on August 13, 2024 on The AI Alignment Forum.
Is AI takeover like a nuclear meltdown? A coup? A plane crash?
My day job is thinking about safety measures that aim to reduce catastrophic risks from AI (especially risks from egregious misalignment). The two main themes of this work are the design of such measures (what's the space of techniques we might expect to be affordable and effective) and their evaluation (how do we decide which safety measures to implement, and whether a set of measures is sufficiently robust).
I focus especially on AI control, where we assume our models are trying to subvert our safety measures and aspire to find measures that are robust anyway.
Like other AI safety researchers, I often draw inspiration from other fields that contain potential analogies. Here are some of those fields, my opinions on their strengths and weaknesses as analogies, and some of my favorite resources on them.
Robustness to insider threats
The single field that seems most closely analogous to AI control is reducing vulnerability to threats resulting from insiders, especially threats that arise from access to IT systems. That is, you have a bunch of employees who need to have access to various IT systems in order to do their jobs, and you're worried that some of these employees will abuse their access in order to accomplish some nefarious purpose.
In a bank, you might worry about your tellers withdrawing money from customer accounts inappropriately. In a software company, you might worry about your employees sabotaging your code out of spite, violating user privacy for fun, or stealing your data to give it to a competitor.
This is quite similar to the problem faced by AI developers when they're trying to control AIs (especially in internal deployments, which I argue is important here):
All of the access the AI has is access the developer decided to give it; the developer faces a constant tradeoff between the productivity of the AI (which is maximized by being permissive) and the danger it poses (which is minimized by being restrictive). And obviously, because it's an AI, it only has access to resources that are connected to the computer network where it's running, so we can restrict our attention to IT resources.
It's easy for the developer to constantly surveil the AI, which substantially reduces the space of strategies the AI can take.
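To make the surveillance point concrete, here is a minimal sketch (not from the post; the function name, log path, and timeout are illustrative assumptions) of the kind of wrapper a developer could put around every command an AI agent runs, so that no action happens off the record:

```python
import json
import subprocess
import time

AUDIT_LOG = "ai_agent_actions.jsonl"  # hypothetical append-only audit trail

def run_monitored(agent_id: str, command: list[str]) -> str:
    """Execute a command proposed by an AI agent, recording it first so every
    action is available for automated monitoring and later human review."""
    record = {
        "timestamp": time.time(),
        "agent_id": agent_id,
        "command": command,
    }
    # Log before executing: the agent cannot act without leaving a record.
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(record) + "\n")
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    return result.stdout
```

In a real deployment the log would feed both automated monitors and periodic human audits; the point of the sketch is just that total logging is cheap for AI agents in a way it often isn't for human employees.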
The developer has a lot of control over workflows, and is able to choose them to make it harder for the AI to subtly cause problems. For example, they can restrict affordances in ways that make it harder for AIs to cause problems without being noticed. An example of a similar technique in the insider threat prevention case is the use of administrative APIs instead of direct access to a system, as described in "Building Secure and Reliable Systems" chapter 5.
That is: instead of having your engineers resolve issues by SSHing into production machines (which implicitly grants them very broad access that can lead to problems, intentional or accidental), you design an administrative API which allows the engineers to access just the affordances that usually suffice for them to do their job. Of course, they'll sometimes need to take actions that aren't available in the administrative API; this is called a breakglass mechanism.
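As a rough illustration of the administrative-API pattern (the class, service names, and methods here are hypothetical, not taken from the book or the post), compare arbitrary SSH access with an interface that exposes only the affordances engineers usually need:

```python
# Minimal sketch: callers get a handful of narrow affordances rather than a
# shell on the production machine. Anything outside this surface has to go
# through a separate breakglass path.

ALLOWED_SERVICES = {"web-frontend", "billing-worker"}

class AdminAPI:
    def restart_service(self, service: str) -> None:
        if service not in ALLOWED_SERVICES:
            raise PermissionError(f"{service} is not managed through this API")
        self._enqueue({"action": "restart", "service": service})

    def tail_logs(self, service: str, lines: int = 100) -> list[str]:
        if service not in ALLOWED_SERVICES:
            raise PermissionError(f"{service} is not managed through this API")
        # Recent log lines only: no arbitrary file reads, no arbitrary commands.
        return self._read_recent_logs(service, lines)

    # Server-side internals, out of the caller's reach; stubbed here only to
    # keep the sketch self-contained and runnable.
    def _enqueue(self, request: dict) -> None:
        print(f"queued admin request: {request}")

    def _read_recent_logs(self, service: str, lines: int) -> list[str]:
        return [f"{service}: log line {i}" for i in range(lines)]
```

The design choice is that the API, not the caller, decides what operations exist at all, which makes both misuse and honest mistakes easier to prevent and to notice.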
But because breakglass actions are rare, it's affordable to apply more scrutiny to them: you might use multi-party authorization, requiring another engineer to sign off on the commands before they take effect, or you might do auditing (e.g. apparently some SRE teams review all the breakglass actions at a weekly team meeting, which allows them to spot irregularities and discourage gratuitous use of the mechanism). Similar t...
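Here is a rough sketch of how multi-party authorization and auditing for breakglass actions could fit together. All names, the in-memory audit list, and the approval rule are assumptions made for illustration, not a description of any real SRE tooling:

```python
import time

# Breakglass flow: a raw command is only released once a second engineer
# approves it, and every request is retained for later audit (e.g. review
# at a weekly team meeting).

BREAKGLASS_AUDIT: list[dict] = []

def request_breakglass(requester: str, command: str, justification: str) -> dict:
    entry = {
        "timestamp": time.time(),
        "requester": requester,
        "command": command,
        "justification": justification,
        "approver": None,
        "executed": False,
    }
    BREAKGLASS_AUDIT.append(entry)
    return entry

def approve_and_execute(entry: dict, approver: str) -> None:
    if approver == entry["requester"]:
        raise PermissionError("multi-party authorization requires a second engineer")
    entry["approver"] = approver
    entry["executed"] = True
    # A real system would run the command here; the sketch only records it.
    print(f"executing breakglass command for {entry['requester']}: {entry['command']}")

# Usage: alice requests, bob approves; BREAKGLASS_AUDIT can be reviewed weekly.
req = request_breakglass("alice", "ssh prod-db-3 'systemctl restart postgres'",
                         "admin API has no affordance for this recovery step")
approve_and_execute(req, "bob")
```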