AF - Epistemic states as a potential benign prior by Tamsin Leake

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Epistemic states as a potential benign prior, published by Tamsin Leake on August 31, 2024 on The AI Alignment Forum.
Malignancy in the prior seems like a strong crux of the goal-design part of alignment to me. Whether your prior is going to be used to model:
- processes in the multiverse containing a specific "beacon" bitstring,
- processes in the multiverse containing the AI,
- processes which would output all of my blog, so I can make it output more for me,
- processes which match an AI chatbot's hypotheses about what it's talking with,
in any of these cases, you have to sample hypotheses from somewhere; and typically, we want to use either Solomonoff induction, or time-penalized versions of it such as Levin search (penalized by the log of runtime) or what QACI uses (penalized by runtime, but with quantum computation available in some cases), or the implicit prior of neural networks (long sequences of multiplying by a matrix, adding a vector, and applying a ReLU, often with a penalty related to how many non-zero weights are used).
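For concreteness, here is one standard way to write these weightings (my notation, not from the post; the runtime-penalized variant is my paraphrase of the QACI description):

```latex
% Solomonoff prior: weight of each program p for a universal machine U
M(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|}

% Levin-style speed prior: additionally penalize by the log of runtime t(p)
w_{\mathrm{Levin}}(p) \propto 2^{-\left(|p| + \log_2 t(p)\right)} = \frac{2^{-|p|}}{t(p)}

% Penalizing by runtime directly (as the post describes for QACI)
w(p) \propto 2^{-\left(|p| + t(p)\right)}
```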
And the Solomonoff prior is famously malign.
(Alternatively, you could have Knightian uncertainty about parts of your prior that aren't nailed down enough, and then do maximin over your Knightian uncertainty (as in infra-Bayesianism); but then you're not guaranteed that your AI gets anywhere at all: its Knightian uncertainty might remain so immense that the AI keeps picking the null action all the time, because some of its Knightian hypotheses still say that anything else is a bad idea.
Note: I might be greatly misunderstanding Knightian uncertainty!)
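A minimal sketch of that failure mode (my illustration, not from the post, assuming a finite action and hypothesis set): one paranoid hypothesis is enough to make maximin pick the null action forever.

```python
# Maximin over "Knightian" hypotheses, in the spirit of infra-Bayesianism.
# Each hypothesis maps actions to utilities; maximin picks the action
# whose worst case across all hypotheses is best.

hypotheses = {
    "benign":   {"act": 1.0,    "null": 0.0},
    "paranoid": {"act": -100.0, "null": 0.0},  # "anything else is a bad idea"
}

def maximin_action(actions):
    return max(actions, key=lambda a: min(h[a] for h in hypotheses.values()))

print(maximin_action(["act", "null"]))  # -> "null", as long as "paranoid" stays in the set
```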
(It does seem plausible that doing geometric expectation over hypotheses in the prior helps "smooth things over" in some way, but I don't think this particularly removes the weight of malign hypotheses in the prior? It just allocates their steering power in a different way, which might make things less bad, but it sounds difficult to quantify.)
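For concreteness, here is what swapping arithmetic for geometric expectation looks like on a toy example (my illustration; the numbers are arbitrary): the malign hypothesis's mass still steers the result, just multiplicatively rather than additively.

```python
import math

weights = {"honest": 0.9, "malign": 0.1}    # prior mass on each hypothesis
utility = {"honest": 10.0, "malign": 0.01}  # utility each assigns to one action

# Arithmetic expectation: sum of p(h) * u(h)
arith = sum(weights[h] * utility[h] for h in weights)
# Geometric expectation: exp(sum of p(h) * log u(h)), the weighted geometric mean
geom = math.exp(sum(weights[h] * math.log(utility[h]) for h in weights))

print(arith, geom)  # ~9.0 vs ~5.0: the 10% malign mass drags the geometric value down hard
```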
It does feel to me like we do want a prior for the AI to do expected value calculations over, either for prediction or for utility maximization (or quantilization or whatever).
One helpful aspect of prior-distribution-design is that, in many cases, I don't think the prior needs to contain the true hypothesis. For example, if the problem that we're using a prior for is to model
processes which match an AI chatbot's hypotheses about what it's talking with
then we don't need the AI's prior to contain a process which behaves just like the human user it's interacting with; rather, we just need the AI's prior to contain a hypothesis which:
- is accurate enough to match observations, and
- is accurate enough to capture the fact that the user (if we pick a good user) implements the kind of decision theory that lets us rely on them pointing back to the actual real physical user when they get empowered; i.e. in CEV(user-hypothesis), user-hypothesis builds and then runs CEV(physical-user), because that's what the user would do in such a situation.
Let's call this second criterion "cooperating back to the real user".
So we need a prior which:
- Has at least some mass on hypotheses which:
  - correspond to observations,
  - cooperate back to the real user,
  - and can eventually be found by the AI, given enough evidence (enough chatting with the user); a toy sketch of this narrowing follows below.
  Call this the "aligned hypothesis".
- Before it narrows down hypothesis space to mostly just aligned hypotheses, doesn't give enough weight to demonic hypotheses which output whichever predictions cause the AI to brainhack its physical user, or escape using rowhammer-type hardware vulnerabilities, or other failures like that.
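As a toy illustration of that narrowing (my addition, not from the post): a hypothesis that matches observations gains posterior mass over one that doesn't; the worry above is about how much steering the disfavored hypothesis gets before convergence.

```python
# Stylized likelihood each hypothesis assigns to each observed user message
likelihood = {"aligned": 0.9, "demonic": 0.5}
posterior = {"aligned": 0.01, "demonic": 0.99}  # demonic starts heavily favored

for _ in range(20):  # twenty observations generated by the real user
    posterior = {h: posterior[h] * likelihood[h] for h in posterior}
    total = sum(posterior.values())
    posterior = {h: w / total for h, w in posterior.items()}

print(posterior)  # mass concentrates on "aligned" (~0.999 after 20 updates)
```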
Formalizing the chatbot model
First, I'll formalize this chatbot model. Let's say we have a magical inner-aligned "soft" math-oracle:
Which, given a "scoring" mathematical function from a non-empty set a to real numbers (not necessarily one that is tractably ...
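The definition is cut off in this transcript, so the following interface is only my guess at the type signature being set up, with all names hypothetical: a "soft" oracle takes a scoring function over a non-empty set and returns some element that scores well, with no tractability requirement on the scoring function itself.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")

def soft_math_oracle(candidates: Sequence[A], score: Callable[[A], float]) -> A:
    # Stand-in body: a real (magical, inner-aligned) oracle would range over
    # mathematical objects; here we just take the best of an explicit finite set.
    return max(candidates, key=score)
```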