با برنامه Player FM !
AF - A "Bitter Lesson" Approach to Aligning AGI and ASI by Roger Dearnaley
بایگانی مجموعه ها ("فیدهای غیر فعال" status)
When? This feed was archived on October 23, 2024 10:10 (). Last successful fetch was on September 19, 2024 11:06 ()
Why? فیدهای غیر فعال status. سرورهای ما، برای یک دوره پایدار، قادر به بازیابی یک فید پادکست معتبر نبوده اند.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 427832067 series 3337166
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A "Bitter Lesson" Approach to Aligning AGI and ASI, published by Roger Dearnaley on July 6, 2024 on The AI Alignment Forum.
TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role.
Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is.
However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should probably have an argument for why this conceptually-simple approach won't work, or won't be sufficient.
If you're not already familiar with it, you should first read Rich Sutton's excellent and influential post The Bitter Lesson. (Even if you are already familiar with it, it's a quick reread, only a page-and-a-half long, and its message is worth remembering.)
Why The Alignment Problem is Hard (In My Opinion)
We have been training LLM-based AIs off enormous web + books + video + etc datasets created by humans, which are full of a vast number of examples of human behavior. We are basically "distilling" human intelligence into these LLMs,[1] teaching them to imitate us.
In this process, they become familiar with, understand, and learn to imitate basically all aspects of human behavior - including the many problematic ones for Alignment, such as prejudice, deception, power-seeking, and criminality (and even ones like gluttony and lust that have little practical use for a non-corporal intelligence).
We humans are living beings, the products of evolution, so evolutionary psychology applies to us. While we are a social species, good at cooperating on non-zero-sum games, if you put humans in (what they perceive as) a non-iterated zero-sum situation, they will generally act selfishly for the benefit of themselves and their close genetic relatives, just as evolutionary theory would predict. So the behavioral potentials for deception, power-seeking, criminality etc.
are all inherent, evolutionarily adaptive, and thus unsurprising. This is human nature, and there are evolutionary reasons why it is this way.
Despite this, we have learned how to build a cooperating society out of humans, using social techniques and incentives such as an economy, laws, and law enforcement to encourage and productively harness cooperative human behavior and keep the bad consequences of selfish behavior under control. The results aren't perfect: things like crime, inequality, and war still happen, but they're acceptable - we've survived so far, even thrived.
By default, if we continue this LLM training process to larger-and-larger scales, and if the LLM-based approach to AI doesn't hit any major roadblocks, then some time, probably in the next few years, we will have human-level AIs - usually referred to as AGIs - who are roughly as well/badly-aligned as humans, and (at least for the base-model LLMs before any Alignment processes are applied) have a comparable-to-human propensity to cooperate on non-zero-sum games and act selfishly on non-iterated
zero-sum games.
They are not alive, and evolution doesn't apply to them directly, but they were trained to simulate our behavior, including our evolved survival strategies like selfishness. They will thus have alignment properties comparable to humans: they understand what human values, mor...
392 قسمت
بایگانی مجموعه ها ("فیدهای غیر فعال" status)
When? This feed was archived on October 23, 2024 10:10 (). Last successful fetch was on September 19, 2024 11:06 ()
Why? فیدهای غیر فعال status. سرورهای ما، برای یک دوره پایدار، قادر به بازیابی یک فید پادکست معتبر نبوده اند.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 427832067 series 3337166
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A "Bitter Lesson" Approach to Aligning AGI and ASI, published by Roger Dearnaley on July 6, 2024 on The AI Alignment Forum.
TL;DR: I discuss the challenge of aligning AGI/ASI, and outline an extremely simple approach to aligning an LLM: train entirely on a synthetic dataset that always shows the AI acting aligned (even when the humans behave badly), and use a conditional training/inference-time technique to lock the LLM into the AI role.
Epistemic status: To me, this looks like an obvious thing to try. It's conceptually very simple: a vast amount of work is required to actually create the synthetic dataset, but the great majority of that is the sort of work that AI can assist with. I don't see any clear reason why this approach couldn't work, at least for AGI, and perhaps even for ASI, but then we don't know for sure how hard a problem Alignment is.
However, if you're proposing any solution to Alignment that's more complicated than this (and most of them are), you should probably have an argument for why this conceptually-simple approach won't work, or won't be sufficient.
If you're not already familiar with it, you should first read Rich Sutton's excellent and influential post The Bitter Lesson. (Even if you are already familiar with it, it's a quick reread, only a page-and-a-half long, and its message is worth remembering.)
Why The Alignment Problem is Hard (In My Opinion)
We have been training LLM-based AIs off enormous web + books + video + etc datasets created by humans, which are full of a vast number of examples of human behavior. We are basically "distilling" human intelligence into these LLMs,[1] teaching them to imitate us.
In this process, they become familiar with, understand, and learn to imitate basically all aspects of human behavior - including the many problematic ones for Alignment, such as prejudice, deception, power-seeking, and criminality (and even ones like gluttony and lust that have little practical use for a non-corporal intelligence).
We humans are living beings, the products of evolution, so evolutionary psychology applies to us. While we are a social species, good at cooperating on non-zero-sum games, if you put humans in (what they perceive as) a non-iterated zero-sum situation, they will generally act selfishly for the benefit of themselves and their close genetic relatives, just as evolutionary theory would predict. So the behavioral potentials for deception, power-seeking, criminality etc.
are all inherent, evolutionarily adaptive, and thus unsurprising. This is human nature, and there are evolutionary reasons why it is this way.
Despite this, we have learned how to build a cooperating society out of humans, using social techniques and incentives such as an economy, laws, and law enforcement to encourage and productively harness cooperative human behavior and keep the bad consequences of selfish behavior under control. The results aren't perfect: things like crime, inequality, and war still happen, but they're acceptable - we've survived so far, even thrived.
By default, if we continue this LLM training process to larger-and-larger scales, and if the LLM-based approach to AI doesn't hit any major roadblocks, then some time, probably in the next few years, we will have human-level AIs - usually referred to as AGIs - who are roughly as well/badly-aligned as humans, and (at least for the base-model LLMs before any Alignment processes are applied) have a comparable-to-human propensity to cooperate on non-zero-sum games and act selfishly on non-iterated
zero-sum games.
They are not alive, and evolution doesn't apply to them directly, but they were trained to simulate our behavior, including our evolved survival strategies like selfishness. They will thus have alignment properties comparable to humans: they understand what human values, mor...
392 قسمت
همه قسمت ها
×به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.