"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland
https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is
Despite a clear need for it, a good source explaining who is doing what and why in technical AI alignment doesn't exist. This is our attempt to produce such a resource. We expect to be inaccurate in some ways, but it seems great to get out there and let Cunningham’s Law do its thing.[1]
The main body contains our understanding of what everyone is doing in technical alignment and why, as well as at least one of our opinions on each approach. We include supplements visualizing differences between approaches and Thomas’s big picture view on alignment. The opinions written are Thomas and Eli’s independent impressions, many of which have low resilience. Our all-things-considered views are significantly more uncertain.
This post was mostly written while Thomas was participating in the 2022 iteration of the SERI MATS program, under mentor John Wentworth. Thomas benefited immensely from conversations with John Wentworth, other SERI MATS participants, and many others he met over the summer.
Chapters
1. "(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland (00:00:00)
2. Podcast note (00:00:21)
3. (Text Resumes) (00:01:39)
4. Introduction (00:02:18)
5. A summary of our understanding of each approach: (00:04:18)
6. (Text Resumes) Previous related overviews include: (00:11:05)
7. Main Article Text (00:11:43)
8. Aligned AI / Stuart Armstrong (00:11:45)
9. Alignment Research Center (ARC) (00:14:48)
10. Eliciting Latent Knowledge / Paul Christiano (00:14:54)
11. Evaluating LM power-seeking / Beth Barnes (00:18:38)
12. Anthropic (00:21:06)
13. Interpretability (00:21:51)
14. Graph: Strength vs Interpretability (00:23:06)
15. (Text Resumes) (00:23:43)
16. Scaling laws (00:24:43)
17. Brain-Like-AGI Safety / Steven Byrnes (00:25:47)
18. Center for AI Safety (CAIS) / Dan Hendrycks (00:27:10)
19. Center for Human Compatible AI (CHAI) / Stuart Russell (00:29:41)
20. Center on Long Term Risk (CLR) (00:31:35)
21. Conjecture (00:33:13)
22. Epistemology (00:33:47)
23. Scalable LLM Interpretability (00:35:07)
24. Refine (00:36:09)
25. Simulacra Theory (00:36:49)
26. David Krueger (00:39:55)
27. DeepMind (00:42:13)
28. Dylan Hadfield-Menell (00:43:55)
29. Encultured (00:45:26)
30. Externalized Reasoning Oversight / Tamera Lanham (00:47:16)
31. Diagram: SSL vs RL (00:48:48)
32. (Text Resumes) (00:49:29)
33. Future of Humanity Institute (FHI) (00:51:11)
34. Fund For Alignment Research (FAR) (00:52:12)
35. MIRI (00:54:41)
36. Communicate their view on alignment (00:55:46)
37. Deception + Inner Alignment / Evan Hubinger (00:56:41)
38. Agent Foundations / Scott Garrabrant and Abram Demski (00:58:42)
39. Infra-Bayesianism / Vanessa Kosoy (00:59:13)
40. Visible Thoughts Project (01:03:31)
41. Jacob Steinhardt (01:03:52)
42. OpenAI (01:04:58)
43. Ought (01:08:33)
44. Redwood Research (01:12:02)
45. Adversarial training (01:12:06)
46. Diagram: The plan if we had to align AGI right now (01:12:16)
47. (Text Resumes) (01:13:57)
48. LLM interpretability (01:15:38)
49. Sam Bowman (01:16:13)
50. Selection Theorems / John Wentworth (01:17:03)
51. Team Shard (01:20:55)
52. Truthful AI / Owain Evans and Owen Cotton-Barratt (01:23:50)
53. Other Organizations (01:25:23)
54. Appendix (01:26:31)
55. Visualizing Differences (01:26:32)
56. Diagram: Different approaches (01:27:12)
57. (Text Resumes) Conceptual vs. applied (01:28:46)
58. Thomas’s Alignment Big Picture (01:29:21)
544 episodes