"On how various plans miss the hard bits of the alignment challenge" by Nate Soares
https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment
(As usual, this post was written by Nate Soares with some help and editing from Rob Bensinger.)
In my last post, I described a “hard bit” of the challenge of aligning AGI—the sharp left turn that comes when your system slides into the “AGI” capabilities well, the fact that alignment doesn’t generalize similarly well at this turn, and the fact that this turn seems likely to break a bunch of your existing alignment properties.
Here, I want to briefly discuss a variety of current research proposals in the field, to explain why I think this problem is currently neglected.
I also want to mention research proposals that do strike me as having some promise, or that strike me as adjacent to promising approaches.
Before getting into that, let me be very explicit about three points:
- On my model, solutions to the problem of capabilities generalizing further than alignment are necessary but not sufficient. There is dignity in attacking a variety of other real problems, and I endorse that practice.
- The imaginary versions of people in the dialogs below are not the same as the people themselves. I'm probably misunderstanding the various proposals in important ways, and/or rounding them to stupider versions of themselves along some important dimensions.[1] If I've misrepresented your view, I apologize.
- I do not subscribe to the Copenhagen interpretation of ethics wherein someone who takes a bad swing at the problem (or takes a swing at a different problem) is more culpable for civilization's failure than someone who never takes a swing at all. Everyone whose plans I discuss below is highly commendable, laudable, and virtuous by my accounting.
Chapters
1. "On how various plans miss the hard bits of the alignment challenge" by Nate Soares (00:00:00)
2. Reactions to specific plans (00:04:48)
3. Owen Cotton-Barratt & Truthful AI (00:04:52)
4. Ryan Greenblatt & Eliciting Latent Knowledge (00:09:05)
5. Eric Drexler & AI Services (00:13:41)
6. Evan Hubinger, in a recent personal conversation (00:14:53)
7. A fairly straw version of someone with technical intuitions like Richard Ngo’s or Rohin Shah’s (00:16:24)
8. Another recent proposal (00:18:38)
9. Vivek Hebbar, summarized (perhaps poorly) from last time we spoke of this in person (00:20:03)
10. John Wentworth & Natural Abstractions (00:22:01)
11. Neel Nanda & Theories of Impact for Interpretability (00:25:26)
12. Stuart Armstrong & Concept Extrapolation (00:27:33)
13. Andrew Critch & political solutions (00:41:40)
14. What about superbabies? (00:44:05)
15. What about other MIRI people? (00:44:18)
16. High-level view (00:45:34)