Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that
A lot was released in the last 36 hours, and it will all affect hundreds of millions of people. Here are 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2 and Musical Turing Tests.
https://assemblyai.com/aiexplained
Chapters:
00:00 - Introduction
00:56 - GPT 5.1 Smarter?
01:47 - Some Regressions
03:22 - Sycophancy?
05:22 - Claude Auto-Hacking
06:16 - Jailbreaking through Granularity
08:22 - This Will be Re-used
09:30 - Hallucinating Hacker
09:57 - Surprisingly Neutral Tone
12:18 - SIMA 2
14:10 - Alpha Parallels
17:24 - AI Music
GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/
System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf
Benchmarks: https://openai.com/index/gpt-5-1-for-developers/
Simple Bench: https://lmcouncil.ai/benchmarks
Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618
https://www.anthropic.com/news/disrupting-AI-espionage
SIMA 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
https://x.com/amoufarek/status/1988986075331858693
Voyager: https://voyager.minedojo.org/
Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/