Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)


1,763 subscribers

Artificial Intelligence

Added seven years ago

Content provided by TWIML and Sam Charrington. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by TWIML and Sam Charrington or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal


About one year ago 46:24

Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss "Teaching Large Language Models to Reason with Reinforcement Learning." Alex discusses the role of creativity and exploration in problem solving and explores the opportunities presented by applying reinforcement learning algorithms to the challenge of improving reasoning in large language models. Alex also shares his research on the effect of noise on language model training, highlighting the robustness of LLM architecture. Finally, we delve into the future of RL, and the potential of combining language models with traditional methods to achieve more robust AI reasoning.

The complete show notes for this episode can be found at twimlai.com/go/680.

761 episodes

#Artificial Intelligence #Tech News #Artificialintelligence #Machinelearning #Samcharrington #Technology #Thisweekinmachinelearning #Sam Charrington #Thetwimlaipocast #Twimlaipodcast #Tech #News #China #TWIML #Datascience #Science


All episodes

Context Engineering for Productive AI Agents with Filip Kozera - #741 46:01

7 days ago

In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current limitations of agent protocols like MCPs and how developers can extend them to handle the required context and authority. The conversation challenges the idea that more powerful models lead to more autonomous agents, arguing instead for "graceful recovery" systems that proactively bring humans into the loop when the agent "knows what it doesn't know." We also get into the "application layer" fight, exploring how SaaS platforms are creating data silos and what this means for the future of interoperable AI agents. Filip also shares his vision for the "word artisan"—the non-technical user who can now build and manage a fleet of AI agents, fundamentally changing the nature of knowledge work. The complete show notes for this episode can be found at https://twimlai.com/go/741 .…

Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740 1:13:02

14 days ago

In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneously faster, more accurate, and even cheaper than single-model approaches. Using examples like "laconic decoding," Jared explains the practical techniques for building these systems and the underlying principles of inference-time scaling. The conversation also delves into the critical role of co-design, where the evolution of AI algorithms and the underlying cloud infrastructure are deeply intertwined, shaping the future of agentic AI and the compute landscape. The complete show notes for this episode can be found at https://twimlai.com/go/740 .…

Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739 1:13:02

21 days ago

In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents—from the models and APIs to the critical orchestration layer that manages the complexities of multi-turn conversations. We explore why many production systems favor a modular, multi-model approach over the end-to-end models demonstrated by large AI labs, and how this impacts everything from latency and cost to observability and evaluation. Kwin also digs into the core challenges of interruption handling, turn-taking, and creating truly natural conversational dynamics, and how to overcome them. We discuss use cases, thoughts on where the technology is headed, the move toward hybrid edge-cloud pipelines, and the exciting future of real-time video avatars, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/739 .…

Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738 1:00:29

27 days ago

Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research, for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that distills large language models for structured scene understanding and safe motion planning in critical "long-tail" scenarios. We explore how DiMA utilizes LLMs' world knowledge and efficient transformer-based models to significantly reduce collision rates and trajectory errors. We then discuss “SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation,” a diffusion-distilled approach that combines generative models with metric depth estimation to produce sharp, accurate monocular depth maps. Fatih also shares a look at Qualcomm’s on-device demos, including text-to-3D mesh generation, real-time image-to-video and video-to-video generation, and a multi-modal visual question-answering assistant. The complete show notes for this episode can be found at https://twimlai.com/go/738.

Building the Internet of Agents with Vijoy Pandey - #737 56:13

6 weeks ago

Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco, to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, probabilistic, and noisy environment, a stark contrast to the deterministic APIs of the past. Vijoy introduces Cisco's vision for an "Internet of Agents," a platform to manage this new reality, and its open-source implementation, AGNTCY. We explore the four phases of agent collaboration—discovery, composition, deployment, and evaluation—and dive deep into the communication stack, from syntactic protocols like A2A, ACP, and MCP to the deeper semantic challenges of creating a shared understanding between agents. Vijoy also unveils SLIM (Secure Low-Latency Interactive Messaging), a novel transport layer designed to make agent-to-agent communication quantum-safe, real-time, and efficient for multi-modal workloads. The complete show notes for this episode can be found at https://twimlai.com/go/737.

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736 59:31

7 weeks ago

Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for trading and investment. We explore the firm's platform-centric approach to managing an extensive portfolio of features and models, the impact of multimodal LLMs on accelerating the process of extracting novel features, the importance of strict data timestamping to prevent temporal leakage, and the way they consider build vs. buy decisions in a rapidly evolving landscape. Lastly, Ben also shares insights on leveraging open-source models and the future of agentic AI in quantitative finance. The complete show notes for this episode can be found at https://twimlai.com/go/736 .…

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735 56:45

8 weeks ago

Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling rivals human performance,” which demonstrates how zero-shot auto-labeling with foundation models can yield significant cost and time savings compared to traditional human annotation. Jason explains how auto-labels, despite being "noisier" at lower confidence thresholds, can lead to better downstream model performance. We also cover Voxel51's "verified auto-labeling" approach, which utilizes a "stoplight" QA workflow (green, yellow, red light) to minimize human review. Finally, we discuss the challenges of handling decision boundary uncertainty and out-of-domain classes, the differences between synthetic data generation in vision and language domains, and the potential of agentic labeling. The complete show notes for this episode can be found at https://twimlai.com/go/735.

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734 1:25:21

9 weeks ago

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

Google I/O 2025 Special Edition - #733 26:21

10 weeks ago

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and creator of the Pipecat open source project. We cover all the highlights from the event, including enhancements to the Gemini models like thinking budgets and thought summaries, native audio output for expressive voice AI, and the new URL Context tool for research agents. The discussion also digs into the Gemini Live API, covering its architecture, the challenges of building real-time voice applications (such as latency and voice activity detection), and new features like proactive audio and asynchronous function calling. Finally, don’t miss our guests’ wish lists for next year’s I/O! The complete show notes for this episode can be found at https://twimlai.com/go/733 .…

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732 57:09

11 weeks ago

Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples of unsafe outputs that can emerge from these systems, different approaches to evaluating these safety risks, and the potential reasons behind this counterintuitive behavior. Shifting to the application of generative AI in financial services, Sebastian outlines a domain-specific safety taxonomy designed for the industry's unique needs. We also explore the critical role of governance and regulatory frameworks in addressing these concerns, the role of prompt engineering in bolstering safety, Bloomberg’s multi-layered mitigation strategies, and vital areas for further work in improving AI safety within specialized domains. The complete show notes for this episode can be found at https://twimlai.com/go/732 .…

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 1:01:25

12 weeks ago

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting, and how it can improve multi-step tool use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies they’ve used, and Bespoke Labs’ open-source libraries like Curator. We also touch on the models MiniCheck for hallucination detection and MiniChart for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731 .…

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 1:07:27

13 weeks ago

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step tasks through reinforcement learning, and how that enables agents to more easily recover from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as with "vibe coding," the integration of tools through the Model Context Protocol (MCP), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous. The complete show notes for this episode can be found at https://twimlai.com/go/730.

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 56:18

14 weeks ago

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world tasks of the cybersecurity analyst. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729 .…

Generative Benchmarking with Kelly Hong - #728 54:17

15 weeks ago

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728 .…

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 1:34:06

16 weeks ago

In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727 .…
