#043 Knowledge Graphs Won't Fix Bad Data

How AI Is Built

Added one year ago

Content provided by Nicolay Gerold. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal

About one year ago 1:10:58

Metadata is the foundation of any enterprise knowledge graph.

By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants.

The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable.

Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation—one that cuts through the noise and delivers reliable, high-performance insights—you need to listen to Juan’s proven methods and real-world examples.

Terms like ontologies, taxonomies, and knowledge graphs aren’t new inventions. Ontologies and taxonomies have been studied for centuries, with roots going back to ancient Greece. Google popularized the term “knowledge graph” in 2012, building on decades of semantic web research. Despite the current buzz, these concepts rest on established work.

Traditionally, data lives in siloed applications—each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions to that technical layer.
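To make the two-step layering concrete, here is a minimal sketch that stores metadata as subject-predicate-object triples in plain Python. It is an illustration under stated assumptions, not how data.world or any specific catalog implements it: the identifiers (`table:CUST_123`, `job:nightly_customer_load`), the predicate names, and the business definition are all hypothetical.

```python
# A tiny knowledge graph as a set of (subject, predicate, object) triples.
# Every identifier and predicate name below is hypothetical.
triples = set()


def add(subject, predicate, obj):
    """Record one fact in the graph."""
    triples.add((subject, predicate, obj))


# Step 1: technical metadata, the kind that is usually harvested automatically.
add("table:CUST_123", "type", "Table")
add("column:CUST_123.cust_id", "type", "Column")
add("column:CUST_123.cust_id", "columnOf", "table:CUST_123")
add("table:CUST_123", "loadedBy", "job:nightly_customer_load")

# Step 2: business metadata, mapped onto the technical layer.
add("term:Customer", "type", "BusinessTerm")
add("term:Customer", "definition", "A party that has purchased at least once")
add("term:Customer", "representedBy", "table:CUST_123")


def objects(subject, predicate):
    """All objects recorded for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """All subjects that point at a given object via a predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Business view to technical view: which tables back "Customer"?
customer_tables = objects("term:Customer", "representedBy")
# Technical view to business view: which terms does CUST_123 implement?
table_terms = subjects("representedBy", "table:CUST_123")
```

Because both layers live in the same graph, the same lookup works in either direction, which is the point of mapping the glossary onto the technical metadata rather than keeping it in a separate document.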

A modern data catalog should:

Integrate Multiple Sources: Automatically ingest metadata from databases, ETL tools (e.g., dbt, Fivetran), and BI tools.
Bridge Technical and Business Views: Link technical definitions (e.g., table “CUST_123”) with business concepts (e.g., “Customer”).
Enable Reuse and Governance: Support data discovery, impact analysis, and proper governance while facilitating reuse across teams.

Practical Approaches & Use Cases:

Start with a Clear Problem: Whether it’s reducing churn, improving operational efficiency, or meeting compliance needs, begin by solving a specific pain point.
Iron Thread Method: Follow one query end-to-end—from identifying a business need to tracing it back to source systems—to gradually build and refine the graph.
Automation vs. Manual Oversight: Technical metadata extraction is largely automated. For business definitions or text-based entity extraction (e.g., via LLMs), human oversight is key to ensuring accuracy and consistency.
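The Iron Thread idea of following one question back to its source systems can be sketched as a lineage walk. This is a rough illustration only: the asset names and the `UPSTREAM` mapping are hypothetical, and a real catalog would harvest these edges from ETL and BI tool metadata rather than hard-code them.

```python
# Hypothetical lineage edges: each asset maps to the assets it is derived from.
UPSTREAM = {
    "dashboard:churn_rate": ["model:customer_churn"],
    "model:customer_churn": [
        "table:warehouse.customers",
        "table:warehouse.subscriptions",
    ],
    "table:warehouse.customers": ["source:crm.accounts"],
    "table:warehouse.subscriptions": ["source:billing.subscriptions"],
}


def trace_to_sources(asset, upstream=UPSTREAM):
    """Walk lineage from one asset back to the assets that have no upstream."""
    seen, sources, stack = set(), [], [asset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = upstream.get(node, [])
        # An asset with no recorded parents is treated as a source system.
        if not parents and node != asset:
            sources.append(node)
        stack.extend(parents)
    return sorted(sources)


# Start from one business question ("what drives the churn dashboard?")
# and follow the thread back to where the data originates.
churn_sources = trace_to_sources("dashboard:churn_rate")
```

Only the assets touched by this one thread need to be modeled at first; the next question extends the graph from there.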

Technical Considerations:

Entity vs. Property: If you need to attach additional details or reuse an element across contexts, model it as an entity (with a unique identifier). Otherwise, keep it as a simple property.
Storage Options: The market offers various graph databases—Neo4j, Amazon Neptune, Cosmos DB, TigerGraph, Apache Jena (for RDF), etc. Future trends point toward multimodel systems that allow querying in SQL, Cypher, or SPARQL over the same underlying data.
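The entity-versus-property rule of thumb can be shown with two ways of modeling the same fact. The sketch below uses Python dataclasses purely for illustration; the class and field names (`Country`, `ship_country`, `region`) are hypothetical and not tied to any particular graph database.

```python
from dataclasses import dataclass


@dataclass
class OrderWithProperty:
    order_id: str
    # Property: a plain value. Nothing further can be attached to it.
    ship_country: str


@dataclass(frozen=True)
class Country:
    # Entity: has its own identifier, so it can carry extra details
    # and be shared by many other nodes.
    country_id: str
    name: str
    region: str


@dataclass
class OrderWithEntity:
    order_id: str
    ship_country: Country


germany = Country("country:DE", "Germany", "EMEA")
orders = [OrderWithEntity("o-1", germany), OrderWithEntity("o-2", germany)]

# Because the country is an entity, details such as its region are
# available from every order that references it.
regions = {order.ship_country.region for order in orders}
```

If nobody ever needs the region or any other detail about the country, the simple string property is the cheaper choice.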

Juan Sequeda:

LinkedIn
data.world
Semantic Web for the Working Ontologist
Designing and Building Enterprise Knowledge Graphs (before you buy, send Juan a message, he is happy to send you a copy)
Catalog & Cocktails (Juan’s podcast)

Nicolay Gerold:

00:00 Introduction to Knowledge Graphs
00:45 The Role of Metadata in AI
01:06 Building Knowledge Graphs: First Steps
01:42 Interview with Juan Sequeda
02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More
05:05 Challenges and Solutions in Data Management
08:04 Practical Applications of Knowledge Graphs
15:38 Governance and Data Engineering
34:42 Setting the Stage for Data-Driven Problem Solving
34:58 Understanding Consumer Needs and Data Challenges
35:33 Foundations and Advanced Capabilities in Data Management
36:01 The Role of AI and Metadata in Data Maturity
37:56 The Iron Thread Approach to Problem Solving
40:12 Constructing and Utilizing Knowledge Graphs
54:38 Trends and Future Directions in Knowledge Graphs
59:17 Practical Advice for Building Knowledge Graphs

62 episodes

#Tech #Nicolay Gerold #Technology #Llm #Machine Learning #Data Engineering


همه قسمت ها

1
Embedding Intelligence: AI's Move to the Edge 1:05:35

۶ روز پیش1:05:35

1:05:35

Nicolay here, while everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required. Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book. His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken. Key Insight: The Real World Action Gap LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller. This explains why every AI agent demo focuses on booking flights and API calls - those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data. Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets. 
💡 Core Concepts Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains ⏱ Important Moments Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands 8-Bit Quantization: [33:12] Remains deployment sweet spot - processor instruction support matters more than compression On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs confusing hybrid systems 🛠 Tools & Tech Whisper: github.com/openai/whisper Moonshine: github.com/usefulsensors/moonshine TinyML Book: oreilly.com/library/view/tinyml/9781492052036 Stanford Edge ML: github.com/petewarden/stanford-edge-ml 📚 Resources Looking to Listen Paper: looking-to-listen.github.io Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635 Connect: pete@usefulsensors.com | petewarden.com | usefulsensors.com Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript…

1
#054 Building Frankenstein Models with Model Merging and the Future of AI 1:06:55

21 روز پیش1:06:55

1:06:55

Nicolay here,most AI conversations focus on training bigger models with more compute. This one explores the counterintuitive world where averaging weights from different models creates better performance than expensive post-training. Today I have the chance to talk to Maxime Labonne, who's a researcher at Liquid AI and the architect of some of the most popular open source models on Hugging Face. He went from researching neural networks for cybersecurity to building "Frankenstein models" through techniques that shouldn't work but consistently do. Key Insight: Model Merging as a Free Lunch The core breakthrough is deceptively simple: take two fine-tuned models, average their weights layer by layer, and often get better performance than either individual model. Maxime initially started writing an article to explain why this couldn't work, but his own experiments convinced him otherwise. The magic lies in knowledge compression and regularization. When you train a model multiple times on similar data, each run creates slightly different weight configurations due to training noise. Averaging these weights creates a smoother optimization path that avoids local minima. You can literally run model merging on a CPU - no GPUs required. 
In the podcast, we also touch on: Obliteration: removing safety refusal mechanisms without retraining Why synthetic data now comprises 90%+ of fine-tuning datasets The evaluation crisis and automated benchmarks missing real-world performance Chain of thought compression techniques for reasoning models 💡 Core Concepts Model Merging : Averaging weights across layers from multiple fine-tuned models to create improved performance without additional training Obliteration : Training-free method to remove refusal directions from models by computing activation differences Linear Merging : The least opinionated merging technique that simply averages weights with optional scaling factors Refusal Direction : The activation pattern that indicates when a model will output a safety refusal 📶 Connect with Maxime: X / Twitter: https://x.com/maximelabonne LinkedIn: https://www.linkedin.com/in/maxime-labonne/ Company: https://www.liquid.ai/ 📶 Connect with Nicolay: LinkedIn: https://www.linkedin.com/in/nicolay-gerold/ X / Twitter: https://x.com/nicolaygerold Website: https://www.nicolaygerold.com/ ⏱ Important Moments Model Merging Discovery Process: [00:00:30] Maxime explains how he started writing an article to debunk model merging Two Main Merging Use Cases: [11:04] Clear distinction between merging checkpoints versus combining different task-specific capabilities Linear Merging as Best Practice: [21:00] Why simple weight averaging consistently outperforms more complex techniques Layer Importance Hierarchy: [21:18] First and last layers have the most influence on model behavior Obliteration Technique Explained: [36:07] How to compute and subtract refusal directions from model activations Synthetic Data Dominance: [50:00] Modern fine-tuning uses 90%+ synthetic data 🛠 Tools & Tech Mentioned MergeKit: https://github.com/cg123/mergekit Transformer Lens: https://github.com/TransformerLensOrg/TransformerLens Hugging Face Transformers: https://github.com/huggingface/transformers PyTorch: 
https://pytorch.org/ 📚 Recommended Resources Maxime's Model Merging Articles: https://huggingface.co/blog/merge Model Soups Paper: https://arxiv.org/abs/2203.05482 Will Brown's Rubric Engineering: https://x.com/willccbb/status/1883611121577517092…

1
#053 AI in the Terminal: Enhancing Coding with Warp 1:04:30

27 روز پیش1:04:30

1:04:30

Nicolay here, Most AI coding tools obsess over automating everything. This conversation focuses on the right balance between human skill and AI assistance - where manual context beats web search every time. Today I have the chance to talk to Ben Holmes, a software engineer at Warp, where they're building the AI-first terminal. Manual context engineering trumps automated web search for getting accurate results from coding assistants. Key Insight Expansion The breakthrough insight is brutally practical: manual context construction consistently outperforms automated web search when working with AI coding assistants. Instead of letting your AI tool search for documentation, find the right pages yourself and feed them directly into the model's context window. Ben demonstrated this with OpenAI's Realtime API documentation - after an hour of back-and-forth with web search, he manually found the correct API signatures and saved them as a reference file. When building new features, he attached this curated documentation directly, resulting in immediate success rather than repeated failures from outdated or incorrect search results. This approach works because you can verify documentation accuracy before feeding it to the AI, while web search often returns the first result regardless of quality or recency. In the podcast, we also touch on: Why React Native might become irrelevant as AI translation between native languages improves Model-specific strengths: Gemini excels at debugging while Claude dominates f unction calling The skill of working without AI assistance - "raw dogging" code for deep learning Warp's architecture using different models for planning (O1/O3) vs. coding (Claude/Gemini) 💡 Core Concepts Manual Context Engineering: Curating documentation, diagrams, and reference materials directly rather than relying on automated web search. 
Model-Specific Workflows: Matching AI models to their strengths - O1 for planning, Claude for f unction calling, Gemini for debugging. Raw Dog Programming: Coding without AI assistance to build f undamental skills in codebase navigation and problem-solving. Agent Mode Architecture: Multi-model system where Claude orchestrates task distribution to specialized agents through f unction calls. 📶 Connect with Ben: Twitter/X, YouTube, Discord (Warp Community), Website 📶 Connect with Nicolay: LinkedIn, X/Twitter, Bluesky, Website, nicolay.gerold@gmail.com ⏱ Important Moments React Native's Potential O bsolescence: [08:42] AI translation between native languages could eliminate cross-platform frameworks Manual vs Automated Context: [51:42] Why manually curating documentation beats AI web search Raw Dog Programming Benefits: [12:00] Value of coding without AI assistance during Ben's first week at Warp Model-Specific Strengths: [26:00] Gemini's superior debugging vs Claude's speculative code fixes OpenAI Desktop App Advantage: [13:44] O utperforms Cursor for reading long files Warp's Multi-Model Architecture: [31:00] How Warp uses O1/O3 for planning, Claude for orchestration Function Calling Accuracy: [28:30] Claude outperforms other models at chaining f unction calls AI as Improv Partner: [56:06] Current AI says "yes and" to everything rather than pushing back 🛠 Tools & Tech Mentioned Warp Terminal, OpenAI Desktop App, Cursor, Cline, Go by Example, OpenAI Realtime API, MCP 📚 Recommended Resources Warp Discord Community, Ben's YouTube Channel, Go Programming Documentation 🔮 What's Next Next week, we continue exploring production AI implementations with more insights into getting generative AI systems deployed effectively. 💬 Join The Conversation Follow How AI Is Built on YouTube, Bluesky, or Spotify. Discord coming soon! ♻ Building the platform for engineers to share production experience. Pay it forward by sharing with one engineer facing similar challenges. ♻…

1
#052 Don't Build Models, Build Systems That Build Models 59:22

۷ weeks پیش59:22

59:22

Nicolay here, Today I have the chance to talk to Charles from Modal, who went from doing a PhD on neural network optimization in the 2010s - when ML engineers could build models with a soldering iron and some sticks - to architecting serverless infrastructure for AI models. Modal is about removing barriers so anyone can spin up a hundred GPUs in seconds. The critical insight that stuck with me: "Don't build models, build systems that build models." Organizations often make the mistake of celebrating a one-time fine-tuned model that matches GPT-4 performance only to watch it become obsolete when the next foundation model arrives - typically three to six months down the road. Charles's approach to infrastructure is particularly unconventional. He argues that serverless isn't just about convenience - it fundamentally changes how ambitious you can be with scale. "There's so much that gets in the way of trying to spin up a hundred GPUs or a thousand CPU containers that people just don't think to do something big." The winning approach involves automated data pipelines with feedback collection, continuous evaluation against new foundation models, AB testing and canary deployments, and systematic error analysis and retraining. 
In the podcast, we also cover: Why inference, not training, is where the money is made How to rethink compute when moving from traditional cloud to serverless The economics of automated resource management Why task decomposition is the key ML engineering skill When to earn the right to fine-tune versus using foundation models *📶 Connect with Charles:* Twitter - https://twitter.com/charlesirl Modal Labs - https://modal.com Modal Slack Community - https://modal.com/slack *📶 Connect with Nicolay:* LinkedIn - https://linkedin.com/in/nicolay-gerold/ X / Twitter - https://x.com/nicolaygerold Bluesky - https://bsky.app/profile/nicolaygerold.com Website - https://nicolaygerold.com/ My Agency Aisbach - https://aisbach.com/ (for ai implementations / strategy) *⏱️ Important Moments* From CUDA to Serverless : [00:01:38] Charles's journey from PhD neural network optimization to building Modal's serverless infrastructure. Rethinking Scale Ambition : [00:01:38] "There's so much that gets in the way of trying to spin up a hundred GPUs that people just don't think to do something big." The Economics of Serverless : [00:04:09] How automated resource management changes the cattle vs pets paradigm for GPU workloads. Lambda vs Modal Philosophy : [00:04:20] Why Modal was designed for tasks that take bytes and emit megabytes, unlike Lambda's middleware focus. Inference Economics Reality : [00:10:16] "Almost nobody gets paid to make models - organizations get paid to make predictions." The Open Source Commoditization : [00:14:55] How foundation models are becoming undifferentiated capabilities like databases. Task Decomposition as Core Skill : [00:22:00] Why breaking down problems is equivalent to recognizing API boundaries in software engineering. 
Systems That Build Models : [00:33:31] The critical difference between delivering static weights versus repeatable model production systems Earning the Right to Fine-Tune : [00:34:06] The infrastructure prerequisites needed before attempting model customization. Multi-Node Training Challenges : [00:52:24] How serverless platforms handle the contradiction of high-performance computing with spiky demand. *🛠️ Tools & Tech Mentioned* Modal - https://modal.com (serverless GPU infrastructure) AWS Lambda - https://aws.amazon.com/lambda/ (traditional serverless) Kubernetes - https://kubernetes.io/ (container orchestration) Temporal - https://temporal.io/ (workflow orchestration) Weights & Biases - https://wandb.ai/ (experiment tracking) Hugging Face - https://huggingface.co/ (model repository) PyTorch Distributed - https://pytorch.org/tutorials/intermediate/ddp_tutorial.html (multi-GPU training) Redis - https://redis.io/ (caching and queues) *📚 Recommended Resources* Full Stack Deep Learning - https://fullstackdeeplearning.com/ (deployment best practices) Modal Documentation - https://modal.com/docs (getting started guide) Deep Seek Paper - https://arxiv.org/abs/2401.02954 (disaggregated inference patterns) AI Engineer Summit - https://ai.engineer/ (community events) MLOps Community - https://mlops.community/ (best practices) 💬 Join The Conversation Follow How AI Is Built on YouTube - https://youtube.com/@howaiisbuilt , Bluesky - https://bsky.app/profile/howaiisbuilt.fm , or Spotify - https://open.spotify.com/show/3hhSTyHSgKPVC4sw3H0NUc?_authfailed=1%29 If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn - https://linkedin.com/in/nicolay-gerold/ , X - https://x.com/nicolaygerold, or Bluesky - https://bsky.app/profile/nicolaygerold.com . Or at nicolay.gerold@gmail.com . I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.…

1
#051 Build systems that can be debugged at 4am by tired humans with no context 1:05:51

۹ weeks پیش1:05:51

1:05:51

Nicolay here, Today I have the chance to talk to Charity Majors, CEO and co-founder of Honeycomb, who recently has been writing about the cost crisis in observability. "Your source of truth is production, not your IDE - and if you can't understand your code there, you're flying blind." The key insight is architecturally simple but operationally transformative: replace your 10-20 observability tools with wide structured events that capture everything about a request in one place. Most teams store the same request data across metrics, logs, traces, APM, and error tracking - creating a 20X cost multiplier while making debugging nearly impossible because you're reconstructing stories from fragments. Charity's approach flips this: instrument once with rich context, derive everything else from that single source. This isn't just about cost - it's about giving engineers the connective tissue to understand distributed systems. When you can correlate "all requests failing from Android version X in region Y using language pack Z," you find problems in minutes instead of days. The second is putting developers on call for their own code. This creates the tight feedback loop that makes engineers write more reliable software - because nobody wants to get paged at 3am for their own bugs. 
In the podcast, we also touch on: Why deploy time is the foundational feedback loop (15 minutes vs 15 hours changes everything) The controversial "developers on call" stance and why ops people rarely found companies How microservices made everything trace-shaped and killed traditional metrics approaches The "normal engineer" philosophy - building for 4am debugging, not peak performance AI making "code of unknown quality" the new normal Progressive deployment strategies (kibble → dogfood → production) and more 💡 Core Concepts Wide Structured Events : Capturing all request context in one instrumentation event instead of scattered log lines - enables correlation analysis that's impossible with fragmented data. Observability 2.0 : Moving from metrics-as-workhorse to structured-data-as-workhorse, where you instrument once and derive metrics/alerts/dashboards from the same rich dataset. SLO-based Alerting : Replacing symptom alerts (CPU, memory, disk) with customer-impact alerts that measure whether you're meeting promises to users. Progressive Deployment : Gradual rollout through staged environments (kibble → dogfood → production) that builds confidence without requiring 2X infrastructure. Trace-shaped Systems : Architecture pattern recognizing that distributed systems problems are fundamentally about correlating events across time and services, not isolated metrics. 📶 Connect with Charity: LinkedIn Bluesky Personal Blog Company 📶 Connect with Nicolay: LinkedIn X / Twitter Website ⏱️ Important Moments Gateway Drug to Engineering : [01:04] How IRC and bash tab completion sparked Charity's fascination with Unix command line possibilities ADHD and Incident Response : [01:54] Why high-pressure outages brought out her best work - getting "dead calm" when everything's broken Code vs. 
Production Reality : [02:56] Evolution from focusing on code beauty to understanding performance, behavior, and maintenance over time The Alexander's Horse Principle : [04:49] Auto-deployment as daily practice - if you grow up deploying constantly, it feels natural by the time you scale Production as Source of Truth : [06:32] Why your IDE output doesn't matter if you can't understand your code's intersection with infrastructure and users The Logging Evolution : [08:03] Moving from debugger-style spam logs to fewer, wider structured events oriented around units of work Bubble Up Anomaly Detection : [10:27] How correlating dimensions reveals that failures cluster around specific Android versions, regions, and feature combinations Everything is Trace-Shaped : [12:45] Why microservices complexity is about locating problems in distributed systems, not just identifying them AI as Acceleration of Automation : [15:57] Most AI panic could be replaced with "automation" - it's the same pattern, just faster feedback loops Non-determinism as Genuinely New : [16:51] The one aspect of AI that's actually novel in software systems, requiring new architectural patterns The Cost Crisis : [22:30] How 10-20 observability tools create unsustainable cost multipliers as businesses scale SLO Revolution : [28:40] Deleting 90% of alerts by focusing on customer impact instead of system symptoms Shrinking Feedback Loops : [34:28] Keeping deploy-to-validation under one hour so engineers can connect actions to outcomes Normal Engineer Design : [38:12] Building systems that work for tired humans at 4am, not just heroes during business hours The Instrumentation Habit : [23:15] Always looking at your code in production after deployment to build informed instincts about system behavior Progressive Deployment Strategy : [36:43] Kibble → Dog Food → Production pipeline for gradual confidence building Real Engineering Bar : [49:00] Discussion on what actually makes exceptional vs normal engineers 🛠️ 
Tools & Tech Mentioned Honeycomb - Observability platform for structured events OpenTelemetry - Vendor-neutral instrumentation framework IRC - Early gateway to computing Parse - Mobile backend where Honeycomb's origin story began 📚 Recommended Resources "In Praise of Normal Engineers" - Charity's blog post "How I Failed" by Tim O'Reilly "Looking at the Crux" by Richard Rumelt "Fluke" - Book about randomness in history "Engineering Management for the Rest of Us" by Sarah Dresner…

1
#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 1:06:57

12 weeks پیش1:06:57

1:06:57

Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production.

Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI, from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant.

His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.

The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that: copy-pasting from tutorials without understanding the why.

Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of."

Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.

In the podcast, we also cover:
Why fine-tuning is almost always the wrong choice
The "just-in-time" learning approach for staying sane in AI
Building writing assistants that actually preserve your voice
Why robots, not chatbots, are the real endgame

💡 Core Concepts
Agentic Patterns: These patterns seem complex but are actually straightforward to implement once you understand the core loop.
ReAct: Agents that Reason, Act, and Observe in a loop
Reflection: Agents that review and improve their own outputs
Fine-tuning vs Base Model + Prompting: Fine-tuning involves taking a pre-trained model and training it further on your specific data. The alternative is using base models with careful prompting and context engineering.
Paul's take: "Fine-tuning adds so much complexity... if you add fine-tuning to create a new feature, it's just from one day to one week."
RAG: A technique where you retrieve relevant documents/information and include them in the LLM's context to generate better responses.
Paul's approach: "In the beginning I also want to avoid RAG and just introduce a more guided research approach. Like I say, hey, these are the resources that I want to use in this article."

📶 Connect with Paul: LinkedIn, X / Twitter, Newsletter, GitHub, Book
📶 Connect with Nicolay: LinkedIn, X / Twitter, Bluesky, Website, My Agency Aisbach (for AI implementations / strategy)

⏱️ Important Moments
From CUDA to LLMs: [02:20] Paul's journey from writing CUDA kernels and 3D object detection to modern AI applications.
AI Content Is Natural Evolution: [11:19] Why AI writing tools are like the internet transition for artists - tools change, creativity remains.
The Framework Trap: [36:41] "I see them as no code or low code tools... not good frameworks to build on top of."
Fine-Tuning Complexity Bomb: [27:41] How fine-tuning turns 1-day features into 1-week experiments.
End-to-End First: [22:44] "I don't focus on accuracy, performance, or latency initially. I just want an end-to-end process that works."
The Orchestration Solution: [40:04] Why Temporal, DBOS, and Restate beat LLM-specific orchestrators.
Hype Filtering System: [54:06] Paul's approach: read about new tools, wait 2-3 months, only adopt if still relevant.
Just-in-Time vs Just-in-Case: [57:50] The crucial difference between learning for potential needs vs immediate application.
Robot Vision: [50:29] Why LLMs are just stepping stones to embodied AI and the unsolved challenges ahead.

🛠️ Tools & Tech Mentioned
LangGraph (for prototyping only)
Temporal (durable execution)
DBOS (simpler orchestration)
Restate (developer-friendly orchestration)
Ray (distributed compute)
UV (Python packaging)
Prefect (workflow orchestration)

📚 Recommended Resources
The Economist Style Guide (for writing)
Brandon Sanderson's Writing Approach (worldbuilding first)
LangGraph Academy (free, covers agent patterns)
Ray Documentation (Paul's next deep dive)

🔮 What's Next
Next week, we will take a detour and go into the networking behind voice AI with Russell D’Sa from Livekit.

💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write to me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.

♻️ I am trying to build the new platform for engineers to share the experience they have earned building and deploying stuff into production. Pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️…
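To make the "core loop" behind the ReAct pattern concrete, here is a minimal sketch in Python. It is not Paul's implementation: `scripted_policy` is a hypothetical stand-in for an LLM call, and the `add` and `upper` tools are invented for illustration. The point is only that Reason, Act, and Observe fit in a few lines without a framework.

```python
from typing import Callable

# Hypothetical tools the agent may call; real tools would hit APIs or databases.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


def scripted_policy(question: str, observations: list[str]) -> tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input).

    A real agent would send the question plus prior observations to a model
    and parse its reply into this shape.
    """
    if not observations:
        return ("I need to compute the sum first.", "add", question)
    return ("I have the result, so I can answer.", "finish", observations[-1])


def react_loop(question: str, policy=scripted_policy, max_steps: int = 5) -> str:
    """Run Reason -> Act -> Observe until the policy chooses 'finish'."""
    observations: list[str] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, observations)  # Reason
        if action == "finish":
            return action_input
        if action not in TOOLS:
            observations.append(f"unknown tool: {action}")  # Observe the error
            continue
        result = TOOLS[action](action_input)  # Act
        observations.append(result)  # Observe
    raise RuntimeError("agent did not finish within max_steps")


answer = react_loop("2+3")
```

Swapping `scripted_policy` for a function that calls a model is the only change needed to turn this into a real agent; the step limit guards against a policy that never finishes.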

#050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 11:00

12 weeks ago · 11:00

#049 BAML: The Programming Language That Turns LLMs into Predictable Functions 1:02:38

13 weeks ago · 1:02:38

Nicolay here. I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few basis points.

If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems. If your LLM-powered app can’t survive a malformed emoji, you’re shipping liability, not software.

Today, I sat down with Vaibhav (co-founder of Boundary) to dissect BAML, a DSL that treats every LLM call as a typed function. It’s like swapping duct-taped Python scripts for a purpose-built compiler.

Vaibhav advocates for building primitives from first principles. One principle stood out: LLMs are just functions; build like that from day 1. Wrap them, test them, and bring in a human only where it counts. Once you adopt that frame, reliability patterns fall into place: fallback heuristics, model swaps, classifiers. It's the same playbook we already use for flaky APIs.

We also cover:
Why JSON constraints are the wrong hammer, and how Schema-Aligned Parsing fixes it
Whether “durable” should be a first-class keyword (think async/await for crash-safety)
Shipping multi-language AI pipelines without forcing a Python microservice
Token-bloat surgery, symbol tuning, and the myth of magic prompts
How to keep humans sharp when 98% of agent outputs are already correct

💡 Core Concepts
Schema-Aligned Parsing (SAP): Parse first, panic later. The model can hand back Markdown, half-baked YAML, or rogue quotes; SAP coerces it into your declared type or raises. No silent corruption.
Symbol Tuning: Labels eat up tokens and often don’t help with your accuracy (in some cases they even hurt). Rename PasswordReset to C7, keep the description human-readable.
Durable Execution: A computing paradigm where program execution state persists despite failures, interruptions, or crashes. It ensures that operations resume exactly where they left off, maintaining progress even when systems go down.
Prompt Compression: Every extra token is latency, cost, and entropy. Axe filler words until the prompt reads like assembly. If output degrades, you cut too deep; back off one line.

📶 Connect with Vaibhav: LinkedIn, X / Twitter, BAML
📶 Connect with Nicolay: Newsletter, LinkedIn, X / Twitter, Bluesky, Website, My Agency Aisbach (for AI implementations / strategy)

⏱️ Important Moments
New DSL vs. Python Glue [00:54] Why bolting yet another microservice onto your stack is cowardice; BAML compiles instead of copies.
Three-Nines on Flaky Models [04:27] Designing retries, fallbacks, and human overrides when GPT eats dirt 5% of the time.
Native Go SDK & OpenAPI Fatigue [06:32] Killing thousand-line generated clients; typing go get instead.
“LLM = Pure Function” Mental Model [15:58] Replace mysticism with f(input) → output; unit-test like any other function.
Tool-Calling as a Switch Statement [18:19] Multi-tool orchestration boils down to switch(action) {…}; no cosmic “agent” needed.
Sneak Peek: durable Keyword [24:49] Crash-safe workflows without shoving state into S3 and praying.
Symbol Tuning Demo [31:35] Swapping verbose labels for C0, C1 slashes token cost and bias in one shot.
Inside SAP Coercion Logic [47:31] Int arrays to ints, scalars to lists, bad casts raise. Deterministic, no LLM in the loop.
Frameworks vs. Primitives Rant [52:32] Why BAML ships primitives and leaves the “batteries” to you: less magic, more control.

🛠️ Tools & Tech Mentioned
BAML DSL & Playground
Temporal • Prefect • DBOS
outlines • Instructor • LangChain

📚 Recommended Resources
BAML Docs
Schema-Aligned Parsing (SAP)

🔮 What's Next
Next week, we will continue going more into getting generative AI into production, talking to Paul Iusztin.

💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write to me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.

♻️ Here's the deal: I'm committed to bringing you detailed, practical insights about AI development and implementation. In return, I have two simple requests:
Hit subscribe right now to help me understand what content resonates with you
If you found value in this post, share it with one other developer or tech professional who's working with AI
That's our agreement - I deliver actionable AI insights, you help grow this. ♻️…
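The symbol tuning idea from the notes above (rename a label such as PasswordReset to an opaque symbol, keep the description readable) can be sketched in plain Python. This is an illustration under assumptions, not BAML's implementation: the helper names `build_symbol_table` and `decode`, the `BillingIssue` label, and both descriptions are hypothetical.

```python
def build_symbol_table(labels: dict[str, str]) -> tuple[dict[str, str], str]:
    """Map each real label to an opaque symbol and render the prompt section.

    `labels` maps label name -> human-readable description. Only the symbol
    and the description go into the prompt; the label name stays in code.
    """
    symbol_to_label = {f"C{i}": name for i, name in enumerate(labels)}
    lines = [f"{symbol}: {labels[name]}" for symbol, name in symbol_to_label.items()]
    return symbol_to_label, "\n".join(lines)


def decode(symbol_to_label: dict[str, str], model_output: str) -> str:
    """Translate the model's symbol back to the real label, or raise."""
    symbol = model_output.strip()
    if symbol not in symbol_to_label:
        raise ValueError(f"unexpected symbol: {symbol!r}")
    return symbol_to_label[symbol]


# Hypothetical label set for a support-ticket classifier.
LABELS = {
    "PasswordReset": "The user cannot log in or wants to change their password.",
    "BillingIssue": "The user asks about an invoice, charge, or refund.",
}

symbol_to_label, prompt_section = build_symbol_table(LABELS)
```

The model is then asked to answer with a single symbol, and `decode` maps its reply back to the label, raising on anything unexpected rather than guessing.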

#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions 1:12:34

13 weeks ago · 1:12:34

#048 TAKEAWAYS Why Your AI Agents Need Permission to Act, Not Just Read 7:06

14 weeks ago · 7:06

Nicolay here. Most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos.

Today I have the chance to talk to Dexter Horthy, who recently put out a long piece called “12-factor agents”. It’s like the 10 commandments, but for building agents. One of them is “Contact humans with tool calls”: the LLM can call humans for high-stakes decisions or “writes”.

The key insight is brutally simple. AI can get to 90% accuracy on most tasks - good enough for spam-like activities but disastrous for anything that requires trust. The solution isn't to wait for models to get smarter; it's to add a human approval layer for critical actions.

Imagine you are writing to a database or sending an email. Each “write” has to be approved by a human. So you post the email in a Slack channel, and in most cases your sales people will approve it. In the other 10%, it’s stopped in its tracks and the human can take over. You stop the slop and get good training data in the meantime.

Dexter’s company is building exactly this: an approval mechanism that lets AI agents send requests to humans before executing.

In the podcast, we also touch on a bunch of other things:
MCP servers, and why they are (atm) just thin clients
Are we training LLMs toward mediocrity?
What infrastructure do we need for human in the loop (e.g. DBOS)?
and more

💡 Core Concepts
Context Engineering: Crafting the information representation for LLMs - selecting optimal data structures, metadata, and formats to ensure models receive precisely what they need to perform effectively.
Token Bloat Prevention: Ruthlessly eliminating irrelevant information from context windows to maintain agent focus during complex tasks, preventing the pattern of repeating failed approaches.
Human-in-the-loop Approval Flows: Achieving 99% reliability through a "90% AI + 10% human oversight" framework where agents analyze data and suggest actions but request explicit permission before execution.
Rubric Engineering: Systematically evaluating AI outputs through dimension-specific scoring criteria to provide precise feedback and identify exceptional results, helping escape the trap of models converging toward mediocrity.

📶 Connect with Dexter: LinkedIn, X / Twitter, Company
📶 Connect with Nicolay: LinkedIn, X / Twitter, Bluesky, Website, My Agency Aisbach (for AI implementations / strategy)

⏱️ Important Moments
MCP Servers as Clients: [03:07] Dexter explains why what many call "MCP servers" actually function more like clients when examining the underlying code.
Authentication Challenges: [04:45] The discussion shifts to how authentication should be handled in MCP implementations and whether it belongs in the protocol.
Asynchronous Agent Execution: [08:18] Exploring how to handle agents that need to pause for human input without wasting tokens on continuous polling.
Token Bloat Prevention: [14:41] Strategies for keeping context windows focused and efficient, moving beyond standard chat formats.
Context Engineering: [29:06] The concept that everything in AI agent development ultimately comes down to effective context engineering.
Fine-tuning vs. RAG for Writing Style: [20:05] Contrasting personal writing style fine-tuning versus context window examples.
Generating Options vs. Deterministic Outputs: [19:44] The unexplored potential of having AI generate diverse creative options for human selection.
The "Mediocrity Convergence" Question: [37:11] The philosophical concern that popular LLMs may inevitably trend toward average quality.
Data Labeling Interfaces: [35:25] Discussion about the need for better, lower-friction interfaces to collect human feedback on AI outputs.
Human-in-the-loop Approval Flows: [42:46] The core approach of HumanLayer, allowing agents to ask permission before taking action.

🛠️ Tools & Tech Mentioned
MCP
OpenControl
DBOS
Temporal
Cursor

📚 Recommended Resources
12 Factor Agents
BAML Docs
Rubric Engineering

🔮 What's Next
Next week, we will continue going more into getting generative AI into production, talking to Vaibhav from BAML.

💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify. If you have any suggestions for future guests, feel free to leave them in the comments or write to me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com. I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.

♻️ I am trying to build the new platform for engineers to share the experience they have earned building and deploying stuff into production. I am trying to produce the best content possible - informative, actionable, and engaging. I'm asking for two things: hit subscribe now to show me what content you like (so I can do more of it), and if this episode helped you, pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️…
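The approval flow described in these notes (reads pass through, every "write" waits for a human) can be sketched as a small gate in Python. This is an assumption-laden illustration, not HumanLayer's API: `ProposedAction`, `ApprovalGate`, and the `send_email` and `search` action names are hypothetical, and the `approve` callable stands in for whatever channel (for example a Slack message) a real reviewer would use.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ProposedAction:
    kind: str                 # hypothetical action name, e.g. "send_email"
    payload: dict[str, Any]


@dataclass
class ApprovalGate:
    approve: Callable[[ProposedAction], bool]  # stand-in for a human reviewer
    execute: Callable[[ProposedAction], Any]   # performs the side effect
    read_only_kinds: frozenset = frozenset({"search"})
    decisions: list = field(default_factory=list)  # (action, approved) pairs

    def submit(self, action: ProposedAction) -> Optional[Any]:
        if action.kind in self.read_only_kinds:
            return self.execute(action)        # reads need no approval
        approved = self.approve(action)        # blocks until the human decides
        self.decisions.append((action, approved))
        return self.execute(action) if approved else None


sent: list[str] = []
gate = ApprovalGate(
    # Toy reviewer: rejects any email that promises guaranteed results.
    approve=lambda a: "guaranteed" not in a.payload["body"],
    execute=lambda a: sent.append(a.payload["body"]) or "done",
)
ok = gate.submit(ProposedAction("send_email", {"body": "Following up on our call."}))
blocked = gate.submit(ProposedAction("send_email", {"body": "Results guaranteed!"}))
```

The `decisions` list is where the "good training data" comes from: every approved or rejected proposal is recorded alongside the reviewer's verdict.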

42 weeks ago · 54:46

Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer. While vector search generates buzz, hybrid approaches combining traditional text search with vector capabilities yield better results.

The architecture typically splits into three core components: ingestion/indexing (requiring decisions between batch vs streaming), query processing (balancing understanding vs performance), and analytics/feedback loops for continuous improvement.

Critical but often overlooked aspects include query understanding depth, systematic relevancy testing (avoid anecdote-driven development), and data governance as search systems naturally evolve into organizational data hubs.

Performance optimization requires careful tradeoffs between index-time vs query-time computation, with even 1-2% improvements being significant in mature systems. Success requires testing against production data (staging environments prove unreliable), implementing proper evaluation infrastructure (golden query sets, A/B testing, interleaving), and avoiding the local maxima trap where improving one query set unknowingly damages others.

The end goal is finding an acceptable balance between corpus size, latency requirements, and cost constraints while maintaining system manageability and relevance quality.

"It's quite easy to end up in local maxima, whereby you improve a query for one set and then you end up destroying it for another set."

"A good marker of a sophisticated system is one where you actually see it's getting worse... you might be discovering a maxima."

"There's no free lunch in all of this. Often it's a case that, to service billions of documents on a vector search, less than 10 millis, you can do those kinds of things. They're just incredibly expensive. It's really about trying to manage all of the overall system to find what is an acceptable balance."

Search Pioneers: Website, GitHub
Stuart Cam: LinkedIn
Russ Cam: GitHub, LinkedIn, X (Twitter)
Nicolay Gerold: LinkedIn, X (Twitter)

00:00 Introduction to Search Systems
00:13 Challenges in Search: Relevancy vs Latency
00:27 Insights from Industry Experts
01:00 Evolution of Search Technologies
03:16 Storage and Compute in Search Systems
06:22 Common Mistakes in Building Search Systems
09:10 Evaluating and Improving Search Systems
19:27 Architectural Components of Search Systems
29:17 Understanding Search Query Expectations
29:39 Balancing Speed, Cost, and Corpus Size
32:03 Trade-offs in Search System Design
32:53 Indexing vs Querying: Key Considerations
35:28 Re-ranking and Personalization Challenges
38:11 Evaluating Search System Performance
44:51 Overrated vs Underrated Search Techniques
48:31 Final Thoughts and Contact Information…
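The "local maxima trap" mentioned in these notes is easier to spot when golden queries are scored per query set instead of as one global average. The sketch below is one possible way to do that in Python, not the guests' tooling: precision@k is an assumed metric choice, and the query sets, queries, and document IDs are invented for illustration.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 3) -> float:
    """Fraction of the top-k results that are in the golden relevant set."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def compare_by_query_set(golden, baseline, candidate, k: int = 3):
    """Return {set_name: (baseline_score, candidate_score)}, averaged per set."""
    report = {}
    for set_name, queries in golden.items():
        base = [precision_at_k(baseline[q], rel, k) for q, rel in queries.items()]
        cand = [precision_at_k(candidate[q], rel, k) for q, rel in queries.items()]
        report[set_name] = (sum(base) / len(base), sum(cand) / len(cand))
    return report


def regressions(report, tolerance: float = 0.0) -> list[str]:
    """Query sets where the candidate ranking scores below the baseline."""
    return [name for name, (b, c) in report.items() if c < b - tolerance]


# Hypothetical golden sets: query -> IDs of documents judged relevant.
golden = {
    "product_names": {"red shoes": {"d1", "d2"}},
    "part_numbers": {"ab-123": {"d9"}},
}
# Ranked results from the current system and from a proposed change.
baseline = {"red shoes": ["d1", "d5", "d6"], "ab-123": ["d9", "d7", "d8"]}
candidate = {"red shoes": ["d1", "d2", "d6"], "ab-123": ["d7", "d8", "d4"]}

report = compare_by_query_set(golden, baseline, candidate)
damaged_sets = regressions(report)
```

Here the proposed change improves the product-name queries while breaking the part-number lookup, which a single averaged score could hide.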
