Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // Boris Selitser // #241
Join us at our first in-person conference on June 25 all about AI Quality: https://www.aiqualityconference.com/

Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // MLOps podcast #241 with Boris Selitser, Co-Founder and CTO/CPO of Okareo.

A big thank you to LatticeFlow for sponsoring this episode! LatticeFlow - https://latticeflow.ai/

// Abstract
Explore the evolving landscape of building LLM applications, focusing on the critical roles of synthetic data and agent evaluations. Discover how synthetic data enhances model behavior description, prototyping, testing, and fine-tuning, driving robustness in LLM applications. Learn about the latest methods for evaluating complex agent-based systems, including RAG-based evaluations, dialog-level assessments, simulated user interactions, and adversarial models. This talk delves into the specific challenges developers face and the tradeoffs involved in each evaluation approach, providing practical insights for effective AI development.

// Bio
Boris is the Co-Founder and CTO/CPO at Okareo. Okareo is a full-cycle platform for developers to evaluate and customize AI/LLM applications. Before Okareo, Boris was Director of Product at Meta/Facebook, leading teams building internal platforms and ML products. Examples include a copyright classification system across the Facebook apps and an engagement platform for over 200K developers, 500K+ creators, and 12M+ Oculus users. Boris has a bachelor’s in Computer Science from UC Berkeley.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
https://docs.okareo.com/blog/data_loop
https://docs.okareo.com/blog/agent_eval
The Real E2E RAG Stack // Sam Bean // MLOps Podcast #217 - https://youtu.be/8uZst7pgOw0
RecSys at Spotify // Sanket Gupta // MLOps Podcast #232 - https://youtu.be/byH-ARJA4gk

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Boris on LinkedIn: https://www.linkedin.com/in/selitser/

Timestamps:
[00:00] Boris' preferred coffee
[00:37] Takeaways
[02:32] Please like, share, leave a review, and subscribe to our MLOps channels!
[02:48] Software Engineering and Data Science
[06:01] AI Transformative Potential Explained
[10:31] Prompt Injection Protection Strategies
[17:03] Agent's metrics for Jira
[24:11] Data and Metrics Evolution
[27:54] Evaluation Focus Enhances Systems
[31:22 - 32:52] LatticeFlow AD
[32:55] Custom Evaluation and Synthetic Data
[36:23] Synthetic data for expansion, evaluation, and map
[41:06] Diverse agents' personalities for readiness
[44:25] Agent functions
[46:17] Optimizing Routing Agents
[50:04] Adapting to tool output for decision-making
[52:56] Agent framework evolution
[55:41] Agent framework for delivering value
[57:03] Wrap up
434 episodes
All episodes
Making AI Reliable is the Greatest Challenge of the 2020s // Alon Bochman // #312 (1:01:37)
Behavior Modeling, Secondary AI Effects, Bias Reduction & Synthetic Data // Devansh Devansh // #311 (1:01:35)
GraphBI: Expanding Analytics to All Data Through the Combination of GenAI, Graph, & Visual Analytics // Paco Nathan & Weidong Yang // #310 (1:14:01)
I Am Once Again Asking "What is MLOps?" // Oleksandr Stasyk // #308 (1:07:22)
Agents of Innovation: AI-Powered Product Ideation with Synthetic Consumer Testing // Luca Fiaschi // #306 (1:02:23)
We're All Finetuning Incorrectly // Tanmay Chopra // #304 (1:00:30)
From Rules to Reasoning Engines // George Mathew // #296 (1:05:26)
GenAI Traffic: Why API Infrastructure Must Evolve... Again // Erica Hughberg // #296 (1:06:24)
Future of Software, Agents in the Enterprise, and Inception Stage Company Building // Eliot Durbin // #293 (54:26)
The Agent Landscape - Lessons Learned Putting Agents Into Production (1:08:40)
Evolving Workflow Orchestration // Alex Milowski // #291 (1:14:34)
Navigating Machine Learning Careers: Insights from Meta to Consulting // Ilya Reznik // #286 (1:00:36)
Machine Learning, AI Agents, and Autonomy // Egor Kraev // #282 (1:05:20)
Unleashing Unconstrained News Knowledge Graphs to Combat Misinformation // Robert Caulk // #279 (1:15:24)
AI-Driven Code: Navigating Due Diligence & Transparency in MLOps // Matt van Itallie // #275 (57:01)
LLMs to agents: The Beauty & Perils of Investing in GenAI // VC Panel // Agents in Production (33:24)
The Impact of UX Research in the AI Space // Lauren Kaplan // #272 (1:08:19)
Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270 (55:18)
How to Systematically Test and Evaluate Your LLMs Apps // Gideon Mendels // #269 (1:01:42)
The AI Dream Team: Strategies for ML Recruitment and Growth // Jelmer Borst and Daniela Solis // #267 (58:42)
Unpacking 3 Types of Feature Stores // Simba Khadder // #265 (1:07:42)
Who's MLOps for Anyway? // Jonathan Rioux // #261 (1:10:14)
Building in Production Human-centred GenAI Solutions // Mohamed Abusaid & Mara Pometti // #177 (1:02:42)
MLOps for GenAI Applications // Harcharan Kabbay // #256 (1:07:18)
Design and Development Principles for LLMOps // Andy McMahon // #254 (1:10:17)
Red Teaming LLMs // Ron Heichman // #252 (1:09:52)
Evaluating the Effectiveness of Large Language Models: Challenges and Insights // Aniket Singh // #248 (35:40)
Extending AI: From Industry to Innovation // Sophia Rowland & David Weik // #247 (1:01:36)
All Data Scientists Should Learn Software Engineering Principles // Catherine Nelson // #245 (52:54)
Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // Boris Selitser // #241 (57:21)
How to Build Production-Ready AI Models for Manufacturing // [Exclusive] LatticeFlow Roundtable (56:37)
Managing Small Knowledge Graphs for Multi-agent Systems // Tom Smoker // #236 (1:04:40)
Just when we Started to Solve Software Docs, AI Blew Everything Up // Dave Nunez // #235 (1:01:38)
Data Engineering in the Federal Sector // Shane Morris // #223 (1:03:22)
What Business Stakeholders Want to See from the ML Teams // Peter Guagenti // #222 (1:21:27)
MLOps - Design Thinking to Build ML Infra for ML and LLM Use Cases // Amritha Arun Babu & Abhik Choudhury // #221 (1:00:17)
4 Years of the MLOps Community // Demetrios Brinkmann // #220 (1:04:29)
The Art and Science of Training LLMs // Bandish Shah and Davis Blalock // #219 (1:15:11)
[Exclusive] Zilliz Roundtable // Why Purpose-built Vector Databases Matter for Your Use Case (59:00)
The Real E2E RAG Stack // Sam Bean, Rewind AI // #217 (1:10:06)
Becoming an AI Evangelist // Alex Volkov // #215 (1:14:31)
Data Governance and AI // Alexandra Diem // #212 (1:05:45)
Powering MLOps: The Story of Tecton's Rift // Matt Bleifer & Mike Eastham // #209 (1:03:57)
The Myth of AI Breakthroughs // Jonathan Frankle // #205 (1:10:02)
Pioneering AI Models for Regional Languages // Aleksa Gordić // #203 (1:04:23)
Small Data, Big Impact: The Story Behind DuckDB // Hannes Mühleisen & Jordan Tigani // #202 (1:08:34)
Language, Graphs, and AI in Industry // Paco Nathan // #201 (1:18:28)
The Role of Infrastructure in ML // Niels Bantilan // #197 (1:05:24)
LLMs in Focus: From One-Size Fits All to Verticalized Solutions // Venky Ganti & Laurel Orr // #196 (55:14)
Building the Future of AI in Software Development // Varun Mohan // #195 (1:04:34)
DSPy: Transforming Language Model Calls into Smart Pipelines // Omar Khattab // #194 (1:05:39)
Enterprises Using MLOps, the Changing LLM Landscape, MLOps Pipelines // Chris Van Pelt // #192 (47:49)
Building Defensible AI Apps // Gregory Kamradt // #191 (1:05:33)
Guarding LLM and NLP APIs: A Trailblazing Odyssey for Enhanced Security // Ads Dawson // #190 (59:40)
Designing for Forward Compatibility in Gen AI // Rohit Agarwal // #189 (1:00:17)
The Future of Feature Stores and Platforms // Mike Del Balso & Josh Wills // #186 (1:11:14)
Lessons on Data Science Leadership // Luigi Patruno // #185 (1:13:31)
Data Platforms in MLOps: Translating Business Goals into Product Decisions // Richa Sachdev // #184 (42:48)
MLOps@GetYourGuide // Jean Machado, Meghana Satish, Olivia Houghton, Theodore Meynard // #182 (1:03:52)
The Centralization of Power in AI // Kyle Harrison // #181 (1:01:34)
Adventures in Building CLIP & Other (Largeish) LMs // Sachin Abeywardana // #180 (1:06:37)
Harnessing MLOps in Finance // Michelle Marie Conway // MLOps Podcast Coffee #174 (1:05:15)
Building Cody, an Open Source AI Coding Assistant // Beyang Liu // MLOps Podcast #173 (1:02:12)
FrugalGPT: Better Quality and Lower Cost for LLM Applications // Lingjiao Chen // MLOps Podcast #172 (1:02:58)
All the Hard Stuff with LLMs in Product Development // Phillip Carter // MLOps Podcast #170 (1:01:03)
Treating Prompt Engineering More Like Code // Maxime Beauchemin // MLOps Podcast #167 (1:14:17)
Eliminating Garbage In/Garbage Out for Analytics and ML // Roy Hasson & Santona Tuli // MLOps Podcast #166 (50:37)
Python Power: How Daft Embeds Models and Revolutionizes Data Processing // Sammy Sidhu // MLOps Podcast #165 (51:29)
Open Source and Fast Decision Making // Rob Hirschfeld // MLOps Podcast #164 (1:00:01)
From Arduinos to LLMs: Exploring the Spectrum of ML // Soham Chatterjee // MLOps Podcast #162 (44:49)
Why is MLOps Hard in an Enterprise? // Maria Vechtomova & Basak Eskili // MLOps Podcast #159 (55:05)
Large Language Models at Cohere // Nils Reimers // MLOps Podcast #158 (1:14:52)
The Birth and Growth of Spark: An Open Source Success Story // Matei Zaharia // MLOps Podcast #155 (58:12)
ML Scalability Challenges // Waleed Kadous // MLOps Podcast #154 (1:00:40)
Multilingual Programming and a Project Structure to Enable It // Rodolfo Núñez // MLOps Podcast #153 (1:00:22)
[Bonus Episode] Practical AI x MLOps // Demetrios Brinkmann, Mihail Eric, Daniel Whitenack and Chris Benson (58:44)
How A Manager Became a Believer in DevOps for Machine Learning // Keith Trnka // MLOps Podcast #152 (56:20)
ML in Production: A DS from Ubisoft Perspective // Jean-Michel Daignan // MLOps Podcast #151 (50:20)
The Future of Search in the Era of Large Language Models // Saahil Jain // MLOps Podcast #150 (50:48)
Investing in the Next Generation of AI & ML // Jill Chase & Manmeet Gujral // MLOps Podcast #143 (41:08)
Foundational Models are the Future but... with Alex Ratner CEO of Snorkel AI // MLOps Podcast #139 (52:27)
Machine Learning Operations — What is it and Why Do We Need It? // Niklas Kühl // MLOps Podcast #137 (59:24)
"Real-Time" ML: Features and Inference // Sasha Ovsankin and Rupesh Gupta // MLOps Podcast #135 (52:13)
Building Threat Detection Systems: An MLE's Perspective // Jeremy Jordan // MLOps Podcast #134 (50:27)
What is Data / ML Like on League? // Ian Schweer // MLOps Coffee Sessions #132 (1:00:56)
Reliable ML // Niall Murphy & Todd Underwood // Coffee Sessions #127 (1:02:24)
Feature Stores at Shopify and Skyscanner // Matt Delacour and Mike Moran // Reading Group #4 (49:35)
The Journey from Data Scientist to MLOps Engineer // Ale Solano // MLOps Coffee Sessions #80 (41:32)
Calibration for ML at Etsy - apply() special // Erica Greene and Seoyoon Park // MLOps Coffee Sessions #78 (49:36)
Data Mesh - The Data Quality Control Mechanism for MLOps? // Scott Hirleman // MLOps Coffee Sessions #77 (57:06)
Build a Culture of ML Testing and Model Quality // Mohamed Elgendy // MLOps Coffee Sessions #76 (51:13)
On Structuring an ML Platform 1 Pizza Team // Breno Costa & Matheus Frata // MLOps Coffee Sessions #73 (52:53)
2021 MLOps Year in Review // Vishnu Rachakonda and Demetrios Brinkmann // MLOps Coffee Sessions #72 (51:31)
Setting up an ML Platform on GCP: Lessons Learned // Mefta Sadat // MLOps Coffee Sessions #71 (40:04)
Monitoring Unstructured Data // Aparna Dhinakaran & Jason Lopatecki // Lightning Sessions #2 (10:37)
RECOMMENDER SYSTEM: Why They Update Models 100 Times a Day // Gleb Abroskin // MLOps Coffee Sessions #123 (52:08)
Bringing DevOps Agility to ML // Luis Ceze // Coffee Sessions #121 (1:04:35)
Product Enrichment and Recommender Systems // Marc Lindner and Amr Mashlah // Coffee Sessions #114 (56:36)
Building Better Data Teams // Leanne Fitzpatrick // Coffee Sessions #113 (1:02:01)
More than a Cache: Turning Redis into a Composable, ML Data Platform // Samuel Partee // Coffee Sessions #111 (48:33)
Why You Need More Than Airflow // Ketan Umare // Coffee Sessions #109 (1:11:36)
ML Flow vs Kubeflow 2022 // Byron Allen // Coffee Sessions #108 (1:06:01)
Building a Culture of Experimentation to Speed Up Data-Driven Value // Delina Ivanova // MLOps Coffee Sessions #106 (54:06)
Cleanlab: Labeled Datasets that Correct Themselves Automatically // Curtis Northcutt // MLOps Coffee Sessions #105 (1:06:10)
Making MLFlow // Lead MLFlow Maintainer Corey Zumar // MLOps Coffee Sessions #103 (1:04:45)
Declarative Machine Learning Systems: Big Tech Level ML Without a Big Tech Team // Piero Molino // MLOps Coffee Sessions #101 (58:46)
CPU vs GPU // Ronen Dar & Gijsbert Janssen van Doorn // MLOps Coffee Sessions #99 (1:03:57)
Racing the Playhead: Real-time Model Inference in a Video Streaming Environment // Brannon Dorsey // Coffee Sessions #98 (58:04)
Real-Time Exactly-Once Event Processing with Apache Flink, Kafka, and Pinot // Jacob Tsafatinos // MLOps Coffee Sessions #97 (54:12)
Traversing the Data Maturity Spectrum: A Startup Perspective // Mark Freeman // Coffee Sessions #94 (46:03)
Model Monitoring in Practice: Top Trends // Krishnaram Kenthapadi // MLOps Coffee Sessions #93 (51:36)
Building the World's First Data Engineering Conference // Pete Soderling // MLOps Coffee Sessions #92 (41:59)
The Shipyard: Lessons Learned While Building an ML Platform / Automating Adherence // Joseph Haaga // Coffee Sessions #91 (39:59)
ML Platform Tradeoffs and Wondering Why to Use Them // Javier Mansilla // MLOps Coffee Sessions #88 (53:59)
Don't Listen Unless You Are Going to Do ML in Production // Kyle Morris // MLOps Coffee Sessions #87 (51:33)
Building ML/Data Platform on Top of Kubernetes // Julien Bisconti // MLOps Coffee Sessions #86 (48:15)
Continuous Deployment of Critical ML Applications // Emmanuel Ameisen // MLOps Coffee Sessions #85 (44:38)
Wikimedia MLOps // Chris Albon // Coffee Sessions #68 (1:05:49)
ML Stepping Stones: Challenges & Opportunities for Companies // John Crousse // Coffee Sessions #67 (47:58)
Machine Learning at Reasonable Scale // Jacopo Tagliabue // MLOps Coffee Sessions #66 (1:04:43)
The Future of AI and ML in Process Automation // Slater Victoroff // MLOps Coffee Sessions #64 (58:01)
Data Selection for Data-Centric AI: Data Quality Over Quantity // Cody Coleman // Coffee Sessions #59 (1:11:00)
The Future of ML and Data Platforms // Michael Del Balso - Erik Bernhardsson // Coffee Sessions #57 (55:27)