51 subscribers
Evaluating the Effectiveness of Large Language Models: Challenges and Insights // Aniket Singh // #248
Aniket Kumar Singh is a Vision Systems Engineer at Ultium Cells, skilled in Machine Learning and Deep Learning. He is also engaged in AI research, focusing on Large Language Models (LLMs).

Evaluating the Effectiveness of Large Language Models: Challenges and Insights // MLOps Podcast #248 with Aniket Kumar Singh, CTO @ MyEvaluationPal | ML Engineer @ Ultium Cells.

// Abstract
Dive into the world of Large Language Models (LLMs) like GPT-4. Why is it crucial to evaluate these models, how do we measure their performance, and what common hurdles do we face? Drawing on his research, Aniket shares insights on the importance of prompt engineering and model selection. He also discusses real-world applications in healthcare, economics, and education, and highlights future directions for improving LLMs.

// Bio
Aniket is a Vision Systems Engineer at Ultium Cells, skilled in Machine Learning and Deep Learning. He is also engaged in AI research, focusing on Large Language Models (LLMs).

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: www.aniketsingh.me
Aniket's AI Research for Good blog, where he plans to share new research focused on the good: www.airesearchforgood.org
Aniket's papers: https://scholar.google.com/citations?user=XHxdWUMAAAAJ&hl=en

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Aniket on LinkedIn: https://www.linkedin.com/in/singh-k-aniket/

Timestamps:
[00:00] Aniket's preferred coffee
[00:14] Takeaways
[01:29] Aniket's job and hobby
[03:06] Evaluating LLMs: Systems-Level Perspective
[05:55] Rule-based system
[08:32] Evaluation Focus: Model Capabilities
[13:04] LLM Confidence
[13:56] Problems with LLM Ratings
[17:17] Understanding AI Confidence Trends
[18:28] Aniket's papers
[20:40] Testing AI Awareness
[24:36] Agent Architectures Overview
[27:05] Leveraging LLMs for tasks
[29:53] Closed systems in Decision-Making
[31:28] Navigating model Agnosticism
[33:47] Robust Pipeline vs Robust Prompt
[34:40] Wrap up
427 episodes
All episodes
1 Agents of Innovation: AI-Powered Product Ideation with Synthetic Consumer Testing // Luca Fiaschi // #306 1:02:23

1 We're All Finetuning Incorrectly // Tanmay Chopra // #304 1:00:30






1 From Rules to Reasoning Engines // George Mathew // #296 1:05:26

1 GenAI Traffic: Why API Infrastructure Must Evolve... Again // Erica Hughberg // #296 1:06:24

1 Future of Software, Agents in the Enterprise, and Inception Stage Company Building // Eliot Durbin // #293 54:26

1 The Agent Landscape - Lessons Learned Putting Agents Into Production 1:08:40

1 Look At Your ****ing Data 👀 // Kenny Daniel // #292 1:07:43

1 Evolving Workflow Orchestration // Alex Milowski // #291 1:14:34




1 Navigating Machine Learning Careers: Insights from Meta to Consulting // Ilya Reznik // #286 1:00:36


1 Machine Learning, AI Agents, and Autonomy // Egor Kraev // #282 1:05:20


1 Unleashing Unconstrained News Knowledge Graphs to Combat Misinformation // Robert Caulk // #279 1:15:24

1 AI-Driven Code: Navigating Due Diligence & Transparency in MLOps // Matt van Itallie // #275 57:01


1 LLMs to agents: The Beauty & Perils of Investing in GenAI // VC Panel // Agents in Production 33:24



1 The Impact of UX Research in the AI Space // Lauren Kaplan // #272 1:08:19


1 Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // Bernie Wu // #270 55:18

1 How to Systematically Test and Evaluate Your LLMs Apps // Gideon Mendels // #269 1:01:42

1 The AI Dream Team: Strategies for ML Recruitment and Growth // Jelmer Borst and Daniela Solis // #267 58:42

1 Unpacking 3 Types of Feature Stores // Simba Khadder // #265 1:07:42

1 Who's MLOps for Anyway? // Jonathan Rioux // #261 1:10:14


1 Building in Production Human-centred GenAI Solutions // Mohamed Abusaid & Mara Pometti// #177 1:02:42

1 MLOps for GenAI Applications // Harcharan Kabbay // #256 1:07:18

1 Design and Development Principles for LLMOps // Andy McMahon // #254 1:10:17


1 Red Teaming LLMs // Ron Heichman // #252 1:09:52



1 Evaluating the Effectiveness of Large Language Models: Challenges and Insights // Aniket Singh // #248 35:40

1 Extending AI: From Industry to Innovation // Sophia Rowland & David Weik // #247 1:01:36

1 All Data Scientists Should Learn Software Engineering Principles // Catherine Nelson // #245 52:54


1 Navigating the AI Frontier: The Power of Synthetic Data and Agent Evaluations in LLM Development // Boris Selitser // #241 57:21

1 How to Build Production-Ready AI Models for Manufacturing // [Exclusive] LatticeFlow Roundtable 56:37

1 Managing Small Knowledge Graphs for Multi-agent Systems // Tom Smoker // #236 1:04:40

1 Just when we Started to Solve Software Docs, AI Blew Everything Up // Dave Nunez // #235 1:01:38

1 MLOps - Design Thinking to Build ML Infra for ML and LLM Use Cases // Amritha Arun Babu & Abhik Choudhury // #221 1:00:17

1 4 Years of the MLOps Community // Demetrios Brinkmann // #220 1:04:29

1 The Art and Science of Training LLMs // Bandish Shah and Davis Blalock // #219 1:15:11

1 [Exclusive] Zilliz Roundtable // Why Purpose-built Vector Databases Matter for Your Use Case 59:00

1 The Real E2E RAG Stack // Sam Bean, Rewind AI // #217 1:10:06


1 Becoming an AI Evangelist // Alex Volkov // #215 1:14:31


1 Data Governance and AI // Alexandra Diem // #212 1:05:45

1 Powering MLOps: The Story of Tecton's Rift // Matt Bleifer & Mike Eastham // #209 1:03:57


1 The Myth of AI Breakthroughs // Jonathan Frankle // #205 1:10:02

1 Pioneering AI Models for Regional Languages // Aleksa Gordić // #203 1:04:23

1 Small Data, Big Impact: The Story Behind DuckDB // Hannes Mühleisen & Jordan Tigani // #202 1:08:34

1 Language, Graphs, and AI in Industry // Paco Nathan // #201 1:18:28


1 The Role of Infrastructure in ML // Niels Bantilan // #197 1:05:24

1 LLMs in Focus: From One-Size Fits All to Verticalized Solutions // Venky Ganti & Laurel Orr // #196 55:14




1 Data Engineering in the Federal Sector // Shane Morris // #223 1:03:22

1 What Business Stakeholders Want to See from the ML Teams // Peter Guagenti // #222 1:21:27


1 Building the Future of AI in Software Development // Varun Mohan // #195 1:04:34

1 DSPy: Transforming Language Model Calls into Smart Pipelines // Omar Khattab // #194 1:05:39


1 Enterprises Using MLOps, the Changing LLM Landscape, MLOps Pipelines // Chris Van Pelt // #192 47:49

1 Building Defensible AI Apps // Gregory Kamradt // #191 1:05:33

1 Guarding LLM and NLP APIs: A Trailblazing Odyssey for Enhanced Security // Ads Dawson // #190 59:40

1 Designing for Forward Compatibility in Gen AI // Rohit Agarwal // #189 1:00:17


1 The Future of Feature Stores and Platforms // Mike Del Balso & Josh Wills // # 186 1:11:14

1 FrugalGPT: Better Quality and Lower Cost for LLM Applications // Lingjiao Chen // MLOps Podcast #172 1:02:58


1 All the Hard Stuff with LLMs in Product Development // Phillip Carter // MLOps Podcast #170 1:01:03


1 Treating Prompt Engineering More Like Code // Maxime Beauchemin // MLOps Podcast #167 1:14:17

1 Eliminating Garbage In/Garbage Out for Analytics and ML // Roy Hasson & Santona Tuli // MLOps Podcast #166 50:37

1 Python Power: How Daft Embeds Models and Revolutionizes Data Processing // Sammy Sidhu // MLOps Podcast #165 51:29

1 Open Source and Fast Decision Making // Rob Hirschfeld // MLOps Podcast #164 1:00:01

1 From Arduinos to LLMs: Exploring the Spectrum of ML // Soham Chatterjee // MLOps Podcast #162 44:49

1 Why is MLOps Hard in an Enterprise? // Maria Vechtomova & Basak Eskili // MLOps Podcast #159 55:05

1 Large Language Models at Cohere // Nils Reimers // MLOps Podcast #158 1:14:52



1 The Birth and Growth of Spark: An Open Source Success Story // Matei Zaharia // MLOps Podcast #155 58:12

1 ML Scalability Challenges // Waleed Kadous // MLOps Podcast # 154 1:00:40

1 Multilingual Programming and a Project Structure to Enable It // Rodolfo Núñez // MLOps Podcast #153 1:00:22

1 [Bonus Episode] Practical AI x MLOps // Demetrios Brinkmann, Mihail Eric, Daniel Whitenack and Chris Benson 58:44

1 How A Manager Became a Believer in DevOps for Machine Learning // Keith Trnka // MLOps Podcast #152 56:20

1 ML in Production: A DS from Ubisoft Perspective // Jean-Michel Daignan // MLOps Podcast #151 50:20

1 The Future of Search in the Era of Large Language Models // Saahil Jain // MLOps Podcast #150 50:48

1 Lessons on Data Science Leadership // Luigi Patruno // #185 1:13:31

1 Data Platforms in MLOps: Translating Business Goals into Product Decisions // Richa Sachdev // #184 42:48

1 MLOps@GetYourGuide // Jean Machado, Meghana Satish, Olivia Houghton, Theodore Meynard// #182 1:03:52

1 The Centralization of Power in AI // Kyle Harrison // # 181 1:01:34

1 Adventures in Building CLIP & Other (Largeish) LMs // Sachin Abeywardana // #180 1:06:37


1 Harnessing MLOps in Finance // Michelle Marie Conway // MLOps Podcast Coffee #174 1:05:15


1 Building Cody, an Open Source AI Coding Assistant // Beyang Liu // MLOps Podcast #173 1:02:12




1 Investing in the Next Generation of AI & ML // Jill Chase & Manmeet Gujral // MLOps Podcast #143 41:08


1 Foundational Models are the Future but... with Alex Ratner CEO of Snorkel AI // MLOps Podcast #139 52:27

1 Machine Learning Operations — What is it and Why Do We Need It? // Niklas Kühl // MLOps Podcast #137 59:24


1 "Real-Time" ML: Features and Inference // Sasha Ovsankin and Rupesh Gupta // MLOps Podcast #135 52:13