
Holistic Evaluation of Generative AI Systems // Jineet Doshi // #280

57:33
 
Content provided by Demetrios. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Demetrios or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

Jineet Doshi is an award-winning Scientist, Machine Learning Engineer, and Leader at Intuit with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine learning models from design to production across various domains, work that has impacted 100 million customers, significantly improved business metrics, and led to millions of dollars of impact.

Holistic Evaluation of Generative AI Systems // MLOps Podcast #280 with Jineet Doshi, Staff AI Scientist and AI Lead at Intuit.

// Abstract

Evaluating LLMs is essential for establishing trust before deploying them to production. Even after deployment, evaluation is essential to ensure LLM outputs meet expectations, making it a foundational part of LLMOps. However, evaluating LLMs remains an open problem. Unlike traditional machine learning models, LLMs can perform a wide variety of tasks, such as writing poems, Q&A, and summarization. This raises the question: how do you evaluate a system with such broad capabilities? This talk covers the various approaches for evaluating LLMs, such as classic NLP techniques, red teaming, and newer ones like using LLMs as a judge, along with the pros and cons of each. The talk also covers evaluating complex GenAI systems like RAG and agents, evaluating LLMs for safety and security, and the need for a holistic approach to evaluating these very capable models.
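
The "LLM as a judge" approach mentioned in the abstract typically works by prompting a strong model with a grading rubric plus the candidate output, then parsing a score out of its reply. As a rough, illustrative sketch only (not code from the talk), assuming the `openai` Python package, an API key in the environment, and an arbitrary model name and 1-to-5 rubric:

```python
# Minimal LLM-as-a-judge sketch (illustrative; not from the episode).
# Assumes the `openai` package and OPENAI_API_KEY in the environment;
# the model name and the 1-5 rubric are arbitrary choices.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an answer to a user question.
Question: {question}
Answer: {answer}
Score the answer from 1 (useless) to 5 (excellent) for correctness
and helpfulness. Reply with the score digit only."""

def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to score a candidate answer on a 1-5 scale."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    # Naive parse: trusts the judge to reply with a single digit.
    return int(response.choices[0].message.content.strip())

# Example: average judge scores over a tiny eval set.
eval_set = [
    ("What is 2 + 2?", "4"),
    ("Name the capital of France.", "Paris"),
]
scores = [judge(q, a) for q, a in eval_set]
print(f"mean judge score: {sum(scores) / len(scores):.2f}")
```

In practice, judge scores are usually calibrated against a small human-labeled sample before being trusted, since judge models have known biases, such as preferring longer or more confident-sounding answers.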

// Bio

Jineet Doshi is an award-winning AI Lead and Engineer with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine learning models from design to production across various domains, which have impacted millions of customers and significantly improved business metrics, leading to millions of dollars of impact. He is currently an AI Lead at Intuit, where he is one of the architects and developers of their Generative AI platform, which serves Generative AI experiences to more than 100 million customers around the world. Jineet is also a guest lecturer at Stanford University as part of their Building LLM Applications class. He is on the Advisory Board of the University of San Francisco's AI Program. He holds multiple patents in the field, is on the steering committee of the MLOps World Conference, and has co-chaired workshops at top AI conferences like KDD. He holds a Master's degree from Carnegie Mellon University.

// MLOps Swag/Merch

https://shop.mlops.community/

// Related Links

Website: https://www.intuit.com/

--------------- ✌️Connect With Us ✌️ -------------

Join our Slack community: https://go.mlops.community/slack

Follow us on Twitter: @mlopscommunity

Sign up for the next meetup: https://go.mlops.community/register

Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/

Connect with Jineet on LinkedIn: https://www.linkedin.com/in/jineetdoshi/

Timestamps:
[00:00] Jineet's preferred coffee
[00:20] Takeaways
[01:24] Please like, share, leave a review, and subscribe to our MLOps channels!
[01:36] LLM evaluation at scale
[03:13] Challenges in GenAI evaluation
[08:09] Eval products vs platforms
[09:28] Evaluation methods for models
[14:03] NLP evaluation techniques
[25:06] LLM as a judge/jury
[31:56] LLMs and pizza brainstorming
[34:07] Cost per answer breakdown
[38:29] Evaluating RAG systems
[44:00] Testing with LLMs and humans
[49:23] Evaluating AI use cases
[54:19] AI workflow stress testing
[55:40] Wrap up

