
Complete Beginner's Course on AI Evaluations: Step by Step (2025) | Aman Khan
Today, I want to share a new episode with Aman Khan.
The best way to learn about AI evaluations is to watch two PMs build them live from scratch. In our new episode, Aman and I walk through creating evals for an AI customer support agent, from labeling a golden dataset to aligning LLM judges. This is the complete beginner's AI eval course you've been waiting for.
Aman and I talked about:
(00:00) What are AI evals and how to get good at them
(02:52) The 4 types of AI evaluations everyone should know
(06:08) Live demo: Building evals for a customer support agent
(10:29) Using Anthropic's console to generate great prompts
(15:13) Creating the evaluation criteria
(17:40) Adding human labels to the golden dataset
(31:05) Scaling evals with LLM-judge prompts
(38:21) How to align LLM judges with human judgment
Get the takeaways: https://creatoreconomy.so/p/complete-beginner-course-on-ai-evaluations-aman-khan
Where to find Aman:
LinkedIn: https://www.linkedin.com/in/amanberkeley/
Website: https://arize.com/
📌 Subscribe to this channel – more interviews coming soon!
77 episodes