Content provided by Real Python. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Real Python or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal
Preparing Data to Measure True Machine Learning Model Performance

57:45
 
How do you prepare a dataset for machine learning (ML)? How do you go beyond cleaning the data and move toward measuring how the model performs? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, returns to talk about strategies for better ML model performance.

Jodie starts by defining some terms for the conversation. We talk about targets, features, and supervised learning.

We discuss three common ways that data can alter model performance and which Python tools can help spot and avoid them. Jodie shares personal experiences of working through these pitfalls. We also share a healthy collection of resources to explore and learn more.

Course Spotlight: Combining Data in pandas With concat() and merge()

In this video course, you’ll learn two techniques for combining data in pandas: merge() and concat(). Combining Series and DataFrame objects in pandas is a powerful way to gain new insights into your data.

Topics:

  • 00:00:00 – Introduction
  • 00:01:46 – Recent conference talks
  • 00:03:24 – How to prepare your data for model performance
  • 00:04:24 – Vocabulary: target, features, and supervised learning
  • 00:06:28 – The curse of dimensionality
  • 00:08:57 – Overfitting
  • 00:11:08 – Underfitting
  • 00:12:11 – Splitting the dataset
  • 00:13:39 – K-fold cross validation
  • 00:18:30 – Data leakage
  • 00:21:36 – Checking for duplicates
  • 00:26:23 – Applying transformations only after splitting data
  • 00:31:16 – Imbalanced data
  • 00:36:36 – Using ML to balance data
  • 00:41:05 – Informing your model of the imbalance
  • 00:42:56 – Video Course Spotlight
  • 00:44:20 – Accuracy used as a measure
  • 00:49:05 – Scikit-learn function classification_report
  • 00:50:43 – JetBrains blog post and conference talk
  • 00:52:18 – How can people follow your work online?
  • 00:54:39 – Upcoming webinars
  • 00:56:20 – Thanks and goodbye
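
Several of the topics above (checking for duplicates, splitting the dataset, applying transformations only after splitting, and accuracy on imbalanced data) can be sketched in plain Python. This is a minimal illustration and not code from the episode: the data and the helper names (`drop_duplicates`, `split_rows`, `fit_scaler`, `scale`, `accuracy`, `recall`) are hypothetical. In practice, scikit-learn provides tools such as `train_test_split`, `StandardScaler`, and `classification_report` for these steps.

```python
import random
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_values):
    """Learn the mean and standard deviation from training values only."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def scale(values, mean, std):
    """Standardize values with statistics that were learned elsewhere."""
    return [(v - mean) / std for v in values]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical (feature, target) rows; two of them are exact duplicates.
rows = [(1.0, 0), (2.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
        (5.0, 1), (5.0, 1), (6.0, 0), (7.0, 0), (8.0, 1)]

# 1. Remove duplicates BEFORE splitting, so the same row cannot end up
#    in both the training set and the test set.
unique_rows = drop_duplicates(rows)

# 2. Split, then 3. fit the transformation on the training data only
#    and reuse those statistics for the test data.
train, test = split_rows(unique_rows)
mean, std = fit_scaler([x for x, _ in train])
train_scaled = scale([x for x, _ in train], mean, std)
test_scaled = scale([x for x, _ in test], mean, std)

# 4. On imbalanced data, a model that always predicts the majority class
#    scores high accuracy while finding none of the minority cases.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
majority_accuracy = accuracy(y_true, y_pred)  # 0.95
majority_recall = recall(y_true, y_pred)      # 0.0
```

The ordering is the point of the sketch: deduplicate, then split, then fit any scaling on the training portion alone. The final two lines show why accuracy by itself can hide a model that never detects the rare class.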
