Content provided by Real Python. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Real Python or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

Natural Language Processing and How ML Models Understand Text

58:49
 

How do you process and classify text documents in Python? What are the fundamental techniques and building blocks for Natural Language Processing (NLP)? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, talks about how machine learning (ML) models understand text.

Jodie explains how ML models require data in a structured format, which involves transforming text documents into columns and rows. She covers the most straightforward approach, called binary vectorization. We discuss the bag-of-words method and the tools of stemming, lemmatization, and count vectorization.
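To make the rows-and-columns idea concrete, here is a minimal from-scratch sketch of binary and count vectorization. The tokenizer, the function names, and the two sample sentences are assumptions made for illustration and are not taken from the episode. In practice a library class such as scikit-learn's `CountVectorizer` would usually do this work.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def build_vocabulary(documents):
    """Return a sorted list of every distinct token; each becomes a column."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def count_vectorize(documents, vocabulary):
    """One row per document, one column per term, cell = occurrences."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


def binary_vectorize(documents, vocabulary):
    """Same layout as count vectorization, but cell = 1 if the term appears."""
    return [
        [1 if count > 0 else 0 for count in row]
        for row in count_vectorize(documents, vocabulary)
    ]


# Hypothetical sample documents, chosen only to show the output shape.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab = build_vocabulary(docs)
counts = count_vectorize(docs, vocab)
binary = binary_vectorize(docs, vocab)

print(vocab)
print(counts)
print(binary)
```

Word order is discarded in both versions, which is what the bag-of-words name refers to. Stemming or lemmatization would slot in inside `tokenize`, so that related word forms share one column.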

We jump into word embedding models next. Jodie talks about WordNet, Natural Language Toolkit (NLTK), word2vec, and Gensim. Our conversation lays a foundation for starting with text classification, implementing sentiment analysis, and building projects using these tools. Jodie also shares multiple resources to help you continue exploring NLP and modeling.
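As a rough illustration of what a word embedding gives you, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are invented toy values, not output from word2vec or Gensim, and `most_similar` here is a hypothetical helper written for this example. A trained model would supply vectors with many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; real embeddings are learned
# from a corpus and typically have tens to hundreds of dimensions.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine."""
    target = vectors[word]
    scores = {
        other: cosine_similarity(target, vec)
        for other, vec in vectors.items()
        if other != word
    }
    return max(scores, key=scores.get)


print(most_similar("king", toy_vectors))
```

The point of the design is that similarity becomes a geometric question: words used in similar contexts end up with vectors pointing in similar directions, which a bag-of-words column cannot express.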

Course Spotlight: Learn Text Classification With Python and Keras

In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.

Topics:

  • 00:00:00 – Introduction
  • 00:02:47 – Exploring the topic
  • 00:06:00 – Perceived sentience of LaMDA
  • 00:10:24 – How do we get started?
  • 00:11:16 – What are classification and sentiment analysis?
  • 00:13:03 – Transforming text into rows and columns
  • 00:14:47 – Sponsor: Snyk
  • 00:15:27 – Bag-of-words approach
  • 00:19:12 – Stemming and lemmatization
  • 00:22:05 – Capturing N-grams
  • 00:25:34 – Count vectorization
  • 00:27:14 – Stop words
  • 00:28:46 – Term Frequency / Inverse Document Frequency (TF-IDF) vectorization
  • 00:32:28 – Potential projects for bag-of-words techniques
  • 00:34:07 – Video Course Spotlight
  • 00:35:20 – WordNet and NLTK package
  • 00:37:27 – Word embeddings and word2vec
  • 00:45:30 – Previous training and too many dimensions
  • 00:50:07 – How to use word2vec and Gensim?
  • 00:51:26 – What types of projects for word2vec and Gensim?
  • 00:54:41 – Getting into GPT and BERT in another episode
  • 00:56:11 – How to follow Jodie’s work?
  • 00:57:36 – Thanks and goodbye
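
Several of the topics above (N-grams, stop words, and TF-IDF) can be sketched together in a few lines. This is a simplified illustration under stated assumptions: the stop word list is a tiny invented one, the sample documents are hypothetical, and the weighting uses the plain `tf * log(N / df)` form. Libraries such as scikit-learn apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

# Tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "on", "is"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words."""
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "The dog chased the cat.", "The dog sat."]
weights = tf_idf(docs)
bigrams = ngrams(tokenize(docs[0]), 2)

print(bigrams)
print(weights[0])
```

In the first document, "mat" gets a higher weight than "cat" because it appears in only one of the three documents, which is the behavior TF-IDF is meant to produce. Note that the bigrams here are built after stop words are removed, so they are not always adjacent in the original sentence.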

Show Links:

Level up your Python skills with our expert-led courses:

Support the podcast & join our community of Pythonistas


272 episodes

