Content provided by Zeta Alpha. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Zeta Alpha or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal
Open Pre-Trained Transformer Language Models (OPT): What does it take to train GPT-3?

47:12
 

Andrew Yates (Assistant Professor at the University of Amsterdam) and Sergi Castella i Sapé discuss the recent paper "Open Pre-trained Transformer (OPT) Language Models" from Meta AI (formerly Facebook). In this replication work, Meta developed and trained a 175-billion-parameter Transformer very similar to OpenAI's GPT-3, documenting the process in detail to share their findings with the community. The code, pretrained weights, and logbook are available in their GitHub repository (links below).

Links

Feedback Form: https://scastella.typeform.com/to/rg7a5GfJ

📄 OPT paper: https://arxiv.org/abs/2205.01068

👾 Code: https://github.com/facebookresearch/metaseq

📒 Logbook: https://github.com/facebookresearch/metaseq/blob/main/projects/OPT/chronicles/OPT175B_Logbook.pdf

✍️ OPT Official Blog Post: https://ai.facebook.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/

OpenAI Embeddings API: https://openai.com/blog/introducing-text-and-code-embeddings/

Nils Reimers' critique of OpenAI Embeddings API: https://medium.com/@nils_reimers/openai-gpt-3-text-embeddings-really-a-new-state-of-the-art-in-dense-text-embeddings-6571fe3ec9d9

Timestamps:

00:00 Introduction and housekeeping: new feedback form, ACL conference highlights

02:42 The convergence between NLP and Neural IR techniques

06:43 Open Pretrained Transformer motivation and scope, reproducing GPT-3 and open-sourcing

08:16 Basics of OPT: architecture, pre-training objective, teacher forcing, tokenizer, training data

13:40 Findings from preliminary experiments: hyperparameters, training stability, spikiness

20:08 Problems that appear at scale when training with 992 GPUs

23:01 Using temperature to check whether GPUs are working

25:00 Training the largest model: what to do when the loss explodes? (which happens quite often)

29:15 When they switched away from AdamW to SGD

32:00 Results: successful but not quite GPT-3 level. Toxicity?

35:45 Replicability of Large Language Models research. Was GPT-3 replicable? What difference does it make?

37:25 What makes a paper replicable?

40:33 Directions in which large Language Models are applied to Information Retrieval

45:15 Final thoughts and takeaways
