
با برنامه Player FM !
Natasha Jaques 2
Manage episode 357892993 series 2536330
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine
Additional References
- Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
- Learning to summarize from human feedback, Nisan Stiennon et al 2020
- Training language models to follow instructions with human feedback, Long Ouyang et al 2022
73 قسمت
Manage episode 357892993 series 2536330
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine
Additional References
- Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
- Learning to summarize from human feedback, Nisan Stiennon et al 2020
- Training language models to follow instructions with human feedback, Long Ouyang et al 2022
73 قسمت
همه قسمت ها
×به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.