Content provided by Changelog Media. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Changelog Media or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

Towards high-quality (maybe synthetic) datasets

57:04
 

As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.

Join the discussion

Changelog++ members save 11 minutes on this episode because they made the ads disappear. Join today!

Sponsors:

  • Fly.io: The home of Changelog.com. Deploy your apps close to your users: global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.
  • WorkOS: A platform that gives developers a set of building blocks for quickly adding enterprise-ready features to their application. Add Single Sign-On (Okta, Azure, Google, Microsoft OAuth), sync users from any SCIM directory, HRIS integration, audit trails (SIEM), free magic link sign-in. WorkOS is designed for developers and offers a single, elegant interface that abstracts dozens of enterprise integrations. Learn more and get started at WorkOS.com
  • Eight Sleep: Take your sleep and recovery to the next level. Go to eightsleep.com/PRACTICALAI and use the code PRACTICALAI to get $350 off your very own Pod 4 Ultra. You can try it for free for 30 days - but we’re confident you will not want to return it. Once you experience AI-optimized sleep, you’ll wonder how you ever slept without it. Currently shipping to: United States, Canada, United Kingdom, Europe, and Australia.

Chapters

1. Welcome to Practical AI (00:00:00)

2. Sponsor: Fly (00:00:44)

3. What does data collaboration mean? (00:03:56)

4. Understanding your data (00:07:18)

5. How to start curating data (00:09:58)

6. Practical steps to scale (00:13:12)

7. Sponsor: WorkOS (00:16:52)

8. Traditional & new usecases (00:20:23)

9. Virtues of smaller models (00:24:51)

10. What Argilla looks like (00:27:04)

11. User backgrounds (00:30:55)

12. The non-technical POV (00:34:21)

13. Sponsor: Eight Sleep (00:38:23)

14. AI feedback (00:41:09)

15. Hallucination issues (00:44:50)

16. What is Distilabel (00:46:10)

17. Usage & adoption (00:50:08)

18. Where things are going (00:52:55)

19. This is muy bueno (00:55:34)

20. Outro (00:56:15)

298 episodes
