
Content provided by Nicolay Gerold. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Nicolay Gerold or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

#052 Don't Build Models, Build Systems That Build Models

59:22
 

Nicolay here,

Today I have the chance to talk to Charles from Modal, who went from doing a PhD on neural network optimization in the 2010s - when ML engineers could build models with a soldering iron and some sticks - to architecting serverless infrastructure for AI models. Modal is about removing barriers so anyone can spin up a hundred GPUs in seconds.

The critical insight that stuck with me: "Don't build models, build systems that build models." Organizations often make the mistake of celebrating a one-time fine-tuned model that matches GPT-4 performance only to watch it become obsolete when the next foundation model arrives - typically three to six months down the road.

Charles's approach to infrastructure is particularly unconventional. He argues that serverless isn't just about convenience - it fundamentally changes how ambitious you can be with scale. "There's so much that gets in the way of trying to spin up a hundred GPUs or a thousand CPU containers that people just don't think to do something big."

The winning approach involves automated data pipelines with feedback collection, continuous evaluation against new foundation models, A/B testing and canary deployments, and systematic error analysis and retraining.
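To make the "system, not a model" idea a bit more concrete, here is a minimal sketch of two of those pieces: re-evaluating the current model against newer candidates, and routing a small canary share of traffic to a challenger. Everything in it is an assumption for illustration (the toy models, the eval set, the names `pick_champion` and `canary_route`, the 2-point promotion margin, the 5% canary share); none of it comes from Charles or from Modal.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "model" is any callable from input text to a label.
Model = Callable[[str], str]
EvalSet = List[Tuple[str, str]]


def accuracy(model: Model, eval_set: EvalSet) -> float:
    """Fraction of eval examples the model labels correctly."""
    correct = sum(1 for text, label in eval_set if model(text) == label)
    return correct / len(eval_set)


def pick_champion(incumbent: str, models: Dict[str, Model],
                  eval_set: EvalSet, min_gain: float = 0.02) -> str:
    """Re-run the eval for every candidate; switch only on a clear gain."""
    scores = {name: accuracy(model, eval_set) for name, model in models.items()}
    best = max(scores, key=lambda name: scores[name])
    if best != incumbent and scores[best] >= scores[incumbent] + min_gain:
        return best
    return incumbent


def canary_route(request_id: str, champion: str, challenger: str,
                 canary_fraction: float = 0.05) -> str:
    """Deterministically send a fixed share of requests to the challenger."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return challenger if bucket < canary_fraction * 10_000 else champion


# Toy models (assumed for illustration only).
def fine_tuned_v1(text: str) -> str:
    if "refund" in text:
        return "billing"
    return "bug" if "crash" in text else "praise"


def new_foundation_model(text: str) -> str:
    if "refund" in text or "charged" in text:
        return "billing"
    return "bug" if "crash" in text else "praise"


EVAL_SET: EvalSet = [
    ("refund please", "billing"),
    ("app crashes on start", "bug"),
    ("love it", "praise"),
    ("charged twice", "billing"),
]
MODELS: Dict[str, Model] = {
    "fine_tuned_v1": fine_tuned_v1,
    "new_foundation_model": new_foundation_model,
}

if __name__ == "__main__":
    champion = pick_champion("fine_tuned_v1", MODELS, EVAL_SET)
    print("champion:", champion)
    print("request 42 goes to:", canary_route("42", "fine_tuned_v1", champion))
```

In a real setup the eval set would be fed by the feedback pipeline, and the same comparison would run again every time a new foundation model ships, so the fine-tuned model has to keep earning its place.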

In the podcast, we also cover:

  • Why inference, not training, is where the money is made
  • How to rethink compute when moving from traditional cloud to serverless
  • The economics of automated resource management
  • Why task decomposition is the key ML engineering skill
  • When to earn the right to fine-tune versus using foundation models

*📶 Connect with Charles:*

*📶 Connect with Nicolay:*

*⏱️ Important Moments*

  • From CUDA to Serverless: [00:01:38] Charles's journey from PhD neural network optimization to building Modal's serverless infrastructure.
  • Rethinking Scale Ambition: [00:01:38] "There's so much that gets in the way of trying to spin up a hundred GPUs that people just don't think to do something big."
  • The Economics of Serverless: [00:04:09] How automated resource management changes the cattle vs pets paradigm for GPU workloads.
  • Lambda vs Modal Philosophy: [00:04:20] Why Modal was designed for tasks that take bytes and emit megabytes, unlike Lambda's middleware focus.
  • Inference Economics Reality: [00:10:16] "Almost nobody gets paid to make models - organizations get paid to make predictions."
  • The Open Source Commoditization: [00:14:55] How foundation models are becoming undifferentiated capabilities like databases.
  • Task Decomposition as Core Skill: [00:22:00] Why breaking down problems is equivalent to recognizing API boundaries in software engineering.
  • Systems That Build Models: [00:33:31] The critical difference between delivering static weights and delivering repeatable model production systems.
  • Earning the Right to Fine-Tune: [00:34:06] The infrastructure prerequisites needed before attempting model customization.
  • Multi-Node Training Challenges: [00:52:24] How serverless platforms handle the contradiction of high-performance computing with spiky demand.

*🛠️ Tools & Tech Mentioned*


*📚 Recommended Resources*


💬 Join The Conversation

Follow How AI Is Built on YouTube - https://youtube.com/@howaiisbuilt, Bluesky - https://bsky.app/profile/howaiisbuilt.fm, or Spotify - https://open.spotify.com/show/3hhSTyHSgKPVC4sw3H0NUc

If you have any suggestions for future guests, feel free to leave them in the comments or write to me (Nicolay) directly on LinkedIn - https://linkedin.com/in/nicolay-gerold/, X - https://x.com/nicolaygerold, or Bluesky - https://bsky.app/profile/nicolaygerold.com, or by email at [email protected].

I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.


