Content provided by The Data Flowcast. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Data Flowcast or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

Leveraging Airflow To Build Scalable and Reliable Data Platforms at 99acres.com with Samyak Jain

25:08

Data orchestration is evolving rapidly, with dynamic workflows becoming the cornerstone of modern data engineering. In this episode, we are joined by Samyak Jain, Senior Software Engineer - Big Data at 99acres.com. Samyak shares insights from his journey with Apache Airflow, exploring how his team built a self-service platform that enables non-technical teams to launch data pipelines and marketing campaigns seamlessly.

Key Takeaways:

(02:02) Starting a career in data engineering by troubleshooting Airflow pipelines.

(04:27) Building self-service portals with Airflow as the backend engine.

(05:34) Utilizing API endpoints to trigger dynamic DAGs with parameterized templates.

(09:31) Managing a dynamic environment with over 1,400 active DAGs.

(11:14) Implementing fault tolerance by segmenting data workflows into distinct layers.

(14:15) Tracking and optimizing query costs in AWS Athena to save $7K monthly.

(16:22) Automating cost monitoring with real-time alerts for high-cost queries.

(17:15) Streamlining Airflow metadata cleanup to prevent performance bottlenecks.

(21:30) Efficiently handling one-time and recurring marketing campaigns using Airflow.

(24:18) Advocating for Airflow features that improve resource management and ownership tracking.

Resources Mentioned:

Samyak Jain - https://www.linkedin.com/in/samyak-jain-ab5830169/

99acres.com - https://www.linkedin.com/company/99acres/

Apache Airflow - https://airflow.apache.org/

AWS Athena - https://aws.amazon.com/athena/

Kafka - https://kafka.apache.org/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning


