Content provided by Data on Kubernetes Community. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Data on Kubernetes Community or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal

DoK Talks #141 - Dossier: multi-tenant distributed Jupyter Notebooks // Iacopo Colonnelli & Dario Tranchitella

1:00:10
 

https://go.dok.community/slack
https://dok.community
ABSTRACT OF THE TALK

When providing data analysis as a service, one must tackle several problems. Data privacy and protection by design are crucial when working on sensitive data. Performance and scalability are fundamental for compute-intensive workloads, e.g. training Deep Neural Networks. User-friendly interfaces and fast prototyping tools are essential to allow domain experts to experiment with new techniques. Portability and reproducibility are necessary to assess the actual value of results.

Kubernetes is the best platform to provide reliable, elastic, and maintainable services. However, Kubernetes alone is not enough to achieve large-scale, multi-tenant, reproducible data analysis. Out-of-the-box support for multi-tenancy is too coarse, with only two levels of segregation (i.e. the single namespace or the entire cluster). Offloading computation to off-cluster resources is non-trivial and requires manual configuration by the user. Also, Jupyter Notebooks per se offer little scalability (they execute locally and sequentially) and little reproducibility (users can run cells in any order and any number of times).

The Dossier platform allows system administrators to manage multi-tenant distributed Jupyter Notebooks at the cluster level in the Kubernetes way, i.e. through CRDs. Namespaces are aggregated into Tenants, and all security and accountability aspects are managed at that level. Each Notebook spawns in a user-dedicated namespace, subject to all Tenant-level constraints. Users can rely on provisioned resources, either in-cluster worker nodes or external resources like HPC facilities. They can also plug in their own computing nodes in a BYOD fashion. Notebooks are interpreted as distributed workflows, where each cell is a task that can be offloaded to a different location in charge of its execution.
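
The idea of reading a Notebook as a workflow can be illustrated with a minimal Python sketch. It is not Dossier's implementation: the `Cell` class, the `reads`/`writes` fields, the `target` labels, and the `execution_plan` function are all hypothetical names chosen for illustration. The sketch assumes each cell declares which variables it consumes and produces, derives dependencies from those declarations, and orders the cells so that every task runs after the tasks it depends on, regardless of where each one is executed.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Cell:
    """One notebook cell treated as a workflow task (hypothetical model)."""
    name: str
    target: str                                # hypothetical location label
    reads: set = field(default_factory=set)    # variables the cell consumes
    writes: set = field(default_factory=set)   # variables the cell produces


def execution_plan(cells):
    """Return (cell name, target) pairs in a dependency-respecting order."""
    # Map every variable to the cell that produces it.
    producers = {var: c.name for c in cells for var in c.writes}
    by_name = {c.name: c for c in cells}
    # For each cell, collect the cells whose outputs it reads.
    graph = {
        c.name: {producers[v] for v in c.reads if v in producers}
        for c in cells
    }
    order = TopologicalSorter(graph).static_order()
    return [(name, by_name[name].target) for name in order]


# Cells are listed out of order on purpose: the plan depends only on data flow.
cells = [
    Cell("plot", "local", reads={"model"}),
    Cell("train", "hpc", reads={"data"}, writes={"model"}),
    Cell("load", "local", writes={"data"}),
]
plan = execution_plan(cells)
for name, target in plan:
    print(f"{name} -> {target}")
```

Because the order is derived from declared data dependencies rather than from the order in which a user happens to run cells, the same Notebook always yields the same plan, which is the reproducibility property the abstract is after.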

BIO

Iacopo Colonnelli is a Computer Science research fellow. He received his Ph.D. with honours in Modeling and Data Science at Università di Torino with a thesis on novel workflow models for heterogeneous distributed systems, and his master’s degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN. His research focuses on both statistical and computational aspects of data analysis at large scale and on workflow modeling and management in heterogeneous distributed architectures.

Dario Tranchitella is an SWE who turned DevOps, and he's regretting this choice day by day. Besides making memes on Twitter that gain more reactions than his technical discussions, he leads the development of Open Source projects at CLASTIX, an Open Source-based start-up focusing on Multi-Tenancy in Kubernetes.

KEY TAKE-AWAYS FROM THE TALK

From this talk, people will learn:
- The different requirements of data analysis as a service
- How to configure multi-tenancy at the cluster level with Capsule
- How to write distributed workflows as Notebooks with Jupyter Workflows
- How to combine all these aspects into a single platform: Dossier

All the software presented in the talk is open source, so attendees can try it directly and include it in their own experiments with no additional restrictions.
