Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University
Discover the techniques behind memory optimization for large language models with our guest, Seonyeong Heo of Kyung Hee University. We explore how 7-billion-parameter models can be deployed on small devices with limited memory. The episode covers key-value (KV) caching in decoder-only transformers. This technique avoids redundant computation by storing the keys and values of previously processed tokens and reusing them at each decoding step instead of recomputing them. Seonyeong shares strategies for handling the heavy memory demands this creates and explains how they can make on-device models more practical and energy-efficient.
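A minimal sketch of the key-value caching idea, under several assumptions: a single attention head in a single layer, pure Python lists instead of a tensor library, and random toy weights. All names here (`KVCache`, `decode_with_cache`, `kv_cache_bytes`, and so on) are hypothetical and do not come from the episode. The sketch checks that decoding with a cache produces the same outputs as recomputing every key and value at each step. It also estimates the cache size for an assumed 7B-class configuration: 32 layers, 32 KV heads, head dimension 128, a 4,096-token context, and 16-bit values.

```python
import math
import random


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # matrix is a list of rows; returns matrix @ vec.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all cached keys/values.
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Key and value vectors of every token seen so far (one head, one layer)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(tokens, wq, wk, wv):
    # Each step projects only the newest token; earlier K/V are read from the cache.
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(matvec(wk, x), matvec(wv, x))
        outputs.append(attend(matvec(wq, x), cache.keys, cache.values))
    return outputs


def decode_without_cache(tokens, wq, wk, wv):
    # Baseline: re-project every earlier token at every step (quadratic work).
    outputs = []
    for t, x in enumerate(tokens):
        prefix = tokens[: t + 1]
        keys = [matvec(wk, p) for p in prefix]
        values = [matvec(wv, p) for p in prefix]
        outputs.append(attend(matvec(wq, x), keys, values))
    return outputs


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def random_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    d = 4
    wq, wk, wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
    tokens = random_matrix(6, d)  # six toy token embeddings
    cached = decode_with_cache(tokens, wq, wk, wv)
    full = decode_without_cache(tokens, wq, wk, wv)
    print(all(abs(a - b) < 1e-12
              for rc, rf in zip(cached, full) for a, b in zip(rc, rf)))  # prints: True
    # Assumed 7B-class shape: 32 layers, 32 KV heads, head_dim 128, 4096 tokens, fp16.
    print(kv_cache_bytes(32, 32, 128, 4096, 2) / 2**30, "GiB")  # prints: 2.0 GiB
```

The cache trades memory for compute. Each decoding step projects only the newest token, but the stored keys and values grow linearly with context length. In this assumed configuration that reaches 2 GiB, which is why the cache itself becomes a target for compression.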
Our conversation also turns to dynamic compression methods for reducing memory usage. We unpack the challenges of compressing key-value arrays and weigh techniques such as quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization stands out for achieving high compression rates with small errors, provided it is tuned effectively. This episode is for anyone interested in the future of on-device LLMs, where efficient memory management is key to performance in resource-constrained settings.
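Two of the compression techniques mentioned above can be tried on a toy key array. This minimal sketch uses plain per-row min-max (uniform) quantization and simple magnitude pruning in pure Python. It is not the weighted quantization scheme discussed in the episode, and autoencoder-based dimensionality reduction is not shown. The function names, the 16-bit baseline, and the 32 bits of per-row metadata (a 16-bit scale and a 16-bit offset) are all illustrative assumptions.

```python
import math
import random


def quantize_group(values, bits):
    # Asymmetric min-max quantization: map [lo, hi] onto integers 0..2**bits - 1.
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    return [c * scale + lo for c in codes]


def quantize_rows(matrix, bits):
    # One scale/offset pair per row (e.g. per cached token), stored with the codes.
    return [quantize_group(row, bits) for row in matrix]


def magnitude_prune(values, keep_ratio):
    # Keep entries at least as large (in magnitude) as the k-th largest; zero the rest.
    # Ties at the threshold may keep a few extra entries.
    k = max(1, int(len(values) * keep_ratio))
    threshold = sorted((abs(v) for v in values), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in values]


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def compression_ratio(n_rows, n_cols, bits, orig_bits=16, meta_bits=32):
    # Assumes a 16-bit original plus 32 bits of scale/offset metadata per row.
    original = n_rows * n_cols * orig_bits
    compressed = n_rows * n_cols * bits + n_rows * meta_bits
    return original / compressed


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-in for a cached key array: 8 tokens x 64 channels.
    keys = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]
    flat = [v for row in keys for v in row]
    for bits in (8, 4, 2):
        restored = []
        for codes, scale, lo in quantize_rows(keys, bits):
            restored.extend(dequantize_group(codes, scale, lo))
        ratio = compression_ratio(8, 64, bits)
        print(f"{bits}-bit: ratio {ratio:.2f}x, RMSE {rmse(flat, restored):.4f}")
    pruned = magnitude_prune(flat, 0.5)
    print(f"50% magnitude pruning: RMSE {rmse(flat, pruned):.4f}")
```

Lower bit widths raise the compression ratio but also the reconstruction error. The per-row scale and offset are fixed overhead, so they account for a larger share of storage as the bit width shrinks.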
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Chapters
1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)
2. Memory Optimization for on-Device LLM (00:00:24)
3. Memory Optimization Techniques for LLMs (00:11:50)