#030 Vector Search at Scale, Why One Size Doesn't Fit All
Manage episode 448926657 series 3585930
Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this.
Charles Xie, founder of Zilliz (the company behind Milvus), shares how they tackled vector database scaling challenges at the 100B+ vector scale:
Key Insights:
- Multi-tier storage strategy:
  - GPU memory (1% of data, fastest)
  - RAM (10% of data)
  - Local SSD
  - Object storage (slowest but cheapest)
- Real-time search solution:
  - New data goes to a buffer (searchable immediately)
  - Index builds in the background when the buffer fills
  - Search combines buffer and main-index results
- Performance optimization:
  - GPU acceleration for 10k-50k queries/second
  - Customizable trade-offs between cost, latency, and search relevance
- Future developments:
  - Self-learning indices
  - Hybrid search methods (dense + sparse)
  - Graph embedding support
  - ColBERT integration
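The multi-tier storage idea above can be sketched as a simple fall-through lookup: try the fastest tier first and drop down to slower, cheaper tiers on a miss. This is a minimal illustration; the tier names, capacity ratios, and dict-based store are assumptions for the sketch, not Milvus internals.

```python
# Tiers ordered fastest to slowest; fractions are the rough
# shares of data per tier mentioned in the episode (illustrative).
TIERS = [
    ("gpu_memory", 0.01),     # ~1% of data, fastest
    ("ram", 0.10),            # ~10% of data
    ("local_ssd", 0.50),      # assumed share for the sketch
    ("object_storage", 1.00), # everything, slowest but cheapest
]

class TieredStore:
    """Toy multi-tier vector store with fall-through lookup."""
    def __init__(self):
        self.tiers = {name: {} for name, _ in TIERS}

    def put(self, tier, key, vector):
        self.tiers[tier][key] = vector

    def get(self, key):
        # Check tiers from fastest to slowest; first hit wins.
        for name, _ in TIERS:
            if key in self.tiers[name]:
                return name, self.tiers[name][key]
        return None, None

store = TieredStore()
store.put("gpu_memory", "hot_vec", [0.1, 0.2])
store.put("object_storage", "cold_vec", [0.3, 0.4])
tier, _ = store.get("hot_vec")  # found in "gpu_memory"
```

A real system would also promote and evict vectors between tiers based on access patterns; that policy is omitted here.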
Perfect for teams hitting scaling walls with their current vector search implementation or planning for future growth.
Worth watching if you're building production search systems or need to balance cost against performance.
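The real-time search solution described in the notes can be sketched as a searchable write buffer in front of a main index: new vectors are brute-force searchable the moment they arrive, and a full buffer is flushed into the main index. Here plain lists stand in for real ANN index structures, and the flush-on-full trigger is an assumption based on the episode's description, not Milvus's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class RealTimeIndex:
    """Toy index: a buffer for fresh data plus a 'main index'."""
    def __init__(self, buffer_limit=3):
        self.buffer = []       # new vectors, searchable immediately
        self.main = []         # stands in for a built ANN index
        self.buffer_limit = buffer_limit

    def insert(self, key, vec):
        self.buffer.append((key, vec))
        if len(self.buffer) >= self.buffer_limit:
            # In a real system this index build happens in the background.
            self.main.extend(self.buffer)
            self.buffer = []

    def search(self, query, k=2):
        # Combine buffer and main-index candidates, as described above.
        candidates = self.buffer + self.main
        candidates.sort(key=lambda kv: cosine(query, kv[1]), reverse=True)
        return [key for key, _ in candidates[:k]]
```

The key property is that `search` scans the buffer as well as the main index, so freshly inserted data is visible before any index build completes.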
Charles Xie:
Nicolay Gerold:
00:00 Introduction to Search System Challenges
00:26 Introducing Milvus: The Open Source Vector Database
00:58 Interview with Charles: Founder of Zilliz
02:20 Scalability and Performance in Vector Databases
03:35 Challenges in Distributed Systems
05:46 Data Consistency and Real-Time Search
12:12 Hierarchical Storage and GPU Acceleration
18:34 Emerging Technologies in Vector Search
23:21 Self-Learning Indexes and Future Innovations
28:44 Key Takeaways and Conclusion
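On the hybrid search methods (dense + sparse) mentioned under future developments: one common way to combine a dense (embedding) ranking with a sparse (keyword) ranking is reciprocal rank fusion. The episode doesn't specify Milvus's fusion method, so this is a generic sketch of the technique, not the product's API.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the conventional default from the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d1", "d2", "d3"]   # e.g. embedding nearest neighbors
sparse_hits = ["d2", "d4", "d1"]  # e.g. keyword/BM25 matches
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents ranked well by both retrievers (here "d2" and "d1") float to the top, which is the point of hybrid search.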
62 episodes