

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2
In this episode of Changelog, Weston Pace dives into the latest updates to LanceDB, an open-source vector database and file format. Lance's new V2 file format rethinks the traditional notion of columnar storage, allowing for more efficient handling of large multimodal datasets like images and embeddings. Weston discusses the goals driving LanceDB's development, including null value support, multimodal data handling, and balancing point-lookup performance against full-scan performance.
Sound Bites
"A little bit more power to actually just try." "We're becoming a little bit more feature complete with returns of arrow." "Weird data representations that are actually really optimized for your use case."
Key Points
- Weston introduces LanceDB, an open-source multimodal vector database and file format.
- The goals behind LanceDB's design: handling null values, multimodal data, and finding the right balance between point lookups and full dataset scan performance.
- Lance V2 File Format:
- Potential Use Cases
Conversation Highlights
- On the benefits of Arrow integration: Strengthening the connection with the Arrow data ecosystem for seamless data handling.
- Why "columnar container format"?: A broader definition than "table format" to encompass more unconventional use cases.
- Tackling multimodal data: How the Lance V2 format stores large multimodal data efficiently without requiring large amounts of memory.
- Python's role in encoding experimentation: Providing a way to rapidly prototype custom encodings and plug them into LanceDB.
LanceDB:
Weston Pace:
Nicolay Gerold:
Chapters
00:00 Introducing Lance: A New File Format
06:46 Enabling Custom Encodings in Lance
11:51 Exploring the Relationship Between Lance and Arrow
20:04 New Chapter
Lance file format, nulls, round-tripping data, optimized data representations, full-text search, encodings, downsides, multimodal data, compression, point lookups, full scan performance, non-contiguous columns, custom encodings
All episodes
Maxime Labonne on Model Merging, AI Trends, and Beyond 1:06:55
#053 AI in the Terminal: Enhancing Coding with Warp 1:04:30
#051 Build systems that can be debugged at 4am by tired humans with no context 1:05:51
#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 1:06:57
#050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 11:00
#049 BAML: The Programming Language That Turns LLMs into Predictable Functions 1:02:38
#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions 1:12:34
#045 RAG As Two Things - Prompt Engineering and Search 1:02:43
#044 Graphs Aren't Just For Specialists Anymore 1:03:34
#043 Knowledge Graphs Won't Fix Bad Data 1:10:58
#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs 1:33:43
#041 Context Engineering, How Knowledge Graphs Help LLMs Reason 1:33:34
#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It 1:14:23
#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It) 46:05