با برنامه Player FM !
پادکست هایی که ارزش شنیدن دارند
حمایت شده
#043 Knowledge Graphs Won't Fix Bad Data
Manage episode 467611514 series 3585930
Metadata is the foundation of any enterprise knowledge graph.
By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants.
The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable.
Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation—one that cuts through the noise and delivers reliable, high-performance insights—you need to listen to Juan’s proven methods and real-world examples.
Terms like ontologies, taxonomies, and knowledge graphs aren’t new inventions. Ontologies and taxonomies have been studied for decades—even since ancient Greece. Google popularized “knowledge graphs” in 2012 by building on decades of semantic web research. Despite current buzz, these concepts build on established work.
Traditionally, data lives in siloed applications—each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions to that technical layer.
A modern data catalog should:
- Integrate Multiple Sources: Automatically ingest metadata from databases, ETL tools (e.g., dbt, Fivetran), and BI tools.
- Bridge Technical and Business Views: Link technical definitions (e.g., table “CUST_123”) with business concepts (e.g., “Customer”).
- Enable Reuse and Governance: Support data discovery, impact analysis, and proper governance while facilitating reuse across teams.
Practical Approaches & Use Cases:
- Start with a Clear Problem: Whether it’s reducing churn, improving operational efficiency, or meeting compliance needs, begin by solving a specific pain point.
- Iron Thread Method: Follow one query end-to-end—from identifying a business need to tracing it back to source systems—to gradually build and refine the graph.
- Automation vs. Manual Oversight: Technical metadata extraction is largely automated. For business definitions or text-based entity extraction (e.g., via LLMs), human oversight is key to ensuring accuracy and consistency.
Technical Considerations:
- Entity vs. Property: If you need to attach additional details or reuse an element across contexts, model it as an entity (with a unique identifier). Otherwise, keep it as a simple property.
- Storage Options: The market offers various graph databases—Neo4j, Amazon Neptune, Cosmos DB, TigerGraph, Apache Jena (for RDF), etc. Future trends point toward multimodel systems that allow querying in SQL, Cypher, or SPARQL over the same underlying data.
Juan Sequeda:
- data.world
- Semantic Web for the Working Ontologist
- Designing and Building Enterprise Knowledge Graphs (before you buy, send Juan a message, he is happy to send you a copy)
- Catalog & Cocktails (Juan’s podcast)
Nicolay Gerold:
00:00 Introduction to Knowledge Graphs 00:45 The Role of Metadata in AI 01:06 Building Knowledge Graphs: First Steps 01:42 Interview with Juan Sequira 02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More 05:05 Challenges and Solutions in Data Management 08:04 Practical Applications of Knowledge Graphs 15:38 Governance and Data Engineering 34:42 Setting the Stage for Data-Driven Problem Solving 34:58 Understanding Consumer Needs and Data Challenges 35:33 Foundations and Advanced Capabilities in Data Management 36:01 The Role of AI and Metadata in Data Maturity 37:56 The Iron Thread Approach to Problem Solving 40:12 Constructing and Utilizing Knowledge Graphs 54:38 Trends and Future Directions in Knowledge Graphs 59:17 Practical Advice for Building Knowledge Graphs
58 قسمت
Manage episode 467611514 series 3585930
Metadata is the foundation of any enterprise knowledge graph.
By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants.
The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable.
Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation—one that cuts through the noise and delivers reliable, high-performance insights—you need to listen to Juan’s proven methods and real-world examples.
Terms like ontologies, taxonomies, and knowledge graphs aren’t new inventions. Ontologies and taxonomies have been studied for decades—even since ancient Greece. Google popularized “knowledge graphs” in 2012 by building on decades of semantic web research. Despite current buzz, these concepts build on established work.
Traditionally, data lives in siloed applications—each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions to that technical layer.
A modern data catalog should:
- Integrate Multiple Sources: Automatically ingest metadata from databases, ETL tools (e.g., dbt, Fivetran), and BI tools.
- Bridge Technical and Business Views: Link technical definitions (e.g., table “CUST_123”) with business concepts (e.g., “Customer”).
- Enable Reuse and Governance: Support data discovery, impact analysis, and proper governance while facilitating reuse across teams.
Practical Approaches & Use Cases:
- Start with a Clear Problem: Whether it’s reducing churn, improving operational efficiency, or meeting compliance needs, begin by solving a specific pain point.
- Iron Thread Method: Follow one query end-to-end—from identifying a business need to tracing it back to source systems—to gradually build and refine the graph.
- Automation vs. Manual Oversight: Technical metadata extraction is largely automated. For business definitions or text-based entity extraction (e.g., via LLMs), human oversight is key to ensuring accuracy and consistency.
Technical Considerations:
- Entity vs. Property: If you need to attach additional details or reuse an element across contexts, model it as an entity (with a unique identifier). Otherwise, keep it as a simple property.
- Storage Options: The market offers various graph databases—Neo4j, Amazon Neptune, Cosmos DB, TigerGraph, Apache Jena (for RDF), etc. Future trends point toward multimodel systems that allow querying in SQL, Cypher, or SPARQL over the same underlying data.
Juan Sequeda:
- data.world
- Semantic Web for the Working Ontologist
- Designing and Building Enterprise Knowledge Graphs (before you buy, send Juan a message, he is happy to send you a copy)
- Catalog & Cocktails (Juan’s podcast)
Nicolay Gerold:
00:00 Introduction to Knowledge Graphs 00:45 The Role of Metadata in AI 01:06 Building Knowledge Graphs: First Steps 01:42 Interview with Juan Sequira 02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More 05:05 Challenges and Solutions in Data Management 08:04 Practical Applications of Knowledge Graphs 15:38 Governance and Data Engineering 34:42 Setting the Stage for Data-Driven Problem Solving 34:58 Understanding Consumer Needs and Data Challenges 35:33 Foundations and Advanced Capabilities in Data Management 36:01 The Role of AI and Metadata in Data Maturity 37:56 The Iron Thread Approach to Problem Solving 40:12 Constructing and Utilizing Knowledge Graphs 54:38 Trends and Future Directions in Knowledge Graphs 59:17 Practical Advice for Building Knowledge Graphs
58 قسمت
All episodes
×
1 #051 Build systems that can be debugged at 4am by tired humans with no context 1:05:52

1 #050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 1:06:58

1 #050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster 11:01

1 #049 BAML: The Programming Language That Turns LLMs into Predictable Functions 1:02:39

1 #049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions 1:12:35


1 #045 RAG As Two Things - Prompt Engineering and Search 1:02:44

1 #044 Graphs Aren't Just For Specialists Anymore 1:03:35

1 #043 Knowledge Graphs Won't Fix Bad Data 1:10:59

1 #042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs 1:33:44

1 #041 Context Engineering, How Knowledge Graphs Help LLMs Reason 1:33:35

1 #038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It 1:14:24

به Player FM خوش آمدید!
Player FM در سراسر وب را برای یافتن پادکست های با کیفیت اسکن می کند تا همین الان لذت ببرید. این بهترین برنامه ی پادکست است که در اندروید، آیفون و وب کار می کند. ثبت نام کنید تا اشتراک های شما در بین دستگاه های مختلف همگام سازی شود.