5 posts tagged with "dbt"

Building an AI Data Layer on Top of Your Existing Data Lake and Warehouse

6 min read
Metadata Morph
AI & Data Engineering Team

Your data lake and warehouse already hold the answers your business needs. The missing layer isn't more data — it's an intelligent orchestration layer that lets AI agents query, reason, and act on that data reliably.

This post walks through a production-ready architecture that uses dbt as a semantic manifest, Model Context Protocol (MCP) servers as the access layer, and multiple specialized agents to turn your existing Snowflake, Redshift, or BigQuery investment into an active, AI-driven intelligence system.

dbt Testing Strategies Before Feeding Data to LLMs: Preventing Garbage-In, Garbage-Out

5 min read
Metadata Morph
AI & Data Engineering Team

An AI agent is only as reliable as the data it reasons from. Feed it nulls, duplicates, or stale data and it will produce confident, coherent, and wrong answers — often without any obvious signal that something is off. The LLM doesn't know what it doesn't know.

dbt's testing framework is the right place to enforce data quality before data reaches your agents. This post covers a layered testing strategy that catches the most common failure modes before they become AI failures.

Self-Writing Data Quality Reports: An Agent That Monitors Your Pipelines Overnight

4 min read
Metadata Morph
AI & Data Engineering Team

Every data team has the same Monday morning ritual: someone checks whether last night's pipelines ran cleanly, hunts through logs for failures, and manually compiles a status update for stakeholders. It's important work — and it's entirely automatable.

A data quality reporting agent runs overnight, checks every layer of your pipeline, and delivers a clear, human-readable report before anyone opens their laptop. When something is wrong, the report explains what failed, what downstream models are affected, and what the likely cause is.

Building a RAG Pipeline on Your Existing Data Warehouse

6 min read
Metadata Morph
AI & Data Engineering Team

A common failure mode in enterprise AI projects is asking an LLM questions about your business data and getting confidently wrong answers. The model doesn't know your revenue figures, your customer data, or your internal processes; it only knows what it was trained on.

Retrieval-Augmented Generation (RAG) fixes this by giving the model the relevant context it needs at query time, retrieved from your actual data. The surprising part: you probably don't need new data infrastructure to do it. Your existing warehouse already has the data; you just need a retrieval layer on top.

Ingesting Massive Data Loads: Patterns for High-Performance Batch Pipelines

6 min read
Metadata Morph
AI & Data Engineering Team

Moving data from source systems into your lake or warehouse sounds simple until you're doing it at scale. A pipeline that works fine at 10M rows starts breaking at 1B: queries time out, storage costs spike, and a load that should finish in a 2-hour window starts taking 14 hours.

This post covers the patterns that separate pipelines that scale from pipelines that collapse under their own weight.