
dbt Testing Strategies Before Feeding Data to LLMs: Preventing Garbage-In, Garbage-Out

Metadata Morph · AI & Data Engineering Team · 5 min read

An AI agent is only as reliable as the data it reasons from. Feed it nulls, duplicates, or stale data and it will produce confident, coherent, and wrong answers — often without any obvious signal that something is off. The LLM doesn't know what it doesn't know.

dbt's testing framework is the right place to enforce data quality before data reaches your agents. This post covers a layered testing strategy that catches the most common failure modes before they become AI failures.

Why Data Quality Failures Are Worse for AI Than for BI

When a dashboard shows wrong numbers, a human analyst usually notices — the number looks off, or they cross-reference it. When an AI agent reasons from wrong data, the error is amplified and obscured by a fluent, confident narrative. The agent doesn't second-guess the data it receives.

Common data failures and their AI consequences:

| Data failure | BI consequence | AI agent consequence |
| --- | --- | --- |
| Duplicate rows in `fct_revenue` | Revenue chart looks inflated | Agent reports revenue 2× actual, presents it as fact |
| Null in `conversion_rate` | Dashboard shows blank | Agent skips or fabricates the metric |
| Stale source (6h delay) | Dashboard shows yesterday | Agent gives a confident answer about the wrong time period |
| Referential integrity break | Orphaned rows in report | Agent reasoning over joined data is corrupted |
| Wrong data type (string as amount) | Query error | Agent silently uses $0 or crashes |
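To make the first row of that table concrete, here is a minimal Python sketch (the rows and amounts are hypothetical) of how duplicated fact rows silently double an aggregate that an agent will then narrate as fact:

```python
# Hypothetical fct_revenue rows: a retried ETL load inserted every order twice.
orders = [
    {"order_id": 1, "amount_usd": 100.0},
    {"order_id": 2, "amount_usd": 250.0},
]
duplicated = orders + orders  # the failure mode: the same rows loaded twice

true_revenue = sum(o["amount_usd"] for o in orders)        # 350.0
reported_revenue = sum(o["amount_usd"] for o in duplicated)  # 700.0

# A human reviewing a chart might notice 700.0 looks high; an LLM will
# simply report it. A `unique` test on order_id catches this upstream.
```

Nothing about the duplicated sum looks malformed, which is exactly why the error survives all the way to the agent's answer.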

Layer 1: Built-in dbt Tests

dbt ships with four generic tests that every model should have on its key columns:

```yaml
# models/schema.yml

models:
  - name: fct_daily_orders
    description: "One row per order, per day"
    columns:
      - name: order_id
        tests:
          - unique        # no duplicates
          - not_null      # every row has a value
      - name: user_id
        tests:
          - not_null
          - relationships:       # referential integrity
              to: ref('dim_users')
              field: user_id
      - name: status
        tests:
          - accepted_values:     # enumeration validation
              values: ['placed', 'paid', 'shipped', 'cancelled', 'refunded']
      - name: amount_usd
        tests:
          - not_null
```

These four tests catch the most common structural failures. Run them on every model that feeds an AI agent, not just the final mart models. Failures in upstream staging models propagate silently to downstream AI inputs.
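Conceptually, `unique` and `not_null` compile to simple "return the failing rows" queries. A rough Python analogue (illustrative only, not dbt's actual implementation) makes the semantics explicit:

```python
def check_not_null(rows, column):
    """Rows that fail not_null: the value is missing."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """Rows that fail unique: the value has been seen before."""
    seen, dupes = set(), []
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.append(r)
        seen.add(value)
    return dupes

# Hypothetical staging rows with both failure modes present
rows = [
    {"order_id": 1, "user_id": 10},
    {"order_id": 1, "user_id": 11},    # duplicate order_id
    {"order_id": 2, "user_id": None},  # null user_id
]
assert len(check_unique(rows, "order_id")) == 1
assert len(check_not_null(rows, "user_id")) == 1
```

As with dbt, an empty result means the test passes; any returned rows mean failure.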

Layer 2: Custom Singular Tests for Business Rules

Generic tests catch structural failures. Singular tests enforce business logic that's specific to your domain.

```sql
-- tests/assert_no_negative_revenue.sql
-- Fails if any order has a negative amount (indicates a data entry or ETL bug)
SELECT
    order_id,
    amount_usd
FROM {{ ref('fct_daily_orders') }}
WHERE amount_usd < 0
```

```sql
-- tests/assert_revenue_within_expected_range.sql
-- Fails if daily revenue is more than 3× the 30-day average (likely a data error)
WITH daily_revenue AS (
    SELECT
        order_date,
        SUM(amount_usd) AS total_revenue
    FROM {{ ref('fct_daily_orders') }}
    GROUP BY order_date
),

baseline AS (
    SELECT AVG(total_revenue) AS avg_revenue
    FROM daily_revenue
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
      AND order_date < CURRENT_DATE
)

SELECT
    d.order_date,
    d.total_revenue,
    b.avg_revenue
FROM daily_revenue d
CROSS JOIN baseline b
WHERE d.order_date = CURRENT_DATE - INTERVAL '1 day'
  AND d.total_revenue > b.avg_revenue * 3
```

```sql
-- tests/assert_agent_input_freshness.sql
-- Fails if the data the agent will receive is more than 4 hours old
SELECT
    MAX(updated_at) AS latest_record,
    CURRENT_TIMESTAMP AS check_time,
    DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP) AS hours_stale
FROM {{ ref('mart_agent_context') }}
-- repeat the expression here: not all warehouses allow aliases in HAVING
HAVING DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP) > 4
```
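A singular test passes when its query returns zero rows. The freshness check can be mirrored in plain Python (timestamps hypothetical) to make that pass/fail contract concrete:

```python
from datetime import datetime, timedelta

def freshness_violations(latest_record, now, max_hours=4):
    """Return a 'failing row' if the newest record is older than max_hours,
    else an empty list. Mirrors the zero-rows-means-pass contract of a
    dbt singular test."""
    hours_stale = (now - latest_record).total_seconds() / 3600
    if hours_stale > max_hours:
        return [{"latest_record": latest_record, "hours_stale": hours_stale}]
    return []

now = datetime(2025, 10, 27, 12, 0)
assert freshness_violations(now - timedelta(hours=2), now) == []      # fresh: passes
assert len(freshness_violations(now - timedelta(hours=6), now)) == 1  # stale: fails
```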

Layer 3: dbt-expectations for Statistical Tests

The dbt-expectations package extends dbt with statistical tests that catch anomalies generic tests miss:

```yaml
# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

```yaml
# models/schema.yml
models:
  - name: fct_daily_orders
    tests:
      # Table-level tests
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000       # fail if fewer than 1,000 orders (ingestion failure)
          max_value: 1000000    # fail if more than 1M orders (duplication)

      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders')   # staging → mart count parity check

    columns:
      - name: amount_usd
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.01
              max_value: 100000   # flag suspiciously large single orders
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 80       # average order value sanity check
              max_value: 500
          - dbt_expectations.expect_column_stdev_to_be_between:
              min_value: 10
              max_value: 200

      - name: conversion_rate
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.0
              max_value: 1.0      # conversion rate must be a valid probability
```
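These statistical tests boil down to comparing an aggregate against bounds. An illustrative Python version of the mean check, with thresholds matching the YAML above and hypothetical order amounts:

```python
from statistics import mean

def expect_mean_between(values, min_value, max_value):
    """Return (passed, observed_mean), analogous to a column-mean bound check."""
    observed = mean(values)
    return min_value <= observed <= max_value, observed

# Hypothetical daily order amounts
amounts = [95.0, 120.0, 88.0, 210.0, 75.0]
ok, observed = expect_mean_between(amounts, min_value=80, max_value=500)
assert ok  # mean is 117.6, inside the sanity band

# A corrupted load (dollars parsed as cents) pushes the mean far outside the band
corrupted = [a * 100 for a in amounts]
ok, observed = expect_mean_between(corrupted, min_value=80, max_value=500)
assert not ok
```

Every value in the corrupted load is individually plausible as a number, which is why a per-row range test can miss what an aggregate test catches.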

Layer 4: Agent-Specific Data Contracts

For models that feed directly into AI agents, define an explicit data contract as a separate schema file. This documents what the agent expects and makes it easy to verify:

```yaml
# models/agent_inputs/schema.yml

models:
  - name: mart_agent_context
    description: >
      The primary context table consumed by the KPI commentary agent.
      All columns are required — the agent will fail silently on nulls.
      This model must pass all tests before the agent runs.
    meta:
      agent_consumer: kpi_commentary_agent
      max_acceptable_staleness_hours: 4
      owner: data-engineering@company.com

    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
          max_value: 10000

    columns:
      - name: metric_name
        tests:
          - not_null
          - unique
          - accepted_values:
              values:
                - revenue
                - gross_margin
                - conversion_rate
                - churn_rate
                - cac

      - name: current_value
        tests:
          - not_null

      - name: baseline_30d
        tests:
          - not_null

      - name: updated_at
        tests:
          - not_null
```
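The same contract can also be enforced at runtime, immediately before the agent is prompted. A sketch of such a guard (metric names taken from the schema above; the row shape is a hypothetical dict-per-row layout):

```python
REQUIRED_METRICS = {"revenue", "gross_margin", "conversion_rate", "churn_rate", "cac"}

def validate_agent_context(rows):
    """Raise ValueError if the context rows violate the mart_agent_context contract."""
    names = [r.get("metric_name") for r in rows]
    missing = REQUIRED_METRICS - set(names)
    if missing:
        raise ValueError(f"Missing required metrics: {sorted(missing)}")
    if len(names) != len(set(names)):
        raise ValueError("Duplicate metric_name values")
    for r in rows:
        if r.get("current_value") is None or r.get("baseline_30d") is None:
            raise ValueError(f"Null value in contract row: {r}")
    return True

# A complete, null-free context passes
rows = [
    {"metric_name": m, "current_value": 1.0, "baseline_30d": 1.0}
    for m in REQUIRED_METRICS
]
assert validate_agent_context(rows)
```

This duplicates what dbt already verifies, but a cheap in-process check is a useful last line of defense if the agent can be triggered outside the orchestrated pipeline.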

Gating the Agent on Test Results

The dbt test suite should gate whether the agent runs at all. In Airflow:

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator


@dag(schedule='0 6 * * *', start_date=datetime(2025, 10, 27))
def agent_pipeline_with_quality_gate():

    run_dbt_tests = BashOperator(
        task_id='run_dbt_tests',
        # +mart_agent_context selects the model and everything upstream of it
        bash_command='dbt test --select +mart_agent_context --store-failures',
    )

    @task
    def check_test_results() -> bool:
        """Parse dbt's run_results.json and fail fast if any test failed or errored."""
        import json
        with open('target/run_results.json') as f:
            results = json.load(f)

        failed = [
            r for r in results['results']
            if r['status'] in ('fail', 'error') and r['unique_id'].startswith('test.')
        ]

        if failed:
            failed_names = [r['unique_id'] for r in failed]
            raise ValueError(
                f"Data quality gate failed. Agent will not run.\n"
                f"Failed tests: {failed_names}"
            )
        return True

    @task
    def run_agent(gate_passed: bool):
        if gate_passed:
            execute_kpi_commentary_agent()  # the agent entry point, defined elsewhere

    gate = check_test_results()
    run_dbt_tests >> gate  # the gate only runs after the dbt test task completes
    run_agent(gate)


agent_pipeline_with_quality_gate()
```

If any test on mart_agent_context or its upstream models fails, the agent doesn't run. The data engineering team gets an alert; the business gets no bad output.

What This Prevents

This four-layer testing strategy catches:

  • Structural failures (nulls, duplicates, type errors) — Layer 1
  • Business logic violations (negative revenue, impossible values) — Layer 2
  • Statistical anomalies (row count drops, mean shifts, outlier values) — Layer 3
  • AI-specific contract violations (stale data, missing required metrics) — Layer 4

The LLM never sees data that would cause it to produce a wrong, hallucinated, or misleading answer — because the pipeline refuses to deliver it.

Book a strategy session to build your data quality layer.