
dbt Testing Strategies Before Feeding Data to LLMs: Preventing Garbage-In, Garbage-Out

Metadata Morph · AI & Data Engineering Team · 5 min read

An AI agent is only as reliable as the data it reasons from. Feed it nulls, duplicates, or stale data and it will produce confident, coherent, and wrong answers — often without any obvious signal that something is off. The LLM doesn't know what it doesn't know.

dbt's testing framework is the right place to enforce data quality before data reaches your agents. This post covers a layered testing strategy that catches the most common failure modes before they become AI failures.

Why Data Quality Failures Are Worse for AI Than for BI

When a dashboard shows wrong numbers, a human analyst usually notices — the number looks off, or they cross-reference it. When an AI agent reasons from wrong data, the error is amplified and obscured by a fluent, confident narrative. The agent doesn't second-guess the data it receives.

Common data failures and their AI consequences:

| Data failure | BI consequence | AI agent consequence |
| --- | --- | --- |
| Duplicate rows in `fct_revenue` | Revenue chart looks inflated | Agent reports revenue 2× actual, presents it as fact |
| Null in `conversion_rate` | Dashboard shows blank | Agent skips or fabricates the metric |
| Stale source (6h delay) | Dashboard shows yesterday | Agent gives a confident answer about the wrong time period |
| Referential integrity break | Orphaned rows in report | Agent reasoning over joined data is corrupted |
| Wrong data type (string as amount) | Query error | Agent silently uses $0 or crashes |
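To make the first row of that table concrete, here is a minimal Python sketch (the rows and amounts are hypothetical) of how duplicated fact rows silently double an aggregate that an agent will then narrate as fact:

```python
# Hypothetical fct_revenue rows: a retried ETL load inserted every order twice.
orders = [
    {"order_id": 1, "amount_usd": 100.0},
    {"order_id": 2, "amount_usd": 250.0},
]
duplicated = orders + orders  # the failure mode: the same rows loaded twice

true_revenue = sum(o["amount_usd"] for o in orders)        # 350.0
reported_revenue = sum(o["amount_usd"] for o in duplicated)  # 700.0

# A human reviewing a chart might notice 700.0 looks high; an LLM will
# simply report it. A `unique` test on order_id catches this upstream.
```

Nothing about the duplicated sum looks malformed, which is exactly why the error survives all the way to the agent's answer.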

Layer 1: Built-in dbt Tests

dbt ships with four generic tests that every model should have on its key columns:

```yaml
# models/schema.yml

models:
  - name: fct_daily_orders
    description: "One row per order, per day"
    columns:
      - name: order_id
        tests:
          - unique        # no duplicates
          - not_null      # every row has a value
      - name: user_id
        tests:
          - not_null
          - relationships:       # referential integrity
              to: ref('dim_users')
              field: user_id
      - name: status
        tests:
          - accepted_values:     # enumeration validation
              values: ['placed', 'paid', 'shipped', 'cancelled', 'refunded']
      - name: amount_usd
        tests:
          - not_null
```

These four tests catch the most common structural failures. Run them on every model that feeds an AI agent, not just the final mart models. Failures in upstream staging models propagate silently to downstream AI inputs.
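Conceptually, `unique` and `not_null` compile to simple "return the failing rows" queries. A rough Python analogue (illustrative only, not dbt's actual implementation) makes the semantics explicit:

```python
def check_not_null(rows, column):
    """Rows that fail not_null: the value is missing."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """Rows that fail unique: the value has been seen before."""
    seen, dupes = set(), []
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.append(r)
        seen.add(value)
    return dupes

# Hypothetical staging rows with both failure modes present
rows = [
    {"order_id": 1, "user_id": 10},
    {"order_id": 1, "user_id": 11},    # duplicate order_id
    {"order_id": 2, "user_id": None},  # null user_id
]
assert len(check_unique(rows, "order_id")) == 1
assert len(check_not_null(rows, "user_id")) == 1
```

As with dbt, an empty result means the test passes; any returned rows mean failure.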

Layer 2: Custom Singular Tests for Business Rules

Generic tests catch structural failures. Singular tests enforce business logic that's specific to your domain.

```sql
-- tests/assert_no_negative_revenue.sql
-- Fails if any order has a negative amount (indicates a data entry or ETL bug)
SELECT
    order_id,
    amount_usd
FROM {{ ref('fct_daily_orders') }}
WHERE amount_usd < 0
```

```sql
-- tests/assert_revenue_within_expected_range.sql
-- Fails if daily revenue is more than 3× the 30-day average (likely a data error)
WITH daily_revenue AS (
    SELECT
        order_date,
        SUM(amount_usd) AS total_revenue
    FROM {{ ref('fct_daily_orders') }}
    GROUP BY order_date
),

baseline AS (
    SELECT AVG(total_revenue) AS avg_revenue
    FROM daily_revenue
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
      AND order_date < CURRENT_DATE
)

SELECT
    d.order_date,
    d.total_revenue,
    b.avg_revenue
FROM daily_revenue d
CROSS JOIN baseline b
WHERE d.order_date = CURRENT_DATE - INTERVAL '1 day'
  AND d.total_revenue > b.avg_revenue * 3
```

```sql
-- tests/assert_agent_input_freshness.sql
-- Fails if the data the agent will receive is more than 4 hours old
SELECT
    MAX(updated_at) AS latest_record,
    CURRENT_TIMESTAMP AS check_time,
    DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP) AS hours_stale
FROM {{ ref('mart_agent_context') }}
-- repeat the expression here: not all warehouses allow aliases in HAVING
HAVING DATEDIFF('hour', MAX(updated_at), CURRENT_TIMESTAMP) > 4
```
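A singular test passes when its query returns zero rows. The freshness check can be mirrored in plain Python (timestamps hypothetical) to make that pass/fail contract concrete:

```python
from datetime import datetime, timedelta

def freshness_violations(latest_record, now, max_hours=4):
    """Return a 'failing row' if the newest record is older than max_hours,
    else an empty list. Mirrors the zero-rows-means-pass contract of a
    dbt singular test."""
    hours_stale = (now - latest_record).total_seconds() / 3600
    if hours_stale > max_hours:
        return [{"latest_record": latest_record, "hours_stale": hours_stale}]
    return []

now = datetime(2025, 10, 27, 12, 0)
assert freshness_violations(now - timedelta(hours=2), now) == []      # fresh: passes
assert len(freshness_violations(now - timedelta(hours=6), now)) == 1  # stale: fails
```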

Layer 3: dbt-expectations for Statistical Tests

The dbt-expectations package extends dbt with statistical tests that catch anomalies generic tests miss:

```yaml
# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

```yaml
# models/schema.yml
models:
  - name: fct_daily_orders
    tests:
      # Table-level tests
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000       # fail if fewer than 1,000 orders (ingestion failure)
          max_value: 1000000    # fail if more than 1M orders (duplication)

      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_orders')   # staging → mart count parity check

    columns:
      - name: amount_usd
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.01
              max_value: 100000   # flag suspiciously large single orders
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 80       # average order value sanity check
              max_value: 500
          - dbt_expectations.expect_column_stdev_to_be_between:
              min_value: 10
              max_value: 200

      - name: conversion_rate
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0.0
              max_value: 1.0      # conversion rate must be a valid probability
```
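These statistical tests boil down to comparing an aggregate against bounds. An illustrative Python version of the mean check, with thresholds matching the YAML above and hypothetical order amounts:

```python
from statistics import mean

def expect_mean_between(values, min_value, max_value):
    """Return (passed, observed_mean), analogous to a column-mean bound check."""
    observed = mean(values)
    return min_value <= observed <= max_value, observed

# Hypothetical daily order amounts
amounts = [95.0, 120.0, 88.0, 210.0, 75.0]
ok, observed = expect_mean_between(amounts, min_value=80, max_value=500)
assert ok  # mean is 117.6, inside the sanity band

# A corrupted load (dollars parsed as cents) pushes the mean far outside the band
corrupted = [a * 100 for a in amounts]
ok, observed = expect_mean_between(corrupted, min_value=80, max_value=500)
assert not ok
```

Every value in the corrupted load is individually plausible as a number, which is why a per-row range test can miss what an aggregate test catches.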

Layer 4: Agent-Specific Data Contracts

For models that feed directly into AI agents, define an explicit data contract as a separate schema file. This documents what the agent expects and makes it easy to verify:

```yaml
# models/agent_inputs/schema.yml

models:
  - name: mart_agent_context
    description: >
      The primary context table consumed by the KPI commentary agent.
      All columns are required — the agent will fail silently on nulls.
      This model must pass all tests before the agent runs.
    meta:
      agent_consumer: kpi_commentary_agent
      max_acceptable_staleness_hours: 4
      owner: data-engineering@company.com

    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
          max_value: 10000

    columns:
      - name: metric_name
        tests:
          - not_null
          - unique
          - accepted_values:
              values:
                - revenue
                - gross_margin
                - conversion_rate
                - churn_rate
                - cac

      - name: current_value
        tests:
          - not_null

      - name: baseline_30d
        tests:
          - not_null

      - name: updated_at
        tests:
          - not_null
```
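The same contract can also be enforced at runtime, immediately before the agent is prompted. A sketch of such a guard (metric names taken from the schema above; the row shape is a hypothetical dict-per-row layout):

```python
REQUIRED_METRICS = {"revenue", "gross_margin", "conversion_rate", "churn_rate", "cac"}

def validate_agent_context(rows):
    """Raise ValueError if the context rows violate the mart_agent_context contract."""
    names = [r.get("metric_name") for r in rows]
    missing = REQUIRED_METRICS - set(names)
    if missing:
        raise ValueError(f"Missing required metrics: {sorted(missing)}")
    if len(names) != len(set(names)):
        raise ValueError("Duplicate metric_name values")
    for r in rows:
        if r.get("current_value") is None or r.get("baseline_30d") is None:
            raise ValueError(f"Null value in contract row: {r}")
    return True

# A complete, null-free context passes
rows = [
    {"metric_name": m, "current_value": 1.0, "baseline_30d": 1.0}
    for m in REQUIRED_METRICS
]
assert validate_agent_context(rows)
```

This duplicates what dbt already verifies, but a cheap in-process check is a useful last line of defense if the agent can be triggered outside the orchestrated pipeline.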

Gating the Agent on Test Results

The dbt test suite should gate whether the agent runs at all. In Airflow:

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.operators.bash import BashOperator


@dag(schedule='0 6 * * *', start_date=datetime(2025, 10, 27))
def agent_pipeline_with_quality_gate():

    run_dbt_tests = BashOperator(
        task_id='run_dbt_tests',
        # +mart_agent_context selects the model and everything upstream of it
        bash_command='dbt test --select +mart_agent_context --store-failures',
    )

    @task
    def check_test_results() -> bool:
        """Parse dbt's run_results.json and fail fast if any test failed or errored."""
        import json
        with open('target/run_results.json') as f:
            results = json.load(f)

        failed = [
            r for r in results['results']
            if r['status'] in ('fail', 'error') and r['unique_id'].startswith('test.')
        ]

        if failed:
            failed_names = [r['unique_id'] for r in failed]
            raise ValueError(
                f"Data quality gate failed. Agent will not run.\n"
                f"Failed tests: {failed_names}"
            )
        return True

    @task
    def run_agent(gate_passed: bool):
        if gate_passed:
            execute_kpi_commentary_agent()  # the agent entry point, defined elsewhere

    gate = check_test_results()
    run_dbt_tests >> gate  # the gate only runs after the dbt test task completes
    run_agent(gate)


agent_pipeline_with_quality_gate()
```

If any test on mart_agent_context or its upstream models fails, the agent doesn't run. The data engineering team gets an alert; the business gets no bad output.

What This Prevents

This four-layer testing strategy catches:

  • Structural failures (nulls, duplicates, type errors) — Layer 1
  • Business logic violations (negative revenue, impossible values) — Layer 2
  • Statistical anomalies (row count drops, mean shifts, outlier values) — Layer 3
  • AI-specific contract violations (stale data, missing required metrics) — Layer 4

The LLM never sees data that would cause it to produce a wrong, hallucinated, or misleading answer — because the pipeline refuses to deliver it.

Book a strategy session to build your data quality layer.