Skip to content

dbt Testing Guide

Data Testing with dbt


Overview

dbt provides a robust testing framework for data validation. Unlike traditional testing frameworks, dbt tests are defined in SQL and run against transformed data, ensuring data quality throughout the transformation pipeline.


dbt Test Types

Test Categories

Test Types:

  • Generic Tests: Pre-built tests (not_null, unique, relationships)
  • Singular Tests: Custom SQL tests (single assertion)
  • Data Tests: Test data across rows (business rules)

Generic Tests

Using Generic Tests

schema.yml
version: 2
models:
- name: customers
columns:
- name: customer_id
description: "Unique customer identifier"
tests:
- unique
- not_null
- name: customer_email
description: "Customer email address"
tests:
- unique
- not_null
- email:
- name: customer_segment
description: "Customer segment (tier)"
tests:
- not_null
- accepted_values:
values: ['BRONZE', 'SILVER', 'GOLD', 'PLATINUM']
- name: orders
columns:
- name: order_id
tests:
- unique
- not_null
- name: customer_id
tests:
- not_null
- relationships:
to: ref('customers')
field: customer_id
- name: order_date
tests:
- not_null
- dbt_utils.expression_is_true:
expression: "order_date <= CURRENT_DATE"
- name: order_status
tests:
- not_null
- accepted_values:
values: ['PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']
- name: total_amount
tests:
- not_null
- dbt_utils.expression_is_true:
expression: "total_amount >= 0"

Built-in Generic Tests

-- tests/generic/not_null.sql
-- Ensure column has no null values
SELECT
id,
customer_email,
order_date
FROM {{ this }}
WHERE customer_email IS NULL
OR order_date IS NULL
LIMIT 1
-- tests/generic/unique.sql
-- Ensure column values are unique
SELECT
customer_email,
COUNT(*) AS num_violations
FROM {{ this }}
GROUP BY customer_email
HAVING COUNT(*) > 1
-- tests/generic/relationships.sql
-- Ensure referential integrity
SELECT
child.customer_id,
COUNT(*) AS num_violations
FROM {{ this }} AS child
LEFT JOIN {{ ref('customers') }} AS parent
ON child.customer_id = parent.customer_id
WHERE parent.customer_id IS NULL
GROUP BY child.customer_id

Singular Tests

Custom SQL Tests

-- tests/singular/check_positive_amounts.sql
-- Test: All order amounts should be positive
SELECT
order_id,
total_amount
FROM {{ ref('orders') }}
WHERE total_amount < 0
LIMIT 1
-- tests/singular/check_email_format.sql
-- Test: Email addresses should be valid format
SELECT
customer_id,
customer_email
FROM {{ ref('customers') }}
WHERE customer_email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
LIMIT 1
-- tests/singular/check_future_dates.sql
-- Test: Order dates should not be in the future
SELECT
order_id,
order_date
FROM {{ ref('orders') }}
WHERE order_date > CURRENT_DATE
LIMIT 1
-- tests/singular/check_duplicate_orders.sql
-- Test: No duplicate orders (same customer, same date, same total)
SELECT
customer_id,
order_date,
total_amount,
COUNT(*) AS num_violations
FROM {{ ref('orders') }}
GROUP BY customer_id, order_date, total_amount
HAVING COUNT(*) > 1

Data Tests

Row-Level Tests

-- tests/data/check_customer_orders.sql
-- Test: Every customer should have at least one order
SELECT
customer_id,
customer_name,
COUNT(order_id) AS order_count
FROM {{ ref('customers') }} AS c
LEFT JOIN {{ ref('orders') }} AS o
ON c.customer_id = o.customer_id
GROUP BY customer_id, customer_name
HAVING COUNT(order_id) = 0
-- tests/data/check_high_value_orders.sql
-- Test: Flag high-value orders for review
SELECT
order_id,
customer_id,
total_amount
FROM {{ ref('orders') }}
WHERE total_amount > 10000 -- Flag orders over $10k

Aggregation Tests

-- tests/data/check_daily_sales_variance.sql
-- Test: Daily sales shouldn't vary more than 50% from average
WITH daily_sales AS (
SELECT
DATE(order_date) AS sale_date,
SUM(total_amount) AS daily_total
FROM {{ ref('orders') }}
GROUP BY DATE(order_date)
),
avg_sales AS (
SELECT
AVG(daily_total) AS avg_daily_total,
STDDEV(daily_total) AS stddev_daily_total
FROM daily_sales
)
SELECT
sale_date,
daily_total,
avg_daily_total,
stddev_daily_total
FROM daily_sales
CROSS JOIN avg_sales
WHERE ABS(daily_total - avg_daily_total) > (avg_daily_total * 0.5)
-- tests/data/check_product_availability.sql
-- Test: All products should have at least one order in last 30 days
SELECT
p.product_id,
p.product_name,
COUNT(o.order_id) AS order_count
FROM {{ ref('products') }} AS p
LEFT JOIN {{ ref('order_items') }} AS o
ON p.product_id = o.product_id
AND o.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY p.product_id, p.product_name
HAVING COUNT(o.order_id) = 0

Test Configuration

dbt_project.yml

dbt_project.yml
name: 'my_data_platform'
version: '1.0.0'
config-version: 2
profile: 'my_data_platform'
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analyses"]
macro-paths: ["macros"]
target-path: "target"
clean-targets:
- "target"
- "dbt_packages"
models:
my_data_platform:
+schema_name: analytics
+materialized: table
marts:
+materialized: view
tests:
+schema_name: analytics_tests

profiles.yml

profiles.yml
my_data_platform:
outputs:
dev:
type: postgres
host: localhost
port: 5432
user: data_user
pass: data_pass
dbname: data_dev
schema: analytics
threads: 4
prod:
type: postgres
host: prod-db.example.com
port: 5432
user: data_user
pass: "{{ env_var('DB_PASSWORD') }}"
dbname: data_prod
schema: analytics
threads: 16
target: dev

Running Tests

Command Line

Terminal window
# Run all tests
dbt test
# Run specific model tests
dbt test --select customers
# Run specific test
dbt test --select not_null_customers_customer_email
# Run tests with verbose output
dbt test --verbose
# Run tests and store results
dbt test --store-failures
# Run tests in production
dbt test --profile prod
# Run tests with schema validation
dbt test --schema

Test Output

Terminal window
$ dbt test
Running with dbt=1.5.0
Found 25 models, 50 tests, 0 snapshots, 0 analyses, 311 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:25:10 | Concurrency: 4 threads (target='prod')
07:25:10 |
07:25:15 | 1 of 50 START test not_null_orders_order_id ................................ [RUN]
07:25:15 | 1 of 50 PASS not_null_orders_order_id ............................... [PASS in 0.52s]
07:25:15 |
07:25:15 | 2 of 50 START test unique_orders_order_id .................................. [RUN]
07:25:16 | 2 of 50 PASS unique_orders_order_id ................................ [PASS in 0.61s]
07:25:16 |
07:25:16 | 3 of 50 START test relationships_orders_customer_id_to_customers_customer_id [RUN]
07:25:17 | 3 of 50 PASS relationships_orders_customer_id_to_customers_customer_id [PASS in 0.89s]
...
07:25:30 | 48 of 50 START test check_positive_amounts ................................ [RUN]
07:25:31 | 48 of 50 FAIL check_positive_amounts ............................... [FAIL in 0.45s]
...
07:25:32 | Finished running 50 tests in 21.45s
Completed with 1 error and 0 warnings:
Failure in test check_positive_amounts (tests/singular/check_positive_amounts.sql)
Got 3 results instead of 0
Done. PASS=49 WARN=0 ERROR=1 SKIP=0 TOTAL=50

Test Best Practices

DO

# 1. Test all NOT NULL columns
columns:
- name: customer_id
tests:
- not_null
# 2. Test all foreign keys
columns:
- name: customer_id
tests:
- relationships:
to: ref('customers')
field: customer_id
# 3. Test business rules
tests:
- dbt_utils.expression_is_true:
expression: "amount >= 0"
# 4. Use descriptive test names
# check_positive_amounts (good) vs. test_1 (bad)
# 5. Group tests by severity
# Critical, Warning, Informational

DON’T

# 1. Don't test everything
# Focus on critical data quality rules
# 2. Don't ignore test failures
# Always investigate and fix
# 3. Don't test the same thing multiple times
# Avoid redundant tests
# 4. Don't create tests that are always passing
# Tests should add value
# 5. Don't test in production without testing in dev first
# Test locally before deploying

Test Documentation

Documenting Tests

schema.yml
models:
- name: customers
description: "Customer dimension table"
columns:
- name: customer_id
description: "Unique customer identifier"
tests:
- unique:
description: "Each customer should have a unique ID"
- not_null:
description: "Customer ID should never be null"
- name: customer_email
description: "Customer email address"
tests:
- unique:
description: "Email should be unique across customers"
- not_null:
description: "Email should never be null"
- email:
description: "Email should be valid format"

Key Takeaways

  1. Generic tests: Pre-built tests (not_null, unique, relationships)
  2. Singular tests: Custom SQL tests for specific business rules
  3. Data tests: Test data across rows and aggregations
  4. Schema tests: Test in schema.yml file
  5. File tests: Create .sql files in tests/ directory
  6. Test execution: Run tests with dbt test
  7. Test documentation: Document test purpose in schema.yml
  8. CI/CD: Integrate tests into CI/CD pipeline

Back to Module 4