dbt Testing Guide
Data Testing with dbt
Overview
dbt provides a robust testing framework for data validation. Unlike traditional testing frameworks, dbt tests are defined in SQL and run against transformed data, ensuring data quality throughout the transformation pipeline.
dbt Test Types
Test Categories
Test Types:
- Generic Tests: Pre-built tests (not_null, unique, relationships)
- Singular Tests: Custom SQL tests (single assertion)
- Data Tests: Test data across rows (business rules)
Generic Tests
Using Generic Tests
version: 2
models: - name: customers columns: - name: customer_id description: "Unique customer identifier" tests: - unique - not_null
- name: customer_email description: "Customer email address" tests: - unique - not_null - email:
- name: customer_segment description: "Customer segment (tier)" tests: - not_null - accepted_values: values: ['BRONZE', 'SILVER', 'GOLD', 'PLATINUM']
- name: orders columns: - name: order_id tests: - unique - not_null
- name: customer_id tests: - not_null - relationships: to: ref('customers') field: customer_id
- name: order_date tests: - not_null - dbt_utils.expression_is_true: expression: "order_date <= CURRENT_DATE"
- name: order_status tests: - not_null - accepted_values: values: ['PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED']
- name: total_amount tests: - not_null - dbt_utils.expression_is_true: expression: "total_amount >= 0"Built-in Generic Tests
-- tests/generic/not_null.sql-- Ensure column has no null values
SELECT id, customer_email, order_dateFROM {{ this }}WHERE customer_email IS NULL OR order_date IS NULLLIMIT 1
-- tests/generic/unique.sql-- Ensure column values are unique
SELECT customer_email, COUNT(*) AS num_violationsFROM {{ this }}GROUP BY customer_emailHAVING COUNT(*) > 1
-- tests/generic/relationships.sql-- Ensure referential integrity
SELECT child.customer_id, COUNT(*) AS num_violationsFROM {{ this }} AS childLEFT JOIN {{ ref('customers') }} AS parent ON child.customer_id = parent.customer_idWHERE parent.customer_id IS NULLGROUP BY child.customer_idSingular Tests
Custom SQL Tests
-- tests/singular/check_positive_amounts.sql
-- Test: All order amounts should be positiveSELECT order_id, total_amountFROM {{ ref('orders') }}WHERE total_amount < 0LIMIT 1
-- tests/singular/check_email_format.sql
-- Test: Email addresses should be valid formatSELECT customer_id, customer_emailFROM {{ ref('customers') }}WHERE customer_email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'LIMIT 1
-- tests/singular/check_future_dates.sql
-- Test: Order dates should not be in the futureSELECT order_id, order_dateFROM {{ ref('orders') }}WHERE order_date > CURRENT_DATELIMIT 1
-- tests/singular/check_duplicate_orders.sql
-- Test: No duplicate orders (same customer, same date, same total)SELECT customer_id, order_date, total_amount, COUNT(*) AS num_violationsFROM {{ ref('orders') }}GROUP BY customer_id, order_date, total_amountHAVING COUNT(*) > 1Data Tests
Row-Level Tests
-- tests/data/check_customer_orders.sql
-- Test: Every customer should have at least one orderSELECT customer_id, customer_name, COUNT(order_id) AS order_countFROM {{ ref('customers') }} AS cLEFT JOIN {{ ref('orders') }} AS o ON c.customer_id = o.customer_idGROUP BY customer_id, customer_nameHAVING COUNT(order_id) = 0
-- tests/data/check_high_value_orders.sql
-- Test: Flag high-value orders for reviewSELECT order_id, customer_id, total_amountFROM {{ ref('orders') }}WHERE total_amount > 10000 -- Flag orders over $10kAggregation Tests
-- tests/data/check_daily_sales_variance.sql
-- Test: Daily sales shouldn't vary more than 50% from averageWITH daily_sales AS ( SELECT DATE(order_date) AS sale_date, SUM(total_amount) AS daily_total FROM {{ ref('orders') }} GROUP BY DATE(order_date)),avg_sales AS ( SELECT AVG(daily_total) AS avg_daily_total, STDDEV(daily_total) AS stddev_daily_total FROM daily_sales)SELECT sale_date, daily_total, avg_daily_total, stddev_daily_totalFROM daily_salesCROSS JOIN avg_salesWHERE ABS(daily_total - avg_daily_total) > (avg_daily_total * 0.5)
-- tests/data/check_product_availability.sql
-- Test: All products should have at least one order in last 30 daysSELECT p.product_id, p.product_name, COUNT(o.order_id) AS order_countFROM {{ ref('products') }} AS pLEFT JOIN {{ ref('order_items') }} AS o ON p.product_id = o.product_id AND o.created_at >= CURRENT_DATE - INTERVAL '30 days'GROUP BY p.product_id, p.product_nameHAVING COUNT(o.order_id) = 0Test Configuration
dbt_project.yml
name: 'my_data_platform'version: '1.0.0'config-version: 2
profile: 'my_data_platform'
model-paths: ["models"]seed-paths: ["seeds"]test-paths: ["tests"]analysis-paths: ["analyses"]macro-paths: ["macros"]
target-path: "target"clean-targets: - "target" - "dbt_packages"
models: my_data_platform: +schema_name: analytics +materialized: table marts: +materialized: view
tests: +schema_name: analytics_testsprofiles.yml
my_data_platform: outputs: dev: type: postgres host: localhost port: 5432 user: data_user pass: data_pass dbname: data_dev schema: analytics threads: 4
prod: type: postgres host: prod-db.example.com port: 5432 user: data_user pass: "{{ env_var('DB_PASSWORD') }}" dbname: data_prod schema: analytics threads: 16
target: devRunning Tests
Command Line
# Run all testsdbt test
# Run specific model testsdbt test --select customers
# Run specific testdbt test --select not_null_customers_customer_email
# Run tests with verbose outputdbt test --verbose
# Run tests and store resultsdbt test --store-failures
# Run tests in productiondbt test --profile prod
# Run tests with schema validationdbt test --schemaTest Output
$ dbt test
Running with dbt=1.5.0Found 25 models, 50 tests, 0 snapshots, 0 analyses, 311 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:25:10 | Concurrency: 4 threads (target='prod')07:25:10 |07:25:15 | 1 of 50 START test not_null_orders_order_id ................................ [RUN]07:25:15 | 1 of 50 PASS not_null_orders_order_id ............................... [PASS in 0.52s]07:25:15 |07:25:15 | 2 of 50 START test unique_orders_order_id .................................. [RUN]07:25:16 | 2 of 50 PASS unique_orders_order_id ................................ [PASS in 0.61s]07:25:16 |07:25:16 | 3 of 50 START test relationships_orders_customer_id_to_customers_customer_id [RUN]07:25:17 | 3 of 50 PASS relationships_orders_customer_id_to_customers_customer_id [PASS in 0.89s]...07:25:30 | 48 of 50 START test check_positive_amounts ................................ [RUN]07:25:31 | 48 of 50 FAIL check_positive_amounts ............................... [FAIL in 0.45s]...07:25:32 | Finished running 50 tests in 21.45s
Completed with 1 error and 0 warnings:
Failure in test check_positive_amounts (tests/singular/check_positive_amounts.sql) Got 3 results instead of 0
Done. PASS=49 WARN=0 ERROR=1 SKIP=0 TOTAL=50Test Best Practices
DO
# 1. Test all NOT NULL columnscolumns: - name: customer_id tests: - not_null
# 2. Test all foreign keyscolumns: - name: customer_id tests: - relationships: to: ref('customers') field: customer_id
# 3. Test business rulestests: - dbt_utils.expression_is_true: expression: "amount >= 0"
# 4. Use descriptive test names# check_positive_amounts (good) vs. test_1 (bad)
# 5. Group tests by severity# Critical, Warning, InformationalDON’T
# 1. Don't test everything# Focus on critical data quality rules
# 2. Don't ignore test failures# Always investigate and fix
# 3. Don't test the same thing multiple times# Avoid redundant tests
# 4. Don't create tests that are always passing# Tests should add value
# 5. Don't test in production without testing in dev first# Test locally before deployingTest Documentation
Documenting Tests
models: - name: customers description: "Customer dimension table" columns: - name: customer_id description: "Unique customer identifier" tests: - unique: description: "Each customer should have a unique ID" - not_null: description: "Customer ID should never be null"
- name: customer_email description: "Customer email address" tests: - unique: description: "Email should be unique across customers" - not_null: description: "Email should never be null" - email: description: "Email should be valid format"Key Takeaways
- Generic tests: Pre-built tests (not_null, unique, relationships)
- Singular tests: Custom SQL tests for specific business rules
- Data tests: Test data across rows and aggregations
- Schema tests: Test in schema.yml file
- File tests: Create .sql files in tests/ directory
- Test execution: Run tests with
dbt test - Test documentation: Document test purpose in schema.yml
- CI/CD: Integrate tests into CI/CD pipeline
Back to Module 4