Inmon Corporate Information Factory (CIF)
Enterprise Data Warehousing Methodology
Overview
The Corporate Information Factory (CIF) is Bill Inmon’s enterprise data warehousing methodology. Unlike Kimball’s bottom-up approach (data marts first), Inmon advocates a top-down approach: build a centralized enterprise warehouse first, then derive dependent data marts.
CIF Architecture
Architecture Layers
Key Components:
- Enterprise Data Warehouse (EDW): Atomic, integrated data warehouse
- Data Marts: Dependent, summarized data for specific departments
- Operational Data Store (ODS): Integrated operational data (current state)
- ETL: Extract, Transform, Load processes
Enterprise Data Warehouse
Normalized Design
-- Enterprise Data Warehouse: 3NF (Third Normal Form)-- Characteristics: Atomic, integrated, historical
-- 1. Customers table (3NF)CREATE TABLE customers ( customer_id BIGINT PRIMARY KEY, customer_name VARCHAR(255) NOT NULL, customer_email VARCHAR(255) UNIQUE, customer_phone VARCHAR(50), address_id BIGINT, -- FK to addresses customer_segment VARCHAR(50), created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL, FOREIGN KEY (address_id) REFERENCES addresses(address_id));
-- 2. Addresses table (3NF)CREATE TABLE addresses ( address_id BIGINT PRIMARY KEY, address_line1 VARCHAR(255) NOT NULL, address_line2 VARCHAR(255), city VARCHAR(100) NOT NULL, state VARCHAR(50) NOT NULL, postal_code VARCHAR(20) NOT NULL, country VARCHAR(100) NOT NULL);
-- 3. Products table (3NF)CREATE TABLE products ( product_id BIGINT PRIMARY KEY, product_name VARCHAR(255) NOT NULL, product_category VARCHAR(100) NOT NULL, product_subcategory VARCHAR(100), supplier_id BIGINT, -- FK to suppliers unit_price DECIMAL(18,2) NOT NULL, created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL, FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id));
-- 4. Orders table (3NF)CREATE TABLE orders ( order_id BIGINT PRIMARY KEY, customer_id BIGINT NOT NULL, order_date DATE NOT NULL, order_status VARCHAR(50) NOT NULL, total_amount DECIMAL(18,2) NOT NULL, created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL, FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
-- 5. Order items table (3NF)CREATE TABLE order_items ( order_item_id BIGINT PRIMARY KEY, order_id BIGINT NOT NULL, product_id BIGINT NOT NULL, quantity INT NOT NULL, unit_price DECIMAL(18,2) NOT NULL, discount_percent DECIMAL(5,2) DEFAULT 0, created_at TIMESTAMP NOT NULL, FOREIGN KEY (order_id) REFERENCES orders(order_id), FOREIGN KEY (product_id) REFERENCES products(product_id), CONSTRAINT uk_order_item UNIQUE (order_id, product_id));Benefits of Normalized Design
Dependent Data Marts
Data Mart Design
-- Dependent Data Mart: Derived from EDW-- Purpose: Performance-optimized, subject-specific
-- Sales data mart (denormalized for performance)CREATE TABLE mart_sales_sales_summary ASSELECT o.order_id, c.customer_id, c.customer_name, c.customer_segment, p.product_id, p.product_name, p.product_category, o.order_date, o.order_status, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100) AS line_total, o.total_amount AS order_totalFROM orders oJOIN customers c ON o.customer_id = c.customer_idJOIN order_items oi ON o.order_id = oi.order_idJOIN products p ON oi.product_id = p.product_id;
-- Inventory data mart (denormalized for performance)CREATE TABLE mart_inventory_current ASSELECT p.product_id, p.product_name, p.product_category, s.supplier_id, s.supplier_name, i.quantity_on_hand, i.reorder_level, i.last_restock_date, CURRENT_TIMESTAMP AS as_of_dateFROM products pJOIN suppliers s ON p.supplier_id = s.supplier_idJOIN inventory i ON p.product_id = i.product_idWHERE i.as_of_date = (SELECT MAX(as_of_date) FROM inventory);
-- Benefits:-- - Optimized for specific queries-- - Faster queries (denormalized)-- - Derived from EDW (single source of truth)-- - Refreshed periodically (daily, hourly)Operational Data Store (ODS)
ODS Design
-- Operational Data Store: Current, integrated operational data-- Purpose: Real-time operational reporting
CREATE TABLE ods_customer_current ASSELECT DISTINCT ON (customer_id) customer_id, customer_name, customer_email, customer_phone, address_line1, city, state, postal_code, customer_status, CURRENT_TIMESTAMP AS ods_load_timestampFROM ( -- Union all customer data from source systems SELECT customer_id, customer_name, customer_email, customer_phone, address_line1, city, state, postal_code, customer_status, updated_at FROM ecommerce.customers
UNION ALL
SELECT account_id AS customer_id, account_name AS customer_name, email AS customer_email, phone AS customer_phone, address_line1, city, state, postal_code, status AS customer_status, updated_at FROM crm.accounts) source_customersORDER BY customer_id, updated_at DESC;
-- Benefits:-- - Real-time view of current state-- - Integrated across source systems-- - No history (current state only)-- - Used for operational dashboardsCIF ETL Patterns
ETL to EDW
-- ETL Process: Operational Systems → EDW
-- Step 1: Extract from source systemsCREATE TABLE stg_ecommerce_orders ASSELECT *FROM ecommerce.ordersWHERE updated_at >= COALESCE( (SELECT MAX(etl_run_timestamp) FROM etl_log WHERE source_system = 'ECOMMERCE' AND target_table = 'orders'), '1900-01-01'::timestamp);
CREATE TABLE stg_crm_orders ASSELECT *FROM crm.ordersWHERE updated_at >= COALESCE( (SELECT MAX(etl_run_timestamp) FROM etl_log WHERE source_system = 'CRM' AND target_table = 'orders'), '1900-01-01'::timestamp);
-- Step 2: Transform (apply business rules)CREATE TABLE stg_orders_transformed ASSELECT COALESCE(ecommerce_order_id, crm_order_id) AS order_id, COALESCE(ecommerce_customer_id, crm_customer_id) AS customer_id, COALESCE(ecommerce_order_date, crm_order_date) AS order_date, COALESCE(ecommerce_status, crm_status) AS order_status, COALESCE(ecommerce_total, crm_total) AS total_amountFROM ( SELECT order_id AS ecommerce_order_id, customer_id AS ecommerce_customer_id, order_date AS ecommerce_order_date, status AS ecommerce_status, total_amount AS ecommerce_total, NULL AS crm_order_id, NULL AS crm_customer_id, NULL AS crm_order_date, NULL AS crm_status, NULL AS crm_total FROM stg_ecommerce_orders
UNION ALL
SELECT NULL AS ecommerce_order_id, NULL AS ecommerce_customer_id, NULL AS ecommerce_order_date, NULL AS ecommerce_status, NULL AS ecommerce_total, order_id AS crm_order_id, account_id AS crm_customer_id, order_date AS crm_order_date, status AS crm_status, total_amount AS crm_total FROM stg_crm_orders) combined_orders;
-- Step 3: Load into EDW (SCD Type 2)INSERT INTO orders (order_id, customer_id, order_date, order_status, total_amount, updated_at)SELECT order_id, customer_id, order_date, order_status, total_amount, CURRENT_TIMESTAMP AS updated_atFROM stg_orders_transformedON CONFLICT (order_id) DO UPDATE SET customer_id = EXCLUDED.customer_id, order_date = EXCLUDED.order_date, order_status = EXCLUDED.order_status, total_amount = EXCLUDED.total_amount, updated_at = CURRENT_TIMESTAMP;
-- Log ETL runINSERT INTO etl_log (source_system, target_table, etl_run_timestamp, rows_processed)SELECT 'ECOMMERCE', 'orders', CURRENT_TIMESTAMP, COUNT(*) FROM stg_ecommerce_orders;ETL to Data Marts
-- ETL Process: EDW → Dependent Data Marts
-- Refresh sales data mart (daily)DELETE FROM mart_sales_sales_summary;
INSERT INTO mart_sales_sales_summarySELECT o.order_id, c.customer_id, c.customer_name, c.customer_segment, p.product_id, p.product_name, p.product_category, o.order_date, o.order_status, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100) AS line_total, o.total_amountFROM orders oJOIN customers c ON o.customer_id = c.customer_idJOIN order_items oi ON o.order_id = oi.order_idJOIN products p ON oi.product_id = p.product_idWHERE o.order_date = CURRENT_DATE - INTERVAL '1 day';
-- Refresh ODS (near real-time)REFRESH MATERIALIZED VIEW CONCURRENTLY ods_customer_current;CIF vs. Kimball
| Dimension | Inmon CIF | Kimball |
|---|---|---|
| Approach | Top-down (EDW first) | Bottom-up (marts first) |
| EDW Design | 3NF (normalized) | Star schema (denormalized) |
| Data Marts | Dependent (from EDW) | Independent (from sources) |
| Time to value | Slow (EDW first) | Fast (marts first) |
| Agility | Low | High |
| Integration | Centralized | Decentralized |
| Best for | Enterprise, regulated industries | Most organizations |
SCD (Slowly Changing Dimensions)
SCD Type 2 in EDW
-- Slowly Changing Dimension Type 2: Track history in EDW-- Add effective dates to track changes
CREATE TABLE customers_sc2 ( customer_id BIGINT NOT NULL, customer_name VARCHAR(255) NOT NULL, customer_email VARCHAR(255), customer_segment VARCHAR(50), effective_from TIMESTAMP NOT NULL, effective_to TIMESTAMP, -- NULL = current record is_current BOOLEAN DEFAULT TRUE, created_at TIMESTAMP NOT NULL, PRIMARY KEY (customer_id, effective_from));
-- Insert new record (close old record, open new one)BEGIN;
-- 1. Close old recordUPDATE customers_sc2SET effective_to = CURRENT_TIMESTAMP, is_current = FALSEWHERE customer_id = 12345 AND is_current = TRUE;
-- 2. Insert new recordINSERT INTO customers_sc2 (customer_id, customer_name, customer_email, customer_segment, effective_from, is_current, created_at)SELECT 12345 AS customer_id, 'New Name' AS customer_name, 'new@email.com' AS customer_email, 'PREMIUM' AS customer_segment, CURRENT_TIMESTAMP AS effective_from, TRUE AS is_current, CURRENT_TIMESTAMP AS created_at;
COMMIT;
-- Query current recordsSELECT * FROM customers_sc2 WHERE is_current = TRUE;CIF Best Practices
DO
-- 1. Start with EDW (top-down)-- Build enterprise warehouse first, then marts
-- 2. Use 3NF for EDW-- Normalize for flexibility and integrity
-- 3. Use star schema for data marts-- Denormalize for performance
-- 4. Track ETL runs-- Log all ETL processes for audit
-- 5. Use staging tables-- Stage data before loading into EDWDON’T
-- 1. Don't build independent marts first-- Inmon: EDW first, then dependent marts
-- 2. Don't denormalize EDW-- Keep EDW normalized (3NF)
-- 3. Don't skip ODS for operational needs-- Use ODS for real-time operational reporting
-- 4. Don't ignore integration-- Centralized integration is key
-- 5. Don't forget history-- Track all changes in EDWKey Takeaways
- Top-down: Build EDW first, then data marts
- Normalized: EDW uses 3NF for flexibility
- Dependent marts: Data marts derived from EDW
- ODS: Operational Data Store for real-time needs
- Integration: Centralized integration across enterprise
- History: Complete audit trail in EDW
- Performance: Denormalize in data marts
- Use When: Enterprise, regulated industries, centralized governance
Back to Module 4