Skip to content

Inmon Corporate Information Factory (CIF)

Enterprise Data Warehousing Methodology


Overview

The Corporate Information Factory (CIF) is Bill Inmon’s enterprise data warehousing methodology. Unlike Kimball’s bottom-up approach (data marts first), Inmon advocates a top-down approach: build a centralized enterprise warehouse first, then derive dependent data marts.


CIF Architecture

Architecture Layers

Key Components:

  • Enterprise Data Warehouse (EDW): Atomic, integrated data warehouse
  • Data Marts: Dependent, summarized data for specific departments
  • Operational Data Store (ODS): Integrated operational data (current state)
  • ETL: Extract, Transform, Load processes

Enterprise Data Warehouse

Normalized Design

-- Enterprise Data Warehouse: 3NF (Third Normal Form)
-- Characteristics: Atomic, integrated, historical
-- 1. Customers table (3NF)
CREATE TABLE customers (
customer_id BIGINT PRIMARY KEY,
customer_name VARCHAR(255) NOT NULL,
customer_email VARCHAR(255) UNIQUE,
customer_phone VARCHAR(50),
address_id BIGINT, -- FK to addresses
customer_segment VARCHAR(50),
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
FOREIGN KEY (address_id) REFERENCES addresses(address_id)
);
-- 2. Addresses table (3NF)
CREATE TABLE addresses (
address_id BIGINT PRIMARY KEY,
address_line1 VARCHAR(255) NOT NULL,
address_line2 VARCHAR(255),
city VARCHAR(100) NOT NULL,
state VARCHAR(50) NOT NULL,
postal_code VARCHAR(20) NOT NULL,
country VARCHAR(100) NOT NULL
);
-- 3. Products table (3NF)
CREATE TABLE products (
product_id BIGINT PRIMARY KEY,
product_name VARCHAR(255) NOT NULL,
product_category VARCHAR(100) NOT NULL,
product_subcategory VARCHAR(100),
supplier_id BIGINT, -- FK to suppliers
unit_price DECIMAL(18,2) NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)
);
-- 4. Orders table (3NF)
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
order_date DATE NOT NULL,
order_status VARCHAR(50) NOT NULL,
total_amount DECIMAL(18,2) NOT NULL,
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- 5. Order items table (3NF)
CREATE TABLE order_items (
order_item_id BIGINT PRIMARY KEY,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL,
unit_price DECIMAL(18,2) NOT NULL,
discount_percent DECIMAL(5,2) DEFAULT 0,
created_at TIMESTAMP NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id),
CONSTRAINT uk_order_item UNIQUE (order_id, product_id)
);

Benefits of Normalized Design


Dependent Data Marts

Data Mart Design

-- Dependent Data Mart: Derived from EDW
-- Purpose: Performance-optimized, subject-specific
-- Sales data mart (denormalized for performance)
CREATE TABLE mart_sales_sales_summary AS
SELECT
o.order_id,
c.customer_id,
c.customer_name,
c.customer_segment,
p.product_id,
p.product_name,
p.product_category,
o.order_date,
o.order_status,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100) AS line_total,
o.total_amount AS order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;
-- Inventory data mart (denormalized for performance)
CREATE TABLE mart_inventory_current AS
SELECT
p.product_id,
p.product_name,
p.product_category,
s.supplier_id,
s.supplier_name,
i.quantity_on_hand,
i.reorder_level,
i.last_restock_date,
CURRENT_TIMESTAMP AS as_of_date
FROM products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
JOIN inventory i ON p.product_id = i.product_id
WHERE i.as_of_date = (SELECT MAX(as_of_date) FROM inventory);
-- Benefits:
-- - Optimized for specific queries
-- - Faster queries (denormalized)
-- - Derived from EDW (single source of truth)
-- - Refreshed periodically (daily, hourly)

Operational Data Store (ODS)

ODS Design

-- Operational Data Store: Current, integrated operational data
-- Purpose: Real-time operational reporting
CREATE TABLE ods_customer_current AS
SELECT DISTINCT ON (customer_id)
customer_id,
customer_name,
customer_email,
customer_phone,
address_line1,
city,
state,
postal_code,
customer_status,
CURRENT_TIMESTAMP AS ods_load_timestamp
FROM (
-- Union all customer data from source systems
SELECT
customer_id,
customer_name,
customer_email,
customer_phone,
address_line1,
city,
state,
postal_code,
customer_status,
updated_at
FROM ecommerce.customers
UNION ALL
SELECT
account_id AS customer_id,
account_name AS customer_name,
email AS customer_email,
phone AS customer_phone,
address_line1,
city,
state,
postal_code,
status AS customer_status,
updated_at
FROM crm.accounts
) source_customers
ORDER BY customer_id, updated_at DESC;
-- Benefits:
-- - Real-time view of current state
-- - Integrated across source systems
-- - No history (current state only)
-- - Used for operational dashboards

CIF ETL Patterns

ETL to EDW

-- ETL Process: Operational Systems → EDW
-- Step 1: Extract from source systems
CREATE TABLE stg_ecommerce_orders AS
SELECT *
FROM ecommerce.orders
WHERE updated_at >= COALESCE(
(SELECT MAX(etl_run_timestamp) FROM etl_log WHERE source_system = 'ECOMMERCE' AND target_table = 'orders'),
'1900-01-01'::timestamp
);
CREATE TABLE stg_crm_orders AS
SELECT *
FROM crm.orders
WHERE updated_at >= COALESCE(
(SELECT MAX(etl_run_timestamp) FROM etl_log WHERE source_system = 'CRM' AND target_table = 'orders'),
'1900-01-01'::timestamp
);
-- Step 2: Transform (apply business rules)
CREATE TABLE stg_orders_transformed AS
SELECT
COALESCE(ecommerce_order_id, crm_order_id) AS order_id,
COALESCE(ecommerce_customer_id, crm_customer_id) AS customer_id,
COALESCE(ecommerce_order_date, crm_order_date) AS order_date,
COALESCE(ecommerce_status, crm_status) AS order_status,
COALESCE(ecommerce_total, crm_total) AS total_amount
FROM (
SELECT
order_id AS ecommerce_order_id,
customer_id AS ecommerce_customer_id,
order_date AS ecommerce_order_date,
status AS ecommerce_status,
total_amount AS ecommerce_total,
NULL AS crm_order_id,
NULL AS crm_customer_id,
NULL AS crm_order_date,
NULL AS crm_status,
NULL AS crm_total
FROM stg_ecommerce_orders
UNION ALL
SELECT
NULL AS ecommerce_order_id,
NULL AS ecommerce_customer_id,
NULL AS ecommerce_order_date,
NULL AS ecommerce_status,
NULL AS ecommerce_total,
order_id AS crm_order_id,
account_id AS crm_customer_id,
order_date AS crm_order_date,
status AS crm_status,
total_amount AS crm_total
FROM stg_crm_orders
) combined_orders;
-- Step 3: Load into EDW (SCD Type 2)
INSERT INTO orders (order_id, customer_id, order_date, order_status, total_amount, updated_at)
SELECT
order_id,
customer_id,
order_date,
order_status,
total_amount,
CURRENT_TIMESTAMP AS updated_at
FROM stg_orders_transformed
ON CONFLICT (order_id) DO UPDATE SET
customer_id = EXCLUDED.customer_id,
order_date = EXCLUDED.order_date,
order_status = EXCLUDED.order_status,
total_amount = EXCLUDED.total_amount,
updated_at = CURRENT_TIMESTAMP;
-- Log ETL run
INSERT INTO etl_log (source_system, target_table, etl_run_timestamp, rows_processed)
SELECT 'ECOMMERCE', 'orders', CURRENT_TIMESTAMP, COUNT(*) FROM stg_ecommerce_orders;

ETL to Data Marts

-- ETL Process: EDW → Dependent Data Marts
-- Refresh sales data mart (daily)
DELETE FROM mart_sales_sales_summary;
INSERT INTO mart_sales_sales_summary
SELECT
o.order_id,
c.customer_id,
c.customer_name,
c.customer_segment,
p.product_id,
p.product_name,
p.product_category,
o.order_date,
o.order_status,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price * (1 - oi.discount_percent / 100) AS line_total,
o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date = CURRENT_DATE - INTERVAL '1 day';
-- Refresh ODS (near real-time)
REFRESH MATERIALIZED VIEW CONCURRENTLY ods_customer_current;

CIF vs. Kimball

DimensionInmon CIFKimball
ApproachTop-down (EDW first)Bottom-up (marts first)
EDW Design3NF (normalized)Star schema (denormalized)
Data MartsDependent (from EDW)Independent (from sources)
Time to valueSlow (EDW first)Fast (marts first)
AgilityLowHigh
IntegrationCentralizedDecentralized
Best forEnterprise, regulated industriesMost organizations

SCD (Slowly Changing Dimensions)

SCD Type 2 in EDW

-- Slowly Changing Dimension Type 2: Track history in EDW
-- Add effective dates to track changes
CREATE TABLE customers_sc2 (
customer_id BIGINT NOT NULL,
customer_name VARCHAR(255) NOT NULL,
customer_email VARCHAR(255),
customer_segment VARCHAR(50),
effective_from TIMESTAMP NOT NULL,
effective_to TIMESTAMP, -- NULL = current record
is_current BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP NOT NULL,
PRIMARY KEY (customer_id, effective_from)
);
-- Insert new record (close old record, open new one)
BEGIN;
-- 1. Close old record
UPDATE customers_sc2
SET effective_to = CURRENT_TIMESTAMP,
is_current = FALSE
WHERE customer_id = 12345
AND is_current = TRUE;
-- 2. Insert new record
INSERT INTO customers_sc2 (customer_id, customer_name, customer_email, customer_segment, effective_from, is_current, created_at)
SELECT
12345 AS customer_id,
'New Name' AS customer_name,
'new@email.com' AS customer_email,
'PREMIUM' AS customer_segment,
CURRENT_TIMESTAMP AS effective_from,
TRUE AS is_current,
CURRENT_TIMESTAMP AS created_at;
COMMIT;
-- Query current records
SELECT * FROM customers_sc2 WHERE is_current = TRUE;

CIF Best Practices

DO

-- 1. Start with EDW (top-down)
-- Build enterprise warehouse first, then marts
-- 2. Use 3NF for EDW
-- Normalize for flexibility and integrity
-- 3. Use star schema for data marts
-- Denormalize for performance
-- 4. Track ETL runs
-- Log all ETL processes for audit
-- 5. Use staging tables
-- Stage data before loading into EDW

DON’T

-- 1. Don't build independent marts first
-- Inmon: EDW first, then dependent marts
-- 2. Don't denormalize EDW
-- Keep EDW normalized (3NF)
-- 3. Don't skip ODS for operational needs
-- Use ODS for real-time operational reporting
-- 4. Don't ignore integration
-- Centralized integration is key
-- 5. Don't forget history
-- Track all changes in EDW

Key Takeaways

  1. Top-down: Build EDW first, then data marts
  2. Normalized: EDW uses 3NF for flexibility
  3. Dependent marts: Data marts derived from EDW
  4. ODS: Operational Data Store for real-time needs
  5. Integration: Centralized integration across enterprise
  6. History: Complete audit trail in EDW
  7. Performance: Denormalize in data marts
  8. Use When: Enterprise, regulated industries, centralized governance

Back to Module 4