Case Study - BFSI Banking & Finance

Regulatory-Compliant Data Platform


Business Context

Company: Mid-sized bank (5M customers, 3 regional offices)

Challenge: Build a data platform for regulatory reporting, risk analytics, and customer insights while maintaining strict compliance and data governance.

Scale: 50TB/day, 100M transactions/day


Requirements


Architecture Design


Technology Selection

Component Selection Rationale

| Component | Technology | Compliance Rationale |
|---|---|---|
| CDC | GoldenGate | Zero downtime, consistent capture |
| Streaming | Kafka (encrypted) | End-to-end encryption |
| DQ | Great Expectations | Data contracts, validation |
| Transform | dbt | Version control, documentation |
| Catalog | Alation | Data governance, discovery |
| Lineage | Marquez | Open source, column-level |
| Data Lake | S3 + Delta + SSE-KMS | Customer-managed keys |
| Warehouse | Snowflake | SOC 2, ISO 27001, FedRAMP |
| Orchestration | Airflow | On-prem deployment option |

Regulatory Reporting

BASEL III/IV Reporting

-- BASEL III risk-weighted assets (RWA) calculation
-- dbt model: models/regulatory/basel_rwa.sql
WITH credit_risk AS (
    SELECT
        exposure_id,
        customer_id,
        exposure_amount,
        product_type,
        rating,
        counterparty_country,
        -- Risk weights by product type (simplified standardized approach)
        CASE
            WHEN product_type = 'Mortgage' THEN 0.35
            WHEN product_type = 'Commercial_Real_Estate' THEN 1.0
            WHEN product_type = 'Corporate_Loan' THEN
                CASE
                    -- Explicit rating lists: a string comparison like
                    -- rating >= 'AA' would also match 'BBB'
                    WHEN rating IN ('AAA', 'AA') THEN 0.2
                    WHEN rating IN ('A', 'BBB') THEN 0.5
                    ELSE 1.0
                END
            WHEN product_type = 'Consumer_Loan' THEN 0.75
            WHEN product_type = 'Credit_Card' THEN 1.0
            ELSE 1.0
        END AS risk_weight,
        -- PD (Probability of Default) from rating
        CASE
            WHEN rating = 'AAA' THEN 0.0001
            WHEN rating = 'AA' THEN 0.0005
            WHEN rating = 'A' THEN 0.001
            WHEN rating = 'BBB' THEN 0.005
            WHEN rating = 'BB' THEN 0.02
            WHEN rating = 'B' THEN 0.05
            ELSE 0.10
        END AS pd,
        -- LGD (Loss Given Default) by collateral type
        CASE
            WHEN collateral_type = 'Cash' THEN 0.0
            WHEN collateral_type = 'Government_Securities' THEN 0.0
            WHEN collateral_type = 'Real_Estate' THEN 0.4
            WHEN collateral_type = 'Corporate_Bonds' THEN 0.6
            ELSE 0.8
        END AS lgd
    FROM {{ ref('credit_exposures') }}
),
market_risk AS (
    SELECT
        trading_book_id,
        portfolio_id,
        -- VaR calculation (simplified)
        SUM(position_value) AS total_exposure,
        STDDEV(position_value) AS volatility,
        -- Market risk capital = 3 * 10-day VaR
        -- (10-day VaR approximated as sqrt(10) * 1-day volatility)
        3 * SQRT(10) * STDDEV(position_value) AS capital_requirement
    FROM {{ ref('trading_positions') }}
    WHERE DATE_TRUNC('month', position_date) = DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY trading_book_id, portfolio_id
),
operational_risk AS (
    -- Basic indicator approach (BIA): 15% of gross income
    SELECT
        business_line,
        SUM(gross_income) AS total_gross_income,
        0.15 * SUM(gross_income) AS operational_risk_capital
    FROM {{ ref('gross_income') }}
    WHERE fiscal_year = YEAR(CURRENT_DATE)
    GROUP BY business_line
),
rwa_components AS (
    SELECT
        'Credit_Risk' AS risk_type,
        SUM(exposure_amount * risk_weight) AS rwa_amount,
        -- Expected loss = EAD * PD * LGD
        SUM(exposure_amount * pd * lgd) AS expected_loss,
        -- Simplified loss-volatility proxy; the IRB formula is more involved
        SUM(exposure_amount * lgd * SQRT(pd * (1 - pd))) AS unexpected_loss
    FROM credit_risk
    UNION ALL
    SELECT
        'Market_Risk' AS risk_type,
        SUM(capital_requirement) AS rwa_amount,
        NULL AS expected_loss,
        NULL AS unexpected_loss
    FROM market_risk
    UNION ALL
    SELECT
        'Operational_Risk' AS risk_type,
        SUM(operational_risk_capital) AS rwa_amount,
        NULL AS expected_loss,
        NULL AS unexpected_loss
    FROM operational_risk
)
-- Calculate capital ratios against total RWA
SELECT
    r.risk_type,
    r.rwa_amount,
    r.rwa_amount / SUM(r.rwa_amount) OVER () AS rwa_percentage,
    c.tier1_capital / SUM(r.rwa_amount) OVER () AS tier1_ratio,
    c.total_capital / SUM(r.rwa_amount) OVER () AS total_capital_ratio
FROM rwa_components r
CROSS JOIN (
    -- Single-row capital snapshot
    SELECT tier1_capital, total_capital
    FROM {{ ref('regulatory_capital') }}
) c;
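The risk-weight and expected-loss logic in the model above can be sketched as plain Python for unit testing outside the warehouse. This is an illustrative standalone sketch, not part of the case study's codebase; the lookup tables mirror the simplified CASE expressions in the SQL.

```python
# Simplified risk-weight and expected-loss logic, mirroring the dbt model.
# These tables are the same simplified values used in the SQL CASE expressions.

RISK_WEIGHTS = {
    "Mortgage": 0.35,
    "Commercial_Real_Estate": 1.0,
    "Consumer_Loan": 0.75,
    "Credit_Card": 1.0,
}

# Corporate loans are weighted by rating bucket
CORPORATE_WEIGHTS = {"AAA": 0.2, "AA": 0.2, "A": 0.5, "BBB": 0.5}

PD_BY_RATING = {"AAA": 0.0001, "AA": 0.0005, "A": 0.001,
                "BBB": 0.005, "BB": 0.02, "B": 0.05}

LGD_BY_COLLATERAL = {"Cash": 0.0, "Government_Securities": 0.0,
                     "Real_Estate": 0.4, "Corporate_Bonds": 0.6}


def risk_weight(product_type: str, rating: str) -> float:
    """Simplified standardized-approach risk weight."""
    if product_type == "Corporate_Loan":
        return CORPORATE_WEIGHTS.get(rating, 1.0)
    return RISK_WEIGHTS.get(product_type, 1.0)


def expected_loss(exposure: float, rating: str, collateral: str) -> float:
    """Expected loss = EAD * PD * LGD."""
    pd = PD_BY_RATING.get(rating, 0.10)
    lgd = LGD_BY_COLLATERAL.get(collateral, 0.8)
    return exposure * pd * lgd
```

A reference implementation like this makes the SQL's edge cases (unrated counterparties, unknown collateral) easy to pin down in code review before they reach a regulatory report.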

Stress Testing (CCAR)

-- CCAR stress testing scenarios
-- dbt model: models/regulatory/ccar_stress_test.sql
{% set scenarios = ['baseline', 'adverse', 'severely_adverse'] %}
WITH macro_scenarios AS (
    -- Federal Reserve stress scenarios
    SELECT
        scenario_name,
        quarter,
        -- Unemployment rate (%)
        CASE scenario_name
            WHEN 'baseline' THEN 4.5
            WHEN 'adverse' THEN 7.5
            WHEN 'severely_adverse' THEN 10.0
        END AS unemployment_rate,
        -- GDP growth (%)
        CASE scenario_name
            WHEN 'baseline' THEN 1.8
            WHEN 'adverse' THEN -2.5
            WHEN 'severely_adverse' THEN -6.0
        END AS gdp_growth,
        -- Housing price index (1.0 = today's level)
        CASE scenario_name
            WHEN 'baseline' THEN 1.02
            WHEN 'adverse' THEN 0.85
            WHEN 'severely_adverse' THEN 0.70
        END AS housing_price_index
    -- Quote each scenario name when rendering the Jinja list
    FROM unnest(ARRAY[
        {% for s in scenarios %}'{{ s }}'{% if not loop.last %}, {% endif %}{% endfor %}
    ]) AS t(scenario_name)
    CROSS JOIN (
        SELECT generate_series(1, 9) AS quarter -- 9-quarter projection horizon
    ) q
),
credit_losses AS (
    SELECT
        s.scenario_name,
        s.quarter,
        -- Loss = EAD * shocked PD * shocked LGD; the shock expressions are
        -- written inline because a SELECT alias cannot be reused in the same list
        SUM(
            e.exposure_amount
            * (e.pd * (1 + (s.unemployment_rate - 4.5) * 0.3))  -- PD shocked by unemployment
            * (e.lgd / NULLIF(s.housing_price_index, 0))        -- LGD shocked by housing prices
        ) AS quarterly_loss
    FROM {{ ref('credit_exposures') }} e
    CROSS JOIN macro_scenarios s
    GROUP BY s.scenario_name, s.quarter
),
preprovision_net_revenue AS (
    -- Projected PPNR under stress scenarios
    SELECT
        s.scenario_name,
        s.quarter,
        SUM(
            r.base_revenue * (1 + s.gdp_growth * 0.5)            -- stressed revenue
            - r.base_expenses * (1 + s.unemployment_rate * 0.1)  -- stressed expenses
        ) AS quarterly_ppnr
    FROM {{ ref('revenue_projection') }} r
    CROSS JOIN macro_scenarios s
    GROUP BY s.scenario_name, s.quarter
),
capital_projection AS (
    SELECT
        l.scenario_name,
        l.quarter,
        -- Capital path: starting capital plus cumulative PPNR minus cumulative losses
        c.capital_amount
            + SUM(p.quarterly_ppnr) OVER (
                PARTITION BY l.scenario_name
                ORDER BY l.quarter
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            )
            - SUM(l.quarterly_loss) OVER (
                PARTITION BY l.scenario_name
                ORDER BY l.quarter
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ) AS cumulative_capital,
        SUM(p.quarterly_ppnr) OVER (
            PARTITION BY l.scenario_name
            ORDER BY l.quarter
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_ppnr,
        SUM(l.quarterly_loss) OVER (
            PARTITION BY l.scenario_name
            ORDER BY l.quarter
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS cumulative_losses,
        -- Capital ratio against a static RWA snapshot (simplified)
        (c.capital_amount
            + SUM(p.quarterly_ppnr) OVER (
                PARTITION BY l.scenario_name
                ORDER BY l.quarter
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            )
            - SUM(l.quarterly_loss) OVER (
                PARTITION BY l.scenario_name
                ORDER BY l.quarter
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            )
        ) / c.risk_weighted_assets AS capital_ratio
    FROM credit_losses l
    JOIN preprovision_net_revenue p
        ON l.scenario_name = p.scenario_name
        AND l.quarter = p.quarter
    CROSS JOIN (
        -- Single-row starting capital and RWA snapshot
        SELECT capital_amount, risk_weighted_assets
        FROM {{ ref('regulatory_capital') }}
    ) c
)
-- Final CCAR results
SELECT
    scenario_name,
    quarter,
    ROUND(capital_ratio * 100, 2) AS capital_ratio_pct,
    cumulative_capital / 1000000 AS capital_millions,
    cumulative_ppnr / 1000000 AS ppnr_millions,
    cumulative_losses / 1000000 AS losses_millions,
    -- Minimum over the horizon must stay above the 4.5% Tier 1 minimum
    CASE
        WHEN MIN(capital_ratio) OVER (PARTITION BY scenario_name) < 0.045 THEN 'FAIL'
        ELSE 'PASS'
    END AS stress_test_result
FROM capital_projection
ORDER BY scenario_name, quarter;
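The PD and LGD shocks in the stress-test model above are simple enough to express as standalone functions, which makes the scenario arithmetic easy to verify. This is an illustrative sketch; the 4.5 baseline unemployment rate and 0.3 sensitivity are the same assumptions used in the SQL, and the function names are not from the case study.

```python
# PD/LGD shock logic from the CCAR stress-test model, as testable functions.

def shocked_pd(pd: float, unemployment_rate: float,
               baseline: float = 4.5, sensitivity: float = 0.3) -> float:
    """Scale PD up as unemployment rises above the baseline rate."""
    return pd * (1 + (unemployment_rate - baseline) * sensitivity)


def shocked_lgd(lgd: float, housing_price_index: float) -> float:
    """Falling collateral values (index < 1.0) push loss-given-default up."""
    return lgd / housing_price_index


def stressed_expected_loss(exposure: float, pd: float, lgd: float,
                           unemployment_rate: float, hpi: float) -> float:
    """Quarterly expected loss under a macro scenario: EAD * PD' * LGD'."""
    return exposure * shocked_pd(pd, unemployment_rate) * shocked_lgd(lgd, hpi)
```

Under the severely adverse scenario (10% unemployment, housing at 70% of today's level), a BB exposure's PD of 2% is scaled to roughly 5.3% and a 0.4 LGD rises to about 0.57, so losses grow by a factor of nearly four.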

Data Governance

Data Lineage

# Marquez lineage tracking
from datetime import datetime

from marquez_client import Client

# Initialize Marquez client
client = Client(url="http://marquez:5000")

# Define source dataset
core_banking_source = {
    "namespace": "bfsi",
    "name": "core_banking_transactions",
    "source_type": "KAFKA",
    "connection_url": "kafka.bfsi.internal:9093",
    "topics": ["core-banking-transactions"],
}

# Define transform job
dbt_run_job = {
    "namespace": "bfsi",
    "name": "dbt_run_regulatory_models",
    "location": "https://github.com/bfsi/data-platform/dbt",
    "input_datasets": [
        "bfsi.core_banking_transactions",
        "bfsi.customer_master",
        "bfsi.product_master",
    ],
    "output_datasets": [
        "bfsi.regulatory_basel_rwa",
        "bfsi.regulatory_ccar_stress_test",
    ],
}

# Define destination dataset
regulatory_report = {
    "namespace": "bfsi",
    "name": "regulatory_basel_rwa",
    "source_type": "SNOWFLAKE",
    "connection_url": "snowflake.bfsi.internal",
    "database": "regulatory",
    "schema": "basel",
    "table": "rwa_calculation",
}


def register_lineage_job_run():
    """Register lineage for a job run."""
    # Create job run
    job_run = client.create_job_run(
        namespace="bfsi",
        job_name="dbt_run_regulatory_models",
        started_at=datetime.utcnow(),
        nominal_start_time=datetime.utcnow(),
        nominal_end_time=datetime.utcnow(),
    )
    # Add input datasets
    for input_dataset in dbt_run_job["input_datasets"]:
        client.append_inputs(
            namespace="bfsi",
            job_name="dbt_run_regulatory_models",
            run_id=job_run["run_id"],
            inputs=[input_dataset],
        )
    # Add output datasets
    for output_dataset in dbt_run_job["output_datasets"]:
        client.append_outputs(
            namespace="bfsi",
            job_name="dbt_run_regulatory_models",
            run_id=job_run["run_id"],
            outputs=[output_dataset],
        )
    # Complete job run
    client.complete_job_run(
        namespace="bfsi",
        job_name="dbt_run_regulatory_models",
        run_id=job_run["run_id"],
    )
    return job_run

Data Catalog

# Alation data catalog integration
from alation_package import AlationHandler, AlationCatalogObject

# Initialize Alation handler
alation_handler = AlationHandler(
    url="https://alation.bfsi.com",
    userid="data-platform-service",
    apikey="your-api-key",
)


def register_regulatory_table():
    """Register the table in the data catalog."""
    table = AlationCatalogObject(
        object_type="table",
        name="basel_rwa_calculation",
        schema="regulatory",
        database="regulatory_db",
        description="""
        Basel III Risk-Weighted Assets (RWA) calculation.
        This table contains the monthly RWA calculation for regulatory reporting.
        Includes credit, market, and operational risk components.
        Regulatory: BASEL III, BASEL IV
        Frequency: Monthly
        Source: dbt models
        """,
        # Tags
        tags=["regulatory", "basel", "rwa", "monthly"],
        # Custom fields
        custom_fields={
            "compliance_level": "critical",
            "data_classification": "confidential",
            "sla": "Day 3 of month",
            "responsible_team": "Risk Management",
            "data_steward": "risk-team@bfsi.com",
            "regulatory_requirement": "BASEL III/IV",
            "retention_period": "7 years",
        },
    )
    # Create in Alation
    alation_handler.create_object(table)


def register_column_lineage():
    """Register column-level lineage."""
    columns = [
        {
            "name": "rwa_amount",
            "description": "Risk-weighted asset amount",
            "source_columns": [
                "raw_credit_exposures.exposure_amount",
                "raw_credit_exposures.product_type",
                "raw_credit_exposures.rating",
            ],
            "transformation": "exposure_amount * risk_weight(product_type, rating)",
        },
        {
            "name": "tier1_ratio",
            "description": "Tier 1 capital ratio (regulatory metric)",
            "source_columns": [
                "regulatory_capital.tier1_capital",
                "basel_rwa_calculation.rwa_amount",
            ],
            "transformation": "tier1_capital / rwa_amount",
        },
    ]
    for col in columns:
        alation_handler.add_column_lineage(
            table="basel_rwa_calculation",
            column=col["name"],
            source_columns=col["source_columns"],
            transformation=col["transformation"],
        )

Cost Optimization

Storage Costs

| Tier | Data | Size | Cost | Optimization |
|---|---|---|---|---|
| Hot | Regulatory reports | 5 TB | $115/month | Snowflake |
| Warm | Raw data | 50 TB | $1,150/month | S3 Standard |
| Cold | Historical | 200 TB | $2,000/month | S3 IA |
| Archive | 7-year retention | 1 PB | $3,000/month | Glacier |
| **Total** | | | **$6,265/month** | |

Optimization:

  • Snowflake auto-suspend (5 min idle)
  • Lifecycle: 30 days → IA, 1 year → Glacier
  • Compression: ZSTD-19 (~40% savings)
  • Partitioning: Date + report_type
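The lifecycle bullet above (30 days to IA, 1 year to Glacier, 7-year retention) maps directly onto an S3 lifecycle rule. This is a minimal sketch; the rule ID and `raw/` prefix are hypothetical, and in practice the dict would be applied with boto3's `put_bucket_lifecycle_configuration`.

```python
# S3 lifecycle policy implied by the optimization bullets above.
# Prefix and rule ID are hypothetical examples.

RETENTION_YEARS = 7  # regulatory retention period from the case study

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "raw-data-tiering",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},   # warm -> IA after 30 days
                {"Days": 365, "StorageClass": "GLACIER"},      # cold -> Glacier after 1 year
            ],
            # Expire only after the regulatory retention period has elapsed
            "Expiration": {"Days": RETENTION_YEARS * 365},
        }
    ]
}
```

Keeping the expiration tied to the retention constant makes it harder for a cost-cutting change to silently violate the 7-year requirement.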

Compute Costs

| Component | Compute | Cost | Optimization |
|---|---|---|---|
| Snowflake | Large warehouse | $18,000/month | Scheduled suspension |
| Spark Batch | 50 nodes (spot) | $28,800/month | 70% spot savings |
| Kafka | 10 brokers (on-demand) | $7,200/month | Right-sized |
| **Total** | | **$54,000/month** | |

Total Monthly Cost

| Category | Cost | Annual |
|---|---|---|
| Storage | $6,265 | $75,180 |
| Compute | $54,000 | $648,000 |
| Network | $3,000 | $36,000 |
| Governance Tools | $5,000 | $60,000 |
| **Total** | **$68,265/month** | **$819,180/year** |
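The per-category figures roll up to the stated totals; a quick sanity check of the arithmetic:

```python
# Roll-up of the cost tables above: per-tier and per-component figures
# should sum to the stated monthly and annual totals.

monthly_costs = {
    "storage": 115 + 1_150 + 2_000 + 3_000,   # hot + warm + cold + archive
    "compute": 18_000 + 28_800 + 7_200,       # Snowflake + Spark + Kafka
    "network": 3_000,
    "governance_tools": 5_000,
}

total_monthly = sum(monthly_costs.values())
total_annual = total_monthly * 12

print(total_monthly)  # 68265
print(total_annual)   # 819180
```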

Failure Modes

Mode 1: Regulatory Report Failure

Mitigation:

  • Automated data quality checks
  • Multiple calculation methods
  • Manual override procedures
  • Regulatory exception process
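One way to read "multiple calculation methods": compute the same RWA figure two independent ways and block report submission when they diverge beyond a tolerance. The function names and tolerance below are illustrative, not from the case study.

```python
# Reconciliation sketch: two independent RWA calculations must agree
# within a relative tolerance before a regulatory report can proceed.

def rwa_sum(exposures):
    """Primary method: row-by-row exposure * risk weight."""
    return sum(e["exposure_amount"] * e["risk_weight"] for e in exposures)


def rwa_by_bucket(exposures):
    """Independent method: aggregate exposures by risk-weight bucket first."""
    buckets = {}
    for e in exposures:
        buckets[e["risk_weight"]] = (
            buckets.get(e["risk_weight"], 0.0) + e["exposure_amount"]
        )
    return sum(rw * total for rw, total in buckets.items())


def reconcile(exposures, tolerance=0.0001):
    """True when both methods agree within the relative tolerance."""
    a, b = rwa_sum(exposures), rwa_by_bucket(exposures)
    return abs(a - b) <= tolerance * max(abs(a), abs(b), 1.0)
```

A disagreement here usually points at a data problem (duplicated rows, NULL risk weights) rather than a logic bug, which is exactly what an automated quality gate should surface before submission.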

Mode 2: Data Breach

Mitigation:

  • Encryption at rest and in transit
  • Access logging and monitoring
  • Regular security audits
  • Incident response plan

SLA/SLO Definitions

slas:
  regulatory_reporting:
    timeliness:
      basel_reports: "Day 3 of month"
      ccar_submission: "Q2"
      stress_testing: "Q2"
    accuracy:
      reconciliation: "100%"
      data_quality: "> 99.9%"
    availability:
      reporting_platform: "99.9%"
  data_governance:
    lineage:
      coverage: "100% of regulatory data"
      freshness: "< 1 hour"
    access_control:
      audit_logging: "100%"
      encryption: "100%"
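The lineage freshness target above ("< 1 hour") can be enforced with a simple check comparing the most recent lineage event's timestamp against the clock. A minimal sketch, assuming the caller fetches `last_event_at` from the lineage store:

```python
# SLO check for the lineage freshness target ("< 1 hour").
from datetime import datetime, timedelta, timezone
from typing import Optional

FRESHNESS_SLO = timedelta(hours=1)


def lineage_is_fresh(last_event_at: datetime,
                     now: Optional[datetime] = None) -> bool:
    """True if the most recent lineage event falls within the freshness SLO."""
    now = now or datetime.now(timezone.utc)
    return now - last_event_at < FRESHNESS_SLO
```

Wired into the orchestrator, a failing check can page the data-platform team before a stale lineage graph undermines an audit.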

Key Takeaways

  1. Regulatory first: Compliance drives all architecture decisions
  2. Data lineage: Track all transformations for audits
  3. Data quality: Critical for accurate reporting
  4. Encryption: End-to-end encryption required
  5. Access controls: Role-based, row-level security
  6. Audit logging: Comprehensive access tracking
  7. Cost optimization: Spot instances, lifecycle policies
  8. Governance tools: Catalog, lineage, quality
