Case Study - BFSI Banking & Finance
Regulatory-Compliant Data Platform
Business Context
Company: Mid-sized bank (5M customers, 3 regional offices)
Challenge: Build a data platform for regulatory reporting, risk analytics, and customer insights while maintaining strict compliance and data governance.
Scale: 50TB/day, 100M transactions/day
Requirements
Architecture Design
Technology Selection
Component Selection Rationale
| Component | Technology | Compliance Rationale |
|---|---|---|
| CDC | GoldenGate | Zero downtime, consistent capture |
| Streaming | Kafka (encrypted) | End-to-end encryption |
| DQ | Great Expectations | Data contracts, validation |
| Transform | dbt | Version control, documentation |
| Catalog | Alation | Data governance, discovery |
| Lineage | Marquez | Open source, column-level |
| Data Lake | S3 + Delta + SSE-KMS | Customer-managed keys |
| Warehouse | Snowflake | SOC2, ISO 27001, FedRAMP |
| Orchestration | Airflow | On-prem deployment option |
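The DQ layer enforces data contracts before records land in the warehouse. The real platform would use Great Expectations suites for this; a minimal plain-Python sketch of the same idea (field names and rules are illustrative, not the platform's actual contract):

```python
# Minimal data-contract check in the spirit of the Great Expectations layer.
# Field names and validation rules are illustrative assumptions.

def validate_transaction(txn: dict) -> list[str]:
    """Return a list of contract violations for one transaction record."""
    errors = []
    for field in ("transaction_id", "customer_id", "amount", "currency"):
        if txn.get(field) is None:
            errors.append(f"{field} must not be null")
    if isinstance(txn.get("amount"), (int, float)) and txn["amount"] <= 0:
        errors.append("amount must be positive")
    if txn.get("currency") is not None and txn["currency"] not in {"USD", "EUR", "GBP"}:
        errors.append("currency not in allowed set")
    return errors

good = {"transaction_id": "t1", "customer_id": "c1", "amount": 100.0, "currency": "USD"}
bad = {"transaction_id": "t2", "customer_id": None, "amount": -5, "currency": "XYZ"}
print(validate_transaction(good))       # []
print(len(validate_transaction(bad)))   # 3 violations
```

Failing records are quarantined rather than dropped, so reconciliation totals still balance.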
Regulatory Reporting
BASEL III/IV Reporting
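The core arithmetic in the dbt model below is simple: each exposure contributes exposure_amount × risk_weight to total RWA, and capital ratios divide capital by that total. A Python sketch with illustrative figures (risk weights mirror the model's simplified mapping; the capital amount is hypothetical):

```python
# RWA and Tier 1 ratio arithmetic; all figures illustrative.
# Risk weights mirror the simplified mapping in the dbt model below.
RISK_WEIGHTS = {
    "Mortgage": 0.35,
    "Commercial_Real_Estate": 1.0,
    "Consumer_Loan": 0.75,
    "Credit_Card": 1.0,
}

exposures = [
    {"amount": 1_000_000, "product_type": "Mortgage"},
    {"amount": 500_000, "product_type": "Credit_Card"},
    {"amount": 2_000_000, "product_type": "Consumer_Loan"},
]

# RWA = sum of exposure_amount * risk_weight
rwa = sum(e["amount"] * RISK_WEIGHTS[e["product_type"]] for e in exposures)
tier1_capital = 400_000  # hypothetical

print(int(rwa))                        # 350000 + 500000 + 1500000 = 2350000
print(round(tier1_capital / rwa, 4))   # Tier 1 ratio: 0.1702
```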
```sql
-- BASEL III risk-weighted assets (RWA) calculation
-- dbt model: models/regulatory/basel_rwa.sql
WITH credit_risk AS (
    SELECT
        exposure_id,
        customer_id,
        exposure_amount,
        product_type,
        rating,
        counterparty_country,
        -- Risk weights by product type (simplified)
        CASE
            WHEN product_type = 'Mortgage' THEN 0.35
            WHEN product_type = 'Commercial_Real_Estate' THEN 1.0
            WHEN product_type = 'Corporate_Loan' THEN
                CASE
                    WHEN rating >= 'AA' THEN 0.2
                    WHEN rating >= 'BBB' THEN 0.5
                    ELSE 1.0
                END
            WHEN product_type = 'Consumer_Loan' THEN 0.75
            WHEN product_type = 'Credit_Card' THEN 1.0
            ELSE 1.0
        END AS risk_weight,
        -- PD (Probability of Default) from rating
        CASE
            WHEN rating = 'AAA' THEN 0.0001
            WHEN rating = 'AA' THEN 0.0005
            WHEN rating = 'A' THEN 0.001
            WHEN rating = 'BBB' THEN 0.005
            WHEN rating = 'BB' THEN 0.02
            WHEN rating = 'B' THEN 0.05
            ELSE 0.10
        END AS pd,
        -- LGD (Loss Given Default) by collateral type
        CASE
            WHEN collateral_type = 'Cash' THEN 0.0
            WHEN collateral_type = 'Government_Securities' THEN 0.0
            WHEN collateral_type = 'Real_Estate' THEN 0.4
            WHEN collateral_type = 'Corporate_Bonds' THEN 0.6
            ELSE 0.8
        END AS lgd
    FROM {{ ref('credit_exposures') }}
),

market_risk AS (
    SELECT
        trading_book_id,
        portfolio_id,
        -- VaR calculation (simplified)
        SUM(position_value) AS total_exposure,
        STDDEV(position_value) AS volatility,
        -- Market risk capital = 3 * 10-day VaR
        3 * SQRT(10) * STDDEV(position_value) AS capital_requirement
    FROM {{ ref('trading_positions') }}
    WHERE position_date = DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY trading_book_id, portfolio_id
),

operational_risk AS (
    -- Basic indicator approach (BIA)
    SELECT
        business_line,
        SUM(gross_income) AS total_gross_income,
        0.15 * SUM(gross_income) AS operational_risk_capital
    FROM {{ ref('gross_income') }}
    WHERE fiscal_year = YEAR(CURRENT_DATE)
    GROUP BY business_line
),

rwa_components AS (
    -- RWA by risk type
    SELECT
        'Credit_Risk' AS risk_type,
        SUM(exposure_amount * risk_weight) AS rwa_amount,
        SUM(exposure_amount * risk_weight * lgd) AS expected_loss,
        SUM(exposure_amount * risk_weight * lgd * pd) AS unexpected_loss
    FROM credit_risk

    UNION ALL

    SELECT
        'Market_Risk' AS risk_type,
        capital_requirement AS rwa_amount,
        NULL AS expected_loss,
        NULL AS unexpected_loss
    FROM market_risk

    UNION ALL

    SELECT
        'Operational_Risk' AS risk_type,
        operational_risk_capital AS rwa_amount,
        NULL AS expected_loss,
        NULL AS unexpected_loss
    FROM operational_risk
)

-- Calculate capital ratios
SELECT
    risk_type,
    rwa_amount,
    rwa_amount / SUM(rwa_amount) OVER () AS rwa_percentage,
    tier1_capital / rwa_amount AS tier1_ratio,
    total_capital / rwa_amount AS total_capital_ratio
FROM (
    SELECT
        risk_type,
        SUM(rwa_amount) AS rwa_amount,
        (SELECT tier1_capital FROM {{ ref('regulatory_capital') }}) AS tier1_capital,
        (SELECT total_capital FROM {{ ref('regulatory_capital') }}) AS total_capital
    FROM rwa_components
    GROUP BY risk_type
) cap_ratios
```

Stress Testing (CCAR)
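The scenario shocks applied in the model below are simple multiplicative adjustments: PD scales with the unemployment gap above the 4.5% baseline, and LGD scales inversely with the housing price index. A Python sketch (the 0.3 sensitivity and 4.5% baseline come from the model; exposure figures are illustrative):

```python
# Stressed PD/LGD shocks as applied in the CCAR model below.
# The 4.5% baseline unemployment and 0.3 sensitivity come from the model;
# the exposure, PD, and LGD figures are illustrative.

def shock(pd: float, lgd: float, unemployment: float, hpi: float) -> tuple[float, float]:
    shocked_pd = pd * (1 + (unemployment - 4.5) * 0.3)
    shocked_lgd = lgd / hpi
    return shocked_pd, shocked_lgd

exposure = 1_000_000
pd, lgd = 0.005, 0.4  # BBB-rated exposure, real-estate collateral

# Severely adverse scenario: 10.0% unemployment, housing index at 0.70
spd, slgd = shock(pd, lgd, unemployment=10.0, hpi=0.70)
expected_loss = exposure * spd * slgd

print(round(spd, 6))             # 0.01325  (PD more than doubles)
print(round(slgd, 4))            # 0.5714
print(round(expected_loss, 2))   # 7571.43
```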
```sql
-- CCAR stress testing scenarios
-- dbt model: models/regulatory/ccar_stress_test.sql
{% set scenarios = ['baseline', 'adverse', 'severely_adverse'] %}

WITH macro_scenarios AS (
    -- Federal Reserve stress scenarios, projected over 9 quarters
    SELECT
        scenario_name,
        quarter,
        -- Unemployment rate
        CASE scenario_name
            WHEN 'baseline' THEN 4.5
            WHEN 'adverse' THEN 7.5
            WHEN 'severely_adverse' THEN 10.0
        END AS unemployment_rate,
        -- GDP growth
        CASE scenario_name
            WHEN 'baseline' THEN 1.8
            WHEN 'adverse' THEN -2.5
            WHEN 'severely_adverse' THEN -6.0
        END AS gdp_growth,
        -- Housing prices
        CASE scenario_name
            WHEN 'baseline' THEN 1.02
            WHEN 'adverse' THEN 0.85
            WHEN 'severely_adverse' THEN 0.70
        END AS housing_price_index
    FROM (
        {% for s in scenarios %}
        SELECT '{{ s }}' AS scenario_name
        {% if not loop.last %}UNION ALL{% endif %}
        {% endfor %}
    ) scenario_list
    CROSS JOIN (
        SELECT generate_series(1, 9) AS quarter  -- 9 quarters
    ) q
),

credit_losses AS (
    SELECT
        s.scenario_name,
        s.quarter,
        e.exposure_id,
        e.exposure_amount,
        -- Adjusted PD based on unemployment shock
        e.pd * (1 + (s.unemployment_rate - 4.5) * 0.3) AS shocked_pd,
        -- Adjusted LGD based on housing prices
        e.lgd / NULLIF(s.housing_price_index, 0) AS shocked_lgd,
        -- Loss calculation (column aliases cannot be reused in the same SELECT)
        e.exposure_amount
            * (e.pd * (1 + (s.unemployment_rate - 4.5) * 0.3))
            * (e.lgd / NULLIF(s.housing_price_index, 0)) AS expected_loss
    FROM {{ ref('credit_exposures') }} e
    CROSS JOIN macro_scenarios s
),

preprovision_net_revenue AS (
    -- Projected PPNR under stress scenarios
    SELECT
        s.scenario_name,
        s.quarter,
        r.business_line,
        r.base_revenue * (1 + s.gdp_growth * 0.5) AS stressed_revenue,
        r.base_expenses * (1 + s.unemployment_rate * 0.1) AS stressed_expenses,
        -- PPNR = stressed revenue - stressed expenses
        r.base_revenue * (1 + s.gdp_growth * 0.5)
            - r.base_expenses * (1 + s.unemployment_rate * 0.1) AS ppnr
    FROM {{ ref('revenue_projection') }} r
    CROSS JOIN macro_scenarios s
),

capital_projection AS (
    SELECT
        s.scenario_name,
        s.quarter,
        -- Cumulative capital, PPNR, and losses through each quarter
        SUM(c.capital_amount) OVER (
            PARTITION BY s.scenario_name ORDER BY s.quarter
        ) AS cumulative_capital,
        SUM(pnr.ppnr) OVER (
            PARTITION BY s.scenario_name ORDER BY s.quarter
        ) AS cumulative_ppnr,
        SUM(cl.expected_loss) OVER (
            PARTITION BY s.scenario_name ORDER BY s.quarter
        ) AS cumulative_losses,
        -- Capital ratio = (capital + PPNR - losses) / RWA
        (SUM(c.capital_amount) OVER (PARTITION BY s.scenario_name ORDER BY s.quarter)
            + SUM(pnr.ppnr) OVER (PARTITION BY s.scenario_name ORDER BY s.quarter)
            - SUM(cl.expected_loss) OVER (PARTITION BY s.scenario_name ORDER BY s.quarter)
        ) / c.risk_weighted_assets AS capital_ratio
    FROM {{ ref('regulatory_capital') }} c
    CROSS JOIN macro_scenarios s
    LEFT JOIN preprovision_net_revenue pnr
        ON s.scenario_name = pnr.scenario_name AND s.quarter = pnr.quarter
    LEFT JOIN credit_losses cl
        ON s.scenario_name = cl.scenario_name AND s.quarter = cl.quarter
)

-- Final CCAR results
SELECT
    scenario_name,
    quarter,
    ROUND(capital_ratio * 100, 2) AS capital_ratio_pct,
    cumulative_capital / 1000000 AS capital_millions,
    cumulative_ppnr / 1000000 AS ppnr_millions,
    cumulative_losses / 1000000 AS losses_millions,
    -- Minimum capital ratio (regulatory requirement: 4.5% Tier 1)
    CASE
        WHEN MIN(capital_ratio) OVER (PARTITION BY scenario_name) < 0.045
        THEN 'FAIL'
        ELSE 'PASS'
    END AS stress_test_result
FROM capital_projection
ORDER BY scenario_name, quarter
```

Data Governance
Data Lineage
```python
# Marquez lineage tracking
from marquez_client import Client
from datetime import datetime

# Initialize Marquez client
client = Client(url="http://marquez:5000")

# Define source dataset
core_banking_source = {
    "namespace": "bfsi",
    "name": "core_banking_transactions",
    "source_type": "KAFKA",
    "connection_url": "kafka.bfsi.internal:9093",
    "topics": ["core-banking-transactions"],
}

# Define transform job
dbt_run_job = {
    "namespace": "bfsi",
    "name": "dbt_run_regulatory_models",
    "location": "https://github.com/bfsi/data-platform/dbt",
    "input_datasets": [
        "bfsi.core_banking_transactions",
        "bfsi.customer_master",
        "bfsi.product_master",
    ],
    "output_datasets": [
        "bfsi.regulatory_basel_rwa",
        "bfsi.regulatory_ccar_stress_test",
    ],
}

# Define destination dataset
regulatory_report = {
    "namespace": "bfsi",
    "name": "regulatory_basel_rwa",
    "source_type": "SNOWFLAKE",
    "connection_url": "snowflake.bfsi.internal",
    "database": "regulatory",
    "schema": "basel",
    "table": "rwa_calculation",
}


def register_lineage_job_run():
    """Register lineage for one job run."""
    # Create job run
    job_run = client.create_job_run(
        namespace="bfsi",
        job_name="dbt_run_regulatory_models",
        started_at=datetime.utcnow(),
        nominal_start_time=datetime.utcnow(),
        nominal_end_time=datetime.utcnow(),
    )

    # Add input datasets
    for input_dataset in dbt_run_job["input_datasets"]:
        client.append_inputs(
            namespace="bfsi",
            job_name="dbt_run_regulatory_models",
            run_id=job_run["run_id"],
            inputs=[input_dataset],
        )

    # Add output datasets
    for output_dataset in dbt_run_job["output_datasets"]:
        client.append_outputs(
            namespace="bfsi",
            job_name="dbt_run_regulatory_models",
            run_id=job_run["run_id"],
            outputs=[output_dataset],
        )

    # Complete job run
    client.complete_job_run(
        namespace="bfsi",
        job_name="dbt_run_regulatory_models",
        run_id=job_run["run_id"],
    )

    return job_run
```

Data Catalog
```python
# Alation data catalog integration
from alation_package import AlationHandler, AlationCatalogObject

# Initialize Alation handler
alation_handler = AlationHandler(
    url="https://alation.bfsi.com",
    userid="data-platform-service",
    apikey="your-api-key",
)


def register_regulatory_table():
    """Register the regulatory report table in the data catalog."""
    table = AlationCatalogObject(
        object_type="table",
        name="basel_rwa_calculation",
        schema="regulatory",
        database="regulatory_db",
        description="""
        Basel III Risk-Weighted Assets (RWA) calculation.

        This table contains the monthly RWA calculation for regulatory
        reporting. Includes credit, market, and operational risk components.

        Regulatory: BASEL III, BASEL IV
        Frequency: Monthly
        Source: dbt models
        """,
        # Tags
        tags=["regulatory", "basel", "rwa", "monthly"],
        # Custom fields
        custom_fields={
            "compliance_level": "critical",
            "data_classification": "confidential",
            "sla": "Day 3 of month",
            "responsible_team": "Risk Management",
            "data_steward": "risk-team@bfsi.com",
            "regulatory_requirement": "BASEL III/IV",
            "retention_period": "7 years",
        },
    )

    # Create in Alation
    alation_handler.create_object(table)


def register_column_lineage():
    """Register column-level lineage for the regulatory metrics."""
    columns = [
        {
            "name": "rwa_amount",
            "description": "Risk-weighted asset amount",
            "source_columns": [
                "raw_credit_exposures.exposure_amount",
                "raw_credit_exposures.product_type",
                "raw_credit_exposures.rating",
            ],
            "transformation": "exposure_amount * risk_weight(product_type, rating)",
        },
        {
            "name": "tier1_ratio",
            "description": "Tier 1 capital ratio (regulatory metric)",
            "source_columns": [
                "regulatory_capital.tier1_capital",
                "basel_rwa_calculation.rwa_amount",
            ],
            "transformation": "tier1_capital / rwa_amount",
        },
    ]

    for col in columns:
        alation_handler.add_column_lineage(
            table="basel_rwa_calculation",
            column=col["name"],
            source_columns=col["source_columns"],
            transformation=col["transformation"],
        )
```

Cost Optimization
Storage Costs
| Tier | Data | Size | Cost | Optimization |
|---|---|---|---|---|
| Hot | Regulatory reports | 5TB | $115/month | Snowflake |
| Warm | Raw data | 50TB | $1,150/month | S3 Standard |
| Cold | Historical | 200TB | $2,000/month | S3 IA |
| Archive | 7-year retention | 1PB | $3,000/month | Glacier |
| Total | | | $6,265/month | |
Optimization:
- Snowflake auto-suspend (5 min idle)
- Lifecycle: 30 days → IA, 1 year → Glacier
- Compression: ZSTD-19 (~40% savings)
- Partitioning: Date + report_type
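The lifecycle rules above translate directly to an S3 lifecycle configuration. A sketch of the policy document (bucket and prefix names are hypothetical; in production it would be applied via boto3 or Terraform):

```python
# S3 lifecycle policy matching the tiering above: 30 days -> IA, 1 year -> Glacier,
# expire after the 7-year regulatory retention. Bucket/prefix names are hypothetical.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "raw-data-tiering",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            # 7-year regulatory retention, then expire
            "Expiration": {"Days": 7 * 365},
        }
    ]
}

# Applied with boto3 (not executed here):
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="bfsi-data-lake", LifecycleConfiguration=lifecycle_policy)

print(lifecycle_policy["Rules"][0]["Transitions"][1]["StorageClass"])  # GLACIER
```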
Compute Costs
| Component | Compute | Cost | Optimization |
|---|---|---|---|
| Snowflake | Large warehouse | $18,000/month | Scheduled suspension |
| Spark Batch | 50 nodes (spot) | $28,800/month | 70% spot savings |
| Kafka | 10 brokers (on-demand) | $7,200/month | Right-sized |
| Total | | $54,000/month | |
Total Monthly Cost
| Category | Cost | Annual |
|---|---|---|
| Storage | $6,265 | $75,180 |
| Compute | $54,000 | $648,000 |
| Network | $3,000 | $36,000 |
| Governance Tools | $5,000 | $60,000 |
| Total | $68,265/month | $819,180/year |
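A quick sanity check of the totals above:

```python
# Recompute the monthly and annual totals from the category table above.
monthly = {
    "storage": 6_265,
    "compute": 54_000,
    "network": 3_000,
    "governance_tools": 5_000,
}
total_monthly = sum(monthly.values())
total_annual = total_monthly * 12
print(total_monthly, total_annual)  # 68265 819180
```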
Failure Modes
Mode 1: Regulatory Report Failure
Mitigation:
- Automated data quality checks
- Multiple calculation methods
- Manual override procedures
- Regulatory exception process
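The "multiple calculation methods" mitigation means computing the same figure two independent ways and blocking the report if they diverge beyond a tolerance. A minimal sketch (the 0.1% tolerance is an illustrative assumption, not a regulatory value):

```python
# Reconcile two independently computed RWA figures before a report is released.
# The 0.1% relative tolerance is an illustrative assumption.

def reconcile(primary: float, secondary: float, rel_tolerance: float = 0.001) -> bool:
    """True when the two calculation methods agree within the relative tolerance."""
    if primary == secondary:
        return True
    return abs(primary - secondary) / max(abs(primary), abs(secondary)) <= rel_tolerance

print(reconcile(2_350_000.0, 2_350_900.0))  # within 0.1% -> True
print(reconcile(2_350_000.0, 2_400_000.0))  # ~2.1% gap  -> False, triggers manual review
```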
Mode 2: Data Breach
Mitigation:
- Encryption at rest and in transit
- Access logging and monitoring
- Regular security audits
- Incident response plan
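Audit logging is only useful at review time if entries are tamper-evident. One common pattern is hash chaining: each record commits to the hash of the previous one, so any retroactive edit breaks every later hash. A standard-library sketch (record fields are illustrative):

```python
import hashlib
import json

# Hash-chained audit log: each entry includes the previous entry's hash,
# so modifying or deleting a past record invalidates the rest of the chain.

def append_entry(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    log.append({"event": event, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"event": entry["event"], "prev": prev_hash}, sort_keys=True)
        if entry["prev"] != prev_hash or entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"user": "analyst1", "action": "read", "table": "basel_rwa"})
append_entry(log, {"user": "analyst2", "action": "export", "table": "ccar_results"})
print(verify(log))                      # True
log[0]["event"]["action"] = "delete"    # simulate tampering
print(verify(log))                      # False
```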
SLA/SLO Definitions
```yaml
slas:
  regulatory_reporting:
    timeliness:
      basel_reports: "Day 3 of month"
      ccar_submission: "Q2"
      stress_testing: "Q2"
    accuracy:
      reconciliation: "100%"
      data_quality: "> 99.9%"
    availability:
      reporting_platform: "99.9%"
  data_governance:
    lineage:
      coverage: "100% of regulatory data"
      freshness: "< 1 hour"
    access_control:
      audit_logging: "100%"
      encryption: "100%"
```

Key Takeaways
- Regulatory first: Compliance drives all architecture decisions
- Data lineage: Track all transformations for audits
- Data quality: Critical for accurate reporting
- Encryption: End-to-end encryption required
- Access controls: Role-based, row-level security
- Audit logging: Comprehensive access tracking
- Cost optimization: Spot instances, lifecycle policies
- Governance tools: Catalog, lineage, quality
Back to Module 8