Prerequisites & Baseline Knowledge

What You Should Know Before Advancing


Overview

This document outlines the baseline knowledge expected for Staff- and Principal-level roles. Use it to identify gaps in your knowledge and to prioritize what to study first.


Self-Assessment Guide

For each area, rate yourself:

  • Strong: Can teach this to others
  • Functional: Can work independently with occasional reference
  • Gap: Need significant study

Target: 80%+ across all areas before Staff/Principal interviews.


1. Core Data Engineering

Data Modeling (Prerequisite)

Skill | Description | Staff Level | Principal Level
Star Schema | Fact/dimensional modeling | Strong | Strong
Snowflake Schema | Normalized dimensional models | Strong | Strong
Normalization | 1NF-3NF, BCNF | Strong | Strong
Denormalization | Performance trade-offs | Functional | Strong
Slowly Changing Dimensions | SCD Types 1-4 | Strong | Strong
Data Vault | Hub/Link/Satellite modeling | Functional | Functional
Measure vs. Dimension | Fundamental distinction | Strong | Strong

Self-Assessment: ___ / 7 Strong

Gap Resources:


SQL Fundamentals (Prerequisite)

Skill | Description | Staff Level | Principal Level
Window Functions | ROW_NUMBER, RANK, LAG/LEAD | Strong | Strong
CTEs | Common Table Expressions, recursion | Strong | Strong
Joins | All types, performance implications | Strong | Strong
Aggregations | GROUP BY, HAVING, rollups | Strong | Strong
Subqueries | Correlated, nested | Strong | Strong
Query Plans | Explain, analyze, optimization | Functional | Strong
Set Operations | UNION, INTERSECT, EXCEPT | Strong | Strong

Self-Assessment: ___ / 7 Strong

Gap Resources:

  • LeetCode SQL (Medium/Hard)
  • “SQL Performance Explained” by Markus Winand
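
To calibrate what "Strong" might look like for the Window Functions and CTEs rows, here is a minimal sketch that picks each customer's latest order and the amount of the order before it. The `orders` table and its rows are hypothetical, and the example assumes Python's bundled SQLite is version 3.25 or later (the first release with window functions).

```python
import sqlite3

# In-memory database with a small, hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ana', '2024-01-01', 50.0),
        ('ana', '2024-01-03', 20.0),
        ('ben', '2024-01-02', 70.0),
        ('ben', '2024-01-05', 10.0),
        ('ben', '2024-01-09', 30.0);
""")

query = """
WITH ranked AS (
    SELECT
        customer,
        order_day,
        amount,
        -- rn = 1 marks the most recent order per customer
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_day DESC) AS rn,
        -- amount of the chronologically previous order, NULL for the first
        LAG(amount) OVER (PARTITION BY customer ORDER BY order_day) AS prev_amount
    FROM orders
)
SELECT customer, order_day, amount, prev_amount
FROM ranked
WHERE rn = 1
ORDER BY customer;
"""

latest_orders = conn.execute(query).fetchall()
for row in latest_orders:
    print(row)
```

If you can explain why the filter on `rn` has to live outside the CTE rather than in its WHERE clause, you are probably past "Functional" on these two rows.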

Data Formats (Prerequisite)

Format | Use Case | Staff Level | Principal Level
CSV | Simple exchange, debugging | Strong | Strong
JSON | Semi-structured, nested data | Strong | Strong
Parquet | Columnar analytics | Strong | Strong
ORC | Hive, Presto/Trino | Functional | Strong
Avro | Streaming, schema evolution | Functional | Strong
Protocol Buffers | High-performance RPC | Gap | Functional

Self-Assessment: ___ / 12 Strong points

Gap Resources:


2. Distributed Systems

CAP Theorem (Prerequisite)

Concept | Understanding Level
Consistency | Strong: Explain linearizability, eventual consistency
Availability | Strong: Explain high availability patterns
Partition Tolerance | Strong: Understand network failure modes
Trade-offs | Strong: Analyze CP vs AP systems
Real-world Examples | Strong: Can name systems in each category

Self-Assessment: ___ / 5 Strong

Gap Resources:

  • “Designing Data-Intensive Applications” by Martin Kleppmann (Ch. 5, Replication; Ch. 9, Consistency and Consensus)

Consistency Models (Prerequisite)

Model | Definition | Example | Staff Level
Strong Consistency | Linearizability | Single-region RDBMS | Strong
Eventual Consistency | Converges over time | DynamoDB, Cassandra | Strong
Causal Consistency | Causally related ops seen in the same order | MongoDB causally consistent sessions | Functional
Read Your Writes | Session consistency | Most web apps | Strong
Monotonic Reads | No time travel | Most web apps | Strong

Self-Assessment: ___ / 5 Strong

Gap Resources:


Scalability Patterns (Prerequisite)

Pattern | Description | Use Case | Staff Level
Horizontal Scaling | Add more nodes | Stateless services | Strong
Vertical Scaling | Bigger machines | Databases | Strong
Sharding | Data partitioning | High-write throughput | Functional
Caching | Avoid repeated work | Read-heavy workloads | Strong
Load Balancing | Distribute requests | All services | Strong
Replication | Copy data across nodes | HA, read scaling | Functional

Self-Assessment: ___ / 6 Strong

Gap Resources:


3. Data Pipeline Patterns

Batch Processing (Prerequisite)

Pattern | Description | Staff Level | Principal Level
Map-Reduce | Distributed processing | Functional | Strong
DAG Execution | Dependency graphs | Strong | Strong
Incremental Processing | Process only new data | Strong | Strong
Partitioning | Data splitting strategy | Strong | Strong
Pipeline Orchestration | Scheduling, dependencies | Strong | Strong
Error Handling | Retry, dead letter queues | Strong | Strong
Idempotency | Safe re-processing | Strong | Strong

Self-Assessment: ___ / 7 Strong
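
As a study aid for the Idempotency and Partitioning rows, the sketch below shows one common way to make a batch load safe to re-run: replace the whole target partition inside a single transaction. The `daily_sales` table, the `load_partition` helper, and the sample rows are hypothetical, and SQLite stands in for a real warehouse purely for illustration.

```python
import sqlite3


def load_partition(conn, day, rows):
    """Replace one day's partition so that reruns do not duplicate rows.

    `rows` is a list of (store, revenue) tuples for that day.
    """
    # The connection context manager commits on success and rolls back
    # on error, so the delete and the inserts apply together or not at all.
    with conn:
        conn.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_sales (day, store, revenue) VALUES (?, ?, ?)",
            [(day, store, revenue) for store, revenue in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, store TEXT, revenue REAL)")

batch = [("north", 120.0), ("south", 80.0)]
load_partition(conn, "2024-01-01", batch)
load_partition(conn, "2024-01-01", batch)  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(row_count)  # 2: the retry replaced the partition instead of appending
```

Delete-and-replace is only one option; a keyed upsert (MERGE) reaches the same end state and may suit tables without a natural partition column.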

Gap Resources:


Streaming Concepts (Prerequisite)

Concept | Description | Staff Level | Principal Level
Event Time vs Processing Time | Time semantics | Strong | Strong
Watermarks | Late data handling | Functional | Strong
Windowing | Tumbling, sliding, session | Functional | Strong
State Management | Keyed state, operators | Gap | Functional
Exactly-Once | Processing guarantees | Functional | Strong
Backpressure | Flow control | Functional | Functional
Checkpointing | Fault tolerance | Functional | Functional

Self-Assessment: ___ / 14 Strong points
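
To make the Event Time, Watermarks, and Windowing rows concrete, here is a deliberately simplified, single-process sketch of event-time tumbling windows. It is not how any particular streaming engine implements them: the watermark is assumed to be "largest event time seen minus a fixed allowed lateness", and the names `WINDOW_SECONDS`, `ALLOWED_LATENESS`, and `tumbling_counts` are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60    # tumbling window size (assumed for illustration)
ALLOWED_LATENESS = 30  # how far the watermark trails the max event time


def tumbling_counts(events):
    """Count events per (window_start, key) using event time.

    `events` is an iterable of (event_time_seconds, key) in arrival order.
    Returns (counts, dropped), where `dropped` holds events whose window
    had already closed when they arrived.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        # Assign the event to its window by event time, not arrival time.
        window_start = event_time - event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if window_end <= watermark:
            # The watermark has passed this window: treat the event as too late.
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped


# Arrival order differs from event-time order: 40 and 20 arrive late.
events = [(5, "a"), (50, "a"), (70, "a"), (40, "a"), (130, "a"), (20, "a")]
counts, dropped = tumbling_counts(events)
print(counts)   # {(0, 'a'): 3, (60, 'a'): 1, (120, 'a'): 1}
print(dropped)  # [(20, 'a')]
```

The event at time 40 arrives after the one at 70 but is still counted, because the watermark (40) has not yet passed the end of its window (60). The event at time 20 arrives once the watermark is at 100 and is dropped.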

Gap Resources:


4. Cloud Fundamentals

Cloud Concepts (Prerequisite)

Concept | AWS | GCP | Azure
Object Storage | S3 | GCS | Blob Storage
Compute | EC2 | Compute Engine | Virtual Machines
Managed Streaming | MSK | Pub/Sub | Event Hubs
Warehouse | Redshift | BigQuery | Synapse
Orchestration | MWAA | Cloud Composer | Data Factory
IAM | IAM | IAM | Azure RBAC

Expected: Strong in one cloud, functional in others.

Self-Assessment: Primary cloud: ___ / 6 Strong | Secondary: ___ / 6 Functional

Gap Resources:


Cost Fundamentals (Prerequisite)

Concept | Understanding Required
Storage Tiers | Hot, warm, cold pricing
Compute Models | On-demand, reserved, spot
Data Transfer | Ingress/egress costs
Economics | OpEx vs CapEx
Unit Economics | Cost per TB, cost per query

Self-Assessment: ___ / 5 Strong

Gap Resources:


5. Programming & Tools

Programming Languages (Prerequisite)

Language | Staff Level | Principal Level | Use Case
Python | Strong | Strong | ETL, orchestration, ML
SQL | Strong | Strong | Queries, transformation
Scala | Functional | Gap | Spark optimization
Java | Functional | Gap | Legacy systems

Self-Assessment: ___ / 8 Strong points

Minimum: Strong Python + Strong SQL


Command Line & Linux (Prerequisite)

Skill | Staff Level | Principal Level
Bash Scripting | Strong | Functional
File Permissions | Strong | Functional
Process Management | Strong | Functional
Networking | Functional | Functional
SSH/SCP | Strong | Strong
Text Processing | grep, sed, awk | Functional

Self-Assessment: ___ / 12 Strong points


Git & Version Control (Prerequisite)

Skill | Staff Level | Principal Level
Branching Strategies | Strong | Strong
Merge vs Rebase | Strong | Functional
Pull Requests | Strong | Strong
Conflict Resolution | Strong | Functional
Git Best Practices | Strong | Strong

Self-Assessment: ___ / 5 Strong


Containerization (Functional for Staff)

Skill | Staff Level | Principal Level
Docker Basics | Strong | Strong
Dockerfile Writing | Strong | Functional
Docker Compose | Strong | Functional
Kubernetes Concepts | Functional | Functional

Self-Assessment: ___ / 8 Strong points

Gap Resources:


6. Data Quality & Testing

Testing Concepts (Prerequisite)

Concept | Description | Staff Level
Unit Testing | Function-level tests | Strong
Integration Testing | End-to-end pipeline tests | Strong
Data Quality Tests | Schema, statistical tests | Strong
Golden Tables | Expected output validation | Functional
Data Profiling | Statistical analysis | Functional
Anomaly Detection | Outlier identification | Functional

Self-Assessment: ___ / 6 Strong

Gap Resources:


7. Security & Compliance

Security Fundamentals (Functional for Staff)

Concept | Description | Staff Level
Encryption | At-rest, in-transit | Strong
IAM Roles | Least privilege access | Strong
PII Handling | Data masking, tokenization | Strong
Network Security | VPC, security groups | Functional
Audit Logging | Access logging | Functional
Compliance | GDPR, CCPA, HIPAA | Functional

Self-Assessment: ___ / 6 Strong


8. Soft Skills (Prerequisite)

Skill | Staff Level | Principal Level
Technical Writing | Strong | Strong
Presentation Skills | Functional | Strong
Code Reviews | Strong | Strong
Mentoring | Functional | Strong
Stakeholder Management | Functional | Strong
Conflict Resolution | Functional | Functional

Self-Assessment: ___ / 12 Strong points


Prerequisite Gap Analysis

Critical Gaps (Blockers)

Must address before Staff/Principal pursuit:

  • Any “Gap” in Core Data Engineering
  • Any “Gap” in Distributed Systems
  • Any “Gap” in Cloud Fundamentals

Important Gaps (Should Address)

Address before interviews:

  • “Gap” in Data Pipeline Patterns
  • “Gap” in Data Quality & Testing
  • “Gap” in Cost Fundamentals

Nice-to-Have Gaps

Can learn on the job:

  • Programming language diversity (Scala/Java)
  • Deep Kubernetes knowledge
  • Advanced security concepts

Learning Priority Matrix


If You Have Gaps:

  1. Foundations First (2-4 weeks)

    • SQL fundamentals
    • Data modeling (Kimball)
    • Cloud platform basics
  2. Distributed Systems (2-4 weeks)

    • “Designing Data-Intensive Applications” (Kleppmann)
    • CAP theorem, consistency models
    • Scalability patterns
  3. Data Pipeline Patterns (2-4 weeks)

    • Batch processing patterns
    • Streaming fundamentals
    • Orchestration concepts
  4. Specialized Knowledge (4-8 weeks)

    • Lakehouse architecture (Module 1)
    • Performance optimization (Module 7)
    • System design practice (Module 9)

Staff/Principal Readiness Score

Scoring Guide

Score | Level | Readiness
90-100% | Principal | Ready for Principal roles
75-89% | Senior Staff | Ready for Senior Staff roles
60-74% | Staff | Ready for Staff roles
Below 60% | Senior | Focus on gap areas

Calculate Your Score

(Strong Count × 1.0) + (Functional Count × 0.5) + (Gap Count × 0.0)
───────────────────────────────────────────────────────────── × 100
Total Items

Your Score: ___ %

Next Steps:

  • 90%+: Proceed to Module 1, focus on architecture depth
  • 75-89%: Address key gaps while studying Module 1
  • 60-74%: Spend 4-8 weeks on gap areas first
  • Below 60%: 12+ weeks of foundation study needed
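
If you would rather not do the arithmetic by hand, the scoring formula and the bands in the Scoring Guide can be sketched as below. The function names are hypothetical, and treating each band as a lower bound (for example, 89.5% still counts as Senior Staff) is an assumption, since the guide only lists whole-number ranges.

```python
def readiness_score(strong: int, functional: int, gap: int) -> float:
    """Return the readiness score as a percentage between 0 and 100."""
    total = strong + functional + gap
    if total == 0:
        raise ValueError("rate at least one item before scoring")
    # Weights from the formula: Strong = 1.0, Functional = 0.5, Gap = 0.0.
    return 100.0 * (strong * 1.0 + functional * 0.5 + gap * 0.0) / total


def readiness_level(score: float) -> str:
    """Map a percentage score to the level named in the Scoring Guide."""
    if score >= 90:
        return "Principal"
    if score >= 75:
        return "Senior Staff"
    if score >= 60:
        return "Staff"
    return "Senior"


# Hypothetical tally: 60 Strong, 30 Functional, 10 Gap out of 100 items.
score = readiness_score(strong=60, functional=30, gap=10)
print(f"{score:.1f}% -> {readiness_level(score)}")  # 75.0% -> Senior Staff
```

Count each rated cell once, using the same Strong/Functional/Gap ratings you recorded in the self-assessment sections.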

Module Prerequisites

Each module assumes certain baseline knowledge:

Module | Assumes Strong | Can Learn Alongside
1. Modern Architecture | Data formats, SQL | Lakehouse, OTF
2. Computing & Processing | Python, batch patterns | Streaming, Rust
3. Cloud Infrastructure | One cloud platform | Multi-cloud, IaC
4. Data Modeling | SQL, basic modeling | Advanced patterns
5. AI/ML & Vectors | Python basics | ML concepts, vectors
6. CI/CD for Data | Git, testing concepts | Data-specific patterns
7. Performance & Cost | Basic cloud knowledge | Advanced optimization
8. Case Studies | All above modules | Architecture patterns
9. Career Strategy | All technical modules | Interview techniques

Final Checklist

Before claiming Staff/Principal readiness:

Technical

  • Can design an end-to-end data architecture
  • Can explain trade-offs between major technologies
  • Can identify and fix performance bottlenecks
  • Can estimate costs for proposed architectures
  • Can design for TB/PB scale
  • Can design for 99.9%+ availability

Leadership

  • Have led cross-team initiatives
  • Have mentored others to promotion
  • Have written RFCs or architecture docs
  • Have presented to technical leadership
  • Have influenced technical direction

Strategic

  • Can connect technical work to business outcomes
  • Can make build vs. buy decisions
  • Can identify and propose strategic initiatives
  • Can communicate with executives

If you checked fewer than 10 of the 15 items: Focus on the gaps before pursuing the next level.


This prerequisites document is your starting point. Be honest about your gaps. Strong foundations accelerate all future learning.