Prerequisites & Baseline Knowledge
What You Should Know Before Advancing
Overview
This document outlines the expected baseline knowledge for Staff- and Principal-level roles. Use it to identify gaps in your knowledge and prioritize learning areas.
Self-Assessment Guide
For each area, rate yourself:
- Strong: Can teach this to others
- Functional: Can work independently with occasional reference
- Gap: Need significant study
Target: 80%+ across all areas before Staff/Principal interviews.
1. Core Data Engineering
Data Modeling (Prerequisite)
| Skill | Description | Staff Level | Principal Level |
|---|---|---|---|
| Star Schema | Fact/dimensional modeling | Strong | Strong |
| Snowflake Schema | Normalized dimensional models | Strong | Strong |
| Normalization | 1NF-3NF, BCNF | Strong | Strong |
| Denormalization | Performance trade-offs | Functional | Strong |
| Slowly Changing Dimensions | SCD Types 1-4 | Strong | Strong |
| Data Vault | Hub/Link/Satellite modeling | Functional | Functional |
| Measure vs. Dimension | Fundamental distinction | Strong | Strong |
Self-Assessment: ___ / 7 Strong
Gap Resources:
- Module 4: Data Modeling
- “The Data Warehouse Toolkit” by Ralph Kimball
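As a quick refresher on the Slowly Changing Dimensions row in the table above, here is a minimal sketch of a Type 2 update, assuming a single tracked attribute. The `CustomerRow` type, the `apply_scd2` helper, and the column names are hypothetical and chosen only for illustration; a real warehouse would typically do this with a `MERGE` statement.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(history: list[CustomerRow], customer_id: int,
               new_city: str, as_of: date) -> list[CustomerRow]:
    """Close the current row and append a new version when the attribute changes."""
    result = []
    changed = True
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                changed = False  # nothing changed: keep the current row open
                result.append(row)
            else:
                result.append(replace(row, valid_to=as_of))  # expire old version
        else:
            result.append(row)
    if changed:
        result.append(CustomerRow(customer_id, new_city, as_of, None))
    return result


history = [CustomerRow(1, "Berlin", date(2023, 1, 1), None)]
history = apply_scd2(history, 1, "Munich", date(2024, 6, 1))
```

After the call, `history` holds the expired Berlin row and an open Munich row, which is what distinguishes Type 2 (keep history) from Type 1 (overwrite in place).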
SQL Fundamentals (Prerequisite)
| Skill | Description | Staff Level | Principal Level |
|---|---|---|---|
| Window Functions | ROW_NUMBER, RANK, LAG/LEAD | Strong | Strong |
| CTEs | Common Table Expressions, recursion | Strong | Strong |
| Joins | All types, performance implications | Strong | Strong |
| Aggregations | GROUP BY, HAVING, rollups | Strong | Strong |
| Subqueries | Correlated, nested | Strong | Strong |
| Query Plans | Explain, analyze, optimization | Functional | Strong |
| Set Operations | UNION, INTERSECT, EXCEPT | Strong | Strong |
Self-Assessment: ___ / 7 Strong
Gap Resources:
- LeetCode SQL (Medium/Hard)
- “SQL Performance Explained” by Markus Winand
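To check yourself on the Window Functions and CTEs rows, the sketch below combines a CTE with `ROW_NUMBER` and `LAG` to find each customer's latest order and the amount of the order before it. It uses Python's built-in `sqlite3` module and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_day TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', '2024-01-01', 10.0),
        ('alice', '2024-01-03', 25.0),
        ('bob',   '2024-01-02', 40.0),
        ('bob',   '2024-01-05', 15.0);
""")

query = """
WITH ranked AS (
    SELECT
        customer,
        order_day,
        amount,
        -- rn = 1 is the most recent order per customer
        ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_day DESC) AS rn,
        -- amount of the previous order in chronological order
        LAG(amount) OVER (PARTITION BY customer ORDER BY order_day) AS prev_amount
    FROM orders
)
SELECT customer, order_day, amount, prev_amount
FROM ranked
WHERE rn = 1
ORDER BY customer;
"""
latest_orders = conn.execute(query).fetchall()
print(latest_orders)
```

If you can explain why the two window clauses use opposite sort directions and still refer to the same row, you are likely at least Functional on this row.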
Data Formats (Prerequisite)
| Format | Use Case | Staff Level | Principal Level |
|---|---|---|---|
| CSV | Simple exchange, debugging | Strong | Strong |
| JSON | Semi-structured, nested data | Strong | Strong |
| Parquet | Columnar analytics | Strong | Strong |
| ORC | Hive, Presto/Trino | Functional | Strong |
| Avro | Streaming, schema evolution | Functional | Strong |
| Protocol Buffers | High-performance RPC | Gap | Functional |
Self-Assessment: ___ / 12 Strong points
Gap Resources:
2. Distributed Systems
CAP Theorem (Prerequisite)
| Concept | Understanding Level |
|---|---|
| Consistency | Strong: Explain linearizability, eventual consistency |
| Availability | Strong: Explain high availability patterns |
| Partition Tolerance | Strong: Understand network failure modes |
| Trade-offs | Strong: Analyze CP vs AP systems |
| Real-world Examples | Strong: Can name systems in each category |
Self-Assessment: ___ / 5 Strong
Gap Resources:
- “Designing Data-Intensive Applications” by Martin Kleppmann (Ch. 5 on replication, Ch. 9 on linearizability and the CAP theorem)
Consistency Models (Prerequisite)
| Model | Definition | Example | Staff Level |
|---|---|---|---|
| Strong Consistency | Linearizability | Single-region RDBMS | Strong |
| Eventual Consistency | Converges over time | DynamoDB, Cassandra | Strong |
| Causal Consistency | Causally related operations are seen in the same order everywhere | MongoDB causally consistent sessions | Functional |
| Read Your Writes | Session consistency | Most web apps | Strong |
| Monotonic Reads | A client never sees older data after it has seen newer data | Most web apps | Strong |
Self-Assessment: ___ / 5 Strong
Gap Resources:
- Kleppmann, Ch. 5 and 9
- Jepsen.io Consistency Models
Scalability Patterns (Prerequisite)
| Pattern | Description | Use Case | Staff Level |
|---|---|---|---|
| Horizontal Scaling | Add more nodes | Stateless services | Strong |
| Vertical Scaling | Bigger machines | Databases | Strong |
| Sharding | Data partitioning | High-write throughput | Functional |
| Caching | Avoid repeated work | Read-heavy workloads | Strong |
| Load Balancing | Distribute requests | All services | Strong |
| Replication | Copy data across nodes | HA, read scaling | Functional |
Self-Assessment: ___ / 6 Strong
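For the Sharding row, a minimal sketch of hash-based routing may help, assuming string keys and a fixed shard count. The `shard_for` helper is hypothetical. It uses `hashlib` rather than the built-in `hash()` because Python salts string hashes per process, which would route the same key differently after a restart.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Distribute 1,000 hypothetical user keys across 4 shards.
keys = [f"user-{i}" for i in range(1000)]
counts = [0] * 4
for k in keys:
    counts[shard_for(k, 4)] += 1
print(counts)
```

Plain modulo routing remaps most keys when `num_shards` changes, which is the usual motivation for consistent hashing or a directory-based scheme.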
Gap Resources:
3. Data Pipeline Patterns
Batch Processing (Prerequisite)
| Pattern | Description | Staff Level | Principal Level |
|---|---|---|---|
| Map-Reduce | Distributed processing | Functional | Strong |
| DAG Execution | Dependency graphs | Strong | Strong |
| Incremental Processing | Process only new data | Strong | Strong |
| Partitioning | Data splitting strategy | Strong | Strong |
| Pipeline Orchestration | Scheduling, dependencies | Strong | Strong |
| Error Handling | Retry, dead letter queues | Strong | Strong |
| Idempotency | Safe re-processing | Strong | Strong |
Self-Assessment: ___ / 7 Strong
Gap Resources:
- Module 2: Batch Processing
- “Designing Data-Intensive Applications”, Ch. 10-11
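The Incremental Processing and Idempotency rows can be illustrated together. The sketch below assumes date-keyed partitions held in plain dictionaries; `load_partition` and `run_incremental` are hypothetical helpers, not part of any orchestration framework. The idea is that a load overwrites its whole partition, so a retry replaces data instead of appending duplicates.

```python
def load_partition(target: dict[str, list[dict]], partition_key: str,
                   rows: list[dict]) -> None:
    """Overwrite the whole partition so re-running the load is safe."""
    target[partition_key] = list(rows)


def run_incremental(source: dict[str, list[dict]], target: dict[str, list[dict]],
                    processed: set[str]) -> list[str]:
    """Process only partitions not yet marked done; return those handled now."""
    handled = []
    for partition_key in sorted(source):
        if partition_key in processed:
            continue  # already loaded in an earlier run
        load_partition(target, partition_key, source[partition_key])
        processed.add(partition_key)
        handled.append(partition_key)
    return handled


source = {"2024-01-01": [{"id": 1}], "2024-01-02": [{"id": 2}]}
target: dict[str, list[dict]] = {}
processed: set[str] = set()

first_run = run_incremental(source, target, processed)   # loads both partitions
second_run = run_incremental(source, target, processed)  # nothing new to do
```

Re-running `load_partition` for an already loaded day leaves `target` unchanged, which is the property the Idempotency row asks you to be able to design for.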
Streaming Concepts (Prerequisite)
| Concept | Description | Staff Level | Principal Level |
|---|---|---|---|
| Event Time vs Processing Time | Time semantics | Strong | Strong |
| Watermarks | Late data handling | Functional | Strong |
| Windowing | Tumbling, sliding, session | Functional | Strong |
| State Management | Keyed state, operators | Gap | Functional |
| Exactly-Once | Processing guarantees | Functional | Strong |
| Backpressure | Flow control | Functional | Functional |
| Checkpointing | Fault tolerance | Functional | Functional |
Self-Assessment: ___ / 14 Strong points
Gap Resources:
- Module 2: Streaming
- “Stream Processing with Apache Flink”
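To make the Event Time, Watermarks, and Windowing rows concrete, here is a deliberately simplified sketch of tumbling windows with a bounded-lateness watermark. It is plain Python, not the Flink API: the `tumbling_counts` function, the integer timestamps, and the rule "watermark = highest event time seen minus allowed lateness" are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_counts(events, window_size, allowed_lateness):
    """Count events per event-time tumbling window.

    events: iterable of (event_time, key) in arrival (processing-time) order.
    An event is dropped if its window ended at or before the current watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            dropped.append((event_time, key))  # window already closed
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped


# Arrival order differs from event-time order: 3 and 11 arrive late.
events = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (25, "a"), (11, "a")]
counts, dropped = tumbling_counts(events, window_size=10, allowed_lateness=5)
```

In this run the event at time 3 arrives late but is still counted, because the watermark is only at 7 when it arrives. The event at time 11 is dropped, because by then the watermark has reached 20 and the window [10, 20) is closed.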
4. Cloud Fundamentals
Cloud Concepts (Prerequisite)
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Object Storage | S3 | GCS | Blob Storage |
| Compute | EC2 | Compute Engine | VMs |
| Managed Streaming / Messaging | MSK | Pub/Sub | Event Hubs |
| Warehouse | Redshift | BigQuery | Synapse |
| Orchestration | MWAA | Composer | Data Factory |
| IAM | IAM | IAM | RBAC |
Expected: Strong in one cloud, functional in others.
Self-Assessment: Primary cloud: ___ / 6 Strong | Secondary: ___ / 6 Functional
Gap Resources:
Cost Fundamentals (Prerequisite)
| Concept | Understanding Required |
|---|---|
| Storage Tiers | Hot, warm, cold pricing |
| Compute Models | On-demand, reserved, spot |
| Data Transfer | Ingress/egress costs |
| Economics | OpEx vs CapEx |
| Unit Economics | Cost per TB, cost per query |
Self-Assessment: ___ / 5 Strong
Gap Resources:
- Module 7: FinOps
- Cloud pricing calculators (AWS/GCP/Azure)
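The Storage Tiers and Unit Economics rows come down to simple arithmetic. The sketch below shows the shape of the calculation only: every price and volume in it is a made-up placeholder, not a real list price, so substitute figures from your provider's pricing calculator.

```python
def monthly_storage_cost(tb_by_tier: dict[str, float],
                         price_per_tb: dict[str, float]) -> float:
    """Sum of (TB stored in a tier) x (monthly price per TB for that tier)."""
    return sum(tb * price_per_tb[tier] for tier, tb in tb_by_tier.items())


def cost_per_query(monthly_compute_cost: float, queries_per_month: int) -> float:
    """Average compute spend attributed to one query."""
    return monthly_compute_cost / queries_per_month


# Placeholder prices in USD per TB-month (hypothetical, for illustration only).
price_per_tb = {"hot": 23.0, "warm": 12.5, "cold": 4.0}
tb_by_tier = {"hot": 10.0, "warm": 40.0, "cold": 150.0}

storage = monthly_storage_cost(tb_by_tier, price_per_tb)
blended_per_tb = storage / sum(tb_by_tier.values())
per_query = cost_per_query(monthly_compute_cost=4500.0, queries_per_month=90_000)
print(f"storage=${storage:.2f}/month, blended=${blended_per_tb:.2f}/TB, query=${per_query:.3f}")
```

With these placeholder inputs, storage comes to 230 + 500 + 600 = 1,330 per month, a blended 6.65 per TB, and 0.05 per query.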
5. Programming & Tools
Programming Languages (Prerequisite)
| Language | Staff Level | Principal Level | Use Case |
|---|---|---|---|
| Python | Strong | Strong | ETL, orchestration, ML |
| SQL | Strong | Strong | Queries, transformation |
| Scala | Functional | Gap | Spark optimization |
| Java | Functional | Gap | Legacy systems |
Self-Assessment: ___ / 8 Strong points
Minimum: Strong Python + Strong SQL
Command Line & Linux (Prerequisite)
| Skill | Staff Level | Principal Level |
|---|---|---|
| Bash Scripting | Strong | Functional |
| File Permissions | Strong | Functional |
| Process Management | Strong | Functional |
| Networking | Functional | Functional |
| SSH/SCP | Strong | Strong |
| Text Processing (grep, sed, awk) | Functional | Functional |
Self-Assessment: ___ / 12 Strong points
Git & Version Control (Prerequisite)
| Skill | Staff Level | Principal Level |
|---|---|---|
| Branching Strategies | Strong | Strong |
| Merge vs Rebase | Strong | Functional |
| Pull Requests | Strong | Strong |
| Conflict Resolution | Strong | Functional |
| Git Best Practices | Strong | Strong |
Self-Assessment: ___ / 5 Strong
Containerization (Functional for Staff)
| Skill | Staff Level | Principal Level |
|---|---|---|
| Docker Basics | Strong | Strong |
| Dockerfile Writing | Strong | Functional |
| Docker Compose | Strong | Functional |
| Kubernetes Concepts | Functional | Functional |
Self-Assessment: ___ / 8 Strong points
Gap Resources:
6. Data Quality & Testing
Testing Concepts (Prerequisite)
| Concept | Description | Staff Level |
|---|---|---|
| Unit Testing | Function-level tests | Strong |
| Integration Testing | End-to-end pipeline tests | Strong |
| Data Quality Tests | Schema, statistical tests | Strong |
| Golden Tables | Expected output validation | Functional |
| Data Profiling | Statistical analysis | Functional |
| Anomaly Detection | Outlier identification | Functional |
Self-Assessment: ___ / 6 Strong
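For the Data Quality Tests row, a minimal sketch of a schema check and a null-rate check is shown below, assuming rows arrive as dictionaries. `check_schema` and `null_rate` are hypothetical helpers; tools such as Great Expectations or dbt tests cover the same ground with far more features.

```python
def check_schema(rows: list[dict], expected: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        missing = expected.keys() - row.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        for col, typ in expected.items():
            value = row.get(col)
            if value is not None and not isinstance(value, typ):
                problems.append(
                    f"row {i}: {col} expected {typ.__name__}, got {type(value).__name__}"
                )
    return problems


def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


rows = [{"id": 1, "amount": 9.5}, {"id": "2", "amount": None}, {"id": 3}]
problems = check_schema(rows, {"id": int, "amount": float})
amount_null_rate = null_rate(rows, "amount")
```

Here the second row fails the type check on `id`, the third row is missing `amount`, and two of the three rows count as null for `amount`.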
Gap Resources:
7. Security & Compliance
Security Fundamentals (Functional for Staff)
| Concept | Description | Staff Level |
|---|---|---|
| Encryption | At-rest, in-transit | Strong |
| IAM Roles | Least privilege access | Strong |
| PII Handling | Data masking, tokenization | Strong |
| Network Security | VPC, security groups | Functional |
| Audit Logging | Access logging | Functional |
| Compliance | GDPR, CCPA, HIPAA | Functional |
Self-Assessment: ___ / 6 Strong
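The PII Handling row mentions masking and tokenization. The sketch below shows one simple form of each, using only the standard library. Note the assumption: the `tokenize` helper is a keyed hash (pseudonymization), whereas production tokenization often means a vault-backed, reversible mapping. Both function names and the hard-coded key are illustrative; a real key belongs in a secrets manager.

```python
import hashlib
import hmac


def tokenize(value: str, secret: bytes) -> str:
    """Deterministic keyed token: same input and key give the same token,
    so joins on the tokenized column still work."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"
    return f"{local[0]}***@{domain}"


secret = b"example-key-do-not-use"  # hypothetical key for illustration only
token = tokenize("jane.doe@example.com", secret)
masked = mask_email("jane.doe@example.com")
```

`masked` is `j***@example.com`, and `token` is a 64-character hex string that changes if either the input or the key changes.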
8. Soft Skills (Prerequisite)
| Skill | Staff Level | Principal Level |
|---|---|---|
| Technical Writing | Strong | Strong |
| Presentation Skills | Functional | Strong |
| Code Reviews | Strong | Strong |
| Mentoring | Functional | Strong |
| Stakeholder Management | Functional | Strong |
| Conflict Resolution | Functional | Functional |
Self-Assessment: ___ / 12 Strong points
Prerequisite Gap Analysis
Critical Gaps (Blockers)
Address these before pursuing Staff/Principal roles:
- Any “Gap” in Core Data Engineering
- Any “Gap” in Distributed Systems
- Any “Gap” in Cloud Fundamentals
Important Gaps (Should Address)
Address before interviews:
- “Gap” in Data Pipeline Patterns
- “Gap” in Data Quality & Testing
- “Gap” in Cost Fundamentals
Nice-to-Have Gaps
Can learn on the job:
- Programming language diversity (Scala/Java)
- Deep Kubernetes knowledge
- Advanced security concepts
Learning Priority Matrix
Recommended Study Order
If You Have Gaps:
1. Foundations First (2-4 weeks)
   - SQL fundamentals
   - Data modeling (Kimball)
   - Cloud platform basics
2. Distributed Systems (2-4 weeks)
   - “Designing Data-Intensive Applications” (Kleppmann)
   - CAP theorem, consistency models
   - Scalability patterns
3. Data Pipeline Patterns (2-4 weeks)
   - Batch processing patterns
   - Streaming fundamentals
   - Orchestration concepts
4. Specialized Knowledge (4-8 weeks)
   - Lakehouse architecture (Module 1)
   - Performance optimization (Module 7)
   - System design practice (Module 9)
Staff/Principal Readiness Score
Scoring Guide
| Score | Level | Readiness |
|---|---|---|
| 90-100% | Principal | Ready for Principal roles |
| 75-89% | Senior Staff | Ready for Senior Staff roles |
| 60-74% | Staff | Ready for Staff roles |
| Below 60% | Senior | Focus on gap areas |
Calculate Your Score
Score (%) = [(Strong Count × 1.0) + (Functional Count × 0.5) + (Gap Count × 0.0)] ÷ Total Items × 100
Your Score: ___ %
Next Steps:
- 90%+: Proceed to Module 1, focus on architecture depth
- 75-89%: Address key gaps while studying Module 1
- 60-74%: Spend 4-8 weeks on gap areas first
- Below 60%: 12+ weeks of foundation study needed
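The score calculation and the bands above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that a fractional score such as 89.5% falls into the lower band, which the Scoring Guide does not specify.

```python
def readiness_score(strong: int, functional: int, gap: int) -> float:
    """Weighted readiness percentage: Strong = 1.0, Functional = 0.5, Gap = 0.0."""
    total = strong + functional + gap
    if total == 0:
        raise ValueError("rate at least one item")
    return 100.0 * (strong * 1.0 + functional * 0.5 + gap * 0.0) / total


def readiness_level(score: float) -> str:
    """Map a percentage onto the bands in the Scoring Guide table."""
    if score >= 90:
        return "Principal"
    if score >= 75:
        return "Senior Staff"
    if score >= 60:
        return "Staff"
    return "Senior"


# Hypothetical tally: 60 Strong, 30 Functional, 10 Gap out of 100 items.
score = readiness_score(strong=60, functional=30, gap=10)
print(f"{score:.1f}% -> {readiness_level(score)}")
```

For this tally the score is (60 + 15) / 100 = 75%, which lands at the bottom of the Senior Staff band.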
Module Prerequisites
Each module assumes certain baseline knowledge:
| Module | Assumes Strong | Can Learn Alongside |
|---|---|---|
| 1. Modern Architecture | Data formats, SQL | Lakehouse, open table formats (OTF) |
| 2. Computing & Processing | Python, batch patterns | Streaming, Rust |
| 3. Cloud Infrastructure | One cloud platform | Multi-cloud, IaC |
| 4. Data Modeling | SQL, basic modeling | Advanced patterns |
| 5. AI/ML & Vectors | Python basics | ML concepts, vectors |
| 6. CI/CD for Data | Git, testing concepts | Data-specific patterns |
| 7. Performance & Cost | Basic cloud knowledge | Advanced optimization |
| 8. Case Studies | All above modules | Architecture patterns |
| 9. Career Strategy | All technical modules | Interview techniques |
Final Checklist
Before claiming Staff/Principal readiness:
Technical
- Can design an end-to-end data architecture
- Can explain trade-offs between major technologies
- Can identify and fix performance bottlenecks
- Can estimate costs for proposed architectures
- Can design for TB/PB scale
- Can design for 99.9%+ availability
Leadership
- Have led cross-team initiatives
- Have mentored others to promotion
- Have written RFCs or architecture docs
- Have presented to technical leadership
- Have influenced technical direction
Strategic
- Can connect technical work to business outcomes
- Can make build vs. buy decisions
- Can identify and propose strategic initiatives
- Can communicate with executives
If you checked fewer than 10 of the 15 items: Focus on the gaps before pursuing the next level.
This prerequisites document is your starting point. Be honest about your gaps. Strong foundations accelerate all future learning.