Lakehouse Architecture Deep Dive: Iceberg, Delta Lake, and Hudi Compared
The data lakehouse is built on a deceptively simple idea: add a metadata and transaction layer on top of cheap object storage, and you get warehouse-grade reliability at lake-scale costs. The table format is that layer — and three formats dominate the market: Apache Iceberg, Delta Lake, and Apache Hudi.
All three solve the same core problem. The differences in how they solve it have real consequences for query performance, streaming support, tooling compatibility, and operational complexity. This post cuts through the marketing to give you a technical basis for choosing.
What a Table Format Actually Does
Without a table format, a collection of Parquet files in S3 is just files. There's no concept of a table, no transaction guarantees, no schema history, and no efficient way to find which files contain which data.
A table format adds:
- A metadata layer — a manifest of which files make up the current table state
- ACID transactions — atomic writes so readers never see partial updates
- Schema evolution — add, rename, or reorder columns without rewriting data
- Partition evolution — change how data is partitioned without full rewrites
- Time travel — query the table as it existed at any past snapshot
- File-level statistics — min/max values per file enable query engines to skip irrelevant files
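The file-skipping idea behind that last point can be shown in a few lines. This is a hypothetical sketch, not any format's real API: the `FileStats` record and `prune` function are invented for illustration, standing in for the min/max statistics that the metadata layer stores per data file.

```python
# Hypothetical sketch: how file-level min/max statistics let an engine
# skip files without reading them. FileStats is invented for illustration;
# real table formats store these stats in manifest/log metadata.
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_ts: int  # min value of the "ts" column in this file
    max_ts: int  # max value of the "ts" column in this file

def prune(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [f.path for f in files if f.max_ts >= lo and f.min_ts <= hi]

files = [
    FileStats("part-000.parquet", 100, 199),
    FileStats("part-001.parquet", 200, 299),
    FileStats("part-002.parquet", 300, 399),
]

# A query for ts BETWEEN 250 AND 320 touches only two of the three files.
print(prune(files, 250, 320))  # ['part-001.parquet', 'part-002.parquet']
```

The engine never opens `part-000.parquet` at all, which is where most of the scan savings come from on large tables.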
Apache Iceberg
Iceberg was developed at Netflix and donated to the Apache Software Foundation. It's the youngest of the three but has gained the most momentum in the last two years, particularly in the AI/ML space.
Architecture:
Catalog (Glue / Nessie / REST)
│
▼
Metadata file (JSON) ← current table state pointer
│
▼
Manifest list ← list of manifest files for this snapshot
│
▼
Manifest files ← list of data files + statistics per file
│
▼
Data files (Parquet/ORC/Avro)
The multi-level metadata hierarchy is Iceberg's key design choice. Manifest files contain file-level statistics (min/max per column), enabling aggressive pruning without a full table scan.
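To make the hierarchy concrete, here is a toy model of that resolution path using plain dicts. The file names and structure are simplified assumptions; real Iceberg metadata lives as JSON and Avro files in object storage, with the catalog holding only the pointer to the current metadata file.

```python
# Illustrative model of Iceberg's metadata hierarchy. Names are invented;
# real metadata is JSON/Avro in object storage, resolved via the catalog.
catalog = {"db.events": "metadata-00002.json"}  # catalog -> current metadata pointer

metadata_files = {
    "metadata-00002.json": {"current_snapshot": "snap-2", "manifest_list": "ml-2.avro"}
}
manifest_lists = {"ml-2.avro": ["manifest-a.avro", "manifest-b.avro"]}
manifests = {
    "manifest-a.avro": [{"path": "data-1.parquet"}, {"path": "data-2.parquet"}],
    "manifest-b.avro": [{"path": "data-3.parquet"}],
}

def resolve_data_files(table_name):
    """Walk catalog -> metadata file -> manifest list -> manifests -> data files."""
    meta = metadata_files[catalog[table_name]]
    paths = []
    for manifest in manifest_lists[meta["manifest_list"]]:
        paths += [entry["path"] for entry in manifests[manifest]]
    return paths

print(resolve_data_files("db.events"))  # ['data-1.parquet', 'data-2.parquet', 'data-3.parquet']
```

Note that a commit only has to swap the catalog's pointer to a new metadata file, which is what makes snapshots atomic and time travel cheap.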
Strengths:
- Partition evolution — change partition strategy (e.g., daily → hourly) without rewriting data. No other format matches this.
- Hidden partitioning — queries don't need to know about partitions; the engine handles it automatically
- Multi-engine support — Spark, Flink, Trino, Presto, DuckDB, StarRocks, Snowflake (external tables), BigQuery — widest engine compatibility
- Row-level deletes — efficient delete files rather than full partition rewrites (critical for GDPR compliance)
- Branching and tagging — WAP (Write-Audit-Publish) pattern for safe data quality workflows
Weaknesses:
- More complex catalog setup than Delta Lake
- Smaller default ecosystem around streaming upserts vs. Hudi
Best for: Multi-engine environments, AI/ML platforms, organizations prioritizing open standards and avoiding vendor lock-in.
Delta Lake
Delta Lake was developed by Databricks and open-sourced in 2019. It's the most widely adopted format in Spark-heavy environments and has the richest feature set within the Databricks platform.
Architecture:
_delta_log/ directory
│
├── 00000000000000000000.json ← transaction log entries (JSON)
├── 00000000000000000001.json
├── ...
└── 00000000000000000010.checkpoint.parquet ← periodic checkpoint
Delta stores all transaction history as JSON log entries in a _delta_log directory alongside the data files. Checkpoints compact the log periodically for performance.
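The log-replay mechanic can be sketched in miniature. This is a simplified model: real Delta log entries carry several action types plus statistics, but the core idea of replaying add/remove actions to recover the live file set is the same.

```python
# Minimal sketch of transaction-log replay: each JSON entry records an
# "add" or "remove" file action; replaying them in order yields the
# current set of live data files. Entry shape is simplified for
# illustration; real Delta actions carry more fields.
import json

log_entries = [
    '{"add": "f1.parquet"}',
    '{"add": "f2.parquet"}',
    '{"remove": "f1.parquet"}',
    '{"add": "f3.parquet"}',
]

def replay(entries):
    live = set()
    for raw in entries:
        action = json.loads(raw)
        if "add" in action:
            live.add(action["add"])
        if "remove" in action:
            live.discard(action["remove"])
    return sorted(live)

print(replay(log_entries))  # ['f2.parquet', 'f3.parquet']
```

A checkpoint is essentially the result of this replay materialized as Parquet, so readers can start from the checkpoint instead of replaying every JSON entry from the beginning.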
Strengths:
- Simplest setup — no separate catalog required; the _delta_log directory is self-contained
- DML operations — MERGE, UPDATE, DELETE are first-class, highly optimized in Spark
- Change Data Feed — built-in CDC stream of row-level changes, ideal for downstream consumers
- OPTIMIZE + ZORDER — compaction and multi-dimensional clustering in one command
- Liquid Clustering (new) — automatic, adaptive clustering replacing static partitioning
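The intuition behind ZORDER is worth a quick sketch: interleave the bits of two column values so rows that are close in both dimensions end up with nearby sort keys, which lets file-level min/max pruning work on multiple columns at once. This is a from-scratch illustration of the idea, not Delta's actual implementation.

```python
# Sketch of the bit-interleaving idea behind ZORDER. Interleaving the
# bits of two values produces a single sort key that preserves locality
# in both dimensions. Simplified to 16-bit unsigned ints for illustration.
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits land at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits land at odd positions
    return key

# Rows near each other in (x, y) interleave to nearby keys, so sorting
# by z_order_key clusters them into the same files.
rows = [(3, 5), (3, 6), (200, 9000)]
print(sorted(rows, key=lambda r: z_order_key(*r)))
```

Sorting data files by this key before writing is what makes min/max statistics selective on both columns rather than just the leading sort column.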
Weaknesses:
- Tightest coupling to Spark/Databricks — other engines have lagged in compatibility
- Partition evolution is less flexible than Iceberg
- _delta_log can become a bottleneck on very high-frequency write tables
Best for: Databricks-centric stacks, teams with heavy Spark usage, workloads requiring frequent DML operations.
Apache Hudi
Hudi (Hadoop Upserts Deletes and Incrementals) was developed at Uber and has the longest history of the three. It was designed specifically for streaming upserts — a use case the other formats have since added support for, but where Hudi's implementation remains the most mature.
Architecture:
Hudi uses two table types with different optimization profiles:
- Copy-on-Write (CoW) — rewrites any data file containing an updated record. Faster reads, slower writes.
- Merge-on-Read (MoR) — writes updates to delta log files, merges on read. Faster writes, slightly slower reads.
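The tradeoff between the two table types can be modeled in a few lines. The classes below are a toy sketch under obvious simplifications (a dict standing in for a base file, a list for the delta log), not Hudi's storage layout.

```python
# Toy model of Hudi's two table types. CoW pays the merge cost at write
# time by rewriting the base file; MoR appends to a delta log and pays
# the merge cost at read time. Structures are illustrative only.
class CopyOnWriteTable:
    def __init__(self, rows):
        self.base = dict(rows)          # key -> value, the "base file"

    def upsert(self, key, value):
        rewritten = dict(self.base)     # rewrite: copy every existing row
        rewritten[key] = value
        self.base = rewritten           # expensive write, cheap read

    def read(self):
        return dict(self.base)

class MergeOnReadTable:
    def __init__(self, rows):
        self.base = dict(rows)
        self.delta_log = []             # cheap append-only writes

    def upsert(self, key, value):
        self.delta_log.append((key, value))

    def read(self):                     # merge cost is paid by the reader
        merged = dict(self.base)
        for key, value in self.delta_log:
            merged[key] = value
        return merged

cow = CopyOnWriteTable({"a": 1}); cow.upsert("a", 2)
mor = MergeOnReadTable({"a": 1}); mor.upsert("a", 2)
print(cow.read(), mor.read())  # {'a': 2} {'a': 2}
```

Both tables return identical results; the difference is purely where the merge work happens, which is why MoR suits write-heavy CDC ingestion and CoW suits read-heavy analytics.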
Strengths:
- Streaming upserts — the most mature implementation for high-frequency, key-based upserts
- Incremental queries — query only records that changed since a given commit, natively
- MoR table type — write performance advantage for CDC and streaming use cases
- Record-level index — Bloom filter and HBase-backed indexes for fast key lookups
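The incremental-query idea from the list above can be sketched as follows. The commit structure here is invented for illustration; Hudi exposes this natively through its incremental query mode, keyed on commit timestamps.

```python
# Sketch of an incremental query: given commits tagged with monotonically
# increasing commit times, return only records written after a given
# commit. Commit layout is a simplifying assumption, not Hudi's format.
commits = [
    {"commit_time": "20250101000000", "records": [{"id": 1, "v": "a"}]},
    {"commit_time": "20250102000000", "records": [{"id": 2, "v": "b"}]},
    {"commit_time": "20250103000000", "records": [{"id": 1, "v": "a2"}]},
]

def incremental_read(commits, since):
    """Return records from commits strictly after `since`."""
    return [r for c in commits if c["commit_time"] > since for r in c["records"]]

changed = incremental_read(commits, "20250101000000")
print(changed)  # [{'id': 2, 'v': 'b'}, {'id': 1, 'v': 'a2'}]
```

Downstream jobs can checkpoint the last commit time they processed and pull only the delta on each run, instead of rescanning the full table.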
Weaknesses:
- Steeper learning curve — two table types, multiple index options, more configuration surface
- Multi-engine support has improved but still behind Iceberg
- Community and tooling ecosystem smaller than the other two
Best for: Real-time data ingestion from CDC streams, use cases with high-frequency keyed upserts, Uber/LinkedIn-style event pipelines.
Side-by-Side Comparison
| Feature | Iceberg | Delta Lake | Hudi |
|---|---|---|---|
| ACID transactions | ✓ | ✓ | ✓ |
| Time travel | ✓ | ✓ | ✓ |
| Schema evolution | ✓ | ✓ | ✓ |
| Partition evolution | ✓ (best) | Limited | Limited |
| Streaming upserts | Good | Good | Best |
| Multi-engine support | Best | Good | Improving |
| DML (MERGE/UPDATE) | ✓ | ✓ (best) | ✓ |
| Row-level deletes | ✓ | ✓ | ✓ |
| Catalog required | Yes | No | No |
| Databricks native | Via OSS | ✓ | Via OSS |
| AI/ML ecosystem | Best | Good | Good |
Decision Guide
Choose Iceberg if:
- You need to query the same data from multiple engines (Spark + Trino + Flink + DuckDB)
- AI/ML workloads are a priority — broadest Python and notebook ecosystem support
- You want to avoid any single vendor dependency
- Partition evolution matters for your use case
Choose Delta Lake if:
- Your stack is primarily Databricks or Spark
- You need the most mature MERGE/UPDATE/DELETE operations
- Change Data Feed for downstream streaming consumers is a key requirement
- You want the simplest possible setup (no catalog required)
Choose Hudi if:
- Your primary use case is high-frequency streaming upserts from a CDC source
- You need MoR tables for write-heavy workloads with moderate read SLAs
- You're already running a Hudi deployment and it's working well
Production Recommendation
For greenfield deployments targeting AI readiness in 2025, Iceberg is the default choice. The multi-engine support and rapidly growing ecosystem — including native Snowflake and BigQuery external table support — mean you're not betting on a single platform. The Nessie catalog gives you Git-like branching semantics that pair naturally with dbt's testing-before-promotion workflow.
Book a strategy session to design your lakehouse architecture.