Delta Lake at Scale: Real-World Lessons in Multi-Writer Architecture

Customer Journey Analytics, a market-leading product, is growing rapidly and handles massive volumes of data. At this accelerated pace, the existing infrastructure was approaching the limits of what it could reliably support as new customers were onboarded. Moreover, ensuring compliance with privacy regulations was becoming increasingly costly, even at the current data load. To address these challenges and build a solution that scales sustainably over time, we assembled a specialised team to explore and implement a more efficient alternative.

In lakehouse architectures, Hive-style partitioning remains a foundational strategy, valued for its simplicity, transparency, and compatibility with engines like Apache Spark, Presto, and Databricks SQL, and with formats such as Delta Lake, Iceberg, and Hudi. Yet as data volumes and velocity surge, so do the complexities of managing concurrent writes.

This session explores the realities of operating a multi-writer architecture at scale. We begin with the single-writer model, examine its limitations, and then transition to the multi-writer paradigm powered by Optimistic Concurrency Control (OCC). OCC alone, however, falls short when multiple data producers, spanning streaming, batch, and governance processes, simultaneously target unpredictable partitions. The result: elevated conflict rates, expensive retries, and potential starvation.

To address this, Adobe developed a partition-aware write orchestration framework, sketched below, that:

  • Splits writes by partition to minimise conflict scope.
  • Implements idempotent writes, partial progress tracking, and exponential backoff with jitter to enhance reliability and throughput.
  • Scales seamlessly to petabyte-scale datasets and trillions of records across diverse, concurrent data flows.

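To make these ideas concrete, here is a minimal, illustrative sketch, not the Adobe framework itself, of the first two techniques in PySpark with Delta Lake: scoping each commit to a single partition with `replaceWhere` so writers on other partitions cannot conflict, and retrying OCC conflicts with exponential backoff plus jitter. The table layout (a single `event_date` partition column), the `write_partition` helper, and the conflict-detection heuristic are assumptions introduced only for illustration.

```python
import random
import time

from pyspark.sql import DataFrame

# Illustrative assumptions (not from the talk): a Delta table stored at
# `table_path`, Hive-partitioned by a single `event_date` column, and OCC
# conflicts surfacing with "Concurrent" in the exception text (for example
# ConcurrentAppendException). Adobe's orchestration framework is more
# involved; this only sketches partition-scoped writes and exponential
# backoff with jitter.

MAX_RETRIES = 5
BASE_DELAY_SECONDS = 1.0


def write_partition(df: DataFrame, table_path: str, event_date: str) -> None:
    """Overwrite exactly one Hive partition of a Delta table."""
    # Restrict both the data and the commit to a single partition so that
    # concurrent writers targeting other partitions do not conflict.
    partition_df = df.where(df.event_date == event_date)

    for attempt in range(MAX_RETRIES):
        try:
            (partition_df.write
                .format("delta")
                .mode("overwrite")
                # replaceWhere scopes the overwrite to one partition and makes
                # the write idempotent: re-running it replaces the same data.
                .option("replaceWhere", f"event_date = '{event_date}'")
                .save(table_path))
            return
        except Exception as exc:
            # Retry only OCC conflicts; rethrow anything else, or give up
            # after the final attempt.
            if "Concurrent" not in str(exc) or attempt == MAX_RETRIES - 1:
                raise
            # Exponential backoff with jitter spreads out competing retries.
            time.sleep(BASE_DELAY_SECONDS * (2 ** attempt) + random.uniform(0, 1))
```

In a full orchestrator, a layer like this would also record partial progress per partition, so an interrupted job can resume without rewriting partitions that have already committed.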
Join us to discover how this approach unlocks the full potential of Hive-partitioned Delta Lake, preserving compatibility while meeting modern scalability demands. You'll leave with actionable strategies to optimise your own lakehouse write patterns, all while keeping the process transparent to customers and maintaining the same data quality.
You will also get a glimpse of what's next, including how we plan to build AI models that further optimise the orchestration framework.
