Optimizing Large Scale Batch ETL Pipelines in Adobe Experience Platform

Processing data at scale is hard; doing it while keeping costs in check is even harder.

In this session, we will walk through a series of techniques and practices for operating a high-throughput batch ETL pipeline built on the Spark distributed computing framework.

We will address topics such as:
– dealing with input data structures
– shaping the data for improved performance (illustrated in the sketch after this list)
– tradeoffs in data access patterns
– stability of large computing clusters
– reducing the compute footprint to achieve significant operating cost reductions
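
As a small preview of the data-shaping topic, here is a minimal sketch of one common technique: pruning columns early and repartitioning a raw batch input by the key most downstream jobs filter on, then writing it as partitioned Parquet so later reads can skip irrelevant partitions. The dataset paths and column names below are hypothetical, not taken from the session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Sketch only: paths and column names (eventDate, userId, ...) are assumptions.
object ShapeBatchInput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shape-batch-input")
      .getOrCreate()

    // Read the raw batch drop (location and schema are assumed).
    val raw = spark.read.parquet("/data/raw/events")

    // Keep only the columns downstream jobs need, group rows by the
    // partitioning key, and write date-partitioned Parquet so readers
    // filtering on eventDate never touch the other partitions.
    raw.select("eventDate", "userId", "eventType", "payload")
      .repartition(col("eventDate"))
      .write
      .partitionBy("eventDate")
      .mode("overwrite")
      .parquet("/data/shaped/events")

    spark.stop()
  }
}
```

How far to push this kind of layout work, and how it interacts with access patterns and cluster cost, is exactly the tradeoff discussion the session covers.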
