Building End-To-End Data Solutions on Kubernetes: From Ingest to Visualization
This workshop guides participants through the entire data lifecycle, focusing on building robust, scalable data solutions on Kubernetes with the KubeLake platform, which builds on open-source technologies such as Apache NiFi, MinIO, Apache Kafka, Apache Spark, Apache Zeppelin, Trino, and Apache Superset. Attendees will explore how these components come together to form a modern data architecture capable of handling diverse data types and high-velocity workloads, from ingestion through visualization.
This workshop emphasizes practical, hands-on learning, providing actionable knowledge that participants can directly apply to build end-to-end data solutions at scale. Through interactive sessions and live demonstrations, participants will:
- Ingest Data Seamlessly with Apache NiFi and Kafka, automating data flows and managing both batch and real-time data streams.
- Store and Manage Data Efficiently using MinIO for object storage and Trino for fast, distributed SQL querying over that storage.
- Process Data at Scale with Apache Spark, executing large-scale data transformations and analytics directly on Kubernetes.
- Visualize Insights through Apache Superset, creating dynamic dashboards to analyze data in real-time and drive data-driven decisions.
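As a taste of the processing step above, Spark jobs in this kind of setup are typically submitted directly to the Kubernetes cluster. A minimal sketch of such a submission is shown below; the API server address, namespace, container image, and script path are placeholders, not values provided by the workshop:

```
spark-submit \
  --master k8s://https://<kubernetes-api-server>:6443 \
  --deploy-mode cluster \
  --name kubelake-etl \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=<your-namespace> \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/work-dir/etl_job.py
```

Here Kubernetes itself acts as the Spark cluster manager: the driver and executors run as pods, so the same cluster that hosts NiFi, Kafka, and MinIO also scales the data processing.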
By the end of this workshop, participants will have gained a comprehensive understanding of how to design, implement, and optimize data solutions on Kubernetes, preparing them to tackle real-world data challenges with confidence.
Prerequisites:
- a laptop
- basic understanding of Kubernetes concepts and architecture
- some experience with data processing tools (e.g., Spark, Kafka) is beneficial but not required
- basic knowledge of Scala or Python