This three-day course equips participants with practical skills to develop, manage, and optimize Apache Spark pipelines on GCP Dataproc Serverless through targeted lectures, hands-on labs, and a capstone project. By the end, attendees will understand Spark batch and streaming use cases, master its execution model and core data structures, and be able to build reusable, performance-tuned pipelines for diverse data workloads.
The course equips participants with practical skills to develop, manage, and optimize Apache Spark pipelines on GCP Dataproc Serverless. Through targeted lectures, hands-on labs, and a capstone project, attendees will master Spark’s architecture, data structures, pipeline development, and tuning so that they can maintain and extend DPP data pipelines, write reusable code, and handle both batch and streaming workloads.
21 hours of intensive, live instruction, delivered over three to five days to accommodate different scheduling needs.
Students receive comprehensive courseware, including slides, code samples, and lab guides with pre-configured datasets.