Apache Airflow Administration: Scalable Workflow Automation and Orchestration

Executive Summary

This intensive 14-hour course covers Apache Airflow's architecture and core components (DAGs, operators, schedulers, and executors) and contrasts Airflow with cron jobs and Celery. Hands-on labs span installation with Python and PostgreSQL, deployment on Kubernetes (EKS and Helm), custom container image building, and monitoring with logs and Grafana, so participants learn to configure, scale, and optimize production-grade workflow automation.

Description

This course provides a deep dive into Apache Airflow, a powerful workflow automation platform for managing complex data pipelines. Participants will explore the architecture of Airflow, including Directed Acyclic Graphs (DAGs), operators, and schedulers. The course covers installation, configuration, and integration with Kubernetes, AWS EKS, and Helm. Attendees will gain hands-on experience deploying Airflow, optimizing workflows, customizing container images, and monitoring performance using logging and metrics. Designed for professionals, this course ensures participants can build scalable, reliable, and efficient workflow automation solutions.

Objectives

  • Understand Apache Airflow's architecture and how it compares to cron jobs and Celery.
  • Learn the fundamentals of DAGs, operators, tasks, variables, and schedulers.
  • Install and configure Apache Airflow using Python environments, PostgreSQL, and Kubernetes.
  • Gain hands-on experience deploying Airflow on Kubernetes, including EKS and Helm.
  • Configure Airflow's executors, logs, and advanced settings for scalability.
  • Build and use custom Airflow container images with additional dependencies.
  • Implement monitoring solutions using logs, Grafana, and external storage.
  • Apply best practices for workflow reliability, scaling, and automation in production environments.

Duration

14 hours of intensive training with live instruction delivered over three to five days to accommodate varied scheduling needs.

Course Outline

What is Apache Airflow?
  • Distributed Task Automation
  • Compared to Cron Jobs
  • Compared to Celery
  • Scalability and Reliability
  • Directed Acyclic Graphs (DAGs)
  • Workflows as Code
Workflows as Code (no programming)
  • Anatomy of a DAG
  • Directed Acyclic Graphs
  • Operators
  • Tasks
  • Variables
  • XComs
  • Providers
  • Connections
  • Explore how DAG parts connect to the UI
  • DAG Serialization
  • Listeners
  • Schedulers
  • Pools
Installation and Configuration
  • Python Virtual Environment
  • Install Airflow
  • Airflow Constraints File
  • Standalone Mode
  • Run the Webserver and Scheduler Independently
  • SQLite vs PostgreSQL
  • Configure with PostgreSQL
  • Airflow and Kubernetes (with Minikube)
  • Airflow and AWS Elastic Kubernetes Service (EKS)
  • Airflow Helm Chart
Hands-On Kubernetes (K8s)
  • Containerization and Orchestration
  • kubectl
  • Helm
  • Nodes
  • Namespaces
  • Pods, Containers, and Services
  • Connect to the Internet (EKS)
  • KEDA Autoscaler
  • Pod Logs
  • SSH into Pods/Containers
  • Live Upgrading Airflow
Airflow Configuration
  • Airflow Configuration File Location
  • Airflow Executor Configuration
  • Airflow Log Levels
  • Helm Chart Configuration
  • Learn How to Configure Airflow and K8s Pods
  • Local Executor
  • Celery Executor
  • Kubernetes Executor
Airflow Custom Image
  • Airflow Container Image
  • Why Create a Custom Image?
  • Create a Custom Image
  • Install Software with Apt
  • Install Software from PyPI
  • Install Providers and Custom Software
  • Use the Custom Container Image
Monitoring
  • Logging
  • Log File Structure
  • Log Levels
  • Review Task Logs in the Web UI
  • External Log Storage
  • Metrics Configuration
  • Monitor with Grafana
  • Notifications
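
As a small taste of the "Directed Acyclic Graphs" and "Workflows as Code" topics in the outline above, the following is a minimal sketch of how a dependency graph determines task execution order. It uses only Python's standard-library `graphlib` module, not Airflow's own API, and the task names (`extract`, `transform`, `validate`, `load`) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Here `extract` always runs first and `load` always runs last, while `transform` and `validate` may appear in either order because neither depends on the other. A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is why such workflows must be acyclic.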

Prerequisites

  • Practical experience with Python.
  • Familiarity with containerization and container orchestration.
  • Basic Linux command line skills.

Training Materials

Students receive comprehensive courseware, including reference documents, code samples, and lab guides.