Apache Airflow Programming: Developing, Configuring, and Automating Workflows

Executive Summary

Across 21 hours of live training, this course immerses participants in Apache Airflow's architecture and configuration, guiding them through setting up environments, choosing executors, and developing robust DAGs with Python. Through hands-on exercises, ranging from dynamic task mapping and templating to cloud integrations and custom plugin development, attendees will master best practices for automating, monitoring, and optimizing production-ready workflows.

Description

This course provides a comprehensive introduction to Apache Airflow, covering its architecture, configuration, and workflow automation capabilities. Participants will learn how to set up and manage Airflow environments, configure executors, and develop DAGs using Python. The course explores essential components like tasks, operators, variables, and connections, as well as advanced topics such as dynamic DAGs, templating, and custom plugins. Hands-on exercises include running DAGs, scheduling tasks, integrating cloud providers, and monitoring workflows through logs and the Airflow UI. By the end of the course, participants will be equipped to build, automate, and optimize data pipelines using Airflow.

Objectives

  • Understand Apache Airflow's architecture and how it automates distributed workflows.
  • Set up and configure Airflow using different execution modes and database backends.
  • Learn key Airflow components, including DAGs, tasks, operators, variables, and connections.
  • Develop and run DAGs using the Operator API, TaskFlow API, and dynamic task mapping.
  • Integrate Airflow with cloud providers such as AWS and Azure.
  • Utilize built-in operators and sensors to automate task execution and monitoring.
  • Extend Airflow by creating custom operators, providers, and plugins.
  • Apply best practices for scheduling, logging, debugging, and optimizing workflows.

Duration

21 hours of intensive, instructor-led training, delivered over three to five days to accommodate varied scheduling needs.

Course Outline

What is Apache Airflow?
  • Distributed Task Automation
  • Compared to Cron Jobs
  • Compared to Celery
  • Scalability and Reliability
  • Directed Acyclic Graphs (DAGs)
  • Workflows as Code
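
As a first taste of the workflows-as-code idea listed above, the following is a minimal sketch of a DAG: two placeholder tasks joined by a single dependency edge. The DAG and task ids are illustrative, and the snippet assumes Airflow 2.4 or later for the schedule parameter.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    # Workflows as code: the pipeline is an ordinary Python module that
    # Airflow parses from the DAGs folder.
    with DAG(
        dag_id="hello_airflow",           # illustrative DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,                    # run only when triggered manually
        catchup=False,
    ):
        extract = EmptyOperator(task_id="extract")
        load = EmptyOperator(task_id="load")
        extract >> load                   # ">>" declares the dependency edge
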
Development Server
  • Methods for Running Apache Airflow
  • Standalone Mode with SQLite and Sequential Executor
  • Regular Mode with PostgreSQL and Local Executor
  • Webserver and Scheduler Processes
  • Airflow CLI Tool
Apache Airflow Configuration
  • Airflow Configuration File
  • Airflow Home Folder and DAGs Folder
  • Configure the Executor
  • Expose Configuration to the Airflow Web UI
  • Configure the Log Level: Info vs Debug
  • Refresh Frequency for Airflow and the DAGs Folder
  • Reviewing Log Files
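
To complement the configuration topics above, here is a small sketch for inspecting the effective configuration from Python using airflow.configuration.conf; the values reflect airflow.cfg plus any AIRFLOW__SECTION__KEY environment-variable overrides. The option names shown assume Airflow 2.x.

    from airflow.configuration import conf

    # Effective values after merging airflow.cfg and environment overrides.
    print(conf.get("core", "executor"))          # e.g. SequentialExecutor or LocalExecutor
    print(conf.get("core", "dags_folder"))       # where the scheduler looks for DAG files
    print(conf.get("logging", "logging_level"))  # INFO vs DEBUG
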
Essential Components
  • DAGs
  • Tasks
  • Operators
  • Variables
  • Providers
  • Connections
  • Pools
  • XComs
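
A brief sketch tying several of these components together: a Variable read at run time and an XCom passed implicitly between TaskFlow tasks. The variable key env_name is hypothetical and would be created beforehand via the UI or CLI; Airflow 2.x is assumed.

    from datetime import datetime

    from airflow.decorators import dag, task
    from airflow.models import Variable

    @dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
    def components_demo():
        @task
        def produce():
            env = Variable.get("env_name", default_var="dev")  # hypothetical key
            return f"payload-for-{env}"   # return value is stored as an XCom

        @task
        def consume(payload: str):
            print(payload)                # XCom pulled automatically

        consume(produce())

    components_demo()
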
Coding and Running a DAG
  • Workflows as Code
  • Operator API vs Taskflow API
  • Trigger from Web UI
  • Trigger from CLI
  • Viewing Task Logs
  • DAG Serialization
  • Explore DAG execution in the Web UI
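
To contrast the two APIs named above, here is the same one-step pipeline written both ways; a sketch with illustrative ids, assuming Airflow 2.x. Either DAG can then be triggered from the Web UI or from the CLI with airflow dags trigger <dag_id>.

    from datetime import datetime

    from airflow import DAG
    from airflow.decorators import dag, task
    from airflow.operators.python import PythonOperator

    def say_hello():
        print("hello")

    # Operator API: tasks are explicit operator instances.
    with DAG(dag_id="hello_operator_api", start_date=datetime(2024, 1, 1),
             schedule=None, catchup=False):
        PythonOperator(task_id="say_hello", python_callable=say_hello)

    # TaskFlow API: decorated functions become tasks, and calls wire the graph.
    @dag(dag_id="hello_taskflow_api", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False)
    def hello_taskflow_api():
        @task
        def say_hello_task():
            print("hello")

        say_hello_task()

    hello_taskflow_api()
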
Programming DAGs
  • Create a DAG
  • Orchestrating Tasks
  • DAG Parameters
  • Task Parameters
  • Using Variables
  • Using XComs
  • Using Connections
  • Dynamic DAGs
  • Dynamic Task Mapping
  • Templating with Jinja
  • Manually Trigger a Pipeline
  • Scheduling
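
As a sketch of two topics from this unit, dynamic task mapping and Jinja templating: .expand() fans one task definition out over a list, and {{ ds }} is rendered to the run's logical date. Ids and values are illustrative; Airflow 2.4 or later is assumed.

    from datetime import datetime

    from airflow.decorators import dag, task
    from airflow.operators.bash import BashOperator

    @dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
    def programming_demo():
        @task
        def double(x: int) -> int:
            return x * 2

        doubled = double.expand(x=[1, 2, 3])  # one mapped task instance per item

        report = BashOperator(
            task_id="report",
            bash_command="echo run date is {{ ds }}",  # Jinja, rendered at runtime
        )

        doubled >> report

    programming_demo()
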
Providers and Connections
  • Azure Provider and Blobs
  • AWS Provider and S3
  • File System
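
A hedged sketch of the AWS provider in action: listing keys in a bucket through a named Airflow connection. It assumes the apache-airflow-providers-amazon package is installed and that a connection with id aws_default has been configured; the bucket name is hypothetical.

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
    def s3_demo():
        @task
        def list_keys():
            # Imported inside the task so the DAG file still parses on
            # machines without the Amazon provider installed.
            from airflow.providers.amazon.aws.hooks.s3 import S3Hook

            hook = S3Hook(aws_conn_id="aws_default")
            return hook.list_keys(bucket_name="example-bucket")  # hypothetical bucket

        list_keys()

    s3_demo()
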
Built-In Operators
  • Bash Operator
  • HTTP Operator
  • Email Operator
  • Python Operator
  • PostgreSQL Operator
  • S3 File Transform Operator
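
A short sketch pairing two of the operators listed above: BashOperator runs a shell command and PythonOperator calls a Python function. Ids and commands are illustrative.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def process():
        print("processing")

    with DAG(dag_id="builtin_operators_demo", start_date=datetime(2024, 1, 1),
             schedule=None, catchup=False):
        fetch = BashOperator(task_id="fetch", bash_command="date")
        transform = PythonOperator(task_id="transform", python_callable=process)
        fetch >> transform
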
Built-In Sensors
  • File Sensor
  • Python Sensor
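
A sketch of FileSensor: the downstream task waits until a file appears before running. It assumes the default fs_default filesystem connection; the watched path is hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.sensors.filesystem import FileSensor

    with DAG(dag_id="file_sensor_demo", start_date=datetime(2024, 1, 1),
             schedule=None, catchup=False):
        wait = FileSensor(
            task_id="wait_for_file",
            fs_conn_id="fs_default",            # default filesystem connection
            filepath="/tmp/incoming/data.csv",  # hypothetical path
            poke_interval=30,                   # re-check every 30 seconds
        )
        load = BashOperator(task_id="load", bash_command="echo file arrived")
        wait >> load
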
Advanced Programming
  • Custom Operators
  • Custom Providers
  • Custom Plugins
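
As a minimal sketch of the custom-operator topic: subclass BaseOperator and implement execute(). The class and its greeting parameter are hypothetical; once the module is importable (for example via a plugin or an installed package), GreetOperator(task_id="greet", greeting="hi") is used in a DAG like any built-in operator.

    from airflow.models.baseoperator import BaseOperator

    class GreetOperator(BaseOperator):
        """Illustrative operator that logs a greeting."""

        def __init__(self, greeting: str = "hello", **kwargs):
            super().__init__(**kwargs)
            self.greeting = greeting

        def execute(self, context):
            # context carries runtime metadata such as the logical date ("ds").
            self.log.info("%s from run %s", self.greeting, context["ds"])
            return self.greeting  # return value becomes an XCom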

Prerequisites

  • Practical experience writing Python scripts and programs.

Training Materials

Students receive comprehensive courseware, including reference documents, code samples, and lab guides.