
Building Data Pipelines with Apache Airflow

Overview:

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring data pipelines.

In this workshop, participants will learn how to install, configure, and use Airflow for data pipeline management. We’ll also cover how to build data pipelines and how to extend Airflow’s functionality with custom operators, hooks, connections, and more.

Training Duration:

2 days (16 academic hours)

Target Audience:

Data engineers, DataOps and MLOps professionals.

Prerequisites:

Basic knowledge of Python and an understanding of data processing concepts.

Required Equipment:

Each participant will need a PC with a browser and an internet connection.

Lab environments will be provided by Strigo (http://strigo.io).



Outline:

1. Introduction to Airflow

  • Overview of Airflow and its use cases
  • Airflow architecture and components
  • Setting up Airflow on a local machine (see the sketch after this list)
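
To make this module concrete: for a local setup, Airflow 2.x can be installed with "pip install apache-airflow" and started with "airflow standalone", which runs the webserver, scheduler, and a SQLite-backed metadata database in one process. Below is a minimal sketch of a DAG file the scheduler would pick up from the dags/ folder (the dag_id and command are placeholders):

    # hello_airflow.py -- place in the dags/ folder so the scheduler finds it.
    # Minimal sketch, assuming Airflow 2.x.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="hello_airflow",
        start_date=datetime(2023, 1, 1),
        schedule="@daily",  # use schedule_interval on Airflow < 2.4
        catchup=False,
    ) as dag:
        BashOperator(task_id="say_hello", bash_command="echo 'Hello, Airflow!'")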

2. Creating DAGs

  • Defining tasks and dependencies
  • Setting up schedules and triggers
  • Using variables and templates (see the sketch after this list)
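
A sketch of the ideas in this module: two tasks with an explicit dependency, a daily schedule, and Jinja templating that reads the logical date ({{ ds }}) and an Airflow Variable (the target_bucket key is hypothetical):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    def extract(**context):
        # ds is the logical date, injected from the template context.
        print(f"extracting data for {context['ds']}")

    with DAG(
        dag_id="etl_example",
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)

        # {{ ds }} and {{ var.value.target_bucket }} are rendered at runtime;
        # the latter reads a Variable from Airflow's metadata database.
        load_task = BashOperator(
            task_id="load",
            bash_command="echo 'loading {{ ds }} into {{ var.value.target_bucket }}'",
        )

        extract_task >> load_task  # extract must finish before load starts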

3. Operators and Hooks

  • Types of operators and when to use them
  • Using hooks to interact with external systems
  • Creating custom operators and hooks (see the sketch after this list)
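
A sketch of a custom hook and operator pair, assuming a connection named my_api has been created in Airflow (the health-check endpoint is hypothetical):

    import requests

    from airflow.hooks.base import BaseHook
    from airflow.models.baseoperator import BaseOperator

    class HttpPingHook(BaseHook):
        """Wraps access to an external API behind an Airflow connection."""

        def __init__(self, conn_id="my_api"):
            super().__init__()
            self.conn_id = conn_id

        def ping(self):
            conn = self.get_connection(self.conn_id)  # host, login, etc.
            return requests.get(f"https://{conn.host}/health").status_code

    class HttpPingOperator(BaseOperator):
        """Custom operator that delegates the actual I/O to the hook."""

        def __init__(self, conn_id="my_api", **kwargs):
            super().__init__(**kwargs)
            self.conn_id = conn_id

        def execute(self, context):
            status = HttpPingHook(self.conn_id).ping()
            self.log.info("health endpoint returned HTTP %s", status)
            return status  # pushed to XCom automatically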

4. Monitoring and Troubleshooting

  • Using the Airflow UI to monitor DAGs
  • Logging and exception handling (see the sketch after this list)
  • Common errors and troubleshooting techniques
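
To illustrate logging and failure handling: messages from standard Python loggers surface in the per-task log view of the UI, while retries and callbacks control what happens on errors. A small sketch (the failure notification is just a print placeholder):

    import logging
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    log = logging.getLogger(__name__)

    def notify_failure(context):
        # Called after all retries are exhausted; context holds the
        # task instance and the exception that was raised.
        ti = context["task_instance"]
        print(f"task {ti.task_id} failed: {context.get('exception')}")

    def flaky(**context):
        log.info("INFO-level messages appear in the task log in the UI")
        raise ValueError("simulated failure to exercise the retry logic")

    with DAG(
        dag_id="error_handling_example",
        start_date=datetime(2023, 1, 1),
        schedule=None,  # trigger manually from the UI or CLI
    ) as dag:
        PythonOperator(
            task_id="flaky",
            python_callable=flaky,
            retries=2,
            retry_delay=timedelta(minutes=1),
            on_failure_callback=notify_failure,
        )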

5. Advanced Topics

  • Airflow connections and authentication
  • Using Airflow with Kubernetes (see the sketch after this list)
  • Best practices and tips for scaling Airflow
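
As a taste of the Kubernetes integration: with the apache-airflow-providers-cncf-kubernetes provider installed, individual tasks can run in their own pods. A sketch under stated assumptions (the namespace, image, and cluster setup are placeholders, and the import path varies slightly across provider versions):

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    # Connections can also be supplied without the UI, e.g. via environment
    # variables in URI form: AIRFLOW_CONN_MY_API='https://user:pass@host'

    # Defined inside a `with DAG(...)` block like the earlier examples:
    run_in_pod = KubernetesPodOperator(
        task_id="run_in_pod",
        name="hello-pod",
        namespace="airflow",           # assumed namespace
        image="python:3.11-slim",
        cmds=["python", "-c"],
        arguments=["print('running inside a Kubernetes pod')"],
        get_logs=True,                 # stream pod logs into the task log
    )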
