Building Data Pipelines with Apache Airflow
Overview:
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring data pipelines.
In this workshop, participants will learn how to install, configure, and use Airflow for data pipeline management. We will also cover how to build data pipelines and extend Airflow's functionality with operators, hooks, connections, and more.
Training Duration:
2 days (16 academic hours)
Target Audience:
Data engineers, DataOps and MLOps professionals.
Prerequisites:
Basic knowledge of Python and a general understanding of data processing concepts.
Required Equipment:
Each participant will need a PC with a browser and an internet connection.
Lab environments will be provided by http://strigo.io
Outline:
1. Introduction to Airflow
- Overview of Airflow and its use cases
- Airflow architecture and components
- Setting up Airflow on a local machine
2. Creating DAGs
- Defining tasks and dependencies (a short DAG sketch follows this outline)
- Setting up schedules and triggers
- Using variables and templates
3. Operators and Hooks
- Types of operators and when to use them
- Using hooks to interact with external systems
- Creating custom operators and hooks (see the operator sketch after this outline)
4. Monitoring and Troubleshooting
- Using the Airflow UI to monitor DAGs
- Logging and exception handling
- Common errors and troubleshooting techniques
5. Advanced Topics
- Airflow connections and authentication
- Using Airflow with Kubernetes
- Best practices and tips for scaling Airflow
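To give a flavor of module 2 ("Creating DAGs"), below is a minimal sketch of a DAG with two dependent tasks and a daily schedule. It assumes Airflow 2.4 or later (where the schedule argument replaces schedule_interval); the DAG id, task ids, and callables are illustrative and not part of the course materials.

    # Minimal sketch of a two-task DAG (illustrative names; assumes Airflow 2.4+).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Pretend to pull data from a source system.
        return {"rows": 42}

    def load(ti):
        # Pull the upstream result from XCom and "load" it.
        data = ti.xcom_pull(task_ids="extract")
        print(f"Loading {data['rows']} rows")

    with DAG(
        dag_id="example_etl",              # illustrative DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                 # run once per day
        catchup=False,                     # do not backfill past runs
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> load_task          # extract must finish before load

Placing a file like this in the dags folder is enough for the scheduler to pick it up, which is the workflow participants practice during the labs.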
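Module 3 ("Operators and Hooks") includes writing custom components. The sketch below shows the general shape of a custom operator that delegates its work to a hook; the class name, connection id, and query are hypothetical, and the PostgresHook import assumes the apache-airflow-providers-postgres package is installed.

    # Sketch of a custom operator that wraps an existing hook
    # (class name, connection id, and query are illustrative).
    from airflow.models.baseoperator import BaseOperator
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    class RowCountOperator(BaseOperator):
        """Counts rows in a table and logs the result."""

        def __init__(self, table: str, postgres_conn_id: str = "postgres_default", **kwargs):
            super().__init__(**kwargs)
            self.table = table
            self.postgres_conn_id = postgres_conn_id

        def execute(self, context):
            # The hook handles the connection details; the operator stays
            # focused on orchestration, which is the usual Airflow pattern.
            hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
            count = hook.get_first(f"SELECT COUNT(*) FROM {self.table}")[0]
            self.log.info("Table %s has %d rows", self.table, count)
            return count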