

Once you create Python scripts and place them in the dags folder of Airflow, Airflow will automatically create the workflow for you. The workflow execution is based on the schedule you provide, which follows the Unix cron schedule format. Well, not difficult, but pretty straightforward.

Key Concepts

A workflow is made up of tasks, and each task is an operator. Now, what is an operator? An operator is a Python class that does some work for you. These classes are provided by Airflow itself. Some common ones are:

BashOperator - used to run bash commands.
PythonOperator - used to run a Python function you define.
BranchOperator - used to create a branch in the workflow.
DummyOperator - used to represent a dummy task.

There are quite a few other operators that will help you make an HTTP call, connect to a Postgres instance, connect to Slack, etc. You can find more operators here.

Finally, with the theory done, let's do something exciting, i.e. creating an Airflow DAG.

Creating an Airflow DAG

As an example, let's create a workflow that does the following:

Check that the user API is available.
Print the extracted fields using the bash echo command.

We start by checking that the API is available:

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.http.sensors.http import HttpSensor  # Airflow 2.x path; needs the HTTP provider package

    # default_args is not shown in this excerpt; a minimal placeholder:
    default_args = {'start_date': datetime(2022, 1, 1)}

    with DAG(
        'user_content_processing',
        schedule_interval='@daily',  # placeholder; the original schedule value is not shown (cron strings like '0 6 * * *' also work)
        default_args=default_args,
    ) as dag:
        is_api_available = HttpSensor(
            task_id='is_api_available',
            http_conn_id='user_api',
            endpoint='api/'
        )

To create the HttpSensor operator, we provide it with a task id, an HTTP connection id, and an endpoint. The task id identifies the task in the DAG, and the endpoint identifies the API to fetch.

Now the third parameter, i.e. http_conn_id, requires a bit of explanation. Airflow provides a mechanism to create and store configurations that you can use across workflows. One such type of configuration is "Connections". Here you can provide various connections like AWS connections, ElasticSearch connections, Kubernetes cluster connections, etc. In our case, we would be using the HTTP connection option. Here we would provide the URL we want to trigger and set the connection id to user_api.
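Before running the DAG, the user_api connection has to exist in Airflow. You can create it from the Airflow UI (Admin > Connections) by choosing the HTTP connection type, entering the base URL of the API as the host, and setting the connection id to user_api. As a rough sketch, assuming an Airflow 2.x installation, the same thing can be done from the CLI; the host value below is only a placeholder for your actual API URL:

    airflow connections add 'user_api' \
        --conn-type 'http' \
        --conn-host 'https://<your-api-host>/'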
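To round out the example workflow, the final step (printing the extracted fields with the bash echo command) maps naturally to a BashOperator. The sketch below is only illustrative and makes a few assumptions that are not part of the original snippet: it lives inside the same with DAG(...) block, the upstream task that extracts the fields is called extract_user, and the fields are passed through XCom.

    from airflow.operators.bash import BashOperator  # Airflow 2.x import path

    # These lines belong inside the `with DAG(...)` block shown earlier.
    # 'extract_user' is a hypothetical upstream task id, used here only to
    # illustrate pulling the extracted fields from XCom.
    print_fields = BashOperator(
        task_id='print_fields',
        bash_command="echo {{ ti.xcom_pull(task_ids='extract_user') }}",
    )

    # Run the availability check before printing.
    is_api_available >> print_fields

The >> operator is how Airflow expresses task dependencies: the sensor has to succeed before the echo task runs.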
