Hello Stack Overflow community,
I'm currently working on a project at work where I need to automatically trigger an Airflow DAG in Cloud Composer on Google Cloud Platform (GCP) whenever a .csv file is uploaded to a Google Cloud Storage (GCS) bucket. However, there's a restriction in my organisation that prevents the use of Cloud Functions for this purpose.
I've successfully implemented a solution using Cloud Functions, but due to organisational constraints, I need to explore alternative methods that are both efficient and cost-effective. I would appreciate any guidance or suggestions on achieving this without relying on Cloud Functions.
If you've encountered a similar scenario or have ideas on how to set up this file upload trigger without using Cloud Functions, your insights would be incredibly valuable.
Thank you in advance for your help!
As mentioned, I know how to do this using Cloud Functions, but this is prohibited in my organisation, so I need to find alternative methods.
One option I can think of is polling with a time-based DAG:
Set up a DAG in Airflow that runs on a regular schedule (e.g., every 5 minutes). Within the DAG, use a GoogleCloudStorageListOperator (called GCSListObjectsOperator in the Google provider package for Airflow 2) to list the files in your GCS bucket. Compare the current file list with the list stored on the previous run. If a new .csv file has appeared, trigger the necessary downstream Airflow tasks or a different DAG.
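The comparison step could look roughly like the sketch below. It is plain Python with no Airflow imports, so it only shows the "current list vs. stored list" logic. The function name find_new_csv_files and the example object names are made up for illustration; how you persist the previous list (an Airflow Variable, XCom, a small state file in the bucket) is an assumption left open here.

```python
from typing import Iterable, List, Set


def find_new_csv_files(current: Iterable[str], previously_seen: Iterable[str]) -> List[str]:
    """Return .csv object names present in `current` but not in `previously_seen`."""
    seen: Set[str] = set(previously_seen)
    # Keep only .csv objects that were not in the stored list; sort for stable output.
    return sorted(
        name for name in current
        if name.lower().endswith(".csv") and name not in seen
    )


# Hypothetical example data: the first list stands in for the output of the
# list task, the second for the list stored by the previous DAG run.
current_listing = ["incoming/a.csv", "incoming/b.csv", "incoming/notes.txt"]
stored_listing = ["incoming/a.csv"]

new_files = find_new_csv_files(current_listing, stored_listing)
print(new_files)  # ['incoming/b.csv']
```

In a DAG, a function like this would probably run in a Python task placed after the list task, with an empty result short-circuiting the run so that downstream tasks only execute when something new has arrived.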
Hope this helps :)
Ref: https://airflow.apache.org/docs/apache-airflow/1.10.12/_api/airflow/contrib/operators/gcs_list_operator/index.html