How can I use GCS Delete in Data Fusion Studio?


Apologies if this is very simple but I am a complete beginner at GCP.

I've created a pipeline that picks up multiple CSVs from a bucket, wrangles them, then writes them into BigQuery. I want it to then delete the contents of the bucket folder the files came from. So, given that I pulled the CSVs using gs://bucket/Data/Country/*.CSV, can I use GCS Delete to get rid of all the CSVs in there?

As a desperate attempt :D, I specified gs://bucket/Data/Country/*.* in the Objects to delete property, but this didn't do a thing.

1 Answer

Answered by Enrique Zetina (BEST ANSWER)

According to the Google Cloud Storage Delete plugin documentation, it is necessary to list each object explicitly, separated by commas.
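For example, the Objects to Delete property would look like this (the file names here are placeholders for illustration):

```
gs://bucket/Data/Country/file1.csv,gs://bucket/Data/Country/file2.csv
```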

There is a feature request asking for the possibility of allowing suffixes and prefixes (wildcards) in this plugin; you can use the +1 button and provide feedback about how this feature would be useful to you.

On the other hand, I thought of a workaround that could work for you. Using the GCS client library documentation, I created a script that lists all CSV objects in a bucket; you only have to copy and paste its output into the Objects to Delete property of the plugin. It's important to mention that I used this workaround with roughly 100 files; I'm not sure whether it's feasible with a much larger number of files.


from google.cloud import storage

bucket_name = "MY_BUCKET"
file_format = ".csv"

def list_csv(bucket_name, file_format=".csv"):
    # List every object in the bucket and keep only those ending in .csv
    # (a suffix check, so e.g. "csv_notes.txt" is not matched by accident).
    storage_client = storage.Client()
    blobs = storage_client.list_blobs(bucket_name)
    uris = ["gs://" + bucket_name + "/" + blob.name
            for blob in blobs
            if blob.name.endswith(file_format)]
    # Print one comma-separated line, ready to paste into "Objects to Delete".
    print(",".join(uris))

list_csv(bucket_name, file_format)
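Since you only want the files under gs://bucket/Data/Country/, you can also filter by folder prefix. The filtering itself is plain string matching on the object names the client returns, so here is a minimal sketch of that step on its own; `csv_uris` and the bucket/object names are made up for illustration:

```python
def csv_uris(bucket_name, object_names, prefix="Data/Country/", suffix=".csv"):
    # Keep only objects under the given folder prefix that end in .csv,
    # formatted as the comma-separated list the GCS Delete plugin expects.
    uris = ["gs://" + bucket_name + "/" + name
            for name in object_names
            if name.startswith(prefix) and name.endswith(suffix)]
    return ",".join(uris)

# Example with made-up object names:
names = ["Data/Country/uk.csv", "Data/Country/fr.csv", "Data/notes.txt"]
print(csv_uris("my-bucket", names))
# gs://my-bucket/Data/Country/uk.csv,gs://my-bucket/Data/Country/fr.csv
```

Note that `list_blobs` also accepts a `prefix` argument, so you could push the folder filter to the server side instead of filtering in Python.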