How to launch (create job or update job) a Google Cloud Run job with a cron schedule and container overrides


I have a process on Google Cloud that launches multiple Cloud Run jobs with different sets of env variables to clean up old data in Google Cloud Storage (deeply nested, with a very large number of folders).

Because of how long they run, it is better for me to schedule them at different times so they do not slow down performance.

try:
    from google.cloud import run_v2  # pip install google-cloud-run

    run_job_client = run_v2.JobsClient()

    run_name = "projects/" + PROJECT_ID + "/locations/" + JOB_REGION \
        + "/jobs/" + CALL_JOB

    # Per-execution overrides: replace the env vars of the job's container.
    override_spec = {
        'container_overrides': [
            {
                'env': [
                    {'name': 'TARGET_FILE', 'value': TARGET_FILE},
                    {'name': 'SEQUENCE_NUMBER', 'value': SEQUENCE_NUMBER},
                    {'name': 'PROJECT_CODE', 'value': PROJECT_CODE},
                ]
            }
        ]
    }

    job_request = run_v2.RunJobRequest(
        name=run_name,
        overrides=override_spec
    )

    run_job_client.run_job(request=job_request)
    print('I deleted the folder')
except Exception as exc:
    print(f'I failed to delete: {exc}')

I am a beginner in gcloud and Python. I understand I can use run_v2.RunJobRequest to launch a job with new env var settings, but I do not know how to schedule this launch with a cron time. Any idea how I can schedule this RunJobRequest with a specific start time?

A second question, which I have spent some time on without getting anywhere, so I am just trying to see if anyone can help. To address the above need, I tried to explore run_v2.CreateJobRequest to see if I can create a new job and then use a scheduler to schedule it.

The sample code here mentions accessing the container. I have the container already, but there is no option to point to that container in

    request = run_v2.CreateJobRequest(
        parent="parent_value",
        job=job,
        job_id="job_id_value",
    )

Any idea what the correct way to use run_v2.CreateJobRequest is?

1 Answer

Answered by guillaume blaquiere

You aren't on quite the right track. Cloud Run Jobs executes jobs, nothing more. If you want scheduling, you need a scheduling service such as Cloud Scheduler.

Then it's not so difficult:
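A minimal sketch of that wiring, assuming the `google-cloud-scheduler` client library and hypothetical project, region, job, and service-account names (the service account must be allowed to run the job). It creates a Cloud Scheduler cron job that POSTs to the Cloud Run Admin API's `:run` endpoint:

```python
import os


def run_endpoint(project_id: str, region: str, job_name: str) -> str:
    """Admin API v1 URL that starts one execution of an existing Cloud Run job."""
    return (
        f"https://{region}-run.googleapis.com/apis/run.googleapis.com/v1"
        f"/namespaces/{project_id}/jobs/{job_name}:run"
    )


# Guarded so the sketch only talks to Google Cloud when explicitly asked to.
if os.environ.get("CREATE_SCHEDULER_TRIGGER"):
    # Requires: pip install google-cloud-scheduler
    from google.cloud import scheduler_v1

    PROJECT_ID, JOB_REGION, CALL_JOB = "my-project", "us-central1", "cleanup-job"

    client = scheduler_v1.CloudSchedulerClient()
    client.create_job(
        parent=f"projects/{PROJECT_ID}/locations/{JOB_REGION}",
        job=scheduler_v1.Job(
            name=f"projects/{PROJECT_ID}/locations/{JOB_REGION}"
                 f"/jobs/trigger-{CALL_JOB}",
            schedule="0 3 * * *",  # standard cron syntax: every day at 03:00
            time_zone="Etc/UTC",
            http_target=scheduler_v1.HttpTarget(
                uri=run_endpoint(PROJECT_ID, JOB_REGION, CALL_JOB),
                http_method=scheduler_v1.HttpMethod.POST,
                # Scheduler authenticates to the Run Admin API with this
                # service account's OAuth token.
                oauth_token=scheduler_v1.OAuthToken(
                    service_account_email=(
                        f"scheduler@{PROJECT_ID}.iam.gserviceaccount.com"
                    ),
                ),
            ),
        ),
    )
```

One Scheduler job per cron time gives you the staggered starts you want; each Scheduler job is just a different `schedule` string pointing at the same (or a different) Cloud Run job.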


That being said, I am also wondering about your design. You can call Cloud Storage aggressively; you just need to stay within the allowed quotas, and I see none on concurrent requests (or number of requests per minute).

Your slowness can come from the object distribution: if the object keys are sequential, all the objects use the same node in the Cloud Storage cluster and you create saturation, a bottleneck, on that node (also known as a hotspot).

Last but not least, you can have parallel tasks in Cloud Run Jobs. Imagine you have a list of bucket/prefix pairs to delete, say 20. You can run a Cloud Run job with 5 tasks (instances) in parallel. Each instance has env vars giving the total number of tasks and the index of the current task. With those, each instance can calculate, at startup, which subset of the list it must process.

It is a simple way to execute multiple runs' worth of work in a single job.
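The subset calculation above can be sketched as follows. The env var names CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT are the ones Cloud Run Jobs injects into each task instance; the prefix list and the deletion step are hypothetical placeholders:

```python
import os


def shard(items: list, task_index: int, task_count: int) -> list:
    """Return the subset of items that the task with this index should process."""
    return [item for i, item in enumerate(items) if i % task_count == task_index]


if __name__ == "__main__":
    # Cloud Run Jobs sets these automatically for every task instance.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    prefixes = [f"old-data/folder-{n}/" for n in range(20)]  # hypothetical list
    for prefix in shard(prefixes, task_index, task_count):
        # Real code would delete the objects under this prefix here.
        print(f"task {task_index}/{task_count}: processing {prefix}")
```

With 20 prefixes and 5 tasks, each instance picks up 4 prefixes, and together the tasks cover the whole list exactly once.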