Issue with Azure ML job command


I am trying to run a command job on a compute cluster in Azure ML. I have stored the data in a folder in Azure ML. Every time I create and run the job, it displays the following warning:

Your file exceeds 100 MB. If you experience low speeds, latency, or broken connections, we recommend using the AzCopyv10 tool for this file transfer.

Example: azcopy copy '/....' 'https://....'

See https://docs.microsoft.com/azure/storage/common/storage-use-azcopy-v10 for more information.

I am unable to understand the following:

  1. Where does the problem lie, in the data storage or the job?

  2. If it wants me to do the upload using AzCopy, how do I do it? I am unable to find any references for this.

  3. I have tried using AzCopy to move the data into a container in Blob Storage, but when I try to link it to the job, it results in a streaming error. I have also tried to implement the suggested command, but it causes an error.

1 Answer

Rishabh Meshram (accepted answer)

It seems the problem lies in the size of the data you're trying to transfer, not in the data storage or the job itself. The message is only a warning: for files over 100 MB, Azure ML recommends AzCopy because it handles large transfers faster and more reliably.

For AzCopy, you can check this documentation for details on how to authorize it and copy your files to a storage account. For uploading and accessing files from Python, you can also check the azureml.fsspec package.
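
For example, a typical AzCopy flow looks like the sketch below (the local folder, storage account, and container names are placeholders to replace with your own):

# Authenticate AzCopy via Azure AD; the signed-in identity needs a blob data
# role (e.g. Storage Blob Data Contributor) on the target storage account
azcopy login

# Recursively copy a local data folder into a blob container
azcopy copy './my-data' 'https://<storage-account>.blob.core.windows.net/<container>/my-data' --recursive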

As the command job supports both blob storage URIs and URI_FILE data assets, you can use either as your input format.
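
If you want to go the data-asset route, a URI_FILE asset can be registered first and then referenced by name in the job. A minimal CLI v2 sketch (the asset name and version are illustrative; the path points at the same public example file used in the snippet below):

# Register a single file as a uri_file data asset
az ml data create --name titanic-data --version 1 --type uri_file \
  --path 'https://azuremlexampledata.blob.core.windows.net/data/titanic.csv'

The job input's path could then be azureml:titanic-data:1 instead of the raw storage URI.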

Below is a sample YAML snippet for reference:

type: command
command: head ${{inputs.input_data}}
compute: azureml:cpu-cluster
environment: azureml://registries/azureml/environments/sklearn-1.1/versions/4
inputs:
  input_data:
    mode: ro_mount
    path: wasbs://[email protected]/titanic.csv
    type: uri_file
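
Assuming the snippet above is saved as job.yml, it can be submitted with the Azure ML CLI v2 (the resource group and workspace names are placeholders):

# Submit the command job defined in job.yml
az ml job create --file job.yml --resource-group <my-resource-group> --workspace-name <my-workspace>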

For more details and examples, you can check Access data in a job.