Mounting an S3 bucket in Docker in a ClearML agent


What is the best practice for mounting an S3 bucket inside a docker image that will be used as a ClearML agent? I can think of 3 solutions, but so far have been unable to get any of them to work:

  1. Use the prefabricated configuration in ClearML, specifically CLEARML_AGENT_K8S_HOST_MOUNT. For this to work, the S3 bucket would be mounted separately on the host using rclone and then remapped into docker. This appears to apply only to Kubernetes and not to Docker, and therefore would not work.
  2. Mount using s3fs-fuse as specified here. The issue is whether it will work with the S3 bucket secret stored in ClearML browser sessions. This also appears to be complicated and to require custom docker images, not to mention running the docker image with --privileged or similar.
  3. Pass arguments to docker using the docker_args and docker_bash_setup_script arguments to Task.create(), as specified in the 1.0 release notes. This would be similar to (1), but the arguments would be used to bind-mount the volume. I do not see much documentation or examples of how this new feature can be used to that end (a minimal sketch follows this list).
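
For option 3, here is a minimal sketch using the clearml-task CLI, which exposes docker_args and docker_bash_setup_script flags mirroring the Task.create() parameters; the project, script, image, and mount path below are placeholders, and the host path is assumed to already hold the S3 mount:

# Hypothetical invocation: project, script, image, and /data are placeholders.
clearml-task \
  --project examples --name s3-mount-test \
  --script train.py \
  --docker my-docker-image:latest \
  --docker_args "-v /data:/data" \
  --docker_bash_setup_script "ls /data"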

There are 2 answers

johnml1135 (Best Answer)

I was able to get another option entirely to work, namely, mount a drive in WSL and then pass it to Docker. Let's get to it:

Why not host in Windows itself, and why rclone in WSL? Because a mount created inside WSL is accessible to Docker, as the first step below shows.

Steps to mount the drive in ClearML in Windows:

  • You can install rclone in WSL, and the mount will be accessible to docker (a sketch of the rclone commands appears after this list).
    • Create the folder /data/my-mount (it needs to be under /data; I don't know why and couldn't find an explanation via a Google search, but I found out about it here).
    • You can put the configuration file in Windows (use the --config option).
    • Note: ClearML does not support spaces in mounted paths, even though docker does. Therefore your path has to be /data/my-mount rather than /data/my mount. There is a bug that I opened about this.
  • You can test the mount by calling docker and bind-mounting the folder.
    • Example: docker run -it -v \\wsl$\Ubuntu\data:/data my-docker-image:latest ls /data/my-mount
    • Note: You will have to mount /data rather than /data/my-mount, otherwise you may get this error: docker: Error response from daemon: error while creating mount source path
  • Now, you can set up the clearml.conf file in C:\Users\Myself\clearml.conf such that:
default_docker: {
   # default docker image to use when running in docker mode
   image: "my-docker-image:latest"

   # optional arguments to pass to docker image
   arguments: ["-v","\\wsl$\Ubuntu\data:/data", ]
}
  • Note that you can also run clearml-agent from within WSL, in which case you would only need to specify ["-v","/data:/data", ].
  • Run the ClearML agent in cmd: clearml-agent daemon --docker
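
For reference, here is a sketch of the rclone side of the first step, assuming a remote named "s3remote" has already been configured; the remote name, bucket name, and config path are placeholders:

# Create the mount point and mount the bucket (placeholders throughout).
sudo mkdir -p /data/my-mount
rclone mount s3remote:my-bucket /data/my-mount \
  --config /mnt/c/Users/Myself/rclone.conf \
  --vfs-cache-mode writes --daemon
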
Harsh Manvar

I would recommend checking out AWS Storage Gateway for S3; behind the gateway you can use NFS, EFS, or an S3 bucket.

Read more at: https://aws.amazon.com/storagegateway/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc

There are multiple ways you can do this. You can also use a CSI driver to connect to S3:

https://github.com/ctrox/csi-s3

rclone is a nice option if you can use it; it will sync data to the pod's host system, but if large files are involved the sync might take time due to file size and network latency.

Personal suggestion: S3 is object storage, so if you are looking to do file operations such as writing files or zipping files, those operations might take time, based on my personal experience.

Remember that S3 is NOT a file system but an object store. While mounting IS an incredibly useful capability, I wouldn't rely on anything more than file read or create: don't try to append to a file, and don't try to use file system trickery.

If that is the case, I would recommend attaching NFS or SSD storage to the container instead.

On the other hand, if we look at s3fs-fuse, it has its own benefits, such as multipart upload, MD5 checksums, and local caching.
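
For illustration, a typical s3fs-fuse mount might look like the following; the bucket name, mount point, and credentials file are placeholders:

# Mount a bucket with local caching enabled (placeholders throughout).
s3fs my-bucket /mnt/s3 \
  -o passwd_file=${HOME}/.passwd-s3fs \
  -o use_cache=/tmp/s3fs-cache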

The easiest way is to write your own script that syncs an S3 bucket directory to a local directory; otherwise, Storage Gateway for S3 is a good option.
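
For example, a minimal version of such a sync script, assuming the AWS CLI is installed and configured (the bucket and local directory are placeholders):

# One-way sync of an S3 prefix to a local directory.
aws s3 sync s3://my-bucket/data /local/data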

Amazon S3 File Gateway provides a seamless way to connect to the cloud in order to store application data files and backup images as durable objects in Amazon S3 cloud storage. Amazon S3 File Gateway offers SMB or NFS-based access to data in Amazon S3 with local caching.