Inode-related disk availability issue on a GCP VM


I am using a Google Cloud Platform (GCP) VM for some machine learning work. I have a 500 GB disk. Because the image datasets used for model training contain millions of files, the inodes have been exhausted, and as a result I am not able to store more images on the disk even though the used space is only around 50% of its capacity.
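For reference, inode exhaustion can be confirmed by comparing block usage with inode usage via df; the mount point below is just a placeholder for the data disk:

df -h /mnt/disks/data   # block usage: around 50% of the 500 GB used
df -i /mnt/disks/data   # inode usage: IUse% at 100%, so no new files can be created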

This GCP document suggests scaling up the disk, which involves more cost. Is there any other way of solving this issue without scaling up the disk in GCP?


There are 2 answers

A.R (Best Answer)

The fix is to increase the number of inodes by reformatting the disk with the -i parameter, which sets the bytes-per-inode ratio. The default inode_ratio is 16384; passing a lower value allocates more inodes for the same capacity.

Example:

mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard -i 8192 /dev/disk/by-id/google-data
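
Note that mkfs.ext4 creates a new file system and destroys all existing data on the device, so copy everything off the disk first. A rough end-to-end sketch, assuming the disk is mounted at /mnt/disks/data (a placeholder path):

sudo umount /mnt/disks/data
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard -i 8192 /dev/disk/by-id/google-data
sudo mount -o discard,defaults /dev/disk/by-id/google-data /mnt/disks/data
df -i /mnt/disks/data   # the inode count should now be roughly double the default

Halving the bytes-per-inode ratio from 16384 to 8192 roughly doubles the number of inodes, at the cost of some extra space reserved for the inode tables.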
Srividya

If you are facing inode exhaustion on your GCP VM due to a large number of files, there are a few alternative approaches you can try.

Each file stored on the file system consumes one inode. If the file system runs out of inodes, you cannot store more files even if you haven't reached the maximum allocated capacity. Without reformatting, the only way to add inodes is to add capacity. However, exhausting inodes is rare and is mainly a concern when you need to store many small files.

You can also consider an approach that avoids scaling the disk and incurring additional charges. Instead of relying on the VM's local disk, you can store your image datasets in Google Cloud Storage buckets. GCS has no inode limitations and handles large-scale storage efficiently. You can mount a GCS bucket as a file system on your VM using Cloud Storage FUSE (gcsfuse).
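For example, an existing dataset can be uploaded to a bucket with the gcloud CLI; the bucket name and region below are placeholders:

gcloud storage buckets create gs://my-ml-datasets --location=us-central1
gcloud storage cp -r ./images gs://my-ml-datasets/images   # recursive upload of the image dataset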

Cloud Storage FUSE lets you mount and access Cloud Storage buckets as local file systems and works with GKE, Compute Engine VMs, etc. You can refer to the documentation for mounting a Cloud Storage bucket using gcsfuse.
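
A minimal sketch of mounting a bucket with gcsfuse, assuming the placeholder bucket and mount point from above:

sudo mkdir -p /mnt/gcs-datasets
gcsfuse --implicit-dirs my-ml-datasets /mnt/gcs-datasets   # --implicit-dirs makes folder-like prefixes visible
ls /mnt/gcs-datasets/images                                # bucket objects now appear as regular files
fusermount -u /mnt/gcs-datasets                            # unmount when done

Keep in mind that gcsfuse trades POSIX semantics and per-file latency for capacity, so training pipelines that open millions of small files may run slower than on a local disk.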