How to find less frequently accessed files in HDFS

Besides using Cloudera Navigator, how can I find the less frequently accessed files in HDFS?

There is 1 answer

U880D On BEST ANSWER

I assume you are looking for the time a file was last accessed (opened, read, etc.), because the longer ago that was, the less frequently the file is presumably accessed.

Whereas on Linux this is quite simple via ls -l -someMoreOptions, in HDFS more work is necessary.

One option is to monitor the /hdfs-audit.log for cmd=open entries on the files in question. Another is to implement a small function that reads FileStatus.getAccessTime(), as described under "Is there anyway to get last access time of HDFS files?" or, in the Cloudera Community, "How to get last access time of any files in HDFS?".
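The audit-log approach could be sketched roughly as follows. This is only a minimal illustration: it assumes the audit log uses the usual layout of tab-separated key=value fields (including cmd=... and src=...), and the class and method names (AuditLogOpenCounter, openedPath, countOpens) are made up for this example.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts cmd=open entries per file in an HDFS audit log.
class AuditLogOpenCounter {

    // Returns the src path if the line is a cmd=open entry, otherwise null.
    // Assumption: fields are separated by tabs, e.g. "...\tcmd=open\tsrc=/a/b\t...".
    static String openedPath(String line) {
        boolean isOpen = false;
        String src = null;
        for (String field : line.split("\t")) {
            if (field.equals("cmd=open")) {
                isOpen = true;
            } else if (field.startsWith("src=")) {
                src = field.substring("src=".length());
            }
        }
        return isOpen ? src : null;
    }

    // Builds a map of path -> number of open operations seen in the given lines.
    static Map<String, Integer> countOpens(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String path = openedPath(line);
            if (path != null) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the location of the audit log on the local file system.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countOpens(lines).forEach((path, n) -> System.out.println(n + "\t" + path));
    }
}
```

Keep in mind that files which were never opened during the logged period do not show up in the result at all, so the output would have to be compared against a full file listing to find them.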

In other words, it will be necessary to write a small program that scans all the files, reads out their properties

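...
// fs is the Hadoop FileSystem, line holds one HDFS path
status = fs.getFileStatus(new Path(line));
...
// access time is returned in milliseconds since the epoch
long lastAccessTimeLong = status.getAccessTime();
Date lastAccessTimeDate = new Date(lastAccessTimeLong);
...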

and sorts them. With that you will be able to find files that have not been accessed for a long time.
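The ordering step might look something like the sketch below. It assumes the scan has already collected each path together with the value returned by getAccessTime() into a map; the class and method names (StaleFileReport, notAccessedSince) and the 90-day threshold are purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders scanned files by their last access time.
class StaleFileReport {

    // Returns the paths whose access time is older than cutoffMillis,
    // sorted so that the longest-untouched file comes first.
    static List<String> notAccessedSince(Map<String, Long> accessTimes, long cutoffMillis) {
        List<Map.Entry<String, Long>> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : accessTimes.entrySet()) {
            if (entry.getValue() < cutoffMillis) {
                stale.add(entry);
            }
        }
        stale.sort(Map.Entry.comparingByValue());

        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Long> entry : stale) {
            paths.add(entry.getKey());
        }
        return paths;
    }

    public static void main(String[] args) {
        // In a real scan these values would come from status.getAccessTime().
        long now = System.currentTimeMillis();
        long day = 24L * 60 * 60 * 1000;
        Map<String, Long> accessTimes = new HashMap<>();
        accessTimes.put("/example/old.dat", now - 200 * day);
        accessTimes.put("/example/recent.dat", now - 2 * day);

        // Illustrative threshold: files not accessed in the last 90 days.
        for (String path : notAccessedSince(accessTimes, now - 90 * day)) {
            System.out.println(path);
        }
    }
}
```

Note that HDFS only records access times at the granularity configured by dfs.namenode.accesstime.precision (one hour by default, and not at all when it is set to 0), so it is worth checking that setting before relying on the results.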