Listing Files in a Directory On Demand in Java

I have a directory with over 100K files in it, and I want to execute some function for each file. Right now I'm using File.listFiles() to do this, but this is extremely inefficient because:

  1. All file names must be read in before any processing occurs, causing an unnecessarily long hang.
  2. All file names end up being placed into an array, taking up huge amounts of memory. At any given time I only need enough memory to store one filename, but here I always need enough memory to store all filenames.

What I really want is something that behaves like a UNIX directory handle, but I couldn't find anything like this. I also looked up exactly how File.listFiles() is implemented in OpenJDK, but it ultimately ends up at a native function call, both for UNIX-based systems (line 268) and for Windows (line 525). Worse yet, the native calls are expected to return arrays.

I'd like to avoid plugging into the JNI or calling an external program, if possible.

There are 2 answers

dkatzel (best answer)

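If you are using Java 7 or later, NIO.2's Files.newDirectoryStream() returns the entries of a directory as a DirectoryStream<Path>, which you iterate over one entry at a time (like an iterator) instead of receiving everything in an array: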

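import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// dir is the Path of the directory to list.
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
    for (Path file: stream) {
        System.out.println(file.getFileName());
    }
} catch (IOException | DirectoryIteratorException x) {
    // IOException can never be thrown by the iteration itself;
    // iteration failures surface as DirectoryIteratorException.
    // In this snippet, IOException can only be thrown by newDirectoryStream.
    System.err.println(x);
}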

See the tutorial: http://docs.oracle.com/javase/tutorial/essential/io/dirs.html#listdir
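
To tie this back to the question ("execute some function for each file"), here is a minimal sketch that wraps the DirectoryStream loop in a helper taking a callback. The class name LazyDirectoryListing and the method forEachEntry are hypothetical names chosen for illustration, and the use of Consumer and lambdas assumes Java 8 or later rather than Java 7.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical helper class; the names are illustrative, not part of any library.
class LazyDirectoryListing {

    // Calls action once per directory entry and returns how many entries were seen.
    // Entries are handed to the callback as the iteration proceeds, so no array of
    // all entries is built by this code.
    static long forEachEntry(Path dir, Consumer<Path> action) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                action.accept(entry);
                count++;
            }
        } catch (DirectoryIteratorException e) {
            // Failures during iteration are wrapped in this unchecked exception;
            // its cause is always an IOException, so rethrow that instead.
            throw e.getCause();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: list the directory given as the first argument, or "." if none.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long seen = forEachEntry(dir, p -> System.out.println(p.getFileName()));
        System.out.println(seen + " entries");
    }
}
```

Note that the order in which entries are returned is not specified, so the callback should not rely on any particular ordering.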

Sotirios Delimanolis

You can use the Java 7 FileVisitor API with Files.walkFileTree():

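import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

Files.walkFileTree(Paths.get("/your/path"), new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        // do what you want with the file
        return FileVisitResult.CONTINUE;
    }
    // more methods to override going through directories
});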

This walks each file, directory, or symbolic link (if you choose to follow links) one at a time. Internally it uses DirectoryStream<Path>.
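
Because Files.walkFileTree() descends into subdirectories by default, a single flat directory like the one in the question may call for the overload that takes a maximum depth. The following is a rough sketch under that assumption; the class name FlatDirectoryWalk and the method countRegularFiles are hypothetical, and counting is only a stand-in for whatever per-file work is needed.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; the names are illustrative only.
class FlatDirectoryWalk {

    // Visits only the direct children of dir (maxDepth = 1) and counts regular files.
    static long countRegularFiles(Path dir) throws IOException {
        final AtomicLong count = new AtomicLong();
        Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
            new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    // At the maximum depth, subdirectories are also passed to
                    // visitFile, so filter on the attributes supplied by the walk.
                    if (attrs.isRegularFile()) {
                        count.incrementAndGet(); // replace with real per-file work
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        return count.get();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: walk the directory given as the first argument, or "." if none.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countRegularFiles(dir) + " regular files");
    }
}
```

One possible reason to prefer this over a plain DirectoryStream loop is that the visitor receives each entry's BasicFileAttributes along with its path, which avoids a separate attribute lookup per file in your own code.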