How to parse file patterns using Apache Commons CLI


I'm trying to parse my command line arguments using Apache Commons CLI. It might be a bit heavy-handed for the example here, but it makes sense in the context of the program I'm creating. I'm trying to read a file pattern filter, similar to what grep uses to select files to process.

My argument looks like this:

Program --input *.*

I've written a test program to see what the parser is seeing:

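// parse() throws the checked ParseException, so main has to declare it
public static void main(String[] args) throws ParseException {

    // The INPUT_FILTER_* names are String constants defined elsewhere in the class
    Options options = new Options();
    options.addOption(new Option(INPUT_FILTER_SHORT, INPUT_FILTER_LONG, true, INPUT_FILTER_DESCRIPTION));

    CommandLineParser parser = new BasicParser();
    CommandLine cmd = parser.parse(options, args);

    // Print whatever value the parser associated with the input option
    System.out.println(cmd.getOptionValue(INPUT_FILTER_SHORT));
}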

This prints out:

.classpath

If I change my arguments to:

Program --input test.txt

I get the output:

test.txt

I'm assuming that I have to do something to tell Apache Commons CLI that * is not a special character? I can't seem to find anything about this online.

I'm experiencing this on Windows (7). I'm fairly certain it's the *.* which is causing the issue as when I swap to using patterns that don't use *, the expected pattern shows up.


There is 1 answer

Answered by slim

Your problem isn't really to do with Commons CLI, but to do with how the shell and the Java executable together process the parameters.

To eliminate other factors, and see what's going on, use a short Java program:

public class ArgsDemo {
     public static void main(String[] args) {
         for(int i=0; i<args.length; i++) {
              System.out.println("" + i + ": " + args[i]);
         }
     }
}

Play with java ArgsDemo hello world, java ArgsDemo * etc. and observe what happens.

On UNIX and Linux:

Java does no special processing of *. However, the shell does. So if you did:

$ mkdir x
$ cd x
$ touch a b
$ java -cp myjar.jar MyClass *

... then MyClass.main() would be invoked with the parameter array ["a","b"] -- because the UNIX shell expands * to files in the current directory.

You can suppress this by escaping:

$ java -cp myjar.jar MyClass \*    # main() sees ["*"]

(Note that a UNIX shell wouldn't expand *.* to .classpath because this form would ignore "hidden" files starting with .)

On Windows:

cmd.exe does not do UNIX-style wildcard expansion. If you supply * as a parameter to a command in Windows, the command gets a literal *. So for example, PKUNZIP *.zip passes *.zip to PKUNZIP.EXE, and it's up to that program to expand the wildcard if it wants to.

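Since some release of Java 7, the Java executable for Windows does some wildcard-to-filename expansion of its own before passing the parameters to your main() method. That is what happened in the question: *.* was expanded to the files in the working directory, --input took the first name (.classpath), and the remaining names were left over as extra arguments.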

I've not been able to find clear documentation of Java-for-Windows' wildcard expansion rules, but you should be able to control it with quoting, escaping the quotes to prevent cmd.exe interpreting them:

> java.exe -cp myjar.jar MyClass """*.*"""

(Untested as I don't have a Windows box handy, and quoting in cmd.exe is a bit of a beast - do please experiment and either edit the above or leave a comment)
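
Once the literal pattern does reach main(), the program has to expand it itself, just as a native Windows command would. Here is a minimal sketch of one way to do that with java.nio.file (Java 7 or later). The class name GlobDemo and the helper expand() are made up for illustration and are not part of Commons CLI; the pattern string would come from cmd.getOptionValue(...) in the real program.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GlobDemo {

    // Hypothetical helper: returns the names of the entries in dir whose
    // file name matches the glob pattern, sorted alphabetically.
    static List<String> expand(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        // newDirectoryStream(Path, String) filters entries by a glob on the file name
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the first argument is the unexpanded pattern; defaults to *.*
        String pattern = args.length > 0 ? args[0] : "*.*";
        for (String name : expand(Paths.get("."), pattern)) {
            System.out.println(name);
        }
    }
}
```

Run it with the pattern quoted or escaped as described above, e.g. java GlobDemo "*.txt". Note that Java's glob syntax treats a leading dot as an ordinary character, so unlike a UNIX shell, *.* here would also match .classpath.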