Standalone shell script works fine, but it's not working when used in srcs of sh_binary


My project structure is as follows: PROJECT_STRUCTURE

my_shbin.sh is as follows:

#!/bin/bash
find ../../ \( -name "*.java" -o -name "*.xml" -o -name "*.html" -o -name "*.js" -o -name "*.css" \) | grep -vE "/node_modules/|/target/|/dist/" >> temp-scan-files.txt

# scan project files for offensive terms
IFS=$'\n'
for file in $(cat temp-scan-files.txt); do
    grep -iF -f temp-scan-regex.txt $file >> its-scan-report.txt
done

This script works fine when invoked on its own and gives the required results. But when I add the sh_binary below to my BUILD file, nothing is written to temp-scan-files.txt, and therefore nothing ends up in its-scan-report.txt:

sh_binary(
    name = "findFiles",
    srcs = ["src/test/resources/my_shbin.sh"],
    data = glob(["temp-scan-files.txt", "temp-scan-regex.txt", "its-scan-report.txt"]),
)

I ran the sh_binary from IntelliJ using the play icon, and also tried running it from the terminal using bazel run :findFiles. No error is shown, but there is no data in temp-scan-files.txt. Any help with this issue would be appreciated. The Bazel documentation is very limited, with almost no information beyond the basic use case.


1 answer

Answered by ahumesky

When a binary is run using bazel run, it's run from the "runfiles tree" for that binary. The runfiles tree is a directory tree that bazel creates that contains symlinks to the binary's inputs. Try putting pwd and tree at the beginning of the shell script to see what this looks like. The reason that the runfiles tree doesn't contain any of the files in src/main is that they're not declared as inputs to the sh_binary (e.g. using the data attribute). See https://docs.bazel.build/versions/master/user-manual.html#run
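As a rough sketch of that debugging step, something like the following could go at the top of my_shbin.sh. The function name show_run_context is made up for illustration. It uses find instead of tree, since tree may not be installed, and it also prints BUILD_WORKSPACE_DIRECTORY, which bazel run sets to the workspace root and which is unset when the script is run directly.

```shell
#!/bin/bash
# Hypothetical helper: print where the script is actually running.
show_run_context() {
  # Under `bazel run` this is inside the runfiles tree, not the source tree.
  echo "working directory: $(pwd)"

  # List what is visible from here, following symlinks (runfiles are symlinks).
  # Errors such as unreadable directories are ignored for this quick look.
  find -L . -maxdepth 2 2>/dev/null || true

  # `bazel run` exports BUILD_WORKSPACE_DIRECTORY; a direct run does not.
  echo "workspace root: ${BUILD_WORKSPACE_DIRECTORY:-<not set>}"
}

show_run_context
```

If the listing does not show your .java or .xml files, then ../../ from that directory is not pointing at your sources, which would explain the empty temp-scan-files.txt.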

Another thing to note is that the glob in data = glob(["temp-scan-files.txt", "temp-scan-regex.txt", "its-scan-report.txt"]) won't match anything, because those files are in src/test/resources relative to the BUILD file. In addition, the script tries to modify these files, and it's not typically possible to modify input files: if this sh_binary were being run as a build action, the inputs would be effectively read-only. Writing to them could work here only because bazel run is similar to running the final binary by itself outside Bazel, e.g. bazel build //target && bazel-bin/target.
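If you do want to keep this as a script run with bazel run, one possible shape is sketched below. It is only an illustration under some assumptions: the function name scan_tree and the /tmp report path are made up, the search root comes from BUILD_WORKSPACE_DIRECTORY (set by bazel run) rather than ../../, and the report goes to a path the caller chooses instead of a declared input file.

```shell
#!/bin/bash
# Hypothetical helper: scan_tree ROOT PATTERN_FILE REPORT
# Scans ROOT for the fixed strings listed in PATTERN_FILE and writes matches to REPORT.
scan_tree() {
  local root="$1" patterns="$2" report="$3"

  # Start from an empty report instead of appending to a previous run.
  : > "$report"

  # Same file types and exclusions as the original script, expressed in find.
  # -print0 with read -d '' keeps paths containing spaces intact.
  find "$root" -type f \
      \( -name "*.java" -o -name "*.xml" -o -name "*.html" -o -name "*.js" -o -name "*.css" \) \
      ! -path "*/node_modules/*" ! -path "*/target/*" ! -path "*/dist/*" -print0 \
    | while IFS= read -r -d '' file; do
        # grep exits 1 when a file has no match; that is not an error here.
        grep -iF -f "$patterns" "$file" >> "$report" || true
      done
}

# Example call (paths are placeholders):
# scan_tree "${BUILD_WORKSPACE_DIRECTORY:-.}" temp-scan-regex.txt /tmp/its-scan-report.txt
```

This sidesteps the read-only input problem, but the sources are still not declared to Bazel, so nothing is cached or tracked.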

The most straight-forward way to do this might be something like this:

genrule(
  name = "gen_report",
  srcs = [
    # This must be the first element of srcs so that
    # the regex file gets passed to the "-f" of grep in cmd below.
    "src/test/resources/temp-scan-regex.txt",
  ] + glob([
    "src/main/**/*.java",
    "src/main/**/*.xml",
    "src/main/**/*.html",
    "src/main/**/*.js",
    "src/main/**/*.css",
  ],
  exclude = [
    "**/node_modules/**",
    "**/target/**",
    "**/dist/**",
  ]),
  outs = ["its-scan-report.txt"],
  # The first element of $(SRCS) will be the regex file, passed to -f.
  cmd = "grep -iF -f $(SRCS) > $@",
)

$(SRCS) are the files in srcs delimited by a space, and $@ means "the output file, if there's only one". $(SRCS) will contain the temp-scan-regex.txt file, which you probably don't want to include as part of the scan, but if it's the first element, then it will be the parameter to -f. This is maybe a bit hacky and a little fragile, but it's also kind of annoying to try to separate the file out (e.g. using grep or sed or array slicing).
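For reference, the array-slicing variant could look roughly like the sketch below, written as a plain bash function so it can be tried outside Bazel. The name run_scan and its two arguments are made up for illustration: the first argument stands in for the space-separated text Bazel substitutes for $(SRCS), the second for $@. It also tolerates grep exiting with status 1 when nothing matches, because a non-zero exit would fail a genrule.

```shell
#!/bin/bash
# Hypothetical helper: run_scan "SPACE_SEPARATED_SRCS" OUTPUT_FILE
# The first path in SRCS is the pattern file; the rest are the files to scan.
run_scan() {
  local srcs=( $1 )                  # intentionally unquoted: split on whitespace
  local out="$2"
  local patterns="${srcs[0]}"        # first element: pattern file for -f
  local files=( "${srcs[@]:1}" )     # remaining elements: files to scan
  local status=0

  # With no files to scan, grep would read stdin; write an empty report instead.
  if [ "${#files[@]}" -eq 0 ]; then
    : > "$out"
    return 0
  fi

  grep -iF -f "$patterns" "${files[@]}" > "$out" || status=$?

  # grep: 0 = matches found, 1 = no matches, 2 or more = real error.
  [ "$status" -le 1 ]
}
```

Inside a genrule cmd, every shell $ has to be written as $$ so that Bazel does not treat it as a Make variable, which is part of why the one-liner above is tempting.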

Then run bazel build //project/root/myPackage:its-scan-report.txt to generate the report.