LLVM ScalarEvolution Pass Cannot Compute Exit Count for Loop Vectorizer

498 views Asked by At

I'm trying to figure out how to run LLVM's built-in loop vectorizer. I have a small program containing an extremely simple loop (I had some output at one point which is why stdio.h is still being included despite never being used):

  1 #include <stdio.h>
  2 
  3 unsigned NUM_ELS = 10000;
  4 
  5 int main() {
  6     int A[NUM_ELS];
  7 
  8 #pragma clang loop vectorize(enable)
  9     for (int i = 0; i < NUM_ELS; ++i) {
 10         A[i] = i*2;
 11     }
 12 
 13     return 0;
 14 }

As you can see, it does nothing at all useful; I just need the for loop to be vectorizable. I'm compiling it to LLVM bytecode with

clang -emit-llvm -O0 -c loop1.c -o loop1.bc
llvm-dis -f loop1.bc

Then I'm applying the vectorizer with

opt -loop-vectorize -force-vector-width=4 -S -debug loop1.ll

However, the debug output gives me this:

LV: Checking a loop in "main" from loop1.bc
LV: Loop hints: force=? width=4 unroll=0
LV: Found a loop: for.cond
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing: Cannot prove legality.

I've dug around in the LLVM source a bit, and it looks like SCEV comes from the ScalarEvolution pass, which has the task of (among other things) counting the number of back edges back to the loop condition, which in this case (if I'm not mistaken) should be the trip count minus the first trip (so 9,999 in this case). I've run this pass on a much larger benchmark and it gives me the exact same error at every loop, so I'm guessing it isn't the loop itself, but that I'm not giving it enough information.

I've spent quite a bit of time combing through the documentation and Google results to find an example of a full opt command using this transformation, but have been unsuccessful so far; I'd appreciate any hints as to what I may be missing (I'm new to vectorizing code so it could be something very obvious).

Thank you,

Stephen

1

There are 1 answers

1
Anton Korobeynikov On BEST ANSWER

vectorization depends on number of other optimization which needs to be run before. They are not run at all at -O0, therefore you cannot expect that your code would be 'just' vectorized there.

Adding -O2 before -loop-vectorize in opt cmdline would help here (make sure your 'A' array is external / used somehow, otherwise everything will be optimized away).