How does Timsort perform on data in descending order?

918 views Asked by At

From the:

http://svn.python.org/projects/python/trunk/Objects/listsort.txt

and:

http://en.wikipedia.org/wiki/Timsort

I see, that Timsort has some optimizations when a0 > a1 > a2 > ..., but what about next array:

10000,10000,9999,9999,9998,9998,....,9,9,8,8,7,7,6,6,5,5,4,4,3,3,2,2,1,1,0,0

What is a time efficiency for such array?

(integers were used to simplify an example, stable sorting is required) I have made some measurements and, seems, such arrays are not "good" cases for Timsort.

actually, TimSort in JDK http://cr.openjdk.java.net/~martin/webrevs/openjdk7/timsort/raw_files/new/src/share/classes/java/util/TimSort.java has a method "countRunAndMakeAscending"

@SuppressWarnings("unchecked")
private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
    assert lo < hi;
    int runHi = lo + 1;
    if (runHi == hi)
        return 1;

    // Find end of run, and reverse range if descending
    if (((Comparable) a[runHi++]).compareTo(a[lo]) < 0) { // Descending
        while(runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) < 0)
            runHi++;
        reverseRange(a, lo, runHi);
    } else {                              // Ascending
        while (runHi < hi && ((Comparable) a[runHi]).compareTo(a[runHi - 1]) >= 0)
            runHi++;
    }

    return runHi - lo;
}

why not to implement it in another way:

private static int countRunAndMakeAscending(Object[] a, int lo, int hi) {
    int runHi = lo;
    int lastEqual = lo;
    int ascending = 0;
    while (++runHi < hi) {
      int c = ((Comparable) a[runHi+1]).compareTo(a[runHi]);
      if (ascending == 0) {
        if (c != 0) {
          if (c > 0) {
            ascending = 1;
          } else {
            ascending = -1;
            reverseRange(a, lastEqual, runHi);
            lastEqual = runHi;
          }
        }
      } else if (ascending == 1) {
        if (c < 0) {
          return runHi - lo;
        }
      } else {
        if (c > 0) {
          reverseRange(a, lastEqual, runHi);
          reverseRange(a, lo, runHi);
          return runHi - lo;
        } else if (c < 0) {
          reverseRange(a, lastEqual, runHi);
          lastEqual = runHi;
        }
      }
    }
    if (ascending == -1) {
      reverseRange(a, lastEqual, runHi);
      reverseRange(a, lo, runHi);
    }
    return runHi - lo;
}

so it will work fine with non ascending order?

1

There are 1 answers

2
loreb On

Yes.

Basically it decided that "ascending" really meant "not descending", without any loss of generality -- in case you have eg [5,5,4 3] it will just break it into [5,5] (ascending) and then [4,3] (descending) on the next call.

As to why, I guess it's for simplicity: simply try counting the number of invocations of reverseRange() in your code and in the original and you'll get the idea (I got it by noticing how long it took me to understand one version compared to the other :)

edit: WRONG WRONG WRONG! As Oscar Smith pointed out, the reason is to make timsort a stable sorting algorithm. If anyone knows how to transfer the undeserved bounty...