Why the private package String constructor (int, int, char[]) has been removed?

164 views Asked by At

In Java 6, there was a package private constructor to return a new String with the offset being changed.

643     // Package private constructor which shares value array for speed.
644     String(int offset, int count, char value[]) {
645         this.value = value;
646         this.offset = offset;
647         this.count = count;
648     }

It has been marked deprecated in Java 7 and being removed in Java 8. I was relying on some API that made calls to subSequence repeatedly, until I experienced a performance problem.

Digging into the code I saw that subSequence were using this constructor in Java 6. But as for now, it uses another one that copies the underlying array and start at the desired and ended offset, thus making it an O(1) operation to O(n).

Replacing the problematic call to subSequence has increased the performance by a factor of 10.

I was wondering why such a change was made. The only thing I am thinking of is that it could create potential memory leaks, for example:

String veryLargeString = ....;
String target = veryLargeString.substring(0, 10);
//assume I don't need anymore veryLargeString at this point

At that point the underlying char array can't be GC because it is still used by the target String. Thus you have in memory a large array but you only need the first 10 values of it.

Is this the only good use case or is there other reasons why this constructor has been removed?

1

There are 1 answers

0
Jon Skeet On BEST ANSWER

Yes, String was changed significantly in Java 7 update 6 - now separate String objects never share an underlying char[]. This was definitely a trade-off:

  • Strings no longer need to maintain an offset and length (saving two fields per instance; not a lot, but it is for every string...)
  • One small string can't end up keeping a huge char[] alive (as per your post)
  • ... but operations that were previously cheap now end up creating copies

In some use cases, the previous code would work better - in others, the new code would work better. It sounds like you're in the first camp, unfortunately. I can't imagine this decision was made lightly, however - I suspect that a lot of tests against different common work loads were made.