I know that the string interning optimization exists in both C# and Java.
When I run these two snippets in Java:
public static void main(String[] args) {
    char[] p = {'h', 'e', 'e', 'l', 'l', 'l', 'l'}; // make the string from a char[] so it's not already interned
    String s1 = new String(p);
    String i1 = s1.intern();
    System.out.println(s1 == i1); // true
}
Code number 2:
public static void main(String[] args) {
    char[] p = {'h', 'e', 'e', 'l', 'l', 'l', 'l'}; // make the string from a char[] so it's not already interned
    String s1 = new String(p);
    String i1 = s1.intern();
    System.out.println(s1 == i1); // true
    String o = "heellll";
    System.out.println(o == i1); // true
    System.out.println(s1 == o); // true
}
However, when I run the same two snippets in C#:
unsafe static void Main(string[] args)
{
    char[] p = { 'h', 'e', 'e', 'l', 'l', 'l', 'l' }; // make the string from a char[] so it's not already interned
    String s1 = new String(p);
    String i1 = string.Intern(s1);
    Console.WriteLine(object.ReferenceEquals(s1, i1)); // True
}
Code number 2:
unsafe static void Main(string[] args)
{
    char[] p = { 'h', 'e', 'e', 'l', 'l', 'l', 'l' }; // make the string from a char[] so it's not already interned
    String s1 = new String(p);
    String i1 = string.Intern(s1);
    Console.WriteLine(object.ReferenceEquals(s1, i1)); // False
    String o = "heellll";
    Console.WriteLine(object.ReferenceEquals(o, i1)); // True
    Console.WriteLine(object.ReferenceEquals(s1, o)); // False
}
I can think of two possible explanations, and I hope one of them accounts for why the output of the C# code (code number 2) is not the same as the output of the Java code (code number 2):
My first expectation: in C#, the CLR scans for string literals before it executes any line of your code, so the literal "heellll" from the line String o = "heellll"; is added to the intern pool before String s1 = new String(p); ever runs. When String i1 = string.Intern(s1); executes, s1 is not added to the pool; instead, string.Intern(s1) returns the reference of the already-pooled "heellll". In Java, by contrast, the JVM does not scan string literals before executing your code.
My second expectation: in C#, the CLR gives priority to keeping references to string literals in the pool. It stores the reference to s1 when String i1 = string.Intern(s1); runs, but when it later encounters a string literal with the same value (in the line String o = "heellll";), it replaces the pooled s1 reference with the literal's reference. (If it instead found another string *object* with the same value, it would do nothing.) In Java, by contrast, the JVM gives no such priority to literal references.
So, which scenario is correct? If both are wrong, what is the actual reason the output of the C# code (code number 2) differs from the output of the Java code (code number 2)?
UPD2: See @user85421 comments below this post.
UPD: I was assuming that the JVM checks the string pool in the String constructor, but it actually doesn't. I don't really understand what is happening here; it seems like some JVM optimization. Normative documents suggest that string literals should be loaded into the pool at class-load time (see https://stackoverflow.com/a/3451183/5647513), but small examples show that this doesn't happen.
It seems that the JVM interns literals on first use.
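A minimal Java sketch supporting that observation: if the literal is evaluated *before* intern() is called, the pool already holds the literal's reference, and the Java output matches the C# behavior seen above (the variable names here are illustrative, mirroring the snippets in the question).

```java
public class InternOrder {
    public static void main(String[] args) {
        String o = "heellll";                 // literal evaluated first, so it is pooled now
        char[] p = {'h', 'e', 'e', 'l', 'l', 'l', 'l'};
        String s1 = new String(p);            // fresh object, never pooled by the constructor
        String i1 = s1.intern();              // pool already contains "heellll"
        System.out.println(s1 == i1);         // false: intern() returns the literal's reference
        System.out.println(o == i1);          // true
    }
}
```

Reordering the literal relative to the intern() call flips the first comparison, which is consistent with literals being interned at (or before) their first use rather than at some fixed later point.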
String interning is performed before execution, at class/method load time (in both the JVM (see comments) and the CLR). The reason why object.ReferenceEquals(s1, i1) returns false is that the String constructor in the CLR doesn't check the string pool when called and always creates a new object. You probably want to ask: why does it do so? The exact answer is: the CLR team decided that way. But that's not very informative, so I will speculate a bit on this topic.
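Java's String constructor behaves the same way: it never consults the pool, even when an equal string is already interned. A small sketch:

```java
public class ConstructorAlwaysAllocates {
    public static void main(String[] args) {
        String pooled = "hello";            // a literal, so it sits in the intern pool
        String copy = new String(pooled);   // the constructor allocates a new object regardless
        System.out.println(copy == pooled);          // false: distinct object
        System.out.println(copy.intern() == pooled); // true: intern() finds the pooled one
    }
}
```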
The String constructor is used to create strings at runtime, often from untrusted sources (user input), so checking whether the string is already interned would provide very little benefit (there is a small chance that a random string is in the pool) while slowing down every string creation (a pool lookup is not free; it requires hashing plus thread synchronization).
String interning provided by the runtime rarely finds a good application besides reduced binary size (all duplicate string literals in the source code are collapsed into one). If you know you are going to compare a lot of strings (from a trusted source), a manually crafted string cache would perform better in every case: at the very least, it won't contain all the unrelated strings from your program and all of its dependencies. Caching strings from an untrusted source leads to unbounded heap growth; that can be dealt with if you manage the pool yourself, but the runtime string pool is append-only.
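A hand-rolled cache along those lines might look like the following minimal Java sketch (the class and method names here are illustrative, not from the post):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal manual string cache: unlike the runtime intern pool, it is
// scoped to your code and can be cleared to avoid unbounded heap growth.
class StringCache {
    private final Map<String, String> pool = new HashMap<>();

    // Return the canonical instance for this value, caching it on first sight.
    String dedupe(String s) {
        String cached = pool.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }

    // The runtime string pool offers no equivalent of this.
    void clear() {
        pool.clear();
    }
}
```

Two equal strings passed through dedupe() then share a single reference and can be compared with ==, and the whole cache can be dropped once the comparisons are done.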
Do not use unsafe; it is genuinely unsafe (arguably even more dangerous than writing C/C++ directly) and provides no benefit in regular code.