The following code produces the output "Hello World!" (no, really, try it).
public class Main {
    public static void main(String... args) {
        // The comment below is not a typo.
        // \u000d System.out.println("Hello World!");
    }
}
The reason is that the Java compiler translates the Unicode escape \u000d (a carriage return) into an actual line terminator before it looks for comments, so the code is effectively transformed into:
public static void main(String... args) {
// The comment below is not a typo.
//
System.out.println("Hello World!");
}
As a result, code that appears to be inside a comment is "executed".
Since this can be used to "hide" malicious code, or whatever else an evil programmer can conceive, why is it allowed in comments?
Why is this allowed by the Java specification?
Unicode escape translation takes place before any other lexical translation. The key benefit of this is that it makes it trivial to go back and forth between ASCII and any other encoding. You don't even need to figure out where comments begin and end!
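To make the ordering concrete, here is a simplified sketch of that translation step. The class `UnicodeEscapes` and method `decode` are hypothetical names for illustration, not part of any JDK API. It follows the JLS 3.3 rules as I understand them: a backslash starts an escape only when it is preceded by an even number of contiguous backslashes, one or more `u` markers may follow, and exactly four ASCII hex digits must come next. It skips some corner cases (for example, how a backslash produced by an escape interacts with a following backslash). Note that it never looks for comments or string literals, which is exactly why \u000d inside a `//` comment becomes a real line break.

```java
/**
 * Simplified sketch of JLS 3.3 Unicode escape translation (hypothetical helper).
 * It runs over raw source text without knowing about comments or literals.
 */
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    static String decode(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int backslashRun = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0;
            if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // one or more 'u' markers are allowed
                }
                if (j + 4 > source.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int code = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(source.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("Illegal escape at index " + i);
                    }
                    code = code * 16 + digit;
                }
                out.append((char) code);
                i = j + 4;
                backslashRun = 0; // the decoded character is never rescanned as an escape
            } else {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // Only ASCII hex digits are accepted, as in the JLS grammar.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }
}
```

Applied to the raw text of the question's comment line, `decode` turns the escape into a carriage return, after which a lexer sees `System.out.println("Hello World!");` on a line of its own.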
As stated in JLS Section 3.3, this allows any ASCII-based tool to process the source files.
This gives a fundamental guarantee of platform independence (independence from the supported character sets), which has always been a key goal of the Java platform.
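JLS 3.3 describes the ASCII form concretely: every existing Unicode escape gets one extra `u` (\uxxxx becomes \uuxxxx), and every non-ASCII character becomes an escape with a single `u`. A minimal sketch of that direction might look like this; `AsciiTransform` and `toAscii` are hypothetical names, not a JDK API.

```java
/** Hypothetical sketch of the JLS 3.3 "Unicode to ASCII" source transformation. */
final class AsciiTransform {
    private AsciiTransform() {}

    static String toAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int backslashRun = 0; // contiguous backslashes immediately before index i
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                boolean startsEscape = backslashRun % 2 == 0
                        && i + 1 < source.length() && source.charAt(i + 1) == 'u';
                out.append(c);
                if (startsEscape) {
                    out.append('u'); // an existing escape gets one extra 'u' marker
                }
                backslashRun++;
            } else {
                backslashRun = 0;
                if (c < 0x80) {
                    out.append(c); // ASCII passes through unchanged
                } else {
                    // Each non-ASCII UTF-16 code unit becomes an escape with a single 'u'.
                    out.append(String.format("\\u%04x", (int) c));
                }
            }
        }
        return out.toString();
    }
}
```

Because the compiler accepts any number of `u` markers, the output should compile to the same program. The sketch does not handle a non-ASCII character that directly follows an odd run of backslashes (possible only inside a comment), where the newly written escape would not be eligible.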
Being able to write any Unicode character anywhere in the file is a neat feature, and it is especially important in comments when documenting code in non-Latin languages. The fact that it can interfere with the semantics in such subtle ways is just an (unfortunate) side effect.
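If that side effect is a concern in a code base, one defensive option is to scan sources for escapes that decode to a line terminator, since those are what let code slip out of a `//` comment. This is a hypothetical lint-style sketch (`LineTerminatorEscapes` and `find` are made-up names), not something javac does; it reuses the JLS 3.3 backslash-counting rule and only checks LF and CR, so other tricks (such as an escaped `*/` in a block comment) would need similar checks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical lint-style check: report Unicode escapes that decode to a line
 * terminator (LF or CR), since those can end a // comment early.
 */
final class LineTerminatorEscapes {
    private LineTerminatorEscapes() {}

    static List<Integer> find(String source) {
        List<Integer> hits = new ArrayList<>();
        int backslashRun = 0; // contiguous backslashes immediately before index i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '\\' && backslashRun % 2 == 0
                    && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // skip the one or more 'u' markers
                }
                if (j + 4 <= source.length()) {
                    String hex = source.substring(j, j + 4);
                    if (hex.equalsIgnoreCase("000a") || hex.equalsIgnoreCase("000d")) {
                        hits.add(i); // index of the backslash that starts the escape
                    }
                    j += 4; // malformed digits would be a compile error anyway
                }
                i = j;
                backslashRun = 0;
            } else {
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
            }
        }
        return hits;
    }
}
```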
There are many gotchas on this theme, and Java Puzzlers by Joshua Bloch and Neal Gafter includes the following variant:
(This program turns out to be a plain "Hello World" program.)
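A much smaller illustration in the same spirit (written for this answer, not taken from the book; `EscapedKeywords`, `greet` and `quoteGotcha` are hypothetical names) shows that escapes work for keywords and identifiers too. It also shows that an escaped quote inside a string literal ends the literal, because translation happens before tokenization:

```java
final class EscapedKeywords {
    private EscapedKeywords() {}

    // The escapes spell "static" and "greet", so javac reads: static String greet() {
    \u0073\u0074\u0061\u0074\u0069\u0063 String \u0067\u0072\u0065\u0065\u0074() {
        return "Hello World";
    }

    // The escapes turn into quote characters before the lexer runs, so javac
    // reads return "" + ""; and the method returns an empty string.
    static String quoteGotcha() {
        return "\u0022 + \u0022";
    }
}
```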
In the solution to the puzzler, they point out the following:
Source: Java: Executing code in comments?!