How to prevent JavaScript injection in Java


I have a rich text area where the user can type something. I am trying to prevent JavaScript injection using the following regex:

return input == null ? null : input.replaceAll("(?i)<script.*?>.*?</script.*?>", "") // case 1
            .replaceAll("(?i)<.*?javascript:.*?>.*?</.*?>", "") // case 2
            .replaceAll("(?i)<.*?\\s+on.*?>.*?</.*?>", ""); // case 3

Above, input is the text from the rich text area and I am using this regex to avoid possible JavaScript injections.

The problem is case 3. If the user's text contains "on", all the text before "on" gets removed.
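
The over-matching can be reproduced in isolation with a minimal sketch like the one below. The class name, method name, and sample input are hypothetical; only the case 3 pattern is taken from the code above. The input contains no event handler at all, just the word "on" in ordinary text followed by a second element.

```java
public class Case3Demo {
    // Case 3 from the question, unchanged.
    static String stripCase3(String input) {
        return input.replaceAll("(?i)<.*?\\s+on.*?>.*?</.*?>", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: harmless markup whose text contains the word "on".
        String harmless = "<p>Click on the button</p><p>More text</p>";

        // "<.*?\\s+on" is satisfied by "<p>Click on", so the match starts at the
        // first "<" and runs to the last closing tag.
        System.out.println("[" + stripCase3(harmless) + "]"); // prints "[]"
    }
}
```

The pattern never checks that "on" sits inside a tag as the start of an attribute name, which is why plain text is enough to trigger it.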

How can I make the last case more rigid and avoid the above problem?

Answer by Igor Deruga (best answer)

If you want to remove "on" and everything up to the end of the tag, you can use this: .replaceAll("(?i)(<.*?\\s+)on.*?(>.*?)", "$1$2");

This renders "ACD" as "ACD". But be aware that if someone puts a ">" character inside the script, the regex will stop at that character and leave the rest of the script in the output.
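
A short sketch of both behaviours follows. It assumes the pattern is meant to read (?i)(<.*?\\s+)on.*?(>.*?) with lazy quantifiers and a doubled backslash in the Java string; the class name, method name, and both sample inputs are hypothetical.

```java
public class OnAttributeDemo {
    // Pattern from the answer, assuming lazy ".*?" quantifiers and "\\s" escaping.
    static String stripOnAttribute(String input) {
        return input.replaceAll("(?i)(<.*?\\s+)on.*?(>.*?)", "$1$2");
    }

    public static void main(String[] args) {
        // Simple case: everything from "on" up to the first ">" is dropped.
        String simple = "<a href=\"x\" onclick=\"alert(1)\">link</a>";
        System.out.println(stripOnAttribute(simple));
        // prints: <a href="x" >link</a>

        // A ">" inside the handler ends the match early, so part of the
        // script survives in the output.
        String tricky = "<a onclick=\"if (a > b) alert(1)\">x</a>";
        System.out.println(stripOnAttribute(tricky));
        // prints: <a > b) alert(1)">x</a>
    }
}
```

The second output shows why a regex that treats the first ">" as the end of the tag is fragile: it cannot tell a ">" inside a quoted attribute value from the one that closes the tag.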

EDIT: the moral of my remark is that I would not recommend custom parsing to remove JavaScript code. I suggest you read the answers to the question "Java: Best way to remove Javascript from HTML" and probably use Jsoup.clean (if that is possible in your environment).