I'm using http://caja.appspot.com/html-css-sanitizer-minified.js to sanitize user HTML; however, in some instances I want to restrict the allowed tags to just a white list.
I've found https://code.google.com/p/google-caja/wiki/CajaWhitelists which describes how to define a white list, but I can't work out how to pass it to the html_sanitize method provided by html-css-sanitizer-minified.js
I've tried calling html.sanitizeWithPolicy(the_html, white_list); but I get an error:
TypeError: a is not a function
This is hard to debug because of the minification, but it seems likely that html-css-sanitizer-minified.js does not contain everything in the html-sanitizer.js file.
I've tried using html-sanitizer.js combined with cssparser.js instead of the minified version, but I get errors before calling it, presumably because I am missing other dependencies.
How can I make this work?
Edit: sanitizeWithPolicy does exist in the minified file, but something is missing further down the process. This suggests that this file can't be used with a custom white list. I'm now investigating whether it is possible to work out which unminified files I need to include to make my own version.
Edit2: I was missing two files https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html4-defs.js?spec=svn1950&r=1950 and https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/uri.js?r=5170
However, I am now getting an error because sanitizeWithPolicy expects a function, not a whitelist object. Also, the html4-defs.js file is very old, and according to this I would have to build the Caja project in order to get a more recent one.
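Since sanitizeWithPolicy wants a function, one option is to wrap the whitelist object in a tag policy function. The following is a minimal sketch, not part of the library. It assumes the policy is called as `tagPolicy(tagName, attribs)`, where `attribs` is a flat array of alternating attribute names and values, and that it returns `{ attribs: [...] }` to keep a tag or `null` to drop it. That is my reading of the comments in html-sanitizer.js, so check it against the revision you use. The helper name `makeWhitelistPolicy` and the whitelist shape are hypothetical.

```javascript
// Hypothetical helper: turns a whitelist object into a tag policy function.
// Whitelist shape (an assumption for this sketch):
//   { tagName: [allowed attribute names] }
function makeWhitelistPolicy(whitelist) {
  return function (tagName, attribs) {
    if (!Object.prototype.hasOwnProperty.call(whitelist, tagName)) {
      return null; // tag is not on the whitelist: drop it
    }
    const allowed = whitelist[tagName];
    const kept = [];
    // attribs is assumed to be [name0, value0, name1, value1, ...]
    for (let i = 0; i + 1 < attribs.length; i += 2) {
      if (allowed.indexOf(attribs[i]) !== -1) {
        kept.push(attribs[i], attribs[i + 1]);
      }
    }
    return { attribs: kept };
  };
}

const policy = makeWhitelistPolicy({ b: [], a: ['title'] });
// With the library loaded (assumed call):
//   html.sanitizeWithPolicy(the_html, policy);
```

Note that this sketch only filters attribute names. Attribute values such as URLs are passed through unchanged, so attributes like href would still need the library's own attribute sanitizing before being allowed.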
I solved this by downloading the unminified files:
https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js
https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/uri.js
https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html4-defs.js?spec=svn1950&r=1950 (This last one is from an old revision. The file is generated from the Java sources, so it would be great if a more up-to-date one were available.)
I then added a new function to html-sanitizer.js
I then made this function public by adding this near the end of the file, with the other public function statements.
Now I can call it like so:
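As a sketch of the general shape such a function and call could take (not necessarily the exact code used here): a wrapper that lets an existing tag policy do the attribute sanitizing but only for whitelisted tags. The names `restrictToTags` and `stubBasePolicy` are hypothetical, and the stub only stands in for the library's default policy so the example runs on its own. With html-sanitizer.js loaded, `html.makeTagPolicy()` would presumably be passed instead, assuming that revision exports it.

```javascript
// Hypothetical helper: wraps an existing tag policy so that only
// whitelisted tags reach it; every other tag is dropped.
function restrictToTags(basePolicy, allowedTags) {
  return function (tagName, attribs) {
    if (allowedTags.indexOf(tagName) === -1) {
      return null; // not whitelisted: drop the tag
    }
    return basePolicy(tagName, attribs);
  };
}

// Stand-in for the library's default policy (assumption: the real one
// returns { attribs: sanitizedAttribs } for tags it accepts).
function stubBasePolicy(tagName, attribs) {
  return { attribs: attribs };
}

const restricted = restrictToTags(stubBasePolicy, ['p', 'b', 'i']);
// With the library loaded (assumed call):
//   html.sanitizeWithPolicy(the_html, restricted);
```

Wrapping rather than replacing the base policy keeps whatever attribute and URL handling the library already does for the tags that remain allowed.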
I then manually added some HTML5 elements to the html4-defs.js file (the ones that just define block elements).
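To illustrate the kind of edit involved, here is a sketch using a stand-in object rather than the real html4-defs.js. It assumes the definitions file exposes an `ELEMENTS` table mapping tag names to bit flags and that a value of 0 means an ordinary element with no special handling. Verify both against your copy of the file. The helper `addPlainElements` and the tag names other than 'nav' are purely illustrative.

```javascript
// Stand-in for the table defined in html4-defs.js: tag name -> bit flags.
// Values here are illustrative only.
const html4 = { ELEMENTS: { div: 0, p: 1 } };

// Hypothetical helper: registers extra tag names as ordinary elements
// without overwriting definitions that already exist.
function addPlainElements(defs, tagNames) {
  tagNames.forEach(function (name) {
    if (!Object.prototype.hasOwnProperty.call(defs.ELEMENTS, name)) {
      defs.ELEMENTS[name] = 0; // assumed: 0 = no special flags
    }
  });
}

addPlainElements(html4, ['nav', 'section', 'article', 'p']);
```

Attributes on the new elements would need their own entries in the attribute definitions as well; this sketch only covers the tag names.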
Attribute sanitization was still broken, because the html4-defs.js file is out of date relative to html-sanitizer.js. I changed this in html-sanitizer.js:
to
This is far from ideal, but without compiling Caja and generating an up-to-date html4-defs.js file I can't see a way around it.
This still leaves CSS sanitization. I would like this as well, but I am missing the CSS definition files and can't find any working ones by searching, so I have turned it off for now.
EDIT: I've managed to extract the html-defs from html-css-sanitizer-minified.js. I've uploaded a copy to here. It includes elements like 'nav' so it has been updated for html5.
I've tried to do the same for the CSS parsing. I managed to extract the defs, but they depend on a bit count, and I can't find any way to calculate which bits were used for which defaults.