I would like to allow users to post HTML to a site, but I need to ensure that no JavaScript is injected into it.
So far I have created a validation attribute to check the incoming HTML for dodgy doings:
    [AttributeUsage(AttributeTargets.Property,
        AllowMultiple = false, Inherited = true)]
    public class CheckHtml : ValidationAttribute, IMetadataAware {
        private static Regex _check = new Regex(
            @"<script[^>]*>.*?<\/script>|<[^>]*(click|mousedown|mouseup|mousemove|keypress|keydown|keyup)[^>]*>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

        protected override ValidationResult IsValid(
            object value, ValidationContext validationContext) {
            if (value != null
                && _check.IsMatch(value.ToString())) {
                return new ValidationResult("Content is not acceptable");
            }
            return ValidationResult.Success;
        }

        /// <summary>
        /// <para>Allow Html</para>
        /// </summary>
        public void OnMetadataCreated(ModelMetadata metadata) {
            if (metadata == null) {
                throw new ArgumentNullException("metadata");
            }
            metadata.RequestValidationEnabled = false;
        }
    }
Is this going to be enough? What do you do to check for such naughtiness?
Take a look at the Microsoft AntiXSS library. It boasts an
AntiXSS.GetSafeHtmlFragment()
method, which returns the HTML stripped of all the XSS badness.
As David has pointed out, a whitelist is always the way to go. AntiXSS keeps only a whitelist of HTML elements and attributes that are safe against XSS, which filters out JavaScript.
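To see why a blacklist falls short, here is a small sketch that runs the regex from the question against a few strings. The sample inputs are my own illustrations (they are not from the question), and the snippet is written as C# 9 top-level statements purely for brevity, so treat it as a demonstration rather than production code.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern and options as the CheckHtml attribute in the question.
var check = new Regex(
    @"<script[^>]*>.*?<\/script>|<[^>]*(click|mousedown|mouseup|mousemove|keypress|keydown|keyup)[^>]*>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Hypothetical sample inputs, chosen only to illustrate the gaps.
var samples = new[]
{
    "<script>alert(1)</script>",               // matched: classic script block
    "<img src=x onerror=alert(1)>",            // not matched: onerror is not in the event list
    "<a href=\"javascript:alert(1)\">hi</a>",  // not matched: javascript: URLs are not checked
    "<svg onload=alert(1)>",                   // not matched: onload is not in the event list
    "<a href=\"/click-here\">harmless</a>",    // matched: "click" appears inside the tag
};

foreach (var html in samples)
{
    // IsMatch == true means the validator would reject the content.
    Console.WriteLine($"{(check.IsMatch(html) ? "rejected" : "accepted")}: {html}");
}
```

The first sample is rejected as intended. The next three are accepted even though each can run script in a browser, and the last, harmless link is rejected only because its URL contains the word "click". A whitelist avoids both problems because anything not explicitly allowed is dropped.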