Design pattern for blocking undesirable content

Last year I was working on a Christmas project which allowed customers to send emails to each other with a 256-character free-text field for their Christmas request. The project worked by searching the (very large) product database for suggested products that matched the text field, but offered a free-text option for those customers who could not find the product in question.

One obvious concern was the opportunity for customers to send rather explicit requests to some unsuspecting customer, wrapped in the company's branding.

The project did not go ahead in the end, for various reasons, the profanity aspect being one.

However, I've come back to thinking about the project and wondering what kinds of validation could be used here. I'm aware of clbuttic (where naive substring replacement turns "classic" into "clbuttic"), which I know is the standard response to any question of this nature.

The solutions that I considered were:

  • Run it through something like WebPurify
  • Use MechanicalTurk
  • Write a regex pattern which looks for each word in a blacklist. A more complicated version of this would consider plurals and past tenses of the word as well.
  • Write an array of suspicious words and score each one. If the submission's total goes above a threshold, the validation fails. (A sketch of these last two options follows this list.)
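
To make the last two options concrete, here is a minimal PHP sketch combining them. The word list, weights, threshold and the `$requestText` variable are all placeholders you would have to supply and tune against real submissions:

```php
<?php
// Sketch of the regex + scored-word-list approach (options 3 and 4).
// The words and weights below are placeholders, not a recommended list.
$scoredWords = [
    'badword'  => 5,   // hypothetical entries, weighted by severity
    'rudeword' => 2,
];

function profanityScore(string $text, array $scoredWords): int
{
    $score = 0;
    foreach ($scoredWords as $word => $weight) {
        // \b word boundaries avoid matching substrings inside innocent
        // words (the clbuttic problem); the optional s/ed suffix crudely
        // covers plurals and past tenses.
        $pattern = '/\b' . preg_quote($word, '/') . '(?:s|ed)?\b/i';
        $score  += $weight * preg_match_all($pattern, $text);
    }
    return $score;
}

$threshold = 5; // tune against real data
if (profanityScore($requestText, $scoredWords) >= $threshold) {
    // Fail validation, or better, divert to a moderation queue.
}
```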

So there are two questions:

  1. If the submission fails, how do you handle it from a UI perspective?
  2. What are the pros and cons of these solutions, or any others that you can suggest?

NB - answers like "profanity filters are evil" are irrelevant. In this semi-hypothetical situation, I haven't decided to implement a profanity filter or been given the choice of whether or not to implement one. I just have to do the best I can with my programming skills (on a LAMP stack if possible).

There are 3 answers

Sander Marechal (Best Answer)

Have you thought about Bayesian filtering? Bayesian filtering is not just for detecting spam; you can train it on a variety of text recognition tasks. Grab a Bayesian filter, collect a bunch of request texts and start marking them as containing profanity or not. After some time (how much depends a lot on the amount and type of training data) your filter will be able to separate requests containing profanity from those containing none.

It's not fool-proof, but it's much, much better than simple string matching and trying to deal with clbuttic problems. You have a variety of options for Bayesian filtering in PHP.

bogofilter

Bogofilter is a stand-alone Bayesian filter that runs on any unix-y OS. It's targeted at filtering e-mail but you can train it on any kind of text. I have successfully used it to implement a custom comment spam filter for my own website (source). You can interface with bogofilter as you would with any other command-line application. See my source code link for an example.
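
For illustration, a minimal PHP wrapper could look like the sketch below. It assumes bogofilter's documented exit-code convention (0 = spam, 1 = ham, 2 = unsure), with "spam" simply repurposed to mean "profane":

```php
<?php
// Sketch: classify a request by piping it to bogofilter on stdin.
// Exit codes per the bogofilter man page: 0 = spam ("profane"),
// 1 = ham ("clean"), 2 = unsure.
function classifyRequest(string $text): string
{
    $proc = proc_open('bogofilter', [
        0 => ['pipe', 'r'],   // stdin
        1 => ['pipe', 'w'],   // stdout (unused here)
        2 => ['pipe', 'w'],   // stderr (unused here)
    ], $pipes);

    fwrite($pipes[0], $text);
    fclose($pipes[0]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    switch (proc_close($proc)) {
        case 0:  return 'profane';
        case 1:  return 'clean';
        default: return 'unsure';   // hand these to a human
    }
}

// Training works the same way: pipe labelled examples in with
// `bogofilter -s` (register as spam/profane) or `bogofilter -n`
// (register as ham/clean).
```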

Roll your own

If you like a challenge, you could implement a Bayesian filter entirely from scratch. Here's a decent article about implementing a Bayesian filter in PHP.
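
To give a feel for what "from scratch" involves, here is a bare-bones naive Bayes classifier in PHP. Treat it purely as a starting point: a real one would need persistent storage for the counts, better tokenisation, and more careful smoothing.

```php
<?php
// Bare-bones naive Bayes with add-one (Laplace) smoothing. Counts live
// in plain arrays; persist them (database, file) in anything real.
// Train at least one example of each class before calling classify().
class NaiveBayes
{
    private array $wordCounts = ['profane' => [], 'clean' => []];
    private array $docCounts  = ['profane' => 0,  'clean' => 0];

    private function tokens(string $text): array
    {
        preg_match_all("/[a-z']+/", strtolower($text), $m);
        return $m[0];
    }

    public function train(string $text, string $label): void
    {
        $this->docCounts[$label]++;
        foreach ($this->tokens($text) as $w) {
            $this->wordCounts[$label][$w] = ($this->wordCounts[$label][$w] ?? 0) + 1;
        }
    }

    public function classify(string $text): string
    {
        $totalDocs = array_sum($this->docCounts);
        // Array union of the two count maps gives the vocabulary size.
        $vocab = count($this->wordCounts['profane'] + $this->wordCounts['clean']);
        $best = 'clean';
        $bestScore = -INF;
        foreach (['profane', 'clean'] as $label) {
            // log P(label) + sum over words of log P(word | label)
            $score = log($this->docCounts[$label] / $totalDocs);
            $labelTotal = array_sum($this->wordCounts[$label]);
            foreach ($this->tokens($text) as $w) {
                $count  = $this->wordCounts[$label][$w] ?? 0;
                $score += log(($count + 1) / ($labelTotal + $vocab));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $label;
            }
        }
        return $best;
    }
}
```

Usage is just `$filter->train($text, 'profane')` over a pile of labelled requests, then `$filter->classify($newText)`; don't trust its output until it has seen a few hundred examples of each class.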

Existing PHP libraries

There are also ready-made Bayesian classifier libraries for PHP, if you would rather not shell out to an external tool or write your own.

(Ab)use an existing e-mail filter

You could use a standard SpamAssassin or DSpam installation and train it to recognise profanity. Just make sure that you disable options specifically aimed at e-mail messages (e.g. parsing MIME blocks, reading headers) and enable only the options that deal with Bayesian text processing. DSpam may be easier to adapt. SpamAssassin has the advantage that you can add custom rules on top of the Bayesian filter; if you go that route, make sure you disable all the default rules and write your own instead, since the defaults are all targeted at spam e-mail detection.
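
As a rough illustration of the SpamAssassin route: training goes through sa-learn, and classification can key off spamassassin's --exit-code switch. The plumbing below is a sketch under those assumptions, with "spam" again standing in for "profane" and `$requestText` as the hypothetical submitted text:

```php
<?php
// Sketch: shelling out to a stock SpamAssassin install. sa-learn
// trains the Bayes database; `spamassassin -e` (--exit-code) exits
// non-zero when the piped text scores as spam, read here as "profane".
// SpamAssassin expects e-mail messages, so a bare 256-character
// request is unusual input; verify the Bayes scoring behaves sensibly
// on it before relying on this.
$text = escapeshellarg($requestText);

// Offline training over labelled examples:
//   echo $text | sa-learn --spam   (profane examples)
//   echo $text | sa-learn --ham    (clean examples)

exec("echo $text | spamassassin -e > /dev/null 2>&1", $output, $exitCode);
$isProfane = ($exitCode !== 0);
```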

Zach Rattner

In the past, I've used a glorified form of str_replace (sketched after the list below). Here was my rationale:

  1. Profane words could safely be replaced by silly words, preserving the point of the message while discouraging the use of profanity
  2. On successful posts where filtering took place, users were shown a success message, but with a notification that sanitization had taken place (something like, "Your post was added, potty mouth.")
  3. I didn't ever want the submission to fail. Posts were either posted uncensored, or censored. In your case, you might want to prevent profane posts entirely.
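
In PHP, that could look something like the sketch below. The word lists are placeholders, and word-boundary regexes stand in for raw str_replace calls so innocent words don't get mangled:

```php
<?php
// Sketch of the "replace with silly words" approach. The lists are
// placeholders. Word boundaries (\b) keep 'classic' from becoming
// 'clbuttic'.
$replacements = [
    'badword'  => 'sugarplum',
    'rudeword' => 'mistletoe',
];

$sanitized   = $original;   // $original: the submitted message
$wasFiltered = false;

foreach ($replacements as $word => $silly) {
    $pattern     = '/\b' . preg_quote($word, '/') . '\b/i';
    $sanitized   = preg_replace($pattern, $silly, $sanitized, -1, $count);
    $wasFiltered = $wasFiltered || $count > 0;
}

if ($wasFiltered) {
    $notice = 'Your post was added, potty mouth.';
}
```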

For what it's worth, Apple only recently stopped banning obscene language in their free laser engravings. Perhaps they had a reasonable rationale?

Syntax Error

What about using a few string matching rules and putting only the messages that match into a moderation queue?

It sounds like many requests may not use the free-text field at all, so those should go through safely.

Then only a small percentage should trip your string matches and end up in moderation. Even with a large userbase this should keep moderation time to a minimum. You might even make obvious profanity like the f-word or n-word an automatic fail to cut the remaining list down even more.

Make your moderation page easy to use and highlight the words that flagged each message; that should make it a quick process to scan through and clean up. Adjust as needed if people are trying to post too much garbage or if there are too many false positives.
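
The flag-and-highlight step might look like this in PHP; the watch list is a placeholder, and <mark> tags are one hypothetical way to make matches stand out on the moderation page:

```php
<?php
// Sketch: flag a message for moderation and pre-render a highlighted
// copy for the moderator. The watch list is a placeholder; $message
// is the submitted request text.
$watchWords = ['badword', 'rudeword'];
$pattern = '/\b(' . implode('|', array_map(
    fn ($w) => preg_quote($w, '/'),
    $watchWords
)) . ')\b/i';

if (preg_match($pattern, $message)) {
    // Escape first, then wrap each hit so it stands out on the page.
    $highlighted = preg_replace(
        $pattern,
        '<mark>$1</mark>',
        htmlspecialchars($message)
    );
    // ...store $message in the moderation queue along with $highlighted.
}
```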

Or just use this strategy together with Bayesian filtering, like @Sander suggested.

Edit: A "report abuse" button will also help you find out if bad stuff is getting through, but it would involve saving sent messages for a while, which might not be ideal if traffic is high.