How to sanitize and store user input that contains an HTML regex pattern in WordPress


I'm working on a WordPress plugin. One of its features is the ability to store an HTML regex pattern, entered by the user, in the DB and then display it on the settings page.

My method actually works, but I wonder whether the code is secure enough:

That's the user entered pattern:

<div(.+?)class='sharedaddy sd-sharing-enabled'(.*?)>(.+?)<\div><\div><\div>

That's the way I'm storing HTML pattern in DB:

$print_options['custom_exclude_pattern'] = htmlentities(stripslashes($_POST['custom_exclude_pattern']),ENT_QUOTES,"UTF-8"); 

That's how it's actually stored in WordPress DB:

s:22:"custom_exclude_pattern";s:109:"&lt;div(.+?)class=&quot;sharedaddy sd-sharing-enabled&quot;(.*?)&gt;(.+?)&lt;\div&gt;&lt;\div&gt;&lt;\div&gt;";

And that's how the output is displayed on settings page:

<input type="text" name="custom_exclude_pattern" value="<?php echo str_replace('"',"'",html_entity_decode($print_options['custom_exclude_pattern'])); ?>" size="30" />

Thanks for help :)


There are 3 answers

2
Scott Arciszewski On BEST ANSWER

From the comments, it sounds like you are concerned about two separate issues (and possibly unaware of a third one that I will mention in a minute) and looking for one solution for both: SQL Injection and Cross-Site Scripting. You have to treat each one separately. I implore you to read this article by Defuse Security.

How to Prevent SQL Injection

This has been answered before on Stack Overflow with respect to PHP applications in general. WordPress's $wpdb supports prepared statements, so you don't necessarily have to figure out how to work with PDO or MySQLi either. (However, any vulnerabilities in their driver WILL affect your plugin. Make sure you read the $wpdb documentation thoroughly.)

You should not escape the parameters before passing them to a prepared statement. You'll just end up with munged data.
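To illustrate the point about not escaping first, here is a minimal sketch of a prepared statement storing the raw pattern and reading it back unchanged. It uses PDO with an in-memory SQLite database as a stand-in so that it runs outside WordPress; the table and column names are hypothetical. Inside a plugin you would use $wpdb->prepare() with %s placeholders instead.

```php
<?php
// Sketch only: PDO with in-memory SQLite stands in for $wpdb here.
// The table name and columns are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE plugin_settings (name TEXT PRIMARY KEY, value TEXT)');

// Raw user input, exactly as submitted: no htmlentities(), no addslashes().
$pattern = "<div(.+?)class='sharedaddy sd-sharing-enabled'(.*?)>(.+?)<\\div>";

// Placeholders keep the data separate from the SQL text.
$insert = $pdo->prepare('INSERT INTO plugin_settings (name, value) VALUES (:name, :value)');
$insert->execute([':name' => 'custom_exclude_pattern', ':value' => $pattern]);

// Read the value back through another prepared statement.
$select = $pdo->prepare('SELECT value FROM plugin_settings WHERE name = :name');
$select->execute([':name' => 'custom_exclude_pattern']);
$stored = $select->fetchColumn();
```

The quotes and angle brackets in the pattern need no special handling on the way in, because the value is never spliced into the SQL string.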

Cross-Site Scripting

As of this writing (June 2015), there are two general situations you need to consider:

  1. The user should not be allowed to submit any HTML, CSS, etc. to this input.
  2. The user is allowed to submit some HTML, CSS, etc. to this input, but we don't want them to be able to hack us by doing so.

The first problem is straightforward enough to solve:

echo htmlentities($dbresult['field'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
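Applied to the settings field from the question, this might look like the sketch below. It assumes the pattern was stored raw, and the helper name render_pattern_input() is made up for illustration. It uses htmlspecialchars() rather than htmlentities() because it escapes only the five HTML-significant characters, which is enough for an attribute value; in WordPress, esc_attr() serves the same purpose.

```php
<?php
// Hypothetical helper: builds the settings-page input from the RAW stored pattern.
function render_pattern_input(string $rawPattern): string
{
    // Escape at the moment of output, for the HTML attribute context.
    $escaped = htmlspecialchars($rawPattern, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return '<input type="text" name="custom_exclude_pattern" value="' . $escaped . '" size="30" />';
}

$pattern = "<div(.+?)class='sharedaddy sd-sharing-enabled'(.*?)>(.+?)<\\div>";
$html = render_pattern_input($pattern);
echo $html, PHP_EOL;
```

Because both quote styles are escaped, there is no need for the str_replace() workaround that swaps double quotes for single quotes, and the browser shows the user exactly what they typed.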

The second problem is a bit tricky. It involves allowing only certain markup while not accidentally allowing other markup that can be leveraged to get Javascript to run in the user's browser. The current gold standard in XSS defense while allowing some HTML is HTML Purifier.

Important!

Whatever your requirements, you should always apply your XSS defense on output, not before inserting stuff into the database. Recently, WordPress core had a stored cross-site scripting vulnerability that resulted from the decision to escape before storing rather than to escape before rendering. By supplying a sufficiently long comment, attackers could trigger a MySQL truncation bug on the escaped text, which allowed them to bypass the escaping.

Bonus: PHP Object Injection from unserialize()

That's how it's actually stored in WordPress DB:

s:22:"custom_exclude_pattern";s:109:"&lt;div(.+?)class=&quot;sharedaddy sd-sharing-enabled&quot;(.*?)&gt;(.+?)&lt;\div&gt;&lt;\div&gt;&lt;\div&gt;";

It looks like you're using serialize() when storing this data and, presumably, using unserialize() when retrieving it. Be careful with unserialize(); if you let users have any control over the string, they can inject PHP objects into your code, which can also lead to Remote Code Execution.

Remote Code Execution, for the record, means they can take over your entire website and possibly the server that hosts your blog. If there is any chance that a user can alter this record directly, I highly recommend using json_encode() and json_decode() instead.
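As a rough sketch of that swap, assuming the options live in a plain array like $print_options from the question:

```php
<?php
// Options array as in the question; the pattern is kept raw.
$print_options = [
    'custom_exclude_pattern' => "<div(.+?)class='sharedaddy sd-sharing-enabled'(.*?)>(.+?)<\\div>",
];

// Encode to a JSON string for storage instead of calling serialize().
$json = json_encode($print_options, JSON_THROW_ON_ERROR);

// Decode to an associative array instead of calling unserialize().
// JSON can only yield scalars, arrays and stdClass objects, so a tampered
// string cannot instantiate arbitrary classes.
$restored = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
```

JSON_THROW_ON_ERROR requires PHP 7.3 or later; on older versions, check json_last_error() after each call instead.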

2
PoPeio On

I hope I got the point, if not then correct me: you are trying to dynamically insert a pattern for an input field, based on the same pattern being stored in your db, right? Well, personally I think patterns are a good help for usability, in that the user knows his input format is not correct without needing to submit and refresh every time. The big problem of patterns is, HTML code can be modified client-side. I believe the only safe solution would be to check server-side for the correctness of the input... There is no way a client side procedure can be safer than a server-side one!

0
PoPeio On

Well, if you are gonna let your user input a regex, you could just do something like prepared statement + htmlentities($input, ENT_COMPAT, "UTF-8"); to sanitize the input, and then do the opposite, that is html_entity_decode($dataFromDb, ENT_COMPAT, "UTF-8");. A must is the prepared statement, all the other ways to work around a malicious input can be combined in lots of different ways!