How to convert HTML-ENTITIES and preg_replace in PHP

I'm trying to convert &nbsp; to whitespace and then run a regex over the result with preg_replace, like this:

$title = "&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2";
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)

// So now I have TEST < Ok.2-2
// I want a space after "Ok.", so I use preg_replace()
$replace = "~\s+(ok[.]?)~i";
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);

// The result is still TEST < Ok.2-2 (does not work!)
echo($title);

With this code, mb_convert_encoding and html_entity_decode both work well, but when I then use preg_replace to match the whitespace, the regex does not seem to find the whitespace that was converted from &nbsp;.

Current output: TEST < Ok.2-2

Expected output: TEST < OK. 2-2

MY CURRENT SOLUTION

I added a str_replace call that hard-codes the replacement of &nbsp; with a plain space, and I still use mb_convert_encoding or html_entity_decode to convert the other HTML entities.

$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = str_replace('&nbsp;', ' ', $title);
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)

// So now I have TEST < Ok.2-2
// I want a space after "Ok.", so I use preg_replace()
$replace = '~\s+(ok[.]?)~i';
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);

// The result is TEST < OK. 2-2 (works!)
echo($title);

Current output: TEST < OK. 2-2

Expected output: TEST < OK. 2-2

Any suggestions for a better solution?

Answer by chris85

I think this will give you what you are after.

$title = trim(
    preg_replace(
        '~\s+~',
        ' ',
        str_ireplace(array('&nbsp;', ' ok.'), array(' ', ' OK. '), "&nbsp;TEST&nbsp;Ok.2-2")
    )
);

This will (listed from the outermost call inward, so execution runs from step 4 up to step 1):

  1. Strip leading and trailing whitespace (trim)
  2. Replace each run of whitespace with a single space (preg_replace('~\s+~', ' ', ...))
  3. Replace &nbsp; with a single space (str_ireplace)
  4. Replace " ok." (case-insensitive) with " OK. " (str_ireplace)

Output:

TEST OK. 2-2

Your html_entity_decode example is correct: http://sandbox.onlinephpfunctions.com/code/eed7e30d507f7197585f29c1fdde9e7744fc572d

$title = html_entity_decode("&nbsp;TEST&nbsp;Ok.2-2", ENT_NOQUOTES, 'UTF-8');
echo $title;

Output:

 TEST Ok.2-2
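
The first character of that output looks like a space, but html_entity_decode turns &nbsp; into a non-breaking space (U+00A0), not an ASCII space. That is the likely reason the `\s+` pattern in the question finds nothing: without the `u` modifier the pattern works on bytes, and `\s` typically does not cover the two UTF-8 bytes of U+00A0. The sketch below is only an illustration of that point; the variable names are made up, and it assumes the input is valid UTF-8.

```php
<?php
// Decode the entities exactly as in the example above.
$decoded = html_entity_decode("&nbsp;TEST&nbsp;Ok.2-2", ENT_NOQUOTES, 'UTF-8');

// Inspect the first two bytes: U+00A0 is encoded as C2 A0 in UTF-8,
// whereas an ordinary space would be the single byte 20.
$leadingBytes = bin2hex(substr($decoded, 0, 2));
echo $leadingBytes, "\n"; // c2a0

// With the /u modifier the pattern works on code points, so U+00A0 can be
// named explicitly next to \s and replaced with an ordinary space.
$cleaned = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $decoded));
echo $cleaned, "\n"; // TEST Ok.2-2
```

Naming `\x{00A0}` explicitly keeps the pattern independent of how `\s` is defined in a given PCRE build.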

Edit:

<?php
$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = trim(preg_replace('~\s+~', ' ', str_ireplace(array('&nbsp;', '&lt;', 'Ok.'), array(' ', '', ' OK. '), $title)));
echo $title;

It's probably safer to handle just these two entities with str_ireplace. If your string were <h1>&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2</h1> and you decoded it and then removed every <, the markup would no longer work as it did before.

Output:

TEST OK. 2-2
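
If the title is handled as plain text rather than markup, another option is to decode every entity and make the regex aware of the non-breaking space, instead of hard-coding each entity. The following is a minimal sketch under that assumption, not a definitive implementation: normalize_title is a hypothetical helper name, the input is assumed to be valid UTF-8, and html_entity_decode is used because the 'HTML-ENTITIES' encoding of mb_convert_encoding is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper; the name and structure are illustrative only.
function normalize_title(string $title): string
{
    // Decode every entity: &nbsp; becomes U+00A0 and &lt; becomes "<".
    $title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');

    // Turn each run of whitespace, including U+00A0, into one ASCII space.
    // The /u modifier makes the pattern operate on UTF-8 code points.
    $title = preg_replace('/[\s\x{00A0}]+/u', ' ', $title);

    // Uppercase "ok" (with an optional dot) after a space and pad it.
    $title = preg_replace('~ ok[.]?~i', ' OK. ', $title);

    // Collapse any doubled spaces the padding introduced, then trim.
    $title = preg_replace('/ +/', ' ', $title);

    return trim($title);
}

echo normalize_title('&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2'), "\n"; // TEST < OK. 2-2
```

Because this keeps the decoded "<", the result should be escaped again (for example with htmlspecialchars) before it is written back into HTML.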