I am having an issue with an XML file that I am generating from data from my database.
I am specifying an encoding type of UTF-8.
Some of the text contains what appears to be an é character when I view it in a browser or in the database.
However, when I open the XML file in Notepad++, the same character shows as [xE9].
This is the definition at the top of my XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version ="2.0" xmlns:g="http://base.google.com/ns/1.0">
This is an excerpt from my XML file showing the character that is causing the problem. I'm confused as to why it appears as a non-UTF-8 character, as it does below, but it is the reason my XML is not valid.
<description><![CDATA[work appliqu[xE9]dress. Picco three-quarter sleeved style. Cutwork appliqu[xE9]features fitted, with side pockets.]]></description>
In my PHP script I am using the htmlspecialchars function, but it doesn't appear to deal with this character:
<description><![CDATA[<?php echo htmlspecialchars($product['product-description']) ?: 'CRMPicco Online'; ?>]]></description>
Unfortunately, there are a number of instances in the file where this character is present so I can't just remove that one character from the database.
Should I be able to clean this up in PHP?
This can be done using the iconv function in PHP.
I have changed the code to use this, and it works.
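The exact code from the answer is not shown here, so the following is only a minimal sketch of how such a conversion might look. It assumes the database text is stored as ISO-8859-1 (Latin-1), where é is the single byte 0xE9; that byte on its own is not valid UTF-8, which would explain the [xE9] display. htmlspecialchars only escapes markup characters and does not convert between encodings. The helper name toUtf8 is hypothetical, and the source charset should be adjusted if the data is actually in another encoding such as Windows-1252.

```php
<?php
// Hypothetical helper: converts text to UTF-8, ASSUMING that any string
// which is not already valid UTF-8 is encoded as ISO-8859-1 (Latin-1).
function toUtf8($text)
{
    // preg_match() with the /u modifier returns 1 only for valid UTF-8
    // input, so strings that are already UTF-8 are returned untouched.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // Re-encode the Latin-1 bytes as UTF-8 (0xE9 becomes 0xC3 0xA9).
    $converted = iconv('ISO-8859-1', 'UTF-8', $text);

    // iconv() returns false on failure; fall back to an empty string.
    return $converted === false ? '' : $converted;
}

// Example: "\xE9" is é in ISO-8859-1.
$raw = "Cutwork appliqu\xE9 dress";
$description = toUtf8($raw);
```

In the template from the question, the call would wrap the database value before it is escaped, for example htmlspecialchars(toUtf8($product['product-description'])). A longer-term fix may be to make the database connection itself return UTF-8, but that depends on how the data was stored.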