I have an image containing IPTC data and use the following command to extract the IPTC as textual data:
convert image.jpg IPTCTEXT:iptc.txt
The problem is that this seems to be using entities for "special characters":
2#120#Caption="Beschreibung f&#195;&#188;r den Import aus IPTC"
Actually it should be "f&#252;r" here. But instead of getting the correct entity &#252; for the "ü" character, I get two entities (probably the two bytes of the UTF-8 encoded character were each converted to an entity separately), and I cannot parse these two entities correctly.
Is there any way to get the correct entity, or to disable the entities completely and get UTF-8 characters back?
Edit: I tried parsing the entities using StringEscapeUtils.unescapeXml in Java, but I get two characters ("Ã¼") instead of "ü", because the two entities are unescaped separately.
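One possible workaround on the Java side, as a minimal sketch: treat each numeric entity as a single raw byte rather than a character, then decode the resulting byte sequence as UTF-8. This assumes the entities really are the individual bytes of a UTF-8 string (which the 195/188 pair suggests) and that every entity value is below 256. The class name IptcEntityDecoder and the method decode are made up for illustration; they are not part of ImageMagick or Commons Lang.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class IptcEntityDecoder {

    // Matches decimal numeric entities such as &#195;
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d{1,3});");

    static String decode(String escaped) {
        Matcher m = NUMERIC_ENTITY.matcher(escaped);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text before the entity unchanged.
            sb.append(escaped, last, m.start());
            // Map the entity value to the char with the same code (0..255 expected).
            sb.append((char) Integer.parseInt(m.group(1)));
            last = m.end();
        }
        sb.append(escaped, last, escaped.length());

        // ISO-8859-1 maps chars 0..255 one-to-one onto bytes, which recovers the
        // raw byte sequence. Assumption: no char above 255 is present; such a
        // char would be replaced by '?' in this step.
        byte[] raw = sb.toString().getBytes(StandardCharsets.ISO_8859_1);

        // Reinterpret the raw bytes as UTF-8.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("Beschreibung f&#195;&#188;r den Import aus IPTC"));
    }
}
```

This only repairs the text after the fact; it does not change what convert writes. Text without any entities passes through unchanged as long as it is plain ASCII.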
Edit2: Example image here: http://fs1.directupload.net/images/150615/5eiv6wwf.jpg
I am not sure why you are seeing something different from me. I am running ImageMagick 6.9.1-4 on a Mac under OS X.
If I do this:
I get this:
And if I hex dump that, I get this:
I think it may be related to your Terminal's locale settings - although I don't know why it still happens when you redirect to a file. Have you tried things like: