How to convert the 'Â¿' special character in Unix

Question


Asked by Praveen kumar on 13 November 2014 at 12:44

I have a file, file.dat, which contains the line: CNBC: AmericaÂ¿s Gun: The Rise of the AR–15

Unfortunately, it contains some special characters that were not converted properly by iconv on Unix.

$ file -bi file.dat

text/plain; charset=utf-8

$ cat file.dat | cut -c14 | od -x

0000000 bfc2 000a

0000003

Can you please help me convert the special character?

Thanks in advance

-Praveen

There is 1 answer

**tripleee** · Answer 1 · 2014-11-13T13:27:41+00:00

Your file is basically fine. It is valid UTF-8, and the character you are looking at is INVERTED QUESTION MARK (U+00BF), which UTF-8 encodes as the two bytes 0xC2 0xBF. You seem to be viewing the file with some legacy 8-bit character set, which is why those two bytes show up as Â¿. Also, the output of od -x is word-oriented and little-endian on your machine, so each pair of bytes is printed swapped: the sequence is 0xC2 0xBF, not the other way around.
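To make the byte-order point concrete, here is a minimal Python sketch that mimics how od -x groups bytes into 16-bit words. It assumes a little-endian host (as the output in the question suggests); the helper name `od_x_words` is made up for this illustration.

```python
import struct

def od_x_words(data: bytes) -> list[str]:
    """Mimic `od -x` on a little-endian host: read 16-bit words, print as hex."""
    if len(data) % 2:
        data += b"\x00"  # od pads a trailing odd byte with zero
    return [f"{word:04x}" for (word,) in struct.iter_unpack("<H", data)]

# U+00BF followed by the newline that `cut` appends
raw = "\u00bf\n".encode("utf-8")    # bytes C2 BF 0A
words = od_x_words(raw)             # byte pairs appear swapped, as in the question

# Decoding the same two bytes with a legacy 8-bit charset gives the "Â¿" view
legacy_view = raw[:2].decode("latin-1")
```

Here `words` reproduces the `bfc2 000a` shown in the question, even though the bytes on disk are C2 BF 0A. Running `od -c` or `od -t x1` instead prints the bytes one at a time and avoids the confusion.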

This article explains that when Oracle tries to export to an unknown character set, it will replace characters it cannot convert with upside-down question marks. So I guess that's what happened here. The only proper fix is to go back to your Oracle database and export in a format where curly apostrophes are representable (I imagine a curly apostrophe is what the character really should be).

If the file came from somebody else's Oracle database, ask them to do the export again, or ask them what the character should be, or ignore the problem, or guess what character to put there, and use your editor. If there are just a few problem characters, just do it manually. If there are lots, maybe you can use context-sensitive substitution rules like

it¿s => it’s
dog¿s => dog’s
¿problem¿ => ‘‘problem’’
na¿ve => naïve
¿yri¿ispy¿rykk¿ => äyriäispyörykkä (obviously!)
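Rules like the ones above could be sketched as an ordered list of regular expressions. This is only an illustration: the two patterns, the `RULES` table, and the `apply_rules` helper are assumptions made up for this example, and any real rule set would have to be tuned to the actual data and checked by hand.

```python
import re

# Hypothetical context-sensitive rules: (pattern, replacement), applied in order.
RULES = [
    # A placeholder between a word character and a final "s" is probably
    # a possessive or contraction apostrophe: it¿s -> it’s, dog¿s -> dog’s
    (re.compile(r"(?<=\w)\u00bfs\b"), "\u2019s"),
    # A known word with a lost accented letter: na¿ve -> naïve
    (re.compile(r"\bna\u00bfve\b"), "na\u00efve"),
]

def apply_rules(text: str) -> str:
    """Apply each substitution rule in turn; unmatched placeholders are left alone."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

fixed = apply_rules("CNBC: America\u00bfs Gun")
```

Anything the rules do not recognize, including a legitimate inverted question mark at the start of a Spanish sentence, passes through unchanged, so the remaining cases can still be reviewed manually.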

The use of ¿ as a placeholder for "I don't know" is problematic, but Unicode actually has a solution: the REPLACEMENT CHARACTER (U+FFFD). I guess you're not going to like this, but the only valid (context-free) replacement you can perform programmatically is s/\u{00BF}/\u{FFFD}/g (this is Perl-ish pseudocode, but use whatever you like).
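As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the same context-free replacement. The function name `mark_unknown` is made up for this example.

```python
def mark_unknown(text: str) -> str:
    """Replace every INVERTED QUESTION MARK (U+00BF) with REPLACEMENT CHARACTER (U+FFFD)."""
    return text.replace("\u00bf", "\ufffd")

# Work on decoded text, not raw bytes, then encode back to UTF-8
original = b"America\xc2\xbfs Gun".decode("utf-8")
marked = mark_unknown(original)
marked_bytes = marked.encode("utf-8")   # U+FFFD becomes the bytes EF BF BD
```

Note that this also replaces any inverted question marks that were genuinely meant (for example in Spanish text), which is the price of a context-free rule.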
