Replace all characters except ascii 32 to 127 and ascii 0, 13, 27 in postgres sql

6.3k views Asked by At

Is there any function which can replace all characters except ascii 32 to 127 and ascii 0, 13, 27 in postgres sql. I do not want to replace spaces, line feeds etc. I want to replace weird characters like club signs, square or a weird asterisk.

I tried modifying regexp_replace like below but it is not working.

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x00-\x7f]', '', 'g') 
--This is giving error ERROR: 22021: invalid byte sequence for encoding "UTF8": 0x00

select *, regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^[:ascii:]]', '', 'g')
--This one is taking everything beyond 255 also in the set. 

Thanks much for your time and help

2

There are 2 answers

2
anubhava On BEST ANSWER

Try unicode range:

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[\u0080-\u00ff]', '', 'g')

Reference

This will remove any character in 128-255 ascii range.

1
Craig Ringer On

You were almost right with:

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x00-\x7f]', '', 'g') 

but the null byte \x00 is not valid in PostgreSQL string literals, so you'd have to start at \x01. Your desired range starts at 32 (0x20), so use that plus a few specific inclusions for 13 (0x0d) and 27 (0x1b):

select regexp_replace('abc$wanto&tore9046move#special~04 chars', '[^\x20-\x7f\x0d\x1b]', '', 'g')

or, with a more useful input:

regress=> select regexp_replace('aáiï*∞ıb ', '[^\x20-\x7f\x0d\x1b]', '', 'g');
 regexp_replace 
----------------
 ai*b 
(1 row)