I have a question about how to handle HL7v2 encoding characters that appear when using non-standard (non-7-bit-ASCII) character sets. As an example, here is part of an HL7v2 message:
MSH|^~\&|appl|fac|||20240314081500||ORM^O01|10089|P|2.3||||||ISO IR87
PID|||Japan_Test_1||Yamada^Tarou~<esc>$B;3ED<esc>(B^<esc>$BB@O:<esc>(B~<esc>$B$d$^$@<esc>(B^<esc>$B$?$m$&<esc>(B|...
where "<esc>" denotes the byte 0x1B (the ESC character). The message uses the "ISO IR87" character set (JIS X 0208-1990). The family name in the third repetition of the patient name (the hiragana reading) contains the JIS X 0208 encoding of the hiragana letter "ま" (ma), which is the byte pair 0x24 0x5E. These bytes happen to correspond to the ASCII characters $ and ^.
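The byte claim can be checked quickly. This is a minimal sketch, assuming Python's built-in "iso2022_jp" codec is an acceptable stand-in for HL7 "ISO IR87" (it writes JIS X 0208 text between ESC $ B and ESC ( B); the helper name delimiter_bytes_in is hypothetical.

```python
# Minimal sketch: which kana in the example hide an HL7 delimiter byte?
# Assumption: Python's "iso2022_jp" codec stands in for HL7 "ISO IR87";
# it encodes JIS X 0208 characters between ESC $ B and ESC ( B.

# Default delimiters from MSH-1/MSH-2: | ^ ~ \ &
HL7_DELIMITERS = {ord(c): c for c in "|^~\\&"}


def delimiter_bytes_in(text: str) -> list[tuple[str, int, str]]:
    """Return (character, byte value, delimiter) for every delimiter byte
    found inside the two-byte JIS X 0208 code of a non-ASCII character.
    Raises UnicodeEncodeError for characters outside the codec's repertoire."""
    hits = []
    for ch in text:
        if ch.isascii():
            continue
        # One character encodes as ESC $ B <2 bytes> ESC ( B; drop the
        # 3-byte escape sequences at both ends to keep the character code.
        code = ch.encode("iso2022_jp")[3:-3]
        for b in code:
            if b in HL7_DELIMITERS:
                hits.append((ch, b, HL7_DELIMITERS[b]))
    return hits


print("やまだ".encode("iso2022_jp"))  # b'\x1b$B$d$^$@\x1b(B'
print(delimiter_bytes_in("やまだ"))    # [('ま', 94, '^')]
print(delimiter_bytes_in("たろう"))    # [('う', 38, '&')]
```

The same check shows that う (0x24 0x26) in the given name carries the byte for &, the subcomponent separator, so ^ is not the only delimiter that can turn up this way.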
The question is: since the byte 0x5E appears here, does the HL7 standard require me to escape it, i.e. must I use \S\ here instead? On one hand, one can argue that 0x5E, the ASCII encoding of ^, appears and hence needs to be escaped. On the other hand, the caret character (^) itself does not appear; 0x5E is only part of the encoding of the character "ま" (ma).
In other words, do I need to resolve HL7 escaping first, or the character encoding first? I have searched the HL7 standard without finding a definitive answer.
The spec is silent on this because it hadn't really occurred to us that this was an issue. HL7 messages are sequences of characters, not bytes, so you resolve the character encoding first and only then resolve the escaping of the characters. In your example, ま is a single character and there is no ^ to escape.
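On the receiving side, "encoding first" could look roughly like this. It is a minimal sketch, assuming "iso2022_jp" stands in for ISO IR87, the default delimiters |^~\& are in force, and only the five delimiter escapes need handling; unescape and parse_xpn are hypothetical names, not part of any HL7 library.

```python
# Minimal sketch of "decode first, then split and unescape".
# Assumptions: "iso2022_jp" stands in for ISO IR87, the default delimiters
# |^~\& are in force, subcomponents (&) are not split, and only the
# \F\ \S\ \R\ \T\ \E\ escapes are handled; other escapes are kept as-is.

ESCAPES = {"F": "|", "S": "^", "R": "~", "T": "&", "E": "\\"}


def unescape(text: str) -> str:
    """Replace HL7 delimiter escapes such as \\S\\ in already-decoded text."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\":
            end = text.find("\\", i + 1)
            code = text[i + 1:end] if end != -1 else None
            if code in ESCAPES:
                out.append(ESCAPES[code])
                i = end + 1
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


def parse_xpn(field_bytes: bytes) -> list[list[str]]:
    """Split a PID-5 patient-name field into repetitions and components."""
    text = field_bytes.decode("iso2022_jp")  # 1. resolve the character encoding
    return [
        [unescape(comp) for comp in rep.split("^")]  # 3. unescape each component
        for rep in text.split("~")  # 2. split on delimiter characters
    ]


pid5 = (b"Yamada^Tarou~\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"
        b"~\x1b$B$d$^$@\x1b(B^\x1b$B$?$m$&\x1b(B")
print(parse_xpn(pid5))
```

The third repetition comes back as やまだ and たろう: the 0x5E byte inside ま is consumed by the decoder before any splitting happens, so it never becomes a ^ character.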
Having said that, there will be parsers out there that don't understand JIS X 0208 and fall over on this because they see the 0x5E byte as an unescaped separator character, so you'd have to check with each and every trading partner.
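To see the failure mode, and to flag fields worth raising with a trading partner, here is a minimal sketch. The helper names are hypothetical, "iso2022_jp" again stands in for ISO IR87, and only the escape sequences ESC $ @, ESC $ B, ESC ( B and ESC ( J are recognised.

```python
# Minimal sketch: how a byte-oriented parser that ignores ISO 2022 shift
# states mis-splits the field, plus a hypothetical pre-send check.
# Assumptions: helper names are illustrative; only the escape sequences
# ESC $ @, ESC $ B (two-byte JIS X 0208) and ESC ( B, ESC ( J (one-byte)
# are recognised.

TWO_BYTE = {b"\x1b$@", b"\x1b$B"}
ONE_BYTE = {b"\x1b(B", b"\x1b(J"}


def naive_components(field_bytes: bytes) -> list[list[bytes]]:
    """Split on delimiter *bytes*, as a JIS-unaware parser would."""
    return [rep.split(b"^") for rep in field_bytes.split(b"~")]


def has_hidden_delimiters(field_bytes: bytes,
                          delimiters: bytes = b"|^~\\&") -> bool:
    """True if a delimiter byte occurs while a two-byte set is designated."""
    two_byte, i = False, 0
    while i < len(field_bytes):
        if field_bytes[i] == 0x1B:  # ESC starts a 3-byte designation
            seq = field_bytes[i:i + 3]
            if seq in TWO_BYTE:
                two_byte = True
            elif seq in ONE_BYTE:
                two_byte = False
            else:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            i += 3
            continue
        if two_byte and field_bytes[i] in delimiters:
            return True
        i += 1
    return False


kana = "やまだ".encode("iso2022_jp") + b"^" + "たろう".encode("iso2022_jp")
print(len(naive_components(kana)[0]))  # 3 components instead of 2
print(has_hidden_delimiters(kana))     # True
```

A check like this does not change what goes on the wire (the bytes are still sent unescaped); it only tells a sender which messages depend on the receiver decoding JIS X 0208 before splitting.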