I built a parser for HL7 based on documentation I found and thought it was working well--until I got examples of test data. I built it with the following assumptions:
- The `~` is a "repeat" character, basically meaning the value of the field passed is an array of the given values.
- The `^` indicates the field is represented by an array, but the expectation is the array items are used to build a final value.
- The `&` is similar to the `^`, but is a nested array inside of a `^`.
These assumptions don't appear very accurate given the test data I have. Can someone help set me straight on the right way to interpret these characters?
Since you are building a parser, I will go into a little more detail.
Please refer to this reference:
- (x0D) Segment separator
- `|` Field separator, aka pipe
- `^` Component separator, aka hat
- `&` Sub-component separator
- `~` Field repeat separator
- `\` Escape character
As stated above:
- `~` indicates that multiple values are provided for this specific field. In programming terms, it is an array, list, or similar data structure. Your assumption is correct. Please refer to this answer for more details.
- `^` separates the component parts of a given field. One field may have multiple components, and all of them combine to represent the final value. I don't think this should be tied to an array in programming terms. The example here is Person Name: the entire name is a single piece of data that is split into family name, given name, etc. As you can see, this is not an array. It is not multiple values; it is a single value split into multiple sub-values. So instead of an array, you can think of this as a `class` or `struct`, as in Composition.
- `&` is the sub-component separator. It is similar to the component separator described above, with the difference that it further splits the data in a given component into sub-components. Again, I think this should map to a language-specific `class` or `struct` instead of an array.
Also, the characters listed above are the defaults and the ones most commonly used for the purposes stated, but they can be changed. These characters are defined in each message in `MSH(2)`. Note that the first field is always the field separator (`|`), which is non-negotiable. So the next (second) field holds the Encoding Characters. As you are writing a parser, you should read the encoding characters from there and use them for the rest of the message.
The order of the characters is also defined, as mentioned here:
Please also refer to these other answers, which discuss HL7 escape sequences, conventions, and the terms used.
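To make the array-versus-struct distinction concrete, here is a rough sketch of splitting one raw field value. It assumes the default separators, ignores escape sequences entirely, and uses made-up names (`parse_field`, `PersonName`, `to_person_names`); only the first two name components (family, given) are mapped.

```python
from dataclasses import dataclass
from typing import List


def parse_field(raw: str, repetition: str = "~", component: str = "^",
                subcomponent: str = "&") -> List[List[List[str]]]:
    """Split a raw field into repetitions -> components -> sub-components."""
    return [
        [comp.split(subcomponent) for comp in rep.split(component)]
        for rep in raw.split(repetition)
    ]


@dataclass
class PersonName:
    """One name value: a single piece of data made of named parts."""
    family: str
    given: str


def to_person_names(raw: str) -> List[PersonName]:
    """Repetitions become a list; components become fields of a struct."""
    names = []
    for rep in parse_field(raw):
        family = rep[0][0]
        # Trailing components may be omitted, so fall back to "".
        given = rep[1][0] if len(rep) > 1 else ""
        names.append(PersonName(family, given))
    return names


if __name__ == "__main__":
    # Two repetitions of a name field, each with two components.
    print(to_person_names("Doe^John~Smith^Jane"))
```

The outer list comes from `~` (a real array of values), while `^` and `&` only decide which slot of the `PersonName` structure a piece of text lands in.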