Using Apache NiFi to extract HL7 values and apply regex


I need to extract patient info from an HL7 XML document using Apache NiFi, and then apply regex to pull diagnostic results out of the sections that contain embedded HTML (yes, sorry, not my design choice :-( ).

The first path to the data of interest in the HL7 document is:

"ClinicalDocument" \ "recordTarget" \ "patientRole" \ "patient" \ "name",

and the second, more complicated one is:

"ClinicalDocument" \ "structuredBody" \ "component" \ "section" \ "text" (with mediaType="text/x-hl7-text+xml"), but only for the section whose title element equals "Diagnostic Results".

In other words, I need to find the section inside component whose title child has the text "Diagnostic Results", and then extract the text value of its sibling text node.

My HL7 XML snippets look like:

<ClinicalDocument>
...
        <recordTarget>
            <patientRole>
....
            <patient>
                <name><given>John</given><family>Doe</family></name>
...
<structuredBody>
...
<component>
    <section classCode="DOCSECT" moodCode="EVN">
        <templateId root="0.0.0.0.0.0.1" />
        <code code="000-01" codeSystem="0.0.0.1.0.0"  />
        <title>Diagnostic Results</title>
        <text mediaType="text/x-hl7-text+xml">
            Some data of interest expressed in n microns.<content ID="NKN_results"/>
        </text>

Any suggestions on how to do this in Apache NiFi?

There is 1 answer

mattyb (best answer)

You should be able to use XPath and the NiFi EvaluateXPath processor to match and extract the <text> element. I started with the structuredBody tag as root for the following expression:

/structuredBody/component/section[title = 'Diagnostic Results' and text[@mediaType='text/x-hl7-text+xml']]/text
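To sanity-check the matching logic outside NiFi, here is a minimal sketch using Python's standard library. It is not how EvaluateXPath works internally. ElementTree supports only a subset of XPath (no `and` inside a predicate), so the same condition is written as two chained steps. The sample document is a trimmed, closed-off version of the snippet from the question, and the second section is a made-up negative case. Real CDA documents usually declare a default namespace, which this sketch assumes is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rooted at structuredBody, modeled on the question's
# snippet and closed so that it parses. The "Other Section" is invented
# purely to show that non-matching sections are skipped.
SAMPLE = """
<structuredBody>
  <component>
    <section classCode="DOCSECT" moodCode="EVN">
      <title>Diagnostic Results</title>
      <text mediaType="text/x-hl7-text+xml">
        Some data of interest expressed in n microns.<content ID="NKN_results"/>
      </text>
    </section>
  </component>
  <component>
    <section>
      <title>Other Section</title>
      <text mediaType="text/x-hl7-text+xml">Not wanted.</text>
    </section>
  </component>
</structuredBody>
"""


def diagnostic_text(xml_string):
    """Return the text of the 'Diagnostic Results' section, or None."""
    root = ET.fromstring(xml_string)  # root element is structuredBody
    # Chained predicates stand in for the single "and" predicate,
    # which ElementTree's XPath subset does not support.
    path = ("./component/section[title='Diagnostic Results']"
            "/text[@mediaType='text/x-hl7-text+xml']")
    node = root.find(path)
    if node is None:
        return None
    # itertext() also collects text around child elements such as <content/>.
    return "".join(node.itertext()).strip()


if __name__ == "__main__":
    print(diagnostic_text(SAMPLE))
```

In NiFi itself the original expression can be used unchanged as a dynamic property on EvaluateXPath. The sketch is only meant for checking which section gets selected.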

You should be able to adapt the expression above to the full XML path. Once the <text> element has been extracted, you can use the GetHTMLElement processor (available starting in NiFi 0.5.0) to extract values from the embedded HTML. Before NiFi 0.5.0, if the HTML is well-formed (XHTML, for example), you can use another EvaluateXPath processor instead.
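The patient name from the first path in the question can be pulled out the same way. The sketch below again uses Python's standard library rather than NiFi. It assumes the elements nest directly as the question's path suggests (the elided parts of the real document may differ) and that no XML namespace is declared. In EvaluateXPath the corresponding expression would presumably be /ClinicalDocument/recordTarget/patientRole/patient/name.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: the nesting follows the path given in the
# question, with the elided parts simply left out.
SAMPLE = """
<ClinicalDocument>
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>John</given><family>Doe</family></name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>
"""


def patient_name(xml_string):
    """Return 'given family' for the first patient, or None if absent."""
    root = ET.fromstring(xml_string)  # root element is ClinicalDocument
    name = root.find("./recordTarget/patientRole/patient/name")
    if name is None:
        return None
    # findtext returns the default when a child element is missing.
    given = name.findtext("given", default="")
    family = name.findtext("family", default="")
    return f"{given} {family}".strip()


if __name__ == "__main__":
    print(patient_name(SAMPLE))
```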