Java regular expression to remove empty XML nodes and their children completely


I am struggling to find the best solution. Below is my XML:

                <Dbtr>
                    <Nm>John doe</Nm>
                    <Id>
                        <OrgId>
                            <Othr>
                                <Id/>
                             </Othr>
                        </OrgId>
                    </Id>
                </Dbtr>

This should be replaced with the following:

                <Dbtr>
                    <Nm>John doe</Nm>
                </Dbtr>

So all empty nodes, along with any parents that contain no values, should be left out. I am using the following expression, and it doesn't work the way I want:

docStr = docStr.replaceAll("<(\\w+)></\\1>|<\\w+/>", ""); 

Any help would be really appreciated.

Edit: I am creating this XML (not parsing it). It will be sent to a clearing house, which will reject the message because of these empty tags. I have no control over how the XML is generated: I only provide the values from the database, and as you can see, some of them are empty. The generating code writes the XML tag first and then the value, so all I can control is not writing "null". My best bet for now is to take the output XML as it is and run some regex logic over it to produce XML without empty tags, so that it passes schema validation.

There is 1 answer

AudioBubble
    String xml = ""
        + "<Dbtr>"
        + "    <Nm>John doe</Nm>"
        + "    <Id>"
        + "        <OrgId>"
        + "            <Othr>"
        + "                <Id/>"
        + "             </Othr>"
        + "        </OrgId>"
        + "    </Id>"
        + "</Dbtr>";
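    // One pass only removes the innermost empty elements, so repeat
    // until a pass no longer shortens the string.
    while (true) {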
        String repl = xml.replaceAll("<(\\w+)>\\s*</\\1>|<\\w+/>", "");
        if (repl.length() == xml.length())
            break;
        xml = repl;
    }
    System.out.println(xml);
    // -> <Dbtr>    <Nm>John doe</Nm>    </Dbtr>
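
The same loop can be wrapped in a reusable method. This is only a minimal sketch: the class name `EmptyTagStripper` and the method `stripEmpty` are made up for illustration, and it assumes the tags carry no attributes or namespace prefixes (names match `\w+`). It also tolerates whitespace before the `/>` of a self-closing tag, which is an addition to the pattern above. For anything more complex than this, a real XML parser would probably be safer than a regex.

```java
import java.util.regex.Pattern;

public class EmptyTagStripper {
    // Matches either an element whose body is only whitespace (<a>  </a>)
    // or a self-closing element (<a/> or <a />).
    // Assumption: no attributes and no namespace prefixes in tag names.
    private static final Pattern EMPTY_ELEMENT =
            Pattern.compile("<(\\w+)>\\s*</\\1>|<\\w+\\s*/>");

    // Hypothetical helper: removes empty elements repeatedly, because each
    // pass can expose a parent that has just become empty.
    public static String stripEmpty(String xml) {
        String previous;
        do {
            previous = xml;
            xml = EMPTY_ELEMENT.matcher(xml).replaceAll("");
        } while (!xml.equals(previous));
        return xml;
    }

    public static void main(String[] args) {
        String xml = "<Dbtr><Nm>John doe</Nm><Id><OrgId><Othr><Id/></Othr></OrgId></Id></Dbtr>";
        System.out.println(stripEmpty(xml));
    }
}
```

Compiling the pattern once avoids recompiling it on every pass, and comparing the strings directly makes the stop condition explicit. Elements that contain text, such as `<Nm>John doe</Nm>`, are never matched, so their values are left alone.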