How to get HTML tag text using XmlSlurper in Groovy

I am trying to modify HTML code in Groovy. I parsed it using XmlSlurper. The problem is that I need to edit the text of a certain tag that contains both text and child tags. The HTML looks like this:

<ul><li>Text to modify<span>more text</span></li></ul>

In Groovy I am trying this code:

def ulDOM = new XmlSlurper().parseText(ul);
def elements = ulDOM.li.findAll{
    it.text().equals("text i am looking for");
}

The problem is that 'elements' comes back empty, because it.text() returns the text of the 'it' node together with the text nodes of its whole DOM subtree, in this case "Text to modifymore text". Note that the contains() method is not enough for my solution.
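
The behaviour described above can be reproduced in a few lines. This is only a minimal sketch using the markup from the question; the import line assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml (on Groovy 2.x the import should be dropped), and the variable names are illustrative.

```groovy
import groovy.xml.XmlSlurper  // assumes Groovy 3+; remove this line on Groovy 2.x

def html = '<ul><li>Text to modify<span>more text</span></li></ul>'
def li = new XmlSlurper().parseText(html).li[0]

// text() concatenates the li's own text with the text of the nested <span>
assert li.text() == 'Text to modifymore text'

// so an exact comparison against the li's own text never matches
assert li.text() != 'Text to modify'
```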

My question is: how do I get the exact text of a certain tag, and not the text of its whole DOM subtree?

There is 1 answer

Jayan On BEST ANSWER

text() walks the child nodes and appends their text, so it always returns the merged text of the whole subtree.

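Could you consider localText()? It is not exactly what you expect, since it returns a list of strings (one per text node that belongs directly to the element), but it leaves out the text of child elements.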

// On Groovy 3+, add: import groovy.xml.XmlSlurper

def ul = '''<ul>
          <li>Text to modify<span>more text</span>
          </li>
       </ul> '''

def ulDOM = new XmlSlurper().parseText(ul)

// localText() returns only the li's own text nodes, skipping the <span> subtree
def elements = ulDOM.li.findAll {
    it.localText()[0] == 'Text to modify'
}
// Plain Groovy assert, so no TestNG dependency is needed
assert elements.size() == 1
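
One caveat that may be worth sketching: localText() yields one entry per text node, so if the li also had text after the span, the first entry alone would not be its full local text. The snippet below is only an illustration. The trailing " tail" text is made-up markup that is not in the question, the variable names are hypothetical, and the import assumes Groovy 3 or later (drop it on Groovy 2.x).

```groovy
import groovy.xml.XmlSlurper  // assumes Groovy 3+; remove this line on Groovy 2.x

// Hypothetical markup: the li has text both before and after the <span>
def html = '<ul><li>Text to modify<span>more text</span> tail</li></ul>'
def root = new XmlSlurper().parseText(html)
def li = root.li[0]

// text() includes the whole subtree; localText() only the li's own text nodes
assert li.text() == 'Text to modifymore text tail'
assert li.localText() == ['Text to modify', ' tail']

// Joining the pieces gives the li's own text as one string for exact matching
def matches = root.li.findAll { it.localText().join('') == 'Text to modify tail' }
assert matches.size() == 1
```

Whether to compare the first entry or the joined string depends on the markup: for the HTML in the question both give the same result, since the li has only one text node of its own.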