XPath | create expression that finds a keyword in attributes - case-insensitive


I have XML files with only a few nodes, but each node has many attributes. I am querying for attributes that contain a specific keyword.

String expression = "/posts/row[@PostTypeId='1' and @*[contains(.,'Security')]]";

This works, but it only matches the exact spelling with a capital 'S'. Users will provide the keyword, and I cannot control whether they type it in lower case. What I am trying to do is use

//posts/row[translate(@*, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')]

to lower-case both the XML side and the keyword, and then combine the two attempts into one expression.

This is my attempt, but it is not working:

String expression = "/posts/row[@*[contains(translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" + keyword + "')]]";

Can someone help me with that please? Am I totally wrong in my attempt?

Thanks in advance, Bodo


In the meantime I seem to have sorted this out with the following:

String expression = "/posts/row[@PostTypeId='1' and @*[contains(translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '" + keyword + "')]]";

I don't know what makes it work now. Can anyone shed some light on this?

1 answer

Jens Erat

You're very close, but you mention translating the keyword to lower case and never actually do it. You have to apply translate twice: once to the attribute value and once to the keyword. I added some whitespace for readability; you can put everything on one line again (and replace $keyword with the string concatenation you used, of course):

/posts/row[
  @PostTypeId='1' and @*[
    contains(
      translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
      translate($keyword, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')
    )
  ]
]
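As an alternative to string concatenation, the expression above can be kept verbatim and $keyword bound at evaluation time. The following is only a minimal sketch using the standard javax.xml.xpath API; the class name, the countMatches helper and the sample XML are made up for illustration and are not part of the original question.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class CaseInsensitiveAttributeSearch {
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";

    // The expression from the answer on one line; $keyword stays an XPath variable.
    static final String EXPRESSION =
        "/posts/row[@PostTypeId='1' and @*[contains("
        + "translate(., '" + UPPER + "', '" + LOWER + "'), "
        + "translate($keyword, '" + UPPER + "', '" + LOWER + "'))]]";

    // Counts the rows that have at least one attribute containing the keyword,
    // ignoring ASCII case on both sides.
    static int countMatches(String xml, String keyword) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $keyword to the user input; any other variable name is left unresolved.
        xpath.setXPathVariableResolver(name ->
            "keyword".equals(name.getLocalPart()) ? keyword : null);
        NodeList rows = (NodeList) xpath.evaluate(
            EXPRESSION, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return rows.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data in the shape used by the question.
        String xml = "<posts>"
            + "<row PostTypeId='1' Title='Security basics'/>"
            + "<row PostTypeId='1' Title='Cooking'/>"
            + "<row PostTypeId='2' Title='More security'/>"
            + "</posts>";
        System.out.println(countMatches(xml, "SECURITY")); // prints 1
    }
}
```

Binding the keyword as a variable also means a quote character in the user input cannot break the expression, which plain concatenation does not guarantee.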

Make sure not to forget non-ASCII characters such as German umlauts or other accented letters! translate only maps the characters you list, so each one has to be added to both strings, which can get annoying. XPath 2.0 provides lower-case(...), which supports Unicode.
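If you stay with XPath 1.0 and string concatenation, one possible way to cover a few extra letters is to extend both alphabets in parallel and lower-case the keyword in Java before inserting it. This is a rough sketch under assumptions: the class and method names are invented, only the German umlauts are added, and keywords containing a single quote are simply rejected rather than escaped.

```java
import java.util.Locale;

// Hypothetical builder class, for illustration only.
class XPathKeywordExpression {
    // Assumption: A-Z plus the German umlauts are all that is needed.
    // Every character added to UPPER needs its counterpart at the same index in LOWER.
    static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00C4\u00D6\u00DC";
    static final String LOWER = "abcdefghijklmnopqrstuvwxyz\u00E4\u00F6\u00FC";

    static String build(String keyword) {
        // A single quote would end the XPath string literal early, so refuse it here.
        if (keyword.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("keyword must not contain a single quote");
        }
        // Lower-case the keyword on the Java side; Locale.ROOT avoids locale-specific rules.
        String needle = keyword.toLowerCase(Locale.ROOT);
        return "/posts/row[@PostTypeId='1' and @*[contains(translate(., '"
            + UPPER + "', '" + LOWER + "'), '" + needle + "')]]";
    }

    public static void main(String[] args) {
        System.out.println(build("Security"));
    }
}
```

Note that Java lower-cases every Unicode letter in the keyword, while translate only maps the letters listed in UPPER, so attribute values containing other upper-case letters would still not match.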