I'm trying to extract some data from a given XML file. Therefore, I have to select some specific nodes by their attribute values. My XML looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<svg ....>
....
<g font-family="'BentonSans Medium'" font-size="12">
<text>bla bla bla</text>
....
</g>
....
</svg>
I've tried to escape the apostrophes in the value, but I couldn't get it working.
from lxml import etree as ET
tree = ET.parse("file.svg")
root = tree.getroot()
xPath = ".//g[@font-family=''BentonSans Medium']"
print(root.findall(xPath))
I always get errors of this kind:
File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 214, in prepare_predicate
raise SyntaxError("invalid predicate")
Anyone got ideas how to select these nodes with XPath?
Try this:
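A minimal sketch of one way to write the expression, assuming the attribute value literally includes the apostrophes and that the g elements are not in a namespace (a real SVG file usually declares the SVG namespace, in which case the element name would need to be qualified). The sample document below is a hypothetical stand-in for file.svg. The idea is to delimit the XPath string literal with double quotes, so the apostrophes are ordinary characters of the value:

```python
from lxml import etree

# Hypothetical stand-in for the SVG in the question (no namespace assumed).
SVG = b"""<svg>
  <g font-family="'BentonSans Medium'" font-size="12"><text>bla bla bla</text></g>
  <g font-family="'Other Font'" font-size="10"><text>other</text></g>
</svg>"""

root = etree.fromstring(SVG)

# The XPath string literal is delimited by double quotes, so the
# apostrophes inside it are part of the value being compared.
expr = './/g[@font-family="\'BentonSans Medium\'"]'

matches = root.xpath(expr)          # full XPath 1.0 engine
same_matches = root.findall(expr)   # ElementPath subset; simple attribute tests are supported

for g in matches:
    print(g.get("font-family"), g.get("font-size"))
```

The same expression is passed to both xpath() and findall() here, since a plain attribute comparison is within what findall understands.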
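Your code fails because you haven't put the closing single quote: it should be after the last ' of the attribute value. But adding it doesn't make the XPath expression correct, as each ' is interpreted just as is, that is, as a string delimiter rather than as part of the value (XPath 1.0 has no way to escape a quote inside a string literal).
By the way, if you want to check whether the font-family attribute contains the given string, use the contains() XPath function with the xpath method.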
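A short sketch of the contains() approach, using a made-up sample document in place of file.svg and assuming no namespace on the g elements. It is meant as an illustration, not as the exact code from the original answer:

```python
from lxml import etree

# Hypothetical sample data; the attribute values are invented for illustration.
SVG = b"""<svg>
  <g font-family="'BentonSans Medium'" font-size="12"><text>first</text></g>
  <g font-family="'BentonSans Medium', sans-serif" font-size="14"><text>second</text></g>
  <g font-family="Arial" font-size="10"><text>third</text></g>
</svg>"""

root = etree.fromstring(SVG)

# contains() is an XPath 1.0 function, so it goes through xpath().
# The search string has no apostrophes, so no special quoting is needed.
found = root.xpath('.//g[contains(@font-family, "BentonSans Medium")]')

for g in found:
    print(g.get("font-family"), "->", g.findtext("text"))
```

Because contains() does a substring test, it also matches values where the font name is only one entry in a longer font-family list.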
The sample code fetches all g elements whose font-family attribute value contains the string BentonSans Medium.
I don't know why the findall method doesn't work with contains(), but the xpath method seems more flexible, and I would recommend using it instead.