I've been trying to find good documentation to solve this, but from what little documentation I could find, this code should have worked. I'm rather curious as to why it isn't working, but I'm certainly not an expert.
>>> import sys
>>> import re
>>> from odf.opendocument import load
>>> from odf import text, teletype
>>> infile = load(r'C:\Users\Iainc\Documents\The Seventh Story.odt')
>>> for item in infile.getElementsByType(text.P):
...     s = teletype.extractText(item)
...     m = re.sub(r'\[\((?:(?!\[\().)*?\)\]', '', s);
...     if m != s:
...         new_item = text.P()
...         new_item.setAttribute('stylename', item.getAttribute('stylename'))
...         new_item.addText(m)
...         item.parentNode.insertBefore(new_item, item)
...         item.parentNode.removeChild(item)
... infile.save(r'C:\Users\Iainc\Documents\The Seventh Story 2.odt')
  File "<stdin>", line 10
    infile.save(r'C:\Users\Iainc\Documents\The Seventh Story 2.odt')
    ^^^^^^
SyntaxError: invalid syntax
This is supposed to go through a document full of nested notes (e.g., "[(blah blah [(blah [(blah (blah) blah)] )] blah )]") and remove all the notes, leaving only the text before the first "[(" or after the last ")]". As far as I can tell this code should do that, so why the error? I'm also not certain the filter itself is working as it should.
I don't know why you are getting the SyntaxError, but to remove all the notes while leaving the text between each group of nested notes, re.sub will probably need to be called repeatedly in a loop.

Your regex matches from "[(" to the first occurrence of ")]" that follows it, but not if "[(" appears again between them. This has the effect of matching the innermost note of each group of nested notes, which is then replaced with the empty string to remove it.

To match across line endings you're going to need the re.DOTALL flag, or to put (?s) at the start of the regex, or to use a match-any-character class like [\S\s] instead of ".".

For example:
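A minimal sketch of the loop, using only the standard re module and the regex from the question. The names NOTE and strip_notes are made up for illustration; the function would be applied to the string returned by teletype.extractText in place of the single re.sub call.

```python
import re

# The question's regex, compiled with re.DOTALL so "." also matches newlines.
# It matches an innermost note: "[(" ... ")]" with no "[(" in between.
NOTE = re.compile(r'\[\((?:(?!\[\().)*?\)\]', re.DOTALL)


def strip_notes(s):
    """Remove innermost notes repeatedly until a pass changes nothing.

    strip_notes is a hypothetical helper name, not part of any library.
    """
    while True:
        stripped = NOTE.sub('', s)
        if stripped == s:
            return s
        s = stripped


sample = "before [(blah blah [(blah [(blah (blah) blah)] )] blah )] after"
print(repr(strip_notes(sample)))
```

Each pass peels off one level of nesting, so the sample above takes three passes plus a final one that detects no change. The whitespace on either side of a removed note is kept, so you may want to collapse double spaces afterwards.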