python-docx: Remove bibliography (w:sdt) section


My end goal here is to find and remove any bibliography section from a Microsoft Word document.

As mentioned in this issue, there currently isn't any API support for w:sdt tags (which I think are only used for bibliographies). However, in response to this issue, the following workaround was given:

paragraph = ... # however you get the paragraph, maybe with `for paragraph in document.paragraphs`
p = paragraph._element
sdts = p.xpath('w:sdt')
for sdt in sdts:
    parent = sdt.getparent()
    parent.remove(sdt)

When using the above, the document's XML updates and the w:sdt tags are deleted, as I want. However, when I save the new document, the output .docx file still includes the bibliography section.
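One way to check whether the saved file really lacks the elements is to inspect it directly, since a .docx file is a ZIP archive whose main body is normally stored in word/document.xml. Below is a minimal sketch using only the standard library. The function name count_sdt_elements and the file name in the example are made up for illustration, and the part path word/document.xml is assumed to be the usual one.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in the Clark notation ElementTree uses.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_sdt_elements(docx_file):
    """Count w:sdt elements, at any depth, in the main document part.

    `docx_file` may be a path or a binary file-like object.
    Assumes the main part lives at word/document.xml (the usual location).
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return sum(1 for _ in root.iter(W + "sdt"))


# Example (hypothetical file name):
# print(count_sdt_elements("output.docx"))
```

If this reports a non-zero count for the saved file, some w:sdt elements were never reached by the removal loop, rather than being restored on save.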

Why, even though the python-docx document's XML no longer includes the w:sdt elements, is this not reflected once the document is saved and opened in Microsoft Word? I have read here that document relationships could have something to do with this, but given that I get no error when opening the new Word document, I don't think this is the problem. Any ideas?
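One possible explanation, not confirmed here: Word normally stores a bibliography as a block-level w:sdt that is a child of w:body rather than of a w:p, and document.paragraphs only yields paragraphs that are direct children of the body. In that case, p.xpath('w:sdt') on each paragraph would only catch inline content controls and never the bibliography block. The sketch below walks the whole tree instead and removes only content controls tagged with the "Bibliographies" gallery. The helper names are hypothetical, and the w:docPartGallery check is an assumption about how Word marks the bibliography in the file at hand.

```python
# WordprocessingML main namespace, in Clark notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def is_bibliography(sdt):
    """True if the sdt's properties name the 'Bibliographies' gallery.

    Assumption: Word tags bibliography blocks this way in the given file.
    """
    gallery = sdt.find(f"{W}sdtPr/{W}docPartObj/{W}docPartGallery")
    return gallery is not None and gallery.get(W + "val") == "Bibliographies"


def remove_bibliography_sdts(root):
    """Remove matching w:sdt elements anywhere below `root`; return how many."""
    # Collect (parent, child) pairs first so the tree is not mutated mid-walk.
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == W + "sdt" and is_bibliography(child)
    ]
    for parent, child in targets:
        parent.remove(child)
    return len(targets)
```

The function relies only on iter, find, and remove, which both lxml and the standard library's ElementTree provide, so with python-docx it could presumably be called as remove_bibliography_sdts(document.element.body) before document.save(...).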

Thanks in advance.

Note: I am asking this question here as the python-docx GitHub hasn't had much activity recently.
