I need to add an ATTLIST declaration to the DOCTYPE declaration in HTML documents.
After reading the documentation and googling, this is what I've come up with:

    from bs4 import BeautifulSoup, Doctype

    # minimal html document
    doc = """<?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
    <html/>"""

    soup = BeautifulSoup(doc, features='html.parser')

    # the modified doctype tag
    doctype = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [<!ATTLIST span bodyref CDATA #IMPLIED>] >"""

    dt = BeautifulSoup(doctype, features='html.parser')

    # swap the first Doctype node for the re-parsed one
    for item in soup.contents:
        if isinstance(item, Doctype):
            item.replace_with(dt)
            break

    print(soup.prettify(formatter=None))

This produces the desired result, but it feels a bit "hacky". I'd like to just insert the ATTLIST part into the tag, and not replace the whole thing, as I've done here. Does anyone know how to do that?
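For reference, here is a rough sketch of the direction I have in mind. It is not a confirmed solution. It assumes that a `Doctype` node can be built directly from a string that holds everything after `<!DOCTYPE `, and that the existing node's text can be reused as-is. The helper name `add_internal_subset` and the constant `ATTLIST` are made up for illustration. Since `Doctype` is a string subclass and therefore immutable, this still swaps the node rather than editing it in place, but it avoids re-parsing a second, hand-written doctype:

```python
from bs4 import BeautifulSoup, Doctype

# minimal html document (same as in the question)
doc = """<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >
<html/>"""

# Hypothetical constant: the internal subset to append to the existing doctype.
ATTLIST = '[<!ATTLIST span bodyref CDATA #IMPLIED>]'


def add_internal_subset(soup, subset):
    """Append `subset` to the first Doctype node in `soup`.

    Returns True if a Doctype node was found, False otherwise.
    """
    for item in soup.contents:
        if isinstance(item, Doctype):
            # Reuse the existing doctype text and only add the new part.
            # Assumption: Doctype(...) accepts the text after "<!DOCTYPE ".
            item.replace_with(Doctype(f"{item.strip()} {subset}"))
            return True
    return False


soup = BeautifulSoup(doc, features='html.parser')
found = add_internal_subset(soup, ATTLIST)
result = soup.prettify(formatter=None)
print(result)
```

This keeps the original public and system identifiers untouched, so only the ATTLIST part has to be written by hand. I'd still prefer something that modifies the existing node directly, if that is possible at all.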