Pyquery invalidates html code


I was using pyquery to construct a webpage:

> from pyquery import PyQuery
> page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js"></script><script type="text/javascript" src="tools.min.js"></script></head><body></body></html>')
> print(page)
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>

The script (and body) tags aren't supposed to end like that, though: Firefox ignores the rest of the head section.
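The <body/> in the output suggests the document is being parsed and printed as XML rather than HTML (pyquery appears to try lxml's XML parser first when the markup is well-formed). A minimal reproduction of that behaviour, using Python's standard-library xml.etree.ElementTree as a stand-in for lxml so it runs without pyquery installed (an assumption; lxml writes <tag/> without the space ElementTree puts before the slash):

```python
import xml.etree.ElementTree as ET

# The markup from the question; it is well-formed XML, so an XML parser accepts it.
SOURCE = (
    '<html><head>'
    '<script type="text/javascript" src="jquery-1.4.min.js"></script>'
    '<script type="text/javascript" src="tools.min.js"></script>'
    '</head><body></body></html>'
)

root = ET.fromstring(SOURCE)

# XML serialization does not remember whether the input used <tag></tag> or <tag/>:
# any element with no text and no children is written in the self-closing form.
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

A browser parsing the result as HTML ignores the slash on a non-void element such as <script>, so the first script is treated as never closed and swallows everything after it.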

I tried breaking the above up into single elements (i.e., adding one script tag at a time), but to no avail:

> page = PyQuery('<html><head></head></html>')
> page.find('head').append('<script type="text/javascript" src="jquery-1.4.min.js"></script>')
> page.find('head').append('<script type="text/javascript" src="tools.min.js"></script>')
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>

The same thing happens with <iframe> tags, which come out as <iframe/> (I'm forced to use these because of YouTube): Firefox doesn't treat them as closed, and all the markup that follows is ignored.

How can I force pyquery to close these with a separate closing tag, which I believe is what the HTML standard requires?

Oh, and if anyone's wondering, I'm not doing it all in BeautifulSoup because (1) I get BeautifulSoup errors and (2) it's a deprecated package; the author stopped supporting it a year or two ago.


There are 2 answers

Answeror (accepted answer)

Try:

page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js">\n</script><script type="text/javascript" src="tools.min.js">\n</script></head><body></body></html>')

It also works with <iframe> tags.
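This works because XML serialization only uses the self-closing form for elements with no text and no children, so a single newline of text is enough to force a separate closing tag. When tags are added one at a time with append(), the same idea can be applied to the finished tree instead of to each string. A minimal sketch, using the standard library's ElementTree as a stand-in for the lxml elements pyquery wraps (iterating over a PyQuery object should yield those elements, which offer the same .iter() and .text); the close_empty helper, its tag list and video.html are hypothetical:

```python
import xml.etree.ElementTree as ET

# Tags that should never be written as <tag/> in a page served as HTML.
# This list is an assumption; extend it as needed (e.g. "div", "textarea").
NEEDS_CLOSING_TAG = ("script", "iframe", "body")


def close_empty(root, tags=NEEDS_CLOSING_TAG):
    """Hypothetical helper: give every empty element in `tags` a newline of text,
    so XML serialization emits an explicit closing tag instead of <tag />."""
    for tag in tags:
        for elem in root.iter(tag):
            if not elem.text and len(elem) == 0:
                elem.text = "\n"
    return root


page = ET.fromstring(
    '<html><head>'
    '<script type="text/javascript" src="jquery-1.4.min.js"></script>'
    '<script type="text/javascript" src="tools.min.js"></script>'
    '</head><body><iframe src="video.html"></iframe></body></html>'
)
close_empty(page)
out = ET.tostring(page, encoding="unicode")
print(out)
```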

catalin.costache

You should use print(page.__html__()) to dump the page as HTML or, better, print(page.html(method='html')).
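The HTML output method fixes the problem at serialization time: it writes an explicit closing tag for every element except HTML void elements such as <br> and <img>. Note that .html() returns the inner HTML of the first matched element, so on a whole-page PyQuery object it omits the outer <html> tag. The sketch below shows the HTML method with the standard library's ElementTree (a stand-in for lxml, not pyquery's own API); video.html is a placeholder:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<html><head>'
    '<script type="text/javascript" src="jquery-1.4.min.js"></script>'
    '</head><body><iframe src="video.html"></iframe><br/></body></html>'
)

# method="html" writes </script>, </iframe> and </body> even though those
# elements are empty, while void elements such as <br> get no closing tag.
html_out = ET.tostring(root, encoding="unicode", method="html")
print(html_out)
```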