I am trying to parse an HTML file that contains unit test results in its body:
import re
import urllib2
from bs4 import BeautifulSoup
html = urllib2.urlopen('file:/randomstuff/results.txt').read()  # raw page contents
soup = BeautifulSoup(html, 'lxml')
save = soup.body.findAll(text=re.compile("failed"))
The best I can get out of this is one instance of the text (when there are closer to 50), using lxml or html5lib. The other parsers find none. Is there any way I can work around the broken HTML?
An example of the body is this:
********* Finished testing of LogLevelTypeTest *********
********* Start testing of AppLoggerConfigTest *********
Config: Using QTest library 4.8.1, Qt 4.8.1
PASS : initTestCase
PASS : testSetFromEnvironment
PASS : cleanupTestCase
Totals: 3 passed, 0 failed, 0 skipped
The HTML looks like this:
<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
"Common Unit Test Results"
...
...
</pre>
</body>
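
One possible explanation for the single hit: `findAll(text=...)` returns whole text nodes, and if all the results sit inside one `<pre>` element, the parser may treat them as a single text node, so the regex can match at most once. A minimal sketch of a workaround, assuming that is the case here, is to search the text line by line instead. The helper name `lines_matching` and the `sample` string are hypothetical; in practice the input might come from `soup.body.get_text()` or from the raw file contents.

```python
import re

# Sample text copied from the example body above. In a real run this
# could be soup.body.get_text() or the raw file contents (an assumption).
sample = """********* Start testing of AppLoggerConfigTest *********
Config: Using QTest library 4.8.1, Qt 4.8.1
PASS : initTestCase
PASS : testSetFromEnvironment
PASS : cleanupTestCase
Totals: 3 passed, 0 failed, 0 skipped
"""


def lines_matching(text, pattern):
    """Return every line of text that contains a match for pattern."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


# Hypothetical usage: collect every line that mentions "failed".
failed_lines = lines_matching(sample, r"failed")
print(failed_lines)
```

Because this works on plain lines rather than on the parse tree, it should not depend on how a given parser repairs the broken markup.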