None of the parsers find all matches (Beautiful Soup, Python)


I am trying to do a simple parse of an HTML file that contains unit test results in its body:

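import re
import urllib2
from bs4 import BeautifulSoup
html = urllib2.urlopen('file:/randomstuff/results.txt').read()  # file contents, not a URL
soup = BeautifulSoup(html, 'lxml')
save = soup.body.findAll(text = re.compile("failed"))  # text nodes containing "failed"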

The best I can get out of this is one instance of the text (when there are closer to 50), using lxml or html5lib. The other parsers find none. Is there any way I can work around the broken HTML?

An example of the body:

********* Finished testing of LogLevelTypeTest *********
********* Start testing of AppLoggerConfigTest *********
Config: Using QTest library 4.8.1, Qt 4.8.1
PASS : initTestCase
PASS : testSetFromEnvironment
PASS : cleanupTestCase
Totals: 3 passed, 0 failed, 0 skipped
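
Since each suite in the output above ends with a "Totals:" line, one possible approach is to read the counts with a regular expression once the text is available as a plain string. The following is only a sketch in Python 3 using the standard library; the names START, TOTALS, and summarize are made up for illustration, and it assumes every suite uses exactly the "Start testing of ..." and "Totals: ..." formats shown in the sample.

```python
import re

# Patterns assumed from the sample output; other QTest versions may differ.
START = re.compile(r"Start testing of (\w+)")
TOTALS = re.compile(r"Totals: (\d+) passed, (\d+) failed, (\d+) skipped")


def summarize(text):
    """Map each suite name to its (passed, failed, skipped) counts."""
    results = {}
    suite = None
    for line in text.splitlines():
        match = START.search(line)
        if match:
            suite = match.group(1)  # remember which suite we are inside
            continue
        match = TOTALS.search(line)
        if match and suite is not None:
            results[suite] = tuple(int(n) for n in match.groups())
    return results


sample = """\
********* Finished testing of LogLevelTypeTest *********
********* Start testing of AppLoggerConfigTest *********
Config: Using QTest library 4.8.1, Qt 4.8.1
PASS : initTestCase
PASS : testSetFromEnvironment
PASS : cleanupTestCase
Totals: 3 passed, 0 failed, 0 skipped
"""

print(summarize(sample))
```

Suites with a nonzero second count could then be filtered out of the returned dictionary.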

The HTML looks like this:

<html>
   <head></head>
   <body>
   <pre style="word-wrap: break-word; white-space: pre-wrap;">
      "Common Unit Test Results"
      ...
      ...
   </pre>
 </body>
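
One likely explanation for the single match: with this structure, everything inside the <pre> element is one text node, and a text search returns matching nodes, not matching lines, so a single node containing "failed" fifty times still counts once. The sketch below (Python 3, bs4, the built-in "html.parser") illustrates that under stated assumptions; the second suite in the sample is invented for the example, and whether this is the whole story for the real file is not confirmed by the question.

```python
import re
from bs4 import BeautifulSoup

# Illustrative document shaped like the one in the question.
# "OtherTest" and its totals are made up for this example.
html = """<html>
<head></head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
********* Start testing of AppLoggerConfigTest *********
Totals: 3 passed, 0 failed, 0 skipped
********* Start testing of OtherTest *********
Totals: 1 passed, 1 failed, 0 skipped
</pre>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching by text returns whole text nodes that match the pattern.
nodes = soup.body.find_all(string=re.compile("failed"))
print(len(nodes))  # the <pre> content is a single node

# Splitting that node's text gives one entry per line instead.
lines = [ln for ln in soup.pre.get_text().splitlines() if "failed" in ln]
print(len(lines))
```

If the file is really plain text (the name ends in .txt), reading it directly and filtering lines the same way may avoid the HTML parser altogether.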
