How do I get text from tables using PyQuery?


I should start by saying I haven't used PyQuery much, so this question is probably easy, but I've tried a number of things and am stuck.

I'm using PyQuery to get info from a table. Here is the beginning of my table:

<table class="sortable" border="1" cellspacing="1" cellpadding="1" bordercolor="#333333">
    <thead>
        <tr class="headerfont">
            <td><div align="center">Year</div></td>
            <td><div align="center">Name</div></td>
            <td><div align="center">College</div></td>
            <td><div align="center">POS</div></td>
            <td align="center"><div align="center">Height <span style="font-size:10px;">(in)</span></div></td>
            <td align="center"><div align="center">Weight <span style="font-size:10px;">(lbs)</span></div></td>
            <td>Hand Size <span style="font-size:10px;">(in)</span></td>
            <td>Arm Length <span style="font-size:10px;">(in)</span></td>
            <td><div align="center"><span style="font-size:14px;">Wonderlic</span></div></td>
            <td><div align="center">40 <span style="font-size:12px;">Yard</span></div></td>
            <td><div align="center"><span style="font-size:12px;">Bench Press</span></div></td>
            <td style="font-size:14px;"><div align="center">Vert Leap <span style="font-size:10px;">(in)</span></div></td>
            <td style="font-size:14px;"><div align="center">Broad Jump <span style="font-size:10px;">(in)</span></div></td>
            <td>Shuttle</td>
            <td>3Cone</td>
            <td>60Yd Shuttle</td>
        </tr>
    </thead>
    <tbody>

The table continues after the last line, but that is the whole header. So, if I run:

from pyquery import PyQuery as pq
table = pq(html)  # html: the table markup above, as a string
for c in table('thead tr td'):
    print(c.text)

I get:

None
None
None
None
None
None
Hand Size 
Arm Length 
None
None
None
None
None
Shuttle
3Cone
60Yd Shuttle

Obviously I don't want the None entries, since those cells do contain text. I tried various combinations of thead tr td div, but then I lose the cells I was already getting. I also tried building a separate list from the div matches and merging the two lists by index, but that seems very hacky, and it still misses Wonderlic. The documentation also says to use text(), but when I add the parentheses I get TypeError: 'NoneType' object is not callable. Any insight would be greatly appreciated. Thanks!


There are 2 answers

Doubledown On BEST ANSWER

As it turns out, you have to call .items() on the query result to get PyQuery objects instead of the underlying lxml elements. Once I did that, c.text() worked instead of raising an error.

columns = [c.text() for c in table('thead tr td').items()]

This is much cleaner, since it uses the PyQuery API as intended.

rofelia09 On

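Your loop visits every td and prints its text whether or not there is any. To skip the empty cells, test the value for truthiness; comparing with == True never matches a string, so nothing would be printed. Note that this only drops the None lines and does not recover text nested inside a div or span. Try this; it may help.

from pyquery import PyQuery as pq
table = pq(html)  # html: the table markup above, as a string
for c in table('thead tr td'):
    # c is an lxml element; c.text is None when the cell starts with a child tag
    if c.text:
        print(c.text)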
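The None values come from how element text is stored: `.text` holds only the text that appears before the first child tag, so a cell whose label sits inside a div reports None. Below is a rough sketch of that behaviour using the standard library's `xml.etree.ElementTree` as a stand-in (lxml elements follow the same text model and also provide `itertext()`). The three-cell row is a trimmed copy of the question's markup, and the names `direct` and `full` are just for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of three header cells from the question.
row = ET.fromstring(
    '<tr>'
    '<td><div align="center">Year</div></td>'
    '<td>Hand Size <span style="font-size:10px;">(in)</span></td>'
    '<td>Shuttle</td>'
    '</tr>'
)

# .text: only the text before the first child element of each cell.
direct = [td.text for td in row.iter('td')]

# itertext(): walks the cell and all of its descendants, yielding every
# text fragment in document order, so nested labels are included.
full = [''.join(td.itertext()).strip() for td in row.iter('td')]

for d, f in zip(direct, full):
    print(repr(d), '->', repr(f))
```

So if you stay with plain elements rather than `.items()`, joining `itertext()` is one way to get the full label of each cell instead of filtering out the None entries.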