I should start by saying I've not used pyquery much, so this question is probably easy, but I've tried a bunch of stuff and am stuck.
I'm using PyQuery to get info from a table. Here is the beginning of my table:
<table class="sortable" border="1" cellspacing="1" cellpadding="1" bordercolor="#333333">
<thead>
<tr class="headerfont">
<td><div align="center">Year</div></td>
<td><div align="center">Name</div></td>
<td><div align="center">College</div></td>
<td><div align="center">POS</div></td>
<td align="center"><div align="center">Height <span style="font-size:10px;">(in)</span></div></td>
<td align="center"><div align="center">Weight <span style="font-size:10px;">(lbs)</span></div></td>
<td>Hand Size <span style="font-size:10px;">(in)</span></td>
<td>Arm Length <span style="font-size:10px;">(in)</span></td>
<td><div align="center"><span style="font-size:14px;">Wonderlic</span></div></td>
<td><div align="center">40 <span style="font-size:12px;">Yard</span></div></td>
<td><div align="center"><span style="font-size:12px;">Bench Press</span></div></td>
<td style="font-size:14px;"><div align="center">Vert Leap <span style="font-size:10px;">(in)</span></div></td>
<td style="font-size:14px;"><div align="center">Broad Jump <span style="font-size:10px;">(in)</span></div></td>
<td>Shuttle</td>
<td>3Cone</td>
<td>60Yd Shuttle</td>
</tr>
</thead>
<tbody>
The table keeps going after the last line, but that's the entire header. So, if I run:
from pyquery import PyQuery as pq
table = pq(*stuff above*)
for c in table('thead tr td'):
    print(c.text)
I get:
None
None
None
None
None
None
Hand Size
Arm Length
None
None
None
None
None
Shuttle
3Cone
60Yd Shuttle
Obviously I don't want the 'None' ones, since they're not correct. I tried various combos of thead tr td div, but then I lose the ones I'm getting now. Then I tried making a list of the div ones first and counting through them to combine the lists, but it seems super hacky and I'm also not getting Wonderlic. Also, the documentation seems to say to use text(), but I get TypeError: 'NoneType' object is not callable when I try to add parens. Any insight would be greatly appreciated.
Thanks!
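As it turns out, you have to add .items() to the end of the query to get PyQuery objects instead of the underlying lxml elements. On a plain lxml element, .text is an attribute holding only the text before the first child element, which is why the cells wrapped in a div came back as None and why calling it with parens raised the TypeError. Once I iterated over .items(), c.text() worked instead of throwing errors and returned the text of the whole cell.
This is much better, as it uses the PyQuery API as intended.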