Why is my Selenium XPath for scraping a table NOT matching, although the class attribute is unique?


I am trying to scrape the NASDAQ values from the www.n-tv.de website. I am crawling through the pages with Selenium. The stock values are shown on the pages in tables.

The source code of such a table looks like this, for example:

<div class="tableholder">
  <table class="cnttable zebra to le">
    <thead>
      <tr>
        <th>Name</th><th class="ri">Kurs</th><th class="ri">%</th><th class="ri">Absolut</th><th class="ri hidden-xs-down">Relation</th><th class="ri hidden-xs-down">Zeit</th><th class="ri hidden-xs-down hidden-sm-down">Handelsvolumen</th><th class="hidden-xs-down hidden-sm-down">ISIN</th>
      </tr>
    </thead>
    <tbody>
      
      <tr class="linked" onclick="document.location='https://www.n-tv.de/boersenkurse/aktien/activision-blizzard-295693';">
        <td>Activision Blizzard</td>
        <td class="ri"><span class="icon_neg">66,53$</span></td>
        <td class="ri"><span class="neg">-1,42%</span></td>
        <td class="ri"><span class="neg">-0,96</span></td>
        <td class="relation hidden-xs-down"><span class="neg">&nbsp;<span><span></span></span><span style="border-width: 24px;"></span></span></td>
        <td class="ri hidden-xs-down">31.12.</td>
        <td class="ri hidden-xs-down hidden-sm-down">8 Tsd.</td>
        <td class="hidden-xs-down hidden-sm-down">US00507V1098</td>
      </tr>
  
      
      ...
  
    </tbody>
  </table>
</div>

So here is the problem I do not understand:

To find the web elements of the NASDAQ table, I use this XPath:

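from selenium.webdriver.common.by import By

# find_element_by_xpath / find_elements_by_class_name were removed in Selenium 4.3
nasdaq = driver.find_element(By.XPATH, '//table[@class="cnttable zebra to le"]')
rows_nasdaq = nasdaq.find_elements(By.CLASS_NAME, 'linked')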

I have another solution that works correctly: I search for the tableholder elements (there are 3 on this page) and take only the third one from the list. But I really want to understand why the XPath selector above does not find the element, although the class name of the table tag is unique on this page.

I am not using CSS selectors or anything similar. Does anyone have an idea why the XPath does not match in this case?
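One way to check that the XPath expression itself is sound is to run it offline against the markup shown in the question. The sketch below uses Python's standard-library ElementTree (not Selenium) on a trimmed, XML-well-formed copy of that table; the `REORDERED` variant is hypothetical and only illustrates that `@class="..."` compares the whole attribute string, so the same classes in a different order would not match.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table markup from the question; the &nbsp; entity and
# most cells are left out so that ElementTree's XML parser accepts it.
SAMPLE = """
<div class="tableholder">
  <table class="cnttable zebra to le">
    <tbody>
      <tr class="linked"><td>Activision Blizzard</td><td class="ri">66,53$</td></tr>
    </tbody>
  </table>
</div>
"""

# Hypothetical variant: same class tokens, different order.
REORDERED = SAMPLE.replace("cnttable zebra to le", "zebra cnttable to le")

# ElementTree needs a relative path, hence ".//" instead of "//".
XPATH = ".//table[@class='cnttable zebra to le']"


def linked_rows(markup: str) -> list[str]:
    """Return the first-cell text of every tr.linked inside the matched table."""
    root = ET.fromstring(markup.strip())
    rows = []
    for table in root.findall(XPATH):
        for tr in table.iter("tr"):
            if tr.get("class") == "linked":
                rows.append(tr.find("td").text)
    return rows


print(linked_rows(SAMPLE))     # ['Activision Blizzard']
print(linked_rows(REORDERED))  # []
```

Since the expression matches the static markup, a failure in Selenium most likely comes from the state of the DOM at lookup time (element not loaded yet, or a different page) rather than from the selector.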

Accepted answer by HedgeHog:

Assuming you want to scrape this URL: https://www.n-tv.de/boersenkurse/suche/?suchbegriff=to%20le

You have to wait until the element you are trying to find is present in the DOM. You can use Selenium's explicit waits for this:

nasdaq = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@class="cnttable zebra to le"]')))

These imports are needed:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
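Conceptually, an explicit wait is a polling loop: WebDriverWait re-evaluates the condition (every 0.5 seconds by default, ignoring NoSuchElementException) until it returns a truthy value, and raises TimeoutException once the timeout expires. The stdlib-only sketch below illustrates that idea with a hypothetical `wait_until` helper and a simulated page; it is a simplification, not Selenium's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Hypothetical helper: call `condition` until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "table" only shows up on the third check.
calls = {"n": 0}


def table_present():
    calls["n"] += 1
    return "<table>" if calls["n"] >= 3 else None


print(wait_until(table_present, timeout=2.0, poll=0.01))  # <table>
print(calls["n"])  # 3
```

A plain `find_element` call corresponds to checking the condition only once, which fails if the table is rendered a moment later.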

Example:

....
driver.get('https://www.n-tv.de/boersenkurse/suche/?suchbegriff=to%20le')
nasdaq = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@class="cnttable zebra to le"]')))

for i in nasdaq.find_elements(By.CLASS_NAME, 'linked'):
    print(i.get_attribute('onclick'))

Output

document.location='https://www.n-tv.de/boersenkurse/indizes/swx-sp-tra-leis-tr-303397';
document.location='https://www.n-tv.de/boersenkurse/aktien/apollo-tourism-+-leisure-1562996';
document.location='https://www.n-tv.de/boersenkurse/aktien/toqublanmonde--eo-047-11904326';
document.location='https://www.n-tv.de/boersenkurse/indizes/cb-p2p-onl-lend---digbanking-12533785';
document.location='https://www.n-tv.de/boersenkurse/indizes/concinngenddivwomin-leader-3254557';
document.location='https://www.n-tv.de/boersenkurse/indizes/concinnity-msos-leaders-39076931';
...
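If you only need the link behind each row, you can extract the URL from the `onclick` string. A minimal sketch with a regular expression, assuming every value has the `document.location='...';` shape seen in the output; `url_from_onclick` is a hypothetical helper name.

```python
import re
from typing import Optional

# Captures the quoted URL in strings like: document.location='https://...';
ONCLICK_URL = re.compile(r"document\.location='([^']+)'")


def url_from_onclick(onclick: Optional[str]) -> Optional[str]:
    """Return the URL embedded in an onclick handler, or None if absent."""
    match = ONCLICK_URL.search(onclick or "")
    return match.group(1) if match else None


sample = "document.location='https://www.n-tv.de/boersenkurse/aktien/apollo-tourism-+-leisure-1562996';"
print(url_from_onclick(sample))
# https://www.n-tv.de/boersenkurse/aktien/apollo-tourism-+-leisure-1562996
```

Inside the loop, `url_from_onclick(i.get_attribute('onclick'))` would then yield the bare URL; `get_attribute` returns None when the attribute is missing, which the helper passes through as None.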

EDIT

Based on your comment I understood the "link" issue: there is no such table under https://www.n-tv.de/, but the NASDAQ page is linked as https://www.n-tv.de/boersenkurse/indizes/nasdaq-849974, and that is where I found your table.

So waiting is not strictly necessary there, but it can't hurt either. I imported the table directly into a DataFrame with pandas:

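from io import StringIO

import pandas as pd
...
driver.get('https://www.n-tv.de/boersenkurse/indizes/nasdaq-849974')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//table[@class="cnttable zebra to le"]')))

# read_html returns a list with one DataFrame per <table>; wrapping the HTML in
# StringIO avoids the deprecation of literal HTML strings (pandas >= 2.1)
pd.read_html(StringIO(driver.page_source))[3]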

Output

Note: the Relation column is empty (nan) because its cells contain no text; you can simply drop it if you like.

Name                    Kurs     %       Absolut  Relation  Zeit   Handelsvolumen  ISIN
Activision Blizzard     67,12$   -0,44%  -30      nan       18:05  4 Mio.          US00507V1098
Adobe                   545,25$  -3,39%  -1912    nan       18:05  2 Mio.          US00724F1012
Advanced Micro Devices  141,89$  -5,55%  -834     nan       18:05  44 Mio.         US0079031078
Airbnb                  167,86$  -2,79%  -481     nan       18:05  2 Mio.          US0090661010
Align Technology        629,44$  -2,87%  -1861    nan       18:02  178 Tsd.        US0162551016
...                     ...      ...     ...      ...       ...    ...             ...
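The Absolut values (-30, -1912, ...) look like German-formatted figures such as -0,30 and -19,12 whose decimal comma was dropped: `pd.read_html` treats `,` as the thousands separator by default (`thousands=','`). Passing `decimal=','` and `thousands='.'` to `pd.read_html` should keep those values intact. Columns that stay text because of the `$`, `%` or unit suffixes can be converted afterwards; below is a minimal stdlib sketch with hypothetical helper names, assuming only the formats visible in this table (Tsd. = thousand, Mio. = million).

```python
from decimal import Decimal

# German abbreviations used in the Handelsvolumen column.
MULTIPLIERS = {"Tsd.": Decimal(1_000), "Mio.": Decimal(1_000_000)}


def parse_de_number(text: str) -> Decimal:
    """Parse German-formatted numbers like '67,12$', '-0,44%' or '1.234,5'."""
    cleaned = text.strip().rstrip("$%").strip()
    # In German notation '.' groups thousands and ',' is the decimal separator.
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def parse_volume(text: str) -> Decimal:
    """Parse trading volumes like '4 Mio.' or '178 Tsd.'."""
    number, _, unit = text.strip().partition(" ")
    return parse_de_number(number) * MULTIPLIERS.get(unit, Decimal(1))


print(parse_de_number("67,12$"))  # 67.12
print(parse_volume("178 Tsd."))   # 178000
```

Decimal is used instead of float so that prices keep their exact two decimal places; on the DataFrame, something like `df['Kurs'].map(parse_de_number)` would apply the conversion column-wise.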