Google Scholar web scraping using user ID


I am trying to get the name, total citations since 2018, and h-index since 2018 from Google Scholar using Selenium and Beautiful Soup. However, my code only retrieves the name and returns no data for citations or h-index. Those values sit in a box on the right side of the page, inside <div class="gsc_rsb" role="navigation">. Is there any way to access the information there?

My goal: get the total citations and h-index since 2018 for each user ID (both are visible in a box on the right side of a user's profile page).

What I have: all the unique 12-character user IDs.

Here is a snippet of my code:

try:
    # Visit the Google Scholar profile page
    driver.get(f'https://scholar.google.com/citations?hl=en&view_op=list_works&sortby=pubdate&user={user_id}')

    # Set an implicit wait: element lookups are retried for up to 10 seconds
    driver.implicitly_wait(10)

    # Extract faculty name - WORKING
    faculty_name = driver.find_element(By.CSS_SELECTOR, '#gsc_prf_in').text.strip()
    faculty_info['Name'] = faculty_name

    # Extract total citations - NOT WORKING
    total_citations = driver.find_element(By.XPATH, '/html/body/div/div[12]/div[2]/div/div[4]/form/div[1]/table/tbody/tr[1]/td[2]/a').text
    faculty_info['Total Citations'] = total_citations


    # Extract H-index - NOT WORKING
    h_index = driver.find_element(By.CSS_SELECTOR, '#gsc_rsb_st tbody tr:nth-of-type(2) td:nth-of-type(3)').text
    faculty_info['H-Index'] = h_index

except Exception as e:
    print(f"Failed to retrieve data for user ID: {user_id}")
    print(e)

finally:
    # Close the Chrome driver
    driver.quit()
    return faculty_info

I tried both CSS selectors and XPath for everything. I can get almost any information from the user page on Google Scholar, except anything in the box on the right side of the page.
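For reference, here is a minimal sketch of the extraction I am after, run against a hand-written HTML sample rather than the live page. It assumes the stats box is a table with id gsc_rsb_st (the id used in my CSS selector above) holding a header row followed by rows of the form label, all-time value, recent value. The sample markup, the numbers in it, and the names StatsTableParser and parse_scholar_stats are hypothetical, and the standard library's html.parser is used here in place of Beautiful Soup so the snippet has no third-party dependencies.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the values are made up for illustration.
SAMPLE_HTML = """
<div class="gsc_rsb" role="navigation">
<table id="gsc_rsb_st">
<thead><tr><th></th><th>All</th><th>Since 2018</th></tr></thead>
<tbody>
<tr><td><a href="#">Citations</a></td><td>1234</td><td>567</td></tr>
<tr><td><a href="#">h-index</a></td><td>20</td><td>12</td></tr>
<tr><td><a href="#">i10-index</a></td><td>30</td><td>15</td></tr>
</tbody>
</table>
</div>
"""


class StatsTableParser(HTMLParser):
    """Collect the cell text of every row inside <table id="gsc_rsb_st">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []    # one list of cell strings per <tr>
        self._cell = []   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "gsc_rsb_st":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif self.in_cell and tag in ("td", "th"):
            self.rows[-1].append("".join(self._cell).strip())
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self._cell.append(data)


def parse_scholar_stats(html):
    """Return {label: {"all": ..., "since": ...}} for each data row."""
    parser = StatsTableParser()
    parser.feed(html)
    parser.close()
    # Skip the header row; keep rows that have a label and two values.
    return {
        row[0]: {"all": row[1], "since": row[2]}
        for row in parser.rows[1:]
        if len(row) >= 3
    }


stats = parse_scholar_stats(SAMPLE_HTML)
print(stats["Citations"]["since"], stats["h-index"]["since"])
```

In the Selenium code, the same function could be fed driver.page_source. One thing I have not ruled out: Selenium's .text only returns text that is rendered as visible, so if the sidebar is hidden (for example in a narrow or headless window), .text would be empty even though the element is found. Reading the page source, or element.get_attribute('textContent'), would sidestep that, but I have not confirmed this is the cause.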
