I am trying to get the name, total citations since 2018, and h-index since 2018 from Google Scholar using Selenium and Beautiful Soup. However, my code only retrieves the name and returns no data for citations or h-index. Those values sit in a box on the right side of the page, inside the element div class="gsc_rsb" role="navigation". Is there any way to access the information there?
My goal: get the total citations and h-index since 2018 for each user ID (all of which is visible in a box on the right side of a user page).
What I have: all the unique 12-character user IDs.
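Since the plan is to visit one profile page per user ID, a small helper that builds each URL may help keep the query string correct (note the "?" between the path and the parameters). This is a minimal sketch: the function name profile_url and the sample IDs are hypothetical, and the query parameters are simply the ones used in the snippet below.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/citations"


def profile_url(user_id):
    """Build the profile URL for one 12-character Google Scholar user ID (hypothetical helper)."""
    if len(user_id) != 12:
        raise ValueError(f"expected a 12-character user id, got {user_id!r}")
    # Same parameters as in the snippet; urlencode joins them with '&'
    query = urlencode({"hl": "en", "view_op": "list_works",
                       "sortby": "pubdate", "user": user_id})
    return f"{BASE_URL}?{query}"  # '?' separates the path from the query string


# Hypothetical IDs, for illustration only
user_ids = ["AAAAAAAAAAAA", "BBBBBBBBBBBB"]
for uid in user_ids:
    print(profile_url(uid))
```

Each printed URL can then be passed to driver.get() in turn.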
Here is a snippet of my code:
from selenium.webdriver.common.by import By
def get_faculty_info(driver, user_id):  # hypothetical name; the original snippet is this function's body
    faculty_info = {}
    try:
        # Visit the Google Scholar profile page
        driver.get(f'https://scholar.google.com/citations?hl=en&view_op=list_works&sortby=pubdate&user={user_id}')
        # implicitly_wait sets how long each find_element call polls for an element
        # before failing; it does not pause the script by itself
        driver.implicitly_wait(10)  # up to 10 seconds
        # Extract faculty name - WORKING
        faculty_name = driver.find_element(By.CSS_SELECTOR, '#gsc_prf_in').text.strip()
        faculty_info['Name'] = faculty_name
        # Total citations - NOT WORKING
        total_citations = driver.find_element(By.XPATH, '/html/body/div/div[12]/div[2]/div/div[4]/form/div[1]/table/tbody/tr[1]/td[2]/a').text
        faculty_info['Total Citations'] = total_citations
        # Extract h-index - NOT WORKING
        h_index = driver.find_element(By.CSS_SELECTOR, '#gsc_rsb_st tbody tr:nth-of-type(2) td:nth-of-type(3)').text
        faculty_info['H-Index'] = h_index
    except Exception as e:
        print(f"Failed to retrieve data for user ID: {user_id}")
        print(e)
    finally:
        # Close the Chrome driver
        driver.quit()
    return faculty_info
I tried both CSS selectors and XPath for every field. I can get almost any information from a Google Scholar user page except what is inside the box on the right side of the page.
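For reference, here is a minimal sketch of reading that right-hand table once the page HTML is available as a string (for example from Selenium's driver.page_source). It uses the standard library's html.parser so it runs without extra installs; with Beautiful Soup, soup.select('#gsc_rsb_st tr') would be a comparable starting point. The sample markup is an assumption loosely modeled on the table with id gsc_rsb_st, not a copy of the live page, and the names StatsTableParser and stats_since are hypothetical. The column is chosen by its header text rather than a fixed position, because the year in the "Since ..." header is whatever the live page displays.

```python
from html.parser import HTMLParser

# Sample markup (an assumption): a simplified stand-in for the stats table
# with id "gsc_rsb_st" inside div.gsc_rsb. Inspect the live page to confirm.
SAMPLE_HTML = """
<div class="gsc_rsb" role="navigation">
  <table id="gsc_rsb_st">
    <thead><tr><th></th><th>All</th><th>Since 2018</th></tr></thead>
    <tbody>
      <tr><td><a href="#">Citations</a></td><td>1,234</td><td>987</td></tr>
      <tr><td><a href="#">h-index</a></td><td>20</td><td>15</td></tr>
      <tr><td><a href="#">i10-index</a></td><td>30</td><td>25</td></tr>
    </tbody>
  </table>
</div>
"""


class StatsTableParser(HTMLParser):
    """Collects the cell texts of every row in the table with id gsc_rsb_st."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []       # each row is a list of cell strings (th or td)
        self.current = None  # row currently being filled

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "gsc_rsb_st":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.current = []
        elif self.in_table and tag in ("th", "td"):
            self.in_cell = True
            self.current.append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.in_table = False
        elif self.in_table and tag in ("th", "td"):
            self.in_cell = False
        elif self.in_table and tag == "tr" and self.current is not None:
            self.rows.append([cell.strip() for cell in self.current])
            self.current = None

    def handle_data(self, data):
        # Text inside nested tags (such as the <a> labels) is still collected
        if self.in_cell:
            self.current[-1] += data


def stats_since(html, column_header="Since 2018"):
    """Return {metric: int} for the column whose header text matches column_header."""
    parser = StatsTableParser()
    parser.feed(html)
    header, *body = parser.rows
    col = header.index(column_header)  # raises ValueError if no header matches
    return {row[0]: int(row[col].replace(",", "")) for row in body}


print(stats_since(SAMPLE_HTML))
# → {'Citations': 987, 'h-index': 15, 'i10-index': 25}
```

Matching the header text means that if the page shows a different year, the lookup fails with a ValueError instead of silently returning the wrong column.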