I am currently working on a project that scrapes data from Google Scholar. I want to capture the country of residence for each profile, but this is not listed explicitly. For example, from this page I would want UK, since the listed email address is at ucl.ac.uk. Similarly, from this page I would want the Netherlands, since the email address is at vumc.nl. However, for this profile the TLD of the email domain does not tell us the country.
So far I have written this code to capture the domain:
import csv
from bs4 import BeautifulSoup
import urllib.request
import string
import time
url = 'https://scholar.google.com/citations?user=VGoSakQAAAAJ'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'lxml')
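# The verified-email line of the profile sits in the div with id "gsc_prf_ivh"
buttons = soup.find_all("div", {"id": "gsc_prf_ivh"})
for each in buttons:
    # s holds the visible text of that div, which includes the email domain
    s = each.text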
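To make the idea concrete, here is a rough sketch of the TLD lookup I have in mind. It assumes the scraped text looks like "Verified email at ucl.ac.uk - Homepage"; the function names and the tiny `CCTLD_TO_COUNTRY` table are placeholders I made up for illustration, not a complete mapping.

```python
import re

# Hypothetical, deliberately tiny lookup table; a real one would need every ccTLD.
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "nl": "Netherlands",
    "de": "Germany",
}


def domain_from_verified_text(text):
    """Extract the domain from text such as 'Verified email at ucl.ac.uk - Homepage'.

    The 'Verified email at' wording is an assumption about the page text.
    """
    match = re.search(r"Verified email at\s+([A-Za-z0-9.-]+\.[A-Za-z]{2,})", text)
    return match.group(1).lower() if match else None


def country_from_domain(domain):
    """Map the last label of the domain to a country, or None if it is not in the table."""
    if not domain:
        return None
    tld = domain.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)


s = "Verified email at ucl.ac.uk - Homepage"
print(country_from_domain(domain_from_verified_text(s)))
```

This handles the two examples above, but it returns `None` for generic TLDs such as .edu, .com, or .org, which is exactly the case I do not know how to deal with.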
So, how can I determine from a user's Google Scholar profile, with fairly high accuracy, their country?