Scraping Authors based on tags from Google Scholar

Question


Asked by user7339949 on 25 December 2016 at 15:35 (1.2k views)

I am working on a project where I want to scrape data from Google Scholar. I want to collect all authors tagged with a given label (e.g. Anaphylaxis) and store each author's number of citations, h-index and i10-index in a CSV file. However, I am unsure how to do this, since Google Scholar has no official API. I understand I can use an HTML parser like Beautiful Soup, but I am unsure how to scrape the data without being blocked.

So my question is: how can I use bs4 to store all authors tagged as Anaphylaxis, along with each author's citations, h-index and i10-index, in a CSV file?
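The CSV part of the question is independent of how the data is scraped. A minimal sketch using Python's standard csv module, assuming each author has already been collected into a dict; the column names and the write_authors_csv helper are hypothetical, and the sample values are placeholders, not real Scholar data:

```python
import csv
import io

# Hypothetical column layout: the three metrics the question asks for.
AUTHOR_FIELDS = ["name", "citations", "h_index", "i10_index"]


def write_authors_csv(authors, fileobj):
    """Write author dicts to an open text file as CSV, header row first."""
    writer = csv.DictWriter(fileobj, fieldnames=AUTHOR_FIELDS)
    writer.writeheader()
    for author in authors:
        # Missing keys become empty cells (DictWriter's default restval is "").
        writer.writerow(author)


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample = [
        {"name": "Author A", "citations": 1200, "h_index": 18, "i10_index": 30},
        {"name": "Author B", "citations": 450, "h_index": 11},
    ]
    buf = io.StringIO()
    write_authors_csv(sample, buf)
    print(buf.getvalue())
```

When writing to a real file, open it with open("authors.csv", "w", newline="") as the csv module documentation recommends, so line endings are not doubled.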

Original Q&A

There are 2 answers

**Kyle Pastor** · Answer 1 · 2016-12-25T15:42:55+00:00

All the scraper does is parse HTML pages. On a search results page, the authors are listed in the div with class="gs_a". If you use Beautiful Soup and look for this class, you will be able to find all of the authors. You can go page by page by updating the URL, for example from:

https://scholar.google.ca/scholar?start=20&q=polymer&hl=en&as_sdt=0,5 to https://scholar.google.ca/scholar?start=30&q=polymer&hl=en&as_sdt=0,5

i.e. start=30, then 40, and so on.
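The paging described above can be sketched as a small URL builder. This assumes 10 results per page (inferred from the step from start=20 to start=30 in the example URLs); result_page_urls is a hypothetical helper, and it only builds URLs, it does not fetch anything:

```python
from urllib.parse import urlencode

BASE = "https://scholar.google.ca/scholar"


def result_page_urls(query, pages, page_size=10):
    """Build result-page URLs by stepping the `start` offset by page_size."""
    urls = []
    for page in range(pages):
        params = {
            "start": page * page_size,  # 0, 10, 20, ... (assumed page size)
            "q": query,
            "hl": "en",
            "as_sdt": "0,5",
        }
        # safe="," keeps "0,5" readable, matching the URLs in the answer.
        urls.append(f"{BASE}?{urlencode(params, safe=',')}")
    return urls


if __name__ == "__main__":
    for url in result_page_urls("polymer", 4):
        print(url)
```

Requesting these pages in a tight loop is likely to trigger Google Scholar's rate limiting or a CAPTCHA, so any real loop should at least pause between requests (for example with time.sleep).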

Then you can loop over the author names based on the link paths inside the gs_a tags.
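A minimal sketch of that parsing step with Beautiful Soup (requires the beautifulsoup4 package). Beyond the gs_a class named in the answer, it assumes that authors with a Scholar profile appear as links of the form /citations?user=<id> inside div.gs_a; the sample HTML is simplified and made up for illustration, and profile_links is a hypothetical helper:

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup


def profile_links(html):
    """Return (name, user_id) pairs for profile links inside div.gs_a."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for div in soup.find_all("div", class_="gs_a"):
        # Assumption: only authors with a profile are rendered as links,
        # and those links carry the profile id in a `user` query parameter.
        for a in div.find_all("a", href=True):
            query = parse_qs(urlparse(a["href"]).query)
            if "user" in query:
                found.append((a.get_text(strip=True), query["user"][0]))
    return found


# Hypothetical, simplified markup imitating one search result's author line.
SAMPLE = (
    '<div class="gs_r"><div class="gs_a">'
    '<a href="/citations?user=abc123XYZ&amp;hl=en">J Smith</a>, A Doe - '
    "Journal of Examples, 2016 - example.org</div></div>"
)

if __name__ == "__main__":
    print(profile_links(SAMPLE))  # [('J Smith', 'abc123XYZ')]
```

The user ids collected this way identify profile pages, which is where per-author metrics such as citations and h-index are displayed.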

Let me know if this helps!

-Kyle

**Milos Djurdjevic** · Answer 2 · 2021-05-07T12:50:15+00:00

To get all the profiles for any "category" (label:query) or "name", you could use a third-party solution like SerpApi. It's a paid API with a free trial.

Example Python code (client libraries for other languages are also available):

from serpapi import GoogleSearch

params = {
  "api_key": "SECRET_API_KEY",
  "engine": "google_scholar_profiles",
  "hl": "en",
  # "label:<tag>" searches profiles by interest tag, e.g. anaphylaxis
  "mauthors": "label:anaphylaxis"
}

search = GoogleSearch(params)
results = search.get_dict()

Example JSON output:

"profiles": [
  {
    "name": "Jerrold H Levy",
    "link": "https://scholar.google.com/citations?hl=en&user=qnH5V28AAAAJ",
    "serpapi_link": "https://serpapi.com/search.json?author_id=qnH5V28AAAAJ&engine=google_scholar_author&hl=en",
    "author_id": "qnH5V28AAAAJ",
    "affiliations": "Professor of Anesthesiology and Surgery (Cardiothoracic)",
    "email": "Verified email at duke.edu",
    "cited_by": 80353,
    "interests": [
      {
        "title": "bleeding",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Ableeding",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:bleeding"
      },
      {
        "title": "anaphylaxis",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aanaphylaxis",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:anaphylaxis"
      },
      {
        "title": "anticoagulation",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Aanticoagulation",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:anticoagulation"
      },
      {
        "title": "shock",
        "serpapi_link": "https://serpapi.com/search.json?engine=google_scholar_profiles&hl=en&mauthors=label%3Ashock",
        "link": "https://scholar.google.com/citations?hl=en&view_op=search_authors&mauthors=label:shock"
      }
    ],
    "thumbnail": "https://scholar.googleusercontent.com/citations?view_op=small_photo&user=qnH5V28AAAAJ&citpid=2"
  },
  ...
]
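To get from the JSON output above to the CSV file the question asks for, the "profiles" list can be flattened with the standard csv module. This sketch keeps only the name, author_id and cited_by fields shown in the sample output; the helper names are hypothetical, and the sample dict reuses the values from that output:

```python
import csv
import io

PROFILE_FIELDS = ["name", "author_id", "cited_by"]


def profiles_to_rows(results):
    """Keep only the CSV columns from each entry of results["profiles"]."""
    return [
        {field: profile.get(field) for field in PROFILE_FIELDS}
        for profile in results.get("profiles", [])
    ]


def write_rows(rows, fileobj):
    """Write the flattened rows as CSV, header first."""
    writer = csv.DictWriter(fileobj, fieldnames=PROFILE_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Abbreviated copy of the sample response above (other keys omitted).
SAMPLE_RESULTS = {
    "profiles": [
        {"name": "Jerrold H Levy", "author_id": "qnH5V28AAAAJ", "cited_by": 80353},
    ]
}

if __name__ == "__main__":
    buf = io.StringIO()
    # With a live API key, pass search.get_dict() instead of SAMPLE_RESULTS.
    write_rows(profiles_to_rows(SAMPLE_RESULTS), buf)
    print(buf.getvalue())
```

The profiles result shown above does not include h-index or i10-index. Each profile's serpapi_link points to the google_scholar_author engine for that author_id, which is the place to look for per-author metrics; check the SerpApi documentation for the exact field names.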

See the SerpApi Google Scholar Profiles API documentation for more details.

Disclaimer: I work at SerpApi.
