I am working on a project where I hope to scrape data from Google Scholar. I want to scrape all authors tagged in a category (e.g. Anaphylaxis) and store their number of citations, h-index and i10-index in a CSV file. However, I am unsure how to do this given that Google Scholar has no API. I understand I can use a scraper like Beautiful Soup, but I am unsure how to scrape the data without being blocked.
So, my question is: how can I use bs4 to store all authors tagged as Anaphylaxis, along with each author's citations, h-index and i10-index, in a CSV file?
All the scraper is doing is parsing some HTML pages. After a search, the authors are in the div elements with class="gs_a". If you use Beautiful Soup and look for this class, you will be able to find all of the authors. You can go page by page by updating the start parameter in the URL.
For example, go from https://scholar.google.ca/scholar?start=20&q=polymer&hl=en&as_sdt=0,5 to https://scholar.google.ca/scholar?start=30&q=polymer&hl=en&as_sdt=0,5
i.e. start=30, then 40, and so on, increasing by 10 for each page.
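As a rough sketch of that paging step, the snippet below builds one search URL per page by stepping the start offset. The helper name page_urls and the page_size default of 10 are my own choices for illustration, taken from the 20/30/40 pattern above; the query parameters are copied from the example URLs. It only builds the URLs and does not send any requests.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.ca/scholar"


def page_urls(query, pages, page_size=10):
    """Yield one search URL per results page by stepping the start offset.

    page_urls and page_size are hypothetical names; the step of 10 is an
    assumption based on the start=20 -> 30 -> 40 pattern in the example URLs.
    """
    for page in range(pages):
        params = {
            "start": page * page_size,
            "q": query,
            "hl": "en",
            "as_sdt": "0,5",
        }
        # safe="," keeps the comma in as_sdt unescaped, as in the example URLs
        yield f"{BASE_URL}?{urlencode(params, safe=',')}"


urls = list(page_urls("polymer", 3))
for url in urls:
    print(url)
```

Swap "polymer" for your own search term and raise the page count as needed.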
Then you can loop over the author names based on the link paths inside the gs_a tags.
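Here is a minimal sketch of that loop with Beautiful Soup. It runs on a small made-up HTML snippet rather than a live page, so the markup in SAMPLE_HTML, the helper name extract_authors, and the CSV column names are all assumptions for illustration; the real page layout may differ. It stops at author names and their link paths, and writes the CSV into an in-memory buffer.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded results page (an assumption;
# check the real page source before relying on this structure).
SAMPLE_HTML = """
<div class="gs_a"><a href="/citations?user=AAAA&amp;hl=en">A Author</a>, B Writer - Journal, 2020</div>
<div class="gs_a"><a href="/citations?user=BBBB&amp;hl=en">C Person</a> - Other Journal, 2019</div>
"""


def extract_authors(html):
    """Return name and link path for every linked author in the gs_a divs."""
    soup = BeautifulSoup(html, "html.parser")
    authors = []
    for block in soup.find_all("div", class_="gs_a"):
        # Only authors that have a link are picked up; plain-text names are skipped
        for link in block.find_all("a", href=True):
            authors.append(
                {"name": link.get_text(strip=True), "profile_path": link["href"]}
            )
    return authors


authors = extract_authors(SAMPLE_HTML)

# Write to an in-memory buffer here; open a real file with newline="" instead
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "profile_path"])
writer.writeheader()
writer.writerows(authors)
print(buffer.getvalue())
```

In a real run you would pass the HTML of each downloaded page to extract_authors and collect the rows across pages before writing the file.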
Let me know if this helps!
-Kyle