I'm trying to scrape the search results of Google Scholar (https://scholar.google.com/scholar?hl=en&as_sdt=40000005&sciodt=0%2C22&cites=5652101630448192864&scipsc=&as_ylo=2015&as_yhi=) with BeautifulSoup. I need the titles, journal names, and (potentially) abstracts of the papers in these search results. When I send an HTTP GET request to the URL, the response does seem to contain content. However, when I use BeautifulSoup's find_all method to extract the divs with class "gs_r gs_or gs_scl", I get zero results.
I'm not sure whether the issue is my IP address getting blocked or something else, but does anyone know how to resolve this? Here is my code:
import requests
from bs4 import BeautifulSoup
import csv
import time

# the URL from which our scraping starts
base_url = "https://scholar.google.com/scholar?hl=en&as_sdt=40000005&sciodt=0%2C22&cites=5652101630448192864&scipsc=&as_ylo=2015&as_yhi="

# headers to mask the requests as coming from a real browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

# journals to keep (placeholder examples, not my real list;
# without some definition, the filter below raises a NameError)
desired_journals = {"The Journal of Finance", "Journal of Financial Economics"}

csv_filename = "GoyenkoHoldenTrzcinka_lit_review.csv"

with open(csv_filename, "w", newline="", encoding="utf-8") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Title", "Journal", "Abstract"])

    page_number = 0
    while True:
        # build the URL for the current page (10 results per page)
        url = base_url + f"&start={page_number * 10}"
        # send an HTTP GET request to the URL
        response = requests.get(url, headers=headers)
        # parse the HTML of the page
        soup = BeautifulSoup(response.text, "html.parser")
        # find the search-result divs
        results = soup.find_all("div", class_="gs_r gs_or gs_scl")  # line with a potential issue
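        # if Google serves a CAPTCHA/consent page instead of results
        # (e.g., after detecting automated traffic), none of these divs
        # exist, so find_all returns an empty list and the loop exits
        # on the very first page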
        # stop when there are no more results
        if not results:
            break
        # loop through the current page's search results
        for result in results:
            title = result.find("h3", class_="gs_rt").text
            # the byline ("Authors - Venue, Year - Publisher") appears to
            # live in div.gs_a; div.gs_citi did not match anything for me
            journal_element = result.find("div", class_="gs_a")
            journal = journal_element.text.strip() if journal_element else ""
            # substring match, since gs_a mixes authors, venue, and year
            if any(j in journal for j in desired_journals):
                # attempt to extract the abstract
                abstract_element = result.find("div", class_="gs_rs")
                abstract = abstract_element.text if abstract_element else ""
                csv_writer.writerow([title, journal, abstract])
        # move on to the next page of results
        page_number += 1
        time.sleep(0.5)

print(f"Search results for desired papers were saved to {csv_filename}")
I looked at the HTML structure of Google Scholar's results page again, and my code does seem consistent with it. Still, when I run it, the while loop terminates immediately and the resulting CSV file is empty.
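To test the IP-blocking theory, I also ran this quick diagnostic against the same URL. The block-page marker strings are just my guesses about what a CAPTCHA or "unusual traffic" page might contain, not confirmed:

import requests

url = "https://scholar.google.com/scholar?hl=en&as_sdt=40000005&sciodt=0%2C22&cites=5652101630448192864&scipsc=&as_ylo=2015&as_yhi="
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

response = requests.get(url, headers=headers)
print(response.status_code)  # 429 would indicate rate limiting; a block page can still return 200
print(len(response.text))    # an unusually short body suggests there is no real result list

# save the raw HTML so it can be opened in a browser and inspected;
# if the request is blocked, a CAPTCHA typically appears instead of results
with open("scholar_response.html", "w", encoding="utf-8") as f:
    f.write(response.text)

# crude check for block markers (these strings are guesses, not confirmed)
for marker in ("captcha", "unusual traffic", "sorry"):
    if marker in response.text.lower():
        print(f"possible block marker found: {marker!r}")

If the saved page turns out to be a CAPTCHA/consent page rather than a result list, that would explain why find_all never sees any "gs_r gs_or gs_scl" divs.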