Blocking login overlay window when scraping web page using Selenium

Question

976 views Asked by Ahmad Alghamdi At 12 October 2020 at 20:57

I am trying to scrape a long list of books spread over 10 web pages. When the loop clicks the next > button for the first time, the website displays a login overlay, so Selenium cannot find the target elements. I have tried the following:

Setting various Chrome options.
Using try-except to click the X button on the overlay. The overlay appears only once (when next > is clicked for the first time). When I put this try-except block at the end of the while True: loop, the loop became infinite, because I use continue in the except clause so that the loop does not break.
Adding popup-blocker extensions to Chrome. They do not work when I run the code, even though I load the extension with options.add_argument('load-extension=' + ExtensionPath).

This is my code:

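from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()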
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')

driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)

review_dict = {'title':[], 'author':[],'rating':[]}


html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_ = 'tableList js-dataTooltip')


while True:
   
    table =  driver.find_element_by_xpath('//*[@id="all_votes"]/table')

    for product in table.find_elements_by_xpath(".//tr"):
        
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)

        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)

        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
            
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
        
    except NoSuchElementException:
        continue
                
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
        
    except TimeoutException:    
        break
    
    
df = pd.DataFrame.from_dict(review_dict) 
df

Any help is appreciated: could I change the while loop into a for loop that clicks the next > button until the last page, where should I put the try-except block that closes the overlay, or is there a Chrome option that disables the overlay? Thanks in advance.
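For the for-loop variant asked about here, one possibility is to skip the next > click and load each page by URL. The following is a minimal sketch that only builds the URLs. It assumes the `?page=` parameter at the end of the URL above accepts page numbers starting at 1, and `page_urls` is a hypothetical helper, not part of Selenium. Whether the login overlay also appears on direct page loads is not confirmed here.

```python
from typing import List

# URL taken from the question; the page number is appended to it.
BASE_URL = "https://www.goodreads.com/list/show/32339._50_?page="
PAGE_COUNT = 10  # the question mentions 10 pages


def page_urls(base_url: str, page_count: int) -> List[str]:
    """Return one URL per page, assuming pages are numbered 1..page_count."""
    return [f"{base_url}{page}" for page in range(1, page_count + 1)]


urls = page_urls(BASE_URL, PAGE_COUNT)
print(urls[0])
print(len(urls))
```

Each URL could then be passed to driver.get(url) inside a plain for loop, with the row extraction from the while loop above as the loop body.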

Original Q&A

There is 1 answer

**Zvjezdan Veselinovic** · Answer 1 · 2020-10-13T06:31:12+00:00

Thank you for sharing your code and the website you are having trouble with. I was able to close the login modal by locating it with XPath. I took this on as a challenge and split the code into classes.

One class wraps selenium.webdriver.chrome.webdriver and the other represents the page you want to scrape ( https://www.goodreads.com/list/show/32339 ).

In the following methods, I used the JavaScript call return arguments[0].scrollIntoView(); to scroll to the last book displayed on the page. After that, I was able to click the next button:

def scroll_to_element(self, xpath: str):
    # Locate the element and ask the browser to scroll it into view.
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)

def get_book_count(self):
    # Number of book rows in the list table on the current page.
    rows_xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"
    return len(self.chrome_driver.find_elements(By.XPATH, rows_xpath))

def click_next_page(self):
    # Scroll to the last record, then click "next page".
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()

After I clicked the "Next" button, the modal appeared. I found the XPath for the modal and was able to close it:

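# DriverWait and DriverConditions are not defined in the answer; they are assumed
# here to be aliases for WebDriverWait and expected_conditions.
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as DriverConditions
from selenium.webdriver.support.ui import WebDriverWait as DriverWait

def is_displayed(self, xpath: str, timeout: int = 5):
    # True if the element is present in the DOM within `timeout` seconds.
    try:
        DriverWait(self.chrome_driver, timeout).until(
            DriverConditions.presence_of_element_located((By.XPATH, xpath))
        )
        return True
    except TimeoutException:
        return False

def is_modal_displayed(self):
    # Going by this XPath, <body> carries the class "modalOpened" while the modal is open.
    return self.is_displayed("//body[@class='modalOpened']")

def close_modal(self):
    # Click the modal's close button, then verify that the modal is gone.
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if self.is_modal_displayed():
        raise Exception("Modal Failed To Close")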
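As for where the overlay check belongs in the loop from the question, two details matter. WebDriverWait.until raises TimeoutException rather than NoSuchElementException when the element does not show up, and a continue in the overlay handler jumps back to the top of the loop before the next > click is reached, so the same page is scraped again. The sketch below shows one possible control flow with the browser calls passed in as plain callables. `scrape_all_pages`, `FakeSite`, and the convention that `click_next` returns False on the last page are assumptions made for illustration; with the methods above, `click_next` would be a small wrapper around click_next_page that returns False when the next_page link cannot be found.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def scrape_all_pages(
    scrape_page: Callable[[], List[Row]],
    click_next: Callable[[], bool],
    modal_is_open: Callable[[], bool],
    close_modal: Callable[[], None],
    max_pages: int = 100,
) -> List[Row]:
    """Collect rows from every page, closing the login modal whenever it is open."""
    rows: List[Row] = []
    for _ in range(max_pages):  # hard upper bound, so the loop cannot run forever
        if modal_is_open():
            close_modal()  # no `continue` here: fall through and scrape the page
        rows.extend(scrape_page())
        if not click_next():  # False means there is no "next" link left
            break
    return rows


class FakeSite:
    """In-memory stand-in for the browser; shows the modal once, after the first 'next'."""

    def __init__(self, pages: List[List[Row]]) -> None:
        self.pages = pages
        self.index = 0
        self.modal_open = False
        self.modal_shown = False

    def scrape_page(self) -> List[Row]:
        if self.modal_open:
            raise RuntimeError("page is covered by the login modal")
        return self.pages[self.index]

    def click_next(self) -> bool:
        if self.index + 1 >= len(self.pages):
            return False
        self.index += 1
        if not self.modal_shown:
            self.modal_open = True
            self.modal_shown = True
        return True

    def is_modal_open(self) -> bool:
        return self.modal_open

    def close_modal(self) -> None:
        self.modal_open = False


site = FakeSite([[{"title": "A"}], [{"title": "B"}], [{"title": "C"}]])
rows = scrape_all_pages(site.scrape_page, site.click_next, site.is_modal_open, site.close_modal)
print([row["title"] for row in rows])  # ['A', 'B', 'C']
```

Checking for the modal at the top of each iteration means it is handled right after the click that triggers it, and the first page is unaffected because no modal is open yet. Note that is_modal_displayed as written above waits up to 5 seconds whenever the modal is absent, so a shorter timeout may be preferable inside a loop.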

I hope this helps you to solve your problem.
