I'm working on a scraping project and I'm trying to fetch a page.

Here is the code I'm using to open the page:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("user-data-dir=selenium")
print("Opening browser")
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=chrome_options)
print("getting request")
driver.get("http://www.tsetmc.com/Loader.aspx?ParTree=15131F")
print("starting wait")
time.sleep(10)
response = driver.page_source
print("got response, quitting...")
driver.quit()

My problem

The problem is that the script hangs when it reaches driver.get(): it neither ends the process nor prints "starting wait". (The problem persists on both my laptop and the server.) I have tried removing the --headless option, and it works fine on my laptop (Ubuntu 20.04), but when I upload it to my server and run it there (Ubuntu Server 18.04), Chrome crashes (exception message below):

Message: unknown error: Chrome failed to start: exited abnormally. (unknown error: DevToolsActivePort file doesn't exist) (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

So I have come to the conclusion that I have to use the --headless option, since there is no GUI on my server and Chrome crashes without one.

In conclusion, I need help troubleshooting the infinite wait on driver.get().

PS: I can run the code below with no problem, which seems weird to me:

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("user-data-dir=selenium")
browser = webdriver.Chrome('/usr/lib/chromium-browser/chromedriver', options=chrome_options)
print("open browser")
browser.get("https://www.codal.ir")
print("get")
time.sleep(10)
response = browser.page_source
print("response")
browser.quit()

Answer (by undetected Selenium):

Assuming you have created a Chrome Profile named selenium, you need to add -- before the argument user-data-dir and pass the absolute path of the Chrome Profile Directory as follows:

chrome_options.add_argument("--user-data-dir=/path/to/chromium-profile/selenium")

As an alternative, you can also use the argument --profile-directory as follows:

chrome_options.add_argument('--profile-directory=selenium')
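
To show how the absolute profile path fits into the question's setup, here is a minimal sketch. The helper names build_chrome_args and open_page are hypothetical, and the sketch assumes chromedriver can be found on PATH rather than passed explicitly. It also sets a page-load timeout, which is not part of the fix itself but turns an indefinite hang on driver.get() into a catchable TimeoutException, which may make troubleshooting easier.

```python
from pathlib import Path


def build_chrome_args(profile_dir, headless=True):
    """Return Chrome switches, with the profile directory made absolute.

    Hypothetical helper: the name and signature are illustrative only.
    """
    absolute = Path(profile_dir).expanduser().resolve()
    args = [f"--user-data-dir={absolute}", "--no-sandbox"]
    if headless:
        args.append("--headless")
    return args


def open_page(url, profile_dir, timeout_s=30):
    """Load url and return its page source, or None if loading times out."""
    # Imported here so build_chrome_args can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options

    chrome_options = Options()
    for arg in build_chrome_args(profile_dir):
        chrome_options.add_argument(arg)

    # Assumption: chromedriver is discoverable on PATH.
    driver = webdriver.Chrome(options=chrome_options)
    try:
        # Raise TimeoutException instead of blocking forever in get().
        driver.set_page_load_timeout(timeout_s)
        driver.get(url)
        return driver.page_source
    except TimeoutException:
        return None
    finally:
        driver.quit()
```

Calling open_page("http://www.tsetmc.com/Loader.aspx?ParTree=15131F", "selenium") would resolve the relative directory selenium against the current working directory. If the profile lives elsewhere, pass its full path instead.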