I am scraping forum post titles using the Firefox gecko driver with selenium in Python and have hit a snag that I can't seem to figure out.
~$ geckodriver --version
geckodriver 0.19.0
The source code of this program is available from
testing/geckodriver in https://hg.mozilla.org/mozilla-central.
This program is subject to the terms of the Mozilla Public License 2.0.
You can obtain a copy of the license at https://mozilla.org/MPL/2.0/.
I am trying to scrape a couple of years' worth of past post titles from the forum, and my code works fine for a while. I've sat and watched it run for 20-30 minutes and it does exactly what it is supposed to do. However, when I kick the script off, go to bed, and wake up the next morning, I find that it has processed ~22,000 posts. The site I'm currently scraping shows 25 posts per page, so it got through ~880 separate URLs before crashing.
When it does crash it throws the following error:
WebDriverException: Message: Tried to run command without establishing a connection
Initially my code looked like this:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

FirefoxProfile = webdriver.FirefoxProfile('/home/me/jupyter-notebooks/FirefoxProfile/')
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)

for url in urls:
    driver.get(url)
    ### code to process page here ###

driver.close()
I've also tried:
driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
for url in urls:
    driver.get(url)
    ### code to process page here ###
    driver.close()
and
for url in urls:
    driver = webdriver.Firefox(FirefoxProfile, capabilities=firefox_capabilities)
    driver.get(url)
    ### code to process page here ###
    driver.close()
I get the same error in all 3 scenarios, but only after the script has been running successfully for quite a while, and I'm not sure how to determine why it's failing.
How do I determine why I get this error after it has successfully processed several hundred URLs? Or is there some sort of best practice I'm not following with Selenium/Firefox for processing this many pages?
All three code blocks were nearly correct, but each had a minor flaw, as follows:
Your first code block:
This block looks promising apart from one issue. In the last step, as per best practices, you should have invoked driver.quit() instead of driver.close(), which would have prevented dangling webdriver instances from lingering in system memory. You can find the difference between driver.close() and driver.quit() here.
Your second code block:
This block is error prone. Once execution enters the for loop and processes a url, the browser session/instance is closed at the end of the iteration. So when the loop starts its second iteration, the script errors on driver.get(url) because there is no longer an active browser session.
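This failure mode can be demonstrated without a real browser. The sketch below uses a small stand-in class instead of an actual webdriver (an assumption for illustration only, since the behaviour does not depend on the browser itself): once the session is closed at the end of the first iteration, the next get() has no session to run against.

```python
class FakeSession:
    """Stand-in for a WebDriver session: get() fails once the session is closed."""
    def __init__(self):
        self.open = True

    def get(self, url):
        if not self.open:
            raise RuntimeError("Tried to run command without establishing a connection")
        return "page content of " + url

    def close(self):
        self.open = False

driver = FakeSession()
for url in ["https://forum.example/page1", "https://forum.example/page2"]:
    try:
        driver.get(url)
    except RuntimeError as exc:
        print("second iteration fails:", exc)
        break
    driver.close()  # closing inside the loop kills the session for the next iteration
```

The first iteration succeeds; the second one raises, which mirrors the WebDriverException seen with a real driver.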
Your third code block:
This block is well composed apart from the same issue as the first one. In the last step you should have invoked driver.quit() instead of driver.close(), which would have prevented dangling webdriver instances from lingering in system memory. Those dangling webdriver instances keep occupying ports, and at some point WebDriver can no longer find a free port or open a new browser session/connection. Hence you see the error:
WebDriverException: Message: Tried to run command without establishing a connection
Solution:
As per best practices, invoke driver.quit() instead of driver.close() whenever you are done with a WebDriver instance, and then open a new WebDriver instance and a new web browser session as needed.
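Putting that together, one way to structure the loop is a helper that opens a single session for the whole batch and guarantees quit() in a finally block, so the geckodriver process is cleaned up even if the scrape crashes partway through. The helper and its parameter names (make_driver, process) are illustrative, not part of the question's code; with Selenium you would pass the webdriver.Firefox call from the question as the factory.

```python
def scrape_all(urls, make_driver, process):
    """Scrape every url with one driver session, guaranteeing cleanup.

    make_driver: zero-argument factory returning a driver, e.g. (per the
    question's setup) lambda: webdriver.Firefox(FirefoxProfile,
    capabilities=firefox_capabilities)
    process:     callable that handles the loaded page, given the driver
    """
    driver = make_driver()
    try:
        for url in urls:
            driver.get(url)
            process(driver)
    finally:
        # quit(), not close(): ends the session AND shuts down the driver
        # process, so no dangling instance keeps a port occupied
        driver.quit()
```

Because quit() sits in a finally block, it runs whether the loop finishes normally or an iteration raises mid-batch.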