How can I scrape a website without getting detected and bypassing reCAPTCHA using selenium webdriver through Python?

Question


Asked by Imran Rafiq on 13 March 2019 at 09:43 (7.9k views)

I know web scraping: I have collected data from different websites using Python and Selenium WebDriver with Chrome. But when I open this particular website, the front page loads; as soon as I click something or go to any other page, the website blocks me, because it detects that I am using automated Chrome.

Original Q&A

There are 2 answers

**Carlos** · Answer 1 · 2019-03-13T10:00:30+00:00

This may be because the website uses reCAPTCHA v3, which "allows you to verify if an interaction is legitimate without any user interaction". This means that they can identify if you are not a human without asking you to check the famous "I'm not a robot" box. That box is used in the former version of reCAPTCHA, v2.
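To illustrate how v3 can flag a visitor with no visible challenge: the page obtains a token in the background, and the site's backend sends that token to Google's `siteverify` endpoint. The response includes a score from 0.0 (very likely a bot) to 1.0 (very likely a human), and the site decides what to do based on a threshold it chooses. The sketch below shows that backend check from the site's side. It assumes the 0.5 threshold Google's documentation suggests as a starting point. The names `is_probably_human` and `verify_token` and the example responses are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Google's documented server-side verification endpoint for reCAPTCHA tokens.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def is_probably_human(result: dict, expected_action: str, threshold: float = 0.5) -> bool:
    """Decide from a parsed siteverify response whether to trust the request.

    reCAPTCHA v3 scores run from 0.0 (likely bot) to 1.0 (likely human);
    the threshold is the site owner's choice (0.5 is assumed here).
    """
    if not result.get("success", False):
        return False
    # The action name set on the page must match what the backend expects.
    if result.get("action") != expected_action:
        return False
    return result.get("score", 0.0) >= threshold


def verify_token(secret_key: str, token: str) -> dict:
    """POST the client token to siteverify and return the parsed JSON."""
    data = urllib.parse.urlencode({"secret": secret_key, "response": token}).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Made-up responses shaped like the siteverify JSON.
    human_like = {"success": True, "score": 0.9, "action": "login"}
    bot_like = {"success": True, "score": 0.1, "action": "login"}
    print(is_probably_human(human_like, "login"))  # True
    print(is_probably_human(bot_like, "login"))    # False
```

Because the whole decision happens on the server from signals Google collects in the browser, nothing on the page asks the user to prove anything. This is why a blocked Selenium session may see no challenge at all.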

Read more about reCAPTCHA here: https://developers.google.com/recaptcha/docs/versions

I don't think it's possible to work around this with Selenium. And, as was already mentioned, web scraping is often illegal.

**undetected Selenium** · Answer 2 · 2019-03-13T15:18:04+00:00

These days, websites can detect your program as a bot pretty easily. Currently, Google offers four types of reCAPTCHA to choose from and implement when registering a new site:

reCAPTCHA v3
reCAPTCHA v2 ("I'm not a robot" Checkbox)
reCAPTCHA v2 (Invisible reCAPTCHA badge)
reCAPTCHA v2 (Android)

Solution

However, there are some generic approaches to avoid getting detected while web scraping:

The first and foremost attribute a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, keep changing the User Agent on each request. A detailed discussion is available in the Q&A "Way to change Google Chrome user agent in Selenium?".
To simulate human-like behavior, you may need to slow down script execution even beyond what WebDriverWait and expected_conditions provide, by inducing time.sleep(secs). A detailed discussion is available in the Q&A "How to sleep webdriver in python for milliseconds".
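The three suggestions above can be sketched together as follows. This is a minimal illustration, not a method that defeats reCAPTCHA. It assumes Selenium 4 with Chrome and sets the window size and user agent through Chrome's `--window-size` and `--user-agent` command-line switches. A new driver is started for each URL so that every request carries a different user agent, and a randomized `time.sleep` runs between requests. The user-agent strings, window sizes, and helper names (`chrome_arguments`, `random_pause`, `fetch_pages`) are hypothetical placeholders.

```python
import random
import time
from typing import List, Optional

# Hypothetical pool of desktop user-agent strings; replace with current ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

# Hypothetical window sizes used instead of the driver's default viewport.
WINDOW_SIZES = [(1366, 768), (1440, 900), (1536, 864)]


def chrome_arguments(rng: random.Random) -> List[str]:
    """Build Chrome switches with a randomly chosen window size and user agent."""
    width, height = rng.choice(WINDOW_SIZES)
    user_agent = rng.choice(USER_AGENTS)
    return [f"--window-size={width},{height}", f"--user-agent={user_agent}"]


def random_pause(rng: random.Random, low: float = 1.0, high: float = 3.0) -> float:
    """Sleep for a random number of seconds in [low, high] and return the delay."""
    delay = rng.uniform(low, high)
    time.sleep(delay)
    return delay


def fetch_pages(urls: List[str], seed: Optional[int] = None) -> None:
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    rng = random.Random(seed)
    for url in urls:
        options = webdriver.ChromeOptions()
        for arg in chrome_arguments(rng):
            options.add_argument(arg)
        # A fresh driver per request, so each request gets a new user agent.
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            print(driver.title)
        finally:
            driver.quit()
        # Extra delay on top of any WebDriverWait used while on the page.
        random_pause(rng)


if __name__ == "__main__":
    fetch_pages(["https://example.com"])
```

Restarting the browser per request is the simplest way to change the user agent with plain command-line switches, at the cost of slower runs. Passing a fixed `seed` makes the choices reproducible while debugging.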

Outro

See:

TechQA.
