I want to scrape the website https://www.rome2rio.com. Below is the code I came up with. Sadly, I get a captcha about 99% of the time. Can someone give me a hint on what I could add to the code, or how I could modify it, to avoid being detected?
Thanks
from selenium import webdriver
import undetected_chromedriver as uc
import time
import random
# Initialize undetected ChromeOptions
chrome_options = uc.ChromeOptions()
# Essential options to avoid detection
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--incognito")
# Disable the Blink feature that exposes automation (navigator.webdriver)
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_argument("--start-maximized") # To start maximized
# Setting excludeSwitches and useAutomationExtension within undetected_chromedriver context
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option('useAutomationExtension', False)
# Rotating User-Agent
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
# Add more as needed
]
random_user_agent = random.choice(user_agents)
chrome_options.add_argument(f"user-agent={random_user_agent}")
# Optionally set a fixed window size instead of starting maximized
# chrome_options.add_argument("--window-size=1366,768") # Use only if you don't want to start maximized
# Use undetected_chromedriver to avoid detection
driver = uc.Chrome(options=chrome_options)
# Open the specified website
driver.get("https://www.rome2rio.com/map/Marseille/Paris")
# Mimicking human behavior with random sleep
time.sleep(random.uniform(2, 4))
# Proceed with your script...
# Close the driver after operations are complete
driver.quit()
I believe solving the captcha through the API of 2Captcha or another captcha-solving service would be more reliable than trying to evade detection. These services are not free, but at roughly $1-2 per 1,000 requests, depending on the captcha type, the cost is not an issue for most applications.
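To make the idea concrete, here is a minimal sketch of the submit-and-poll flow, assuming the page serves a reCAPTCHA v2 challenge (I have not confirmed which captcha type rome2rio uses) and based on my reading of 2Captcha's classic HTTP API (`in.php` to submit, `res.php` to poll). The helper names `build_submit_params`, `parse_reply`, `http_get_json` and `solve_recaptcha` are made up for illustration, and the API key and site key are placeholders you would have to supply. Check the service's current documentation before relying on the endpoints or parameters.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: endpoints of 2Captcha's classic HTTP API; verify against current docs.
API_BASE = "https://2captcha.com"


def build_submit_params(api_key, site_key, page_url):
    """Parameters for submitting a reCAPTCHA v2 task (assumed captcha type)."""
    return {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,  # the data-sitekey value found in the page source
        "pageurl": page_url,
        "json": 1,
    }


def parse_reply(reply):
    """Return the payload on success, None while the task is pending, else raise."""
    if reply.get("status") == 1:
        return reply["request"]
    if reply.get("request") == "CAPCHA_NOT_READY":
        return None
    raise RuntimeError(f"captcha service error: {reply.get('request')}")


def http_get_json(url, params):
    """Perform a GET request and decode the JSON response body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=30) as resp:
        return json.load(resp)


def solve_recaptcha(api_key, site_key, page_url, fetch=http_get_json,
                    poll_interval=5.0, max_polls=24, sleep=time.sleep):
    """Submit the task, then poll until a token arrives or polling is exhausted."""
    task_id = parse_reply(
        fetch(f"{API_BASE}/in.php", build_submit_params(api_key, site_key, page_url))
    )
    if task_id is None:
        raise RuntimeError("service did not return a task id")
    poll_params = {"key": api_key, "action": "get", "id": task_id, "json": 1}
    for _ in range(max_polls):
        sleep(poll_interval)
        token = parse_reply(fetch(f"{API_BASE}/res.php", poll_params))
        if token is not None:
            return token
    raise TimeoutError("captcha was not solved in time")
```

With the `driver` from the question, usage might look like `token = solve_recaptcha("YOUR_API_KEY", "SITE_KEY_FROM_PAGE", driver.current_url)`. How the token is then handed back to the page (typically a hidden `g-recaptcha-response` field or a JavaScript callback) depends on the site, so that step is left out here. The `fetch` and `sleep` parameters exist only so the flow can be exercised without network access.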