I am trying to scrape a web page protected by reCAPTCHA v2. Locally (Windows 10, Chrome on a PC) the reCAPTCHA resolves correctly, but in production on a CentOS 7 server, clicking the reCAPTCHA checkbox produces this message:
Your computer or network may be sending automated queries. To protect our users, we can't process your request right now. For more details visit our help page.
I also created a compute instance on GCP with Ubuntu 20 and the same thing happens, even when using const Xvfb = require('xvfb') to run with headless: false.
What other configurations can be applied to puppeteer?
browser = await puppeteer.launch({
  args: [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    // The Chromium switch is spelled "blink-features", with a hyphen
    "--disable-blink-features=AutomationControlled",
    "--no-first-run",
    "--no-proxy-server"
  ],
  //headless: false
  headless: 'new'
});
let page = await browser.newPage();
// Note: this UA string identifies as Edge ("Edg/120"), while Sec-Ch-Ua below advertises Google Chrome
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0');
const customHeaders = {
  'Accept-Language': 'es,es-ES;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
  'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
  'Sec-Ch-UA-Platform': '"Windows"',
  'Sec-Fetch-Site': 'same-origin',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0'
};
// These headers are sent with every request the page makes
await page.setExtraHTTPHeaders(customHeaders);
await page.goto(url);
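You could try a residential proxy if the CentOS server is hosted in a datacenter. Many websites block datacenter IP ranges, which would explain why the same code works from your local machine but not from the CentOS or GCP servers.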
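As a rough sketch of what that could look like, the snippet below routes Puppeteer through a proxy using Chromium's `--proxy-server` switch and `page.authenticate` for proxy credentials. The helper names `buildLaunchOptions` and `openThroughProxy`, and the shape of the `proxy` object, are made up for illustration; the host, port, and credentials would come from whichever proxy provider you use. Note that `--no-proxy-server` from the question is left out here, since it tells Chromium not to use a proxy at all.

```javascript
// Hypothetical helper: builds Puppeteer launch options that route traffic
// through a proxy. `proxy` is an assumed shape: { host, port, username?, password? }.
function buildLaunchOptions(proxy) {
  return {
    headless: 'new',
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-blink-features=AutomationControlled',
      '--no-first-run',
      // Replaces --no-proxy-server, which would disable proxy use entirely
      `--proxy-server=${proxy.host}:${proxy.port}`,
    ],
  };
}

// Hypothetical helper: launches the browser and opens `url` through the proxy.
async function openThroughProxy(url, proxy) {
  // Loaded lazily so buildLaunchOptions can be used without Puppeteer installed
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions(proxy));
  const page = await browser.newPage();
  if (proxy.username) {
    // Answers the proxy's HTTP authentication challenge
    await page.authenticate({
      username: proxy.username,
      password: proxy.password,
    });
  }
  await page.goto(url);
  return { browser, page };
}
```

Usage would be something like `await openThroughProxy(url, { host: 'proxy.example.com', port: 8000, username: 'user', password: 'pass' })`, where all of those values are placeholders. This only changes the IP address the site sees; it does not by itself guarantee the reCAPTCHA will accept the session.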