Send a Python GET request with Splash and custom headers


I want to use the Python requests library with the Splash browser (https://splash.readthedocs.io/en/stable/) and custom headers to crawl data from a website. Before starting the crawl itself, I used http://xhaus.com/headers to check which headers I actually send. It turns out that the headers I send are not the ones I set.

import random

import requests

def build_headers():
    # Start from requests' default headers and override the User-Agent
    headers = requests.utils.default_headers()
    headers.update({
        'User-Agent': random_user_agent()
    })
    return headers

def random_user_agent():
    # Pick one non-empty line of user-agents.txt at random
    with open('user-agents.txt', 'r') as f:
        user_agents = [line.rstrip('\n') for line in f if line.strip()]
    return random.choice(user_agents)

splash = 'http://localhost:8050/render.html'
headers = build_headers()
url_h = 'http://xhaus.com/headers'
page = requests.get(splash, params={'url': url_h}, headers=headers)

After running this code, the headers dict contains the following, including the User-Agent I picked:

{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

However, the page returned by that website reports a different User-Agent:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

...

<td class="even">
       User-Agent
      </td>
      <td class="even">
       <b>
        Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) splash Safari/538.1
       </b>
      </td>

...
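The headers= argument above is attached to the GET request that goes to Splash at localhost:8050, not to the request Splash then makes to http://xhaus.com/headers. The echoed value looks like Splash's own default User-Agent, which fits that reading. The Splash HTTP API documentation describes a headers argument for render.html that is honored only in application/json POST requests. Per those docs, User-Agent is applied to every request Splash makes, while other headers apply only to the first request. The following is a minimal sketch that assumes that behavior. It is not a confirmed fix.

The helper names build_render_payload, render_with_headers, and echoed_user_agent are hypothetical. The sketch also assumes the xhaus.com layout shown above, where a header-name cell is followed by its value cell. It uses the standard library's urllib.request so that it has no dependencies. With requests, the equivalent POST would be requests.post(splash, json=build_render_payload(url_h, ua)).

```python
import json
import urllib.request
from html.parser import HTMLParser

# Assumption: a local Splash instance on port 8050, as in the question.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_render_payload(target_url, user_agent):
    """Build a JSON body for Splash's render.html endpoint (hypothetical helper).

    The headers live inside the payload, so Splash is asked to use them for its
    own request to target_url, instead of them being attached to the request
    that reaches Splash.
    """
    # Assumption (Splash HTTP API docs): 'headers' is honored only in
    # application/json POST requests; User-Agent applies to every request
    # Splash makes while rendering, other headers only to the first one.
    return {"url": target_url, "headers": {"User-Agent": user_agent}}


def render_with_headers(target_url, user_agent, timeout=30):
    """POST the payload to Splash and return the rendered HTML (hypothetical helper)."""
    body = json.dumps(build_render_payload(target_url, user_agent)).encode("utf-8")
    request = urllib.request.Request(
        SPLASH_RENDER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class _CellCollector(HTMLParser):
    """Collect the whitespace-normalized text of every <td> cell, in order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append(" ".join("".join(self._buf).split()))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)


def echoed_user_agent(html):
    """Return the cell that follows a 'User-Agent' name cell, or None.

    Assumes the name-cell/value-cell table layout shown in the output above.
    """
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    for name, value in zip(parser.cells, parser.cells[1:]):
        if name == "User-Agent":
            return value
    return None
```

To use it against a running Splash instance, call echoed_user_agent(render_with_headers(url_h, random_user_agent())). If the assumption about the headers argument holds, the result should be the chosen User-Agent rather than the "splash Safari/538.1" string.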