Python requests not working to download a pdf file via url


I usually use the requests library to download PDFs from a specific URL, but this time it is not working, and I think it may be related to the website. I've read that adding headers can help in some cases, but after trying several of them the result is the same: the file is downloaded but cannot be opened because it appears to be damaged.

Do you know an alternative method that would successfully download the PDF file from this site? Here is a snippet of my latest attempt:

import requests

url = 'https://www.adgm.com/documents/operating-in-adgm/ongoing-obligation/enforcement/alpha-development-middle-east-ltd-penalty-notice-redacted.pdf?la=en&hash=5EA2DA7D1492D105375580EEF2FB088F'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
response = requests.get(url, stream = True, headers = headers)

with open('sample.pdf', 'wb') as f:
    f.write(response.content)

Any alternative approach that downloads the PDF correctly would be highly appreciated.

Thanks,


There is 1 answer

SIGHUP

That particular site requires the Accept-Language and User-Agent headers. To download that document you could do this:

import requests

PDF = "alpha-development-middle-east-ltd-penalty-notice-redacted.pdf"

URL = f"https://www.adgm.com/documents/operating-in-adgm/ongoing-obligation/enforcement/{PDF}"

PARAMS = {
    "la": "en",
    "hash": "5EA2DA7D1492D105375580EEF2FB088F"
}

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_3_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Accept-Language": "en-GB,en;q=0.9,en-US;q=0.8,pt;q=0.7"
}

CHUNK = 32 * 1024

with requests.get(URL, headers=HEADERS, params=PARAMS, stream=True) as response:
    response.raise_for_status()
    with open(PDF, "wb") as output:
        for data in response.iter_content(CHUNK):
            output.write(data)
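
If the saved file still will not open, it can help to check what was actually written to disk. A rejected request often comes back as an HTML error page, which then gets saved under a .pdf name and looks "damaged" to a PDF viewer. The sketch below is only an illustration: the helper names `looks_like_pdf` and `classify_payload` are made up for this example, and it assumes that a valid PDF begins with the `%PDF-` signature.

```python
PDF_SIGNATURE = b"%PDF-"  # a PDF file is expected to start with these bytes


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def classify_payload(data: bytes) -> str:
    """Roughly classify downloaded bytes as 'pdf', 'html' or 'unknown'."""
    if looks_like_pdf(data):
        return "pdf"
    # An HTML page (for example a block or error page) saved as .pdf
    head = data.lstrip()[:64].lower()
    if head.startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Inspect the first bytes of the file written by the download code.
    # "sample.pdf" is the file name used in the question.
    with open("sample.pdf", "rb") as f:
        print(classify_payload(f.read(1024)))
```

If this reports "html", the server did not send the document at all, so the request headers are the place to look rather than the way the file is written.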