How to use requests.cookies in pdfkit/wkhtmltopdf?

I would like to generate a PDF version of a page on my MediaWiki using pdfkit. The wiki requires a valid login to view any page. I log in to MediaWiki using requests, which works, and I receive some cookies. However, I am not able to use these cookies with pdfkit.from_url().

My Python script looks like this:

#!/usr/bin/env python2
import pdfkit
import requests
import pickle

mywiki   = "http://192.168.0.4/produniswiki/"  # URL
username = 'produnis'                          # Username to log in with
password = 'seeeecret#'                        # Login password

## Login to MediaWiki
# Request a login token
payload = {'action': 'query', 'format': 'json', 'utf8': '', 'meta': 'tokens', 'type': 'login'}
r1 = requests.post(mywiki + 'api.php', data=payload)

# login confirm
login_token = r1.json()['query']['tokens']['logintoken']
payload = {'action': 'login', 'format': 'json', 'utf8': '', 'lgname': username, 'lgpassword': password, 'lgtoken': login_token}
r2 = requests.post(mywiki + 'api.php', data=payload, cookies=r1.cookies)
print(r2.cookies)

At this point I am successfully logged in, and the cookies are stored in r2.cookies. The print() call outputs:

<RequestsCookieJar[<Cookie produniswikiToken=832a1f1da165016fb9d9a107ddb218fc for 192.168.0.4/>, <Cookie produniswikiUserID=1 for 192.168.0.4/>, <Cookie produniswikiUserName=Produnis for 192.168.0.4/>, <Cookie produniswiki_session=oddicobpi1d5af4n0qs71g7dg1kklmbo for 192.168.0.4/>]>

I can save the cookies into a file:

def save_cookies(requests_cookiejar, filename):
    with open(filename, 'wb') as f:
        pickle.dump(requests_cookiejar, f)
save_cookies(r2.cookies, "cookies")

This file looks like this: http://pastebin.com/yKyCpPTW

Now I want to render a specific page to PDF using pdfkit. The man page states that cookies can be set via a cookie-jar file:

options = {
    'page-size': 'A4',
    'margin-top': '0.5in',
    'margin-right': '0.5in',
    'margin-bottom': '0.5in',
    'margin-left': '0.5in',
    'encoding': "UTF-8",
    'cookie-jar' : "cookies",
    'no-outline': None
}
# pdf_url and the_filename are set elsewhere in the script
current_pdf = pdfkit.from_url(pdf_url, the_filename, options=options)

My problem is: with this code, the "cookies" file is truncated to 0 KB and the PDF reads "You must be logged in to view a page..."

So my question is:

How can I use the cookies from requests.cookies in pdfkit.from_url()?

1 answer

Mike Davlantes

I had the same issue and overcame it with the following:

import pdfkit
import requests

# Log in with a session so the login cookies are kept for later calls
s = requests.Session()
data = {'username': 'admin', 'password': 'hunter2'}
s.post('http://example.com/login', data=data)

# Pass the session cookies to wkhtmltopdf as (name, value) pairs
options = {'cookie': s.cookies.items(), 'javascript-delay': 1000}
pdfkit.from_url('http://example.com/report', 'report.pdf', options=options)

Depending on how much JavaScript the page loads, you may need to raise or lower javascript-delay; the default is 200 ms.
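
Applied to the question's script, the same idea could be wrapped in a small helper that copies every cookie from r2.cookies into a pdfkit options dict. This is only a sketch: the function name pdfkit_options_with_cookies is made up for illustration, and it assumes the jar yields objects with .name and .value attributes (as a requests cookie jar does) and that pdfkit accepts a list of (name, value) tuples for the repeatable cookie option, as its README describes.

```python
def pdfkit_options_with_cookies(cookie_jar, base_options=None):
    """Return a copy of base_options with a 'cookie' entry added.

    cookie_jar: any iterable of objects exposing .name and .value,
    for example r2.cookies or session.cookies from requests.
    (Hypothetical helper, not part of pdfkit or requests.)
    """
    options = dict(base_options or {})
    # One (name, value) tuple per cookie; pdfkit is expected to pass
    # each pair to wkhtmltopdf as a separate --cookie argument.
    options['cookie'] = [(c.name, c.value) for c in cookie_jar]
    return options


# Possible usage with the names from the question (not executed here):
#   options = pdfkit_options_with_cookies(r2.cookies, {'page-size': 'A4'})
#   pdfkit.from_url(pdf_url, the_filename, options=options)
```

With this approach the 'cookie-jar' entry can be dropped from the options, since the pickled file is no longer handed to wkhtmltopdf.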