I would like to print a PDF version of a page on my MediaWiki using pdfkit.
My MediaWiki requires a valid login to view any page.
I log in to MediaWiki using requests. This works, and I get some cookies back. However, I am not able to use these cookies with pdfkit.from_url().
My Python script looks like this:
#!/usr/bin/env python2
import pdfkit
import requests
import pickle
mywiki = "http://192.168.0.4/produniswiki/"  # URL
username = 'produnis' # Username to login with
password = 'seeeecret#' # Login Password
## Login to MediaWiki
# Login request
payload = {'action': 'query', 'format': 'json', 'utf8': '', 'meta': 'tokens', 'type': 'login'}
r1 = requests.post(mywiki + 'api.php', data=payload)
# login confirm
login_token = r1.json()['query']['tokens']['logintoken']
payload = {'action': 'login', 'format': 'json', 'utf8': '', 'lgname': username, 'lgpassword': password, 'lgtoken': login_token}
r2 = requests.post(mywiki + 'api.php', data=payload, cookies=r1.cookies)
print(r2.cookies)
At this point I am successfully logged in, and the cookies are stored in r2.cookies. The print() call outputs:
<RequestsCookieJar[<Cookie produniswikiToken=832a1f1da165016fb9d9a107ddb218fc for 192.168.0.4/>, <Cookie produniswikiUserID=1 for 192.168.0.4/>, <Cookie produniswikiUserName=Produnis for 192.168.0.4/>, <Cookie produniswiki_session=oddicobpi1d5af4n0qs71g7dg1kklmbo for 192.168.0.4/>]>
I can save the cookies into a file:
def save_cookies(requests_cookiejar, filename):
    with open(filename, 'wb') as f:
        pickle.dump(requests_cookiejar, f)
save_cookies(r2.cookies, "cookies")
This file looks like this: http://pastebin.com/yKyCpPTW
Now I want to render a specific page to PDF using pdfkit. The man page states that cookies can be set via a cookie-jar file:
options = {
    'page-size': 'A4',
    'margin-top': '0.5in',
    'margin-right': '0.5in',
    'margin-bottom': '0.5in',
    'margin-left': '0.5in',
    'encoding': "UTF-8",
    'cookie-jar': "cookies",
    'no-outline': None
}
current_pdf = pdfkit.from_url(pdf_url, the_filename, options=options)
My problem is that with this code, the "cookies" file is truncated to 0 KB and the PDF says "You must be logged in to view a page..."
So my question is:
How can I use the cookies from requests in pdfkit.from_url()?
I had the same issue and overcame it with the following:
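The approach can be sketched roughly as follows. This is a minimal sketch under assumptions, not a definitive implementation: instead of pointing cookie-jar at the pickled file (wkhtmltopdf does not read Python's pickle format), it passes each cookie from requests through pdfkit's repeatable cookie option. The helper names build_pdf_options and wiki_page_to_pdf and the js_delay_ms parameter are hypothetical, introduced only for illustration.

```python
def build_pdf_options(cookie_dict, js_delay_ms=200):
    """Build a pdfkit options dict that forwards the given cookies.

    cookie_dict maps cookie names to values, e.g. r2.cookies.get_dict().
    (Hypothetical helper, not part of pdfkit or requests.)
    """
    return {
        'page-size': 'A4',
        'encoding': 'UTF-8',
        # wkhtmltopdf's --cookie flag can be repeated; pdfkit accepts
        # repeatable options as a list of (name, value) tuples.
        'cookie': list(cookie_dict.items()),
        # Milliseconds to wait for JavaScript before rendering.
        'javascript-delay': str(js_delay_ms),
    }


def wiki_page_to_pdf(cookie_jar, page_url, out_file, js_delay_ms=200):
    """Render page_url to out_file, sending the cookies from a requests jar.

    (Hypothetical helper; cookie_jar is assumed to be a RequestsCookieJar
    such as r2.cookies.)
    """
    import pdfkit  # imported here so build_pdf_options works without pdfkit

    options = build_pdf_options(cookie_jar.get_dict(), js_delay_ms)
    return pdfkit.from_url(page_url, out_file, options=options)
```

With the login from the question, a call such as wiki_page_to_pdf(r2.cookies, pdf_url, the_filename) would hand the session cookies to wkhtmltopdf directly, so no cookie file is involved.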
Depending on how much JavaScript you're trying to load, you might want to set javascript-delay to something higher or lower; the default is 200ms.