I am trying to use the Wikimedia Commons Query Service[1] programmatically using Python, but am having trouble authenticating via OAuth 1.
Below is a self-contained Python example that does not work as expected. The expected behaviour is that a result set is returned, but instead the HTML of the login page is returned. You can install the dependencies with pip install --user sparqlwrapper oauthlib certifi. The script should then be given the path to a text file containing the pasted output shown after applying for an owner-only token[2], e.g.
Consumer token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Consumer secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access token
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
Access secret
deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
[1] https://wcqs-beta.wmflabs.org/ ; https://diff.wikimedia.org/2020/10/29/sparql-in-the-shadow-of-structured-data-on-commons/
[2] https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers
import sys
from SPARQLWrapper import JSON, SPARQLWrapper
import certifi
from SPARQLWrapper import Wrapper
from functools import partial
from oauthlib.oauth1 import Client

ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""


def monkeypatch_sparqlwrapper():
    # Deal with old system certificates
    if not hasattr(Wrapper.urlopener, "monkeypatched"):
        Wrapper.urlopener = partial(Wrapper.urlopener, cafile=certifi.where())
        setattr(Wrapper.urlopener, "monkeypatched", True)


def oauth_client(auth_file):
    # Read credential from file
    creds = []
    for idx, line in enumerate(auth_file):
        # Even-indexed lines are labels ("Consumer token", ...); skip them
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return Client(*creds)


class OAuth1SPARQLWrapper(SPARQLWrapper):
    # OAuth sign SPARQL requests

    def __init__(self, *args, **kwargs):
        self.client = kwargs.pop("client")
        super().__init__(*args, **kwargs)

    def _createRequest(self):
        request = super()._createRequest()
        uri = request.get_full_url()
        method = request.get_method()
        body = request.data
        headers = request.headers
        new_uri, new_headers, new_body = self.client.sign(uri, method, body, headers)
        request.full_url = new_uri
        request.headers = new_headers
        request.data = new_body
        print("Sending request")
        print("Url", request.full_url)
        print("Headers", request.headers)
        print("Data", request.data)
        return request


monkeypatch_sparqlwrapper()
client = oauth_client(open(sys.argv[1]))
sparql = OAuth1SPARQLWrapper(ENDPOINT, client=client)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print("Results")
print(results)
I have also tried without SPARQLWrapper, using just requests + requests_oauthlib. However, I get the same problem (HTML for a login page is returned), so it seems it might actually be a problem with the Wikimedia Commons Query Service.
import sys
import requests
from requests_oauthlib import OAuth1


def oauth_client(auth_file):
    creds = []
    for idx, line in enumerate(auth_file):
        # Even-indexed lines are labels ("Consumer token", ...); skip them
        if idx % 2 == 0:
            continue
        creds.append(line.strip())
    return OAuth1(*creds)


ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""

r = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    auth=oauth_client(open(sys.argv[1])),
    headers={"Accept": "application/sparql-results+json"}
)
print(r.text)
Disclaimer: I'm one of the authors of WCQS (and the author of the, apparently somewhat misleading, article linked in the question).
That way of authenticating is used by apps authenticating with Wikimedia Commons (or any other Wikimedia app), but not with WCQS, which is itself an app authenticated with Wikimedia Commons. OAuth in this case is used strictly for a web app to authenticate users; currently, you cannot authenticate bots and other applications using OAuth. Any kind of usage requires a user login.
This limitation comes from our current setup and infrastructure, and we plan to overcome it when we go into production (the service is currently released in beta status). Unfortunately, I can't tell you when that will happen, but it is important to us.
If you want to try out your bot before that happens, you can always log in via the browser and use the resulting token in your code, but it is bound to expire at some point, and the process will then need to be repeated. A simple modification to your second listing does the trick:
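As a minimal sketch of that idea, the script below drops the OAuth signing and sends the browser session token as a cookie instead. The cookie name `wcqsSession` and the helper names `build_request` and `is_sparql_json` are assumptions made for illustration, not confirmed WCQS details; check the actual cookie name and value in your browser's developer tools after logging in. It uses only the standard library's `urllib` rather than `requests`, and it treats a non-JSON content type as a hint that the token has expired and the login page came back.

```python
import sys
import urllib.parse
import urllib.request

ENDPOINT = "https://wcqs-beta.wmflabs.org/sparql"
QUERY = """
SELECT ?file WHERE {
  ?file wdt:P180 wd:Q42 .
}
"""

# ASSUMPTION: hypothetical cookie name. Copy the real name and value from
# your browser's developer tools after logging in to WCQS.
SESSION_COOKIE_NAME = "wcqsSession"


def build_request(session_token, query=QUERY, endpoint=ENDPOINT):
    """Build a GET request carrying the browser session token as a cookie."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "Cookie": f"{SESSION_COOKIE_NAME}={session_token}",
    }
    return urllib.request.Request(url, headers=headers)


def is_sparql_json(content_type):
    """Return True if a Content-Type header value announces SPARQL JSON results."""
    if content_type is None:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/sparql-results+json"


if __name__ == "__main__":
    # The session token is passed as the first command-line argument.
    request = build_request(sys.argv[1])
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
        if is_sparql_json(response.headers.get("Content-Type")):
            print(body)
        else:
            # Most likely the HTML login page: the token is missing or expired.
            print("Did not get SPARQL JSON back; log in again and refresh the token.")
```

Run it as `python script.py <token>`. When the fallback message appears, repeat the browser login and pass the new token.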
Note that asking on the mailing list, asking directly on IRC (freenode:#wikimedia-discovery), or creating a Phabricator ticket is the best way of getting help with WCQS.