Patentsview API Python 3.4


I am a beginner in Python, currently working on a small project. I want to build a dynamic script for patent research on patentsview.org.

Here is my code:

import urllib.parse
import urllib.request

# http://www.patentsview.org/api/patents/query?q={"_and":[{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o={"matched_subentities_only": "true"}
author = "Jobs"
andreq = "_and"
invln = "inventor_last_name"
text = "_text_any"
patent = "patent_title"
match = "matched_subentities_only"
true = "true"
title = "computer"
urlbasic = "http://www.patentsview.org/api/patents/query"
patentall = {patent:title}
textall = {text:patentall}
invall = {invln:author}
andall = invall.copy()
andall.update(textall)
valuesq = {andreq:andall}
valuesqand = {andreq:andall}
valuesq = {andreq:valuesqand}
valueso = {match:true}

#########
url = "http://www.patentsview.org/api/patents/query"
values = {"q":valuesq,
          "o":valueso}
print(values)


data = urllib.parse.urlencode(values)
print(data)
############
data = data.encode("UTF-8")
print(data)
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
saveFile = open("patents.txt", "w")
saveFile.write(str(respData))
saveFile.close()

I think I have the right start for the dynamic URL, but the encoding seems to give me an HTTP Error 400: Bad Request. If I don't encode, the URL will look like www.somethingsomething.org/o:{....}, which obviously produces an error. Here is the error:

Traceback (most recent call last):
  File "C:/Users/Max/PycharmProjects/KlayerValter/testen.py", line 38, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

Process finished with exit code 1

If I encode, I get the same error since all brackets get converted. The PatentsView API works as follows:

http://www.patentsview.org/api/patents/query?q={"_or":[{"_and":[{"inventor_last_name":"Whitney"},{"_text_phrase":{"patent_title":"cotton gin"}}]},{"_and":[{"inventor_last_name":"Hopper"},{"_text_all":{"patent_title":"COBOL"}}]}]}

To build the query dynamically, I had to put all the key names into variables. If there is a better solution, please help.

Best Regards.

2

There are 2 answers

7
t.m.adam On BEST ANSWER

The API accepts and returns JSON data, so you should use json.dumps to encode your POST data. Then use json.loads on the response if you want a dictionary, or just write it to a file.

from urllib.request import Request, urlopen
import json

url = "http://www.patentsview.org/api/patents/query"
author = "Jobs"
title = "computer"
data = {
    'q':{
        "_and":[
            {"inventor_last_name":author},
            {"_text_any":{"patent_title":title}}
        ]
    }, 
    'o':{"matched_subentities_only": "true"}
}
resp = urlopen(Request(url, json.dumps(data).encode()))
data = resp.read()
#data = json.loads(data)
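
To see why the original attempt was rejected: urllib.parse.urlencode converts each nested dictionary with str(), which yields Python-style single quotes rather than JSON. Below is a minimal offline sketch of the difference. No request is sent, and the variable names are only illustrative.

```python
import json
from urllib.parse import urlencode, unquote_plus

query = {"_and": [{"inventor_last_name": "Jobs"},
                  {"_text_any": {"patent_title": "computer"}}]}

# urlencode() calls str() on the nested dict, so the result carries
# a Python repr with single quotes, which is not valid JSON.
as_repr = urlencode({"q": query})

# Serializing with json.dumps first produces double-quoted JSON.
as_json = urlencode({"q": json.dumps(query)})

# Decode both again to compare them in readable form.
print(unquote_plus(as_repr))
print(unquote_plus(as_json))
```

Only the second form decodes back into the JSON structure that the query documentation shows.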

As suggested by Christian, you could simply use requests; it's much better than urllib.

data = requests.post(url, json=data).json()

As for all those variables in your code, they compose a dictionary like the one below:

values = {"q":{andreq:{andreq:{invln:author, text:{patent:title}}}}, "o":{match:true}}

I don't see why you would go through all that trouble to build a dictionary, but I could be wrong. However, you could wrap your code in a function with author and title as arguments.
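
A minimal sketch of such a function, assuming the same query shape as above. The name build_query is hypothetical, and the snippet only builds and serializes the request body without sending it.

```python
import json

def build_query(author, title):
    """Return the request body for an inventor last name and a title keyword."""
    return {
        "q": {
            "_and": [
                {"inventor_last_name": author},
                {"_text_any": {"patent_title": title}},
            ]
        },
        "o": {"matched_subentities_only": "true"},
    }

payload = build_query("Jobs", "computer")
body = json.dumps(payload).encode("utf-8")  # bytes ready to pass to Request(url, body)
print(body.decode("utf-8"))
```

Calling build_query with different arguments gives a new payload each time, so none of the key names need to live in separate variables.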


With requests you don't have to use json.dumps on your data, just use the json parameter. If you want to save the response content to file you should use the content or text attribute.

import requests

title = "computer"
author = "Jobs"
url = "http://www.patentsview.org/api/patents/query"
data = {
    "q": {"_and": [{"inventor_last_name": author}, {"_text_any": {"patent_title": title}}]},
    "o": {"matched_subentities_only": "true"},
}
resp = requests.post(url, json=data)
with open("patents.txt", "w") as f:
    f.write(resp.text)
0
Parker Hancock On

As an alternative to PatentsView, take a look at patent_client! It's a Python module that searches the live USPTO and EPO databases using a Django-style API. This includes the Patent Examination Data Set that backs the PatentsView API. The results from any query can then be cast into pandas DataFrames or Series with a simple .to_pandas() call.

from patent_client import USApplication

result = USApplication.objects.filter(first_named_inventor="<Name>")

# Returns an iterator of application objects matching the value.
# You can also go directly to a Pandas dataframe with:

result.to_pandas()

A great place to start is the User Guide Introduction


PyPI | GitHub | Docs

(Full disclosure - I'm the author and maintainer of patent_client)