I am trying to scrape data from Google Patents with Beautiful Soup and add some columns to an existing CSV file. Here is an example of a patent result. Here is my code:

    import csv

    import requests
    from bs4 import BeautifulSoup

    with open('patentdatacleaned.csv', 'r', encoding="ISO-8859-1") as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader)  # skip the header row
        for line in csv_reader:
            # column 13 is split on whitespace and each piece is requested as a URL
            for row in line[13].split():
                r = requests.get(row)
                soup = BeautifulSoup(r.content, "html.parser")
                g_data = soup.find_all("div", {"class": "description"})
                #with open('newpatentdata_class.csv', 'w', newline='', encoding="UTF-8") as write_obj:
                #    csv_writer = writer(write_obj)
                for item in g_data:
                    print(item)
            break

I managed this for the Claims, Description, and Abstract, but I am not able to extract the classification codes together with their descriptions. I tried various classes and divs and looked in detail at the child divs, but I can't find the problem. Please help.
To get the classification codes from a Google Patents page, you can use this example:
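A minimal sketch of one way to do this, assuming the page marks each classification code with `itemprop="Code"` and puts its description in a following sibling element with `itemprop="Description"`. Those attribute names, the helper name `extract_classification_codes`, and the sample markup are assumptions for illustration, so check them against the HTML you actually receive.

```python
from bs4 import BeautifulSoup


def extract_classification_codes(html):
    """Return (code, description) pairs found in a patent page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # Assumption: every code lives in an element with itemprop="Code".
    for code_tag in soup.select('[itemprop="Code"]'):
        # Assumption: the description is a later sibling of the code element.
        desc_tag = code_tag.find_next_sibling(attrs={"itemprop": "Description"})
        description = desc_tag.get_text(strip=True) if desc_tag else ""
        pairs.append((code_tag.get_text(strip=True), description))
    return pairs


# Hypothetical markup, used only to show the shape of input the helper expects.
sample_html = """
<ul>
  <li><span itemprop="Code">A61K</span>
      <span itemprop="Description">Example description one</span></li>
  <li><span itemprop="Code">C07D</span>
      <span itemprop="Description">Example description two</span></li>
</ul>
"""

for code, description in extract_classification_codes(sample_html):
    print(code, description)
```

In the loop from the question you could call `extract_classification_codes(r.content)` for each URL and write the resulting pairs to the new CSV instead of printing them.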
EDIT: To get the status of the applications:
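A rough sketch of the same approach for statuses. The three default selectors below (`applications`, `applicationNumber`, `ifiStatus`) are guesses at how the page labels each application row, not confirmed names, and the helper name and sample markup are hypothetical. Inspect the downloaded HTML and pass the selectors that match it.

```python
from bs4 import BeautifulSoup


def extract_application_statuses(
    html,
    row_selector='[itemprop="applications"]',           # assumed row marker
    number_selector='[itemprop="applicationNumber"]',   # assumed number marker
    status_selector='[itemprop="ifiStatus"]',           # assumed status marker
):
    """Return (application_number, status) pairs; a missing field is None."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select(row_selector):
        number_tag = row.select_one(number_selector)
        status_tag = row.select_one(status_selector)
        number = number_tag.get_text(strip=True) if number_tag else None
        status = status_tag.get_text(strip=True) if status_tag else None
        results.append((number, status))
    return results


# Hypothetical markup, used only to show the shape of input the helper expects.
sample_rows = """
<table>
  <tr itemprop="applications">
    <td><span itemprop="applicationNumber">APP-0001</span></td>
    <td><span itemprop="ifiStatus">Active</span></td>
  </tr>
  <tr itemprop="applications">
    <td><span itemprop="applicationNumber">APP-0002</span></td>
  </tr>
</table>
"""

for number, status in extract_application_statuses(sample_rows):
    print(number, status)
```

Keeping the selectors as parameters means you only change the call, not the function, once you have confirmed the real attribute names.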