I am trying to use the patent_client package to retrieve information on patent assignments. In the example below I query one patent at a time. However, I noticed that the API pauses for increasingly long periods from time to time (I guess due to overload or something similar). The code I am using is as follows:

    #main_df=main_df1.head(100)
    import time

    import numpy as np
    import pandas as pd
    from patent_client import Inpadoc, Assignment, USApplication

    pat_list = main_df1.patent_x.to_list()
    columns = ['patent', 'trans_date', 'trans_id', 'assignee']

    # Lists to store data for plotting
    counts = []
    durations = []
    all_rows = []  # rows accumulated across all patents
    count = 0

    for patent in pat_list:
        start = time.time()
        count = count + 1
        try:
            assignments = Assignment.objects.filter(patent_number=patent)
            assignments_df = assignments.to_pandas()
            for _, row in assignments_df.iterrows():
                trans_date = row.get('transaction_date', np.nan)
                trans_id = row.get('id', np.nan)
                # Name of the first assignee, or '' if none is listed
                assignee = row['assignees'][0]['name'] if 'assignees' in row and row['assignees'] and 'name' in row['assignees'][0] else ''
                all_rows.append((patent, trans_date, trans_id, assignee))
        except Exception as e:
            print(f"Error processing patent {patent}: {e}")
        # Periodically save everything collected so far
        if count % 300000 == 0:
            print(count)
            pd.DataFrame(all_rows, columns=columns).to_csv(f"until_pat_{count}.csv", index=False)
        stop = time.time()
        duration = stop - start
        # Store data for plotting
        counts.append(count)
        durations.append(duration)

    df_complete = pd.DataFrame(all_rows, columns=columns)
    print(count)

Is there a way to speed up the API calls (e.g. by adding some strategically placed sleep time)? I read the documentation here but could not find a stated API rate limit. Thank you.
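
To make concrete what I mean by strategic sleeping, here is a minimal sketch of a retry wrapper with exponential backoff. The names `call_with_backoff`, `max_retries` and `base_delay` are made up for illustration and are not part of patent_client. I also do not know whether the slowdowns come from rate limiting at all, so treat that as an assumption.

```python
import random
import time


def call_with_backoff(func, *args, max_retries=5, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper: retries on any exception, waiting roughly
    base_delay * 2**attempt seconds between attempts.
    """
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Out of retries: re-raise the last error to the caller
            if attempt == max_retries - 1:
                raise
            # Exponential delay plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
```

In my loop this would wrap the request, e.g. `call_with_backoff(lambda: Assignment.objects.filter(patent_number=patent).to_pandas())`, so that a failed request is retried after a growing pause instead of being skipped.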