Speed up the API retrieval


I am trying to use the patent_client package to retrieve information on patent Assignments. Specifically, in the example below I retrieve the data patent by patent. However, I noticed that the API pauses for increasingly long stretches from time to time (I guess due to overload or something similar). The code I am implementing is as follows:

#main_df=main_df1.head(100)
import time
import numpy as np
import pandas as pd

pat_list = main_df1.patent_x.to_list()

from patent_client import Inpadoc, Assignment, USApplication

# Lists to store data for plotting
counts = []
durations = []
df = pd.DataFrame(columns=['patent', 'trans_date', 'trans_id', 'assignee'])


count = 0
for patent in pat_list:
    start = time.time()
    count = count + 1
    try:
        # Retrieve all assignment records for this patent
        assignments = Assignment.objects.filter(patent_number=patent)
        assignments_df = assignments.to_pandas()
        rows = []
        for _, row in assignments_df.iterrows():
            trans_date = row.get('transaction_date', np.nan)
            trans_id = row.get('id', np.nan)
            assignee = row['assignees'][0]['name'] if 'assignees' in row and row['assignees'] and 'name' in row['assignees'][0] else ''
            rows.append((patent, trans_date, trans_id, assignee))
        df1 = pd.DataFrame(rows, columns=['patent', 'trans_date', 'trans_id', 'assignee'])
        # Accumulate the new rows into the running DataFrame
        df = pd.concat([df, df1], ignore_index=True)
    except Exception as e:
        print(f"Error processing patent {patent}: {e}")
    if count % 300000 == 0:
        # Periodic checkpoint to CSV
        print(count)
        df.to_csv(f"until_pat_{count}.csv", index=False)

    stop = time.time()
    duration = stop - start

    # Store data for plotting
    counts.append(count)
    durations.append(duration)

print(count)

Is there a way to speed up the API calls (e.g. by adding some strategic sleep time between requests)? I read the documentation here but did not manage to find an API rate limit. Thank you.
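
To illustrate what I mean by "strategic sleep time", this is the kind of throttling I was considering. The delay value and the throttled_fetch helper below are just my own sketch and guesswork, not anything taken from the patent_client documentation:

import time
from patent_client import Assignment

REQUEST_DELAY = 0.5  # guessed pause between requests; I do not know the real rate limit

def throttled_fetch(patent, retries=3):
    # Hypothetical helper: retry with a growing pause if a request fails
    for attempt in range(retries):
        try:
            return Assignment.objects.filter(patent_number=patent).to_pandas()
        except Exception:
            time.sleep(REQUEST_DELAY * (attempt + 1))  # back off a bit more each time
    return None

# Usage idea: replace the direct filter/to_pandas call inside the loop,
# then sleep between patents:
# assignments_df = throttled_fetch(patent)
# time.sleep(REQUEST_DELAY)

I am not sure whether a fixed delay like this actually helps, or whether the pauses come from the server side regardless of how I pace the requests.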


There are 0 answers