In Pandas, I am creating a dataframe that merges data from two different Beatbox queries. First, I pull all my Opportunity data, then I pull all my Account data, and then I merge.
However I would like to optimize this process by only pulling data for account['ID'] that exists in the oppty['AccountID'] column, as opposed to pulling the entirety of the Account data before a merge. How do I do this?
Current State:
query_result = svc.query("SELECT ID, AccountID FROM Opportunity")
records = query_result['records']
oppty = pd.DataFrame(records)
query_result = svc.query("SELECT ID, Website FROM Account")
records = query_result['records']
account = pd.DataFrame(records)
mylist = pd.merge(oppty, account, left_on='AccountID', right_on='ID', how='left')
You can use a SOQL semi-join to restrict the Account query to only those accounts with opportunities, e.g.