Stacked data using python IB API


I am downloading 15 years of data (daily close) for 5 stocks ('A', 'AAP', 'AAPL', 'ABBV', 'ABC'), but I am getting repeated rows. The first one, 'A', is fine: I get the right amount of data. For the second one, 'AAP', I have twice the right number of rows; it seems the data were downloaded twice. The last 3 stocks each have three times the right number of rows. I have attached a screenshot showing the sizes of the CSV files; they should all be the same size if everything were working.
I suspect the issue comes from the 10-second pause after calling reqHistoricalData; it may be too long. How can I avoid duplicated rows, and how do I pause for the right amount of time (not too long and not too short)?

[Screenshot: sizes of the downloaded CSV files]

import pandas as pd
import datetime as dt
import time
import collections
import threading
import os

from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
from ibapi.common import BarData

path = r"D:\trading\data\debug\\" 

class IBapi(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)
        self.data=collections.defaultdict(list)

    def nextValidId(self, orderId: int):
        super().nextValidId(orderId)
        self.nextorderId = orderId
        print('The next valid order id is: ', self.nextorderId)

    def error(self, reqId, errorCode, errorString):
        super().error(reqId, errorCode, errorString)
        print("Error. Id:", reqId, "Code:", errorCode, "Msg:", errorString)
        
    def historicalData(self, reqId:int, bar:BarData):
        self.data["date"].append(bar.date)
        self.data["close"].append(bar.close)
        self.df = pd.DataFrame.from_dict(self.data)

tickers = ["A","AAP","AAPL","ABBV","ABC"]

def run_loop():
    app.run()


app = IBapi()
app.connect("127.0.0.1", 7496, 5)
app.nextorderId = None

# Start the socket in a thread
api_thread = threading.Thread(target=run_loop, daemon=True)
api_thread.start()

# Check if the API is connected via orderid
while True:
    if isinstance(app.nextorderId, int):
        print('connected')
        break
    else:
        print('waiting for connection')
        time.sleep(1)
        
n_id = app.nextorderId

for ticker in tickers:    
    contract = Contract()
    contract.symbol = ticker
    contract.secType = "STK"
    contract.exchange = "SMART"
    contract.currency = "USD" 

    app.reqHistoricalData(n_id, contract, "","15 Y", "1 day", "TRADES", 1, 1, False, [])
    time.sleep(10)
    app.df.to_csv(path + ticker + ".csv")
        
    n_id = n_id + 1
    

app.disconnect()


There are 2 answers

Answer by brian

You don't clear the lists in between requests.

def historicalData(self, reqId:int, bar:BarData):
    # just keeps adding data to the lists
    self.data["date"].append(bar.date)
    self.data["close"].append(bar.close)
    # makes a new dataframe on every single bar
    self.df = pd.DataFrame.from_dict(self.data)

In the historicalDataEnd method you can build the DataFrame and save it to a file. Keep a dict mapping reqIds to tickers so you know which ticker has finished.
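A minimal sketch of that pattern, with stub classes standing in for the real EWrapper/EClient and plain dicts standing in for BarData (the class and method names other than historicalData/historicalDataEnd are illustrative, not part of the ibapi API):

```python
import collections
import pandas as pd

class HistApp:
    """Stub illustrating per-request buffering; only the callbacks matter."""
    def __init__(self):
        self.data = collections.defaultdict(list)   # buffer for the current request
        self.req_to_ticker = {}                     # reqId -> ticker
        self.done = {}                              # ticker -> finished DataFrame

    def historicalData(self, reqId, bar):
        # Only accumulate here; don't rebuild a DataFrame on every bar.
        self.data["date"].append(bar["date"])
        self.data["close"].append(bar["close"])

    def historicalDataEnd(self, reqId, start, end):
        # Build the DataFrame once, when the request is complete,
        # then clear the buffer so the next request starts empty.
        ticker = self.req_to_ticker[reqId]
        self.done[ticker] = pd.DataFrame(self.data)
        self.data = collections.defaultdict(list)

app = HistApp()
app.req_to_ticker[1] = "AAPL"
for bar in ({"date": "20240102", "close": 185.6},
            {"date": "20240103", "close": 184.3}):
    app.historicalData(1, bar)
app.historicalDataEnd(1, "", "")
print(app.done["AAPL"].shape)  # (2, 2)
```

In the real app, `self.done[ticker].to_csv(...)` inside historicalDataEnd replaces the `app.df.to_csv(...)` call in the request loop.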

You should still keep a 10-second delay between calls for pacing, but do not count on the data being returned within 10 seconds. If it hasn't arrived, you will get an empty file (or, in your case, all the previous tickers' data, which seems to have happened with ABC).
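One way to avoid guessing the right sleep duration is to wait on a threading.Event that historicalDataEnd sets, with a timeout as a safety net. A sketch of that idea (PacedApp and wait_for_request are hypothetical names, and the Timer below only simulates the API thread delivering the callback):

```python
import threading

class PacedApp:
    """Stub showing event-based waiting instead of a fixed sleep."""
    def __init__(self):
        self.req_done = threading.Event()

    def historicalDataEnd(self, reqId, start, end):
        # The API thread signals that the current request is complete.
        self.req_done.set()

    def wait_for_request(self, timeout=60):
        # Block until the end-of-data callback fires, or give up.
        ok = self.req_done.wait(timeout)
        self.req_done.clear()
        return ok  # False means the request timed out

app = PacedApp()
# Simulate the API thread delivering historicalDataEnd after 0.1 s.
threading.Timer(0.1, app.historicalDataEnd, args=(1, "", "")).start()
print(app.wait_for_request(timeout=5))  # True
```

With this in place the request loop waits exactly as long as each request actually takes (plus any fixed pacing delay you still want between requests), and a False return tells you the data never arrived.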

Answer by Nick

Your duplicates come from the weekends. You make a request ending on, say, a Friday (1st iteration), and on the next 2 iterations (Saturday and Sunday) the API returns data from the last possible trading day, which is that same Friday. Otherwise, 5 seconds is enough time to wait.