I made a simple program that uses the Google Finance API to grab stock data through HTTP requests and does some calculations on it.
The Google API URL looks like this (it adds a new block of data every minute during trading hours):
https://www.google.com/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=AAPL
This works fine; however, I have a huge list of stock tickers I need to get data for. To loop through them without hitting a request limit, I set an interval of 2 seconds between requests. There are over 5000 stocks, so a full pass takes close to three hours (5000 × 2 s = 10,000 s), and I need it to finish in under 5 minutes for the algorithm to be useful.
Is there a way to achieve this with HTTP requests, or am I tackling this the wrong way? I can't download the data beforehand and process it on the client side, because I need the data as soon as the first quotes come out in the morning.
It is written in JavaScript (Node.js), but answers in any language are fine. Here's the function that I call at 2-second intervals:
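var http = require('http');

// Fetch minute-level quotes for one ticker and pass the cleaned result to cb.
var getStockData = function(ticker, day, cb){
    // Strip any whitespace from the ticker symbol.
    ticker = ticker.replace(/\s+/g, '');
    var options = {
        host: "www.google.com",
        path: "/finance/getprices?i=60&p=" + day + "d&f=d,o,h,l,c,v&df=cpct&q=" + ticker
    };
    var data = '';
    var callback = function(response){
        // Accumulate the response body chunk by chunk.
        response.on('data', function(chunk){
            data += chunk;
        });
        response.on('end', function(){
            // cleanUp is defined elsewhere; it returns -1 when nothing was found.
            var data_clean = cleanUp(data);
            if(data_clean === -1) console.log('we couldnt find anything for this ticker');
            cb(data_clean);
        });
    };
    var req = http.request(options, callback);
    // Without an 'error' listener, a failed request would crash the process.
    req.on('error', function(err){
        console.log('request failed for ' + ticker + ': ' + err.message);
        cb(-1);
    });
    req.end();
};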
Any help would be greatly appreciated.
When designing against an API that enforces policy thresholds (refresh-rate ceiling, bandwidth limit, etc.), note that the as-is URL above (re-)fetches a huge block of data, most rows of which, if not all, were already known from a previous call to the identical URL.
As noted in the second comment below, repeatedly re-fetching a growing block of already-known data is inherently inefficient and should be avoided.
A professional data-pump design ought to use the API's own parameters for this:
ts=1482330600
Here ts takes aTimeSTAMP (Unix format, in seconds) that defines the start of the "new" data to be retrieved, so that rows already seen before that timestamp are left out of the transmitted block.
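A minimal sketch of how the ts idea might be wired into the question's request path. It assumes the endpoint accepts ts alongside the other query parameters, as described above. The names lastSeen, buildPricesPath and markSeen are hypothetical helpers introduced only for illustration, and parsing the response to find the newest row's timestamp is left out.

```javascript
// Hypothetical helpers; only the query parameters come from the post above.
// Maps ticker -> Unix timestamp [s] of the newest row already stored.
var lastSeen = new Map();

// Remove whitespace from a ticker, as the question's code does.
function normalize(ticker) {
  return ticker.replace(/\s+/g, '');
}

// Build the request path; ts is appended only when earlier data exists,
// on the assumption that the API then returns rows from ts onwards.
function buildPricesPath(ticker, day) {
  var symbol = normalize(ticker);
  var path = '/finance/getprices?i=60&p=' + day +
    'd&f=d,o,h,l,c,v&df=cpct&q=' + encodeURIComponent(symbol);
  if (lastSeen.has(symbol)) {
    path += '&ts=' + lastSeen.get(symbol);
  }
  return path;
}

// Record the newest timestamp seen; older values never overwrite newer ones.
function markSeen(ticker, unixSeconds) {
  var symbol = normalize(ticker);
  if (!lastSeen.has(symbol) || unixSeconds > lastSeen.get(symbol)) {
    lastSeen.set(symbol, unixSeconds);
  }
}

// Usage: the first call fetches the full block, later calls only newer rows.
console.log(buildPricesPath('AAPL', 0));
markSeen('AAPL', 1482330600);
console.log(buildPricesPath('AAPL', 0));
```

The path returned by buildPricesPath could replace the hard-coded options.path in getStockData, with markSeen called from the 'end' handler once the newest row's timestamp is known.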