I made a simple program that uses the Google Finance API to grab stock data through HTTP requests and performs some calculations on it.

The Google API endpoint looks like this (it adds a new block of data every minute during trading hours):

https://www.google.com/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct&q=AAPL

This works fine, but I have a huge list of stock tickers I need data for. To loop through them without hitting a request limit, I set a time interval of 2 seconds between requests. There are over 5,000 stocks, so this takes forever, and I need it to finish in under 5 minutes for the algorithm to be useful.

Is there a way to achieve this with HTTP requests, or am I tackling this the wrong way? I can't download the data beforehand and process it on the client side, because I need the data as soon as the first quotes come out in the morning.

Programmed in JavaScript (Node.js), but answers in any language are fine. Here's the function that I call at 2-second intervals:

var http = require('http');

var getStockData = function (ticker, day, cb) {
    // strip whitespace so the ticker is safe to embed in the query string
    ticker = ticker.replace(/\s+/g, '');

    var options = {
        host: "www.google.com",
        path: "/finance/getprices?i=60&p=" + day + "d&f=d,o,h,l,c,v&df=cpct&q=" + ticker
    };

    var data = '';

    var callback = function (response) {
        response.on('data', function (chunk) {
            data += chunk;
        });

        response.on('end', function () {
            var data_clean = cleanUp(data);
            if (data_clean === -1) console.log("we couldn't find anything for this ticker");

            cb(data_clean);
        });
    };

    http.request(options, callback)
        .on('error', function (err) {
            // without this handler a network error would crash the process
            console.error('request failed for ' + ticker + ': ' + err.message);
        })
        .end();
};
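For reference, the 2-second polling loop described above can be sketched as follows. This is a minimal sketch, not the asker's actual driver code: the `tickers` array is hypothetical, and the fetch function is injected (a stand-in for `getStockData`) so the scheduling logic can run without network access.

```javascript
// Hedged sketch: poll tickers one at a time, waiting intervalMs between
// requests. fetchFn has the same (ticker, day, cb) shape as getStockData.
function pollSequentially(tickers, intervalMs, fetchFn, done) {
    var results = [];
    var i = 0;
    function next() {
        if (i >= tickers.length) { done(results); return; }
        fetchFn(tickers[i], 0, function (data) {
            results.push(data);
            i += 1;
            // schedule the next request only after the current one finishes
            setTimeout(next, intervalMs);
        });
    }
    next();
}
```

With 5,000 tickers and a 2,000 ms interval this loop needs well over two and a half hours, which is exactly the problem described above.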

Any help would be greatly appreciated.

Answer by user3666197:

If designing against an API with policy thresholds (refresh-rate ceiling, bandwidth limit, etc.):

  1. avoid refetching data the node has already received

Using the URL above as-is, a huge block of data is (re)fetched on every call, most rows of which, if not all, were already known from an identical call just 2 seconds earlier:

EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=60
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1482330600,116.84,116.84,116.8,116.8,225329
1,116.99,117,116.8,116.84,81304
2,117.26,117.28,116.99,117,225262
3,117.32,117.35,117.205,117.28,153225
4,117.28,117.33,117.22,117.32,104072
.
..
...
..
.
149,116.98,117,116.98,116.98,8175
150,116.994,117,116.98,116.99,2751
151,117,117.005,116.9901,116.9937,7774
152,117.01,117.02,116.99,116.995,13011
153,117.0199,117.02,117.005,117.02,9313
  2. review the API specification carefully, to send smarter requests that yield minimum-footprint data
  3. watch for API end-of-life signals, to find another source before the API stops provisioning data

(cit.:) The Google Finance APIs are no longer available. Thank you for your interest.
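The compact DATE column in the response block shown above can be decoded as follows. This sketch is inferred from the sample data (a row beginning with `a` carries an absolute unix timestamp; a bare number is an offset in intervals from the last anchor, with `INTERVAL=60` seconds here); it is not an official specification.

```javascript
// Hedged sketch: expand the compact DATE column of a getprices response
// into absolute unix timestamps (seconds).
function decodeTimestamps(dateColumn, intervalSeconds) {
    var anchor = 0;
    return dateColumn.map(function (d) {
        if (d.charAt(0) === 'a') {
            // absolute anchor row, e.g. "a1482330600"
            anchor = parseInt(d.slice(1), 10);
            return anchor;
        }
        // offset row: N intervals after the last anchor
        return anchor + parseInt(d, 10) * intervalSeconds;
    });
}
```

For the sample above, `decodeTimestamps(['a1482330600', '1', '2'], 60)` yields `[1482330600, 1482330660, 1482330720]`.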

As noted in the comments, the inherent inefficiency of repeatedly re-fetching a growing block of already-known data is to be avoided.

A professional DataPump design ought to use the API's details to do this:

  • adding ts=1482330600, a timestamp (unix format, in seconds), to define the start of the "new" data to be retrieved, leaving rows already seen before that timestamp out of the transmitted block.
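A minimal sketch of the minimum-footprint request this bullet describes, assuming the `ts` parameter behaves as stated (a unix-seconds start marker); `buildDeltaPath` is a hypothetical helper, not part of Google's API:

```javascript
// Hedged sketch: build a request path that asks only for rows newer than
// the last timestamp already received, instead of re-fetching the whole day.
function buildDeltaPath(ticker, lastSeenUnixSeconds) {
    return "/finance/getprices?i=60&p=0d&f=d,o,h,l,c,v&df=cpct" +
        "&q=" + encodeURIComponent(ticker.replace(/\s+/g, '')) +
        "&ts=" + lastSeenUnixSeconds;
}
```

The caller would track the newest timestamp seen per ticker (e.g. from the decoded DATE column) and pass it on the next poll, shrinking each response to only the unseen minutes.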