How do I speed up profiled NumPy code - vectorizing, Numba?


I am running a large Python program that optimizes portfolio weights for (Markowitz) portfolio optimization in finance. When I profile the code, 90% of the run time is spent calculating the portfolio return, which is done millions of times. What can I do to speed up my code? I have tried:

  • vectorizing the calculation of returns: this made the code slower, from 1.5 ms to 3 ms
  • using Numba's autojit function to speed up the code: no change (my attempt is sketched after the example below)

See example below - any suggestions?

import numpy as np


def get_pf_returns(weights, asset_returns, horizon=60):
    '''
    Get portfolio returns: Calculates portfolio return for N simulations,
    assuming monthly rebalancing.

    Input
    -----
    weights: Portfolio weight for each asset
    asset_returns: Monthly returns for each asset, potentially many simulations
    horizon: Investment horizon in months (default 60)

    Returns
    -------
    Avg. annual portfolio return for each simulation at the end of 5 years
    '''
    pf = np.ones(asset_returns.shape[1])
    for t in np.arange(horizon):
        pf *= (1 + asset_returns[t, :, :].dot(weights))
    return pf ** (12.0 / horizon) - 1


def get_pf_returns2(weights, asset_returns):
    ''' Alternative '''
    return np.prod(1 + asset_returns.dot(weights), axis=0) ** (12.0 / 60) - 1

# Example
N, T, sims = 12, 60, 1000  # Settings
weights = np.random.rand(N)
weights /= np.sum(weights)  # Normalize sample weights to sum to 1
asset_returns = np.random.randn(T, sims, N) / 100  # Sample returns

# Calculate portfolio risk/return
pf_returns = get_pf_returns(weights, asset_returns)
print(np.mean(pf_returns), np.std(pf_returns))

# Timer
%timeit get_pf_returns(weights, asset_returns)
%timeit get_pf_returns2(weights, asset_returns)
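
For reference, my Numba attempt looked roughly like the sketch below (reconstructed, not my exact code; autojit has since been removed from Numba, so this uses njit with explicit loops):

from numba import njit

@njit
def get_pf_returns_nb(weights, asset_returns, horizon=60):
    # Same computation as get_pf_returns, written with explicit loops
    # so Numba can compile the whole function in nopython mode.
    sims = asset_returns.shape[1]
    n_assets = weights.shape[0]
    pf = np.ones(sims)
    for t in range(horizon):
        for s in range(sims):
            r = 0.0
            for k in range(n_assets):
                r += asset_returns[t, s, k] * weights[k]
            pf[s] *= 1.0 + r
    out = np.empty(sims)
    for s in range(sims):
        out[s] = pf[s] ** (12.0 / horizon) - 1.0
    return out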

EDIT

Solution: np.matmul was fastest on my machine:

def get_pf_returns(weights, asset_returns):
    return np.prod(1 + np.matmul(asset_returns, weights), axis=0) ** (12.0 / 60) - 1
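
For completeness, the same solution with the horizon kept as a parameter instead of hard-coding 60 (a small variant, not timed above):

def get_pf_returns_h(weights, asset_returns, horizon=60):
    # One matmul across all months and simulations, then compound
    growth = 1 + np.matmul(asset_returns[:horizon], weights)
    return np.prod(growth, axis=0) ** (12.0 / horizon) - 1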

There are 2 answers

hpaulj (best answer):

In my environment, matmul (@) has a modest time advantage over einsum and dot:

In [27]: np.allclose(np.einsum('ijk,k', asset_returns, weights), asset_returns @ weights)
Out[27]: True
In [28]: %timeit asset_returns@weights
100 loops, best of 3: 3.91 ms per loop
In [29]: %timeit np.einsum('ijk,k',asset_returns,weights)
100 loops, best of 3: 4.73 ms per loop
In [30]: %timeit np.dot(asset_returns,weights)
100 loops, best of 3: 6.8 ms per loop

I think the times are limited by the total number of calculations more than by the coding details: all of these variants pass the work to compiled numpy code. The fact that your original looped version is relatively fast probably has to do with the small number of loop iterations (only 60), and with memory-management issues in the full 3D dot.

And Numba is probably not replacing the dot call with anything faster.

So a tweak here or there might speed up your code by a factor of 2, but don't expect an order of magnitude improvement.
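
If the memory handling of the full 3D dot really is the limit, one variant worth timing is to collapse the (T, sims) axes so BLAS sees a single 2D matrix-vector product (a sketch assuming the shapes from the question; whether it beats the plain @ depends on your BLAS build):

def get_pf_returns_2d(weights, asset_returns):
    # Collapse (T, sims, N) -> (T*sims, N), do one matvec, then restore the shape
    T, sims, N = asset_returns.shape
    z = (asset_returns.reshape(T * sims, N) @ weights).reshape(T, sims)
    return np.prod(1 + z, axis=0) ** (12.0 / T) - 1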

JoshAdel:

Here's a version that uses np.einsum to get a little bit of a speed-up:

def get_pf_returns3(weights, asset_returns, horizon=60):
    # One einsum call replaces the per-month dot products
    z = np.einsum("ijk,k->ij", asset_returns[:horizon, :, :], weights)
    pf = np.multiply.reduce(1 + z)
    return pf ** (12.0 / horizon) - 1

And then timings:

%timeit get_pf_returns(weights, asset_returns)
%timeit get_pf_returns3(weights, asset_returns)
print(np.allclose(get_pf_returns(weights, asset_returns), get_pf_returns3(weights, asset_returns)))

# 1000 loops, best of 3: 727 µs per loop
# 1000 loops, best of 3: 638 µs per loop
# True

The timings on your machine could differ depending on your hardware and the libraries NumPy is compiled against.