How can I speed up the process of integrating a complex function?

import numpy as np
from scipy import integrate
import matplotlib.pyplot as plt
import datetime
start = datetime.datetime.now()
plt.rcParams['axes.grid'] = True
XX=[None, 11.3, 14.8, 7.6, 10.5, 12.7, 3.9, 11.2, 5.4, 8.5, 5.0, 4.4, 7.3, 2.9, 5.7, 6.2, 7.3, 3.3, 4.2, 5.5, 4.2]
I = [None,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
N=20
# Define the function G: the product of the probability densities of each independent
# observation Xi, whose mean and standard deviation vary linearly with the index i
def G(M1, Mn, S1, Sn):
    def loc (M1, Mn, i, n):
        return (M1*(n-i)/(n-1)) + (Mn*(i-1)/(n-1)) 
    def scale (S1, Sn, i, n):
        return (S1*(n-i)/(n-1)) + (Sn*(i-1)/(n-1))
    def fnorm (x):
        return (np.exp((-x**2)/2))/(np.sqrt(2*np.pi))
    def y (M1, Mn, S1, Sn, i, n, x):
        return (x-loc(M1,Mn,i,n))/scale(S1,Sn,i,n)
    def F (M1, Mn, S1, Sn, i, n, x):
        return (fnorm(y(M1, Mn, S1, Sn, i, n, x)))/scale(S1, Sn, i, n)
    # Unpack the x and i values from the global variables
    x = XX[1:]
    i = I[1:]
    n=N
    # Evaluate F for each pair of i and x
    values=[F(M1, Mn, S1, Sn, i_val, n, x_val) for i_val, x_val in zip(i, x)]
    
    # Take the product of all the values
    result = np.prod(values)
    
    return result
# Find the normalizing factor K to obtain the joint PDF of the estimates
options={'epsrel':1e-20, 'epsabs':1e-20}
K, errorK = integrate.nquad(G, ranges=[[0, 30],[0, 10],[0, 10],[0, 10]], opts=[options, options, options, options])
# K = 2.9613332457351404e-18    errorK = 9.999171231431291e-21
# Form the PDF of the estimates
def pdf(Mn, S1, Sn, M1):
    # M1 comes last so that tplquad integrates over Mn, S1, Sn and passes M1 via args
    return (G(M1, Mn, S1, Sn) / K)
# Plot the marginal PDF of the estimates for parameter M1 (loosen the tolerances for speed)
def pdf_m1 (M1):
    return [(integrate.tplquad(pdf, 0, 10, 0, 10, 0, 10, args=(m,), epsabs=0.1, epsrel=0.1)[0]) for m in M1]
x = np.arange(4, 16, 0.2)
plt.plot(x, pdf_m1(x), 'k', lw=3)
plt.ylim(bottom=0)
plt.title('Fm1(m1)')
plt.show()
# Find the unbiased estimate of M1 and its standard deviation sm1 (approximate MLE value: M1 = 9.66)
def F1 (M1):
    return integrate.tplquad(pdf, 0, 10, 0, 10, 0, 10, args=(M1,))[0]
M1, _ = integrate.quad(lambda M1: M1 * F1(M1), 4, 16)
print(M1)
Dm1, _ = integrate.quad (lambda x: ((x-M1)**2) * F1(x), 4, 16)
sm1 = np.sqrt(Dm1)
print(sm1)
print("Время вычисления М1 и СКОм1:", datetime.datetime.now()-start)

The problem is efficiency: the code runs for a very long time, up to 15 hours. The task is mostly numerical integration. The example above only integrates over one variable at a time, but in reality you need to integrate over every variable specified in the PDF.

I use the Spyder IDE for this project.

I need to integrate functions with 4 variables or more.

Answer by Matt Haberland:

TLDR: avoid impossible tolerances and consider QMC integration.


One thing that would help is to avoid specifying tolerances that cannot possibly be achieved. Double-precision floats have an epsilon on the order of 1e-16, so you cannot possibly hope to achieve 1e-20 relative error, but requesting it has a huge impact on your runtime.
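
For reference, the floating-point limit is easy to check directly:

import numpy as np

# eps is the smallest relative spacing between adjacent double-precision floats
# (~2.2e-16); a requested relative tolerance of 1e-20 is far below this and can never be met
print(np.finfo(np.float64).eps)  # 2.220446049250313e-16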

Consider, for instance:

from time import perf_counter
import numpy as np
from scipy.stats import multivariate_normal
from scipy.integrate import nquad
rng = np.random.default_rng(23925340854238936)
options={'epsrel':1e-20, 'epsabs':1e-20}

for d in range(1, 4):
    cov = rng.random((d, d))
    cov = cov @ cov.T
    dist = multivariate_normal(cov=cov)
    x = np.asarray([(0, 1)]*d)
    tic = perf_counter()
    res = nquad(lambda *x: dist.pdf(x), x)
    # res = nquad(lambda *x: dist.pdf(x), x, opts=[options]*d)
    toc = perf_counter()
    print(toc - tic)
    ref = dist.cdf(x[:, 1], lower_limit=x[:, 0])

# 0.0012877020001269557
# 0.08465272499984167
# 0.5809959000000617

The execution time of numerical integration scales very poorly with dimensionality, and the curse is exacerbated by the tight tolerances. When we use your tolerances, the runtimes look like:

# 0.004921189000015147
# 0.4714967869999782
# 241.41230427999994

And all that time buys very little in terms of improved accuracy: with the default options, the error of the 3-d integral is res[0] - ref = 4.969670549248573e-07, and with your tolerances it is -3.4931814399397076e-07.

In fact, for your quadruple integral we can already see this from your own output:

# K = 2.9613332457351404e-18    errorK = 9.999171231431291e-21

The second number is an absolute error estimate. I'm not sure what the true error is, but it means the estimated relative error is errorK / K ≈ 3e-3. If you're not achieving great accuracy anyway, you might as well use scipy.integrate.qmc_quad, which can get reasonable accuracy in much less time.

from time import perf_counter
import numpy as np
from scipy.integrate import nquad, qmc_quad
from scipy.stats import multivariate_normal
rng = np.random.default_rng(23925340854238936)

d = 4
cov = rng.random((d, d))
cov = cov @ cov.T
dist = multivariate_normal(cov=cov)
x = np.asarray([(0, 1)]*d)

ref = dist.cdf(x[:, 1], lower_limit=x[:, 0])

tic = perf_counter()
res = nquad(lambda *x: dist.pdf(x), x)
toc = perf_counter()
print(f"nquad    | Time: {toc - tic}, Error: {abs((res[0] - ref)/ref)})")

tic = perf_counter()
res = qmc_quad(lambda x: dist.pdf(x.T), *x.T)
toc = perf_counter()
print(f"qmc_quad | Time: {toc - tic}, Error: {abs((res[0] - ref)/ref)})")
# nquad    | Time: 13.483493682000244, Error: 2.8983787517192293e-05)
# qmc_quad | Time: 0.0281571279997479, Error: 0.000120747399615769)
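
Applied to your K integral, the same idea looks roughly like the sketch below. It assumes G is rewritten in vectorized form (the name G_vec and the reshaping details are mine, not part of the original code); the loc/scale/product logic is just your G translated to array operations, so double-check it against your definitions.

import numpy as np
from scipy.integrate import qmc_quad

XX = np.array([11.3, 14.8, 7.6, 10.5, 12.7, 3.9, 11.2, 5.4, 8.5, 5.0,
               4.4, 7.3, 2.9, 5.7, 6.2, 7.3, 3.3, 4.2, 5.5, 4.2])
i = np.arange(1, 21)
n = 20

def G_vec(params):
    # params has shape (n_points, 4); columns are M1, Mn, S1, Sn
    M1, Mn, S1, Sn = (params[:, k, np.newaxis] for k in range(4))
    loc = (M1*(n - i) + Mn*(i - 1)) / (n - 1)      # shape (n_points, 20)
    scale = (S1*(n - i) + Sn*(i - 1)) / (n - 1)
    y = (XX - loc) / scale
    F = np.exp(-y**2 / 2) / (np.sqrt(2*np.pi) * scale)
    return np.prod(F, axis=-1)                     # product over the 20 observations

a = [0, 0, 0, 0]       # lower limits for M1, Mn, S1, Sn
b = [30, 10, 10, 10]   # upper limits, matching your nquad ranges
# qmc_quad passes points with shape (d, n_points), hence the transpose
res = qmc_quad(lambda z: G_vec(z.T), a, b)
K, K_err = res.integral, res.standard_error
print(K, K_err)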

If you need better accuracy from qmc_quad in the same amount of time (at the expense of error-estimate accuracy), decrease n_estimates and increase n_points by the same factor. If you can spare time for increased accuracy (this is already a massive speedup), just increase n_points.
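
With the G_vec sketch above, those trade-offs might look like this (the parameter values here are only illustrative, not tuned):

# default is n_points=1024, n_estimates=8
res_default = qmc_quad(lambda z: G_vec(z.T), a, b)

# same total number of evaluations, typically better accuracy, cruder error estimate
res_fast = qmc_quad(lambda z: G_vec(z.T), a, b, n_points=4096, n_estimates=2)

# spend more time for more accuracy: just raise n_points
res_accurate = qmc_quad(lambda z: G_vec(z.T), a, b, n_points=65536)

for res in (res_default, res_fast, res_accurate):
    print(res.integral, res.standard_error)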