Python scipy GEV fit does not match distribution

I have an array of 240 monthly maximum tides and am trying to fit a GEV curve to that data for return period calculations. However, the fitted GEV curve does not resemble the histogram of the tide data passed to the fit.

import numpy as np
import matplotlib.pyplot as plt 
from scipy.stats import genextreme as gev

tides = np.array([204.25, 184.87, 164.15, 158.54, 194.47, 206.31, 212.04,
209.24, 186.28, 176.27, 181.72, 199.49, 205.97, 198.42, 187.2, 170.42, 188.22, 
193.66, 206.12, 204.03, 187.64, 188.66, 190.92, 191.3, 196.25, 191.91, 166.42, 
188.73, 192.57, 199.81, 193.57, 193.28, 198.45, 192.17, 200.9, 212.57, 205.65,
188.84, 175.5, 180.52, 199.2, 202.07, 209.27, 202.07, 187.95, 199.11, 206.81, 
235.44, 204.04, 195.15, 173.85, 163.17, 191.7, 201.87, 212.38, 207.92, 171.61,
186.32, 201.58, 222.89, 206.96, 200.68, 178.82, 183.91, 198.82, 209.23, 
224.03, 230.06, 199.87, 201.07, 205.59, 211.58, 210.78, 205.9, 182.66, 199.49, 
195.04, 196.12, 197.82, 203.91, 188.28, 196.81, 200.88, 201.25, 212.27, 
178.33, 173.86, 185.71, 191.83, 202.56, 195.54, 189.08, 184.48, 199.92,
206.66, 198.95, 188.12, 176.24, 161.95, 172.67, 196.1, 207.34, 208.96, 209.65,
178.95, 188.49, 211.91, 218.64, 201.82, 193.37, 170.33, 185.98, 201.05, 
212.28, 213.93, 204.78, 195.17, 196.68, 210.0, 211.09, 208.75, 191.5, 201.17,
190.19, 195.78, 197.68, 209.58, 205.62, 190.79, 198.04, 206.89, 210.84,
202.58, 180.44, 178.58, 191.25, 209.43, 205.74, 194.24, 192.74, 193.11, 
209.92, 214.03, 220.04, 187.46, 191.46, 161.37, 180.56, 192.58, 205.59, 208.1,
192.8, 180.27, 195.74, 201.17, 209.86, 201.87, 179.38, 167.11, 179.99, 208.07,
212.23, 205.14, 201.21, 180.63, 176.36, 190.89, 206.73, 205.34, 188.07, 
169.57, 176.18, 191.82, 194.07, 205.99, 204.98, 200.29, 190.52, 189.14, 
194.65, 188.97, 198.19, 178.03, 182.65, 194.29, 196.0, 193.19, 194.43, 179.63,
197.73, 204.24, 199.32, 209.48, 204.62, 193.44, 181.99, 196.02, 204.84, 209.4,
194.12, 175.39, 194.88, 208.65, 205.94, 197.69, 184.47, 172.59, 183.86,
199.14, 213.82, 206.46, 194.48, 175.3, 176.1, 194.91, 208.59, 209.01, 190.92,
191.17, 175.59, 195.32, 206.8, 217.82, 212.64, 195.08, 180.13, 190.87, 203.0, 
196.91, 189.42, 170.31, 170.07, 181.7, 187.96, 194.01, 207.64, 194.11, 192.11,
202.95, 197.85])

gev_fit = gev.fit(tides)

x = np.linspace(np.min(tides)-10,np.max(tides)+10,1000)
gev_pdf = gev.pdf(x, gev_fit[0], gev_fit[1], gev_fit[2])

plt.subplot(1,2,1)
plt.hist(tides, density=True, alpha=0.2, label='Data')
plt.xlabel('Tides (cm)')

plt.subplot(1,2,2)
plt.hist(tides, density=True, alpha=0.2, label='Data')
plt.plot(x, gev_pdf, 'r--', label='GEV Fit')
plt.xlabel('Tides (cm)')
plt.legend(loc='upper left')
plt.show()

[Figure: histogram of the tide data (left) and the same histogram with the fitted GEV PDF overlaid (right), showing the mismatch]

As you can see, the GEV fit does not match the original data's distribution at all. Do you have any suggestions to improve the fit?

1 Answer

Bill Bell (Best Answer):

I admit it: after a couple of hours of sweated effort this is what I have. It's a crude example of maximum spacing estimation that does seem to converge to a reasonable estimate vector.
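
In case the objective in fun() below is unfamiliar: with the sample sorted as x_(1) <= ... <= x_(n) and a candidate CDF F, maximum spacing estimation minimizes the negative log of the product of CDF spacings,

    -\sum_{i=1}^{n+1} \log\big( F(x_{(i)}) - F(x_{(i-1)}) \big), \quad F(x_{(0)}) = 0,\ F(x_{(n+1)}) = 1,

which pushes the fitted CDF to spread probability as evenly as possible between consecutive observations. That is exactly what fun() computes.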

import numpy as np
import matplotlib.pyplot as plt 
from scipy.stats import genextreme 
from scipy.optimize import minimize
from collections import Counter

tides = [204.25, 184.87, 164.15, 158.54, 194.47, 206.31, 212.04, 209.24, 186.28, 176.27, 181.72, 199.50, 205.97, 198.42, 187.2, 170.42, 188.22,  193.66, 206.12, 204.03, 187.64, 188.66, 190.92, 191.3, 196.25, 191.91, 166.42,  188.73, 192.57, 199.81, 193.57, 193.28, 198.45, 192.17, 200.9, 212.57, 205.65, 188.84, 175.5, 180.52, 199.2, 202.08, 209.27, 202.07, 187.95, 199.11, 206.81,  235.44, 204.04, 195.15, 173.85, 163.17, 191.7, 201.88, 212.38, 207.92, 171.61, 186.32, 201.58, 222.89, 206.96, 200.68, 178.82, 183.91, 198.82, 209.23,  224.03, 230.06, 199.87, 201.07, 205.60, 211.58, 210.78, 205.9, 182.66, 199.49,  195.04, 196.12, 197.82, 203.91, 188.28, 196.81, 200.88, 201.25, 212.27,  178.33, 173.86, 185.71, 191.83, 202.56, 195.54, 189.08, 184.48, 199.92, 206.66, 198.95, 188.12, 176.24, 161.95, 172.67, 196.1, 207.34, 208.96, 209.65, 178.95, 188.49, 211.91, 218.64, 201.82, 193.37, 170.33, 185.98, 201.05,  212.28, 213.93, 204.78, 195.17, 196.68, 210.0, 211.09, 208.75, 191.5, 201.17, 190.19, 195.78, 197.68, 209.58, 205.62, 190.79, 198.04, 206.89, 210.84, 202.58, 180.44, 178.58, 191.25, 209.43, 205.74, 194.24, 192.74, 193.11,  209.92, 214.03, 220.04, 187.46, 191.46, 161.37, 180.56, 192.58, 205.59, 208.1, 192.8, 180.27, 195.74, 201.18, 209.86, 201.87, 179.38, 167.11, 179.99, 208.07, 212.23, 205.14, 201.21, 180.63, 176.36, 190.89, 206.73, 205.34, 188.07,  169.57, 176.18, 191.82, 194.07, 205.99, 204.98, 200.29, 190.52, 189.14,  194.65, 188.97, 198.19, 178.03, 182.65, 194.29, 196.0, 193.19, 194.43, 179.63, 197.73, 204.24, 199.32, 209.48, 204.62, 193.44, 181.99, 196.02, 204.84, 209.4, 194.12, 175.39, 194.88, 208.65, 205.94, 197.69, 184.47, 172.59, 183.86, 199.14, 213.82, 206.46, 194.48, 175.3, 176.1, 194.91, 208.59, 209.01, 190.93, 191.17, 175.59, 195.32, 206.8, 217.82, 212.64, 195.08, 180.13, 190.87, 203.0,  196.91, 189.42, 170.31, 170.07, 181.7, 187.96, 194.01, 207.64, 194.11, 192.11, 202.95, 197.85] 
tides.sort()

#~ counts = Counter(tides)
#~ print (counts)

def fun(params, tides):
    # Negative log of the product of CDF spacings: the maximum spacing objective.
    F = lambda x: genextreme.cdf(x, *params)
    result = -sum([np.log(d) for d in np.diff([0] + [F(_) for _ in tides] + [1])])
    return result

shape = 0.5
location = 234
scale = 50
x0 = (shape, location, scale)

#~ fun(x0,tides)

result = minimize(fun, x0, args=(tides,), method='Nelder-Mead')
print (result)

x = np.linspace(np.min(tides)-10,np.max(tides)+10,1000)
#~ gev_pdf = gev.pdf(x, gev_fit[0], gev_fit[1], gev_fit[2])
f = lambda x: genextreme.pdf(x,*result.x)

plt.subplot(1,2,1)
plt.hist(tides, density=True, alpha=0.2, label='Data')
plt.xlabel('Tides (cm)')

plt.subplot(1,2,2)
plt.hist(tides, density=True, alpha=0.2, label='Data')
plt.plot(x, [f(_) for _ in x], 'r--', label='GEV Fit')
plt.xlabel('Tides (cm)')
plt.legend(loc='upper left')
plt.show()

The debris in the code, such as the Counter import, is left over from work I put in to identify duplicated data values. Duplicates produce inter-item spacings of zero, which lead to attempts to take the log of zero. What I did was to perturb those items slightly; for instance, one occurrence of 201.17 became 201.18.
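
If you would rather not edit the repeated values by hand, here is a minimal sketch of the same idea done programmatically. The helper name and the 0.01 jitter are my own choices (the jitter just matches the size of the manual edit above):

from collections import Counter

def perturb_duplicates(values, jitter=0.01):
    # Shift each repeated occurrence of a value by a small extra offset
    # so that no two CDF spacings can collapse to exactly zero.
    counts = Counter()
    out = []
    for v in values:
        out.append(v + counts[v] * jitter)
        counts[v] += 1
    return sorted(out)

print(perturb_duplicates([202.07, 202.07, 201.17, 201.17]))
# [201.17, 201.18, 202.07, 202.08]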

This is the graph produced by the code above:

[Figure: histogram of the tides (left) and the same histogram with the maximum-spacing GEV fit overlaid (right)]

Anyway, I'm glad you asked about this. Interesting problem.
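
For the return period calculation mentioned at the top of the question, one way to use the fitted parameters is the distribution's inverse survival function. The snippet below continues from the fit above (it reuses genextreme and result.x); the 100-month return period is only an illustrative choice, and it assumes each observation is one monthly maximum:

# Continues from the fit above: result.x holds (shape, loc, scale).
T = 100                                            # illustrative return period, in months
return_level = genextreme.isf(1.0 / T, *result.x)  # level exceeded with probability 1/T
print('%d-month return level: %.1f cm' % (T, return_level))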