I have list of degrees of nodes in a network and I want to fit a powerlaw distribution to this empirical degree distribution. I am able to do it using the powerlaw library but for some reason get bad results when trying using the scipy curve_fit function.
I have a list of degrees of a network called degree
. I am able to fit a powerlaw distribution to these degrees via the powerlaw library. I do so as follows:
import powerlaw as pwl
fit_function = pwl.Fit(degree,xmin=1)
alpha = fit_function.power_law.alpha
sigma = fit_function.power_law.sigma
xmin = fit_function.power_law.xmin
print("alpha = ", alpha)
I end up obtaining alpha = 2.5 which is exactly in the range one would expect from a social network. I am then able to plot the data against this fitted distribution on a log log plot and get a nice looking result:
def powerlaw(x,alpha):
y = (alpha - 1) *(x)**(-alpha)
return y
x = np.linspace(min(unique_degrees), max(unique_degrees), 1000)
y = powerlaw(x, alpha)
plt.loglog(unique_degrees, counts_pdf, 'o', label='Degree Distribution')
plt.loglog(x, y, c='r', label='Fitted Power Law')
plt.legend()
plt.show()
Now when I try this using the scipy curve_fit
problem:
def powerlaw(x,alpha):
y = (alpha - 1) *(x)**(-alpha)
return y
bounds = ([2, 3])
p0 = [2.5]
params, _ = curve_fit(powerlaw,unique_degrees, counts_pdf,p0=p0,bounds=bounds)
print("alpha = ", params[0]
When I do this without the bounds I get a value of alpha = 1.46. When I include the bound, provided that the lower bound is always greater than 1.46, my alpha always equals this lower bound. I've tried plotting the data on a loglog plot and the results do not look good:
x = np.linspace(min(unique_degrees), max(unique_degrees), 1000)
y1 = powerlaw(x,\*params)
plt.loglog(unique_degrees, counts_pdf, 'o', label='Degree Distribution')
plt.loglog(x, y1, c='g', label='Fitted Power Law 1')
plt.legend()
plt.show()
Am I doing something wrong?
When you formulate a regression problem in a non linear fashion you probably shape multiple minima on the error function to be minimized.
Below an example of such case:
MCVE
Let's create a plausible dataset:
If we fit without precaution, we got:
If we draw the error function, we see the existence of two minima in your NLLS formulation:
To reach the optimum it is sufficient to move the initial parameters to the right of the peak: