Smooth Approximation of KDE in python


I am trying to get only non-negative values on the x-axis of my KDE plot. I know I can limit the x-axis range, but I do not want to do that. Is there a way to smoothly approximate the KDE so that it puts no density on negative values? All my data are non-negative, but I do not have many sample points (500 at most, and I cannot get more). I have also tried adjusting the bandwidth, but the result does not look good.

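import matplotlib.pyplot as plt
import seaborn as sns

# B and data are defined elsewhere (assumed: data[i] holds the samples for setting B[i])
for i in range(len(B)):
    ax = sns.kdeplot(data[i], fill=True)  # fill=True replaces shade=True in seaborn >= 0.11
ax.set_xlabel('Maximum detection time')
ax.legend(['N=25,R=20', 'N=30,R=20', 'N=35,R=20'], fontsize=5)
plt.show()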


Answer by StupidWolf:

Under the hood, kdeplot builds the density estimate as a sum of many small normal densities, one centred on each observation (see this illustration). The kernels centred on observations close to the cutoff at 0 spill over into negative values.
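A minimal sketch of that spill-over, using SciPy's `gaussian_kde` as a stand-in for kdeplot (as far as I know, recent seaborn versions use it internally, but the check below does not depend on that). Because the KDE is the average of normal densities N(x_i, h^2), the share of its mass below 0 is simply the average of the individual normal CDFs evaluated at 0. The sample is illustrative exponential data, not the question's data:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(999)
x = rng.exponential(0.3, 100)  # non-negative sample, similar in spirit to the question's data

kde = gaussian_kde(x)
h = float(np.sqrt(kde.covariance[0, 0]))  # kernel standard deviation (bandwidth)

# Mass below 0 = average over observations of P(N(x_i, h^2) < 0).
mass_below_zero = norm.cdf(0.0, loc=x, scale=h).mean()

# The same quantity, obtained by integrating the fitted KDE itself.
mass_below_zero_kde = kde.integrate_box_1d(-np.inf, 0.0)

print(f"bandwidth h = {h:.3f}")
print(f"mass below 0: {mass_below_zero:.3f} (formula) vs {mass_below_zero_kde:.3f} (KDE)")
```

The more observations sit within a bandwidth or two of 0, the more mass leaks below the cutoff, which is why a small, right-skewed sample like this one is hit hardest.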

Using some example data:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

np.random.seed(999)

data = pd.DataFrame({'a':np.random.exponential(0.3,100),
                     'b':np.random.exponential(0.5,100)})  

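With the default settings, the density is still evaluated (and drawn) at negative values. Passing clip= does not necessarily stop this; its effect has depended on the seaborn version and the KDE backend in use: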

for i in data.columns:
    # fill=True is the seaborn >= 0.11 name for shade=True
    ax = sns.kdeplot(data[i], fill=True, gridsize=200)


Adding cut=0 ends the curve at the smallest observation rather than at 0, which looks odd. Alternatively, as you pointed out, you can truncate the plot at 0.
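To see where each option makes the curve start, the hypothetical helper `support_grid` below mirrors how, as far as I understand it, recent seaborn versions (0.11 and later) choose the evaluation grid: from `min(x) - cut*h` to `max(x) + cut*h`, then limited to the `clip` range. It is a sketch under that assumption, not seaborn's own code:

```python
import numpy as np
from scipy.stats import gaussian_kde


def support_grid(x, cut=3.0, clip=(None, None), gridsize=200):
    """Hypothetical helper: evaluation grid for a 1-D Gaussian KDE."""
    x = np.asarray(x, dtype=float)
    h = float(np.sqrt(gaussian_kde(x).covariance[0, 0]))  # bandwidth (kernel SD)
    lo = -np.inf if clip[0] is None else clip[0]
    hi = np.inf if clip[1] is None else clip[1]
    # Extend `cut` bandwidths past the extreme observations, then apply clip.
    gridmin = max(x.min() - cut * h, lo)
    gridmax = min(x.max() + cut * h, hi)
    return np.linspace(gridmin, gridmax, gridsize)


rng = np.random.default_rng(999)
x = rng.exponential(0.3, 100)

default_grid = support_grid(x)                  # cut=3: starts below 0
cut0_grid = support_grid(x, cut=0)              # starts at the smallest observation
clipped_grid = support_grid(x, clip=(0, None))  # starts exactly at 0

for name, g in [("cut=3", default_grid), ("cut=0", cut0_grid), ("clip=(0, None)", clipped_grid)]:
    print(f"{name:>15}: grid starts at {g[0]:.4f}")
```

Clipping the grid only hides the negative part: the mass the kernels place below 0 is dropped, so the visible curve integrates to less than 1.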

This post on Cross Validated proposes two solutions. Below is a Python implementation of the R code provided by @whuber, which reweights each observation's kernel so that the estimate, once truncated at 0, still integrates to 1:

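def trunc_dens(x):
    x = np.asarray(x, dtype=float)
    # Fit an ordinary KDE first, only to obtain its bandwidth h.
    kde = sm.nonparametric.KDEUnivariate(x)
    kde.fit()
    h = kde.bw
    # Weight each kernel by 1 / (its probability mass above 0).
    w = 1 / norm.sf(0, loc=x, scale=h)
    d = sm.nonparametric.KDEUnivariate(x)
    d.fit(bw=h, weights=w, fft=False)  # weighted fits require fft=False
    d_support = d.support
    # statsmodels divides by the sum of the weights; rescale to (1/n) * sum(w_i * K_h)
    # as in the R code, then zero out the part below 0.
    d_dens = d.density * w.sum() / len(x)
    d_dens[d_support < 0] = 0
    return d_support, d_dens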
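Why the weights work, written out for a Gaussian kernel with bandwidth h (φ and Φ denote the standard normal density and CDF; the notation is mine, following the weights used in the code): each kernel is scaled up by the reciprocal of the share of its mass that lies above 0, so zeroing everything below 0 leaves a total area of exactly 1.

```latex
\hat f(t) = \frac{1}{n}\sum_{i=1}^{n} w_i \,\frac{1}{h}\,\varphi\!\left(\frac{t - x_i}{h}\right), \quad t \ge 0,
\qquad
w_i = \frac{1}{1 - \Phi\!\left(\frac{0 - x_i}{h}\right)} = \frac{1}{\Phi(x_i / h)}

\int_0^{\infty} \hat f(t)\,dt
  = \frac{1}{n}\sum_{i=1}^{n} w_i \left[1 - \Phi\!\left(\frac{-x_i}{h}\right)\right]
  = \frac{1}{n}\sum_{i=1}^{n} 1 = 1
```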

Comparing it with the untruncated estimate for data['a']:

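# Standard (untruncated) KDE for comparison
kde = sm.nonparametric.KDEUnivariate(data['a'])
kde.fit()
plt.plot(kde.support, kde.density, label='standard KDE')
# Reweighted KDE, truncated at 0
_x, _y = trunc_dens(data['a'])
plt.plot(_x, _y, label='reweighted, truncated at 0')
plt.legend()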
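The same reweighting can be written directly with NumPy and SciPy, which also makes it easy to check numerically that the truncated curve still integrates to 1. This is a sketch; `reweighted_truncated_kde` is a hypothetical name, and it uses SciPy's default (Scott's rule) bandwidth instead of the statsmodels default, so its curve will differ slightly from `trunc_dens`:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def reweighted_truncated_kde(x, grid):
    """Sketch: Gaussian KDE reweighted so that its mass on [0, inf) is 1."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = float(np.sqrt(gaussian_kde(x).covariance[0, 0]))  # Scott's-rule bandwidth
    w = 1.0 / norm.sf(0.0, loc=x, scale=h)  # 1 / (share of each kernel above 0)
    # (1/n) * sum_i w_i * N(t; x_i, h^2), evaluated on the grid (shape: grid x n)
    dens = (w * norm.pdf(grid[:, None], loc=x, scale=h)).mean(axis=1)
    dens[grid < 0] = 0.0  # truncate at 0
    return dens


rng = np.random.default_rng(999)
x = rng.exponential(0.3, 100)
grid = np.linspace(-0.5, x.max() + 1.0, 4001)
dens = reweighted_truncated_kde(x, grid)

# Trapezoidal estimate of the total area under the truncated curve
area = np.sum((dens[1:] + dens[:-1]) / 2 * np.diff(grid))
print(f"area under truncated curve: {area:.4f}")
```

The result can be plotted with `plt.plot(grid, dens)`; the small deviation of `area` from 1 comes from the grid not containing 0 exactly.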


You can plot it for both:

fig, ax = plt.subplots()
for i in data.columns:
    _x, _y = trunc_dens(data[i])
    ax.plot(_x, _y, label=i)  # one reweighted, truncated curve per column
ax.legend()
