I am fitting a Kernel Density Estimation instance on multivariate data using the scikit-learn implementation, with a 'gaussian' kernel and 'silverman' as the bandwidth estimator.
- .fit() is used to fit the KDE on the training data
- .sample(n_instances) is used to generate n new data points from the estimated distribution
However, I have noticed that many of the generated values fall outside the ranges of the original variables. As an example, I attach the output of the code I used, where I compare the ranges of the original training data with those of the newly generated data points.
from sklearn.neighbors import KernelDensity

kde = KernelDensity(kernel=kernel, bandwidth='silverman').fit(data)
print('data:', data)
examples = kde.sample(n_instances, random_state=0)
Each tuple corresponds to one variable, in the form: (column index, variable minimum, variable maximum, average).
Imagine this behaviour replicated across n dimensions, since I am working with many variables; it is problematic. Is there a way to draw a random sample from the distribution such that the data stays within the ranges of my original variables?