How much data can sklearn handle with kernel density estimation


I have a data set with 40 million lines (about 8Mb), where each line is a single float. I want to fit a Gaussian kernel density estimate to it with sklearn, but it is too slow on my PC (4GB RAM, 256GB SSD). Can sklearn's KDE handle a data set with a million or more samples?


There is 1 answer

Hugues Fontenelle (best answer)

Yes, scikit-learn can handle a lot of data. But as you found out, your machine may not have enough resources for it. Alternatively, you may need to use the software more efficiently. Read Strategies to scale computationally: bigger data in the scikit-learn documentation.

Edit: Density estimation for large dataset on Cross Validated is quite relevant.
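
One common way to make this tractable is to fit the KDE on a random subsample rather than on every point. Here is a minimal sketch of that idea, not a definitive recipe. The synthetic data, the subsample size of 20,000, the bandwidth of 0.2, the `rtol` value and the evaluation grid are all assumptions picked for illustration, so tune them for your own data.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Stand-in for the real data set (assumption): one float per sample.
data = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

# Fit on a random subsample instead of all points (size is an assumption).
subsample = rng.choice(data, size=20_000, replace=False)

# rtol > 0 lets the tree-based evaluation trade a little accuracy for speed.
kde = KernelDensity(kernel="gaussian", bandwidth=0.2, rtol=1e-4)
kde.fit(subsample.reshape(-1, 1))  # sklearn expects shape (n_samples, n_features)

# Evaluate on a coarse grid rather than at every original sample.
grid = np.linspace(-4.0, 4.0, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))  # score_samples returns log-density
```

Evaluation cost grows with both the number of fitted points and the number of query points, so keeping the query grid small matters as much as subsampling the training data.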