How to quantile-discretize on Spark?

Question

Asked by Dylan Wang on 14 September 2017 at 14:16 · 505 views

I want to quantile-discretize an `RDD[Float]` into 10 buckets without using Spark ML, so I need to compute the 10th, 20th, …, 80th and 90th percentiles.

The data set is very big, so it can't be collected to the driver.

Is there an efficient algorithm for this?
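Once the nine cut points are known, the discretization step itself is cheap: a value's bucket is the number of cut points less than or equal to it, found by binary search, so it runs independently on every element with no shuffle. A minimal Python sketch, where `decile_bucket` is a hypothetical helper name and buckets are assumed to be half-open intervals `[s_i, s_{i+1})`:

```python
import bisect
from typing import Sequence


def decile_bucket(x: float, splits: Sequence[float]) -> int:
    """Return the bucket index (0..len(splits)) of x.

    Assumes `splits` is sorted ascending (e.g. the 10th..90th percentiles).
    Buckets are half-open: bucket k covers [splits[k-1], splits[k]).
    """
    # bisect_right counts how many cut points are <= x, which is the bucket index.
    return bisect.bisect_right(splits, x)
```

With PySpark, the small `splits` list can simply be captured in the closure, e.g. `rdd.map(lambda x: decile_bucket(x, splits))`.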

Original Q&A

There is 1 answer

**dumitru** · Accepted Answer · 2017-09-14T16:35:02+00:00

This capability is already built in if you are using Spark 2.0 or later. Convert your `RDD[Float]` to a DataFrame and call `approxQuantile(String col, double[] probabilities, double relativeError)` from `DataFrameStatFunctions` (reachable as `df.stat`). Passing the probabilities 0.1, 0.2, …, 0.9 returns the nine decile boundaries without collecting the raw data to the driver. The documentation says:

> This method implements a variation of the Greenwald-Khanna algorithm (with some speed optimizations). The algorithm was first present in *Space-efficient Online Computation of Quantile Summaries* by Greenwald and Khanna.
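To illustrate why only a small summary ever has to exist, here is a minimal, single-machine sketch of the Greenwald-Khanna idea in plain Python. It is not Spark's implementation (Spark's version differs, for example it also has to merge per-partition summaries); it uses the common variant that gives a new tuple the rank uncertainty of its successor, and the class name `GKSummary` is hypothetical. Each stored tuple `(v, g, delta)` bounds the rank of `v` between `rmin` and `rmin + delta`, and merging keeps `g + delta` at most about 2εn:

```python
import bisect
import math
from typing import List, Tuple


class GKSummary:
    """Simplified Greenwald-Khanna quantile summary (hypothetical helper class).

    Tuples are (v, g, delta): g = rmin(v_i) - rmin(v_{i-1}) and
    delta = rmax(v_i) - rmin(v_i). Answers are within eps * n ranks.
    """

    def __init__(self, eps: float) -> None:
        self.eps = eps
        self.n = 0
        self.s: List[Tuple[float, int, int]] = []
        # Compress roughly every 1 / (2 * eps) insertions, as in the GK paper.
        self.compress_every = max(1, int(1 / (2 * eps)))

    def insert(self, v: float) -> None:
        # (v, inf, inf) sorts after every stored tuple whose value equals v,
        # so bisect_right finds i with s[i-1].v <= v < s[i].v.
        i = bisect.bisect_right(self.s, (v, math.inf, math.inf))
        if i == 0 or i == len(self.s):
            delta = 0  # new minimum or maximum: its rank is known exactly
        else:
            _, g_next, d_next = self.s[i]
            delta = g_next + d_next - 1  # uncertainty inherited from the successor
        self.s.insert(i, (v, 1, delta))
        self.n += 1
        if self.n % self.compress_every == 0:
            self._compress()

    def _compress(self) -> None:
        threshold = math.floor(2 * self.eps * self.n)
        s = self.s
        # Walk right to left; never drop the first (minimum) or last (maximum) tuple.
        for i in range(len(s) - 2, 0, -1):
            _, g1, _ = s[i]
            v2, g2, d2 = s[i + 1]
            if g1 + g2 + d2 <= threshold:
                s[i + 1] = (v2, g1 + g2, d2)  # fold tuple i into its successor
                del s[i]

    def query(self, phi: float) -> float:
        if not self.s:
            raise ValueError("empty summary")
        r = max(1, math.ceil(phi * self.n))  # target rank
        e = self.eps * self.n
        rmin = 0
        prev = self.s[0][0]
        for v, g, d in self.s:
            rmin += g
            if rmin + d > r + e:  # this tuple's rmax overshoots; answer with the previous one
                return prev
            prev = v
        return prev
```

Only the tuples are kept, never the raw values, which is the property that lets Spark compute deciles over data that cannot be collected.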
