Transforming less frequent values

Question

53 views Asked by igorvpcleao At 19 December 2016 at 04:43

Suppose I have a CSV file with the following columns, which I read with a 'File Reader' node:

id, name, city, income

After reading it, I notice that the column 'city' contains a huge number of unique values. I want to:

1. Find out which values are the 'k' most frequent for 'city'
2. Replace every value that is not among the 'k' most frequent with something like 'Other'

Example:

id, name, city, income
1, Person 1, New York, 100.000 
2, Person 2, Toronto, 90.000
3, Person 3, New York, 50.000
4, Person 4, Seattle, 60.000

Choosing k to be 1, I want to produce the following table:

id, name, city, income
1, Person 1, New York, 100.000 
2, Person 2, Other, 90.000
3, Person 3, New York, 50.000
4, Person 4, Other, 60.000

This is because 'New York' is the single most frequent value for 'city' in the original table.

Do you know how I can do that using KNIME?

Thanks a lot!

Original Q&A

There is 1 answer

**Gábor Bakos** · Accepted Answer · 2016-12-19T05:41:03+00:00

You can read the data with the CSV Reader node. With the Statistics and Row Filter nodes you can find the k most frequent values. From those, you can create a collection cell using GroupBy. With that collection value available, you can use the Rule Engine node with a ruleset similar to this:

$city$ IN $most frequent cities$ => $city$
TRUE => "Other"
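
To sanity-check the expected result outside KNIME, the same logic (count the values, keep the k most frequent, map everything else to "Other") can be sketched in plain Python. This is only an illustration of the idea, not a KNIME workflow; the function name `replace_infrequent` and the list-of-dicts layout for the rows are assumptions made for the example.

```python
from collections import Counter


def replace_infrequent(rows, column, k, other="Other"):
    """Return copies of rows in which every value of `column` that is not
    among the k most frequent values is replaced by `other`."""
    # Count how often each value occurs in the chosen column.
    counts = Counter(row[column] for row in rows)
    # most_common(k) returns the k (value, count) pairs with the highest counts.
    top_k = {value for value, _ in counts.most_common(k)}
    # Build new row dicts so the input rows are left unchanged.
    return [
        {**row, column: row[column] if row[column] in top_k else other}
        for row in rows
    ]


# Sample table from the question; income is kept as text exactly as written there.
rows = [
    {"id": 1, "name": "Person 1", "city": "New York", "income": "100.000"},
    {"id": 2, "name": "Person 2", "city": "Toronto", "income": "90.000"},
    {"id": 3, "name": "Person 3", "city": "New York", "income": "50.000"},
    {"id": 4, "name": "Person 4", "city": "Seattle", "income": "60.000"},
]

result = replace_infrequent(rows, "city", k=1)
print([row["city"] for row in result])
# ['New York', 'Other', 'New York', 'Other']
```

With k = 1 this reproduces the table from the question. When several values tie for the last of the k places, which of them counts as "most frequent" is a design decision; in this sketch the value that appears first in the data wins.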
