Sort Flink DataSet based on multiple KeySelectors

I want to sort a POJO DataSet by multiple values, using a separate KeySelector function for each:

    DataSet<PoJo> data = input
            .sortPartition(new KeySelector<PoJo, Integer>() {
                @Override
                public Integer getKey(PoJo element) { return someKeyFromPojo(element); }
            }, Order.DESCENDING)
            .sortPartition(new KeySelector<PoJo, Integer>() {
                @Override
                public Integer getKey(PoJo element) { return anotherKeyFromPojo(element); }
            }, Order.ASCENDING);

This yields a "KeySelector cannot be chained" error. According to the Flink documentation, it should be possible to chain sortPartition calls.

Is there a way to solve this without using Field expressions?

Answer by David Anderson (best answer)

You can chain sortPartition calls if and only if you use sortPartition(int field, Order order) or sortPartition(String field, Order order). sortPartition(KeySelector<T, K> keyExtractor, Order order) does not allow chaining.

If you must compute the key, your KeySelector can return whatever you like, so long as it is hashable and comparable: for example, a Tuple such as (someKey, anotherKey).
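One thing to watch with a tuple key: as far as I can tell, the single Order passed alongside a KeySelector applies to every field of the returned tuple, so a mixed ordering (one key descending, the other ascending) has to be encoded in the key itself. For integer keys, negating the field that should be descending and sorting ascending is one way to do that. The sketch below is plain Java with no Flink dependency and only checks that ordering logic; the `PoJo` class, its field names, and the `compositeKey` helper are hypothetical stand-ins, and the Flink wiring is shown only as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CompositeKeySketch {

    /** Hypothetical stand-in for the question's PoJo; the field names are assumptions. */
    static class PoJo {
        final int someKey;
        final int anotherKey;

        PoJo(int someKey, int anotherKey) {
            this.someKey = someKey;
            this.anotherKey = anotherKey;
        }

        @Override
        public String toString() {
            return "(" + someKey + "," + anotherKey + ")";
        }
    }

    /**
     * Builds the composite key (-someKey, anotherKey). Sorting it in ascending
     * order gives someKey descending, then anotherKey ascending.
     * Caveat: the negation overflows for Integer.MIN_VALUE.
     */
    static int[] compositeKey(PoJo p) {
        return new int[] {-p.someKey, p.anotherKey};
    }

    /** Sorts a copy of the input ascending by the composite key, field by field. */
    static List<PoJo> sortByCompositeKey(List<PoJo> input) {
        List<PoJo> sorted = new ArrayList<>(input);
        sorted.sort(Comparator
                .comparingInt((PoJo p) -> compositeKey(p)[0])
                .thenComparingInt(p -> compositeKey(p)[1]));
        return sorted;
    }

    // In Flink, the same key would come from a single KeySelector, roughly:
    //   input.sortPartition(new KeySelector<PoJo, Tuple2<Integer, Integer>>() {
    //       @Override
    //       public Tuple2<Integer, Integer> getKey(PoJo p) {
    //           return Tuple2.of(-p.someKey, p.anotherKey);
    //       }
    //   }, Order.ASCENDING);

    public static void main(String[] args) {
        List<PoJo> data = Arrays.asList(
                new PoJo(1, 5), new PoJo(3, 2), new PoJo(3, 1), new PoJo(1, 4));
        // prints [(3,1), (3,2), (1,4), (1,5)]
        System.out.println(sortByCompositeKey(data));
    }
}
```

The anonymous KeySelector class in the comment (rather than a lambda) is deliberate: it keeps the Tuple2 type parameters visible to Flink's type extraction. If the keys are not numeric, the negation trick does not apply and field-based sortPartition calls are probably the simpler route.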