How to group an Arrow table by column value in C++?

I have a large Arrow table with a string column called 'name' plus other columns. I want to split it into separate Arrow tables, one for each distinct value of name, each holding the other columns' data for that name.

For example,

input:
name A B
foo  1 2
bar  3 4
foo  5 6
bar  7 8

output:
table foo:
A B
1 2
5 6

table bar:
A B
3 4
7 8

I am currently doing this manually by iterating over all the rows and creating new builders in a hash table. But this is slow, and row-by-row iteration is usually the wrong way to operate on an Arrow table. How can I do this quickly using the Arrow API?

There is 1 answer

Mohamed Mahmoud On

You can group the rows with a std::map keyed on the name, like this:

#include <map>
#include <string>
#include <utility>
#include <vector>

// One row of the input table
struct Row {
    std::string name;
    int A;
    int B;
};

int main() {
    // Example input data
    std::vector<Row> data = {
        {"foo", 1, 2},
        {"bar", 3, 4},
        {"foo", 5, 6},
        {"bar", 7, 8}
    };

    // Map from each name to its (A, B) rows, kept in input order
    std::map<std::string, std::vector<std::pair<int, int>>> tables;

    // Iterate over the data and group rows by name
    for (const auto& row : data) {
        tables[row.name].push_back({row.A, row.B});
    }
}

You can also iterate over the map later to split it into any other data structure, such as plain vectors or vectors of pairs, whichever you prefer.
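As a rough sketch of that last step, the map could be converted into one vector per column for each name, which is closer to the columnar layout of the tables in the question. The `Columns` struct and the `to_columns` helper are hypothetical names introduced only for this illustration, and the code assumes the same map type as above (name mapped to a vector of (A, B) pairs). It works on plain C++ containers and does not use the Arrow API.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column-oriented holder for one group's data.
struct Columns {
    std::vector<int> A;
    std::vector<int> B;
};

// Convert row-oriented groups (name -> list of (A, B) pairs)
// into column-oriented groups (name -> vectors A and B).
std::map<std::string, Columns> to_columns(
    const std::map<std::string, std::vector<std::pair<int, int>>>& tables) {
    std::map<std::string, Columns> result;
    for (const auto& entry : tables) {
        Columns& cols = result[entry.first];
        cols.A.reserve(entry.second.size());
        cols.B.reserve(entry.second.size());
        // Row order within each group is preserved.
        for (const auto& row : entry.second) {
            cols.A.push_back(row.first);
            cols.B.push_back(row.second);
        }
    }
    return result;
}
```

With the example data, `to_columns(tables)` would give "foo" the columns A = {1, 5} and B = {2, 6}, and "bar" the columns A = {3, 7} and B = {4, 8}.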