I have a large Arrow table with a string column called 'name' plus other columns. I want to split it into separate Arrow tables, one for each distinct value in name, each holding the other columns' data for that name.
For example,
input:
name A B
foo 1 2
bar 3 4
foo 5 6
bar 7 8
output:
table foo:
A B
1 2
5 6
table bar:
A B
3 4
7 8
I am currently doing this manually by iterating over all the rows and creating new builders in a hash table as I go. But this is slow, and row-by-row iteration is usually the wrong way to operate on an Arrow table. How can I do this quickly using the Arrow API?
You can use a map keyed by the name value, where each entry collects the rows that share that name.
You can also iterate over the map later to move the groups into any other data structure, such as vectors or vectors of pairs, whichever you prefer.
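As a rough illustration of the map idea, here is a minimal C++ sketch that uses only the standard library. It is not Arrow code: `Columns`, `Group`, `GroupRowIndices` and `SplitByName` are hypothetical names made up for this example, and the columns are plain vectors standing in for the A and B columns from the question. The first pass only records row indices per name, so each data column is gathered once per group instead of being appended value by value into per-row builders.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical plain-data stand-in for the example table (not an Arrow type).
struct Columns {
    std::vector<std::string> name;
    std::vector<int64_t> a;
    std::vector<int64_t> b;
};

// One output group: the A and B values belonging to a single name.
struct Group {
    std::vector<int64_t> a;
    std::vector<int64_t> b;
};

// Pass 1: map each distinct name to the row indices where it occurs.
// Indices are appended in row order, so the original order is preserved.
std::unordered_map<std::string, std::vector<int64_t>>
GroupRowIndices(const std::vector<std::string>& names) {
    std::unordered_map<std::string, std::vector<int64_t>> indices;
    for (int64_t row = 0; row < static_cast<int64_t>(names.size()); ++row) {
        indices[names[row]].push_back(row);
    }
    return indices;
}

// Pass 2: gather the remaining columns for each name using its index list.
std::unordered_map<std::string, Group> SplitByName(const Columns& table) {
    // All columns must have the same number of rows.
    assert(table.name.size() == table.a.size());
    assert(table.name.size() == table.b.size());

    const auto indices = GroupRowIndices(table.name);
    std::unordered_map<std::string, Group> groups;
    for (const auto& entry : indices) {
        Group& group = groups[entry.first];
        group.a.reserve(entry.second.size());
        group.b.reserve(entry.second.size());
        for (int64_t row : entry.second) {
            group.a.push_back(table.a[row]);
            group.b.push_back(table.b[row]);
        }
    }
    return groups;
}
```

Calling `SplitByName` on the example input gives a map with two entries, "foo" and "bar", each holding its own A and B values in the original row order. With real Arrow data, the per-name index lists from the first pass could presumably be handed to a gather kernel such as `arrow::compute::Take` to build each sub-table in one call; treat that as an assumption to check against the Arrow version you use.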