Which data structure to use for big values?


I am writing a MapReduce program. I need to store a big value for each key. Specifically, for each id (key), I want to store a value that consists of a large set of numbers. I used numbers from 1 to 100000000. For example:

id       value
1        1,3,9,23,56,345,.......,10000000000
2        6,8,45,321,876,.........,98760000876
.
.
.
100000000   1,2,6.83,90,126,567,.......,7632786765643

In each iteration the number of entries in each value increases. At first I chose the Text type for the value, but the shuffle size became very big and I couldn't get a result. Then I chose the BitSet type, but processing the BitSet was very slow. I don't know which data structure would give me both a small size and fast processing. Can anyone help? Thanks.

2

There are 2 answers

5
Patrick On

I think that you can associate a List with each key. So you can use a Map which associates an ID with a List of numbers: Map<Integer, List<Long>>
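
As a rough sketch of that idea in plain Java (outside of any MapReduce job): the class name IdValueMap and the helper add are made up for illustration, and the sample numbers are taken from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdValueMap {
    // Hypothetical helper: appends one number to the list stored under the given id,
    // creating the list the first time the id is seen.
    static void add(Map<Integer, List<Long>> values, int id, long number) {
        values.computeIfAbsent(id, k -> new ArrayList<>()).add(number);
    }

    public static void main(String[] args) {
        Map<Integer, List<Long>> values = new HashMap<>();
        add(values, 1, 1L);
        add(values, 1, 3L);
        add(values, 1, 10000000000L); // too large for int, fits in long
        add(values, 2, 6L);
        add(values, 2, 98760000876L);

        // Print each id together with how many numbers it holds.
        for (Map.Entry<Integer, List<Long>> e : values.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " numbers");
        }
    }
}
```

Note that a List<Long> stores boxed Long objects, so this is only about correctness of the value range; it does not by itself address the shuffle size mentioned in the question.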

3
Lahniep On

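In Java the int data type is a 32-bit signed integer. It has a range of -2,147,483,648 to 2,147,483,647, which is not enough in your case. The size of int does not depend on whether the machine is 64-bit; if you need 64 bits, use the long type, which has a range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

If your numbers can exceed the long range, you can use a BigInteger.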

For me, the appropriate data structure is:

Map<Integer, List<BigInteger>>
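
A small sketch of how this could look, together with a check of whether a given number would still fit in a long. The class name BigValueMap and the helper fitsInLong are hypothetical names used only for this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BigValueMap {
    // Hypothetical helper: true if n is within the range of a 64-bit signed long.
    // bitLength() excludes the sign bit, so every long value has bitLength <= 63.
    static boolean fitsInLong(BigInteger n) {
        return n.bitLength() <= 63;
    }

    public static void main(String[] args) {
        Map<Integer, List<BigInteger>> values = new HashMap<>();
        List<BigInteger> numbers = new ArrayList<>();
        numbers.add(new BigInteger("7632786765643"));           // sample from the question
        numbers.add(BigInteger.valueOf(Long.MAX_VALUE)
                .add(BigInteger.ONE));                           // one past the long range
        values.put(100000000, numbers);

        for (BigInteger n : values.get(100000000)) {
            System.out.println(n + " fits in long: " + fitsInLong(n));
        }
    }
}
```

If every number passes a check like this, List<Long> is enough and BigInteger only adds overhead.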