I'm working with Apache Accumulo and I need to create an iterator that, on every minor compaction, scans each whole row, computes the MD5 of two columns, and saves it as another column of that row.
Example:
I insert this data:
|| Row || colFam || colQual || value ||
||=====||========||=========||=======||
|| A || person || name || Bob ||
|| A || person || surname || Smith ||
|| A || work || place || Bank ||
|| B || person || name || Jhon ||
|| B || person || surname || Allen ||
|| B || work || place || Pub ||
...
...
I need an iterator that, every time I write a row (A or B, with all its colFam and colQual), gets the values of two columns (name and surname), calculates the MD5 of the resulting string (name + surname), and saves it as a column of that row.
The result should look like this:
|| Row || colFam || colQual || value ||
||=====||========||=========||==============||
|| A || person || name || Bob ||
|| A || person || surname || Smith ||
|| A || work || place || Bank ||
|| A || MD5 || MD5 || <MD5 result> || <--
|| B || person || name || Jhon ||
|| B || person || surname || Allen ||
|| B || work || place || Pub ||
|| B || MD5 || MD5 || <MD5 result> || <--
....
....
I think I can attach this iterator to the minor (or major) compaction of a table.
Any ideas? Which of the built-in iterators should I extend to do this?
Thank you so much
Check out the TransformingIterator (in the org.apache.accumulo.core.iterators.user package). This iterator tries to hide some of the complexity of row-level operations.
The general strategy is that, when iterating over a row, you have to buffer the row in memory, perform your computation, and then write out the row in the correct sorted order.
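To make the buffer / compute / re-emit idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the Accumulo iterator API: the class RowMd5Sketch, the "colFam:colQual" string keys, and the helper names are all hypothetical stand-ins for the real Key/Value handling you would do inside an iterator. It only shows the row-level logic: hold one row in a sorted map, hash name + surname, add the MD5 column, and read the row back out in sorted order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; a real iterator would work on Accumulo Key/Value pairs.
public class RowMd5Sketch {

    // Returns the lowercase hex MD5 of the UTF-8 bytes of the input string.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Takes one buffered row (keys are "colFam:colQual", an assumed encoding)
    // and returns a sorted copy with an extra "MD5:MD5" column when both inputs exist.
    static TreeMap<String, String> addMd5Column(Map<String, String> row) {
        TreeMap<String, String> out = new TreeMap<>(row); // TreeMap keeps keys sorted
        String name = row.get("person:name");
        String surname = row.get("person:surname");
        if (name != null && surname != null) {
            out.put("MD5:MD5", md5Hex(name + surname));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> rowA = new TreeMap<>();
        rowA.put("person:name", "Bob");
        rowA.put("person:surname", "Smith");
        rowA.put("work:place", "Bank");

        // Emit the row in sorted order, as an iterator would have to.
        for (Map.Entry<String, String> e : addMd5Column(rowA).entrySet()) {
            System.out.println("A " + e.getKey() + " " + e.getValue());
        }
    }
}
```

Note that in this sketch the MD5 column comes out first, not last as in the table in the question, because an uppercase "M" sorts before the lowercase "p" of "person". That is exactly why the row has to be buffered: the new entry cannot simply be appended after the last column that was read.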