Row edit iterator in Apache Accumulo


I'm working with Apache Accumulo and I need to create an iterator that, on every minor compaction, scans each whole row, computes an MD5 of two columns, and saves it as another column in that row.

Example:

I insert this data:

|| Row || colFam || colQual || value ||
||=====||========||=========||=======||
|| A   || person || name    || Bob   ||
|| A   || person || surname || Smith ||
|| A   || work   || place   || Bank  ||
|| B   || person || name    || Jhon  || 
|| B   || person || surname || Allen ||
|| B   || work   || place   || Pub   ||
...
...

I need an iterator that, every time I write a row (A or B, with all its colFam and colQual entries), gets the values of two columns (name and surname), calculates the MD5 of the concatenated string (name + surname), and saves it as a column of my row.
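For reference, the hash step on its own can be sketched with the plain JDK. This is a minimal sketch, not Accumulo code: the helper name `md5Hex`, UTF-8 encoding, plain concatenation with no separator, and lowercase hex output are all assumptions, since the question does not say how the digest should be encoded.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Example {
    // Hypothetical helper: lowercase hex MD5 of name + surname (no separator, UTF-8 bytes).
    public static String md5Hex(String name, String surname) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((name + surname).getBytes(StandardCharsets.UTF_8));
            // Zero-pad to 32 hex characters, since BigInteger drops leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // getInstance declares a checked exception; rethrow as unchecked.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("Bob", "Smith"));
    }
}
```

Because there is no separator, ("Bob", "Smith") and ("BobS", "mith") produce the same hash; inserting a delimiter between the two values avoids that if it matters.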

The result should look like this:

|| Row || colFam || colQual || value        ||
||=====||========||=========||==============||
|| A   || person || name    || Bob          ||
|| A   || person || surname || Smith        ||
|| A   || work   || place   || Bank         ||
|| A   || MD5    || MD5     || <MD5 result> || <--
|| B   || person || name    || Jhon         || 
|| B   || person || surname || Allen        ||
|| B   || work   || place   || Pub          ||
|| B   || MD5    || MD5     || <MD5 result> || <--
....
....

I think I can attach this iterator to the minor (and/or major) compaction scope of the table.

Any ideas? Which of the built-in iterators should I extend to do this?

Thank you so much

Answer by elserj

Check out the TransformingIterator. This iterator tries to hide some of the complexity of row-level operations.

The general strategy is that, when iterating over a row, you have to buffer the row in memory, perform your computation, and then write out the row in the correct sorted order.
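The buffer, compute, and emit-sorted strategy can be modeled outside Accumulo with a minimal sketch. `Cell` is a hypothetical simplified key (row, family, and qualifier only, with no visibility or timestamp) and `addMd5` is a hypothetical helper; a real implementation would do the equivalent work inside a row-buffering iterator such as TransformingIterator rather than on an in-memory map like this.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowMd5Sketch {
    // Hypothetical, simplified stand-in for an Accumulo Key:
    // row, column family and column qualifier only (no visibility, no timestamp).
    public static final class Cell {
        public final String row, fam, qual;

        public Cell(String row, String fam, String qual) {
            this.row = row;
            this.fam = fam;
            this.qual = qual;
        }

        @Override
        public String toString() {
            return row + " " + fam + ":" + qual;
        }
    }

    // Sort by row, then family, then qualifier. For ASCII data, String.compareTo
    // gives the same order as a byte-wise key comparison.
    public static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.fam)
            .thenComparing(c -> c.qual);

    public static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, d));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Given all cells of ONE row (already buffered in memory), returns them plus an
    // MD5:MD5 cell holding md5(name + surname), in sorted order. Rows missing either
    // column come back unchanged.
    public static List<Map.Entry<Cell, String>> addMd5(String row, Map<Cell, String> cells) {
        TreeMap<Cell, String> out = new TreeMap<>(ORDER); // keeps everything sorted
        out.putAll(cells);
        String name = out.get(new Cell(row, "person", "name"));
        String surname = out.get(new Cell(row, "person", "surname"));
        if (name != null && surname != null) {
            out.put(new Cell(row, "MD5", "MD5"), md5Hex(name + surname));
        }
        return new ArrayList<>(out.entrySet());
    }

    public static void main(String[] args) {
        Map<Cell, String> rowA = new TreeMap<>(ORDER);
        rowA.put(new Cell("A", "person", "name"), "Bob");
        rowA.put(new Cell("A", "person", "surname"), "Smith");
        rowA.put(new Cell("A", "work", "place"), "Bank");
        // The MD5:MD5 cell is printed first: "MD5" < "person" < "work" in byte order.
        for (Map.Entry<Cell, String> e : addMd5("A", rowA)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Because Accumulo sorts keys byte-wise, the family `MD5` (uppercase `M`) sorts before `person` and `work`, so the new column appears first in each row rather than last as in the table in the question. Also, a minor compaction only sees the entries being flushed from memory, so if a row's name and surname were written in separate flushes, the iterator may not see both of them in that scope.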