How does one handle denormalized data with kettle?

1.6k views Asked by At

Kettle has "row normalizer" and "row denormalizer" steps, e.g.

http://wiki.pentaho.com/display/EAI/Row+Normalizer

but they require that you manually configure the fields in the denormalized table. I don't understand how this can be used practically, since the number of fields in the denormalized table depends on the number of rows in the normalized table, which is dynamic. E.g. in their example, there are three columns for three products in the denormalized input table, and the user must manually tell the transform how to handle each one. But in a real application the number of products will change dynamically. So this transform will only work with one table, at one moment in time. Anything with a different column count will fail.

I have dozens or hundreds of denormalized input files that look very much like their example, all with different column counts.

1

There are 1 answers

0
Notinlist On

I had a similar problem with denormalization. I had an /etc/group file with a structure like group:gid:member1,member2,...., and I denormalized it with a User Defined Java Class component, so finally I have fields group,gid,member. I know you need the other direction, but it may be a good starting point for you. Here is the source:

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // boilerplate
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    if(first)
        first = false;

    String tmp = get(Fields.In, "members").getString(r);
    if(null==tmp)
        return true;
    String accounts[] = tmp.split(",");
    for(int i=0; i<accounts.length; ++i){
        Object[] out_row = RowDataUtil.allocateRowData(data.outputRowMeta.size());
        for (int j=0; j<r.length; ++j)
            out_row[j] = r[j];
        String account = accounts[i];
        get(Fields.Out, "account").setValue(out_row,account);
        putRow(data.outputRowMeta, out_row);
    }

    return true;
}