I have been following a tutorial on creating a data warehouse using Pentaho Data Integration/Kettle.

The tutorial is based off of a CSV file but I am practicing with the northwinds database and postgresql I am trying to figure out how to select values from more than one table then output them into a single table.

My ETL process goes like this: I have several stages for each table, values are selected from each table and stored in a stage table for each table in the database, from there I have my dimensions table set up but I am trying to figure out the step between the stages and the dimensions which is where I am trying to select the values to update the dimensions table.

I have several stages set up for each of my tables at this point I am not sure if I should create a separate values table for each table or a single values table. Any help would be greatly appreciated. Thanks

When I try to select values from multiple tables I get an error that says "we detected rows with varying number of fields" It' seems I would need to create separate tables with

1 Answers

0
nsousa On

In kette, the metadata structure of the data stream cannot change. As such, if row 1 has 3 columns, one integer and two strings, for example, all rows must have the same structure.

If you're combining rows coming from different sources, you must ensure the structure is the same. That error is telling you that some of the incoming streams of data have a different number of fields.