Is there any way to write to BigQuery specifying the name of the destination table dynamically?


Now I have:

bigQueryRQ
.apply(BigQueryIO.Write
    .named("Write")
    .to("project_name:dataset_name.table_name")
    .withSchema(Table.create_auditedTableSchema())
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

But I need "table_name" to be a dynamic table name that depends on the TableRow data I want to write.


There are 2 answers

Davor Bonaci (BEST ANSWER)

Unfortunately, we don't provide an API to name the BigQuery table in a data-dependent way. Generally speaking, data-dependent BigQuery table destinations can be error-prone.

That said, we are working on improving flexibility in this area. No estimates at this time, but we hope to get this soon.
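
For readers finding this later: the newer Apache Beam 2.x SDKs did eventually add per-element table routing, where BigQueryIO.Write.to(...) accepts a function that maps each element to a TableDestination. The snippet below is only a rough sketch of that Beam 2.x style, not the Dataflow 1.x API used in this question; the "msg" field, the table-naming rule, and the reuse of Table.create_auditedTableSchema() from the question are assumptions.

    // Sketch only: assumes Apache Beam 2.x
    // (org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO, TableDestination,
    //  org.apache.beam.sdk.values.ValueInSingleWindow)
    // and an existing PCollection<TableRow> named rows.
    rows.apply("WriteDynamically",
        BigQueryIO.writeTableRows()
            .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
                @Override
                public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                    // Derive the destination table from the element itself
                    // ("msg" is a hypothetical field).
                    String suffix = ((String) value.getValue().get("msg")).substring(0, 1);
                    return new TableDestination(
                            "project_name:dataset_name.table" + suffix, null);
                }
            })
            .withSchema(Table.create_auditedTableSchema())
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));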

KrasK

I have the same problem. How about grouping rows by tag and applying BigQueryIO.Write to each group separately?

public static class TagMarker extends DoFn<TableRow, TableRow> {

    private Map<String, TupleTag<TableRow>> tagMap;

    public TagMarker(Map<String, TupleTag<TableRow>> tagMap) {
        this.tagMap = tagMap;
    }

    @Override
    public void processElement(ProcessContext c) throws Exception {
        TableRow item = c.element();
        // Route the row to the output tagged for its destination table.
        c.sideOutput(tagMap.get(getTagName(item)), item);
    }

    private String getTagName(TableRow row) {
        // Your logic for choosing the destination table from the row goes here.
        return "table" + ((String) row.get("msg")).substring(0, 1);
    }

}


private static class GbqWriter extends PTransform<PCollection<TableRow>, PDone> {

    @Override
    public PDone apply(PCollection<TableRow> input) {

        TupleTag<TableRow> mainTag = new TupleTag<TableRow>();
        TupleTag<TableRow> tag2 = new TupleTag<TableRow>();
        TupleTag<TableRow> tag3 = new TupleTag<TableRow>();

        // Map each destination table name to the tag whose output will be written to it.
        Map<String, TupleTag<TableRow>> tagMap = new HashMap<String, TupleTag<TableRow>>();
        tagMap.put("table1", mainTag);
        tagMap.put("table2", tag2);
        tagMap.put("table3", tag3);

        List<TupleTag<?>> tags = new ArrayList<TupleTag<?>>();
        tags.add(tag2);
        tags.add(tag3);

        // Split the input into one tagged output per destination table.
        PCollectionTuple result = input.apply(
            ParDo.withOutputTags(mainTag, TupleTagList.of(tags)).of(new TagMarker(tagMap))
        );

        // Apply a separate BigQuery write for each tagged output.
        PDone done = null;
        for (String tableId : tagMap.keySet()) {
            done = writeToGbq(tableId, result.get(tagMap.get(tableId)).setCoder(TableRowJsonCoder.of()));
        }

        return done;
    }

    private PDone writeToGbq(String tableId, PCollection<TableRow> rows) {

        PDone done = rows
                .apply(BigQueryIO.Write.named("WriteToGbq")
                .to("<project>:<dataset>." + tableId)
                .withSchema(getSchema())
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        );

        return done;
    }

}
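
To make the wiring concrete, a minimal usage sketch, assuming bigQueryRQ is the PCollection<TableRow> from the question:

    // Fan the rows out by table tag and write each group to its own BigQuery table.
    bigQueryRQ.apply(new GbqWriter());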

I am not sure about overwriting the done variable. Is it correct? Could it break the writes to BigQuery if one of them fails?

Also, this approach is only suitable if you know the list of destination tables before parsing the rows.