I have a pipeline that runs the following code:
PCollection<TableRow> test1 = ...
test1
    .apply(BigQueryIO.Write
        .named("test1 write")
        .to("project_name:dataset_name.test1")
        .withSchema(tableSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
PCollection<TableRow> test2 = ...
test2
    .apply(BigQueryIO.Write
        .named("test2 write")
        .to("project_name:dataset_name.test2")
        .withSchema(tableSchema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
If I run the pipeline when neither table "test1" nor "test2" exists, I get the following output:
Jun 09, 2015 12:29:24 PM com.google.cloud.dataflow.sdk.util.BigQueryTableInserter tryCreateTable
INFO: Trying to create BigQuery table: project_name:dataset_name.test1
Jun 09, 2015 12:29:27 PM com.google.cloud.dataflow.sdk.util.RetryHttpRequestInitializer$LoggingHttpBackoffUnsuccessfulResponseHandler handleResponse
WARNING: Request failed with code 404, will NOT retry: https://www.googleapis.com/bigquery/v2/projects/pragmatic-armor-455/datasets/audit/tables/project_name:dataset_name.test2/insertAll
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table project_name:dataset_name.test2",
"reason" : "notFound"
} ],
"message" : "Not found: Table project_name:dataset_name.test2"
}
Why is only the first table created?
Thanks in advance.
Thanks for reporting this. It was caused by a bug in BigQueryIO that occasionally prevented the second table from being created. The bug has now been fixed on GitHub with this commit, and the fix will be pushed to Maven later this month. Sorry for the trouble!