What is the most efficient way to insert single rows into BigQuery?


I'm currently using BigQuery in a Django application for logging event-like data, and most of those logs are request based. Depending on the request, I insert data into different BigQuery tables; for example, user login logs are kept in BigQuery.

I'm currently using bigquery.Client.insert_rows_json, but I only insert a single JSON row per request, and I want to optimize latency, cost, and performance (and increase the rate limit).
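Concretely, my current per-request code looks roughly like the sketch below; the table id and row contents are placeholders, not my real schema:

    from google.cloud import bigquery

    bq_client = bigquery.Client()

    # One streaming insert per Django request; the target table depends on the
    # event type. "my-project.logs.user_login" and the row fields are placeholders.
    errors = bq_client.insert_rows_json(
        "my-project.logs.user_login",
        [{"user_id": 42, "event": "login", "logged_at": "2024-01-01T00:00:00Z"}],
    )
    if errors:
        # insert_rows_json returns a list of per-row error dicts; empty means success
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")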

I'm considering using the BigQuery Storage Write API with the default stream.

I wrote a working demo like this:

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

...
write_client = bigquery_storage_v1.BigQueryWriteClient()
...
# Build one AppendRowsRequest carrying the serialized proto rows and their schema
request = types.AppendRowsRequest()
proto_data = types.AppendRowsRequest.ProtoData()
proto_data.rows = proto_rows
proto_data.writer_schema = proto_schema
request.proto_rows = proto_data
request.write_stream = stream_name

# append_rows takes an iterator of requests (it is a streaming RPC)
requests = [request]
write_client.append_rows(iter(requests))

With this implementation, though, I'm creating a BigQueryWriteClient for each insert and setting the writer_schema on each insert too (because I insert into different tables depending on the request). I'm not sure whether this will be any better than insert_rows_json.
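What I have in mind instead (but am unsure is correct or worthwhile) is keeping a single module-level write client and caching the per-table writer schema, roughly like the sketch below; the helper name, the caching approach, and the assumption that one process-wide client is safe to share are my own, not something I found in the docs:

    from google.cloud import bigquery_storage_v1
    from google.cloud.bigquery_storage_v1 import types

    # Reused across all requests instead of being re-created per insert
    # (assumption: one process-wide client is acceptable for this workload).
    _write_client = bigquery_storage_v1.BigQueryWriteClient()

    # stream_name -> ProtoSchema, so the schema object is built once per table.
    _schema_cache = {}

    def append_single_row(stream_name, proto_schema, proto_rows):
        # Hypothetical helper: appends one batch of serialized proto rows to a
        # table's default stream, mirroring the demo above but with a shared client.
        request = types.AppendRowsRequest()
        request.write_stream = stream_name

        proto_data = types.AppendRowsRequest.ProtoData()
        proto_data.rows = proto_rows
        proto_data.writer_schema = _schema_cache.setdefault(stream_name, proto_schema)
        request.proto_rows = proto_data

        # append_rows takes an iterator of requests (it is a streaming RPC).
        return _write_client.append_rows(iter([request]))

Even with this, I suspect each append_rows call still opens a new stream, which is part of why I'm not sure it beats insert_rows_json.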

What's the most efficient way to insert many single rows, one by one, from multiple Django instances?

Note: Up to 5 minutes of insertion latency is acceptable, but I want to avoid data loss and duplication.


There are 0 answers