Ruby: Cloud Spanner insert performance

524 views Asked by At

I came across the google-cloud-spanner gem for Ruby.

Using session.commit { |c| c.insert(table, row) } I can insert into Cloud Spanner easily.

However, I cannot exceed more than ~200 inserts per Second this way (from a computing instance in the same region). In order to increase performance, I would have to pass an Array of rows to the insert method: c.insert(table, [row, row, row,...]).

Why is Cloud Spanner working this way? Could this be due to networking overheads?

Inserting multiple records together is not always practical on my application layer.

EDIT:

Full example that shows creation of spanner client, etc:

spanner = Google::Cloud::Spanner.new(project: ..., keyfile: ...)
session = spanner.client(instance, database)

# Insert:
session.commit { |c| c.insert(table, row) }
1

There are 1 answers

0
mpp On

The performance issue you experiencing is not caused by ruby itself its just hows spanner works,

you commiting each row per commit, which will take ages try to do that as a package of each 500rows, it will speed up signifficantly

and be aware of mutations, in general your program needs to calculate it otherwise library will throw an exception that you hit upper limit

I recently tested loading 100mb/(110k rows) csv file into spanner as you did, and it takes 1h30min

when I rewrited my code to commit every 500rows its ended loading in 3minutes.

(do a reading about mutations, in my case 500rows was ok, in yours can be different )

that formula seems to be legit: https://github.com/googleapis/google-cloud-go/issues/1721#issuecomment-572273387