Google Spanner - How do you copy data to another table?


Since Spanner does not support a DML statement like

insert into dest select * from source_table

how do we select a subset of a table and copy those rows into another table?

I am trying to write data to a temporary table and then move it to an archive table at the end of the day. The only solution I have found so far is to select the rows from the source table and write them to the new table. This is done with the Java API, which has no ResultSet-to-Mutation converter, so I have to map every column of the table to the new table, even when the two tables are exactly the same.


Another problem is updating just one column, e.g. there is no way of doing "update table_name set column = column - 1".

Again, to do that I have to read each row and map every field into an update Mutation. This does not scale when there are many tables, since I would need to write that code for each of them; a ResultSet -> Mutation converter would be nice here too.

Is there any generic Mutation cloner and/or any other way to copy data between tables?
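
For reference, here is a minimal sketch of the kind of generic converter I am after, hand-rolled against the standard com.google.cloud.spanner client types. It only handles a handful of scalar column types; arrays and so on would need extra cases:

import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class MutationCloner {

  // Converts every row of the ResultSet into an insert Mutation for destTable.
  public static List<Mutation> toMutations(ResultSet rs, String destTable) {
    List<Mutation> mutations = new ArrayList<>();
    while (rs.next()) {
      // Use newInsertOrUpdateBuilder instead if the target rows may already exist.
      Mutation.WriteBuilder m = Mutation.newInsertBuilder(destTable);
      for (int i = 0; i < rs.getColumnCount(); i++) {
        if (rs.isNull(i)) {
          continue; // columns left unset are written as NULL
        }
        String col = rs.getType().getStructFields().get(i).getName();
        switch (rs.getColumnType(i).getCode()) {
          case BOOL:      m.set(col).to(rs.getBoolean(i));   break;
          case INT64:     m.set(col).to(rs.getLong(i));      break;
          case FLOAT64:   m.set(col).to(rs.getDouble(i));    break;
          case STRING:    m.set(col).to(rs.getString(i));    break;
          case BYTES:     m.set(col).to(rs.getBytes(i));     break;
          case TIMESTAMP: m.set(col).to(rs.getTimestamp(i)); break;
          case DATE:      m.set(col).to(rs.getDate(i));      break;
          default:
            throw new IllegalArgumentException(
                "Unhandled column type: " + rs.getColumnType(i));
        }
      }
      mutations.add(m.build());
    }
    return mutations;
  }
}

The returned list can then be passed to DatabaseClient.write(...), split into batches small enough to stay under the per-transaction mutation limit.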


There are 3 answers

Knut Olav Løite:

As of version 0.15, this open-source JDBC driver supports bulk INSERT statements that can be used to copy data from one table to another. The INSERT syntax can also be used to perform bulk UPDATEs on data.

Bulk insert example:

INSERT INTO TABLE
(COL1, COL2, COL3)
SELECT C1, C2, C3
FROM OTHER_TABLE
WHERE C1 > 1000

Bulk update is done using an INSERT statement with ON DUPLICATE KEY UPDATE appended. You have to include the value of the primary key in your INSERT statement in order to 'force' a key violation, which in turn ensures that the existing rows are updated:

INSERT INTO TABLE
(COL1, COL2, COL3)
SELECT COL1, COL2+1, COL3+COL2
FROM TABLE
WHERE COL2 < 1000
ON DUPLICATE KEY UPDATE

You can use the JDBC driver with, for example, SQuirreL SQL to test it or to do ad-hoc data manipulation.

Please note that the underlying limitations of Cloud Spanner still apply, meaning a maximum of 20,000 mutations in one transaction. The JDBC driver can work around this limit if you specify AllowExtendedMode=true in your connection string or connection properties. When this mode is allowed and you issue a bulk INSERT or UPDATE statement that would exceed the limits of one transaction, the driver automatically opens an extra connection and performs the bulk operation in batches on that connection. This means that the bulk operation will NOT be performed atomically, and will be committed automatically after each successful batch, but at least it is done automatically for you.
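
As a minimal sketch, issuing such a bulk statement through the driver looks roughly like this; the project, instance and database names and the key file path are placeholders, and the exact connection URL format is the one described in the driver's README:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkCopyExample {
  public static void main(String[] args) throws Exception {
    // Placeholder identifiers; AllowExtendedMode=true enables batching beyond
    // a single transaction, as described above.
    String url = "jdbc:cloudspanner://localhost"
        + ";Project=my-project;Instance=my-instance;Database=my-database"
        + ";PvtKeyPath=/path/to/service-account-key.json"
        + ";AllowExtendedMode=true";
    try (Connection connection = DriverManager.getConnection(url);
         Statement statement = connection.createStatement()) {
      int rowsCopied = statement.executeUpdate(
          "INSERT INTO ARCHIVE_TABLE (COL1, COL2, COL3) "
              + "SELECT COL1, COL2, COL3 FROM SOURCE_TABLE WHERE COL1 > 1000");
      System.out.println("Copied " + rowsCopied + " rows");
    }
  }
}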

Have a look here for some more examples: http://www.googlecloudspanner.com/2018/02/data-manipulation-language-with-google.html

abcdn:

Spanner supports expressions in the SET clause of an UPDATE statement, which can be used to supply a subquery that fetches data from another table, like this:

UPDATE target_table
SET target_field = (
    -- use subquery as an expression (must return a single row)
    SELECT source_table.source_field
    FROM source_table
    WHERE my_condition IS TRUE
) WHERE my_other_condition IS TRUE;

The generic syntax is:

UPDATE table SET column_name = { expression | DEFAULT } WHERE condition

Aditya Rewari:

Another approach to performing a bulk copy is to use LIMIT and OFFSET:


insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table ORDER BY c1 LIMIT 1000);

insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table ORDER BY c1 LIMIT 1000 OFFSET 1000);

insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table ORDER BY c1 LIMIT 1000 OFFSET 2000);

...

and so on, until all required rows have been copied. Note that the OFFSET must advance in steps equal to the LIMIT (1000, 2000, ...), and an ORDER BY on a key column is needed so the pages stay stable between statements. A loop that automates the paging is sketched below.
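
A sketch of such a loop, assuming the community JDBC driver from the first answer; the SPANNER_JDBC_URL environment variable and the table and column names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PagedCopy {
  public static void main(String[] args) throws Exception {
    try (Connection connection =
             DriverManager.getConnection(System.getenv("SPANNER_JDBC_URL"));
         Statement statement = connection.createStatement()) {
      final int batchSize = 1000;
      int offset = 0;
      int copied;
      do {
        // ORDER BY a key column keeps the pages stable across statements.
        copied = statement.executeUpdate(
            "insert into dest(c1,c2,c3) (select c1,c2,c3 from source_table"
                + " ORDER BY c1 LIMIT " + batchSize + " OFFSET " + offset + ")");
        offset += batchSize;
      } while (copied == batchSize);
    }
  }
}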

PS: This is more of a trick, but it can definitely save you time.