Upsert into Splice Machine


I'm writing a streaming job in Spark to load data into Splice Machine. I've followed the community tutorial that uses a VTI to insert data into Splice, but all the examples perform INSERTs. I need to perform UPSERTs of the records instead. Is there any way to achieve this?

Thank you.


1 Answer

Erin (Best Answer)

Yes, you can do an upsert by adding the insertMode hint to your VTI statement. Your statement would look something like the following:

```sql
INSERT INTO IOT.SENSOR_MESSAGES --splice-properties insertMode=UPSERT
SELECT s.* FROM new com.splicemachine.tutorials.sparkstreaming.kafka.SensorMessageVTI(?) s
  (id VARCHAR(20),
   location VARCHAR(50),
   temperature DECIMAL(12,5),
   humidity DECIMAL(12,5),
   recordedtime TIMESTAMP);
```

Note that in your Java code you need to add a newline character (\n) after the hint; otherwise the parser treats everything after it as part of the hint comment. There was an issue with using hints with VTIs that has been fixed in the next release of Splice Machine; if you are on a 2.0.x version, we can get the fix backported to that version.
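To make the newline detail concrete, here is a minimal sketch of building that statement string in Java. The class and method names are made up for illustration; the SQL itself is the statement from above.

```java
public class UpsertStatement {

    // Hypothetical helper (not from the tutorial): assembles the VTI upsert SQL.
    // The critical detail is the '\n' right after the --splice-properties hint;
    // without it, the rest of the statement is swallowed by the hint comment.
    public static String buildUpsertSql() {
        return "INSERT INTO IOT.SENSOR_MESSAGES --splice-properties insertMode=UPSERT\n"
             + "SELECT s.* FROM new com.splicemachine.tutorials.sparkstreaming.kafka.SensorMessageVTI(?) s "
             + "(id VARCHAR(20), location VARCHAR(50), temperature DECIMAL(12,5), "
             + "humidity DECIMAL(12,5), recordedtime TIMESTAMP)";
    }

    public static void main(String[] args) {
        String sql = buildUpsertSql();
        // The newline must come after the hint but before SELECT
        System.out.println(sql.contains("insertMode=UPSERT\n"));  // expect: true
        System.out.println(sql);
    }
}
```

You would then pass this string to a JDBC PreparedStatement (or whatever statement API your streaming job already uses) exactly as you do for the INSERT version.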

Two other hints that you may find useful when using a VTI are:

  • statusDirectory: writes the import/upsert bad-record messages to a directory on HDFS, much like the SYSCS_UTIL.IMPORT_DATA procedure does
  • badRecordsAllowed: the number of bad records tolerated before the process fails
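For illustration, a sketch of how those hints might be combined with insertMode in one statement, again built as a Java string. The comma-separated form follows the usual --splice-properties syntax, and the directory path and record limit are example values I've made up, not taken from the answer.

```java
public class HintedUpsert {

    // Hedged sketch: combining several --splice-properties in a single hint
    // comment, comma-separated. '/bad' and '50' are placeholder values.
    public static String buildHintedSql() {
        return "INSERT INTO IOT.SENSOR_MESSAGES "
             + "--splice-properties insertMode=UPSERT, statusDirectory=/bad, badRecordsAllowed=50\n"
             + "SELECT s.* FROM new com.splicemachine.tutorials.sparkstreaming.kafka.SensorMessageVTI(?) s "
             + "(id VARCHAR(20), location VARCHAR(50), temperature DECIMAL(12,5), "
             + "humidity DECIMAL(12,5), recordedtime TIMESTAMP)";
    }

    public static void main(String[] args) {
        // The newline after the hint is still required here
        System.out.println(buildHintedSql());
    }
}
```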