Bigtable secondary indexes: best practices and recommended approaches


What are the alternatives for powering secondary indexes on Google Bigtable (or any other distributed database that doesn't natively support secondary indexing)?

Use case:

  • I have ~5 billion rows in my 'Orders' table (the row key is orderId), and I want to add a secondary index on the 'customerID' attribute.
  • I can live with the secondary index being eventually consistent (it would be great if strong consistency could be achieved).

I would like to know the possible ways to achieve this, along with their pros and cons.

(I can think of maintaining this secondary index in a separate table and managing it from the application layer itself, but I want to understand the pros and cons of that approach, as well as any other recommended patterns.)


There are 2 answers

Sathi Aiswarya On

Bigtable doesn’t have explicit support for secondary indexing. In Bigtable, schema design is driven primarily by the queries, or read requests, that you plan to send to the table, because reading a row range is the fastest way to read your Bigtable data.

If you are open to other cloud services, you can look at Google's Cloud Spanner, which has built-in support for secondary indexes. You can refer to this document.

In Spanner, you can also add a new secondary index to an existing table while the database continues to serve traffic. Like any other schema change in Spanner, adding an index to an existing database does not require taking the database offline and does not lock entire columns or tables.

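Also note that, as mentioned here, you cannot use an HBase-style coprocessor to maintain an index on the server side: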

Bigtable does not support coprocessors. You cannot create classes that implement the interface org.apache.hadoop.hbase.coprocessor.

Bora On

I am assuming you'll be querying for all orders for a given userid, as well as looking up a single order by its orderid.

One option would be to bake the userid into the orderid. E.g. when you create an orderid, make sure its first few characters are the userid or a hash of it, so that all of a user's orders share a row-key prefix and can be read with a single row-range scan. I believe UPS tracking codes include a 6-character shipper number, for example.
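To make the prefix idea concrete, here is a minimal sketch in plain Python. It does not call Bigtable at all: a sorted list of byte strings stands in for the table's lexicographically ordered row keys. The `#` delimiter and the helper names (`make_row_key`, `prefix_range`, `scan_prefix`) are my own assumptions for illustration, not part of any Bigtable API.

```python
import bisect
from typing import List, Tuple


def make_row_key(user_id: str, order_id: str) -> bytes:
    # Hypothetical key layout: "<userid>#<orderid>". The delimiter keeps
    # user "c1" from matching keys that belong to user "c10".
    # Assumes user IDs never contain "#".
    return f"{user_id}#{order_id}".encode("utf-8")


def prefix_range(prefix: bytes) -> Tuple[bytes, bytes]:
    # Return the half-open range [start, end) that covers every key
    # beginning with `prefix`. The end key is the prefix with its last
    # byte incremented (trailing 0xff bytes are dropped first).
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        raise ValueError("prefix has no finite upper bound")
    end = stripped[:-1] + bytes([stripped[-1] + 1])
    return prefix, end


def scan_prefix(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    # Stand-in for a row-range read: binary-search the sorted keys for
    # the start and end of the range and return the slice in between.
    start, end = prefix_range(prefix)
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


keys = sorted([
    make_row_key("c1", "o3"),
    make_row_key("c10", "o1"),
    make_row_key("c1", "o1"),
    make_row_key("c2", "o9"),
])
orders_for_c1 = scan_prefix(keys, b"c1#")
```

With a real table, the same start and end keys would be passed to whatever row-range read your client library offers. The point of the sketch is only that a well-chosen key prefix turns "all orders for a user" into one contiguous range.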

If you have no control over the orderid structure (e.g. a third party assigns the orderid and provides it to the user), you can dual-write in your client when the order is placed. One table has a key like userid-orderid and holds the information about each order; the other is keyed by just the orderid and holds the userid associated with that order. Alternatively, you can write into one table keyed by userid-orderid, then use Bigtable change streams to catch the write downstream and write the second entry.
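The dual-write layout can be sketched as follows. This is an in-memory illustration only: two Python dicts stand in for the two tables, and the class name `OrderStore`, its method names, and the `#` delimiter are assumptions made for the example. In a real deployment the two writes would go to separate tables and would not be atomic together, since Bigtable only guarantees atomicity within a single row. The application therefore has to tolerate, or retry after, a failure between the two writes.

```python
from typing import Dict, List, Optional


class OrderStore:
    """In-memory stand-in for the two-table layout (not a Bigtable client)."""

    def __init__(self) -> None:
        # "Table" 1: key "<userid>#<orderid>" -> full order data.
        self.orders_by_user: Dict[str, dict] = {}
        # "Table" 2: key "<orderid>" -> userid (the lookup entry).
        self.user_by_order: Dict[str, str] = {}

    def place_order(self, user_id: str, order_id: str, order: dict) -> None:
        # Dual write: both writes are idempotent, so retrying the whole
        # call after a partial failure leaves the tables consistent.
        self.orders_by_user[f"{user_id}#{order_id}"] = order
        self.user_by_order[order_id] = user_id

    def get_order(self, order_id: str) -> Optional[dict]:
        # Two point reads: orderid -> userid, then the full order row.
        user_id = self.user_by_order.get(order_id)
        if user_id is None:
            return None
        return self.orders_by_user.get(f"{user_id}#{order_id}")

    def orders_for_user(self, user_id: str) -> List[dict]:
        # A real table would serve this with a row-range scan on the
        # prefix; filtering every key is only acceptable in this toy.
        prefix = f"{user_id}#"
        return [
            self.orders_by_user[key]
            for key in sorted(self.orders_by_user)
            if key.startswith(prefix)
        ]


store = OrderStore()
store.place_order("u1", "A100", {"total": 25})
store.place_order("u2", "A101", {"total": 40})
store.place_order("u1", "A102", {"total": 10})
```

The change-streams variant keeps the same two tables but moves the second write out of the client and into a downstream consumer, which makes the lookup entry eventually consistent rather than written in the same request.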