How to use kdb+ to track an arbitrary number of IoT scalar streams?


I am trying to use kdb+ to capture and run aggregations on a number of sensor streams collected from IoT sensors.

Each sensor has a unique identifier, a time component (.z.z) and a scalar value:

    percepts:([]time:`datetime$(); id:`symbol$(); scalar:`float$())

However, because the data is temporal in nature, it would seem logical to maintain separate perceptual/sensory streams in different columns, i.e.:

    time  id_1  id_2  ...
    15    0.15
    16          1.5

However, appending to a table natively only supports row operations in the insert fashion, i.e. `percepts insert (.z.z;`id_1;0.15)

Seeing as I would like to support a large and non-static number of sensors in this setup, it would seem like an anti-pattern to append rows of the aforementioned format and then transform those rows into columns based on their id. Would it be possible/necessary to create a table with a dynamic (growing) number of columns based upon new feature streams?

How would one most effectively implement logic that allows the insertion of columnar time-series data, avoiding the need to transform row-based data?


1 Answer

Rahul (BEST ANSWER)

You can add data to a specific column. To do that, make the following changes:

  • Make the time column a key, either permanently or during the update operation.
  • Use upsert to add data, passing the data in table format (see the sketch after this list).
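
For example, a minimal sketch of a keyed upsert; the table s and its values are illustrative, not from the answer:

    q)s:([]time:.z.z+til 2;id1:1.0 2.0)    / sample unkeyed table
    q)`time xkey `s                        / key s on time, in place
    q)`s upsert ([]time:1#.z.z;id1:1#3.2)  / data passed as a table; matched on the time key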

The update function below is specific to your example, but you can make it more generic. It takes a sensor name and sensor data as input and performs 3 steps:

  • It first checks if the table is empty; in that case, it sets the table schema to the schema of the input dataset (which, per your example, has a time column and a sensor-name column) and makes time the primary key.
  • If the table has data but the column for the new sensor is missing, it first adds that column with null float values and then upserts the data.
  • If the column is already there, it just upserts the data.

    q)t:() / global table to store all sensors' data; starts as an empty list
    q)/ upd: s is the sensor (column) name, tbl is the incoming data as a table
    q)/ empty t -> adopt tbl's schema keyed on time; unknown sensor -> add a null float column; then upsert
    q)upd:{[s;tbl] `t set $[0=count t;`time xkey 0#tbl;not s in cols t;![t;();0b;enlist[s]!enlist count[t]#0Nf];t] upsert tbl}

    q)upd[`id1;([]time:1#.z.z;id1:1#14.4)]
    q)upd[`id2;([]time:1#.z.z;id2:1#2.3)]
    q)t
    
time                    id1  id2
--------------------------------
2019.08.26T13:35:43.203 14.4    
2019.08.26T13:35:46.861      2.3
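
Once populated, you can aggregate directly on the wide table; avg ignores nulls, so sparse columns do not skew the result. A minimal sketch:

    q)select avg id1,avg id2 from t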

Some points regarding your design:

If not all sensors send data for each time entry, the table will contain a lot of null values (similar to a sparse matrix), which wastes memory and has some impact on queries as well. In that case, you can consider other designs depending on your use case. For example, instead of storing each time entry, store data in time buckets (sketched below). Another option is to group related sensors into separate tables instead of storing everything in one.
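
For illustration, a minimal sketch of time-bucketed aggregation on the original row-per-reading schema, using 5-minute buckets via xbar (the bucket size is an assumption):

    q)percepts:([]time:`datetime$();id:`symbol$();scalar:`float$())
    q)`percepts insert (.z.z;`id_1;0.15)
    q)`percepts insert (.z.z;`id_2;1.5)
    q)select avg scalar by id,5 xbar time.minute from percepts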

Another point to consider: if you keep adding sensors, you will end up with a very wide (fat) table, which has its own issues. It also becomes a single bottleneck, which could be a problem in the future and would be hard to scale; splitting sensors into per-group tables, as sketched below, mitigates this.
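
As a rough sketch of the grouping idea (the group names and the addReading helper are hypothetical, not part of the answer):

    q)groups:()!()   / dictionary: group name -> table of that group's sensors
    q)addReading:{[g;tbl] groups[g]:$[g in key groups;groups g;0#tbl] upsert tbl}
    q)addReading[`temperature;([]time:1#.z.z;id1:1#14.4)]
    q)addReading[`pressure;([]time:1#.z.z;id9:1#101.3)]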

For small sensor sets the current design is good, but if you are planning to add many sensors in the future, look into other design options.