AWS S3 storage and schema

3.7k views Asked by At

I have an IOT sensor which sends the following message to IoT MQTT Core topic:

{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}

I have added ACT/RULE which stores the incoming message in an S3 Bucket with the timestamp as a key(each message is stored as a seperate file/row in the bucket).

I have only worked with SQL databases before, so having them stored like this is new to me.

1) Is this the proper way to work with S3 storage?

2) How can I visualize the values in a schema instead of separate files?

3) I am trying to create ML Datasource from the S3 Bucket, but get the error below when Amazon ML tries to create schema:

"Amazon ML can't retrieve the schema. If you've just created this datasource, wait a moment and try again."

Appreciate all advice there is!

1

There are 1 answers

0
Dave On

1) Is this the proper way to work with S3 storage?

With only one sensor, using the [timestamp](https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html#iot-function-timestamp function in your IoT rule would be a way to name unique objects in S3, but there are issues that might come up.

  1. With more than one sensor, you might have multiple messages arrive at the same timestamp and this would not generate unique object names in S3.

  2. Timestamps from nearly the same time are going to have similar prefixes and designing your S3 keys this way may not give you the best performance at higher message rates.

Since you're using MQTT, you could use the traceId function instead of the timestamp to avoid these two issues if they come up.

2) How can I visualize the values in a schema instead of separate files?

3) I am trying to create ML Datasource from the S3 Bucket, but get the error below when Amazon ML tries to create schema:

For the third question, I think you could be running into a data format problem in ML because your S3 objects contain the JSON data from your messages and not a CSV.

For the second question, I think you're trying to combine message data from successive messages into a CSV, or at least output the message data as a single line of a CSV file. I don't think this is possible with just the Iot SQL language since it's intended to produce JSON.

One alternative is to configure your IoT SQL rule with a Lambda action and use a lambda function to make your JSON to CSV conversion and then write the CSV to your S3 bucket. If you go this direction, you may have to enrich your IoT message data with the timestamp (or traceId) as you call the lambda.

A rule like select timestamp() as timestamp, traceid() as traceid, concat(ID1, ID2, ID3, ValueMax) as values, * as message would produce a JSON like

{"timestamp":1538606018066,"traceid":"abab6381-c369-4a08-931d-c08267d12947","values":[10001,1001,101,123],"message":{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}}

That would be straightforward to use as the source for a CSV row with the data from its values property.