Getting an anomaly score for every datapoint in SageMaker?

I'm very new to SageMaker and I'm confused about how to get the output I'm looking for. I'm using the built-in Random Cut Forest (RCF) algorithm to perform anomaly detection on a list of stock volumes, like this:

apple_stock_volumes = [123412, 465125, 237564, 238172]

I have created a training job, model, and endpoint, and I'm trying now to invoke the endpoint using boto3. My current code looks like this:

import json

import boto3

apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')

    # str.join only accepts strings, so convert the ints first
    body = " ".join(str(v) for v in apple_stock_volumes)
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    result = json.loads(response['Body'].read())
    print(result)

inference()

What I wanted was to get an anomaly score for every datapoint, and then to alert if the anomaly score was a few standard deviations above the mean. However, what I'm actually receiving is just a single anomaly score. The following is my output:

{'scores': [{'score': 0.7164874384}]}

Can anyone explain to me what's going on here? Is this an average anomaly score? Why can't I seem to get SageMaker to output a list of anomaly scores corresponding to my data? Thanks in advance!

Edit: I have already trained the model on a csv of historical volume data for the last year, and I have created an endpoint to hit.

Edit 2: I've accepted @maafk's answer, although the actual answer to my question was in one of his comments. The piece I was missing was that each data point must be on its own line in the CSV body sent to the endpoint. Once I replaced body = " ".join(apple_stock_volumes) with body = "\n".join(apple_stock_volumes), everything worked as expected.
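The difference between the two bodies can be seen without calling the endpoint at all. A minimal sketch (the helper name to_csv_body is hypothetical, not part of boto3 or SageMaker) that builds the text/csv payload both ways: the space-joined version is a single line, so the endpoint sees one record and returns one score, while the newline-joined version has one line, and therefore one record, per volume.

```python
apple_stock_volumes = [123412, 465125, 237564, 238172]

def to_csv_body(values, sep):
    """Hypothetical helper: join numeric values into a text/csv payload.

    str.join only accepts strings, so each int is converted first.
    """
    return sep.join(str(v) for v in values)

# Space-separated: everything lands on a single line, i.e. one CSV record.
single_record = to_csv_body(apple_stock_volumes, " ")
# Newline-separated: one line per value, i.e. one CSV record per datapoint.
one_per_line = to_csv_body(apple_stock_volumes, "\n")

print(len(single_record.splitlines()))
print(len(one_per_line.splitlines()))
```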

maafk (accepted answer)

In your case, you'll want to score your historical stock volumes, compute the mean and standard deviation of those scores, and treat any score more than 3 standard deviations above the mean as anomalous.

Update your code to run inference on multiple records at once:

import json

import boto3

apple_stock_volumes = [123412, 465125, 237564, 238172]

def inference():
    client = boto3.client('sagemaker-runtime')

    # New line for each record; str.join needs strings, so convert the ints
    body = "\n".join(str(v) for v in apple_stock_volumes)
    response = client.invoke_endpoint(
        EndpointName='apple-volume-endpoint',
        Body=body,
        ContentType='text/csv'
    )
    result = json.loads(response['Body'].read())
    print(result)

inference()

This returns a list of scores, one per input line, in the form {'scores': [{'score': ...}, {'score': ...}, ...]}.
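To line each score up with the volume it belongs to, you can unpack that JSON and zip it with your input. A minimal sketch, assuming the scores come back in the same order as the input lines; pair_scores is a hypothetical helper, and the sample body below is made up for illustration, not real endpoint output. json.loads accepts the raw bytes returned by response['Body'].read().

```python
import json

def pair_scores(volumes, response_body):
    """Hypothetical helper: zip input volumes with their RCF scores.

    Assumes response_body is shaped like {"scores": [{"score": ...}, ...]}
    with one entry per input line, in input order.
    """
    scores = [entry["score"] for entry in json.loads(response_body)["scores"]]
    if len(scores) != len(volumes):
        raise ValueError(f"expected {len(volumes)} scores, got {len(scores)}")
    return list(zip(volumes, scores))

# Illustrative payload only; these score values are made up.
sample_body = b'{"scores": [{"score": 0.71}, {"score": 1.52}, {"score": 0.69}, {"score": 0.70}]}'
print(pair_scores([123412, 465125, 237564, 238172], sample_body))
```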

Assuming apple_stock_volumes_df is a pandas DataFrame holding your historical volumes and their scores (after running inference on each record):

score_mean = apple_stock_volumes_df['score'].mean()
score_std = apple_stock_volumes_df['score'].std()
score_cutoff = score_mean + 3*score_std
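If you'd rather not depend on pandas, the same cutoff can be computed with the standard library. statistics.stdev is the sample standard deviation, which matches the default of pandas Series.std() (ddof=1). The function names and baseline scores below are hypothetical; in practice the baseline would come from scoring your historical volume data.

```python
import statistics

def score_cutoff(historical_scores, n_std=3.0):
    """Mean plus n_std sample standard deviations of the baseline scores."""
    mean = statistics.mean(historical_scores)
    std = statistics.stdev(historical_scores)  # sample std, like pandas .std()
    return mean + n_std * std

def flag_anomalies(pairs, cutoff):
    """Return the (volume, score) pairs whose score exceeds the cutoff."""
    return [(volume, score) for volume, score in pairs if score > cutoff]

# Made-up baseline scores, for illustration only.
baseline = [0.70, 0.72, 0.69, 0.71, 0.73, 0.70]
cutoff = score_cutoff(baseline)
print(flag_anomalies([(123412, 0.71), (465125, 1.52)], cutoff))
```

Scoring a fixed historical baseline once, rather than recomputing the mean and standard deviation from each new batch, keeps a single large outlier from inflating its own threshold.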

There is a great example here showing this