Sagemaker Monitor - MonitoringDatasetFormat as gz


I have created a monitoring schedule to monitor predictions from a Batch Transform job. The schedule runs fine when dataset_format in BatchTransformInput is csv. However, my batch job is part of a workflow whose input is in gz format.

The documentation suggests that MonitoringDatasetFormat only supports csv, json and parquet. Can I define it as gz?

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor import CronExpressionGenerator
from sagemaker.model_monitor import BatchTransformInput
from sagemaker.model_monitor import MonitoringDatasetFormat
from time import gmtime, strftime

my_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    # Inputs to run the monitoring schedule on the batch transform
    batch_transform_input=BatchTransformInput(
        data_captured_destination_s3_uri=s3_capture_upload_path,
        destination="/opt/ml/processing/input",
        dataset_format=MonitoringDatasetFormat.csv(header=False),
    ),
    output_s3_uri=s3_report_path,
    statistics=statistics_path,
    constraints=constraints_path,
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)


There is 1 answer

Arun Lokanatha On

The default model monitor supports only these formats (csv, json and parquet). I think you can add a post-processing step that converts the data from gz to one of these formats. Please refer to the link below on pre- and post-processing: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html
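
One way to do the conversion is to decompress the files before the monitor reads them. The sketch below is only an illustration and assumes the .gz files are gzip-compressed CSV that have already been copied to a local directory. The function names gunzip_to_csv and convert_directory are made up for this example and are not part of the SageMaker SDK.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gunzip_to_csv(src: Path, dst_dir: Path) -> Path:
    """Decompress one gzip file into dst_dir and return the new file's path.

    Assumption: the compressed payload is already CSV text.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    # "part-0.csv.gz" -> "part-0.csv"; "part-0.gz" -> "part-0.csv"
    name = src.name[:-3] if src.name.endswith(".gz") else src.name
    if not name.endswith(".csv"):
        name += ".csv"
    dst = dst_dir / name
    # Stream the bytes so large files are not loaded into memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def convert_directory(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Decompress every *.gz file in src_dir into dst_dir."""
    return [gunzip_to_csv(p, dst_dir) for p in sorted(src_dir.glob("*.gz"))]
```

The resulting .csv files would then need to be uploaded to the S3 location the schedule reads from, so that MonitoringDatasetFormat.csv(header=False) matches the data. Where this step runs (inside the workflow, or as a separate processing job) depends on your setup.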