SageMaker ModelExplainabilityMonitor baseline job fails with ValueError: Expected 2D array, got 1D array instead


I am trying to create a SageMaker ModelExplainabilityMonitor for one of my ML models. To run the baseline for ModelExplainabilityMonitor, the suggest_baseline() method requires a DataConfig, a ModelConfig and a SHAPConfig. In the SHAPConfig I need to provide the SHAP baseline, which I compute by taking the mean of the features, as suggested here. The problem is that when I run suggest_baseline(), it starts the SageMaker Processing job and creates the shadow endpoint, but then fails with the following endpoint error:

ClientError: An error occurred (ModelError) when calling the InvokeEndpoint operation (reached max retries: 0): Received server error (500) from primary with message "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> <title>500 Internal Server Error</title> <h1>Internal Server Error</h1> <p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p> ". See https://eu-west-2.console.aws.amazon.com/cloudwatch/home?region=eu-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/sm-clarify-pipelines-xqkqe9pekm5f-MACEModel-2Al-1669644628-2315 in account 450538937006 for more information.

When I check the CloudWatch logs of the shadow endpoint created by the baseline job, they show why the server returned the 500 error:

ERROR - random_forest_training - Exception on /invocations [POST]

Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
    return fn(*args, **kwargs)
  File "/opt/ml/code/random_forest_training.py", line 38, in predict_fn
    prediction = model[0].predict_proba(input_data)
  File "/miniconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 673, in predict_proba
    X = self._validate_X_predict(X)
  File "/miniconda3/lib/python3.7/site-packages/sklearn/ensemble/_forest.py", line 421, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "/miniconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py", line 388, in _validate_X_predict
    X = check_array(X, dtype=DTYPE, accept_sparse="csr")
  File "/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/miniconda3/lib/python3.7/site-packages/sklearn/utils/validation.py", line 623, in check_array
    "if it contains a single sample.".format(array))

ValueError: Expected 2D array, got 1D array instead: array=[-0.07272727 -0.538843    0.21109799 -0.11960932  0.23030303 -0.09173553
 -0.17808585 -0.19966942 -0.06921487  0.01707989  0.          0.
 -0.02214876 -0.17888805  0.00661157 -0.04977043  0.01818182  0.15619835
  0.39504132 -0.05785124  0.01157025].

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
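The error message itself names the fix: a single record has to reach `predict_proba` with shape (1, n_features), i.e. `array.reshape(1, -1)`. One plausible cause, not confirmed by the logs, is that the request carrying the baseline contains a single CSV row and the inference script parses that row into a 1D array. The sketch below assumes the script uses the SageMaker scikit-learn container's `input_fn(request_body, request_content_type)` hook with `text/csv` payloads; `random_forest_training.py` is not shown, so this function body is hypothetical. The same effect could be had inside `predict_fn` by calling `model[0].predict_proba(np.atleast_2d(input_data))`.

```python
import io

import numpy as np


def input_fn(request_body, request_content_type):
    """Hypothetical input_fn for the SageMaker scikit-learn serving container.

    Parses a CSV payload and always returns a 2D array of shape
    (n_records, n_features), even when the payload holds a single record.
    """
    if request_content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode("utf-8")
    # With the default ndmin=0, loadtxt squeezes a one-line payload
    # down to shape (n_features,) ...
    data = np.loadtxt(io.StringIO(request_body), delimiter=",")
    # ... so promote it back to (1, n_features); 2D input is left unchanged.
    return np.atleast_2d(data)


single = input_fn(b"0.1,0.2,0.3", "text/csv")
print(single.shape)  # (1, 3)
batch = input_fn("0.1,0.2,0.3\n0.4,0.5,0.6", "text/csv")
print(batch.shape)  # (2, 3)
```

Using `np.atleast_2d` rather than an unconditional `reshape(1, -1)` keeps multi-record requests intact while still fixing the single-record case.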

The 1D array in that error is my SHAP baseline, which I computed by taking the mean of the features. However, I am already passing the baseline as a 2D structure, i.e. List[List[float]]. When I wrap it in another list (List[List[List[float]]]), the schema validation for the baseline job fails, because it expects the baseline in one of the following formats:

  1. str (the S3 URI of a CSV file containing the SHAP baseline values)
  2. List[List[float | int]]
  3. List[Dict[name_of_column: shap_value_for_column]]

I have tried all three formats, but each one yields the same error. I also cannot find any option to transform the SHAP baseline before it is sent to the endpoint.
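For comparison, here is a minimal sketch that builds a mean-of-features baseline in each of the three shapes listed above, all describing the same single record. It assumes `features` holds only the model's input columns in training order (no label column), uses hypothetical column names, and assumes a headerless single-row CSV for the S3 option; it does not call the SageMaker SDK. Because every shape already describes one 2D record, this suggests (without confirming) that the 1D array in the traceback is produced when the endpoint parses the request, not by the baseline config.

```python
import csv
import io
from typing import Dict, List, Tuple

import numpy as np


def mean_baseline_formats(
    features: np.ndarray, headers: List[str]
) -> Tuple[str, List[List[float]], List[Dict[str, float]]]:
    """Build a one-record mean baseline in the three shapes listed in the question.

    `features` is assumed to be an (n_samples, n_features) matrix of model
    inputs only (no label column); `headers` are hypothetical column names.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim != 2 or features.shape[1] != len(headers):
        raise ValueError("expected a 2D matrix with one header per column")
    # Column-wise mean: one representative record of length n_features.
    means = features.mean(axis=0).tolist()
    # 2. List[List[float]]: a list holding that single record.
    as_rows = [means]
    # 3. List[Dict[str, float]]: the same record keyed by column name.
    as_dicts = [dict(zip(headers, means))]
    # 1. CSV text for the object behind the S3 URI (assumed: one row, no header).
    buf = io.StringIO()
    csv.writer(buf).writerow(means)
    return buf.getvalue(), as_rows, as_dicts


X = np.array([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
as_csv, as_rows, as_dicts = mean_baseline_formats(X, ["f1", "f2", "f3"])
print(as_rows)   # [[2.0, 3.0, 4.0]]
print(as_dicts)  # [{'f1': 2.0, 'f2': 3.0, 'f3': 4.0}]
```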

Any help is appreciated.
