SageMaker Clarify Bias Detection for multiple facets and labels


In the Fairness and Explainability with SageMaker Clarify example, I am running a bias analysis on the 'Sex' facet, where the facet value is 0 and the label is 0:

from sagemaker import clarify

bias_config = clarify.BiasConfig(label_values_or_threshold=[0],
                                 facet_name='Sex',
                                 facet_values_or_threshold=[0],
                                 group_name='Age')

This raises two questions:

  1. How would I use it to detect bias on a multi-label dataset? (I tried label_values_or_threshold=[0,1], but it didn't work.) Would I need to re-run the job each time for a different label?
  2. Similarly, if I want to detect bias for multiple facets (i.e., 'Sex' and 'Age'), would I need to run the bias detection job for each facet_name?

Answer by Marc Karp:
  1. How would we use it to detect bias on a multi-label dataset? (I tried label_values_or_threshold=[0,1], but it didn't work.) Would we need to re-run the job each time for a different label?

By "multi-label" do you mean "categorical" label or "multi-tags" label?

Clarify supports categorical labels. For example, if the label value is one of the enums "Dog", "Cat", "Fish", you can specify label_values_or_threshold=["Dog", "Cat"], and Clarify will treat samples with label value "Dog" or "Cat" as the positive outcome and samples with label value "Fish" as the negative outcome.
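To make that split concrete, here is a minimal pure-Python sketch of how a label_values_or_threshold list partitions samples into positive and negative outcomes. It is only an illustration, not Clarify's implementation; split_by_label and the sample labels are hypothetical.

```python
from collections.abc import Hashable, Iterable, Sequence


def split_by_label(
    labels: Iterable[Hashable], positive_values: Sequence[Hashable]
) -> tuple[list[int], list[int]]:
    """Hypothetical helper: return (positive_indices, negative_indices).

    Mimics how a label_values_or_threshold list marks which label
    values count as the positive outcome; every other value is negative.
    """
    positive = set(positive_values)
    pos_idx: list[int] = []
    neg_idx: list[int] = []
    for i, value in enumerate(labels):
        if value in positive:
            pos_idx.append(i)
        else:
            neg_idx.append(i)
    return pos_idx, neg_idx


labels = ["Dog", "Fish", "Cat", "Fish", "Dog"]
pos, neg = split_by_label(labels, ["Dog", "Cat"])
print(pos, neg)  # [0, 2, 4] [1, 3]

# For a binary 0/1 label, listing both values marks every sample positive.
pos_all, neg_all = split_by_label([0, 1, 1], [0, 1])
print(len(neg_all))  # 0
```

The second call shows one consequence of this design: when every possible label value is listed, no negative-outcome samples remain to compare against.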

Clarify doesn't support multi-tags labels. By "multi-tags" I mean, for example, a dataset like the one below.

Here the features are N sentences extracted from a web page, and the label is N tags that describe what the web page is about. Like:

    feature1   feature2   feature3   ...   label
    "pop"      "beatles"  "jazz"     ...   "music, beatles"
    "iphone"   "android"  "browser"  ...   "computer, internet, design"
    "php"      "python"   "java"     ...   "programming, java, web, internet"
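Clarify expects one label value per sample, which a multi-tags label does not provide. One possible workaround, which is our assumption and not something this answer proposes, is to derive one binary (one-vs-rest) label per tag. The sketch below does that with a hypothetical tags_to_binary_labels helper, using label strings from the example dataset.

```python
from collections.abc import Iterable, Sequence


def tags_to_binary_labels(
    tag_strings: Iterable[str], tags: Sequence[str]
) -> dict[str, list[int]]:
    """Hypothetical helper: turn comma-separated tag strings into
    one 0/1 label column per requested tag (one-vs-rest)."""
    # Parse each "a, b, c" string into a set of stripped, non-empty tags.
    parsed = [{t.strip() for t in s.split(",") if t.strip()} for s in tag_strings]
    return {tag: [1 if tag in sample else 0 for sample in parsed] for tag in tags}


labels = [
    "music, beatles",
    "computer, internet, design",
    "programming, java, web, internet",
]
columns = tags_to_binary_labels(labels, ["internet", "music"])
print(columns)  # {'internet': [0, 1, 1], 'music': [1, 0, 0]}
```

Each derived column could then serve as the label of its own analysis with label_values_or_threshold=[1], which still means one run per tag.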

  2. Similarly, if we want to detect bias for multiple facets (i.e., 'Sex' and 'Age'), would we need to run the bias detection job for each facet_name?

Clarify supports multiple facets in a single run, although (at the time of writing) this configuration is not exposed by the SageMaker Python SDK API.

If you use the Processing Job API and compose analysis_config.json yourself, you can set the facet configuration entry to a list of facet objects (see Configure the Analysis). For example:

...
"facet": [
    {
        "name_or_index" : "Sex",
        "value_or_threshold": [0]
    },
    {
        "name_or_index" : "Age",
        "value_or_threshold": [40]
    }
],
...
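As a sketch of the Processing Job route, the Python below builds such a config with the standard json module and writes it to analysis_config.json. Only the facet list comes from this answer; the other keys (dataset_type, headers, label, methods), the column names, and the build_analysis_config helper are assumptions based on our reading of the Configure the Analysis page, so check them against the documentation for your Clarify version.

```python
import json


def build_analysis_config(headers, label, label_values, facets):
    """Hypothetical helper: a bias analysis config with several facets.

    Keys other than "facet" are assumptions about the analysis config
    schema; verify them before use.
    """
    return {
        "dataset_type": "text/csv",
        "headers": headers,
        "label": label,
        "label_values_or_threshold": label_values,
        # A list of facet objects, all analyzed in the same job.
        "facet": [
            {"name_or_index": name, "value_or_threshold": values}
            for name, values in facets
        ],
        "methods": {"pre_training_bias": {"methods": "all"}},
    }


config = build_analysis_config(
    headers=["Age", "Sex", "Target"],  # hypothetical column names
    label="Target",
    label_values=[0],
    facets=[("Sex", [0]), ("Age", [40])],
)

with open("analysis_config.json", "w") as f:
    json.dump(config, f, indent=2)
```

Compared with a single-facet configuration, only the facet entry changes; everything else in the file stays the same.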

If you must use the SageMaker Python SDK API, a workaround is to append additional facets to the analysis config (not recommended, but currently there is no better way):


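    from sagemaker import clarify

    bias_config = clarify.BiasConfig(label_values_or_threshold=[0],
        facet_name='Sex',
        facet_values_or_threshold=[0])
    # Workaround: BiasConfig keeps its facets in a list, so append a second one.
    bias_config.analysis_config['facet'].append({
        'name_or_index': 'Age',
        'value_or_threshold': [40],
    })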
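If you need more than one extra facet, the same workaround can be wrapped in a small helper. The sketch below is hypothetical (append_facets is not part of the SDK); it edits any analysis config dict in place and refuses duplicate facet names. Like the workaround itself, it assumes BiasConfig stores its facets as a list under the "facet" key, which is an SDK internal rather than a public API and may change.

```python
def append_facets(analysis_config, facets):
    """Hypothetical helper: append (name, values) facets in place.

    Assumes analysis_config["facet"] is a list of facet dicts, as the
    SDK workaround relies on; raises ValueError on duplicate names.
    """
    existing = {f["name_or_index"] for f in analysis_config["facet"]}
    for name, values in facets:
        if name in existing:
            raise ValueError(f"facet {name!r} is already configured")
        analysis_config["facet"].append(
            {"name_or_index": name, "value_or_threshold": values}
        )
        existing.add(name)
    return analysis_config


# With the SDK (not executed here):
#   append_facets(bias_config.analysis_config, [("Age", [40])])

# Standalone demo on a plain dict shaped like BiasConfig.analysis_config.
config = {
    "label_values_or_threshold": [0],
    "facet": [{"name_or_index": "Sex", "value_or_threshold": [0]}],
}
append_facets(config, [("Age", [40])])
print([f["name_or_index"] for f in config["facet"]])  # ['Sex', 'Age']
```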