Is there a way to populate column descriptions specific to a dataset?


Dataset 1 and dataset 2 have the same column names but different descriptions. When I write the transformation for dataset 1, I want the descriptions specific to dataset 1 to take precedence; when I write the transformation for another dataset, I want that dataset's descriptions to take precedence. Is there a way to populate column descriptions that are dataset-specific?

For example, is there a way to pass my_compute_function the name of the dataset whose descriptions should take priority, given entries such as: Column1, Column Description for dataset 1, {Dataset 1 name}; Column1, Column Description for dataset 2, {Dataset 2 name}; ...

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions={
            "col_1": "col 1 description"
        },
         ???  
    )

There is 1 answer

Answered by vanhooser (best answer)

One way to do this is to provide an 'override dictionary' for all your datasets, in which dataset-specific descriptions take precedence over the general ones.

For example:

from transforms.api import transform, Input, Output

GENERAL_DESCRIPTIONS = {
  "col_1": "my general description"
}

LOCAL_DESCRIPTIONS = {
  "/path/to/my/dataset": {
    "col_1": "my override description"
  }
}

@transform(
  my_output=Output("/path/to/my/dataset"),
  my_input=Input("/path/to/input"),
)
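def my_compute_function(my_output, my_input):
  # Look up overrides for this output dataset; fall back to no overrides.
  local_updates = LOCAL_DESCRIPTIONS.get(my_output.path, {})
  # Copy first so the shared general descriptions are never mutated.
  local_descriptions = GENERAL_DESCRIPTIONS.copy()
  # Dataset-specific entries replace the general ones for the same column.
  local_descriptions.update(local_updates)
  my_output.write_dataframe(
    my_input.dataframe(),
    column_descriptions=local_descriptions
  )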

This lets you put GENERAL_DESCRIPTIONS at the root of your module and define the 'local' overrides at the top of each transformation .py file. You could even define the 'local' descriptions above a group of transformations, so you don't have to go through every file to specify overrides.
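If several transformations share this logic, the merge step could be pulled into a small helper. The following is only a sketch in plain Python: resolve_descriptions is a hypothetical name (it is not part of transforms.api), and the path and descriptions are placeholder data.

```python
from typing import Dict, Mapping

# Placeholder data for illustration; replace with your own descriptions.
GENERAL_DESCRIPTIONS: Dict[str, str] = {
    "col_1": "my general description",
    "col_2": "another general description",
}

# Overrides keyed by dataset path (hypothetical path).
LOCAL_DESCRIPTIONS: Dict[str, Dict[str, str]] = {
    "/path/to/my/dataset": {"col_1": "my override description"},
}


def resolve_descriptions(
    dataset_path: str,
    general: Mapping[str, str] = GENERAL_DESCRIPTIONS,
    local: Mapping[str, Mapping[str, str]] = LOCAL_DESCRIPTIONS,
) -> Dict[str, str]:
    """Return the general descriptions with dataset-specific overrides applied.

    Hypothetical helper: builds a new dict, so the inputs are never mutated.
    """
    merged = dict(general)
    # A dataset with no entry in `local` simply keeps the general descriptions.
    merged.update(local.get(dataset_path, {}))
    return merged
```

Inside the transformation you would then pass column_descriptions=resolve_descriptions(my_output.path) to write_dataframe, assuming my_output.path matches the keys you used in LOCAL_DESCRIPTIONS.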

The most granular way to update the description dictionary is simply to merge a flat dictionary of overrides in each transformation:

...
GENERAL_DESCRIPTIONS = {
  "col_1": "my general description"
}

LOCAL_DESCRIPTIONS = {
  "col_1": "my override description"
}

...
def my_compute_function(my_output, my_input):
  local_descriptions = GENERAL_DESCRIPTIONS.copy()
  local_descriptions.update(LOCAL_DESCRIPTIONS)
  ...
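If your descriptions start out as rows of (column, description, dataset), as in the question, you could turn them into a per-dataset override dictionary first. This is a rough sketch under that assumption: DESCRIPTION_ROWS, build_local_descriptions and the dataset paths are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical rows shaped like the question's example:
# (column name, description, dataset path).
DESCRIPTION_ROWS = [
    ("col_1", "col 1 description for dataset 1", "/path/to/dataset_1"),
    ("col_1", "col 1 description for dataset 2", "/path/to/dataset_2"),
    ("col_2", "col 2 description for dataset 1", "/path/to/dataset_1"),
]


def build_local_descriptions(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, str]]:
    """Group description rows into {dataset path: {column: description}}."""
    by_dataset: Dict[str, Dict[str, str]] = defaultdict(dict)
    for column, description, dataset_path in rows:
        # A later row for the same dataset and column replaces an earlier one.
        by_dataset[dataset_path][column] = description
    return dict(by_dataset)


LOCAL_DESCRIPTIONS_BY_DATASET = build_local_descriptions(DESCRIPTION_ROWS)
```

Each transformation can then look up its own entry by dataset path and merge it over a copy of the general descriptions, so the same column name gets a different description per dataset.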