How do I call ExampleValidator to analyze split data sets?

Using:

Tensorflow version: 2.3.1
TFX version: 0.23.1
TFDV version: 0.24.0
TFMA version: 0.24.0

with an interactive context like so:

import os

from tfx.orchestration.experimental.interactive.interactive_context import \
    InteractiveContext

# Root directory where the interactive context stores pipeline artifacts.
context = InteractiveContext(
    pipeline_root=os.path.join(os.getcwd(), "pipeline")
)

I created an ExampleGen using:

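from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2

# Split the input 7:2:1 into train/test/eval by hash bucket.
output = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=7),
        example_gen_pb2.SplitConfig.Split(name='test', hash_buckets=2),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1)
    ]))

# base_dir and data_dir are defined elsewhere in the notebook.
example_gen = CsvExampleGen(input_base=os.path.join(base_dir, data_dir), output_config=output)
context.run(example_gen)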
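The `hash_buckets` values in the SplitConfig act as relative weights: 7 + 2 + 1 = 10 buckets, so on average about 70% / 20% / 10% of the examples land in train / test / eval. The sketch below illustrates the idea with hypothetical helpers `expected_fractions` and `assign_split`. It uses SHA-256 purely as a stand-in hash and is not TFX's actual partitioning code, which uses its own fingerprint function.

```python
import hashlib
from typing import Dict, List, Tuple

# (split name, hash_buckets) pairs mirroring the SplitConfig in the question.
SPLITS: List[Tuple[str, int]] = [("train", 7), ("test", 2), ("eval", 1)]


def expected_fractions(splits: List[Tuple[str, int]]) -> Dict[str, float]:
    """Average fraction of examples each split should receive."""
    total = sum(buckets for _, buckets in splits)
    return {name: buckets / total for name, buckets in splits}


def assign_split(record: bytes, splits: List[Tuple[str, int]]) -> str:
    """Deterministically map a record to a split by hashing it into a bucket.

    Illustration only: SHA-256 stands in for TFX's own fingerprinting.
    """
    total = sum(buckets for _, buckets in splits)
    # Take 8 bytes of the digest as an integer and reduce it to a bucket index.
    bucket = int.from_bytes(hashlib.sha256(record).digest()[:8], "big") % total
    upper = 0
    for name, buckets in splits:
        # Buckets [0, 7) -> train, [7, 9) -> test, [9, 10) -> eval.
        upper += buckets
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always < total")
```

Because the bucket depends only on the record's hash, the same record always lands in the same split in this sketch, regardless of how often it is re-run.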

Later in the code, I tried validating the data with an ExampleValidator, but it seems ExampleValidator doesn't resolve the correct paths to the split data sets.

Creation of the validator works as expected:

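from tfx.components import ExampleValidator

# statistics_gen and schema_gen were created and run earlier in the notebook.
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)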

There were no warnings or errors, but attempting to show the results fails because the path is not correct:

context.show(example_validator.outputs['anomalies'])

NotFoundError: /home/jovyan/pipeline/ExampleValidator/anomalies/16/anomalies.pbtxt; No such file or directory

The actual directory structure was like so:

.
└── anomalies
    └── 16
        ├── eval
        │   └── anomalies.pbtxt
        ├── test
        │   └── anomalies.pbtxt
        └── train
            └── anomalies.pbtxt

5 directories, 3 files

but the code seemed to expect:

└── anomalies
    └── 16
        └── anomalies.pbtxt
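On TFX 0.23, one possible workaround short of upgrading is to read the per-split files directly. The sketch below assumes the layout shown in the tree above (`<uri>/<split>/anomalies.pbtxt`); `find_anomalies_files` and `load_anomalies` are hypothetical helpers, not part of TFX or TFDV.

```python
import os
from typing import Dict

ANOMALIES_FILE = "anomalies.pbtxt"


def find_anomalies_files(artifact_uri: str) -> Dict[str, str]:
    """Map split name -> anomalies.pbtxt path under an anomalies artifact URI.

    Handles both layouts: per-split subdirectories
    (<uri>/<split>/anomalies.pbtxt) and a single flat file
    (<uri>/anomalies.pbtxt, returned under the key "").
    """
    found: Dict[str, str] = {}
    flat = os.path.join(artifact_uri, ANOMALIES_FILE)
    if os.path.isfile(flat):
        found[""] = flat
    for entry in sorted(os.listdir(artifact_uri)):
        # Only subdirectories containing an anomalies file count as splits.
        candidate = os.path.join(artifact_uri, entry, ANOMALIES_FILE)
        if os.path.isfile(candidate):
            found[entry] = candidate
    return found


def load_anomalies(path: str):
    """Parse one anomalies.pbtxt (text format) into an Anomalies proto.

    Imports are deferred so the path helper above works without TFDV
    installed; calling this requires protobuf and tensorflow-metadata.
    """
    from google.protobuf import text_format
    from tensorflow_metadata.proto.v0 import anomalies_pb2

    with open(path) as f:
        return text_format.Parse(f.read(), anomalies_pb2.Anomalies())
```

In the interactive context, the artifact URI could come from `example_validator.outputs['anomalies'].get()[0].uri`, and each parsed proto could then be passed to `tfdv.display_anomalies` (assuming TFDV is imported as `tfdv`).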

How do I call ExampleValidator to analyze split data sets?

Answer by AudioBubble (accepted):

Thanks, @Lorin S., for sharing the solution reference. For the benefit of the community, I am posting here the solution given by 1025KB on GitHub.

Per-split outputs were added in TFX 0.23, but the Colab (notebook) support, which `context.show` relies on, was not updated in 0.23. The Colab support was fixed in 0.24.

The issue was resolved by upgrading TFX to 0.24.
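To guard a notebook against running on an older TFX, a small version check can be done before calling `context.show`. This is a sketch based only on the 0.24 cutoff stated in this answer; `parse_version` and `supports_split_anomalies_display` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. "0.24.0rc1" -> (0, 24, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_split_anomalies_display(tfx_version: str) -> bool:
    """True if the TFX version is at least 0.24, the release with the fix."""
    # Tuple comparison: (0, 24, 0) >= (0, 24) and (0, 23, 1) < (0, 24).
    return parse_version(tfx_version) >= (0, 24)
```

On Python 3.8+, `importlib.metadata.version("tfx")` returns the installed version string to pass in; if the check fails, upgrading with `pip install --upgrade "tfx>=0.24"` matches the fix described in this answer.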