How do I call ExampleValidator to analyze split data sets?

Question

How do I call ExampleValidator to analyze split data sets?

188 views Asked by Lorin S. At 26 September 2020 at 16:45

Using:

Tensorflow version: 2.3.1
TFX version: 0.23.1
TFDV version: 0.24.0
TFMA version: 0.24.0

with an interactive context like so:

from tfx.orchestration.experimental.interactive.interactive_context import \
    InteractiveContext
context = InteractiveContext(
    pipeline_root=os.path.join(os.getcwd(), "pipeline")
)

I created an ExampleGen using:

output = example_gen_pb2.Output(
             split_config=example_gen_pb2.SplitConfig(splits=[
                 example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=7),
                 example_gen_pb2.SplitConfig.Split(name='test', hash_buckets=2),
                 example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1)
             ]))

example_gen = CsvExampleGen(input_base=os.path.join(base_dir, data_dir), output_config=output)
context.run(example_gen)

and later in the code, I tried evaluating the data using an ExampleValidator but it seems the ExampleValidator doesn't resolve the proper paths to the split data sets.

Creation of the validator works as expected:

example_validator = ExampleValidator(
             statistics=statistics_gen.outputs['statistics'],
             schema=schema_gen.outputs['schema'])
context.run(example_validator)

No warning or errors were had, but attempting to show the results, error on the paths not being correct:

context.show(example_validator.outputs['anomalies'])

NotFoundError: /home/jovyan/pipeline/ExampleValidator/anomalies/16/anomalies.pbtxt; No such file or directory