Exporting a filtered subset of GCP Natural Language annotator

Question

Asked by deppen8 on 13 October 2020 at 18:17 · 122 views

I am building a training dataset on GCP's Natural Language AutoML Entity Extraction service. I have a fraction of my documents labeled and I want to export them to do some preliminary exploratory data analysis. I can add a filter to display "Labeled" docs, but if I try to export, it exports all my docs.

Is there any way to export only those that fit the filter criteria? Via Python API would be fine too.

There is 1 answer

**slakov** · Answer 1 · 2020-11-02T16:48:33+00:00

Indeed, the Export Data link in the AutoML console always exports the complete dataset. There is no option to export only selected items, but there is an option to delete selected items. As a workaround, you can make a copy of the dataset and delete the unwanted items from the copy. Let me explain.

I suggest the following steps:

1. Export the complete dataset (so you don't delete anything from your production dataset).
2. Create a new dataset in your AutoML project by importing the complete dataset from Step 1.
3. Filter the new dataset to show only the unlabeled documents.
4. Select all and delete (this removes the unlabeled data from your copy only).

This way, your new dataset will contain only the labeled documents, and you can run Export Data on it and use the resulting set for your EDA.
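Since the question mentions that Python would be fine too, here is a rough alternative sketch that is not part of the console workaround above: export the complete dataset once, then drop the unlabeled records locally. It assumes the export is a JSONL file in which each line is a JSON object and labeled documents carry a non-empty `annotations` list. That layout is an assumption and should be checked against your actual export. The function names and file names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_labeled(jsonl_lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the records that carry at least one annotation."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: a labeled document has a non-empty "annotations" list.
        if record.get("annotations"):
            yield record


def write_labeled(src_path: str, dst_path: str) -> int:
    """Copy labeled records from src_path to dst_path; return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in iter_labeled(src):
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

For example, `write_labeled("export.jsonl", "labeled_only.jsonl")` would write the labeled subset to a new file that you can load for EDA, leaving the AutoML dataset itself untouched.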

Best regards!
