I am building a training dataset on GCP's Natural Language AutoML Entity Extraction service. I have a fraction of my documents labeled and I want to export them to do some preliminary exploratory data analysis. I can add a filter to display "Labeled" docs, but if I try to export, it exports all my docs.
Is there any way to export only those that fit the filter criteria? Via Python API would be fine too.
Indeed, the Export Data link in the AutoML console always exports the complete dataset. There is no option to export only selected items; however, there is an option to delete selected items. As a workaround, I suggest deleting the 'unwanted' items. Let me explain.
I suggest you perform the following steps.
This way, your new dataset will contain only the labeled documents, and you can run Export Data on it and use the resulting set for your EDA.
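Since you mentioned Python would be fine too: if you would rather not delete anything, another option is to export the full dataset and drop the unlabeled documents locally. Below is a minimal sketch, not an official API. It assumes the exported files are JSONL with one document per line, and that labeled documents carry a non-empty `annotations` list, as in the documented import format for entity extraction. Please check this against your actual export files. The function name `keep_labeled` and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator


def keep_labeled(lines: Iterable[str], key: str = "annotations") -> Iterator[dict]:
    """Yield only the JSONL records that have a non-empty `key` field.

    Assumption: each line is one JSON document and labeled documents
    have a non-empty "annotations" list. Adjust `key` if your export differs.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(key):  # a missing key or an empty list counts as unlabeled
            yield record


# Hypothetical sample standing in for the lines of an exported JSONL file.
sample_lines = [
    '{"text_snippet": {"content": "doc a"}, "annotations": [{"display_name": "Person"}]}',
    '{"text_snippet": {"content": "doc b"}, "annotations": []}',
    '{"text_snippet": {"content": "doc c"}}',
]

labeled = list(keep_labeled(sample_lines))
print(len(labeled), "labeled document(s)")
```

For a real export you would download the JSONL files from the Cloud Storage bucket and pass an open file object to `keep_labeled` in place of `sample_lines`.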
Best regards!