Suppose I wanted to split my NER dataset that looks like this:
Data: "Jokowi is the president of Indonesia"
Label: ['B-Person', 'O', 'O', 'O', 'O', 'B-Country']
Is there a Python library or algorithm that ensures the class distribution is the same in the train and test sets? Any suggestions would be appreciated.
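You could look at StratifiedShuffleSplit in the scikit-learn library (sklearn.model_selection). It stratifies on one class label per sample, so for NER you first need to reduce each sentence's tag sequence to a single stratification label. One option is the set of entity types the sentence contains.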
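Below is a minimal sketch of that idea. The toy labels and the sentence_key scheme (the sorted set of entity types, with B-/I- prefixes stripped) are hypothetical choices for illustration, not something scikit-learn provides. Stratifying on sentence-level entity-type combinations tends to keep token-level tag proportions close between splits, but it does not guarantee they match exactly.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def sentence_key(tags):
    """Collapse a sentence's BIO tags into one stratification label.

    Hypothetical scheme: the sorted set of entity types present, e.g.
    ['B-Person', 'O', 'B-Country'] -> 'Country|Person'.
    Sentences without entities get the label 'NONE'.
    """
    types = sorted({t.split("-", 1)[-1] for t in tags if t != "O"})
    return "|".join(types) if types else "NONE"


def tag_proportions(tag_lists):
    """Fraction of tokens carrying each tag across a list of sentences."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Made-up toy labels: one list of BIO tags per sentence.
labels = (
    [["B-Person", "O", "O", "O", "O", "B-Country"]] * 4
    + [["O", "O", "B-Country", "O"]] * 4
    + [["O", "O", "O"]] * 4
)

# One stratification label per sentence.
keys = [sentence_key(tags) for tags in labels]

# Only y matters for the split, so np.zeros(n_samples) stands in for X.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(np.zeros(len(keys)), keys))

train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

# Sentence-level combinations per split, then token-level tag proportions.
print(Counter(keys[i] for i in train_idx))
print(Counter(keys[i] for i in test_idx))
print(tag_proportions(train_labels))
print(tag_proportions(test_labels))
```

Every key must occur at least twice, and the test set needs at least as many sentences as there are distinct keys. Otherwise StratifiedShuffleSplit raises a ValueError, so rare entity-type combinations may need to be merged into a catch-all label. If there are many such combinations, iterative stratification for multi-label data (for example iterative_train_test_split in scikit-multilearn) may be a better fit.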