I'm trying to implement ML models with Amazon SageMaker Studio. The model I want to use is from Hugging Face, and it uses a dataset from the CoNLL Corpora.
Following the instructions in the Hugging Face documentation, I have to read a CSV file with train = pd.read_csv. The problem is the dataset's file extensions: the files are .train and .test, not .csv. The error I'm getting is: "ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3"
Is there a way to convert .test files to CSV files? Or how should I read files with these extensions?
Links
Dataset: https://www.kaggle.com/nltkdata/conll-corpora
Model: https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner
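The dataset in your link seems to be tab-separated, not comma-separated. The file extension itself doesn't matter to pandas: read_csv parses any plain-text file, but it assumes a comma delimiter by default, which is why the parser fails.
You can read it by passing the right delimiter, like this: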
df = pd.read_csv("<filename>", sep="\t")
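If you also want an actual CSV out of it, here is a minimal sketch. It assumes the file has no header row, one token per line, three tab-separated columns, and blank lines between sentences; the column names (token, pos, ner), the helper name read_conll, and the sample text are made up for illustration, so adjust them to what your file really contains. If the columns turn out to be separated by spaces rather than tabs, sep=" " would be the thing to try instead.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for a .train/.test file:
# tab-separated columns, no header row, blank line between sentences.
sample = (
    "Melbourne\tNP\tB-LOC\n"
    "(\tFpa\tO\n"
    "Australia\tNP\tB-LOC\n"
    ")\tFpt\tO\n"
    "\n"
    "El\tDA\tO\n"
)


def read_conll(source, columns=("token", "pos", "ner")):
    """Read a tab-separated, header-less token file into a DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",                # tab delimiter instead of the default comma
        header=None,             # the first line is data, not column names
        names=list(columns),     # assumed column names, not from the dataset
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary tokens
        keep_default_na=False,   # keep tokens such as "NA" as plain strings
        skip_blank_lines=True,   # drop the blank lines between sentences
    )


df = read_conll(io.StringIO(sample))

# Convert to comma-separated text; pass a path to to_csv to write a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To use it on the real data, pass the file path in place of the StringIO object, e.g. read_conll("<filename>"). Note that skipping blank lines throws away the sentence boundaries, so if the model needs them you would have to keep track of them separately.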