How to train a model in SageMaker Studio with .train and .test extension dataset files?

Question

116 views Asked by Jairo At 10 May 2022 at 17:37

I'm trying to implement ML models in Amazon SageMaker Studio. The model I want to use is from Hugging Face, and it uses a dataset from the CoNLL Corpora.

Following the Hugging Face documentation, I have to read a CSV file with `train = pd.read_csv`. The problem is the dataset files' extensions: they are .train and .test rather than .csv. The error I'm getting is: "ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3"

Is there a way to convert .test files to CSV files? Or how should I read files with these extensions?

Links

Dataset: https://www.kaggle.com/nltkdata/conll-corpora

Model: https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner

Original Q&A

There is 1 answer

**durga_sury** · Answer 1 · 2022-05-17T23:16:02+00:00

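The dataset in your link seems to be tab-separated, not comma-separated. `pd.read_csv` does not care about the file extension; it splits each line on commas by default, which is why it fails on these files.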

You can read it by passing the right delimiter, for example `df = pd.read_csv("<filename>", sep="\t")`.
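As a slightly fuller sketch of the same idea, the helper below wraps `pd.read_csv` with a few options that tend to matter for one-token-per-line CoNLL-style files. The function name `read_conll`, the column names `token` and `tag`, and the sample rows are all made up for illustration; they are not taken from the Kaggle dataset. The tab delimiter is also an assumption: if a particular file turns out to separate its columns with single spaces, passing `sep=" "` may be what is needed instead.

```python
import csv
import io

import pandas as pd


def read_conll(path_or_buffer, sep="\t", columns=None):
    """Read a one-token-per-line file into a DataFrame (illustrative helper)."""
    return pd.read_csv(
        path_or_buffer,
        sep=sep,                 # assumed delimiter; adjust to the actual file
        header=None,             # assume the file has no header row
        names=columns,           # hypothetical column names chosen by the caller
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary tokens
        skip_blank_lines=True,   # drop blank lines (often sentence separators)
        keep_default_na=False,   # keep tokens such as "NA" as plain text
    )


# Made-up in-memory sample standing in for a real .train file.
sample = 'El\tB-ORG\nPaís\tI-ORG\n"\tO\n\nMadrid\tB-LOC\n'
df = read_conll(io.StringIO(sample), columns=["token", "tag"])
print(df)

# Writing the frame back out yields an ordinary comma-separated file.
csv_text = df.to_csv(index=False)
```

With a real file, the call would look like `read_conll("<filename>", columns=[...])`, and `df.to_csv("<output>.csv", index=False)` would save a comma-separated copy. Note that skipping blank lines discards sentence boundaries, so this is only suitable when a flat table of tokens is enough.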
