NameError: name 'tokenize_and_split_data' is not defined in Python code

Question

Asked by Gha on 25 October 2023 at 10:15

I want to split my data into train_dataset and test_dataset. The call to tokenize_and_split_data fails because the function is not defined, and I could not install the utilities library that is supposed to provide it. I am working in Python on Google Colab.

import datasets
import tempfile
import logging
import random
import config
import os
import yaml
import time
import torch
import transformers
import pandas as pd
import jsonlines

#from utilities import *
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import TrainingArguments

logger = logging.getLogger(__name__)
global_config = None

model_name = "EleutherAI/pythia-70m"

training_config = {
    "model": {
        "pretrained_name": model_name,
        "max_length" : 2048
    },
    "datasets": {
        "use_hf": use_hf,
        "path": dataset_path
    },
    "verbose": True
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
train_dataset, test_dataset = tokenize_and_split_data(training_config, tokenizer)

print(train_dataset)
print(test_dataset)

Above is the code. I cannot install the utilities library, and the function tokenize_and_split_data is not defined. Can you help me, please?

Original Q&A

There are 2 answers

**mike jay** · Answer 1 · 2023-10-26T05:50:29+00:00

Download "utilities.py" from here and place it in your Python installation's "...\Lib\site-packages" folder. You can find that path by running "python -m site" in 'cmd'; the site-packages directory appears in the sys.path list it prints.
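On Colab it is often simpler to leave site-packages alone and make the folder that holds utilities.py importable. The following is only a sketch of that idea: `import_from_dir` and its arguments are hypothetical names chosen for illustration, and it assumes utilities.py has already been uploaded or recreated in the folder you pass in.

```python
import importlib
import os
import sys


def import_from_dir(module_name, module_dir):
    """Import module_name from module_dir by putting that folder on sys.path."""
    module_path = os.path.join(module_dir, module_name + ".py")
    if not os.path.isfile(module_path):
        # Fail early with a clear message instead of a later NameError.
        raise FileNotFoundError(f"{module_path} not found; upload or recreate it first")
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    # Make sure a file created after the interpreter started is picked up.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Usage in a notebook cell (assumes utilities.py sits in the current directory):
# utilities = import_from_dir("utilities", os.getcwd())
# train_dataset, test_dataset = utilities.tokenize_and_split_data(training_config, tokenizer)
```

Calling the function through the returned module object avoids `from utilities import *`, so a missing file shows up as a FileNotFoundError at import time rather than as a NameError several cells later.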

**Kim Noël** · Answer 2 · 2023-11-28T19:06:29+00:00

If you are running the Colab notebook from Lamini on finetuning, there is a Python file, utilities.py, that contains this function. Either recreate that file or copy all of its functions into a notebook cell.

$ ls
05_Training_lab_student.ipynb  lamini_docs.jsonl    utilities.py
__pycache__                    lamini_docs_3_steps

$ cat utilities.py
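If the original utilities.py is not available, a stand-in function can be defined directly in a cell. The sketch below is not the Lamini implementation; it is a simplified substitute written under stated assumptions: the local dataset is a JSON Lines file with "question" and "answer" fields, a Hub dataset already has "train" and "test" splits, and the `test_size` and `seed` parameters are hypothetical defaults. For the local case it returns plain Python lists rather than `datasets.Dataset` objects.

```python
import json
import random


def tokenize_and_split_data(training_config, tokenizer, test_size=0.1, seed=123):
    """Stand-in sketch: return (train, test) for a config shaped like the one above."""
    ds_cfg = training_config["datasets"]
    max_length = training_config["model"]["max_length"]

    if ds_cfg["use_hf"]:
        # Assumption: the Hub dataset is already tokenized and has train/test splits.
        import datasets  # third-party, imported only when this branch is used
        dataset = datasets.load_dataset(ds_cfg["path"])
        return dataset["train"], dataset["test"]

    # Assumption: a local JSON Lines file with "question" and "answer" fields.
    rows = []
    with open(ds_cfg["path"], encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            text = record["question"] + record["answer"]
            encoded = tokenizer(text, truncation=True, max_length=max_length)
            rows.append({
                "input_ids": list(encoded["input_ids"]),
                "attention_mask": list(encoded["attention_mask"]),
                # For causal language modelling the labels mirror the inputs.
                "labels": list(encoded["input_ids"]),
            })

    # Deterministic shuffle, then hold out a fraction of the rows for testing.
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_size)) if len(rows) > 1 else 0
    return rows[n_test:], rows[:n_test]
```

With this cell run before the call in the question, `tokenize_and_split_data(training_config, tokenizer)` resolves to the function above, provided `use_hf` and `dataset_path` have been defined earlier in the notebook.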
