How to use a transformer in Hugging Face without tokenization?


I have the following code:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
nlp = pipeline('ner', model=model, tokenizer=tokenizer)  # renamed so the pipeline function isn't shadowed
sentence = "some example sentence here"
results = nlp(sentence)

This works fine. But instead of a str, I want to pass a list of tokens. How do I do that?

The reason I want to do this is that my sentences are already tokenized, and a simple " ".join() does not reproduce the original sentence correctly. For example, isn't has been tokenized into is and n't, but a simple " ".join() produces is n't.
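
To illustrate the mismatch (a small made-up example, not from my actual data), joining pre-tokenized text with spaces breaks contractions apart:

toks = ["He", "is", "n't", "here", "."]
print(" ".join(toks))
# "He is n't here ."  -- not the original sentence "He isn't here."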


1 Answer

Answer by joe32140

I assume the original data was tokenized by NLTK, so try the NLTK detokenizer:

from nltk.tokenize.treebank import TreebankWordDetokenizer
toks = ['hello', ',', 'i', 'ca', "n't", 'feel', 'my', 'feet', '!', 'Help', '!', '!']
twd = TreebankWordDetokenizer()
twd.detokenize(toks)
# "hello, i can't feel my feet! Help!!"