'Unknown Encoding' TikToken Error in an exe compiled with Nuitka


I'm working with the tiktoken library in Python to count the number of tokens in a series of messages. My code runs well when executed as a Python script. However, when I compile it into an executable, I encounter an 'unknown encoding' error. Attempts to debug the library or replace it with custom code have so far been unsuccessful.

In both branches of the code below (the encoding_for_model call and the get_encoding fallback), tiktoken fails to return an encoding, and I don't know why. I've also tried loading the cl100k_base.tiktoken file from my local PC instead of letting openai_public.py in tiktoken_ext fetch it from the OpenAI URL. With PyInstaller, adding --hidden-import=tiktoken_ext.openai_public --hidden-import=tiktoken_ext is enough to solve the problem. With Nuitka, I used the command below:

python -m nuitka script.py --onefile --show-modules --include-package=tiktoken --include-package=tiktoken_ext --include-package=blobfile

But the problem still persists.
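To narrow this down, here is a minimal diagnostic sketch that can be run both as a script and inside the compiled exe. It assumes (worth verifying against the installed tiktoken version) that tiktoken finds its encodings by scanning the tiktoken_ext package with pkgutil.iter_modules, so a module can be importable by name yet still invisible to that scan in a frozen build. The helper names discoverable_submodules and can_import are hypothetical, not part of tiktoken or Nuitka.

```python
import importlib
import pkgutil


def discoverable_submodules(package_name):
    """List submodules that pkgutil can see under a package.

    Returns None if the package itself cannot be imported, and an empty
    list if it imports but exposes no scannable submodules.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        return None
    path = getattr(package, "__path__", None)
    if path is None:  # a plain module, not a package
        return []
    return sorted(name for _, name, _ in pkgutil.iter_modules(path, package_name + "."))


def can_import(module_name):
    """Return True if the module can be imported directly by name."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Assumption: tiktoken_ext.openai_public is the plugin module that
    # registers cl100k_base (it is the one named in the PyInstaller fix).
    print("discoverable plugins:", discoverable_submodules("tiktoken_ext"))
    print("explicit import works:", can_import("tiktoken_ext.openai_public"))
```

If the explicit import works in the exe but the discoverable list is empty or None, that would suggest the module is bundled but the directory scan cannot find it, which would be consistent with the 'unknown encoding' message.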

import tiktoken


def num_tokens_from_messages(messages, model):
    """Returns the number of tokens used by a list of messages."""
    try:
        # Look up the encoding registered for this model name.
        encoding = tiktoken.encoding_for_model(model)
        print("getting encoding_for_model", encoding)
    except Exception as e:
        # Fall back to cl100k_base; in the compiled exe this call fails too.
        print("Warning: model not found. Using cl100k_base encoding.", e)
        encoding = tiktoken.get_encoding("cl100k_base")
        print("in exception", encoding)
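For the local-file attempt, a small sketch of how to check whether a pre-downloaded cl100k_base.tiktoken file is where tiktoken would look for it. This rests on two assumptions I have not confirmed for every tiktoken version: that the cache directory can be set through the TIKTOKEN_CACHE_DIR environment variable, and that the cached file is named after the SHA-1 hex digest of the download URL. The URL constant and the helper expected_cache_path are illustrative, and the "tiktoken_cache" default is a placeholder. Note this only covers file loading; it would not help if the encoding name is never registered in the first place.

```python
import hashlib
import os

# Assumed URL: the one referenced for cl100k_base in tiktoken_ext/openai_public.py.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def expected_cache_path(cache_dir, url=CL100K_URL):
    """Build the path where a cached copy of `url` is assumed to live."""
    # Assumption: the cache file name is the SHA-1 hex digest of the URL.
    cache_key = hashlib.sha1(url.encode()).hexdigest()
    return os.path.join(cache_dir, cache_key)


if __name__ == "__main__":
    # "tiktoken_cache" is a placeholder default for illustration only.
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR", "tiktoken_cache")
    path = expected_cache_path(cache_dir)
    print(path, "exists" if os.path.exists(path) else "missing")
```

Running this inside the exe shows whether the file I copied locally is actually present under the name the loader is assumed to expect.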

Any pointers or insights would be greatly appreciated.
