Does the huggingface pipeline function upload my data to their cloud?

I am using the Hugging Face pipeline function on my local machine. It crashed while I was connected to a VPN, but it works when I turn the VPN off. That leads me to wonder: what information is being transmitted to Hugging Face? I know OpenAI may use any queries I send to them. Does Hugging Face upload my queries, or is it only the model that is being downloaded to my machine when I run the function locally?

Do the terms of use for the Facebook/Hugging Face models grant Facebook or Hugging Face any rights to use one's data? I was looking in particular for clauses on data use and ownership, but I would also like to hear your opinion.

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    [
        "This is a course about the Transformers library",
        "This is a movie about the Transformers action figures",
    ],
    candidate_labels=["education", "politics", "business"],
)

1 Answer

Answered by cronoik:

The code you are executing isn't sending your strings to any server; it is downloading a model and a tokenizer. After the first execution, both are cached locally, so you can run the code without any network connection. You can verify this by disconnecting and running the code again.
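If you want to be extra sure, you can also force Transformers into offline mode, so that any attempt to reach the Hugging Face Hub raises an error instead of making a network call. A minimal sketch, assuming the default zero-shot model is already in your cache (TRANSFORMERS_OFFLINE is the documented switch for this):

import os

# Tell transformers to use only the local cache and never contact the Hub.
# This must be set before the library is imported.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import pipeline

# Works only because the default zero-shot model was cached on a previous run;
# otherwise this raises an error instead of downloading anything.
classifier = pipeline("zero-shot-classification")
print(classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
))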

Cache location:

from transformers import TRANSFORMERS_CACHE

print(TRANSFORMERS_CACHE)

Output:

/home/YOU/.cache/huggingface/hub
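If you want to see exactly which repositories have been downloaded and how much disk space they use, the huggingface_hub package that ships with transformers can scan that cache for you. A small sketch, assuming a reasonably recent huggingface_hub:

from huggingface_hub import scan_cache_dir

# Walk the local cache and print every downloaded repo with its size in bytes.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.size_on_disk)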

You could also copy the cache to a machine without any network connection and run your code there without issue. However, rather than copying the whole cache, I recommend saving the model to a separate directory with the pipeline's save_pretrained and copying only that directory:

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier.save_pretrained("./bla")

You can load it again by passing the path to the directory:

classifier = pipeline("zero-shot-classification", "./bla")
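Putting it together on the offline machine, the reloaded pipeline behaves just like the original one. A quick check (./bla is simply the example directory from above, passed via the model argument):

from transformers import pipeline

# Build the pipeline entirely from the local directory created by save_pretrained;
# no network access is needed at this point.
classifier = pipeline("zero-shot-classification", model="./bla")

result = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score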