I am using the Hugging Face pipeline function on my local machine. I got a crash when I was connected to a VPN, but it works when I turn the VPN off. That leads me to wonder what information is being transmitted to Hugging Face. I know OpenAI may use any queries I send to them. Does Hugging Face upload my queries, or is it only the model that is being downloaded to my machine when I run the function locally?
Do the terms of use for the Facebook/Hugging Face models say anything about use of one's data by Facebook/Hugging Face? I was particularly looking for clauses on data use/ownership, but I would also welcome your opinion.
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    ["This is a course about the Transformers library",
     "This is a movie about the Transformers action figures"],
    candidate_labels=["education", "politics", "business"],
)
The code you are executing isn't sending your strings to a server; it is downloading a model and tokenizer. After the first execution, both are cached locally and you can run the code without any network connection. You can verify this by disconnecting and running the code again.
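If you want a stronger guarantee than a manual disconnect test, transformers also honors the TRANSFORMERS_OFFLINE environment variable, which makes the library raise an error instead of attempting any download. A minimal sketch (the variable must be set before transformers is imported):

```python
import os

# Setting this before `import transformers` restricts the library to
# locally cached files; it raises instead of reaching the network.
os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

With this in place, any pipeline call that would need a download fails loudly, which proves nothing is being fetched behind your back.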
By default, the downloaded files land under ~/.cache/huggingface (older transformers versions used a transformers subdirectory there, newer ones use hub).
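As a sketch of where to look on your machine (HF_HOME, if set, relocates the cache; the exact subdirectory layout has changed across transformers versions):

```python
import os

# Default Hugging Face cache root; the HF_HOME environment variable
# overrides it when set.
cache_root = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
print(cache_root)
```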
You could also copy the cache to a machine without any network connection and run your code there without issue. However, rather than copying the cache, I recommend saving the model to a separate directory with the pipeline's save_pretrained method and copying only that directory:
You can load it again by passing the path to the directory:
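Putting both steps together, a sketch along these lines should work; the directory name ./my_zero_shot_model is just an example, and the first pipeline(...) call still needs one network connection to fetch the default model:

```python
from transformers import pipeline

# First run: downloads the default zero-shot model, then writes the
# model and tokenizer files into a directory you choose.
classifier = pipeline("zero-shot-classification")
classifier.save_pretrained("./my_zero_shot_model")

# Later -- or on an offline machine after copying that directory --
# rebuild the pipeline from the local files only:
offline_classifier = pipeline(
    "zero-shot-classification",
    model="./my_zero_shot_model",
    tokenizer="./my_zero_shot_model",
)
```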