Setup
I've created an HF Inference Endpoint for translation, specifically French -> English. This is the setup:
- Instance type: GPU · Nvidia A10G · 1x GPU · 24 GB
- Model: Helsinki-NLP/opus-mt-fr-en
- Container: default
I have 2k documents to translate and I'm using the Python requests package, together with concurrent.futures, to fire multiple HTTP POST requests at my endpoint. Each document has 100-300 sentences, and I wasn't able to translate them as they were, so I split each document into 6 sections.
Checking the machine's usage, I realise I'm barely using any CPU/GPU resources. Also, after a while, some requests start failing.
I'm quite sure this could be much faster and could process more text at once, but I can't figure out how to do so. My data is in a PySpark DataFrame: I first tried a UDF (which was a terrible idea); now I'm pulling the documents out of Spark and using that plain Python structure as input.
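For context, this is roughly how I go from the PySpark DataFrame to the dict of id -> sentences that I feed into the code below (the column names doc_id and sentences are placeholders for my actual schema):

rows = df.select("doc_id", "sentences").collect()
# Build {unique id: list of sentences}; with ~2k documents this fits comfortably on the driver.
text_dict = {row["doc_id"]: row["sentences"] for row in rows}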
Here's the code I'm using:
import concurrent.futures

import requests

# API_URL and API_TOKEN are the endpoint URL and HF access token, defined elsewhere.


def translate_text(text):
    """Send one piece of text to the endpoint and return the list of translated strings."""
    payload = {"inputs": text}
    headers = {"Authorization": f"Bearer {API_TOKEN}", "Content-Type": "application/json"}
    try:
        response = requests.post(API_URL, json=payload, headers=headers)
        response.raise_for_status()
        if response.status_code == 200 and response.text:
            response_data = response.json()
            # The endpoint returns a list of {"translation_text": ...} dicts.
            translated_text = list(map(lambda x: x.get("translation_text"), response_data))
            return translated_text
        else:
            print("Translation response is empty or not in JSON format.")
            return None
    except requests.exceptions.JSONDecodeError as e:
        # Checked before RequestException, which it subclasses (otherwise it would never be reached).
        print("Failed to decode JSON response:", e)
        return None
    except requests.exceptions.RequestException as e:
        print("Request error:", e)
        return None


def translate_text_dict(text_dict: dict) -> dict:
    """Translate every document in {id: [sentences]} using a thread pool."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # One worker thread per document; within a document the sentences are translated sequentially.
        results = executor.map(
            lambda args: (args[0], list(map(translate_text, args[1]))), text_dict.items()
        )
        output_dict = dict(results)
    return output_dict
As described, the input dictionary has a unique id as key and, as value, the list of sentences to translate. Could this be improved?