I'm currently facing a challenge with querying a model deployed on Databricks using MLflow. The model, which is implemented as a Python function, reads a JSON file, performs some processing using libraries such as spaCy and NLTK, and returns an output.
Here's a simplified version of the deployment code:
import json

import mlflow
import pandas as pd


def query_process_helper(input):
    # Read the JSON file that I log as a model artifact below
    with open('my_file.json', 'r') as j:
        data = json.loads(j.read())
    # Perform some processing and store the result in output
    return output


def query_processing(model_input):
    return query_process_helper(model_input)


mlflow_artifact_path = 'my_files'  # I've ensured this location contains my_file.json

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path=mlflow_artifact_path,
        python_model=query_processing,
        input_example=pd.DataFrame({'text': [sample_input]}),
        pip_requirements=['spacy==3.7.2', 'nltk==3.8.1'],
        artifacts={
            "my_file.json": mlflow_artifact_path
        }
    )
Error Message: [Errno 2] No such file or directory: 'my_files/my_file.json'
The model registration and the endpoint creation seem to be successful; no errors are reported in the build and service logs on Databricks.
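For context, my understanding from the MLflow documentation is that files passed via the artifacts argument are meant to be accessed through context.artifacts in a mlflow.pyfunc.PythonModel subclass, rather than opened by a hard-coded relative path. Below is a minimal sketch of that pattern as I understand it (the class name and the local path passed to artifacts are placeholders), in case I'm misreading how the artifact key maps to a file at serving time:

import json

import mlflow


class QueryProcessor(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # context.artifacts maps the key used in log_model(artifacts=...)
        # to a local path where MLflow has placed the file at load time.
        with open(context.artifacts["my_file.json"], "r") as j:
            self.data = json.loads(j.read())

    def predict(self, context, model_input):
        # Placeholder for the actual spaCy/NLTK processing.
        return model_input


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="my_files",
        python_model=QueryProcessor(),
        artifacts={"my_file.json": "my_files/my_file.json"},  # placeholder local path
    )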
I've also looked at this post, but the accepted solution did not resolve my issue. I'm aiming to receive a direct response from the endpoint without explicitly loading the model again.
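To clarify what I mean by querying the endpoint directly, this is roughly the kind of request I'm sending to the Databricks serving endpoint (the workspace URL, endpoint name, and token below are placeholders):

import requests

# Placeholders: substitute the actual workspace URL, endpoint name, and token.
url = "https://<databricks-instance>/serving-endpoints/<endpoint-name>/invocations"
headers = {
    "Authorization": "Bearer <access-token>",
    "Content-Type": "application/json",
}
payload = {"dataframe_records": [{"text": "some sample text"}]}

response = requests.post(url, headers=headers, json=payload)
print(response.status_code, response.text)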
Any insights, suggestions, or alternative approaches would be greatly appreciated.