Latency metric failing when evaluating a model with multiple output columns


I am building a RAG system on Azure Databricks and am having trouble evaluating the pyfunc models we save to MLflow. The predict method of the model class outputs a pandas DataFrame with three columns, answers, sources, and prompts, for auditability:

return pd.DataFrame({'answers': answers, 'sources': sources, 'prompts': prompts})
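For context, the model class looks roughly like this (a minimal sketch: the retrieval and generation logic is omitted, and the input column name question is illustrative, not the real schema):

import mlflow
import pandas as pd

class RAGModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        answers, sources, prompts = [], [], []
        for question in model_input["question"]:
            # retrieval + generation happens here (omitted)
            answers.append("...")
            sources.append("...")
            prompts.append("...")
        # one row per input question, three output columns for auditability
        return pd.DataFrame({'answers': answers, 'sources': sources, 'prompts': prompts})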

However, I am running into issues when using mlflow.evaluate() on these model versions.

Issue: this model will be used as a chatbot, so latency and response size are key metrics to evaluate. As such, we specify latency and token_count as extra metrics, which results in the following error:

ValueError: cannot reindex on an axis with duplicate labels

Evaluation code:

evaluation_results = mlflow.evaluate(
    model=f'models:/{model_name}/{model_version}',  # registered pyfunc model
    data=data,                                      # evaluation inputs
    predictions="answers",                          # output column to treat as predictions
    extra_metrics=[
        mlflow.metrics.latency(),      # per-row prediction latency
        mlflow.metrics.token_count()   # token count of each prediction
    ]
)
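For reference, data is a pandas DataFrame of evaluation inputs along these lines (the column name question and the example rows are illustrative):

import pandas as pd

data = pd.DataFrame({
    "question": [
        "What is our refund policy?",
        "How do I escalate a support ticket?",
    ],
})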

I am using mlflow==2.8.0. My key goal is to see, in the MLflow evaluation UI, a comparison of answers, sources, prompts, latency, and token count across different experiment runs.
