MLflow not logging all data to DagsHub when using GitHub Actions

I've set up a script that pulls data from my DagsHub repo, trains some models on it, and uses MLflow to log the training and evaluation data to the MLflow server associated with the DagsHub repo. This works locally, but when I run it through GitHub Actions it logs only some training parameters and none of the model-specific parameters or test metrics.

Looking through the logs, I noticed an exception and a warning that I'm not sure how to fix. Any advice would be greatly appreciated.

Exception: The following failures occurred while performing one or more logging operations: [MlflowException('Failed to perform one or more operations on the run with ID f72e708e6f7b43c49e88769357547b54. Failed operations: [RestException("INVALID_PARAMETER_VALUE: Response: {\'error_code\': \'INVALID_PARAMETER_VALUE\'}")]')]
2024-03-16T11:05:16.7804154Z 2024/03/16 11:05:16 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: The following failures occurred while performing one or more logging operations: [MlflowException('Failed to perform one or more operations on the run with ID f72e708e6f7b43c49e88769357547b54. Failed operations: [RestException("INVALID_PARAMETER_VALUE: Response: {\'error_code\': \'INVALID_PARAMETER_VALUE\'}")]')]

2024-03-16T11:05:16.5074723Z 2024/03/16 11:05:16 WARNING mlflow.models.model: Logging model metadata to the tracking server has failed. The model artifacts have been logged successfully under mlflow-artifacts:/44ab2167890f4d81a6a74d258b2e05f0/f72e708e6f7b43c49e88769357547b54/artifacts. Set logging level to DEBUG via `logging.getLogger("mlflow").setLevel(logging.DEBUG)` to see the full traceback.
2024-03-16T11:05:16.5092754Z 2024/03/16 11:05:16 DEBUG mlflow.models.model: 
2024-03-16T11:05:16.5093642Z urllib3.exceptions.ResponseError: too many 500 error responses
2024-03-16T11:05:16.5094263Z 
2024-03-16T11:05:16.5094676Z The above exception was the direct cause of the following exception:
2024-03-16T11:05:16.5095344Z 
2024-03-16T11:05:16.5095554Z Traceback (most recent call last):
2024-03-16T11:05:16.5104112Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2024-03-16T11:05:16.5104940Z     resp = conn.urlopen(
2024-03-16T11:05:16.5105853Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5106931Z     return self.urlopen(
2024-03-16T11:05:16.5107877Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5108643Z     return self.urlopen(
2024-03-16T11:05:16.5109452Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5110215Z     return self.urlopen(
2024-03-16T11:05:16.5110524Z   [Previous line repeated 2 more times]
2024-03-16T11:05:16.5111375Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 938, in urlopen
2024-03-16T11:05:16.5112261Z     retries = retries.increment(method, url, response=response, _pool=self)
2024-03-16T11:05:16.5113237Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
2024-03-16T11:05:16.5114221Z     raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
2024-03-16T11:05:16.5116015Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
2024-03-16T11:05:16.5117136Z 
2024-03-16T11:05:16.5117395Z During handling of the above exception, another exception occurred:
2024-03-16T11:05:16.5117767Z 
2024-03-16T11:05:16.5117889Z Traceback (most recent call last):
2024-03-16T11:05:16.5118782Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 108, in http_request
2024-03-16T11:05:16.5119578Z     return _get_http_response_with_retries(
2024-03-16T11:05:16.5120873Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/request_utils.py", line 212, in _get_http_response_with_retries
2024-03-16T11:05:16.5121907Z     return session.request(method, url, allow_redirects=allow_redirects, **kwargs)
2024-03-16T11:05:16.5122998Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2024-03-16T11:05:16.5123746Z     resp = self.send(prep, **send_kwargs)
2024-03-16T11:05:16.5124557Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2024-03-16T11:05:16.5125264Z     r = adapter.send(request, **kwargs)
2024-03-16T11:05:16.5126058Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/adapters.py", line 510, in send
2024-03-16T11:05:16.5126785Z     raise RetryError(e, request=request)
2024-03-16T11:05:16.5128220Z requests.exceptions.RetryError: HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
2024-03-16T11:05:16.5129323Z 
2024-03-16T11:05:16.5129563Z During handling of the above exception, another exception occurred:
2024-03-16T11:05:16.5129939Z 
2024-03-16T11:05:16.5130056Z Traceback (most recent call last):
2024-03-16T11:05:16.5131031Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/models/model.py", line 625, in log
2024-03-16T11:05:16.5131858Z     mlflow.tracking.fluent._record_logged_model(mlflow_model, run_id)
2024-03-16T11:05:16.5132906Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1413, in _record_logged_model
2024-03-16T11:05:16.5133789Z     MlflowClient()._record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5134816Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1831, in _record_logged_model
2024-03-16T11:05:16.5135744Z     self._tracking_client._record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5136843Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 524, in _record_logged_model
2024-03-16T11:05:16.5137764Z     self.store.record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5138799Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 344, in record_logged_model
2024-03-16T11:05:16.5139668Z     self._call_endpoint(LogModel, req_body)
2024-03-16T11:05:16.5140608Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 60, in _call_endpoint
2024-03-16T11:05:16.5141619Z     return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
2024-03-16T11:05:16.5142714Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 219, in call_endpoint
2024-03-16T11:05:16.5143500Z     response = http_request(**call_kwargs)
2024-03-16T11:05:16.5144376Z   File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 130, in http_request
2024-03-16T11:05:16.5145312Z     raise MlflowException(f"API request to {url} failed with exception {e}")
2024-03-16T11:05:16.5147853Z mlflow.exceptions.MlflowException: API request to https://dagshub.com/***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model failed with exception HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
There is 1 answer

Tolstoyevsky (best answer)

It looks like this is an MLflow version compatibility issue.

At the time of writing, DagsHub runs MLflow v2.7, and v2.10 introduced a compatibility-breaking feature called "Model Signature Supports Objects and Arrays": https://github.com/mlflow/mlflow/releases/tag/v2.10.0

It seems likely that the environment where this works uses an MLflow client older than 2.10, and that pinning the same version in your GitHub Actions workflow will solve the issue for now.
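One way to confirm the mismatch is to print the MLflow client version at the start of the training script and compare it with the 2.10 cutoff mentioned above. The snippet below is only a sketch: the helper names (`parse_major_minor`, `client_is_too_new`, `installed_mlflow_version`) are made up for illustration, and the cutoff is an assumption taken from this answer rather than anything DagsHub publishes through an API.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption from the answer above: the server runs MLflow 2.7, and clients
# from 2.10 onward may send model metadata that it rejects.
FIRST_INCOMPATIBLE = (2, 10)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.9.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def client_is_too_new(
    version: str, first_incompatible: Tuple[int, int] = FIRST_INCOMPATIBLE
) -> bool:
    """True if the client version is at or above the assumed cutoff."""
    return parse_major_minor(version) >= first_incompatible


def installed_mlflow_version() -> Optional[str]:
    """Return the installed mlflow version, or None if it is not installed."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_mlflow_version()
    if version is None:
        print("mlflow is not installed in this environment")
    elif client_is_too_new(version):
        print(f"mlflow {version} may be too new for the server; try pinning below 2.10")
    else:
        print(f"mlflow {version} is below the assumed 2.10 cutoff")
```

If the version printed in the GitHub Actions log is 2.10 or later while your local one is older, pinning the dependency (for example, a requirement such as `mlflow<2.10` in whatever file your workflow installs from) should bring the two environments in line.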

For fast support on DagsHub, I'd recommend joining the community Discord to get direct support from the team: https://discord.com/invite/9gU36Y6