ClearML webapp is slow


We manage our own ClearML server on an AWS EC2 instance. Instance type: t3.xlarge (4 vCPUs, 16 GiB memory). Data disk: gp3 (size: 200 GB, IOPS: 3,000, throughput: 125 MiB/s).

We have 3 ClearML projects: one with 643,000 experiments, another with 151,000, and a small one with 25,000. Total experiments across all projects: 819,000.

The ClearML webapp is very slow. For example, it takes about 30 seconds just to load the main dashboard. Searching for a specific experiment by ID is also very slow.

What can we do to improve the performance?

We tried adding more memory, and it improved the performance, but only a little. It is still too slow.


There is 1 answer

Martin.B

Disclaimer: I'm a member of the ClearML team (formerly Trains)

I think your issue is simply caused by the number of serving processes in the server's apiserver component (probably 1 process at the moment).

Assuming you are using the docker-compose deployment of ClearML Server, you can increase the number of processes by adding the CLEARML_USE_GUNICORN=1 environment variable to the apiserver service.

This runs the apiserver component with 8 processes by default. To specify a different number of processes, also add the CLEARML_GUNICORN_WORKERS environment variable (for example, CLEARML_GUNICORN_WORKERS=12 for 12 processes).
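
As a rough sketch, the two variables could be added to the apiserver service in your docker-compose file like this. It assumes the service is named apiserver and that its environment section uses the key/value mapping form; if your file uses the list form (- NAME=value), adapt accordingly. Only the two variables are shown, and the rest of the service definition should stay as it is.

```yaml
services:
  apiserver:
    # ...keep the existing image, volumes, ports, etc. unchanged...
    environment:
      # Run the apiserver under gunicorn with multiple worker processes
      CLEARML_USE_GUNICORN: "1"
      # Optional: override the default of 8 worker processes (12 is only an example)
      CLEARML_GUNICORN_WORKERS: "12"
```

The containers most likely need to be recreated for the change to take effect (for example with docker-compose up -d, run against the same compose file).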

Please note that this mode (and, of course, more processes) requires more CPU and RAM resources. I believe your current setup should be enough for 8 processes, but I would recommend monitoring the machine's CPU and RAM usage and upgrading as required.