I am trying to use Cloud Run to run a microservice connected to Firestore. The microservice creates objects based on s2geometry to define multiple geographical zones with specific attributes, which lets me locate users and send them information according to the zone they are in.
I used Python 3.7 and FastAPI to create the microservice and the routes to communicate with it.
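To give an idea of the setup, the service looks roughly like this (a simplified sketch, not the actual code: the route, the cell level and the Firestore collection name are made up, and I use the s2sphere port of s2geometry here just for the example):

```python
# Rough sketch of the service layout (illustrative names, not the real code)
from fastapi import FastAPI
from google.cloud import firestore
import s2sphere  # pure-Python port of s2geometry, used here for illustration

app = FastAPI()
db = firestore.Client()

@app.get("/zone")
def get_zone(lat: float, lng: float):
    # Map the user's position to an S2 cell and look up the matching zone
    cell = s2sphere.CellId.from_lat_lng(
        s2sphere.LatLng.from_degrees(lat, lng)
    ).parent(12)  # arbitrary cell level for the example
    doc = db.collection("zones").document(str(cell.id())).get()
    return doc.to_dict() if doc.exists else {"zone": None}
```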
The microservice runs smoothly on my local machine and on Compute Engine instances, as most of my routes take less than 150 ms to answer when I test them. However, I have a latency issue when I deploy it with Cloud Run. From time to time the microservice takes a really long time to answer (up to 15 minutes) and I can't pinpoint what exactly causes it.
Here is a screenshot where we can see the Request Count and the Request Latency:
Request Count and Request Latency
There is no real correlation between the request latency and the number of requests, or at least no obvious one. I also looked at the memory usage of the service, which stays at 30% at most. The CPU usage, however, sometimes hits 100%, but not necessarily when requests are slow.
Finally, when I explored the Trace List and compared high-latency requests with fast ones, I noticed the following difference:
Trace of slow request
Trace of fast request
Fast requests seem to call themselves whereas slow requests don't, and I do not know why.
For now we do not have a lot of users, so I thought it could be a cold start issue, but the slow requests are not necessarily the first ones.
To be honest, I don't really know what's going on here or what Cloud Run does (or what I did wrong). I also find it pretty difficult to find a thorough explanation of how Cloud Run actually works, so if you have one (other than Google's) I would gladly dive into it.
Thank you very much for your help.
After several tests it seems that it was indeed a cold start issue. Cloud Run containers are stopped after a certain period if they are not being used, and as we did not have a lot of traffic, the container had to start up again almost every time a user wanted to access the app.
Solution:
I created a Cloud Function that sends a request to the container when triggered and then created a Cloud Scheduler job that runs the function every minute.
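The function itself is trivial. Here is a minimal sketch of it (HTTP-triggered; the Cloud Run URL and the route are placeholders, and `requests` has to be listed in the function's requirements.txt):

```python
# Minimal HTTP-triggered Cloud Function that pings the Cloud Run service
# so that an instance stays warm (the URL below is a placeholder).
import requests

SERVICE_URL = "https://my-service-xxxxxxxx-ew.a.run.app/health"

def keep_warm(request):
    # Called every minute by the Cloud Scheduler job; any lightweight route will do.
    resp = requests.get(SERVICE_URL, timeout=30)
    return f"Pinged {SERVICE_URL}: {resp.status_code}", 200
```

The Cloud Scheduler job then simply targets the function's trigger URL with a `* * * * *` schedule.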
Note:
If different revisions are routed to your service, you need to create a Cloud Scheduler job for each of them. To do so, you have to create a Revision URL (tag) for each routed revision (currently in beta).
Edit:
Now, as @Jofre mentioned, you can choose to always have an instance of your service running by setting the "Minimum number of instances" to 1. If you are using the console, GCP even tells you to "Set to 1 to reduce cold starts".