I’m running a Google Cloud Run service that exposes an API endpoint performing heavy, CPU-bound computation. The API is called only occasionally and, for now, never concurrently. I want each request to execute as fast as possible.
I’ve configured the service with the maximum available CPU and memory settings (8 vCPU, 32 GB memory), but the overall execution time of the API has not improved compared to smaller configurations. From the metrics in Cloud Monitoring, both CPU utilization and memory usage remain consistently low (around 10–20%), even though the process should be CPU-intensive. This makes me suspect the container isn’t getting full CPU utilization or there might be throttling or configuration limiting performance.
Or I am missing something completely?
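To rule out the container simply not seeing all the vCPUs, here is a small diagnostic I can run inside the service (illustrative, not part of the real workload): it prints the CPU count the runtime reports and busy-loops on one thread, comparing CPU time to wall time. A ratio near 1.0 means a single-threaded loop can fully use one core; much less would suggest external throttling.

```python
import os
import time

def burn(seconds: float = 2.0) -> float:
    """Busy-loop for roughly `seconds` of wall time, then return the
    ratio of CPU time to wall time for this process. ~1.0 means one
    core was fully available to this single-threaded loop."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    x = 0
    while time.monotonic() - wall_start < seconds:
        x += 1  # pure CPU work, no I/O
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall

print("visible CPUs:", os.cpu_count())
print("CPU/wall ratio:", round(burn(), 2))
```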
Expected behavior:
- When given a larger CPU and memory configuration, the API process should execute faster and utilize more CPU resources proportionally (ideally 80–90% CPU utilization during heavy computation).
Current configuration:
- Platform: Cloud Run (fully managed)
- Concurrency: 1 (to isolate computation per request)
- CPU allocation: 8 vCPUs
- Memory allocation: 32 GB
- Execution timeout: 60 minutes
- Request duration: ~30–45 minutes
- CPU utilization: 10–20%

Questions:
- Is Cloud Run limiting CPU usage during request processing even when configured with max vCPUs?
- Is there a better setup to ensure full CPU utilization for heavy computation workloads?
- Are there recommended configurations or patterns for CPU-bound workloads that need to complete large computations under the 60-minute limit?
Additional information:
- The code parallelizes the computation in Python (multiple threads/workers).
- No I/O bottleneck observed.
- Tested with different machine configurations — performance is mostly unchanged.
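For context, the parallel structure looks roughly like the sketch below (simplified; `heavy_task` is a stand-in for the real computation, not the actual code). I include it because I wonder whether the choice of threads vs. processes matters here: for pure-Python CPU work, the GIL serializes threads, whereas processes can each occupy a separate core.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def heavy_task(n: int) -> int:
    # Stand-in for the real CPU-bound work: pure-Python sum of squares.
    return sum(i * i for i in range(n))

def run_parallel(executor_cls, chunks):
    # Fan the chunks out to the given executor and collect results in order.
    with executor_cls() as ex:
        return list(ex.map(heavy_task, chunks))

if __name__ == "__main__":
    chunks = [200_000] * 8
    # Threads: the GIL serializes pure-Python CPU work, so this tends to
    # keep only ~1 core busy regardless of the vCPU count.
    t_results = run_parallel(ThreadPoolExecutor, chunks)
    # Processes: each worker has its own interpreter and can use its own core.
    p_results = run_parallel(ProcessPoolExecutor, chunks)
    assert t_results == p_results
```

If the real workload is thread-based like the first variant, that alone might explain the flat 10–20% utilization on 8 vCPUs.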
I'd be most grateful for any pointers as to what I am doing wrong...