Alias
You can use `sls` as a shorthand for `serverless`:
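For example, the following two invocations are equivalent (a sketch — the `list` subcommand shown here is assumed from the sections below):

```shell
# Both forms run the same command; "sls" is an alias for "serverless".
runpodctl serverless list
runpodctl sls list
```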
Subcommands
List endpoints
List all your Serverless endpoints:

List flags
- Include template information in the output.
- Include workers information in the output.
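A hedged sketch of the list command — the `--include-template` and `--include-workers` flag names are hypothetical, inferred from the flag descriptions above; run the command with `--help` for the authoritative names:

```shell
# List all Serverless endpoints.
runpodctl serverless list

# Include template and worker details in the output
# (flag names are hypothetical):
runpodctl serverless list --include-template --include-workers
```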
Get endpoint details
Get detailed information about a specific endpoint:

Get flags
- Include template information in the output.
- Include workers information in the output.
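A sketch of fetching one endpoint's details — the positional `<endpoint-id>` argument and the flag names are assumptions; check `--help` for the real invocation:

```shell
# Get details for a single endpoint (argument and flag names are hypothetical).
runpodctl serverless get <endpoint-id>

# With template and worker details included:
runpodctl serverless get <endpoint-id> --include-template --include-workers
```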
Create an endpoint
Create a new Serverless endpoint from a template or from a Hub repo:

When deploying with `--hub-id`, GPU IDs and container disk size are automatically pulled from the Hub release config. You can override the GPU type with `--gpu-id`.
Serverless templates vs Pod templates: Serverless endpoints require a Serverless-specific template. Pod templates (like `runpod-torch-v21`) cannot be used because they include configuration that Serverless does not support. When creating a template with `runpodctl template create`, use the `--serverless` flag to create a Serverless template.

Each Serverless template can only be bound to one endpoint at a time. To create multiple endpoints with the same configuration, create separate templates for each.

Create flags
- Name for the endpoint.
- Template ID to use (required if `--hub-id` is not specified). Use `runpodctl template search` to find templates.
- Hub listing ID to deploy from (alternative to `--template-id`). Use `runpodctl hub search` to find repos.
- GPU type for workers. Use `runpodctl gpu list` to see available GPUs.
- Number of GPUs per worker.
- Compute type (`GPU` or `CPU`).
- Minimum number of workers.
- Maximum number of workers.
- Comma-separated list of preferred datacenter IDs. Use `runpodctl datacenter list` to see available datacenters.
- Network volume ID to attach. Use `runpodctl network-volume list` to see available network volumes.
- Comma-separated list of network volume IDs to attach. Use this when attaching multiple network volumes to an endpoint.
- Minimum CUDA version required for workers (e.g., `12.4`). Workers will only be scheduled on machines that meet this CUDA version requirement.
- Autoscaler type (`QUEUE_DELAY` or `REQUEST_COUNT`). `QUEUE_DELAY` scales based on queue wait time; `REQUEST_COUNT` scales based on concurrent requests.
- Scaler threshold value. For `QUEUE_DELAY`, this is the target delay in seconds. For `REQUEST_COUNT`, this is the number of concurrent requests per worker before scaling.
- Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 5-3600 seconds.
- Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.
- Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.
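Putting the flags together, a hedged sketch of the two creation paths. `--hub-id` and `--gpu-id` come from the notes above; the other flag names (`--name`, `--template-id`, `--min-workers`, and so on) are hypothetical, inferred from the flag descriptions — verify them with `--help`:

```shell
# Create from an existing Serverless template
# (flag names other than --hub-id/--gpu-id are hypothetical).
runpodctl serverless create \
  --name my-endpoint \
  --template-id <template-id> \
  --min-workers 0 \
  --max-workers 3 \
  --idle-timeout 5

# Or deploy from a Hub repo. GPU IDs and container disk size come from
# the Hub release config; --gpu-id overrides the GPU type.
runpodctl serverless create \
  --name my-hub-endpoint \
  --hub-id <hub-id> \
  --gpu-id <gpu-id>
```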
Update an endpoint
Update endpoint configuration:

Update flags
- New name for the endpoint.
- New minimum number of workers.
- New maximum number of workers.
- New idle timeout in seconds.
- Scaler type (`QUEUE_DELAY` or `REQUEST_COUNT`).
- Scaler value.
- Enable or disable flash boot for faster worker startup.
- Execution timeout in seconds. Jobs that exceed this duration are terminated.
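A sketch of an update call — the endpoint argument and flag names here are hypothetical, inferred from the flag descriptions above; confirm them with `--help`:

```shell
# Rescale an existing endpoint (argument and flag names are hypothetical).
runpodctl serverless update <endpoint-id> \
  --min-workers 1 \
  --max-workers 5 \
  --execution-timeout 600
```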
Delete an endpoint
Delete an endpoint:

Serverless URLs
Access your Serverless endpoint using these URL patterns:

| Operation | URL |
|---|---|
| Async request | https://api.runpod.ai/v2/<endpoint-id>/run |
| Sync request | https://api.runpod.ai/v2/<endpoint-id>/runsync |
| Health check | https://api.runpod.ai/v2/<endpoint-id>/health |
| Job status | https://api.runpod.ai/v2/<endpoint-id>/status/<job-id> |
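The URL patterns above can be exercised with curl. This is a sketch: the `{"input": ...}` payload shape and the `Authorization: Bearer` header follow common RunPod usage but should be confirmed against the API reference, and `<endpoint-id>` / `<job-id>` are placeholders for your own values:

```shell
# Submit an async job to the /run URL (payload shape and auth header
# are assumptions; see the RunPod API reference).
curl -s "https://api.runpod.ai/v2/<endpoint-id>/run" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello"}}'

# The response includes a job ID; poll the /status URL to check on it:
curl -s "https://api.runpod.ai/v2/<endpoint-id>/status/<job-id>" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}"
```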