Manage Serverless endpoints, including creating, listing, updating, and deleting endpoints.
runpodctl serverless <subcommand> [flags]

Alias

You can use sls as a shorthand for serverless:
runpodctl sls list

Subcommands

List endpoints

List all your Serverless endpoints:
runpodctl serverless list

List flags

--include-template (bool): Include template information in the output.
--include-workers (bool): Include workers information in the output.

Get endpoint details

Get detailed information about a specific endpoint:
runpodctl serverless get <endpoint-id>

Get flags

--include-template (bool): Include template information in the output.
--include-workers (bool): Include workers information in the output.

Create an endpoint

Create a new Serverless endpoint from a template or from a Hub repo:
# Create from a template
runpodctl serverless create --name "my-endpoint" --template-id "tpl_abc123"

# Create from a Hub repo
runpodctl hub search vllm                                         # Find the hub ID
runpodctl serverless create --hub-id cm8h09d9n000008jvh2rqdsmb --name "my-vllm"
When using --hub-id, GPU IDs and container disk size are automatically pulled from the Hub release config. You can override the GPU type with --gpu-id.
Serverless templates vs. Pod templates: Serverless endpoints require a Serverless-specific template. Pod templates (like runpod-torch-v21) cannot be used because they include Pod-specific configuration that Serverless does not support. When creating a template with runpodctl template create, use the --serverless flag to create a Serverless template. Each Serverless template can only be bound to one endpoint at a time; to create multiple endpoints with the same configuration, create a separate template for each.

Create flags

--name (string): Name for the endpoint.
--template-id (string): Template ID to use (required if --hub-id is not specified). Use runpodctl template search to find templates.
--hub-id (string): Hub listing ID to deploy from (alternative to --template-id). Use runpodctl hub search to find repos.
--gpu-id (string): GPU type for workers. Use runpodctl gpu list to see available GPUs.
--gpu-count (int, default: 1): Number of GPUs per worker.
--compute-type (string, default: GPU): Compute type (GPU or CPU).
--workers-min (int, default: 0): Minimum number of workers.
--workers-max (int, default: 3): Maximum number of workers.
--data-center-ids (string): Comma-separated list of preferred datacenter IDs. Use runpodctl datacenter list to see available datacenters.
--network-volume-id (string): Network volume ID to attach. Use runpodctl network-volume list to see available network volumes.
--network-volume-ids (string): Comma-separated list of network volume IDs to attach. Use this when attaching multiple network volumes to an endpoint.
--min-cuda-version (string): Minimum CUDA version required for workers (e.g., 12.4). Workers are only scheduled on machines that meet this CUDA version requirement.
--scaler-type (string, default: QUEUE_DELAY): Autoscaler type (QUEUE_DELAY or REQUEST_COUNT). QUEUE_DELAY scales based on queue wait time; REQUEST_COUNT scales based on concurrent requests.
--scaler-value (int): Scaler threshold value. For QUEUE_DELAY, this is the target delay in seconds; for REQUEST_COUNT, this is the number of concurrent requests per worker before scaling.
--idle-timeout (int): Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 5-3600 seconds.
--flash-boot (bool): Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.
--execution-timeout (int): Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.
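The timeout flags above have constraints worth sanity-checking before you create an endpoint. The following sketch (with assumed values, and based only on the flag descriptions above, not the CLI source) shows the documented 5-3600 s range for --idle-timeout and the seconds-to-milliseconds conversion the CLI applies to --execution-timeout:

```shell
# Assumed example value; the docs state the valid range is 5-3600 seconds.
idle_timeout=30
if [ "$idle_timeout" -ge 5 ] && [ "$idle_timeout" -le 3600 ]; then
  echo "idle-timeout ${idle_timeout}s is in range"
fi

# The CLI accepts --execution-timeout in seconds and converts to milliseconds.
execution_timeout_s=600
echo "execution-timeout sent to the API: $((execution_timeout_s * 1000)) ms"
```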

Update an endpoint

Update endpoint configuration:
runpodctl serverless update <endpoint-id> --workers-max 5

Update flags

--name (string): New name for the endpoint.
--workers-min (int): New minimum number of workers.
--workers-max (int): New maximum number of workers.
--idle-timeout (int): New idle timeout in seconds.
--scaler-type (string): Scaler type (QUEUE_DELAY or REQUEST_COUNT).
--scaler-value (int): Scaler threshold value.
--flash-boot (bool): Enable or disable flash boot for faster worker startup.
--execution-timeout (int): Execution timeout in seconds. Jobs that exceed this duration are terminated.
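To make the two --scaler-type modes concrete, here is an illustration with made-up numbers of how each mode reads --scaler-value. This is a sketch derived from the flag descriptions above, not the actual autoscaler implementation; the round-up worker math for REQUEST_COUNT is an assumption:

```shell
# QUEUE_DELAY: scale up once requests wait longer than the threshold (seconds).
queue_delay=7   # assumed current queue wait
threshold=4     # assumed --scaler-value
if [ "$queue_delay" -gt "$threshold" ]; then
  echo "QUEUE_DELAY: queue wait ${queue_delay}s > ${threshold}s, scale up"
fi

# REQUEST_COUNT: roughly one worker per <threshold> concurrent requests
# (rounded up here for illustration).
requests=25     # assumed concurrent requests
per_worker=10   # assumed --scaler-value
echo "REQUEST_COUNT: $(( (requests + per_worker - 1) / per_worker )) workers needed"
```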

Delete an endpoint

Delete an endpoint:
runpodctl serverless delete <endpoint-id>

Serverless URLs

Access your Serverless endpoint using these URL patterns:
Operation       URL
Async request   https://api.runpod.ai/v2/<endpoint-id>/run
Sync request    https://api.runpod.ai/v2/<endpoint-id>/runsync
Health check    https://api.runpod.ai/v2/<endpoint-id>/health
Job status      https://api.runpod.ai/v2/<endpoint-id>/status/<job-id>
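The URL patterns above can be assembled from an endpoint ID like so. The endpoint ID here is a placeholder, and the commented-out request assumes your API key is exported as RUNPOD_API_KEY and sent as a bearer token (verify the auth header against the RunPod API reference):

```shell
# Placeholder endpoint ID for illustration.
endpoint_id="abc123"
base="https://api.runpod.ai/v2/${endpoint_id}"

echo "async:  ${base}/run"
echo "sync:   ${base}/runsync"
echo "health: ${base}/health"

# Submit an async job (assumed auth scheme; requires a real endpoint and key):
# curl -X POST "${base}/run" \
#   -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
#   -H "Content-Type: application/json" \
#   -d '{"input": {"prompt": "hello"}}'
```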