Alias
You can use `sls` as a shorthand for `serverless`:
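For example, the following two invocations are equivalent (a sketch — the `list` subcommand shown here is assumed from the sections below):

```shell
# Both forms run the same command; "sls" is an alias for "serverless".
runpodctl serverless list
runpodctl sls list
```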
Subcommands
List endpoints
List all your Serverless endpoints:

List flags
- Include template information in the output.
- Include workers information in the output.
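A hedged sketch of the list command — the `--include-template` and `--include-workers` flag names are hypothetical, inferred from the flag descriptions above; run the command with `--help` for the authoritative names:

```shell
# List all Serverless endpoints.
runpodctl serverless list

# Include template and worker details in the output
# (flag names are hypothetical):
runpodctl serverless list --include-template --include-workers
```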
Get endpoint details
Get detailed information about a specific endpoint:

Get flags
- Include template information in the output.
- Include workers information in the output.
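A sketch of fetching one endpoint's details — the positional `<endpoint-id>` argument and the flag names are assumptions; check `--help` for the real invocation:

```shell
# Get details for a single endpoint (argument and flag names are hypothetical).
runpodctl serverless get <endpoint-id>

# With template and worker details included:
runpodctl serverless get <endpoint-id> --include-template --include-workers
```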
Create an endpoint
Create a new Serverless endpoint from a template or from a Hub repo:

When deploying with `--hub-id`, GPU IDs and container disk size are automatically pulled from the Hub release config. You can override the GPU type with `--gpu-id`.
Serverless templates vs Pod templates: Serverless endpoints require a Serverless-specific template. Pod templates (like `runpod-torch-v21`) cannot be used because they include configuration that Serverless does not support. When creating a template with `runpodctl template create`, use the `--serverless` flag to create a Serverless template.

Each Serverless template can only be bound to one endpoint at a time. To create multiple endpoints with the same configuration, create separate templates for each.

Create flags
- Name for the endpoint.
- Template ID to use (required if `--hub-id` is not specified). Use `runpodctl template search` to find templates.
- Hub listing ID to deploy from (alternative to `--template-id`). Use `runpodctl hub search` to find repos.
- GPU type for workers. Use `runpodctl gpu list` to see available GPUs.
- Number of GPUs per worker.
- Compute type (`GPU` or `CPU`).
- Minimum number of workers.
- Maximum number of workers.
- Comma-separated list of preferred datacenter IDs. Use `runpodctl datacenter list` to see available datacenters.
- Network volume ID to attach. Use `runpodctl network-volume list` to see available network volumes.
- Comma-separated list of network volume IDs to attach. Use this when attaching multiple network volumes to an endpoint.
- Minimum CUDA version required for workers (e.g., `12.4`). Workers will only be scheduled on machines that meet this CUDA version requirement.
- Autoscaler type (`QUEUE_DELAY` or `REQUEST_COUNT`). `QUEUE_DELAY` scales based on queue wait time; `REQUEST_COUNT` scales based on concurrent requests.
- Scaler threshold value. For `QUEUE_DELAY`, this is the target delay in seconds. For `REQUEST_COUNT`, this is the number of concurrent requests per worker before scaling.
- Idle timeout in seconds. Workers shut down after being idle for this duration. Valid range: 5-3600 seconds.
- Enable or disable flash boot for faster worker startup. When enabled, workers start from cached container images.
- Execution timeout in seconds. Jobs that exceed this duration are terminated. The CLI accepts seconds but converts to milliseconds internally.
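Putting the flags together, a hedged sketch of the two creation paths. `--hub-id` and `--gpu-id` come from the notes above; the other flag names (`--name`, `--template-id`, `--min-workers`, and so on) are hypothetical, inferred from the flag descriptions — verify them with `--help`:

```shell
# Create from an existing Serverless template
# (flag names other than --hub-id/--gpu-id are hypothetical).
runpodctl serverless create \
  --name my-endpoint \
  --template-id <template-id> \
  --min-workers 0 \
  --max-workers 3 \
  --idle-timeout 5

# Or deploy from a Hub repo. GPU IDs and container disk size come from
# the Hub release config; --gpu-id overrides the GPU type.
runpodctl serverless create \
  --name my-hub-endpoint \
  --hub-id <hub-id> \
  --gpu-id <gpu-id>
```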
Update an endpoint
Update endpoint configuration:

Update flags
- New name for the endpoint.
- New minimum number of workers.
- New maximum number of workers.
- New idle timeout in seconds.
- Scaler type (`QUEUE_DELAY` or `REQUEST_COUNT`).
- Scaler value.
- Enable or disable flash boot for faster worker startup.
- Execution timeout in seconds. Jobs that exceed this duration are terminated.
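A sketch of an update call — the endpoint argument and flag names here are hypothetical, inferred from the flag descriptions above; confirm them with `--help`:

```shell
# Rescale an existing endpoint (argument and flag names are hypothetical).
runpodctl serverless update <endpoint-id> \
  --min-workers 1 \
  --max-workers 5 \
  --execution-timeout 600
```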
Delete an endpoint
Delete an endpoint:

Serverless URLs
Access your Serverless endpoint using these URL patterns:

| Operation | URL |
|---|---|
| Async request | https://api.runpod.ai/v2/<endpoint-id>/run |
| Sync request | https://api.runpod.ai/v2/<endpoint-id>/runsync |
| Health check | https://api.runpod.ai/v2/<endpoint-id>/health |
| Job status | https://api.runpod.ai/v2/<endpoint-id>/status/<job-id> |
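The URL patterns above can be exercised with curl. This is a sketch: the `{"input": ...}` payload shape and the `Authorization: Bearer` header follow common RunPod usage but should be confirmed against the API reference, and `<endpoint-id>` / `<job-id>` are placeholders for your own values:

```shell
# Submit an async job to the /run URL (payload shape and auth header
# are assumptions; see the RunPod API reference).
curl -s "https://api.runpod.ai/v2/<endpoint-id>/run" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello"}}'

# The response includes a job ID; poll the /status URL to check on it:
curl -s "https://api.runpod.ai/v2/<endpoint-id>/status/<job-id>" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}"
```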