If you’re using an AI coding agent like Claude Code, Cline, or Cursor, you can install the Flash skill package to give your agent detailed context about the Flash SDK:
npx skills add runpod/skills
This enables your coding agent to provide more accurate Flash code suggestions and troubleshooting help.
Create a file called gpu_demo.py and paste this code into it:
```python
import asyncio

from runpod_flash import Endpoint, GpuGroup

@Endpoint(
    name="flash-quickstart",
    gpu=GpuGroup.ANY,   # Use any available GPU
    workers=3,
    idle_timeout=300,   # Keep worker running for 5 minutes
    dependencies=["numpy", "torch"]
)
def gpu_matrix_multiply(size):
    # IMPORTANT: Import packages INSIDE the function
    import numpy as np
    import torch

    # Get GPU name
    device_name = torch.cuda.get_device_name(0)

    # Create random matrices
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Multiply matrices
    C = np.dot(A, B)

    return {
        "matrix_size": size,
        "result_mean": float(np.mean(C)),
        "gpu": device_name
    }

# Call the function
async def main():
    print("Running matrix multiplication on Runpod GPU...")
    result = await gpu_matrix_multiply(1000)
    print(f"\n✓ Matrix size: {result['matrix_size']}x{result['matrix_size']}")
    print(f"✓ Result mean: {result['result_mean']:.4f}")
    print(f"✓ GPU used: {result['gpu']}")

if __name__ == "__main__":
    asyncio.run(main())
```
Make sure you activate your virtual environment in the same directory where you created the gpu_demo.py file. If you open a new terminal, run source .venv/bin/activate before executing the script.
You’ll see Flash provision a GPU worker and execute your function:
```
Running matrix multiplication on Runpod GPU...
Creating endpoint: flash-quickstart
Provisioning Serverless endpoint...
Endpoint ready
Executing function on RunPod endpoint ID: xvf32dan8rcilp
Initial job status: IN_QUEUE
Job completed, output received
✓ Matrix size: 1000x1000
✓ Result mean: 249.8286
✓ GPU used: NVIDIA RTX A5000
```
The first run takes 30-60 seconds while Runpod provisions the endpoint, installs dependencies, and starts a worker. Subsequent runs take 2-3 seconds because the worker is already running.
If you’re having authentication issues, you can set your API key directly in your terminal:
export RUNPOD_API_KEY="your_key"
Replace your_key with your actual API key from the Runpod console.
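Before running the script, it can help to confirm that the variable is actually visible to Python. The helper below is a hypothetical convenience of our own (`check_api_key` is not part of the Flash SDK); it just inspects the environment:

```python
import os

def check_api_key(env=None):
    """Return True if RUNPOD_API_KEY is set to a non-placeholder value."""
    env = os.environ if env is None else env
    key = env.get("RUNPOD_API_KEY", "")
    return bool(key) and key != "your_key"

if __name__ == "__main__":
    if check_api_key():
        print("RUNPOD_API_KEY is set.")
    else:
        print("RUNPOD_API_KEY is missing or still the placeholder value.")
```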
With your endpoint running, make a change and run the script again:
Open gpu_demo.py and change the matrix size from 1000 to 2000:
result = await gpu_matrix_multiply(2000)
Run the script again:
python gpu_demo.py
This time, the result should appear in 1-3 seconds instead of 30-60 seconds. Flash injects your updated code into the running worker, so code changes take effect immediately without reprovisioning. This instant iteration is one of Flash’s key features: you can develop and test GPU code as quickly as local development, even though it runs on remote hardware.
gpu: Which GPU to use (GpuGroup.ANY accepts any available GPU for faster provisioning).
workers: Maximum parallel workers (allows 3 concurrent executions).
idle_timeout: Seconds a worker stays active after completing a request before scaling down. Setting this to 300 (5 minutes) gives you more time to iterate on your code while the worker remains warm.
dependencies: Python packages to install on the worker.
Function body: The matrix multiplication code runs on the remote GPU, not your local machine.
Return value: The result is returned to your local machine as a Python dictionary.
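Conceptually, the decorator parameters just attach deployment configuration to an otherwise ordinary function. As an illustration only (this is a toy stand-in, not the real Flash implementation), a configuration-carrying decorator can be sketched like this:

```python
import functools

def endpoint(name, gpu="ANY", workers=1, idle_timeout=60, dependencies=None):
    """Toy stand-in for a configuration-carrying decorator (not the Flash API)."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            # A real remote-execution decorator would ship fn to a worker;
            # this sketch just runs it locally.
            return fn(*args, **kwargs)
        # Attach the deployment settings so tooling can read them later
        inner.config = {
            "name": name,
            "gpu": gpu,
            "workers": workers,
            "idle_timeout": idle_timeout,
            "dependencies": dependencies or [],
        }
        return inner
    return wrap

@endpoint(name="demo", workers=3, idle_timeout=300, dependencies=["numpy"])
def square(x):
    return x * x

print(square(4))         # The wrapped function still behaves normally
print(square.config)     # The configuration travels with it
```

The real `@Endpoint` does far more (provisioning, serialization, remote dispatch), but the shape is the same: parameters configure the deployment, and the function body is what runs on the worker.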
Flash makes it easy to run multiple GPU operations concurrently. Replace your main() function with the code below:
```python
async def main():
    print("Running 3 matrix operations in parallel...")

    # Run all three operations at once
    results = await asyncio.gather(
        gpu_matrix_multiply(500),
        gpu_matrix_multiply(1000),
        gpu_matrix_multiply(2000)
    )

    # Print results
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Size: {result['matrix_size']}x{result['matrix_size']}")
        print(f"   Mean: {result['result_mean']:.4f}")
        print(f"   GPU: {result['gpu']}")
```
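To see why `asyncio.gather` matters, here is a purely local sketch that stands in `asyncio.sleep` for the GPU work (`fake_gpu_task` is a made-up name for illustration). Because the tasks overlap, three 0.1-second operations finish in roughly 0.1 seconds total, not 0.3:

```python
import asyncio
import time

async def fake_gpu_task(size, delay):
    # Simulate remote GPU work with a non-blocking sleep
    await asyncio.sleep(delay)
    return {"matrix_size": size}

async def demo():
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_gpu_task(500, 0.1),
        fake_gpu_task(1000, 0.1),
        fake_gpu_task(2000, 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(demo())
print([r["matrix_size"] for r in results])
print(f"elapsed: {elapsed:.2f}s")
```

The same overlap happens with Flash: with `workers=3`, the three remote calls run on separate workers concurrently rather than queueing one after another.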
```sh
# List all endpoints
flash undeploy list

# Remove the quickstart endpoint
flash undeploy flash-quickstart

# Or remove all endpoints
flash undeploy --all

# If using uv:
uv run flash undeploy list
uv run flash undeploy flash-quickstart
uv run flash undeploy --all
```