Skip to content

Modal Quickstart Guide for Ultralytics

This guide provides a comprehensive introduction to running Ultralytics YOLO26 on Modal, covering serverless GPU inference and model training.

What is Modal?

Modal is a serverless cloud computing platform for AI and machine learning workloads. It handles provisioning, scaling, and execution automatically — you write Python code locally and Modal runs it in the cloud with GPU access. This makes it ideal for running deep learning models like YOLO26 without managing infrastructure.

What You Will Learn

  • Setting up Modal and authenticating
  • Running YOLO26 inference on Modal
  • Using GPUs for faster inference
  • Training YOLO26 models on Modal

Prerequisites

  • A Modal account (sign up for free at modal.com)
  • Python 3.9 or later installed on your local machine

Installation

Install the Modal Python package and authenticate:

pip install modal
modal token new

Authentication

The modal token new command will open a browser window to authenticate your Modal account. After authentication, you can run Modal commands from the terminal.

Running YOLO26 Inference

Create a new Python file called modal_yolo.py with the following code:

"""
Modal + Ultralytics YOLO26 Quickstart
Run: modal run modal_yolo.py.
"""

import modal

app = modal.App("ultralytics-yolo")

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")


@app.function(image=image)
def predict(image_url: str):
    """Run YOLO26 inference on an image URL."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)

    for r in results:
        print(f"Detected {len(r.boxes)} objects:")
        for box in r.boxes:
            print(f"  - {model.names[int(box.cls)]}: {float(box.conf):.2f}")


@app.local_entrypoint()
def main():
    """Test inference with sample image."""
    predict.remote("https://ultralytics.com/images/bus.jpg")

Run the inference:

modal run modal_yolo.py

Expected output:

✓ Initialized. View run at https://modal.com/apps/your-username/main/ap-xxxxxxxx
✓ Created objects.
├── 🔨 Created mount modal_yolo.py
└── 🔨 Created function predict.
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n.pt to 'yolo26n.pt'...
Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
image 1/1 /root/bus.jpg: 640x480 4 persons, 1 bus, 377.8ms
Speed: 5.8ms preprocess, 377.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 480)

Detected 5 objects:
  - bus: 0.92
  - person: 0.91
  - person: 0.91
  - person: 0.87
  - person: 0.53
✓ App completed.

You can monitor your function execution in the Modal dashboard:

Modal Dashboard Function Calls

Using GPU for Faster Inference

Add a GPU to your function by specifying the gpu parameter:

@app.function(image=image, gpu="T4")  # Options: "T4", "A10G", "A100", "H100"
def predict_gpu(image_url: str):
    """Run YOLO26 inference on GPU."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)
    print(results[0].boxes)
GPUMemoryBest For
T416 GBInference, small model training
A10G24 GBMedium training jobs
A10040 GBLarge-scale training
H10080 GBMaximum performance

Training YOLO26 on Modal

For training, use a GPU and Modal Volumes for persistent storage. Create a new Python file called train_yolo.py:

import modal

app = modal.App("ultralytics-training")

volume = modal.Volume.from_name("yolo-training-vol", create_if_missing=True)

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")


@app.function(image=image, gpu="T4", timeout=3600, volumes={"/data": volume})
def train():
    """Train YOLO26 model on Modal."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    model.train(data="coco8.yaml", epochs=3, imgsz=640, project="/data/runs")


@app.local_entrypoint()
def main():
    train.remote()

Run training:

modal run train_yolo.py

Volume Persistence

Modal Volumes persist data between function runs. Trained weights are saved to /data/runs/detect/train/weights/.

Congratulations! You have successfully set up Ultralytics YOLO26 on Modal. For further learning:

FAQ

How do I choose the right GPU for my YOLO26 workload?

For inference, an NVIDIA T4 (16 GB) is typically sufficient and cost-effective. For training or larger models like YOLO26x, consider A10G or A100 GPUs.

How much does it cost to run YOLO26 on Modal?

Modal uses pay-per-second pricing. Approximate rates: CPU ~$0.05/hr, T4 ~$0.59/hr, A10G ~$1.10/hr, A100 ~$2.10/hr. Check Modal pricing for current rates.

Can I use my own custom-trained YOLO model?

Yes! Load custom models from a Modal Volume:

model = YOLO("/data/my_custom_model.pt")

For more information on training custom models, see the training guide.



📅 Created 1 day ago ✏️ Updated 1 day ago
raimbekovm

Comments