Modal Quickstart Guide for Ultralytics

This guide provides a comprehensive introduction to running Ultralytics YOLO26 on Modal, covering serverless GPU inference and model training.

Modal is a serverless cloud computing platform for AI and machine learning workloads. It handles provisioning, scaling, and execution automatically — you write Python code locally and Modal runs it in the cloud with GPU access. This makes it ideal for running deep learning models like YOLO26 without managing infrastructure.

What You Will Learn

Setting up Modal and authenticating
Running YOLO26 inference on Modal
Using GPUs for faster inference
Training YOLO26 models on Modal

Prerequisites

A Modal account (sign up for free at modal.com)
Python 3.9 or later installed on your local machine

Installation

Install the Modal Python package and authenticate:

pip install modal

modal token new

Authentication

The modal token new command will open a browser window to authenticate your Modal account. After authentication, you can run Modal commands from the terminal.

Running YOLO26 Inference

Create a new Python file called modal_yolo.py with the following code:

"""
Modal + Ultralytics YOLO26 Quickstart
Run: modal run modal_yolo.py.
"""

import modal

app = modal.App("ultralytics-yolo")

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")


@app.function(image=image)
def predict(image_url: str):
    """Run YOLO26 inference on an image URL."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)

    for r in results:
        print(f"Detected {len(r.boxes)} objects:")
        for box in r.boxes:
            print(f"  - {model.names[int(box.cls)]}: {float(box.conf):.2f}")


@app.local_entrypoint()
def main():
    """Test inference with sample image."""
    predict.remote("https://ultralytics.com/images/bus.jpg")

Run the inference:

modal run modal_yolo.py

Expected output:

✓ Initialized. View run at https://modal.com/apps/your-username/main/ap-xxxxxxxx
✓ Created objects.
├── 🔨 Created mount modal_yolo.py
└── 🔨 Created function predict.
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n.pt to 'yolo26n.pt'...
Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
image 1/1 /root/bus.jpg: 640x480 4 persons, 1 bus, 377.8ms
Speed: 5.8ms preprocess, 377.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 480)

Detected 5 objects:
  - bus: 0.92
  - person: 0.91
  - person: 0.91
  - person: 0.87
  - person: 0.53
✓ App completed.

You can monitor your function execution in the Modal dashboard:

Modal Dashboard Function Calls

Using GPU for Faster Inference

Add a GPU to your function by specifying the gpu parameter:

@app.function(image=image, gpu="T4")  # Options: "T4", "A10G", "A100", "H100"
def predict_gpu(image_url: str):
    """Run YOLO26 inference on GPU."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)
    print(results[0].boxes)

GPU	Memory	Best For
T4	16 GB	Inference, small model training
A10G	24 GB	Medium training jobs
A100	40 GB	Large-scale training
H100	80 GB	Maximum performance

For training, use a GPU and Modal Volumes for persistent storage. Create a new Python file called train_yolo.py:

import modal

app = modal.App("ultralytics-training")

volume = modal.Volume.from_name("yolo-training-vol", create_if_missing=True)

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")


@app.function(image=image, gpu="T4", timeout=3600, volumes={"/data": volume})
def train():
    """Train YOLO26 model on Modal."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    model.train(data="coco8.yaml", epochs=3, imgsz=640, project="/data/runs")


@app.local_entrypoint()
def main():
    train.remote()

Run training:

modal run train_yolo.py

Volume Persistence

Modal Volumes persist data between function runs. Trained weights are saved to /data/runs/detect/train/weights/.

Congratulations! You have successfully set up Ultralytics YOLO26 on Modal. For further learning:

Explore the Ultralytics YOLO26 documentation for advanced features
Learn about training custom models with your own datasets
Visit the Modal documentation for advanced platform features

FAQ

How do I choose the right GPU for my YOLO26 workload?

For inference, an NVIDIA T4 (16 GB) is typically sufficient and cost-effective. For training or larger models like YOLO26x, consider A10G or A100 GPUs.

Modal uses pay-per-second pricing. Approximate rates: CPU ~$0.05/hr, T4 ~$0.59/hr, A10G ~$1.10/hr, A100 ~$2.10/hr. Check Modal pricing for current rates.

Can I use my own custom-trained YOLO model?

Yes! Load custom models from a Modal Volume:

model = YOLO("/data/my_custom_model.pt")

For more information on training custom models, see the training guide.

📅 Created 1 day ago ✏️ Updated 1 day ago

Modal Quickstart Guide for Ultralytics

What is Modal?