Modal Quickstart Guide for Ultralytics

Q: How much does it cost to run YOLO26 on Modal?

Modal uses pay-per-second pricing. Approximate rates: CPU $0.05/hr, T4 $0.59/hr, A10G $1.10/hr, A100 $2.10/hr. Check Modal pricing for current rates.

This guide provides a comprehensive introduction to running Ultralytics YOLO26 on Modal, covering serverless GPU inference and model training.

Modal is a serverless cloud computing platform for AI and machine learning workloads. It handles provisioning, scaling, and execution automatically — you write Python code locally and Modal runs it in the cloud with GPU access. This makes it ideal for running deep learning models like YOLO26 without managing infrastructure.

What You Will Learn#

Setting up Modal and authenticating
Running YOLO26 inference on Modal
Using GPUs for faster inference
Training YOLO26 models on Modal

Prerequisites#

A Modal account (sign up for free at modal.com)
Python 3.9 or later installed on your local machine

Installation#

Install the Modal Python package:

pip install modal

Then authenticate the CLI with your Modal account:

modal token new

Authentication

The modal token new command will open a browser window to authenticate your Modal account. After authentication, you can run Modal commands from the terminal.

Running YOLO26 Inference#

Create a new Python file called modal_yolo.py to run inference with the following code:

"""
Modal + Ultralytics YOLO26 Quickstart
Run: modal run modal_yolo.py.
"""

import modal

app = modal.App("ultralytics-yolo")

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")

@app.function(image=image)
def predict(image_url: str):
    """Run YOLO26 inference on an image URL."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)

    for r in results:
        print(f"Detected {len(r.boxes)} objects:")
        for box in r.boxes:
            print(f"  - {model.names[int(box.cls)]}: {float(box.conf):.2f}")

@app.local_entrypoint()
def main():
    """Test inference with sample image."""
    predict.remote("https://ultralytics.com/images/bus.jpg")

Run the inference:

modal run modal_yolo.py

Expected output:

✓ Initialized. View run at https://modal.com/apps/your-username/main/ap-xxxxxxxx
✓ Created objects.
├── 🔨 Created mount modal_yolo.py
└── 🔨 Created function predict.
Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26n.pt to 'yolo26n.pt'...
Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
image 1/1 /root/bus.jpg: 640x480 4 persons, 1 bus, 377.8ms
Speed: 5.8ms preprocess, 377.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 480)

Detected 5 objects:
  - bus: 0.92
  - person: 0.91
  - person: 0.91
  - person: 0.87
  - person: 0.53
✓ App completed.

You can monitor your function execution in the Modal dashboard:

Modal Dashboard Function Calls

Using GPU for Faster Inference#

Add a GPU to your function by specifying the gpu parameter:

@app.function(image=image, gpu="T4")  # Options: "T4", "A10G", "A100", "H100"
def predict_gpu(image_url: str):
    """Run YOLO26 inference on GPU."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    results = model(image_url)
    print(results[0].boxes)

GPU	Memory	Best For
T4	16 GB	Inference, small model training
A10G	24 GB	Medium training jobs
A100	40 GB	Large-scale training
H100	80 GB	Maximum performance

For training, use a GPU and Modal Volumes for persistent storage. Create a new Python file called train_yolo.py:

import modal

app = modal.App("ultralytics-training")

volume = modal.Volume.from_name("yolo-training-vol", create_if_missing=True)

image = modal.Image.debian_slim(python_version="3.11").apt_install("libgl1", "libglib2.0-0").pip_install("ultralytics")

@app.function(image=image, gpu="T4", timeout=3600, volumes={"/data": volume})
def train():
    """Train YOLO26 model on Modal."""
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    model.train(data="coco8.yaml", epochs=3, imgsz=640, project="/data/runs")

@app.local_entrypoint()
def main():
    train.remote()

Run training:

modal run train_yolo.py

Volume Persistence

Modal Volumes persist data between function runs. Trained weights are saved to /data/runs/train/weights/.

Congratulations! You have successfully set up Ultralytics YOLO26 on Modal. For further learning:

Explore the Ultralytics YOLO26 documentation for advanced features
Learn about training custom models with your own datasets
Try the Docker Quickstart for containerized deployment
Visit the Modal documentation for advanced platform features

FAQ#

How do I choose the right GPU for my YOLO26 workload?#

For inference, an NVIDIA T4 (16 GB) is typically sufficient and cost-effective. For training or larger models like YOLO26x, consider A10G or A100 GPUs.

Modal uses pay-per-second pricing. Approximate rates: CPU ~$0.05/hr, T4 ~$0.59/hr, A10G ~$1.10/hr, A100 ~$2.10/hr. Check Modal pricing for current rates.

Can I use my own custom-trained YOLO model?#

Yes, you can run your own custom-trained YOLO model on Modal by loading the weights file from a Modal Volume:

model = YOLO("/data/my_custom_model.pt")

For more information on training custom models, see the training guide.

Contributors

RAraimbekovm² GLglenn-jocher¹

Created 3 months agoUpdated 2 weeks ago

Modal Quickstart Guide for Ultralytics#

What is Modal?#