Skip to content

Installation

Prerequisites

  • Python 3.12+
  • CUDA-capable GPU (tested on RTX 3080, RTX 4090, RTX 5090)
  • For live capture: iPhone with Calabi Lens or Intel RealSense D435i + RTAB-Map
  • For demo/replay: no hardware needed

Install RTSM

# Core only (query client, REST API, data contracts — no GPU needed)
pip install rtsm

# With GPU perception pipeline (default: Grounding DINO + SAM2, Apache 2.0)
pip install "rtsm[gpu]" --extra-index-url https://download.pytorch.org/whl/cu128

# With 3D visualization
pip install "rtsm[gpu,viz]" --extra-index-url https://download.pytorch.org/whl/cu128

# Everything
pip install "rtsm[all]" --extra-index-url https://download.pytorch.org/whl/cu128

CUDA Version

The --extra-index-url flag tells pip which PyTorch build to use. Replace cu128 with your CUDA version:

  • CUDA 11.8: https://download.pytorch.org/whl/cu118
  • CUDA 12.1: https://download.pytorch.org/whl/cu121
  • CUDA 12.8: https://download.pytorch.org/whl/cu128

Option B: from source (development)

git clone https://github.com/calabi-inc/rtsm.git
cd rtsm
pip install -e ".[gpu,viz]" --extra-index-url https://download.pytorch.org/whl/cu128

Option C: faster backends (AGPL, opt-in)

The default backends (Grounding DINO + SAM2) are Apache 2.0 licensed. For faster inference using FastSAM + YOLOE (AGPL-3.0 via ultralytics), install the opt-in extra:

pip install "rtsm[gpu-ultralytics]" --extra-index-url https://download.pytorch.org/whl/cu128

Then set backend: dual in your config. See Configuration for details.


Install Extras

Extra What it adds License
gpu Grounding DINO, SAM2, CLIP, torch, FAISS Apache 2.0
gpu-ultralytics FastSAM, YOLOE (via ultralytics) AGPL-3.0
viz Open3D, matplotlib (3D visualization) MIT
mcp MCP server + httpx (AI agent integration) Apache 2.0
all gpu + viz + mcp Mixed

Model Weights

Permissive backends (default)

All models auto-download on first run via HuggingFace Hub. No manual setup needed.

Model HuggingFace ID Size Auto-download
Grounding DINO (tiny) IDEA-Research/grounding-dino-tiny ~340MB Yes
SAM2.1 Hiera (small) facebook/sam2.1-hiera-small ~160MB Yes
CLIP ViT-B/32 openai/clip-vit-base-patch32 (via open_clip) ~340MB Yes

First run will download ~840MB of model weights to your HuggingFace cache (~/.cache/huggingface/). Subsequent runs use the cache.

AGPL backends (opt-in)

Model Source Size Auto-download
FastSAM-x ultralytics (auto-downloads) ~140MB Yes
YOLOE-26s-seg ultralytics (auto-downloads) ~50MB Yes

Pre-download (optional)

To avoid download delays on first run:

# Pre-download permissive models
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
AutoModelForZeroShotObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")

from sam2.build_sam import build_sam2
# SAM2 downloads automatically via sam2 package

import open_clip
open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")

Verify Installation

# Check import works
python -c "import rtsm; print('RTSM ready')"

# Check GPU pipeline loads (requires [gpu] extra)
python -c "from rtsm.models.segmentation import get_segmenter; print('Segmentation ready')"

Docker

# GPU pipeline (requires nvidia-docker)
docker run --gpus all calabi/rtsm demo

Docker Images

Pre-built Docker images are planned for a future release. For now, use the Dockerfiles in tests/ for reference:

  • tests/Dockerfile.deptest — Dependency verification (core, GPU, full)
  • tests/Dockerfile.gpu-test — GPU pipeline replay test

Install MCP Support (Optional)

To expose RTSM as an MCP tool server for AI agents (Claude, Cursor, LangGraph, etc.):

pip install "rtsm[gpu,mcp]" --extra-index-url https://download.pytorch.org/whl/cu128

See MCP (AI Agents) for setup and configuration.


Next Steps